VFP Computation Engine
The computation engine is provided by the vfpsupport libraries.
All objects are Read-Only Position Independent. They make no use of static data, and they are therefore compatible with both Read-Write Position Independent (RWPI) and non-RWPI applications.
The computation engine neither performs a stack check nor corrupts the SL register, so applications that use stack checking must allocate sufficient stack before calling the computation engine. At the time of writing, ARM's implementation of the VFP Computation Engine can use up to 132 bytes of stack, and can make nested calls to _fp_trap() from this maximum stack depth. The library contains sufficient information for the linker to check its stack depth.
void _VFP_Computation_Engine(_VFP_Computation_Description *cdesc)
This function accepts a list of VFP computation operations
in cdesc, and performs the given transformations
on the hardware VFP registers. Operands are read from the real hardware
VFP registers using FMRS, FMRDH, and FMRDL, and written back to
the hardware VFP registers using FMSR, FMDHR, and FMDLR.
The operations that can be specified in cdesc are
exactly those that are encoded in the VFP instruction set as a CDP
instruction. This includes trivial operations such as VFP-to-VFP register
moves (FCPY), even though all implementations of VFP coprocessors
must implement these in hardware. This is necessary for supporting
sequences of bounced operations, where preceding operations caused
a bounce after copy operations had been accepted by the VFP coprocessor.
In the course of fulfilling the request, the only hardware VFP instructions that the provided computation engine uses are copy operations between the ARM and VFP register banks:
FMSR, FMDLR, FMDHR, FMDRR, FMSRR, FMXR (write to VFP regs from ARM regs)
FMRS, FMRDL, FMRDH, FMRRD, FMRRS, FMRX (read from VFP regs to ARM regs)
In other words, it reads operands out of VFP registers, performs all the data processing in integer registers, and then writes results back to the target VFP registers. It does not rely on the VFP coprocessor for any part of the computation, and it is therefore compatible with any VFP coprocessor.
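The read, compute in integer registers, write back pattern described above can be illustrated with two of the simplest CDP operations, FNEGS and FABSS, which only touch the IEEE 754 sign bit. This is a minimal sketch and not ARM's implementation: the sim_sreg array and the sim_fmrs(), sim_fmsr(), soft_fnegs(), and soft_fabss() names are hypothetical stand-ins, with the array taking the place of the hardware register transfers so that the code can run on any host.

```c
#include <stdint.h>

/* Stand-in for the 32 single-precision VFP registers. A real engine would
 * transfer values with FMRS and FMSR; this array is an assumption made so
 * that the sketch is runnable without VFP hardware. */
static uint32_t sim_sreg[32];

/* Hypothetical equivalent of FMRS: read register Sn as a raw bit pattern. */
static uint32_t sim_fmrs(unsigned n)
{
    return sim_sreg[n & 31u];
}

/* Hypothetical equivalent of FMSR: write a raw bit pattern to register Sn. */
static void sim_fmsr(unsigned n, uint32_t bits)
{
    sim_sreg[n & 31u] = bits;
}

/* Sd = -Sm, computed on the integer bit pattern by flipping the sign bit. */
static void soft_fnegs(unsigned sd, unsigned sm)
{
    sim_fmsr(sd, sim_fmrs(sm) ^ UINT32_C(0x80000000));
}

/* Sd = |Sm|, computed on the integer bit pattern by clearing the sign bit. */
static void soft_fabss(unsigned sd, unsigned sm)
{
    sim_fmsr(sd, sim_fmrs(sm) & UINT32_C(0x7FFFFFFF));
}
```

Arithmetic operations such as addition or multiplication follow the same outline, but need full integer implementations of rounding and exception detection, which are omitted here.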
A subarchitecture-optimized computation engine could use the VFP coprocessor for some of its calculations, but it must never cause a bounce.
A _VFP_Computation_Description object is used to pass information from the subarchitecture-specific exception decoding code to the VFP computation engine.
struct _VFP_Computation_Description {
    uint32 count;
    uint32 flags;
    struct {
        uint32 op;
        uint32 op_dbg;
    } desc[MAXCOUNT];
};
This structure contains a variable-length array of desc entries,
where each entry represents one operation. The operation is encoded
in op just like a VFP CDP instruction word, except
that bits [26..24] hold the vector length used for issuing
the instruction, encoded as the length minus 1 (mod 8), in the same way as the LEN field of the
FPSCR. Bits [31..27], [11..9], and [4] are ignored. The op_dbg field is
also ignored.
In other words, the caller must ensure all op entries
represent a valid kind of VFP operation. The caller must replace
bits [26..24] with the vector length minus one.
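The bit manipulation described above can be sketched as follows. The helper names vfp_make_op() and vfp_op_vector_length() and the two macros are hypothetical and are not part of the vfpsupport libraries; the sketch assumes the caller already holds a valid CDP instruction word and a vector length in the range 1 to 8.

```c
#include <stdint.h>

/* Bits [26..24] of an op word hold the vector length minus one (mod 8). */
#define VFP_OP_LEN_SHIFT 24u
#define VFP_OP_LEN_MASK  (UINT32_C(7) << VFP_OP_LEN_SHIFT)

/* Hypothetical helper: replace bits [26..24] of a CDP instruction word
 * with (vector_length - 1) mod 8. All other bits are left unchanged. */
static uint32_t vfp_make_op(uint32_t cdp_word, unsigned vector_length)
{
    uint32_t len_field = (uint32_t)((vector_length - 1u) & 7u);
    return (cdp_word & ~VFP_OP_LEN_MASK) | (len_field << VFP_OP_LEN_SHIFT);
}

/* Hypothetical helper: recover the vector length (1..8) from an op word. */
static unsigned vfp_op_vector_length(uint32_t op)
{
    return (unsigned)((op & VFP_OP_LEN_MASK) >> VFP_OP_LEN_SHIFT) + 1u;
}
```

For example, calling vfp_make_op() on the word 0xEE000A00 with a vector length of 4 yields 0xEB000A00: bit 27 is untouched and bits [26..24] become 3.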
The vector length is interpreted as if it had been in the FPSCR when the equivalent CDP instruction was executed, so the operation uses the vector addressing modes as defined in the ARM Architecture Reference Manual. For instructions that are always scalar, the value of the vector length field (bits [26..24] of op) is ignored.
The vector stride and rounding mode for all operations are taken straight from the hardware FPSCR, because it is impossible for these to change in the middle of the instruction sequence. (It is also impossible for the vector length in the FPSCR to change; the vector length is specified individually in order to be able to resume a partially executed vector instruction.)
Bits [7..0] of the count word give the length of the array. The array has a SUBARCHITECTURE DEFINED maximum length. In current implementations this is only 2 entries. This is expected to be no greater than 16 entries in future VFP implementations. Bits [31..8] of the count word are ignored.
The flags word is currently ignored.
For forward compatibility all unused fields are cleared to zero by the subarchitecture support code that creates this structure.
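As an illustration of the rules for the count word and for unused fields, the following sketch fills in a description and reads its length back. It is an assumption-laden example rather than the support code itself: struct example_cdesc mirrors the layout shown earlier under a hypothetical name, EXAMPLE_MAXCOUNT is set to 2 only because that is the figure given for current implementations, and uint32_t stands in for the uint32 type.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: 2 entries, the maximum quoted for current implementations. */
#define EXAMPLE_MAXCOUNT 2u

/* Hypothetical mirror of _VFP_Computation_Description. */
struct example_cdesc {
    uint32_t count;
    uint32_t flags;
    struct {
        uint32_t op;
        uint32_t op_dbg;
    } desc[EXAMPLE_MAXCOUNT];
};

/* Fill cd with n op words. Every unused field (flags, op_dbg, unused
 * entries, and bits [31..8] of count) is cleared to zero.
 * Returns 0 on success, or -1 if n exceeds the array capacity. */
static int example_cdesc_init(struct example_cdesc *cd,
                              const uint32_t *ops, unsigned n)
{
    unsigned i;

    if (n > EXAMPLE_MAXCOUNT)
        return -1;
    memset(cd, 0, sizeof *cd);
    cd->count = n;
    for (i = 0; i < n; i++)
        cd->desc[i].op = ops[i];
    return 0;
}

/* Only bits [7..0] of count give the array length; the rest are ignored. */
static unsigned example_cdesc_length(const struct example_cdesc *cd)
{
    return (unsigned)(cd->count & 0xFFu);
}
```

Clearing the whole structure with memset() before filling it is one simple way to satisfy the forward-compatibility rule without naming each unused field.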
The vector length field (bits [26..24] of op) can contain any value for instruction encodings that are always scalar. This allows code to copy the vector length directly out of the FPSCR for any operation that has not yet begun execution in the coprocessor.
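Copying the vector length straight from the FPSCR might look like the sketch below. The function name vfp_op_from_fpscr() is hypothetical, the FPSCR value is passed in as a plain integer (a real caller would obtain it with FMRX), and the position of the LEN field at FPSCR bits [18..16] is taken from the ARM Architecture Reference Manual rather than from this page.

```c
#include <stdint.h>

#define FPSCR_LEN_SHIFT 16u  /* LEN occupies FPSCR bits [18..16] */
#define OP_LEN_SHIFT    24u  /* vector length occupies op bits [26..24] */

/* Hypothetical helper: build an op word from a CDP instruction word and a
 * saved FPSCR value. LEN is already stored as (length - 1), so it is copied
 * without adjustment. This is done for every encoding, including ones that
 * are always scalar, because the engine ignores the field for those. */
static uint32_t vfp_op_from_fpscr(uint32_t cdp_word, uint32_t fpscr)
{
    uint32_t len = (fpscr >> FPSCR_LEN_SHIFT) & 7u;
    return (cdp_word & ~(UINT32_C(7) << OP_LEN_SHIFT)) | (len << OP_LEN_SHIFT);
}
```

This only applies to an operation that has not yet started. For a partially executed vector instruction, the remaining length has to come from the subarchitecture-specific exception state instead.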
There is no explicit return value.
Results of the VFP operations are returned by being written back to the VFP register bank, as specified in the instruction patterns passed in.
If the computation engine receives an instruction pattern
it does not recognize, it signals an error by calling _VFP_Computation_Error().
void _VFP_Computation_Error( _VFP_Computation_Description *cdesc, uint32 index)
The index argument identifies the entry of cdesc that is invalid. If no implementation of _VFP_Computation_Error is provided then errors are ignored.
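An application that does not want errors to be silently ignored can supply its own handler. The sketch below simply records the index of the offending entry for later inspection. The vfp_last_bad_index variable is hypothetical, and the typedef and the use of uint32_t in place of uint32 are assumptions about declarations that a real build would take from the support code headers.

```c
#include <stdint.h>

/* Assumption: the description type is visible as a typedef of this name.
 * The handler below never dereferences it, so an incomplete type suffices. */
typedef struct _VFP_Computation_Description _VFP_Computation_Description;

/* Hypothetical diagnostic: index of the most recent invalid entry,
 * or -1 if no error has been reported. */
static volatile int32_t vfp_last_bad_index = -1;

/* Called by the computation engine when desc[index].op is not recognized. */
void _VFP_Computation_Error(_VFP_Computation_Description *cdesc, uint32_t index)
{
    (void)cdesc;  /* not inspected in this sketch */
    vfp_last_bad_index = (int32_t)index;
}
```

A production handler would more likely log the op word or abort the faulting task; recording the index is just the smallest behavior that makes the error observable.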
_VFP_Computation_Engine() signals floating-point
traps where necessary. Trap handlers can either produce a result,
or cause a longjmp out of _VFP_Computation_Engine().
Traps are signaled by calling _vfp_fp_trap().
This function takes the same arguments as the _fp_trap() function that
forms part of ARM floating-point library support.
The _vfp_fp_trap() wrapper provided in
vfpfptrap.s prevents the corruption of VFP register state by saving
callee-save VFP registers, and then calls the standard _fp_trap() handler.
The implementation of _vfp_fp_trap() provided
performs no stack checking.