Signal Processor Implementation
[Last modified 11:13:00 PM on Tuesday, 27 July 2010]
Once techniques had been devised to perform the various types of matrix multiplications, the remaining task was to demonstrate their practicality. This involved combining several multiplications into a VHDL description of a matrix multiplier component, using only the single set of four complex multipliers.
The particular algorithm is not important here, because ultimately any multiplication algorithm can be implemented in this way.
The specific details are explored in the documentation that is available from the downloads area, but some of the design issues are described on this page.
Clamping overflowed values
When two signed 16-bit numbers are multiplied, the result is a 31-bit value: twice the number of integer bits, twice the number of fractional bits, and one sign bit. Numbers of this size are too big to be stored in the memories, so they must be cropped in some way. Furthermore, since some of the results are reused in later calculations, they must be cropped back to the original number of integer and fractional bits.
The fractional bits can simply be cropped, with no further processing necessary. However, this is not the case with the integer bits. In particular, we need to deal with the case where the value of the result is outside of the allowed range.
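As a rough illustration of cropping the fractional bits, the sketch below models the fixed-point arithmetic in Python. The split of 8 fractional bits (`FRAC_BITS`) and the helper names are assumptions made for the example; the page does not state the actual number format used in the design.

```python
FRAC_BITS = 8  # assumed for illustration; the real format is not given here


def to_fixed(x: float) -> int:
    """Encode a real number as a signed fixed-point integer."""
    return round(x * (1 << FRAC_BITS))


def crop_fraction(product: int) -> int:
    """Discard the extra fractional bits of a product.

    A product of two values with FRAC_BITS fractional bits each has
    2 * FRAC_BITS fractional bits, so dropping the lowest FRAC_BITS
    restores the original format. The arithmetic shift rounds toward
    negative infinity, which is what discarding low bits does in
    two's complement.
    """
    return product >> FRAC_BITS


a = to_fixed(1.5)                    # 384
b = to_fixed(-2.25)                  # -576
full_product = a * b                 # 16 fractional bits
cropped = crop_fraction(full_product)
print(cropped / (1 << FRAC_BITS))    # -3.375
```

No range check is needed at this end of the word: dropping low bits only loses precision, never magnitude.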
For example, consider an 8-bit integer result with an allowed range of roughly ±16 (strictly, -16 to +15 in two's complement). That range fits in the least significant 5 bits, so designate the 5th least significant bit as the "cropped sign bit". Now consider four examples:
5 : 00000101 17 : 00010001
-5 : 11111011 -17 : 11101111
The examples 5 and -5 are within the allowed range, while 17 and -17 are not. Examination of the binary encoding of these numbers will reveal that, for the numbers within the allowed range, all of the bits from the cropped sign bit to the original sign bit are the same. This is not the case for numbers that are not within range. Hence, we have a mechanism for detecting range overflows.
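The check can be sketched in Python for the 8-bit example above. The function name and parameters are hypothetical, and the code only models the bit test; it is not the VHDL used in the design.

```python
def is_in_range(value: int, total_bits: int = 8, kept_bits: int = 5) -> bool:
    """Return True if every bit from the cropped sign bit up to the
    original sign bit has the same value."""
    raw = value & ((1 << total_bits) - 1)   # two's complement bit pattern
    upper = raw >> (kept_bits - 1)          # cropped sign bit and all bits above it
    all_ones = (1 << (total_bits - kept_bits + 1)) - 1
    return upper == 0 or upper == all_ones


for n in (5, -5, 17, -17):
    print(n, format(n & 0xFF, "08b"), is_in_range(n))
```

For 5 and -5 the upper four bits are 0000 and 1111 respectively, so the test passes; for 17 and -17 they are 0001 and 1110, so it fails.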
The technique for clamping is then:
- Check if all of the bits above the clamping point are the same
- If they are, simply crop the number
- If not,
  - Set the sign bit of the cropped number to the original sign bit
  - Set all of the other bits to the inverse of the sign bit.
This ensures that any out-of-range numbers are clamped to the upper or lower extreme of the allowed range, as appropriate.
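The clamping steps can be modelled as follows. This is a minimal Python sketch for the 8-bit to 5-bit example; `clamp` and its parameters are names invented for the illustration, and the actual component is written in VHDL.

```python
def clamp(value: int, total_bits: int = 8, kept_bits: int = 5) -> int:
    """Crop a two's complement value to kept_bits, saturating on overflow."""
    raw = value & ((1 << total_bits) - 1)        # bit pattern of the input
    sign = (raw >> (total_bits - 1)) & 1         # original sign bit
    upper = raw >> (kept_bits - 1)               # cropped sign bit and above
    all_ones = (1 << (total_bits - kept_bits + 1)) - 1

    if upper in (0, all_ones):
        # All upper bits agree: simply crop the number.
        cropped = raw & ((1 << kept_bits) - 1)
    else:
        # Overflow: keep the original sign bit and set every other
        # bit to its inverse.
        low_mask = (1 << (kept_bits - 1)) - 1
        others = 0 if sign else low_mask
        cropped = (sign << (kept_bits - 1)) | others

    # Reinterpret the kept_bits pattern as a signed integer.
    if cropped >> (kept_bits - 1):
        return cropped - (1 << kept_bits)
    return cropped


print([clamp(n) for n in (5, -5, 17, -17)])  # [5, -5, 15, -16]
```

Out-of-range inputs saturate at 01111 (+15) or 10000 (-16), the two extremes of a 5-bit two's complement value.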
Disabling parts of the matrices
An extension to the design was to consider ways of "disabling" certain rows and columns of the matrix, so that the 4x4 matrices would behave like 3x3, or 2x2 matrices. The disabled rows and columns would simply contain zeroes. This proved to be a simple matter of:
- Disabling the appropriate multiplier. Because columns or rows are read in one at a time, each multiplier always operates on data from the same row or column. (The only exception is for mode 0.)
- Initialising the appropriate rows or columns to zero at the start of the algorithm.
The result is that the matrices behave as if the disabled sections were not present. The power consumption also approaches the level it would have if the disabled sections were physically absent, because a disabled multiplier prevents glitches from propagating through it. The calculation still uses the same number of clock cycles, but this is not an issue for this design.
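The arithmetic effect of zeroing rows and columns can be checked with a small Python model. It assumes, purely for illustration, that the last row and column are the disabled ones; the helper names and the sample matrices are made up, and the model says nothing about the power or timing behaviour of the hardware.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def disable(m, active):
    """Zero every row and column whose index is >= active."""
    n = len(m)
    return [[m[i][j] if i < active and j < active else 0 for j in range(n)]
            for i in range(n)]


a4 = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
b4 = [[2, 0, 1, 3], [1, 1, 0, 2], [0, 3, 2, 1], [4, 1, 1, 0]]

# 4x4 multiplication with the fourth row and column disabled.
masked = matmul(disable(a4, 3), disable(b4, 3))

# Plain 3x3 multiplication of the top-left blocks.
small = matmul([row[:3] for row in a4[:3]], [row[:3] for row in b4[:3]])

print([row[:3] for row in masked[:3]] == small)  # True
```

The top-left 3x3 block of the masked product matches the plain 3x3 product, and the disabled row and column of the result stay at zero.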