10111001
00101010
00111001
Sum: 10101010
Carry: 00111001
Result: 100011100
The sum and carry can then be recombined in a normal addition to form the correct result. This process may seem more complicated and pointless in the above trivial example, but the power of this technique is that any amount of numbers can be added together in this manner. It is only the final recombination of the final carry and sum that requires a carry propagating addition.
The use of a wallace tree [1], arranges the adder tree so that all of the output bits could be obtained while minimising the size of the circuit. However, in the case of multipliers, we know what the expected output size will be, and so we can set all of the input and output sizes to that value. We do not care about any overflowing sign bits, so they can be discarded and the carries can simply be shifted left to the correct alignment. All of the results can then be grouped together as one and continually reduced until we are left with two values. This is demonstrated by figure 1.

Figure 1: Carry-save adder tree for when overflowing carries from the MSB do not matter
3:2 Compressors
The design of the 3:2 compressor is simple, with the following truth table showing that it is nothing more than a full adder:
| Inputs | Sum | Carry | ||
| A | B | C | ||
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 | 0 |
| 0 | 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 | 1 |
| 1 | 0 | 0 | 1 | 0 |
| 1 | 0 | 1 | 0 | 1 |
| 1 | 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 1 | 1 |

Figure 2: Architecture of the full word 3:2 compressor, using individual bit 3:2 compressors.
4:2 Compressors
The discussion so far has referred only to 3:2 carry-save adders, but it is also possible to add four bits in this format. In reality, there are actually five inputs (one being a carry in), and three outputs (two carries and the sum):

Figure 3 : High level view of the 4:2 compressor
- The outputs represent the sum of the five inputs, so it is really a 5 bit adder
- Both carries are of equal weighting (i.e. add "1" to the next column)
- To avoid carry propagation, the value of Cout depends only on A, B, C and D. It is independent of Cin.
- The Cout signal forms the input to the Cin of a 4:2 of the next column.
| Inputs | Cin = 0 | Cin = 1 | Cout | |||||
| A | B | C | D | Carry | Sum | Carry | Sum | |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 |
| 0 | 0 | 1 | 0 | |||||
| 0 | 1 | 0 | 0 | |||||
| 1 | 0 | 0 | 0 | |||||
| 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 |
| 0 | 1 | 1 | 0 | |||||
| 1 | 1 | 0 | 0 | |||||
| 0 | 1 | 0 | 1 | |||||
| 1 | 0 | 1 | 0 | |||||
| 1 | 0 | 0 | 1 | |||||
| 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 |
| 1 | 1 | 1 | 0 | |||||
| 1 | 1 | 0 | 1 | |||||
| 1 | 1 | 1 | 0 | |||||
| 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
A k-bit 4:2 word adder is then formed as shown below, in figure 4.

Figure 4: Architecture of the full word 4:2 compressor, using individual bit 4:2 compressors
References
- [1] Behrooz Parhami, "Computer Arithmetic", Oxford Press, 2000, pp131
