#### ISSN: 2455-2631

# Modified Low Power and High Speed Row and Column Bypass Multiplier using FPGA

<sup>1</sup>Amala Maria Alex, <sup>2</sup>Nidhish Antony

<sup>1</sup>MTech in VLSI & Embedded Systems, <sup>2</sup> Assistant Professor ECE Dept Mangalam college of Engineering, Kerala, Kottayam, India

Abstract: The demand for portable electronic devices has grown steadily in recent decades, and such devices require low power consumption. The multiplier is a critical arithmetic unit in many DSP applications, so it is essential to design multipliers that consume less power and operate at high speed. One main aspect of low power design is minimizing switching activity to reduce dynamic power dissipation. The proposed bypassing logic reduces both dynamic power dissipation and signal propagation delay. The row and column bypass multiplier is a design that reduces switching activity through architecture optimization: the adders in rows and columns that are to be bypassed are not activated, and the signal is passed on to the next stage. With a tristate buffer as the control gating element, unnecessary signal propagation is stopped and unwanted switching activity is reduced. The proposed multiplier design improves power efficiency by 20% or more when the probability of occurrence of zero is high. These features make the proposed design well suited to DSP applications such as filtering, DCT and FFT.

Index Terms: FFT-Fast Fourier Transform, DCT-Discrete Cosine Transform, DSP-Digital Signal Processing, CSM-Carry Save Multiplier, CSA-Carry Save Adder, ADPCM-Adaptive Differential Pulse Code Modulation, QC-Quantum Cost, GO-Garbage Output, NC-Number of Constant Inputs.

## **I. INTRODUCTION**

The multiplier is the most critical arithmetic unit in many DSP applications such as digital filtering, the fast Fourier transform and the discrete cosine transform. There is a need to design multipliers that reduce power dissipation and signal propagation delay. DSP applications like filters require fast calculations for updating their coefficients. Multipliers and their associated circuits (adders and accumulators), along with registers, consume a significant portion of the power in most DSP applications. Therefore, it makes sense to increase their performance by customization and architecture optimization [1]. This work therefore adopts the bypassing technique, which achieves low power design at the architecture level. Bypassing results in much lower power consumption even though the architecture is larger than usual, and it also reduces the delay and quantum cost of the design. In CMOS circuits, power consumption can be classified into static power dissipation and dynamic power dissipation. The total power is given by

$$P = \alpha f_c C_L V_{DD}^2 + I_{SC} V_{DD} + I_{leakage} V_{DD} \tag{1}$$

The first term represents the dynamic power consumption, which is dominant, and the remaining terms account for the short-circuit and leakage currents. Static power dissipation is due to leakage current, and dynamic power dissipation is due to charging and discharging of the load capacitances. In Equation (1), $\alpha$ is the switching activity, $f_c$ is the clock frequency, $C_L$ is the load capacitance, $V_{DD}$ is the supply voltage, $I_{SC}$ is the short-circuit current and $I_{leakage}$ is the leakage current. As the switching activity ($\alpha$) of the circuit increases, the dynamic power consumption also increases. This power consumption can be reduced by minimizing unnecessary switching activity, and bypassing is a technique for doing so. A bypassing scheme disables the switching activity in some rows and columns to reduce switching power consumption, and it is also beneficial in reducing the propagation delay. The power saved by bypassing depends mainly on the probability of zero bits in the operands: the higher this probability, the greater the possible power reduction. For uniformly distributed random inputs, every input bit is zero with the same probability. In real-time applications, however, the distribution of zero bits may vary. When implementing a multiplier in applications such as DCT and ADPCM, it is therefore necessary to analyze the occurrence of 'zero' bits in the input data to estimate the amount of power reduction [1].
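As a rough illustration of Equation (1), the following sketch evaluates the three power terms and shows that the switching term scales linearly with $\alpha$. The operating point (frequency, capacitance, voltage and currents) is a set of assumed values chosen only for illustration; none of them come from this work.

```python
def total_power(alpha, f_c, c_load, v_dd, i_sc, i_leak):
    """Evaluate Equation (1). Returns (total power, switching term) in watts."""
    switching = alpha * f_c * c_load * v_dd ** 2   # alpha * f_c * C_L * VDD^2
    short_circuit = i_sc * v_dd                    # I_SC * VDD
    leakage = i_leak * v_dd                        # I_leakage * VDD
    return switching + short_circuit + leakage, switching


# Hypothetical operating point (assumed values, not measured data).
params = dict(f_c=100e6, c_load=10e-12, v_dd=1.2, i_sc=1e-6, i_leak=1e-6)

p_total_full, p_sw_full = total_power(alpha=0.5, **params)
p_total_half, p_sw_half = total_power(alpha=0.25, **params)

# Halving the switching activity halves the switching term only;
# the short-circuit and leakage terms are unchanged.
ratio = p_sw_half / p_sw_full
```

Under these assumptions the switching term dominates the total, which is why a technique that lowers $\alpha$ is attractive.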

## II. EXISTING WORK

Conventional multipliers are of the iterative type or the array type. Iterative multipliers reuse the same hardware in a series of add and shift operations to compute the product. Hardware reuse is possible, but more clock cycles are required because the common hardware must be started again for each addition. In array multipliers, pipelining is possible because of the repeated, compact and simple structure. The structure is regular, so the layout is favorable for parallel processing. All partial products are generated after one AND gate delay and are summed up sequentially using an array of full adders.

#### Parallel Array Multiplier

|    |      |      |      | A3   | A2   | A1   | A0   |
|----|------|------|------|------|------|------|------|
|    |      |      | ×    | B3   | B2   | B1   | B0   |
|    |      |      |      | A3B0 | A2B0 | A1B0 | A0B0 |
|    |      |      | A3B1 | A2B1 | A1B1 | A0B1 |      |
|    |      | A3B2 | A2B2 | A1B2 | A0B2 |      |      |
|    | A3B3 | A2B3 | A1B3 | A0B3 |      |      |      |
| P7 | P6   | P5   | P4   | P3   | P2   | P1   | P0   |

The architecture view of a 4x4 standard Braun multiplier is shown in Fig.1.



Fig.1: Braun Array Multiplier [1]

The Braun multiplier is a simple parallel multiplier, generally called a carry save multiplier (CSM). It performs multiplication of unsigned numbers. The Braun multiplier structure is made up of an array of AND gates and full adders. An n x n multiplier requires n(n-1) full adders and n<sup>2</sup> AND gates. The delay of the Braun multiplier depends on the delay of the full adders and on the delay of the final adder in the last stage, which is a ripple carry adder. The dynamic power dissipation resulting from switching activity can be reduced by bypassing techniques (row bypassing and column bypassing) and by reversible logic techniques.
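The partial product array and the resource counts can be sketched in a few lines. This is a behavioural illustration only, not the gate-level circuit of Fig. 1; the function names are hypothetical, and the AND gate count assumes one gate per partial product (n<sup>2</sup> in total).

```python
def partial_products(a, b, n=4):
    """AND-gate array: rows[j][i] = A_i AND B_j for n-bit unsigned operands."""
    return [[((a >> i) & 1) & ((b >> j) & 1) for i in range(n)]
            for j in range(n)]


def product_from_rows(rows):
    """Add every partial product with weight 2**(i + j), as the adder array does."""
    return sum(bit << (i + j)
               for j, row in enumerate(rows)
               for i, bit in enumerate(row))


def braun_resources(n):
    """Gate counts for an n x n Braun multiplier (one AND gate per partial product)."""
    return {"and_gates": n * n, "adders": n * (n - 1)}


rows = partial_products(13, 11)      # 4 x 4 example operands
result = product_from_rows(rows)     # equals 13 * 11
resources = braun_resources(4)       # 16 AND gates, 12 adders
```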

## **Bypass Multipliers**

Bypassing, with reference to a multiplier, means turning off some columns or rows (or both) in the multiplier array whenever certain multiplier or multiplicand bits (or both) are zero. A normal array multiplier consists of an array of carry save adders (full adders). If one of the inputs of an adder is zero, the sum is simply the other input. Instead of performing an unnecessary addition with zero, the addition can be skipped and the other input passed directly to the next level [1]. The addition operation is not required to derive the sum output, and the power saved is the power that would have been spent on additions with zero.

Types of bypassing schemes are:

- Row Bypassing schemes
- Column Bypassing schemes
- Row and column Bypassing schemes

The row bypassing technique bypasses only rows of the multiplier, not columns. Similarly, the column bypassing technique bypasses only columns. In the proposed multiplier, bypassing is two dimensional and all adders are provided with bypassing hardware. Any row or column can be bypassed if the corresponding bit is zero, so no adder unit in the multiplier is left without bypassing logic and unnecessary switching is avoided in every adder unit. Additional logic is used in the proposed model to resolve conflicts when row bypassing and column bypassing occur simultaneously.

## Row Bypassing Multiplier

The row bypassing technique is based on the number of zeros in the multiplier bits. In this multiplier, some rows of adders in the basic multiplier array are disabled during operation to save power. The internal structure of the row bypassing adder cell is shown below.



Fig.2: Structure of FA(Row Bypassing)<sup>[2]</sup>

Each full adder is provided with three tri-state buffers that halt its inputs when that particular full adder is bypassed. Two 2:1 multiplexers are connected at the sum and carry outputs to switch between the bypassed path and the normal path. Consider a multiplier with multiplier bits B and multiplicand A, as shown in Fig. 3. As soon as $b_j$ is found to be zero, all partial products $A_iB_j$, $0 \le i \le n-1$, are zero, so the complete row is bypassed to avoid triggering the adding units in that row and thereby reduce power consumption. This is why two multiplexers are required in each adding unit to realize the bypassing operation.



Fig.3:Row Bypassing Multiplier<sup>[1]</sup>

To eliminate redundant signal transitions, the adders whose partial product is zero are disabled, and the partial sums of the previous adder row are shifted and bypassed to the next row of adders. Thus, the $(j+1)^{th}$ row of CSAs can be skipped without affecting the multiplication result. The drawback of the row bypassing multiplier is the requirement of additional correcting circuitry. For example, let $b_2$ be zero. In this case, the CSAs in the second row are bypassed and the outputs from the first row are fed directly to the third row of CSAs. However, since the rightmost FA in the second row is disabled, it does not execute its addition and the output is not correct. To solve this problem, additional components such as a NOT gate and an AND gate are added.
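The effect of row bypassing on activity can be illustrated with a behavioural, row-level model. It is a deliberate simplification: each multiplier bit is treated as gating one accumulation row, and the carry correction circuitry discussed above is not modelled. The function name is hypothetical.

```python
def row_bypass_multiply(a, b, n=4):
    """Row-level sketch: the accumulation for row j is skipped when B_j is 0.

    Returns (product, rows_activated). Carry correction logic of the real
    adder array is not modelled here.
    """
    acc = 0
    rows_activated = 0
    for j in range(n):
        if (b >> j) & 1 == 0:
            continue                 # row bypassed: partial sum passes through
        acc += a << j                # row active: add the shifted multiplicand
        rows_activated += 1
    return acc, rows_activated


product, rows_used = row_bypass_multiply(13, 0b0101)   # two of four rows active
```

The more zeros the multiplier operand contains, the fewer rows are activated, while the product is unchanged.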

## Column Bypassing Multiplier

The column bypassing technique is based on the number of zeros in the multiplicand bits. In this multiplier, some columns of adders in the basic multiplier array are disabled during operation to save power. The internal structure of the column bypassing adder cell is shown below [1][2].



Fig.4: Structure of FA (column bypassing) [2]

In the column bypassing multiplier, if bit $A_i$ of the multiplicand is zero, the addition operations in the corresponding column, the (i+1)<sup>th</sup> column, can be bypassed for power reduction. All partial product inputs in that column are zero and do not affect the carry in any stage, so the 2:1 multiplexer at the carry output is not required and is saved. Hence, the column bypassing multiplier has a reduced architecture and is easier to design than the row bypassing multiplier. In the column bypassing multiplier shown in Fig. 5, each full adder is provided with only two tri-state buffers, at the partial product inputs; no tri-state buffer is required at the carry input. One 2:1 multiplexer is connected at the sum output to switch between the bypassed path and the normal path. There are two advantages to this approach. First, it eliminates the extra correcting circuitry. Second, the modified full adder is simpler than that used in the row bypassing multiplier.
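The claim that a bypassed column needs no carry correction can be checked with a bit-level sketch of a carry save array. The wiring below is an assumption based on the usual Braun layout (the sum moves diagonally to the next row, the carry stays in the column of the same multiplicand bit); it is not taken from Fig. 5, and the function name is hypothetical.

```python
def braun_column_bypass(a, b, n=4):
    """Bit-level n x n carry save array with column bypassing (sketch).

    Returns (product, active_adders), counting only carry save adder cells
    that actually perform an addition.
    """
    A = [(a >> i) & 1 for i in range(n)]
    B = [(b >> j) & 1 for j in range(n)]
    pp = [[A[i] & B[j] for i in range(n)] for j in range(n)]  # pp[j][i]

    def full_add(x, y, z):
        return x ^ y ^ z, (x & y) | (y & z) | (x & z)

    bits = [pp[0][0]]              # P0 needs no adder
    sums = pp[0][1:]               # sum inputs of the first adder row
    carries = [0] * (n - 1)
    active = 0
    for j in range(1, n):
        new_sums, new_carries = [], []
        for i in range(n - 1):
            if A[i] == 0:
                # Column bypass: partial product and carry-in are both 0,
                # so the sum passes through and the carry-out is 0.
                s, c = sums[i], 0
            else:
                s, c = full_add(pp[j][i], sums[i], carries[i])
                active += 1
            new_sums.append(s)
            new_carries.append(c)
        bits.append(new_sums[0])                   # product bit P_j
        sums = new_sums[1:] + [pp[j][n - 1]]       # shift sums by one column
        carries = new_carries
    ripple = 0
    for k in range(n - 1):                         # final ripple carry row
        s, ripple = full_add(sums[k], carries[k], ripple)
        bits.append(s)
    bits.append(ripple)
    return sum(bit << w for w, bit in enumerate(bits)), active


product, active = braun_column_bypass(5, 15)   # A = 0101: one of three columns bypassed
```

In this model the bypassed cells always forward a zero carry, so the product stays correct for every operand pair without extra correcting logic.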



Fig.5:Column Bypassing Multiplier<sup>[1]</sup>

## Row and Column Bypassing Multiplier

The model is a row and column bypassing multiplier: the $(i+1)^{th}$ column is bypassed if bit $A_i$ of the multiplicand is zero, and the $j^{th}$ row is bypassed if bit $B_j$ of the multiplier is zero. Carry propagation takes both the multiplicand bit ($A_i$) and the multiplier bit ($B_j$) into account [2]. Prior designs reduced power by considering either only the multiplicand bits or only the multiplier bits. This design detects the bitwise nullity of the multiplicand in the vertical direction as well as of the partial product in the horizontal direction of the array multiplier, and removes the unnecessary operations in the corresponding adding cells. The advantage of this design is lower power consumption due to reduced switching activity.



Fig.6:Structure of FA(Row and Column Bypassing)<sup>[2]</sup>

Here, each full adder is provided with tristate buffers to halt its inputs when they are to be bypassed. Two 2:1 multiplexers are connected at the sum (Sout) and carry (Cout) outputs to switch between the bypassed path and the normal path. The last row of full adders propagates the carry to the higher bits of the result [1]. The addition in the (i+1)<sup>th</sup> column or j<sup>th</sup> row can be bypassed if the multiplicand bit $A_i$ or the multiplier bit $B_j$ is zero. To obtain correct carry propagation in a two dimensional bypassing scheme, however, the carry bit from the previous row must be considered. Consider Fig. 6: the tristate buffers placed at the inputs of the adder cells, if the buffer state is one, disable signal transitions in the adders that are bypassed, and the input sum bit is passed downwards. When the corresponding partial product is zero, the carry save adders suppress unnecessary transitions and bypass the inputs to the outputs. In other words, when the partial product and the carry input are both zero, the output carry bit must be zero and the output sum bit equals the input sum bit. The operations in column 'i' can therefore be ignored and the adders disabled, since the outputs are known.



Fig.7: Row and Column Bypassing Multiplier [2]

The row and column bypass multiplier uses an AND gate as the signal gating element in the modified full adder (MFA). Placed at the carry output (Cout) of the MFA, the AND gate controls row bypassing and column bypassing at the same time. If bit $A_i$ is '0', i.e. the column is bypassed, the AND gate blocks the signal at its other input and Cout is automatically zero. This is the desired behavior for column bypassing, since the carry in a bypassed column is always zero. If $B_j$ is '0', i.e. the row is bypassed, the carry input Ci is applied directly to one input of the carry multiplexer (MUX) and is selected as the MUX output, since $B_j$ is the select line of the MUX. In this way a single AND gate controls both bypassing schemes in the adder cell. The AND gates in the proposed multiplier, shown in Fig. 7, are crucial, since they take care of carry propagation when any row or column is bypassed. For example, if the second row is to be bypassed, i.e. B[2] = 0, the row and column bypassed full adder blocks in the second row are bypassed. Consider a case in which a carry out is produced but cannot be added by the full adder block in the second row, since that block is bypassed. Carry propagation is then affected by the bypassing, so an alternative path to propagate the carry out is required, which is provided by additional circuitry.
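The cell behaviour described above can be written as a small truth-table model. This is a sketch of the MFA as the text describes it (Cout forced to zero when $A_i$ = 0, carry input forwarded when $B_j$ = 0); the function names are hypothetical and the model does not include the additional carry path circuitry.

```python
def full_adder(x, y, cin):
    """Plain full adder: returns (sum, carry_out)."""
    return x ^ y ^ cin, (x & y) | (y & cin) | (x & cin)


def mfa_cell(a_i, b_j, sum_in, carry_in):
    """Modified full adder cell, as described in the text (sketch)."""
    if a_i == 0:
        return sum_in, 0             # column bypass: AND gate forces Cout to 0
    if b_j == 0:
        return sum_in, carry_in      # row bypass: carry MUX selects the carry input
    return full_adder(a_i & b_j, sum_in, carry_in)


# Input combinations where the bypassed cell differs from a plain full adder
# fed with the same partial product A_i AND B_j.
mismatches = [
    (a_i, b_j, s, c)
    for a_i in (0, 1) for b_j in (0, 1) for s in (0, 1) for c in (0, 1)
    if mfa_cell(a_i, b_j, s, c) != full_adder(a_i & b_j, s, c)
]
```

In this model every mismatch has a carry input of 1. For a bypassed column that input does not arise, because the carry in such a column is always zero; for a bypassed row it can arise, which is the situation the alternative carry path has to handle.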

In Fig. 8, P0, P1, P2, ..., P15 are the partial product inputs, where P0 = A0 & B0, P1 = A0 & B1, ..., P15 = A3 & B3, and Y[7:0] = A[3:0] × B[3:0]. This model is more efficient in terms of power and speed when the probability of occurrence of zero is high. Irrespective of whether the multiplicand or the multiplier has the higher probability of zero, it gives better performance, unlike row bypassing or column bypassing alone [1][2][4]. Since the proposed model takes a two dimensional approach, a larger amount of power can be saved at higher bit widths such as 16, 32 and 64 when compared to existing models [2].
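The benefit of two dimensional bypassing can be estimated by counting, over all operand pairs, how many cells of the partial product grid still perform an addition. The sketch below assumes that cell (i, j) is bypassed exactly when its gating bit is zero and ignores the overhead of the bypass logic itself; the function name is hypothetical.

```python
from itertools import product


def active_fraction(n=4):
    """Average fraction of active cells over all pairs of n-bit operands."""
    totals = {"none": 0, "row": 0, "column": 0, "row_and_column": 0}
    cells = 0
    for a, b in product(range(2 ** n), repeat=2):
        for i in range(n):
            for j in range(n):
                a_i, b_j = (a >> i) & 1, (b >> j) & 1
                cells += 1
                totals["none"] += 1                   # no bypassing: always active
                totals["row"] += b_j                  # active unless B_j = 0
                totals["column"] += a_i               # active unless A_i = 0
                totals["row_and_column"] += a_i & b_j # active only if both are 1
    return {name: count / cells for name, count in totals.items()}


fractions = active_fraction(4)
```

For uniformly distributed operands, half of the cells remain active under either one dimensional scheme and a quarter under the combined scheme; the saving grows as zeros become more likely in either operand.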

## III. MODIFIED ROW AND COLUMN BYPASS MULTIPLIER

Instead of ordinary full adders and half adders, this design uses modified adders built from reversible gates. Reversible logic has emerged as one of the most important approaches for power optimization, with applications in low power VLSI design. Reversible gates are also a fundamental requirement for the emerging field of quantum computing, with applications in domains such as nanotechnology and virtual instrumentation.

The modified row and column bypassing multiplier consists of the following units, which are illustrated in the figures below:

1. Half Adder (A+B): This unit computes the sum and carry and is realized using a Peres gate. Its quantum cost is 4 (QC=4), the number of gates is 1, the number of constant inputs is 1 (NC=1) and the number of garbage outputs is 1 (GO=1). This unit is used to determine the sum in the last stage of the multiplier [3].
2. Full Adder: This unit uses a single double Peres gate. Its quantum cost is 6 (QC=6), the number of gates is 1, the number of constant inputs is 1 (NC=1) and the number of garbage outputs is 2 (GO=2). This unit is used in the last row of CSAs to incorporate ripple carry addition [3].
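The two reversible adders can be modelled functionally as follows. The Peres gate mapping used here, (A, B, C) to (A, A xor B, AB xor C), is the standard definition; the full adder is written as two cascaded Peres gates, which is assumed to be functionally equivalent to the double Peres gate named above. Function names are hypothetical.

```python
from itertools import product


def peres(a, b, c):
    """Peres gate: (A, B, C) -> (A, A xor B, (A and B) xor C)."""
    return a, a ^ b, (a & b) ^ c


def reversible_half_adder(a, b):
    """One Peres gate with constant input C = 0; output P = A is garbage."""
    _garbage, s, carry = peres(a, b, 0)
    return s, carry


def reversible_full_adder(a, b, cin):
    """Two cascaded Peres gates; one constant input, two garbage outputs."""
    _g1, half_sum, half_carry = peres(a, b, 0)
    _g2, s, carry = peres(half_sum, cin, half_carry)
    return s, carry


# A reversible gate maps distinct inputs to distinct outputs (a bijection).
is_bijection = len({peres(*bits) for bits in product((0, 1), repeat=3)}) == 8
```

The garbage outputs in this model (one for the half adder, two for the full adder) match the GO values listed above.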



Fig.8:Reversible (a)Full adder(RFA) (b) Half adder (RHA)



Fig.9: Modified row and column bypassing multiplier using reversible half and full adder

## IV. IMPLEMENTATION AND RESULTS

The design of Modified Row and Column bypass multiplier is logically verified using Xilinx ISE 13.2 and Xilinx Xpower Estimator. The performance of this design is compared with array multiplier, row bypassing, column bypassing and row and column bypass multipliers.

The Braun multiplier does not possess extra correcting circuitry. Its limitation is that it cannot stop the switching activity even when a bit is zero, so its power consumption is high. The row bypassing technique requires extra correction circuitry and a more complex full adder structure; it consumes less power than the Braun multiplier but occupies more area. The column bypassing technique has low power consumption, with an area comparable to that of the Braun multiplier. With the row and column bypassing technique, power and delay are reduced compared with the Braun multiplier, although the area is larger (Table 1). In the modified row and column bypass multiplier, the power and delay are further reduced and the circuit becomes more efficient.



Fig. 10: Simulation result of Modified row and column bypassing multiplier

The modified row and column bypass multiplier provides bypassing in both dimensions and uses reversible full and half adders, and therefore achieves a further reduction in power consumption, area and delay relative to the row and column bypass multiplier.

TABLE 1: COMPARISON OF DIFFERENT MULTIPLIERS

| Measuring quantity              | Array multiplier | Row bypass multiplier | Column bypass multiplier | R&C bypass multiplier | Modified multiplier |
|---------------------------------|------------------|-----------------------|--------------------------|-----------------------|---------------------|
| Power (mW)                      | 123              | 113                   | 95                       | 108                   | 97                  |
| Delay (ns)                      | 21.131           | 20.0621               | 17.423                   | 19.311                | 16.010              |
| Area (no. of slices)            | 18               | 27                    | 17                       | 25                    | 23                  |
| Power-Delay Product (mW·ns)     | 2599.113         | 2267.0173             | 1655.185                 | 2085.588              | 1552.97             |
| Area-Delay Product (slices·ns)  | 380.358          | 541.6767              | 296.191                  | 482.775               | 368.23              |
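The derived rows of Table 1 can be reproduced from the power, delay and area rows. The sketch below copies those three rows from the table and recomputes both products, plus the power saving relative to the array multiplier; the dictionary keys and function names are hypothetical.

```python
# Power (mW), delay (ns) and area (slices) copied from Table 1.
table = {
    "Array":    {"power_mw": 123, "delay_ns": 21.131,  "slices": 18},
    "Row":      {"power_mw": 113, "delay_ns": 20.0621, "slices": 27},
    "Column":   {"power_mw": 95,  "delay_ns": 17.423,  "slices": 17},
    "R&C":      {"power_mw": 108, "delay_ns": 19.311,  "slices": 25},
    "Modified": {"power_mw": 97,  "delay_ns": 16.010,  "slices": 23},
}


def derived_metrics(row):
    """Return (power-delay product, area-delay product), rounded to 4 places."""
    pdp = row["power_mw"] * row["delay_ns"]
    adp = row["slices"] * row["delay_ns"]
    return round(pdp, 4), round(adp, 4)


def power_saving_percent(name, baseline="Array"):
    """Power reduction of a design relative to the baseline, in percent."""
    base = table[baseline]["power_mw"]
    return 100 * (base - table[name]["power_mw"]) / base


metrics = {name: derived_metrics(row) for name, row in table.items()}
saving_modified = power_saving_percent("Modified")
```

The recomputed products agree with the tabulated values, and the modified multiplier shows a power saving of a little over 21% against the array multiplier.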



Fig.11:Dynamic power consumption

From the graph it is clear that the column bypass multiplier has low power dissipation, but it does not perform row bypassing. The row and column bypass multiplier bypasses most effectively, as it performs both row and column bypassing, and its power can be further reduced using the modified row and column bypassing multiplier. This power reduction comes at the cost of an increased number of garbage outputs, which is the main drawback of the modified model; it can be overcome by using these garbage outputs for useful purposes. The plotted values are dynamic power consumption figures obtained from the Xilinx Power Estimator and are shown in Fig. 11.

## V. CONCLUSION

The difficulty of predicting whether the multiplier operand or the multiplicand operand has the higher probability of zero occurrence gives rise to the concept of the two dimensional bypassing technique [1]. The proposed model incorporates the combined features, and its performance is compared with that of the row bypass multiplier in terms of delay and of the column bypass multiplier in terms of power consumption. The column bypassing multiplier is better in terms of delay among the existing bypass multipliers, but its limitation is that it does not give better performance if the multiplier bits have the higher probability of zero occurrence. The proposed model gives better performance irrespective of which operand has more zeros. It reduces switching activity most effectively when the probability of occurrence of zero is high, and greater power reduction is possible at higher bit widths, which makes it suitable for DSP applications.

## REFERENCES

- [1] K. Benarji Srinivas, Mohammed Aneesh Y, "Low power and high speed Row and column Bypass Multiplier", 2014 IEEE International Conference on Computational Intelligence and Computing Research.
- [2] Sharvar S. Tantarpale, "Low-Cost Low-Power Improved Bypassing Based Multiplier", *International Journal of Computer Science and Communication Engineering (IJCSCE), Special Issue on "Recent Advances in Engineering & Technology"*, NCRACET-2013, pp. 114-117.
- [3] Prashant R. Yelekar, Sujata S. Chiwande, "Introduction to Reversible Logic Gates & Its Application", 2<sup>nd</sup> National Conference on Information and Communication Technology (NCICT) 2011, Proceedings published in International Journal of Computer Applications (IJCA), pp. 5-8.
- [4] Prabhu E, Mangalam H, Saranya K, "Design of Low Power Digital FIR Filter Based on Bypassing Multiplier", *International Journal of Computer Applications* (0975-8887), vol. 70, no. 9, May 2013.
- [5] Oscal T.-C. Chen, Sandy Wang, and Yi-Wen Wu, "Minimization of Switching Activities of Partial Products for Designing Low-Power Multipliers", *IEEE Transactions on VLSI Systems*, vol. 11, no. 3, June 2003.
- [6] Rajendra M. Patrikar, K. Murali, Li Er Ping, "Thermal distribution calculations for block level placement in embedded systems", *Microelectronics Reliability* 44 (2004) 129-134.
- [7] Hichem Belhadj, Behrooz Zahiri, Albert Tai, "Power-sensitive design techniques on FPGA devices", *Proceedings of International Conference on IC Taipei* (2003).
- [8] A. Wu, "High performance adder cell for low power pipelined multiplier", in *Proc. IEEE Int. Symp. on Circuits and Systems*, May 1996, vol. 4, pp. 57-60.
- [9] S. Hong, S. Kim, M. C. Papaefthymiou, and W. E. Stark, "Low power parallel multiplier design for DSP applications through coefficient optimization", in *Proc. of Twelfth Annual IEEE Int. ASIC/SOC Conf.*, Sep. 1999, pp. 286-290.
- [10] C. R. Baugh and B. A. Wooley, "A two's complement parallel array multiplication algorithm", *IEEE Trans. Comput.*, Dec. 1973, vol. C-22, pp. 1045-1047.
- [11] I. S. Abu-Khater, A. Bellaouar, and M. Elmasry, "Circuit techniques for CMOS low-power high-performance multipliers", *IEEE J. Solid-State Circuits*, Oct. 1996, vol. 31, pp. 1535-1546.