Systolic Architecture for High Speed FIR Filter Using VLSI Technology

The tremendous growth of computer and Internet technology wants the data to be processed athigh speed. Low power consumption, high throughput and optimized hardware are the most important design criteria’s for VLSI implementation. This project gives an efficient design of high speed FIR filters using systolic architecture. In this paper we have implemented 7th,8th,11th order FIR filter with 8 bit normalized input. To obtain efficient results we have selected B1 design from all the designs of systolic arrays.GF (28) multiplier and XOR adder are used for multiplication and addition in filter. Hamming window technique is used to derive the filter coefficients. The coefficients of filter are found out using Matlab. The FIR filter architecture is effectively synthesized and simulated using Xilinx ISE 8.1i in VHDL and Modelsim simulator. Maximum frequency, timing simulation delay and number of slices were used as performance metrics. The results obtained are compared with the existing results achieved for FIR filter, thus our work proves that the objective of high speed has been achieved successfully with the use of minimum number of slices


INTRODUCTION
The Finite Impulse Response (FIR) digital filters are widely used in digital signal processing applications due to their stability and linear phase properties. Digital Signal Processing (DSP) is widely used inreal time applications such as video, image processing and wireless communication.Multipliers and adders are the keycomponents of such high performance system such as FIR filter. A system performance can be determinedby the performance of anadder and multiplier becausethey are the processing elements in the system, they are generally most area and time consuming. Therefore optimizing the speed and the area of the adder and multiplier is the major task. Optimizing the speeds of multiplier and adder will enhance the speed of the filter.
Many previous efforts [8]- [10] have been focused on high speed, area optimization and power reduction implementations. One approach byPramod Kumar Meher et.al [10] in July 2008 has proposed the design optimization of one-and twodimensional fully pipelined computing structures for area-delay-power-efficient-speed implementation of FIR filter by systolic decomposition of distributed arithmetic (DA) based inner product computation. Inner product computation is done by accessing lookup table (LUT) followed by shift accumulation operations of the LUT output. The major difficulty encountered in this scheme is that as the filter size or the number of bits used to represent coefficients increases the memory requirement increases exponentially.
Oscar Gustafson et.al [13] worked on implementation of Low complexity FIR filter using serial arithmetic. The effects of digit size on FIR filters using multiplier block techniques are studied. In digit serial arithmetic words are divided into digits of d bits that process one digit at a time. This provides a tradeoff between area, speed and power consumption.
Another approach by Jongsun Park et.al [8] proposed in April 2003, uses computation sharing multiplier which specifically targets computation reuse in vector-scalar products for high speed design using direct form and transposed direct form as their FIR filter architecture.This paper compared the performance of their implementation with Wallace tree and carry-save multipliers, which shows that sharing multiplier scheme improves speed by 33% and 52% respectively.
H.T.Kung et.al [1], discussed various systolic arrays in his work and mentioned the advantages and features of systolic architectures.
Inthis paperwe have implemented high speed FIR filter using systolic architecture.Evaluating the advantages and disadvantages of all the systolic array designs we have chosen the B1 design proposed by K.Parhi [3]. 8-bit normalised test input signal, generated in Matlab is given to the filter. Hamming window method is used to find the coefficients of the proposed filter. According to the various developments in this particular field, the distinguishing element of our paper is that we have designed FIR filter which uses systolic architecture and Galois Field multiplier which has not been worked on previously.
The rest of the paper isorganized as below: Section (I) deals with the introduction to the proposed FIR filter.Section (II), deals with the fundamentals of Systolic FIR filter and its features. The proposed FIR filter's architecture and its implementationis broadly discussed in Section (III). The various software tools used for the proposed filter design and implementation are discussed in section (IV). The results obtained by implementation of GF multiplier and FIR filter are tabulated and compare with the other filter implementation results in Section (V). Hence conclusion is drawn in Section (VI).

SYSTOLIC ARCHITECTURE FOR FIR FILTER FIR Filter
A finite impulse response (FIR) filter is a filter whose response to any bounded input is of finite durationi.e. we get a bounded output.
Where, x[n] is input signal, N is the order of the filter and y[n] is the output signal A FIR filter has a number of useful properties which sometimes make it preferable to an infinite impulse response (IIR) filter, that are listed as below: 1.
It requires no feedback. They are nonrecursive in nature. The relative errors which occur are not iterative and are not compounded.

2.
They are inherently stable, since the output is a sum of a finite number of finite multiples of the input values, so there is no iterative loop in the equation like IIR filters. For every bounded input,the output of FIR filter is bounded.

3.
It can easily be designed to be linear phase by making the coefficient sequence symmetric. Linear phase means there is a constant group delay in the filter response.
The primary objective of our proposed architecture is to increase the speed for which the non-recursive nature, stability and constant group delay of FIR filter becomes advantageous properties as compared to an IIR filter for hardware implementation. A u g u s t 0 1 , 2 0 1 5

Systolic architecture
A systolic architecture is a pipelined network arrangement of Processing Elements (PEs), which are called cells. It is a combined form of parallel computing and pipelining, where cells compute the data that arrives as input and store them independently. Systolic architecture represent a network of processing element (PEs) that rhythmically compute and pass data through the stem, the PEs regularly pump data in and out such that regular flow of data is maintained, as a result systolic array features modularity and regularity which are important property for VLSI implementation. The data in systolic array may be passed through coprocessors in combination of host computersthat hosts the PEs and the final result is return to host computer.
The features of systolic array can be listed as below: 1. Synchrony:A systolic array is controlled by a global clock with fixed length ofclock cycles. This global clock also synchronizes this array.

2.
Modularity and regularity:Modular processing units which are connected withhomogeneous interconnections and also the computingnetwork can be extended indefinitely.

3.
Pipelining ability:The array exhibits a linear rate pipelineability to speed up processing rate, i.e., it shouldachieve an O(N) speedup, in terms of processingrate, where N is the number of ProcessingElements. Here the efficiency of the array ismeasured by the following:Where Ts is the processing time in a single Processor, and Tp is the processing time in the arrayprocessor.

4.
Simple and regular design:In integrated-circuit technology, the cost ofdesign grows with the complexity of the system. By using this, cost is reduced and hence it is a cost-effective design.

OUR IMPLEMENTATION
This section describes our proposed system and methodology for implementation of FIR filter based on systolic architecture, using GF multiplier. From all the systolic arrays we chose B 1 design which broadcast input to all processing elements, weights stay and output is collected after shifting the output at each PE byte wise [3]. [3].GF(2 8 ) multiplier is designed by using isomorphic transformation technique. Coefficients are calculated using Hamming window method. Various order filters like 7, 8 and 11 are implemented and compared. By exploiting the advantages of systolic architecture and GF multiplier, high speed and low area implementation of FIR filter is achieved.

Architecture Of Proposed Filter
From all the systolic arrayswe chose B 1 design which broadcast input to all processing elements, weights stay and output is collected after shifting the output at each PE byte wise [3].Due to broadcast inputs, high throughput is achieved.
Hardware utilization factor of B1 is 1 and it produces the output in one clock cycle. It does not require extra bus for an output and it also uses less memory as compared to other designs except F design.
Architecture consists of 7 processing elements and each processing element has a multiplier and a accumulator. Normalized 8 bit input is broadcasted to all the PEs, each GF multiplier, multiplies the input with the filter coefficient. The multiplied output is XOR with the previous multiplied output stored in a register. We get the filter output at every clock cycle. Figure.1 Signal flow graph for 7 th order FIR filter A u g u s t 0 1 , 2 0 1 5

Figure 2.Space-time representation of B1 design
The Space-time representation of B1 design is shown in fig from which we can see that the incoming x value is available at all the processors at the same time. Specifically, the input data is "broadcast" to the processors. Similarly, the weights ai, appear at the processors at the same spatial co ordinates. Thus aivalues stay. And the outputs yi, appear at the processors at different space and time. Hence the outputs move.

GF Multiplier
Galois Field theory deals with numbers that arerepresented in binary andthey have the properties of a mathematical "field," and are finite in scope. Galois operations comprises of Addition, multiplication and logarithms. The whole Galois field design is purely combinational and it is a design which does not require clock.
The message signal is taken in form of the multiplicand that denotes 8 bit of data. Galois algorithm is implemented on the multiplicand using the generator key irreducible polynomial and a 8 bit multiplier key. Mathematically 8 bit multiplication results in the 16 bit of the result but the Galois technique will result 8 bit resultant for 8 bit multiplication. As for the case of n bit multiplication it will result in n bit result.
We have implemented GF multiplier by using Isomorphic transformation which decomposes GF (2 8 ) to GF((2) 2 ) by using different combinations of φ and λ.
The advantages of GF Multiplier are as follows: 1. The Galois Field multiplier has small hardware footprint.
2. It has short response latency.
3. Final Implementation of Galois Field multiplier makes use of only XOR gates.

4.
It gives direct answer of multiplication. Therefore no registers are required to store partial products.
5. Isomorphic transformation converts the 8 bit number into smaller one bit number, due to this complexity of multiplication operation is reduced. A u g u s t 0 1 , 2 0 1 5

Finding the Filter Coefficients
For the design of FIR filter, the important step is to find the filter coefficients. The filter coefficients can be calculated in several ways as: window design method, frequency sampling method and paks-McClellan method. The simplest way of designing filter is by using the windowing techniques. Filter coefficients can be obtained using any of the various windows available.
Hamming window method is particularly well-suited for this implementation because of their advantages which are as mentioned below: It minimizes the side lobe level and offers better selectivity for higher frequencysignals. Here, hamming window is used for the calculation of filter coefficients.
The hamming window impulse response coefficients is given as below: We calculated coefficients for filter using MATLAB software for the following specifications: Cut-off frequency:1KHz (Scalable) Order of the filter:7, 8, 11 The obtained filter coefficients by hamming function in Matlab were converted to fixed point decimal. These coefficients are converted to binary and stored in memory, which are accessed in the main filter program.

Test Input Signal
Composite signal is given as an input to the FIR filter to test the filter response. The test signal is generated in MATLAB. This signal is sampled and random 10 values are normalized and stored in registers. Chirp command is used to generate this test signal which is a frequency-swept cosine waveform, in which the frequency varies in similar time intervals.

DESIGN ENVIRONMENT
FIR filter is implemented using XILINX ISE 8.1i. Behavioural simulation is done on built code to verify the functionality of the FIR filter. Timing simulation is also performed on MODELSIM simulator.
The GF multiplier is implemented on Virtex2p, Virtex E and Virtex 4 for the purpose of comparison with other multipliers.
The proposed filter was simulated on Virtex4 FPGA device with speed grade 12.MATLAB is used to find filter coefficients and to generate thecomposite input signal.\

RESULTS AND COMPARATIVE STUDY GF multiplier
GF multiplier is implemented using Xilinx 8.1 synthesis tool and output is observed on Modelsim simulator. We selected Galois Field multiplier for the implementation as it does not give partial product so it helps to increase the speed of the filter. Table 1 shows the comparison of GF multiplier with other multipliers used by Jongsun Park [8] and Laxman.S [15]. From the Table 1 we can conclude that GF multiplier is 45% faster than Wallace Tree multiplier, 52% faster than array multiplier and 24% faster than Booth multiplier.

FIR Filter Results
FIR filter was implemented on XILINX synthesis tool with VIRTEX 4 device using speed grade 12 and output is observed on Modelsim simulator. Muhammad et.al [8] 11 Virtex E 55 --------Jongsun Park, Khurram Muhammad et.al [8] 11 Virtex E 40 --------Jongsun Park, Khurram Muhammad et.al [8] 11 Virtex E 83.33 -------- The results obtained from timing simulation were compared with existing architectures of FIR filter. The comparison clearly indicate that our proposed FIR filter architecture using Galois Field multiplier utilizes less area and operates on maximum frequency of 166MHz .Our implemented filter achieved 71% increase in speed using very amount of hardware resources compared to Jongsun Park [8] which can be gauged based on throughput per slice parameter in Table 2.

CONCLUSION
This paper describes design of Systolic architecture, GF multiplier & final FIR implementation for achieving high speed. It is achieved by using an efficient implementation of systolic array structure.