An International Open Access, Peer-Reviewed Refereed Journal Impact Factor: 6.4 Website: <a href="https://ijarmt.com">https://ijarmt.com</a> ISSN No.: 3048-9458 # VLSI Architecture for Discrete Hartley Transform using MCSLA based Partition Technique #### Nikhil Srivastava M. Tech. Scholar, Department of Electronics and Communication, Bhabha Engineering Research Institute, Bhopal #### Prof. Suresh S. Gawande Guide, Department of Electronics and Communication, Bhabha Engineering Research Institute, Bhopal Abstract— A discrete Hartley transform (DHT) algorithm can be efficiently implemented on a highly modular and parallel architecture having a regular structure is consisting of multiplier and adder. Discrete Hartley Transform (DHT) is one of the transform used for converting data in time domain into frequency domain using only real values. We have proposed a new algorithm for calculating DHT of length 2<sup>N</sup>, where N=3 and 4. This project presents a high-performance VLSI architecture for computing the Discrete Hartley Transform (DHT), leveraging the Modified Carry Select Adder (MCSLA)-based partition technique. The DHT is widely used in signal and image processing due to its real-valued transformation, which reduces computational complexity compared to the Discrete Fourier Transform (DFT). To enhance speed and efficiency, we propose a novel architecture utilizing MCSLA for optimized arithmetic operations within the DHT. The design is synthesized using Verilog HDL and implemented on FPGA. Performance metrics such as area, delay, power consumption, and throughput are analyzed and compared with conventional DHT architectures. The proposed method shows significant improvements in speed and resource utilization, making it suitable for real-time signal processing applications. *Keywords*— Discrete Hartley Transform (DHT), Xilinx Vertex family, Modified Carry Select Adder (MCSLA) An International Open Access, Peer-Reviewed Refereed Journal Impact Factor: 6.4 Website: <a href="https://ijarmt.com">https://ijarmt.com</a> ISSN No.: 3048-9458 #### **INTRODUCTION** Signal processing includes signal synthesis, information extraction, and analysis. The sort of signal and the type of information it carries determine this. Generally speaking, it is a mathematical and graphical tool for signal processing, such as data compression, data transmission, noise removal, filtering, smoothing, and identification, among others. To name a few, the signal could be an image, sound, control signal, or time-varying sensed data. Time domain representation is the mathematical depiction of a signal as a function of time. Another way to represent the same signal is as a sum of sine and cosine functions of frequencies; this is known as the frequency domain representation. Transformation is the process of transforming a signal from the time domain into the frequency domain. While a frequency domain representation provides details about the many frequency components that make up the signal, a time domain signal shows how the signal varies over time. The real-world signal is often visualized using frequency domaina representation. The majority of natural signals are analog, meaning that suitable analog systems can process them directly. However, the digital signal processing (DSP) techniques are preferred due to several benefits, including flexibility, accuracy, and speed in designing discrete time systems and the use of computers for analysis, as a result of advancements in computer and integrated circuit (IC) technology. Speech processing, data transmission, image processing, instrumentation, biomedical engineering, seismology, oil exploration, nuclear explosion detection, and processing of signals from space are just a few of the many systems that use DSP techniques [1-2]. Discrete transforms are used in many different fields of science, engineering, and technology, including digital signal processing. These transformations' depiction of signals results in practical problem-solving and frequently offers deeper understanding of physical events. Since these transformations have been essential to signal processing for a number of years, transform coding remains a topic of interest for both theoretical and applied work in this area. We have concentrated on the discrete orthogonal transformations in the current study. In order to lower the bandwidth needed for transmission, these transforms are used to compress data from natural sources, such as audio signals. Transform coding refers to several coding systems used for compression. The fact that a signal's energy is concentrated in a small percentage of transform coefficients is used by transform coding. An International Open Access, Peer-Reviewed Refereed Journal Impact Factor: 6.4 Website: <a href="https://ijarmt.com">https://ijarmt.com</a> ISSN No.: 3048-9458 #### DISCRETE HARTLEY TRANSFORM Discrete Hartley Transform is abbreviated for DHT and this transform was proposed by R. V. L. Hartley in 1942. DHT is the analogous to Fast Fourier transform which provides the only real value at any cost. The main difference from the DFT is that it transforms the real inputs to real outputs with no intrinsic involvement of complex value. DFT can be used to compute the DHT, and vice versa. Figure 1: Block Diagram of DHT In other words, Discrete Hartley transform is used to convert real values into real ones. It requires decomposition of data into stages using butterfly similar to FFT. But the butterfly used in DHT is quite different in terms of coefficients or multipliers. With the increase in number of DHT sequence length the number of coefficients is also increased simultaneously. Let $N \ge 4$ be a power of two. For any real input sequence $\{x(i): i = 0,1,2,\ldots,N-1\}$ $$X(k) = DHT(N)\{x(i)\}$$ $$= \sum_{i=0}^{N-1} x(i) \cdot cas[2ki\pi/N] \quad for \ k = 0,1....N-1$$ Where cas(x) = cos(x) + sin(x) #### • Algorithm for 8-Point DHT- First two stages do not include any multiplication. Remaining terms are multiplied by the first coefficient. In the next stage again two new coefficients are introduced which is multiplied by the lower half of the third stage. In each stage multiplying of coefficients stage precedes its summing stage. An International Open Access, Peer-Reviewed Refereed Journal **Impact Factor: 6.4** Website: https://ijarmt.com ISSN No.: 3048-9458 Figure 2-point Discrete Hartley Transform After coefficient multiplication it is preceded by its summing stage to form the common terms used in the final stage. Last stage includes only summing of terms. Finally we get the transformed data sequence in order and do not need any permutation. #### Mathematical calculation for N=8 $$X(0) = [(x(0) + x(4)) + (x(2) + x(6))] + [(x(1) + x(5)) + (x(3) + x(7))]$$ $$X(2) = [(x(0) + x(4)) - (x(2) + x(6))] + [(x(1) + x(5)) - (x(3) + x(7))]$$ $$X(4) = [(x(0) + x(4)) + (x(2) + x(6))] - [(x(1) + x(5)) + (x(3) + x(7))]$$ $$X(6) = [(x(0) + x(4)) - (x(2) + x(6))] - [(x(1) + x(5)) - (x(3) + x(7))]$$ $$X(1) = [(x(0) - x(4)) + (x(2) - x(6))] + c(x(1) - x(5))$$ $$X(3) = [(x(0) - x(4)) - (x(2) - x(6))] + c(x(3) - x(7))$$ $$X(5) = [(x(0) - x(4)) + (x(2) - x(6))] - c(x(1) - x(5))$$ $$X(5) = [(x(0) - x(4)) + (x(2) - x(6))] - c(x(1) - x(5))$$ $$X(3) = [(x(0) - x(4)) - (x(2) - x(6))] - c(x(3) - x(7))$$ with $$c = \sqrt{2}$$ An International Open Access, Peer-Reviewed Refereed Journal Impact Factor: 6.4 Website: <a href="https://ijarmt.com">https://ijarmt.com</a> ISSN No.: 3048-9458 Where c is the multiplier #### PARALLEL AND PIPELINE STRUCTURE The proposed DHT algorithm has been designed in parallel as well as pipelined structure. In parallel structure, 4 such units as shown in Fig. 3 are implemented in parallel such that 4 Hartley's coefficients are obtained simultaneously. The hardware circuit obtained for 1 unit of the parallel structure is given in Fig. 3. On the other hand, in case of the pipeline structure, there are registers incorporated between multiple stages of the structure. To implement the pipeline structure, only 1 such unit is required which gives all Hartley's coefficients at output in subsequent clock cycles, one by one. The hardware circuit for the same is given in Fig. 4. In above figure, the dotted blocks denote the registers between multiple stages. However, it must be noted that word format taken for both the structures are different. Since in pipeline structure, values in previous stages are updated in every clock cycle, we need to transmit additional information along with the data. The information related to p, and q are also added to the transmitted data so that we can retain these information in every upcoming stage of the pipeline. Word format for parallel structure is given as: 1 sign bit, 10 integer bits and 20 fraction bits. While, word format for pipeline structure is given as: first few bits (4-6 bits) to transmit additional information, then 1 sign bit, 10 integer bits and 20 fraction bits. Therefore, at every stage, the values of p and q are read first to identify the coefficient and then the required computations are done. These units are replicated four times (for q=0,1,2,3) and then added together to get one Hartley's coefficient at the output. An International Open Access, Peer-Reviewed Refereed Journal Impact Factor: 6.4 Website: <a href="https://ijarmt.com">https://ijarmt.com</a> ISSN No.: 3048-9458 Figure 3: Hardware Circuit for parallel structure Figure 4: Hardware Circuit for pipeline structure #### PROPOSED TECHNOLOGY This multiplication technique reduces the delay and complexity. It converts multiplication into simple logical AND operation using associated circuits of full, half adders and AND gate. Further, the area is reduced by using various compressors for adding the partial products of multiplication. An International Open Access, Peer-Reviewed Refereed Journal Impact Factor: 6.4 Website: <a href="https://ijarmt.com">https://ijarmt.com</a> ISSN No.: 3048-9458 Figure 5: Proposed Architecture The MCSLA is consist of reduce full adder (RFA). RFA is consist of multiplier is shown in fig. 6. In the MCSLA, the full adder circuit is greatly reduced. The number of gate counts is reduced in FA circuit. Figure 6: RFA Circuit #### MCSLA based Partition Technique We consider two K-bit operands $a_{K-1}$ , $a_{K-2}$ ,... $a_2$ , $a_1$ , $a_0$ and $b_{K-1}$ $b_{K-2}$ ... $b_2$ $b_1$ $b_0$ for K by K multiplier, the partial products of two K-bit numbers are $a_i$ $b_j$ where i, j go from 0,1,...K-1. The longest two columns in the middle of the partial products contribute to the maximum delay. We then proceed to sum up each column of the two parts in parallel. Partition Multiplier Method: - Let X and Y be numbers of p-bits such that multiplication of the inputs X and Y gives the 2p bits product. Here X and Y are split into q numbers as $X_0, X_1, X_2, \dots, X_{q-1}$ and $Y_0, Y_1, Y_2, \dots, Y_{q-1}$ each consisting of r bits. Therefore $X_0$ contains the even partitions of X and $X_1$ contains the odd partitions of X, similarly $Y_0$ contains the even partitions of Y and $Y_1$ contains the odd partitions of Y. $X_0$ Contain multiplier all the partition of Y. $X_0$ is multiplier of $Y_0$ , the lower significant bit (LSB) of the $X_0$ add with LSB of the $Y_0$ bit position and most significant bit (MSB) of the $X_0$ add with MSB of the $Y_0$ bit position. $$Z = X \times Y = X_0 \times Y_0 + X_0 \times Y_1 + X_1 \times Y_0 + X_1 \times Y_1$$ An International Open Access, Peer-Reviewed Refereed Journal Impact Factor: 6.4 Website: <a href="https://ijarmt.com">https://ijarmt.com</a> ISSN No.: 3048-9458 Figure 7: Flow chart of proposed partition multiplier method #### SIMULATION RESULT 8-bit DHT architecture is designed by using Xilinx 14.2i. MCLA, 16-bit MCSLA, partition multiplier using MCSLA. Figure 8: Full Adder using MUX Figure 9: RTL view of Full Adder using MUX Figure 10: 16-bit Adder using MCSLA Figure 11: RTL view of 16-bit Adder using MCSLA Figure 12: Partition Multiplier using MCSLA Figure 13: RTL view of Partition Multiplier using MCSLA Figure 14: 8-bit DHT using MCSLA Figure 15: RTL view of 8-bit DHT using MCSLA | Name | Value | 0 ns 200 ns 400 ns 600 ns | |--------------|------------------|---------------------------------| | ▶ 😽 x0[15:0] | 0000010101010100 | (00000000) 000001010101010 | | 🕨 😽 x1[15:0] | 0001110101010100 | 00000000\ 000111010101010 | | x2[15:0] | 0000010101010111 | 00000000\ 0000010101010111 | | x3[15:0] | 0000110101010110 | 00000000\ 0000110101010101 | | ×4[15:0] | 0000010101010101 | 00000000 0000010101010101 | | > 3 x5[15:0] | 0001010101010100 | 00000000 | | ► 😽 x6[15:0] | 0011010101010100 | 00000000 | | x7[15:0] | 0001000101010100 | 00010001010101010 | | > 😽 y0[15:0] | 1001011000000110 | 00000000 1001011000000110 | | > 😽 y1[15:0] | 0111000010001010 | 01000001 01110000100010 10 | | > 😽 y2[15:0] | 1010001111111100 | (00000000) 10100011111111100 | | > 🥞 y3[15:0] | 0000111110000100 | 01000000 0000111110000100 | | > 😽 y4[15:0] | 0001010000000010 | 00000000 0001010000000010 | | > 🥞 y5[15:0] | 1001100111111010 | (10110110) 10011001111111010 | | > 🥞 y6[15:0] | 0001110000000000 | (00000000) 0001110000000000 | | > 🥞 y7[15:0] | 1000111111110100 | (10110111) 10001111111110100 | Figure 16: Waveform view of 8-bit DHT using MCSLA ``` Advanced HDL Synthesis Report Macro Statistics # Multipliers : 8 4x4-bit multiplier : 8 # Multiplexers : 192 1-bit 2-to-1 multiplexer : 192 # Xors : 650 1-bit xor2 : 650 ``` An International Open Access, Peer-Reviewed Refereed Journal Impact Factor: 6.4 Website: <a href="https://ijarmt.com">https://ijarmt.com</a> ISSN No.: 3048-9458 ``` Device utilization summary: Selected Device : 6slx4tqq144-3 Slice Logic Utilization: Number of Slice LUTs: 696 out of 2400 295 Number used as Logic: 696 out of 2400 29% Slice Logic Distribution: Number of LUT Flip Flop pairs used: 696 Number with an unused Flip Flop: 696 out of 696 100% 0 out of Number with an unused LUT: 696 0.9 Number of fully used LUT-FF pairs: 0 out of 696 0 Number of unique control sets: IO Utilization: 264 Number of IOs: 264 out of 102 258% (*) Number of bonded IOBs: Timing Summary: Speed Grade: -3 Minimum period: No path found Minimum input arrival time before clock: No path found Maximum output required time after clock: No path found Maximum combinational path delay: 21.775ns ``` Figure Result Analysis #### **CONCLUSION** In this work, a high-performance and area-efficient VLSI architecture for the Discrete Hartley Transform (DHT) was designed using a Modified Carry Select Adder (MCSLA)-based partition technique. The proposed design addresses the computational complexity and delay issues often associated with conventional DHT architectures by incorporating optimized arithmetic blocks and partitioned processing stages. By leveraging the speed advantages of MCSLA, the architecture significantly reduces the critical path delay and enhances throughput, making it suitable for real-time signal processing applications. Synthesis and simulation results demonstrate that the proposed design achieves better performance in terms of area utilization, power consumption, and speed when compared to traditional DHT implementations. The proposed MCSLA-based DHT architecture presents an effective solution for high-speed and low-power signal processing systems, particularly in embedded and portable applications where resource constraints are critical. An International Open Access, Peer-Reviewed Refereed Journal Impact Factor: 6.4 Website: <a href="https://ijarmt.com">https://ijarmt.com</a> ISSN No.: 3048-9458 #### **REFRENCES** - [1] Riya Jain and Priyanka Jain, "FPGA Implementation of DHT through Parallel and Pipeline Structure", International Conference on Computer Communication and Informatics (*ICCCI* -2021), Jan. 27 29, 2021, Coimbatore, India. - [2] Nikhil Advaith Gudala and Trond Ytterdal, "Implementation of High Speed and Low Power Carry Select Adder with BEC", IInternational Midwest Symposium on Circuits and Systems IEEE 2021. - [3] Jain, N. Pandey, "New Algorithm for DHT and its Verilog Implementation," International Journal of Innovative Technology and Exploring Engineering (IJITEE), Vol. 9, 2020. - [4] N. Hamed S. Timarchi "Low-Power and Fast Full Adder by Exploring New XOR and XNOR Gates" IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2018. - [5] Low Power Approximate Adders for General Computing Using Differential Transmission Gate" on 28.03.2018 National Conference on Innovations in Communication and Computing NCICC" 18 SNS College of Technology. - [6] Omid Akbari, Mehdi Kamal, Ali Afzali-Kusha, and Massoud Pedram, "Dual-Quality 4:2 Compressors for Utilizing in Dynamic Accuracy Configurable Multipliers", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 34, Issue 7, 2017. - [7] Doru Florin Chiper, *Senior Member, IEEE,* "A Novel VLSI DHT Algorithm for a Highly Modular and Parallel Architecture", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 60, NO. 5, MAY 2013. - [8] Sushma R. Huddar and Sudhir Rao, Kalpana M., "Novel High Speed Vedic Mathematics Multiplier using Compressors", 978-1 4673-5090-7/13/\$31.00 ©2013 IEEE. - [9] S. S. Kerur, Prakash Narchi, Jayashree C N, Harish M Kittur and Girish V A, "Implementation of Vedic multiplier for Digital Signal Processing", International Conference on VLSI, Communication & Instrumentation (ICVCI) 2011, Proceedings published by International Journal of Computer Applications® (IJCA), pp.1 -6. - [10] Himanshu Thapaliyal and M.B Srinivas, "VLSI Implementation of RSA Encryption System Using Ancient Indian Vedic Mathematics", Center for VLSI and Volume-2, Issue-1, January–March 2025 571 An International Open Access, Peer-Reviewed Refereed Journal Impact Factor: 6.4 Website: <a href="https://ijarmt.com">https://ijarmt.com</a> ISSN No.: 3048-9458 Embedded System Technologies, International Institute of Information Technology Hyderabad, India. - [11] Jagadguru Swami Sri Bharati Krishna Tirthaji Maharaja, "Vedic Mathematics: Sixteen simple Mathematical Formulae from the Veda", Delhi (2011). - [12] Sumit Vaidya and Depak Dandekar. "Delay-power perfor-mance comparison of multipliers in VLSI circuit design". International Journal of Computer Networks & Communications (IJCNC), Vol.2, No.4, July 2010. - [13] P. D. Chidgupkar and M. T. Karad, "The Implementation of Vedic Algorithms in Digital Signal Processing", Global J. of Eng. Edu, Vol.8, No.2, 204, UICEE Published in Australia. - [14] Asmita Haveliya, "Design and Simulation of 32-Point FFT Using Radix-2 Algorithm for FPGA Implementation", Second International Conference on Advanced Computing & Communication Technologies IEEE 2012. - [15] S. Correa, L. C. Freitas, A. Klautau and J. C. W. A. Costa, "VHDL Implementation of a Flexible and Synthesizable FFT Processor", IEEE LATIN AMERICA TRANSACTIONS, VOL. 10, NO. 1, JAN. 2012.