Improved Adaptive Huffman Compression Algorithm

1. ABSTRACT In the information age, sending data from one end to another requires a great deal of space as well as time. Data compression is a technique for representing an information source (e.g. a data file, a speech signal, an image, or a video signal) in as few bits as possible. Two major factors that influence a data compression technique are the procedure used to encode the source data and the space required for the encoded data. Among the many methods in use, Huffman coding is the most widely applied. Huffman algorithms come in two variants: static and adaptive. The static Huffman algorithm encodes the data in two passes: the first pass calculates the frequency of each symbol, and the second pass constructs the Huffman tree. The adaptive Huffman algorithm extends the Huffman algorithm to construct the tree in a single pass, but takes more space than the static Huffman algorithm. This paper introduces a new data compression algorithm based on Huffman coding. The proposed algorithm not only reduces the number of passes but also reduces the storage space compared to the adaptive Huffman algorithm, and is comparable to the static one.


INTRODUCTION
A compression technique squeezes source data so that the same information takes less space to store. The main risk in compressing data is the loss of valuable information. Many compression algorithms exist, and each has its own advantages and disadvantages. Huffman coding is one of the techniques most widely used by solution-provider companies today, and it has two principal variants: static and adaptive. This paper is organized as follows: Section 4 gives a brief introduction to the static Huffman algorithm, Section 5 introduces the adaptive Huffman algorithm, Section 6 presents the design strategy of the improved adaptive Huffman algorithm, and finally the conclusion is given in Section 7.

Static Huffman algorithm
The static Huffman algorithm, developed by David Huffman (1952), generates the encoded data in two passes, as follows:
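The two passes can be sketched in Python as follows. This is an illustrative implementation, not the paper's own code: pass one counts symbol frequencies, and pass two merges the two lowest-frequency subtrees repeatedly to build the Huffman tree and derive a prefix code for each symbol.

```python
import heapq
from collections import Counter

def static_huffman_codes(data):
    """Pass 1: count symbol frequencies, then build the Huffman tree
    bottom-up and derive a prefix code for each symbol."""
    freq = Counter(data)
    # Each heap entry: (frequency, tie-breaker, {symbol: code-so-far}).
    # The tie-breaker keeps the heap comparisons well defined.
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    nxt = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least-frequent subtrees
        f2, _, right = heapq.heappop(heap)
        # Merging prepends one bit: left subtree gets '0', right gets '1'
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, nxt, merged))
        nxt += 1
    return heap[0][2]

def static_huffman_encode(data):
    """Pass 2: replace each symbol of the source with its code."""
    codes = static_huffman_codes(data)
    return "".join(codes[s] for s in data), codes
```

For the example data "caaaddbddd" used later in the paper (frequencies d:5, a:3, c:1, b:1), the optimal code lengths are 1, 2, 3, and 3 bits respectively, giving a 17-bit encoding, which agrees with the result stated below.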

Results: The encoded data is seventeen bits long.
Implementing static Huffman encoding requires two passes.

Adaptive Huffman algorithm
Expanding on the static Huffman algorithm, Faller and Gallager [Faller 1973; Gallager 1978], and later Knuth [Knuth 1985] and Vitter [Vitter 1987], developed a way to perform Huffman coding in a single pass, as follows: 1. Initially, the adaptive Huffman algorithm generates a Huffman tree with the frequency count of every distinct symbol set to one and takes the code for the first symbol of the source data. For the second symbol it generates a second Huffman tree and takes the code for the second symbol (the first and second symbols may be the same or different), and so on up to the last byte of the source data.

Concept
The basic concept behind an adaptive compression algorithm is very simple:

    Initialize the model
    Repeat for each character
    {
        Encode character
        Update the model
    }

Decompression works the same way: as long as both sides use the same initialization and model-update algorithms, they will have the same information.
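The lockstep loop above can be sketched with a deliberately simple model, a rank-by-frequency table rather than a full Huffman tree, to show that encoder and decoder stay synchronized as long as they initialize and update the model identically. This is a toy illustration of the shared-model idea, not the adaptive Huffman tree-update procedure itself.

```python
def make_model(alphabet):
    # Both sides start from the identical model: every count is one.
    return {s: 1 for s in alphabet}

def ranking(model):
    # Deterministic order (count descending, then symbol) so that the
    # encoder and decoder always agree on the same ranking.
    return sorted(model, key=lambda s: (-model[s], s))

def encode(data, alphabet):
    model = make_model(alphabet)
    out = []
    for ch in data:
        out.append(ranking(model).index(ch))  # encode the symbol's current rank
        model[ch] += 1                        # update the model
    return out

def decode(ranks, alphabet):
    model = make_model(alphabet)
    out = []
    for r in ranks:
        ch = ranking(model)[r]  # decode using the same model state
        out.append(ch)
        model[ch] += 1          # identical update keeps both sides in sync
    return "".join(out)
```

Because the decoder applies the same update after every symbol, its model state always matches the encoder's at the corresponding step, so the round trip is lossless.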
Example: Adaptive Huffman trees for encoding the data "caaaddbddd".

Result: The encoded data is twenty-two bits long.
To implement the adaptive Huffman algorithm we must know in advance how many distinct symbols are present in the source data. Adaptive Huffman encoded data takes more space than static Huffman encoded data. Adaptive Huffman encoding generates a different tree for every subsequent symbol (whether different or the same), and every tree yields a different code for the next symbol (even for the same symbol); therefore all the trees must be remembered in order to decode the data.

Improved Adaptive Huffman Algorithm
The static Huffman algorithm first scans all the source data and counts the frequency of each symbol. It then sorts the frequency table in decreasing order, and in the second pass it constructs the tree from the table built in pass one. When the source data is lengthy, however, constructing this table takes considerable time, and storing it wastes space as well.
In the adaptive Huffman algorithm, the tree must be updated after each symbol is encoded, and the same is done while decoding, so there is some processing overhead involved. The data encoded by the adaptive Huffman algorithm requires more space than static Huffman encoded data. Another major drawback is that the adaptive Huffman algorithm must know in advance how many distinct symbols are present in the source data, so it first scans all the source data to determine this.
Some other major drawbacks of the adaptive Huffman algorithm are as follows:
a. Adaptive Huffman requires more space to store the compressed data.
b. Adaptive Huffman must know in advance how many distinct symbols are present in the data, so it scans the whole string before constructing the first tree.
c. It is very time consuming: it first constructs a tree and then takes the code for the symbol, and repeats this for every following symbol, up to the last one.
d. In the adaptive Huffman algorithm, many different symbols share the same code in the encoded data, which creates a lot of confusion while decompressing the data.
e. In adaptive Huffman, the same symbol, when it occurs repeatedly, receives different codes, which can also create confusion while decompressing the data.
f. Finally, decompression needs all the trees; for small data this is acceptable, but for large data it requires a huge amount of storage.
The improved adaptive Huffman algorithm, based on the existing Huffman algorithm, uses a single pass compared to the existing static Huffman algorithm and requires less space to store the encoded data than the adaptive Huffman algorithm. The proposed method and its algorithm are as follows: 1. Initially, the improved adaptive Huffman algorithm generates a strictly binary tree on reading the first symbol of the source data. For each following symbol it updates the tree, and so on up to the last symbol; on reading the last symbol it produces the final Huffman tree.
The advantages of the improved adaptive Huffman algorithm over adaptive Huffman are:

Result: The encoded data is seventeen bits long.
This method requires only one pass to encode the source data, and there is no need to scan every symbol before encoding begins. The encoded data needs less space than with adaptive Huffman encoding.
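The single-tree property means the decoder needs only the final code table rather than the whole sequence of trees required by adaptive Huffman. A minimal decoding sketch, assuming the final code table is transmitted along with the bit string (the table in the usage example below is one possible prefix-code assignment for the example data "caaaddbddd", not necessarily the paper's):

```python
def improved_decode(bits, codes):
    """Walk the bit string, matching prefixes against the single final
    code table; no per-symbol tree history is needed."""
    inv = {code: sym for sym, code in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inv:       # a complete codeword has been read
            out.append(inv[cur])
            cur = ""
    return "".join(out)
```

Because the codes form a prefix code, no codeword is a prefix of another, so the greedy match above is unambiguous; a 17-bit string such as "10011111100101000" with the table {"d": "0", "a": "11", "c": "100", "b": "101"} decodes back to "caaaddbddd".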

Conclusion
The improved adaptive Huffman algorithm requires only a single tree to compress the data, instead of all the trees required by adaptive Huffman, which take a lot of storage space. The adaptive Huffman algorithm must scan all the characters of the string at the start to construct a tree, whereas the improved adaptive Huffman algorithm starts from the first character, which saves time. It is further concluded that the encoded data of the new Huffman algorithm is smaller than that of the adaptive algorithm and comparable to that of the static one.