ENHANCED MULTIQUERY SYSTEM USING KNN FOR CONTENT BASED IMAGE RETRIEVAL

Content Based Image Retrieval (CBIR) techniques have become an essential requirement in multimedia systems with the widespread use of the internet, the declining cost of storage devices and the exponential growth of un-annotated digital image information in recent years. Multi query systems have therefore been used in place of a single query in order to bridge the semantic gap and to better understand the user's requirements. In previous work, a query replacement algorithm has been used in which the user provides multiple images, referred to as representative images, as the query image set. Feature vectors are extracted for each image in the representative image set and for every image in the database. The centroid Crep of the representative images is obtained by computing the mean of their feature vectors. Each image in the representative image set is then replaced, one at a time, with the same candidate image from the database, and a new centroid is calculated for every replacement. The Euclidean distance between each of these new centroids and the representative image centroid Crep is calculated. The cumulative sum of these distances determines the similarity of the candidate image to the representative image set and is used for ranking the images: the smaller the sum, the more similar the image is to the representative image set. This approach has some shortcomings, however: extracting the features of every image in the database and comparing the query images with every database image takes a lot of time, and both complexity and cost increase. In our proposed work, the KNN algorithm is therefore applied to classify the images in the database using the query images, and the candidate images are reduced to the images returned by the classification mechanism, which decreases the execution time and reduces the number of iterations.
Owing to this hybrid model of multi query and KNN, the effectiveness of image retrieval in the CBIR system increases. The work is implemented in C/C++ with the OpenCV library, using Visual Studio 2015 as the IDE. The experimental results show that our method improves the performance of image retrieval.


INTRODUCTION
Content-Based Image Retrieval (CBIR) systems are search engines for image databases, which index images according to their content. A typical task solved by CBIR systems is that a user submits a query image or a series of images and the system is required to retrieve the images from the database that are as similar as possible. Another task is support for browsing through large image databases, where the images are to be grouped or organized according to similar properties. Although image retrieval has been an active research area for many years, this difficult problem is still far from being solved. There are two main reasons. The first is the so-called semantic gap, which is the difference between the information that can be extracted from the visual data and the interpretation that the same data have for a user in a given situation. The other is the so-called sensory gap, which is the difference between a real object and its computational representation derived from sensors, whose measurements are significantly influenced by the acquisition conditions. In a CBIR system, the feature vectors of the images in the database form a feature database. The retrieval process is initiated when a user queries the system using an example image or a sketch of the object. The query image is converted into its internal feature vector representation using the same feature extraction routine that was used for building the feature database. A similarity measure is employed to calculate the distance between the feature vector of the query image and those of the target images in the feature database. Finally, the retrieval is performed using an indexing scheme which facilitates efficient searching of the image database. Recently, the user's relevance feedback has also been incorporated to further improve the retrieval process and to produce perceptually and semantically more meaningful retrieval results.

RELATED WORK
Savvas A. Chatzichristofis et al. (2008) deal with a new low-level feature that is extracted from the images and can be used for indexing and retrieval. This feature is called the "Color and Edge Directivity Descriptor" (CEDD) and incorporates color and texture information in a histogram. The CEDD size is limited to 54 bytes per image, rendering this descriptor suitable for use in large image databases. Khadidja et al. (2013) focused on CBIR and the basic concepts pertaining to it, as well as relevance feedback and its various mechanisms. An important contribution of this work is a comparative analysis of CBIR systems using relevance feedback: major models and approaches are discussed in detail, from early heuristic methods to recent optimal learning algorithms, with emphasis on their advantages and weaknesses.
Bhavneet Kaur et al. (2014) used the OpenCV platform since it provides a C interface for implementing various image processing algorithms. The work merges the feature extraction technique with this platform, which the authors consider the most suitable one available for image algorithms. They have also measured the performance of the technique in terms of various parameters such as execution time, rotation, detectability and accuracy. A survey of low-level feature description techniques for Content Based Image Retrieval, along with its various applications, has also been presented. In the modern era, with the explosive growth of image databases, the huge amount of image and video archives has led to new research and the development of efficient methods for searching, locating and retrieving images. For this purpose, an efficient tool for searching, locating and retrieving images is required.
Ghanshyam Raghuwanshi et al. (2015) propose a novel technique for texture image retrieval based on tetrolet transforms. Tetrolets provide fine texture information owing to their different way of analysis. Tetrominoes are applied at each decomposition level of an image and the best combination of tetrominoes is selected, which better shows the geometry of the image at each level. All three high-pass components of the decomposed image at each level are used as input values for feature extraction.
Jayant Mankar et al. (2016) state that every day an enormous amount of data is retrieved and transmitted on the Internet, which makes it possible to obtain relevant information more quickly. Most users and researchers require image data from the available image databases. Retrieving the image data of interest from a huge database is a tedious task in terms of storage and retrieval time.

MOTIVATION OF THE WORK
In the last two decades, CBIR systems have improved a lot. However, there still remain some problems which have not been answered satisfactorily. First and foremost is the problem of the semantic gap, which exists between the low-level feature representation of images and the actual visual perception of the image. Researchers all over the globe are working on narrowing this semantic gap. The semantic gap is a big problem which can be seen as a collection of many small problems. It is the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data has for a user in a given situation. One consequence is negative images, which the user does not want in the output. In other words, images with high low-level feature similarity may still be different in terms of user perception, so similarity of low-level features does not always mean semantic similarity of the images. As the number of images in the database may be large, it takes a lot of time to extract the features of every image in the database and to compare the query image with the database images. Every query image has to be compared with the images in the database, which increases the complexity of the system. If more comparisons are needed, more hardware is required, thereby increasing the cost. In this work, we have identified these problems and tried to provide an effective solution to them.

OBJECTIVES
This paper is devoted to improving existing techniques involved in feature extraction and similarity matching, and to reducing the overall computation time of the image retrieval system while increasing its accuracy. The main contributions of this work are listed below:
• Design and development of a multistage model for image retrieval that improves the retrieval accuracy by filtering out irrelevant images at each stage. The accuracy of region based image retrieval system is improved by introducing novel region
• To reduce the number of iterations by using the image classification mechanism.
• To reduce the processing time of the algorithm, thereby improving the overall efficiency of the system.
• To implement the proposed algorithm in the OpenCV environment and compare its performance with the existing algorithm.

KNN
An instance based learning method called K-Nearest Neighbor, or K-NN, has been used in many applications in areas such as data mining, statistical pattern recognition and image processing. Successful applications include recognition of handwriting, satellite images and EKG patterns. In data mining, we often need to compare samples to see how similar they are to each other. For samples whose features have continuous values, it is customary to consider samples to be similar to each other if the distances between them are small. Besides the most popular choice, the Euclidean distance, there are of course many other ways to define distance. A related technique, the k-means clustering algorithm, attempts to split a given anonymous data set (a set containing no information as to class identity) into a fixed number (k) of clusters.
Initially, k so-called centroids are chosen. A centroid is a data point (imaginary or real) at the center of a cluster. In Praat, each centroid is an existing data point in the given input data set, picked at random, such that all centroids are unique (that is, for all centroids ci and cj with i ≠ j, ci ≠ cj). These centroids are used to train a KNN classifier. The resulting classifier is used to classify (using k = 1) the data and thereby produce an initial randomized set of clusters. Each centroid is thereafter set to the arithmetic mean of the cluster it defines. The process of classification and centroid adjustment is repeated until the values of the centroids stabilize. The final centroids are used to produce the final classification/clustering of the input data, effectively turning the set of initially anonymous data points into a set of data points, each with a class identity.
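The classification and centroid adjustment loop described above can be sketched in a few lines of standard C++. This is a minimal illustration rather than the Praat implementation: the names `Point`, `nearestCentroid` and `kMeans` are hypothetical, and the starting centroids are passed in by the caller instead of being drawn at random so that the behaviour stays reproducible.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
double squaredDistance(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = a[i] - b[i];
        sum += diff * diff;
    }
    return sum;
}

// Index of the centroid nearest to p (a nearest-neighbour lookup with k = 1).
std::size_t nearestCentroid(const Point& p, const std::vector<Point>& centroids) {
    std::size_t best = 0;
    double bestDistance = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < centroids.size(); ++c) {
        const double d = squaredDistance(p, centroids[c]);
        if (d < bestDistance) {
            bestDistance = d;
            best = c;
        }
    }
    return best;
}

// Runs k-means from the given starting centroids (assumption: supplied by
// the caller). Returns one cluster label per point, taken from the last
// classification step; `centroids` is updated in place.
std::vector<std::size_t> kMeans(const std::vector<Point>& points,
                                std::vector<Point>& centroids,
                                int maxIterations = 100) {
    assert(!points.empty() && !centroids.empty());
    std::vector<std::size_t> labels(points.size(), 0);
    for (int iteration = 0; iteration < maxIterations; ++iteration) {
        // Classification step: assign each point to its nearest centroid.
        for (std::size_t i = 0; i < points.size(); ++i)
            labels[i] = nearestCentroid(points[i], centroids);

        // Adjustment step: move each centroid to the mean of its cluster.
        std::vector<Point> sums(centroids.size(), Point(points[0].size(), 0.0));
        std::vector<std::size_t> counts(centroids.size(), 0);
        for (std::size_t i = 0; i < points.size(); ++i) {
            for (std::size_t d = 0; d < points[i].size(); ++d)
                sums[labels[i]][d] += points[i][d];
            ++counts[labels[i]];
        }
        bool changed = false;
        for (std::size_t c = 0; c < centroids.size(); ++c) {
            if (counts[c] == 0) continue;  // an empty cluster keeps its centroid
            for (double& value : sums[c]) value /= static_cast<double>(counts[c]);
            if (sums[c] != centroids[c]) {
                centroids[c] = sums[c];
                changed = true;
            }
        }
        if (!changed) break;  // the centroids have stabilized
    }
    return labels;
}
```

Calling `kMeans` with k starting centroids returns a label in the range 0 to k-1 for every input point, which is the class identity mentioned in the text.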

Figure 1. KNN Classification
The training examples are vectors in a multidimensional feature space, each with a class label. The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples.
In the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test point) is classified by assigning the label which is most frequent among the k training samples nearest to that query point.
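The two phases can be illustrated with a short sketch in standard C++. It assumes Euclidean distance and integer class labels; the names `Sample` and `knnClassify` are hypothetical and are not taken from OpenCV or from the implementation used in this work. Training is simply the vector of stored samples, and classification is a majority vote among the k nearest of them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// A stored training example: a feature vector plus its class label.
struct Sample {
    std::vector<double> features;
    int label;
};

// Squared Euclidean distance; the square root is skipped because it does
// not change the ordering of the neighbours.
double squaredDistance(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = a[i] - b[i];
        sum += diff * diff;
    }
    return sum;
}

// Returns the label that is most frequent among the k training samples
// nearest to `query`. Ties go to the smallest label value.
int knnClassify(const std::vector<Sample>& training,
                const std::vector<double>& query,
                std::size_t k) {
    assert(!training.empty() && k > 0);
    k = std::min(k, training.size());

    std::vector<std::pair<double, int>> byDistance;
    byDistance.reserve(training.size());
    for (const Sample& s : training)
        byDistance.emplace_back(squaredDistance(s.features, query), s.label);

    // Only the k smallest distances need to be in order.
    std::partial_sort(byDistance.begin(), byDistance.begin() + k, byDistance.end());

    // Count the votes of the k nearest samples.
    std::map<int, int> votes;
    for (std::size_t i = 0; i < k; ++i)
        ++votes[byDistance[i].second];

    int bestLabel = byDistance[0].second;
    int bestVotes = 0;
    for (const auto& entry : votes) {
        if (entry.second > bestVotes) {
            bestVotes = entry.second;
            bestLabel = entry.first;
        }
    }
    return bestLabel;
}
```

An odd value of k is a common choice for two-class problems because it avoids tied votes.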

IMPROVED MULTI QUERY CBIR USING QUERY REPLACEMENT
A multi query system using a query replacement algorithm was used in previous work. It utilizes the statistical features of a query image set to determine the similarity of the candidate images in the database for ranking and retrieval. With a small number of query images this method achieves a high retrieval precision, but its major drawback is the high computation cost at run time. In this work, a novel query replacement algorithm using KNN is proposed for boosting the efficiency of a multi query CBIR system. First, the database images are fetched and classified using the KNN algorithm based on the query images. The database is then reduced to the classified images. The algorithm is based on the principle that if an element in set X is replaced with an element from set Y, the replacement causes minimal information change when the new element is highly similar to the element being replaced.
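The query replacement scoring can be sketched as follows, based on the description in the abstract: compute the centroid Crep of the representative images, substitute the candidate for each representative in turn, and sum the Euclidean distances between the resulting centroids and Crep. The sketch uses standard C++ only, and the names `FeatureVector`, `replacementScore` and `rankCandidates` are hypothetical; feature extraction is assumed to have been done already.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

using FeatureVector = std::vector<double>;

// Mean of a non-empty set of equally sized feature vectors.
FeatureVector centroid(const std::vector<FeatureVector>& vectors) {
    assert(!vectors.empty());
    FeatureVector mean(vectors[0].size(), 0.0);
    for (const FeatureVector& v : vectors) {
        assert(v.size() == mean.size());
        for (std::size_t d = 0; d < v.size(); ++d) mean[d] += v[d];
    }
    for (double& m : mean) m /= static_cast<double>(vectors.size());
    return mean;
}

double euclideanDistance(const FeatureVector& a, const FeatureVector& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return std::sqrt(sum);
}

// Cumulative centroid shift caused by substituting `candidate` for each
// representative image in turn. A smaller score means a more similar image.
double replacementScore(const std::vector<FeatureVector>& representatives,
                        const FeatureVector& candidate) {
    const FeatureVector cRep = centroid(representatives);
    std::vector<FeatureVector> modified = representatives;
    double total = 0.0;
    for (std::size_t i = 0; i < representatives.size(); ++i) {
        modified[i] = candidate;           // replace representative i
        total += euclideanDistance(centroid(modified), cRep);
        modified[i] = representatives[i];  // restore it for the next round
    }
    return total;
}

// Candidate indices ordered from most similar to least similar.
std::vector<std::size_t> rankCandidates(const std::vector<FeatureVector>& representatives,
                                        const std::vector<FeatureVector>& candidates) {
    std::vector<double> scores;
    scores.reserve(candidates.size());
    for (const FeatureVector& c : candidates)
        scores.push_back(replacementScore(representatives, c));

    std::vector<std::size_t> order(candidates.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&scores](std::size_t a, std::size_t b) { return scores[a] < scores[b]; });
    return order;
}
```

With R representatives, replacing representative r by candidate x shifts the centroid by (x - r)/R, so each term of the sum equals the distance between x and r divided by R. The sketch recomputes the centroid explicitly only to mirror the description step by step.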
• The user provides R images as the query image set, referred to as representative images; the D images in the database are referred to as candidate images.
• The KNN algorithm is applied to classify the images in the database using the query images.
• The candidate images are reduced to the images returned by the classification mechanism.
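The text does not spell out how the KNN step selects the reduced candidate set, so the following sketch rests on an assumption: for every query image its k nearest database images are kept, and the reduced candidate set is the union of these neighbours. The function name `reduceCandidates` is hypothetical, and the sketch uses standard C++ with precomputed feature vectors instead of OpenCV.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using FeatureVector = std::vector<double>;

// Squared Euclidean distance between two equally sized feature vectors.
double squaredDistance(const FeatureVector& a, const FeatureVector& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

// Assumed reduction rule: keep the k nearest database images of every query
// image. Returns the union of those neighbours as sorted database indices.
std::vector<std::size_t> reduceCandidates(const std::vector<FeatureVector>& queries,
                                          const std::vector<FeatureVector>& database,
                                          std::size_t k) {
    const std::size_t kept = std::min(k, database.size());
    std::set<std::size_t> selected;
    for (const FeatureVector& q : queries) {
        std::vector<std::pair<double, std::size_t>> byDistance;
        byDistance.reserve(database.size());
        for (std::size_t j = 0; j < database.size(); ++j)
            byDistance.emplace_back(squaredDistance(q, database[j]), j);

        // Bring the `kept` closest database images to the front.
        std::partial_sort(byDistance.begin(), byDistance.begin() + kept, byDistance.end());
        for (std::size_t i = 0; i < kept; ++i)
            selected.insert(byDistance[i].second);
    }
    return std::vector<std::size_t>(selected.begin(), selected.end());
}
```

Only the images at the returned indices would then be scored by the query replacement step, which is where the reduction in iterations comes from.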

PROPOSED ALGORITHM
Each image category in the database was used in turn as a query, with the scope set to 10, 20, 40, 60, 80, 100 and 200. After performing all the retrievals, the results are evaluated. The number of iterations, processing time, precision and recall are evaluated for the existing work and the proposed work. Results from using the multi query replacement algorithm without KNN and with KNN are presented using bar charts and tables. For the particular query images A, B, C and D, the cumulative sum of all the distances is computed. The system retrieves images according to the user's preference or the query images provided by the user. This property is useful for finding particular kinds of images for which no exact query can be found: the search can be performed with images that are only somewhat similar.

PERFORMANCE EVALUATION
The performance of a retrieval system is evaluated based on several criteria. Some of the commonly used performance measures are average precision, average recall and average retrieval rate. All of these are computed from the precision and recall values obtained for each query image. The precision of a retrieval is defined as the fraction of the retrieved images that are indeed relevant to the query:
Precision = (number of relevant images retrieved) / (total number of images retrieved)
The recall is the fraction of the relevant images that is returned by the query:
Recall = (number of relevant images retrieved) / (total number of relevant images in the database)
Only 20 images are passed on to the multi query replacement algorithm, which further decreases the number of iterations and the processing time. In Table 2, we have calculated the precision and recall for the same set of images that is used in Table 1. The precision ranges from 0.67 to 1.0, which is an improvement over the existing work. The number of iterations has been reduced substantially. The overall execution time, measured in milliseconds, has also been reduced in comparison with the existing work. The corresponding figure illustrates the improvement in precision when KNN classification is applied to the multi query replacement mechanism. After applying KNN classification, the precision of the proposed work increases, thereby increasing the overall efficiency of the system. A good CBIR system should have a high precision value. Figure 8 shows the precision-recall graph of the proposed work. For a given experiment, as the size increases, both the recall and the precision increased. This is expected, as the number of positive and negative images can only remain constant or increase. The trade-off between the two values is governed by the size used.
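The precision and recall measures used in this section can be computed for a single query with a short routine. The sketch below is an illustration in standard C++, not the evaluation code used for the reported results; the names `PrecisionRecall` and `evaluateAtScope` are hypothetical, and image ids are assumed to be integers that appear at most once in the ranked list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

struct PrecisionRecall {
    double precision;
    double recall;
};

// Evaluates the first `scope` entries of a ranked result list against the
// set of relevant image ids for the query.
PrecisionRecall evaluateAtScope(const std::vector<int>& ranked,
                                const std::set<int>& relevant,
                                std::size_t scope) {
    assert(scope > 0);
    const std::size_t retrieved = std::min(scope, ranked.size());

    // Count the retrieved images that are relevant to the query.
    std::size_t hits = 0;
    for (std::size_t i = 0; i < retrieved; ++i)
        if (relevant.count(ranked[i]) > 0) ++hits;

    PrecisionRecall result;
    result.precision = retrieved == 0
        ? 0.0
        : static_cast<double>(hits) / static_cast<double>(retrieved);
    result.recall = relevant.empty()
        ? 0.0
        : static_cast<double>(hits) / static_cast<double>(relevant.size());
    return result;
}
```

Calling `evaluateAtScope` once per scope value (10, 20, 40 and so on) for each query, and averaging the results over all queries, gives the average precision and average recall figures.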

CONCLUSION
We have reviewed the main components of a content based image retrieval system that applies KNN with a multiple query replacement mechanism, including image feature representation, indexing, query processing, query-image matching and user interaction, while highlighting the current state of the art and the key challenges. It has been acknowledged that there remains much room for improvement in the development of content based image retrieval systems, owing to the semantic gap between the image similarity outcome and the user's perception. Contributions from soft-computing approaches and natural language processing methods are especially required to narrow this gap.
For our investigation, it was necessary to survey performance evaluation techniques. It was noted that sample size factors are often omitted in CBIR performance measures. Their proven impact emphasizes the need to report them for a proper representation of a system. This can be addressed by more in-depth analysis of results beyond the standard measures used, as well as by normalization procedures with respect to scope and number of irrelevant semantic classes. As future scope, the multi query replacement mechanism can be extended with other available classifiers and the results compared with those of the present work.