ENHANCED CBIR MECHANISM USING STEERABLE PYRAMID AND MEDIAN VECTOR ALGORITHM

Recently, digital content has become a significant and inevitable asset of any enterprise, and the need for visual content management is rising as well. Content-based image retrieval (CBIR) has attracted voluminous research in the last decade, paving the way for numerous techniques and systems and creating interest in the fields that support these systems. CBIR indexes images based on features obtained from their visual content so as to facilitate speedy retrieval. In this thesis work, we present a steerable pyramid based image retrieval system that uses color, contour and texture as visual features to describe the content of an image region. First, we use the steerable pyramid to extract features from the query image and the database images and store them in feature vectors. Second, to speed up retrieval and similarity computation, the database images are classified and the extracted regions are clustered according to their feature vectors using the median vector algorithm. This process is performed before query matching takes place. Therefore, to answer a query, our system does not need to search all of the database images; only a number of candidate images need to be searched for similarity. The proposed system increases retrieval accuracy and decreases retrieval time. The system is evaluated experimentally on satellite and medical image databases, and the results show that it performs significantly better and faster than the existing system it is compared with. In our analysis, we compare retrieval results obtained from features extracted from the whole image using the steerable pyramid with the median vector against results obtained from the same image without the median vector.
The results demonstrate that each type of feature is effective for a particular type of image according to its semantic content, and that using a combination of them gives better retrieval results for almost all classes of images in the dataset.
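The offline step summarized above (cluster the database feature vectors before any query arrives, then search only the candidate images in the nearest cluster) can be illustrated with a small k-medians clustering in Python/NumPy. This is a sketch under stated assumptions: the text does not say how the median vector algorithm groups the database, so k-medians, the farthest-point initialization, the value of `k` and every function name here are hypothetical choices for illustration only.

```python
import numpy as np

def l1_assign(X, centers):
    """Index of the nearest center (L1 distance) for every row of X."""
    dist = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)   # shape (n, k)
    return dist.argmin(axis=1)

def k_medians(X, k, n_iter=20):
    """Cluster feature vectors (rows of X) with k-medians; returns (centers, labels)."""
    X = np.asarray(X, dtype=float)
    # Farthest-point initialization: deterministic and spreads the initial centers out
    centers = [X[0]]
    for _ in range(1, k):
        gap = np.min([np.abs(X - c).sum(axis=1) for c in centers], axis=0)
        centers.append(X[gap.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = l1_assign(X, centers)
        # Each center moves to the component-wise median of its members
        new_centers = np.array([np.median(X[labels == j], axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, l1_assign(X, centers)

def candidate_search(query, X, centers, labels, top_n=5):
    """Rank only the images in the cluster nearest to the query, not the whole database."""
    cluster = np.abs(centers - query).sum(axis=1).argmin()
    candidates = np.flatnonzero(labels == cluster)
    dist = np.linalg.norm(X[candidates] - query, axis=1)
    return candidates[np.argsort(dist)[:top_n]]

# Toy database: two well-separated groups of 4-D feature vectors
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.1, (20, 4)), rng.normal(5, 0.1, (20, 4))])
centers, labels = k_medians(X, k=2)
print(candidate_search(X[25] + 0.01, X, centers, labels, top_n=3))
```

Only the 20 vectors of the second group are compared with the query, which is the source of the time saving the text describes; the trade-off is that a relevant image placed in a different cluster is never examined.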


INTRODUCTION
With the advancement of internet and multimedia technologies, a huge amount of multimedia data in the form of audio, video and images is used in many fields, such as medical treatment, satellite imaging, video and still-image repositories, digital forensics and surveillance systems. This has created an ongoing demand for systems that can store and retrieve multimedia data effectively, and many multimedia storage and retrieval systems have been developed to meet this demand. The most common retrieval systems are Text Based Image Retrieval (TBIR) systems, where the search is based on automatic or manual annotation of images. A conventional TBIR system searches the database for text surrounding the image that matches the query string. A commonly used TBIR system is Google Images. Text-based systems are fast because string matching is computationally inexpensive. However, it is sometimes difficult to express the whole visual content of an image in words, and TBIR may end up producing irrelevant results. In addition, annotation of images is not always correct and consumes a lot of time. To provide an alternative way of searching and to overcome the limitations of TBIR systems, more intuitive and user-friendly content-based image retrieval (CBIR) systems were developed. A CBIR system uses the visual content of images, described in the form of low-level features like color, texture, shape and spatial location, to represent the images in the database. The system retrieves similar images when an example image or sketch is presented as input. Querying in this way eliminates the need to describe the visual content of images in words and is close to human perception of visual data. Content-based image retrieval research has produced a number of search engines. However, commercial image providers, for the most part, are not using these techniques. The main reason is that most CBIR systems require an example image and then retrieve similar images from their databases. Real users do not have example images; they start with an idea, not an image. Some CBIR systems allow users to draw a sketch of the wanted images. Such systems require users to have their objectives in mind first and therefore can only be applied in some specific domains, like trademark matching and painting purchasing. Most earlier CBIR systems rely on global image features, such as color histograms and texture statistics.
Global features cannot capture object properties, so local features are favored for object class recognition. For the same reason, higher-level image features are preferred to lower-level ones. Similar image elements, such as pixels, patches and lines, can be grouped together to form higher-level units, which are more likely to correspond to objects or object parts. Different types of features can be combined to improve feature discriminability. For example, using color and texture together to identify trees is more reliable than using color or texture alone. Contextual information is also helpful for detecting objects: a boat candidate region is more likely to correspond to a boat if it lies inside a blue region. While improving the ability of our system by designing higher-level image features and combining individual ones, we should be prepared to apply more and more features, since a limited number of features cannot satisfy the requirement of recognizing many different objects in ordinary photographic images. To open our system to new features and to smooth the procedure of combining different features, we propose a new concept called an abstract region: each feature type that can be extracted from an image is represented by a region in the image plus a feature vector acting as a representative for that region. The idea is that all features will be regions, each with its own set of attributes, but with a common representation. This uniform representation enables our system to handle multiple different feature types and to be extended to new features at any time.
JANUARY, 2017

RELATED WORK
Swapnalini Pattanaik et al. (2012) give an overview of retrieving images from a large database. CBIR is used for automatic indexing and retrieval of images based on the contents of the images, known as features. The features may be low-level or high-level. Low-level features include color, texture and shape, while high-level features describe concepts as understood by the human brain. The difference between the low-level features extracted from images and the high-level information needs of the user is known as the semantic gap.
Yanzhi Chen et al. (2012) proposed a discriminative criterion for improving result quality. This criterion lends itself to the addition of extra query data, and they showed that multiple query images can be combined to produce enhanced results. Experiments compare the performance of the method with the state of the art in object retrieval and show how performance is lifted by the inclusion of further query images.
… (2016) presents content-based image retrieval as one of the most important techniques in data and multimedia technology. As image collections grow at a rapid rate, the demand for efficient and effective tools for retrieving query images from a database has increased significantly. Meanwhile, content-based image retrieval systems have become very popular for browsing, searching and retrieving images from large databases of digital images, as they require relatively little human intervention.
Ru-Ze Liang et al. (2016) study the problem of content-based image retrieval. In this problem, the most popular performance measure is top precision, and the most important component of a retrieval system is the similarity function used to compare a query image against a database image. However, at the time, no similarity learning method had been proposed to optimize the top precision measure.

PROBLEM FORMULATION
The motivation of our research is to improve several aspects of content-based image retrieval by finding the latent correlation between low-level visual features and high-level semantics and integrating them into a unified vector space model. More specifically, the aim of this approach is to design and implement an effective and efficient image retrieval framework using a variety of visual features such as color, texture, shape and spatial relationships. The steerable pyramid, a multi-scale, multi-orientation image decomposition, is incorporated into content-based image retrieval. With this technique, we aim to extract the underlying semantic structure of image content and hence bridge the gap between low-level features and high-level concepts, while also achieving improved retrieval performance and a more efficient indexing structure.
• The semantic gap between the user's needs and the capability of CBIR algorithms remains significant. Significant effort has been put into using low-level image properties such as color.
• There is a semantic gap between the pixel values and the interpretation of the image. Part of the problem is that representing an image by a simple color feature usually results in a loss of information, so that different pictures may map onto the same set of features.
• If an image has area A (in pixels) and each pixel has L possible levels, then $L^A$ such images are possible. Using only a color feature vector significantly reduces the precision of the retrieved results, because the steerable pyramid keeps losing information at the higher levels.
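To give a sense of scale for the $L^A$ count, the short Python sketch below evaluates it for 8-bit grayscale images of the 256 × 256 size used in the experiments. The choice L = 256 is an assumption (8-bit grayscale), and `image_space_size` is a hypothetical helper; the point is only that any compact feature vector must map an astronomically large number of distinct images onto the same features.

```python
import math

def image_space_size(levels: int, area: int) -> tuple[int, int]:
    """For L = `levels` values per pixel and A = `area` pixels, return
    (floor of log2(L**A), number of decimal digits of L**A)."""
    # log2(L**A) = A * log2(L); exact when L is a power of two
    log2_count = int(area * math.log2(levels))
    # A positive integer N has floor(log10(N)) + 1 decimal digits
    digits = math.floor(area * math.log10(levels)) + 1
    return log2_count, digits

# 8-bit grayscale (L = 256) images of size 256 x 256 (A = 65536 pixels)
print(image_space_size(256, 256 * 256))  # (524288, 157827)
```

The count $256^{65536} = 2^{524288}$ is a number with 157,827 decimal digits, so even a very long feature vector is a drastic compression of the image space.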

OBJECTIVES
• To study the existing CBIR mechanisms and their limitations.
• To identify the semantic gaps in the existing mechanisms.
• To analyze the number of positive images in the result set.
• To optimize the processing time of the algorithm, thereby improving the overall efficiency of the system.
• To implement the proposed algorithm in the OpenCV environment and compare its performance with that of the existing algorithm.

STEERABLE PYRAMID
The steerable pyramid is a linear multi-scale, multi-orientation image decomposition that provides a useful front-end for image-processing and computer vision applications. It was developed in 1990 to overcome the limitations of the orthogonal separable wavelet decompositions that were then becoming popular for image processing (specifically, those representations are heavily aliased and do not represent oblique orientations well). Once the orthogonality constraint is dropped, it makes sense to completely reconsider the filter design problem, as opposed to just re-using orthogonal wavelet filters in a redundant representation, as is done in cycle-spinning or undecimated wavelet transforms.
The basis functions of the steerable pyramid are Kth-order directional derivative operators (for any choice of K) that come in different sizes and K+1 orientations. As directional derivatives, they span a rotation-invariant subspace, and they are designed and sampled such that the whole transform forms a tight frame. As an example, consider the decomposition of an image of a white disk on a black background by a steerable pyramid containing 4 orientation sub-bands at 2 scales: the smallest sub-band holds the residual low-pass information, and there is also a residual high-pass sub-band. The block diagram of the decomposition (both analysis and synthesis) is shown in Figure 1. Initially, the image is separated into low-pass and high-pass sub-bands using the filters L0 and H0.
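Steerability can be checked numerically in the simplest case, K = 1, where the basis consists of two first-order derivative filters (K + 1 = 2 orientations) and the derivative at any angle θ is the combination cos θ · ∂/∂x + sin θ · ∂/∂y. The Python/NumPy sketch below uses finite differences (`np.gradient`) as a stand-in for the pyramid's designed derivative filters, which is an assumption made for brevity; `steer_first_order` is a hypothetical name. On a linear ramp image the steered response matches the analytic directional derivative exactly.

```python
import numpy as np

def steer_first_order(img: np.ndarray, theta: float) -> np.ndarray:
    """Directional derivative of `img` at angle `theta` (radians from the x axis),
    synthesized from the two first-order basis responses (K = 1)."""
    # np.gradient returns the derivative along axis 0 (rows, y) first, then axis 1 (columns, x)
    d_dy, d_dx = np.gradient(img.astype(float))
    # Steering equation: a linear combination of the K + 1 = 2 basis responses
    return np.cos(theta) * d_dx + np.sin(theta) * d_dy

# A linear ramp f(x, y) = a*x + b*y has the same directional derivative everywhere
a, b = 2.0, -3.0
ys, xs = np.mgrid[0:32, 0:32]
ramp = a * xs + b * ys
for theta in np.linspace(0, np.pi, 5):
    response = steer_first_order(ramp, theta)
    assert np.allclose(response, a * np.cos(theta) + b * np.sin(theta))
```

Because any orientation is an exact linear combination of the basis responses, the filters only need to be applied once per basis function, no matter how many orientations are later inspected.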

Figure 1. Decomposition using Steerable Pyramids
The low-pass sub-band is then divided into a set of oriented band-pass sub-bands and a lower-pass sub-band. This lower-pass sub-band is subsampled by a factor of 2 in the X and Y directions. The recursive (pyramid) construction is achieved by inserting a copy of the shaded portion of the diagram at the location of the solid circle (i.e., the low-pass branch). The right side of the diagram is the synthesis part: the image is reconstructed by upsampling the lower low-pass sub-band by a factor of 2 and adding it to the set of band-pass sub-bands and the high-pass sub-band.
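The recursive analysis structure described above (split off a high-pass residual, produce oriented band-pass sub-bands from the low-pass branch, then subsample the lower-pass band by 2 and repeat) can be sketched in a few lines of NumPy. This is a simplified approximation, not the designed filters of the actual steerable pyramid: a Gaussian blur stands in for L0 and the lower-pass filter, the image minus its blur stands in for H0, and steered first-order derivatives stand in for the oriented band-pass filters. As a result, the four orientations here are combinations of only two basis responses, and the transform is not a tight frame with exact reconstruction. All function names are hypothetical.

```python
import numpy as np

def gaussian_blur(img, sigma=1.0):
    """Separable Gaussian blur with edge padding (stand-in for L0 / the lower-pass filter)."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    padded = np.pad(img, r, mode="edge")

    def conv(v):
        return np.convolve(v, k, mode="valid")

    cols = np.apply_along_axis(conv, 0, padded)   # filter along y -> shape (H, W + 2r)
    return np.apply_along_axis(conv, 1, cols)     # filter along x -> shape (H, W)

def oriented_bands(img, n_orient=4):
    """Oriented band-pass responses: steered first-order derivatives of a blurred image."""
    d_dy, d_dx = np.gradient(gaussian_blur(img))
    angles = np.pi * np.arange(n_orient) / n_orient
    return [np.cos(t) * d_dx + np.sin(t) * d_dy for t in angles]

def steerable_pyramid_sketch(img, levels=4, n_orient=4):
    """Return (high-pass residual, per-level lists of oriented bands, low-pass residual)."""
    img = np.asarray(img, dtype=float)
    low = gaussian_blur(img)
    high = img - low                                   # residual high-pass (H0 branch)
    bands = []
    for _ in range(levels):
        bands.append(oriented_bands(low, n_orient))    # oriented band-pass sub-bands
        low = gaussian_blur(low)[::2, ::2]             # lower-pass, subsampled by 2 in x and y
    return high, bands, low

high, bands, low = steerable_pyramid_sketch(np.random.default_rng(0).random((64, 64)))
print(high.shape, [level[0].shape for level in bands], low.shape)
# (64, 64) [(64, 64), (32, 32), (16, 16), (8, 8)] (4, 4)
```

The `levels=4` default mirrors the 4-level pyramid used in the experiments; for real use, a library implementation with properly designed filters should replace these stand-ins.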

RESEARCH METHODOLOGY
• D represents the number of images in the database and Q represents the query image.
• The steerable pyramid mechanism is used to extract features such as color, texture and contour from the query image and from every image in the database.
• The steerable pyramid uses multiple scales and rotation invariance to extract the features.
• The extracted features are passed to the median vector algorithm.
• For each image, the median is computed over the entire color, texture and contour feature vector.
• The median vector is used to compute the similarity between images.
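The last three steps above leave the exact median computation open. The Python/NumPy sketch below assumes one plausible reading: each feature type yields a matrix of feature vectors per image (for example one row per region or sub-band), each matrix is reduced to its component-wise median vector, the color, texture and contour medians are concatenated into one descriptor, and database images are ranked by Euclidean distance to the query descriptor. The component-wise median, the data layout and all names are assumptions; the vector median (the member vector with the smallest summed distance to the others) is another common definition.

```python
import numpy as np

FEATURE_TYPES = ("color", "texture", "contour")

def median_descriptor(features: dict[str, np.ndarray]) -> np.ndarray:
    """Reduce each (n_samples, n_dims) feature matrix to its component-wise median
    vector and concatenate the color, texture and contour medians into one descriptor."""
    return np.concatenate([np.median(features[name], axis=0) for name in FEATURE_TYPES])

def rank_database(query: dict, database: list[dict]) -> np.ndarray:
    """Database indices sorted by Euclidean distance to the query's median descriptor."""
    q = median_descriptor(query)
    D = np.stack([median_descriptor(f) for f in database])
    return np.argsort(np.linalg.norm(D - q, axis=1))

rng = np.random.default_rng(0)

def fake_image(offset):
    # Hypothetical per-image features: 10 samples each of 6-D color, 8-D texture, 4-D contour
    return {name: offset + rng.normal(0, 0.1, size=(10, dims))
            for name, dims in zip(FEATURE_TYPES, (6, 8, 4))}

database = [fake_image(o) for o in (5.0, 0.0, 2.0)]
query = fake_image(0.1)
print(rank_database(query, database))  # [1 2 0]
```

A median (rather than a mean) keeps the descriptor stable when a few regions or sub-bands produce outlying feature values.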

EXPERIMENTAL RESULTS AND DISCUSSIONS
The performance of the proposed descriptors is evaluated on the LULC (Land Use Land Cover) dataset. It is a manually constructed dataset consisting of 21 image classes, each containing 100 images of size 256 × 256. It contains the following classes: agricultural, airplane, baseball diamond, beach, buildings, chaparral, dense residential, forest, freeway, golf course, harbor, intersection, medium residential, mobile home park, overpass, parking lot, river, runway, sparse residential, storage tanks, and tennis court. We have also conducted multiple experiments on medical images of arms, brain, nose, legs, etc. The first step is to extract the color, texture and contour feature vectors using the steerable pyramid mechanism, which captures low-level features down to the detail of individual pixels. A 4-level steerable pyramid is used.

Figure 4. Medical Dataset with different categories
Figure 4 shows the different categories of images in the medical dataset, including images of the brain, hands, legs, knees, oral region, etc. Figure 5 represents the different levels of detail obtained with the steerable pyramid.
At level 1, the original image is scaled by a factor of 2, and the scale keeps increasing up to level 4. The color, texture and contour feature vectors are extracted for the query image and for the database images. Figure 6 shows the contours for different categories of images, such as airplanes, rooftops and the medical dataset.
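The text does not specify how each feature vector is computed from the image. The Python/NumPy sketch below assumes common textbook choices purely for illustration: a per-channel color histogram, a texture energy per orientation from steered first-order derivatives, and a contour descriptor given by the orientation histogram of strong-gradient (edge) pixels. Bin counts, the edge threshold and all function names are hypothetical, and the sketch works on whole images rather than pyramid levels.

```python
import numpy as np

def color_histogram(rgb, bins=8):
    """Per-channel histogram of an (H, W, 3) image with values in [0, 1], normalized to sum 1."""
    hists = [np.histogram(rgb[..., c], bins=bins, range=(0.0, 1.0))[0] for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def texture_energy(gray, n_orient=4):
    """Mean absolute steered first-order derivative at n_orient orientations in [0, pi)."""
    d_dy, d_dx = np.gradient(gray)
    angles = np.pi * np.arange(n_orient) / n_orient
    return np.array([np.abs(np.cos(t) * d_dx + np.sin(t) * d_dy).mean() for t in angles])

def contour_histogram(gray, bins=8, thresh=0.1):
    """Normalized histogram of gradient orientations at edge pixels (magnitude > thresh)."""
    d_dy, d_dx = np.gradient(gray)
    edges = np.hypot(d_dx, d_dy) > thresh
    # Fold orientations into [0, pi) so that both sides of an edge share a bin
    theta = np.mod(np.arctan2(d_dy[edges], d_dx[edges]), np.pi)
    h = np.histogram(theta, bins=bins, range=(0.0, np.pi))[0].astype(float)
    return h / h.sum() if h.sum() > 0 else h

def image_features(rgb):
    """Hypothetical color / texture / contour feature vectors for one image."""
    rgb = np.asarray(rgb, dtype=float)
    gray = rgb.mean(axis=2)
    return {"color": color_histogram(rgb),
            "texture": texture_energy(gray),
            "contour": contour_histogram(gray)}

img = np.zeros((16, 16, 3))
img[:, 8:, :] = 1.0          # vertical step edge: dark left half, bright right half
features = image_features(img)
print({name: vec.shape for name, vec in features.items()})
# {'color': (24,), 'texture': (4,), 'contour': (8,)}
```

For the vertical step edge, all texture energy falls in the horizontal-derivative orientation and every edge pixel lands in the first contour bin, which is the kind of orientation selectivity the contour figures illustrate.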

Figure 6. Contours for different categories of LULC dataset
The next step of the query process in this approach is to compute the distance between the transformed feature vector of the query image, q, and that of each image in the database, d. This distance is the Euclidean distance
$\mathrm{dist}(q, d) = \lVert q - d \rVert = \sqrt{\lVert q \rVert^2 + \lVert d \rVert^2 - 2\, q \cdot d}$,
where $\lVert q \rVert$ and $\lVert d \rVert$ are the norms of those vectors. With respect to the query image and each of the database images, the previous step yields the distances between each pair of corresponding sub-images. These distance values $\mathrm{dist}(q_i, d_i)$ are then combined into a single distance value between the two images, in an approach similar to the computation of the Euclidean distance, using the median vector algorithm. Given a query image q and a candidate database image d with corresponding sub-images d1, …, d5, Figure 7 shows the matching of the query image with images of different categories.
Figure 8 shows the precision-recall graph of the existing work and the proposed work. The graph shows that both precision and recall are higher in the proposed work than in the existing work. Precision falls to 0.62 in only one experiment, and recall is also improved, ranging from 0.72 to 1.0.
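The per-vector distance and the step that merges the sub-image distances can be sketched in Python/NumPy as follows. The text does not give the combination formula, so the sketch offers two hypothetical options: a Euclidean-style root of summed squared sub-image distances, and the median of the sub-image distances. It also computes the per-vector distance in the norm form stated above. Function names and the five-part split are illustrative assumptions.

```python
import numpy as np

def euclidean_via_norms(q, d):
    """dist(q, d) = ||q - d||, expanded as sqrt(||q||^2 + ||d||^2 - 2 q.d)."""
    sq = np.dot(q, q) + np.dot(d, d) - 2.0 * np.dot(q, d)
    return float(np.sqrt(max(sq, 0.0)))   # clip tiny negatives caused by rounding

def combine_subimage_distances(q_parts, d_parts, mode="euclidean"):
    """Merge dist(q_i, d_i) over corresponding sub-images into one image-level distance."""
    dists = np.array([euclidean_via_norms(qi, di) for qi, di in zip(q_parts, d_parts)])
    if mode == "euclidean":
        return float(np.sqrt(np.sum(dists ** 2)))   # root of summed squared distances
    if mode == "median":
        return float(np.median(dists))              # robust to one badly matched sub-image
    raise ValueError(f"unknown mode: {mode!r}")

rng = np.random.default_rng(1)
q_parts = [rng.random(16) for _ in range(5)]    # query sub-images q1..q5 (16-D features each)
d_parts = [p + 0.05 for p in q_parts]           # candidate d whose sub-images nearly match
d_parts[4] = d_parts[4] + 3.0                   # ...except one badly matched sub-image
print(combine_subimage_distances(q_parts, d_parts, "euclidean"))
print(combine_subimage_distances(q_parts, d_parts, "median"))
```

The "euclidean" option equals the Euclidean distance between the concatenated sub-image vectors, so a single outlying sub-image dominates it; the "median" option stays close to 0.2 here because four of the five sub-images match well.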

Figure 8. Precision-Recall Curve of Existing and Proposed Work on Satellite Images
Figure 9 compares the execution times of the existing work and the proposed work as a bar chart over multiple experiments. Times are given in milliseconds. The execution time is reduced by 5 to 30% for different categories of images.
Figure 10 shows the precision-recall curves of the existing work and the proposed work for the medical dataset. Both precision and recall improve considerably in the proposed work. Higher precision means that a larger fraction of the images retrieved by the CBIR system are relevant.
Figure 12 shows the descriptor count comparison for the existing work and the proposed work. The existing work uses only the color feature vector, whereas the proposed work uses the color, texture and contour feature vectors.
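For reference, precision and recall for a single query can be computed as below: precision is the fraction of retrieved images that are relevant, and recall is the fraction of all relevant database images that were retrieved. The sketch assumes that an image is relevant when it shares the query's class label (each LULC class has 100 images); the function names and example labels are hypothetical and do not reproduce the reported figures.

```python
def precision_recall(retrieved_labels, query_label, n_relevant_in_db):
    """Precision and recall of one ranked result list, where an image counts as
    relevant iff its class label equals the query's class label."""
    hits = sum(1 for label in retrieved_labels if label == query_label)
    precision = hits / len(retrieved_labels) if retrieved_labels else 0.0
    recall = hits / n_relevant_in_db if n_relevant_in_db else 0.0
    return precision, recall

def pr_curve(ranked_labels, query_label, n_relevant_in_db):
    """(precision, recall) pairs after each cutoff k = 1..len(ranked_labels)."""
    return [precision_recall(ranked_labels[:k], query_label, n_relevant_in_db)
            for k in range(1, len(ranked_labels) + 1)]

# Hypothetical top-10 result for a "forest" query (each LULC class has 100 images)
top10 = ["forest"] * 8 + ["river", "golf course"]
print(precision_recall(top10, "forest", n_relevant_in_db=100))  # (0.8, 0.08)
```

Evaluating `pr_curve` at increasing cutoffs and averaging over many queries gives the kind of precision-recall curve plotted in the figures.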

Figure 9. Bar Chart Showing Execution Time Comparison of Existing Work and Proposed Work
Figure 12 shows the total number of descriptors for different experiments.

CONCLUSION AND FUTURE SCOPE
In this work, visual features such as color, texture and contour are extracted using the steerable pyramid and combined using the median vector mechanism. Features are extracted at both the whole-image level and the database-image level to better capture salient object descriptions. To narrow the gap between low-level visual features and high-level concepts, the median vector mechanism is applied and integrated with these content-based retrieval techniques in a vector space model. Our research makes the following contributions. First, the steerable pyramid with a multi-level structure is applied to image retrieval and used to uncover the underlying semantic structure of visual content. The proposed technique is a unified yet open-ended framework that can accommodate virtually any vector feature model. Preliminary experiments confirmed that this approach improves retrieval performance by linking low-level features and high-level semantics, and better reflects human perception of visual content. Second, the median vector method, together with the steerable pyramid, provides a robust and efficient CBIR scheme for both capturing the spatial relationships of salient image regions and describing object-level concepts. Experiments show that combining the color, texture and contour feature vectors achieves the best performance among the approaches compared. Finally, since neither color features nor texture features alone are sufficient to capture the overall content of visual data, we propose a seamless integration of the color, texture and contour feature vectors, taking advantage of our vector space model and median vectors. The combined feature vector, on which latent semantic indexing will be performed afterwards, is normalized and weighted. Preliminary results indicate that this is a very promising approach to further bridging the semantic gap and achieving better retrieval performance.
The results presented in the previous section are quite interesting and certainly worthy of further study. Our hope is that latent semantic analysis will find that different image features co-occur with similar query images and consequently lead to improved techniques of semantic image retrieval. We have conducted multiple experiments on different categories of images, such as forest and airplane from the LULC dataset and arms, legs and brain from the medical dataset. Based on these results, we conclude that the proposed mechanism improves on the existing CBIR mechanism. We will further test and benchmark this integrated image retrieval framework on various large image databases and tune relevance feedback to achieve optimal performance with highly reduced dimensionality. Incorporating relevance feedback into the proposed scheme to infer user preferences should further improve retrieval performance.