A REVIEW ON MULTISCALE TEXTURE FEATURES USING STEERABLE PYRAMIDS

As a result of recent advances in digital storage technology, it is now possible to create large databases of digital imagery. These collections may contain millions of images and terabytes of data. For users to make the most of these databases, effective and efficient search methods must be devised. Having a computer perform the indexing based on a content-based image retrieval (CBIR) scheme addresses the shortcomings of human-based indexing: since a computer can process images at a much higher rate without tiring, the manpower issue is solved. In this paper, we discuss the architecture of CBIR with steerable pyramids and its shortcomings.


INTRODUCTION
Content-based image retrieval (CBIR) has become an important research area in computer vision as digital image collections are rapidly being created and made available to multitudes of users through the World Wide Web. There are collections of images from art museums, medical institutes, and environmental agencies, to name a few. In the commercial sector, companies have been formed that make large collections of photographic images of real-world scenes available to users who want them for illustrations in books, articles, advertisements, and other media meant for the public at large. The largest of these companies have collections of over a million digital images that are constantly growing. Remarkably, the indexing of these images is still done manually: a human indexer selects and inputs a set of keywords for each image. Each keyword can be augmented by terms from a thesaurus that supplies synonyms and other terms that previous users have tried in searches that led to related images. Keywords can also be obtained from captions, but these are less reliable. Content-based image retrieval research has produced a number of search engines, yet the commercial image providers, for the most part, are not using these techniques. The main reason is that most CBIR systems require an example image and then retrieve similar images from their databases. Real users do not have example images; they start with an idea, not an image. Some CBIR systems allow users to draw a sketch of the desired image. Such systems require the users to have their objectives in mind first and can therefore only be applied in specific domains, such as trademark matching and painting purchase. Thus the recognition of generic classes of objects and concepts is needed to provide automated indexing of images for CBIR. However, the task is not easy: computer programs can extract features from an image, but there is no simple one-to-one mapping between features and objects.
While eliminating this gap completely may take a very long time, we can build and use image features cleverly to shorten the distance. Most earlier CBIR systems rely on global image features, such as color histograms and texture statistics. Global features cannot capture object properties, so local features are favored for object class recognition. For the same reason, higher-level image features are preferred to lower-level ones. Similar image elements, such as pixels, patches, and lines, can be grouped together to form higher-level units, which are more likely to correspond to objects or object parts. Different types of features can be combined to improve feature discriminability. For example, using color and texture together to identify trees is more reliable than using color or texture alone. Context information is also helpful for detecting objects: a boat candidate region is more likely to correspond to a boat if it lies inside a blue region. While improving the ability of a system by designing higher-level image features and combining individual ones, we should be prepared to apply more and more features, since a limited number of features cannot satisfy the requirement of recognizing the many different objects found in ordinary photographic images.

COLOUR FEATURES
Colour is one of the major features used in CBIR systems. This popularity is attributed to the ease of implementation and the distinguishing differences between colours. It is a feature robust to changes such as the scene layout or viewing angle. Colour can be represented with different models such as HSI, YIQ, CMY and RGB. The RGB model is the most widely known and can be visualized as a cube. One corner of the cube is the origin (0, 0, 0), and each of the three primary colours Red, Green and Blue is assigned an edge to represent an axis from the origin. Any other colour, obtained by combining the red, green and blue components in certain proportions, then lies in this coordinate space. The origin represents black, as it is the point of lowest red, green and blue values. Understandably, the opposite corner, with the highest red, green and blue values, represents white. The 3D coordinate space is similar to the way the three sets of retinal cones work in the human visual system. The RGB model is nonetheless limited in representing the full range of human perception, which includes details such as the brightness and purity of a colour. Those properties are, however, implicit in the coordinate space, and the nonlinear transformation from RGB to HSI is used to capture them.
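As a sketch of that nonlinear transformation, the conversion below maps an RGB triple to HSI using the standard angle formula. The function name and the assumption of inputs in [0, 1] are illustrative choices for this sketch, not taken from any particular CBIR system.

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert an RGB triple (each in [0, 1]) to HSI.

    Returns (h, s, i): hue in degrees [0, 360), saturation and
    intensity in [0, 1].  Standard nonlinear RGB-to-HSI transform;
    hue is undefined for pure grays (where s == 0).
    """
    eps = 1e-10
    i = (r + g + b) / 3.0                       # intensity: mean of channels
    s = 0.0 if i < eps else 1.0 - min(r, g, b) / i  # saturation: purity of the colour
    # Hue from the angle between the colour vector and the red axis.
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b)) + eps
    h = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    if b > g:          # angles below the red-green line wrap around
        h = 360.0 - h
    return h, s, i
```

For example, a pure red pixel maps to hue 0, full saturation, and intensity 1/3.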
Comparing the color content of images is an obvious, and consequently popular, choice for performing image retrieval duties. Color acts as a robust descriptor that can often simplify the tasks of object identification and extraction from a given image [4]. For example, in Figure 2 it is much easier to locate image pixels of the flower from the rest of the image when using the color image as opposed to the grayscale version. Due to the very nature of color representation, the color data itself provides multiple measurements at any given pixel location in an image. Because of the inherent properties of color, the last two decades have produced a number of interesting methods by which color image retrieval can be performed. A selection of these methods will be discussed following a review of the fundamentals of color and its methods of representation.
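A minimal illustration of comparing color content: quantize each channel, build a joint color histogram, and compare two histograms with the Swain-Ballard intersection measure. The bin count and helper names here are arbitrary choices for the sketch.

```python
def color_histogram(pixels, bins=4):
    """Joint RGB histogram: quantize each channel into `bins` levels.

    `pixels` is a list of (r, g, b) tuples with values in [0, 255].
    Returns a normalized histogram of length bins**3.
    """
    hist = [0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1
    total = len(pixels)
    return [c / total for c in hist]

def histogram_intersection(h1, h2):
    """Swain-Ballard intersection: 1.0 for identical distributions, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

An image compared against itself scores 1.0; an all-red image against an all-green image scores 0.0, since no bins overlap.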

THE CIE LAB COLOR SPACE
The CIE L*a*b* color space was developed to be perceptually uniform and to possess a Euclidean metric [5]. This means that the Euclidean distance between two points (colors) correlates strongly with human visual perception. CIE L*a*b* is based directly on the CIE XYZ color model, where the X, Y, and Z components represent tristimulus values capable of expressing any color that can be perceived by the average human observer [6]. These primary colors are nonreal, meaning that they cannot be realized by actual color stimuli. It is not possible to transform RGB coordinates directly to CIE L*a*b* space, because RGB is not an absolute color space and cannot produce all humanly discernible colors; the values must first be mapped through an absolute space such as CIE XYZ.
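Concretely, a common route is sRGB, to linear RGB, to CIE XYZ, to L*a*b*, after which the perceptual difference between two colors is their Euclidean distance (the CIE76 delta-E). The sketch below assumes sRGB input in [0, 1] and a D65 white point; the function names are illustrative.

```python
import math

# D65 reference white, used in the XYZ -> L*a*b* step
XN, YN, ZN = 0.95047, 1.0, 1.08883

def srgb_to_lab(r, g, b):
    """sRGB components in [0, 1] -> CIE L*a*b*, via linear RGB and XYZ (D65)."""
    def linearize(c):  # undo the sRGB gamma curve
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = linearize(r), linearize(g), linearize(b)
    # sRGB -> XYZ matrix (D65 primaries)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    def f(t):  # the CIE cube-root compression, with its linear toe
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29
    fx, fy, fz = f(x / XN), f(y / YN), f(z / ZN)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

def delta_e(lab1, lab2):
    """Euclidean distance in L*a*b* (CIE76 delta-E)."""
    return math.dist(lab1, lab2)
```

Under this transform, white maps to roughly (100, 0, 0) and black to (0, 0, 0), so their delta-E is about 100.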

TEXTURE-BASED CBIR
Another popular approach to CBIR involves the use of texture to index database images. Texture, in the realm of image processing, gives information about the local spatial arrangement of colors or intensities in a given image [3]. Images that have similar texture properties should therefore have the same spatial arrangements of colors or intensities, but not necessarily the same colors. Because of this, texture-based image indexing and retrieval techniques are quite different from those used strictly for color.
In the field of computer vision and image processing, there is no clear-cut definition of texture, because the available definitions are tied to particular texture analysis methods and the features they extract from the image. However, texture can be thought of as repeated patterns of pixels over a spatial domain; the addition of noise to the patterns and their repetition frequencies can make textures appear random and unstructured. Texture properties are the visual patterns in an image that exhibit homogeneity not resulting from the presence of only a single color or intensity. The texture properties perceived by the human eye include, for example, regularity, directionality, smoothness, and coarseness.
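One classical way to turn "repeated spatial patterns" into numbers is the gray-level co-occurrence matrix (GLCM): count how often pairs of gray levels appear at a fixed spatial offset, then summarize the counts with statistics such as Haralick's contrast. The sketch below uses illustrative names; a flat region scores a contrast of 0, while a checkerboard scores high.

```python
def glcm(img, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for offset (dx, dy).

    `img` is a 2-D list of ints in [0, levels).  Entry p[i][j] is the
    probability that a pixel of level i has a neighbor of level j at
    the given offset.
    """
    p = [[0.0] * levels for _ in range(levels)]
    pairs = 0
    for y in range(len(img) - dy):
        for x in range(len(img[0]) - dx):
            p[img[y][x]][img[y + dy][x + dx]] += 1
            pairs += 1
    return [[c / pairs for c in row] for row in p]

def glcm_contrast(p):
    """Haralick contrast: large when neighboring pixels differ strongly."""
    return sum((i - j) ** 2 * p[i][j]
               for i in range(len(p)) for j in range(len(p)))
```

A uniform 4x4 image yields contrast 0; a 4x4 checkerboard, where every horizontal neighbor pair differs by one level, yields contrast 1.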

FEATURE EXTRACTION FROM IMAGES
The extraction of the texture and color content of the images takes place both during the database population phase and the querying phase. Depending on the user's intention, texture feature extraction can be performed in three different ways:

Fully Automatic Texture Feature Extraction:
The system determines a rectangular region of the image that represents its texture characteristics. Since this region is considerably smaller than the whole image while still being a good representation of it, working with the automatically segmented region provides two benefits: the feature extraction time decreases, and the query processing phase is accelerated.

Semi-Automatic Texture Feature Extraction:
In most applications, the users are not interested in the texture of the whole image but in a specific region of interest. Since the user is provided with drawing facilities on the loaded image, the region of interest is determined simply by dragging the mouse over the image. As in the fully automatic case, processing only the region of interest speeds up the system.

Texture Feature Extraction of Whole Image:
The texture feature extraction for the whole image is the default case, and is meaningful when the whole image is of interest.
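The three modes differ only in which pixels feed the texture descriptor. A minimal sketch follows, in which the descriptor is a simple mean/variance stand-in and the "automatic" region is a central crop; the source does not specify its segmentation method, so that part, along with all names, is purely illustrative.

```python
def extract_texture(img, mode="whole", roi=None):
    """Compute a toy texture descriptor (mean, variance) of a grayscale image.

    img  -- 2-D list of intensity values
    mode -- "whole" (default), "roi" (user-drawn rectangle), or "auto"
    roi  -- (x0, y0, x1, y1) rectangle, used when mode == "roi"
    """
    h, w = len(img), len(img[0])
    if mode == "roi" and roi is not None:
        x0, y0, x1, y1 = roi
    elif mode == "auto":
        # Placeholder for automatic segmentation: central quarter of the image.
        x0, y0, x1, y1 = w // 4, h // 4, 3 * w // 4, 3 * h // 4
    else:
        x0, y0, x1, y1 = 0, 0, w, h            # whole image
    region = [row[x0:x1] for row in img[y0:y1]]
    vals = [v for row in region for v in row]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var
```

The point of the region modes is simply that fewer pixels are processed, so both feature extraction and querying run faster.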

WHAT IS A STEERABLE PYRAMID?
The Steerable Pyramid is a linear multi-scale, multi-orientation image decomposition that provides a useful front-end for image-processing and computer vision applications. This representation was developed in 1990 to overcome the limitations of orthogonal separable wavelet decompositions that were then becoming popular for image processing; specifically, those representations are heavily aliased and do not represent oblique orientations well. Once the orthogonality constraint is dropped, it makes sense to completely reconsider the filter design problem, as opposed to simply re-using orthogonal wavelet filters in a redundant representation, as is done in cycle-spinning or undecimated wavelet transforms.
The basis functions of the steerable pyramid are Kth-order directional derivative operators (for any choice of K) that come in different sizes and K+1 orientations. As directional derivatives, they span a rotation-invariant subspace, and they are designed and sampled such that the whole transform forms a tight frame. An example decomposition of an image of a white disk on a black background contains 4 orientation subbands at 2 scales; the smallest subband is the residual lowpass information, and the residual highpass subband is typically not displayed. In the block diagram for the decomposition (both analysis and synthesis), the image is initially separated into lowpass and highpass subbands, using filters L0 and H0. The lowpass subband is then divided into a set of oriented bandpass subbands and a low(er)-pass subband. This low(er)-pass subband is subsampled by a factor of 2 in the X and Y directions. The recursive construction of the pyramid is achieved by inserting a copy of the shaded portion of the diagram at the location of the solid circle (i.e., the lowpass branch).
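That recursion can be illustrated with a toy decomposition. The sketch below is not the true steerable pyramid: it is not a tight frame, and it uses crude 2x2 block averaging in place of the designed lowpass filter. It does, however, show the two ingredients: steerable oriented bands at each scale (here from first-order, K=1, derivatives, giving two basis orientations) plus a subsampled lowpass residual. All names are illustrative.

```python
import numpy as np

def directional_deriv(img, theta):
    """First-order directional derivative: cos(t)*dI/dx + sin(t)*dI/dy.

    Steerability means the response at ANY angle is this same linear
    combination of just the x- and y-derivative basis bands.
    """
    gy, gx = np.gradient(img.astype(float))   # axis 0 = rows (y), axis 1 = cols (x)
    return np.cos(theta) * gx + np.sin(theta) * gy

def toy_pyramid(img, n_scales=2, thetas=(0.0, np.pi / 2)):
    """Toy multi-scale, multi-orientation decomposition.

    Returns (bands, low): per-scale lists of oriented subbands, plus the
    residual lowpass image.  Unlike the real steerable pyramid this is
    not a tight frame and uses simple 2x2 averaging to go down a scale.
    """
    bands = []
    low = img.astype(float)
    for _ in range(n_scales):
        bands.append([directional_deriv(low, t) for t in thetas])
        # crude lowpass + subsample by 2 in X and Y
        h, w = (low.shape[0] // 2) * 2, (low.shape[1] // 2) * 2
        low = low[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return bands, low
```

On an image containing a vertical edge, the 0-degree (x-derivative) band responds strongly while the 90-degree band stays silent, and a 45-degree band can be synthesized exactly from the two basis bands.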

RESEARCH MOTIVATION
The most popular features used in CBIR systems generally fall within three major categories: color, texture, and object/shape. Many people agree that content-based image retrieval (CBIR) remains well behind content-based text retrieval. The semantic gap between the user's needs and the capability of CBIR algorithms remains significant. Significant effort has been put into using low-level image properties such as color. Simpler methods use them as part of global statistics (histograms) over the whole image; some methods even forego such characterizations and rely on global transforms such as wavelets. A semantic gap exists between the pixel values and the interpretation of the image. Part of the problem is that representing an image by simple color features usually results in a loss of information, so that different pictures may map onto the same set of features. If an image has area A (in pixels) and each pixel has L possible levels, then there are L^A such images possible. Taking only a color feature vector therefore significantly reduces the precision of the retrieved results, as steerable pyramids keep losing information at the higher levels.
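The information loss is easy to quantify: with L gray levels on A pixels there are L^A distinct images, but far fewer distinct global histograms (the number of ways to distribute A pixels over L bins). The toy computation below, with illustrative names, makes the many-to-one mapping concrete.

```python
from math import comb

def num_images(levels, area):
    """Distinct images with `levels` gray levels on `area` pixels: L**A."""
    return levels ** area

def num_histograms(levels, area):
    """Distinct global histograms: ways to distribute `area` pixels over
    `levels` bins, i.e. C(area + levels - 1, levels - 1)."""
    return comb(area + levels - 1, levels - 1)
```

Even for a tiny binary 4x4 image there are 65,536 distinct images but only 17 distinct histograms, so many different pictures necessarily collapse onto the same feature vector.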

CONCLUSION AND FUTURE SCOPE
This research paper reviewed the main components of a content-based image retrieval system, including image feature representation, indexing, query processing, query-image matching, and user interaction, while highlighting the current state of the art and the key challenges. It has been acknowledged that there remains much room for improvement in the development of content-based image retrieval systems, owing to the semantic gap between image similarity outcomes and the user's perception. Since humans classify images according to their objects and concepts, a system must be able to recognize object and concept classes in order to automate the process of image annotation. Multiple feature vectors, including texture, shape, contours, and color, must be used to reduce the semantic gap in steerable pyramids.