A SIGNATURE IDENTIFICATION SYSTEM WITH PRINCIPAL COMPONENT ANALYSIS AND STENTIFORD THINNING ALGORITHMS

Several biometric security systems have been implemented. Biometrics is the use of a person’s physiological or behavioural characteristics to identify the individual. An example of a behavioural biometric method is signature identification, which is the use of a handwritten signature to identify a person. This paper attempts to design and implement an algorithm for handwritten signature identification. The signature identification system consists of signature acquisition, preprocessing, feature extraction and matching stages. Signature acquisition can be either online or offline (both were considered in this research work). Online signatures are obtained by signing on digital tablets while offline signatures are scanned (or snapped) into the system. The preprocessing stage of the system includes turning the image to greyscale. The grey image is further converted to binary (black and white). The image is then thinned using the Stentiford thinning algorithm, an iterative thinning method with a good thinned image output. The image is finally cropped to rid it of unnecessary white space. For feature extraction, principal component analysis is used. Principal Component Analysis is a good statistical tool for identifying patterns in data. Features extracted from each signature are stored as a template. After feature extraction, the distances between signature templates are computed using Manhattan distance. If the distance exceeds a certain threshold, the test signature is rejected (otherwise it is accepted). The designed system has a FAR of 4% and an FRR of 6% for offline signatures. A FAR of 2% and an FRR of 3% were obtained for online signatures.


INTRODUCTION
Identification is the process of recognizing someone or something because of previous knowledge. A handwritten signature can be seen as the scripted name or legal mark of an individual, executed by hand for the purpose of authenticating writing in a permanent form (Vamsi, 2008). A handwritten signature can be written on paper with a pen or on a sensitive tablet. The handwritten signature is a behavioural trait in biometrics which involves the muscles of the fingers, hand, wrist and in some cases the arm. These muscles are controlled by nerve impulses. As soon as a person gets used to writing his/her signature, it becomes controlled by the brain with little or no attention.
In identification, a person to be identified submits a claim, which is either accepted or rejected. In the literature, however, verification and identification are used interchangeably for biometric recognition (Jain et al., 1997; Sandhu et al., 2009). Signature forgery refers to the act of falsely replicating the signature of another person. There are three types of signature forgery (Abu, 2010). The first is the random forgery, where a signature is written by a person without prior knowledge of the shape of the original signature. This type of forgery is easy to detect because it usually does not have the shape of the original signature. The second is the simple forgery, which is written by a person who knows the shape of the original signature but has had little practice. The third is the skilled forgery, which is probably the most difficult to detect. The signer is skilled at forgery and produces it by tracing, drawing or imitating the original signature, so this type of forgery is a good imitation of the genuine signature.
Signature identification is needed to distinguish forged signatures from genuine signatures. It is a process in which the signature of an individual is checked to determine whether it belongs to the claimed person or not. It is a technique that finds, for each sample in one of the signatures, the corresponding sample in the other signature that is closest to the original sample using some predefined metrics (Abu, 2010). Identity verification is a present-day challenge across the globe. Every day, billions worth of contracts are concluded by handwritten signatures on documents, and how these can be replaced by electronic signatures is a hot policy topic in technology (Parvinder, 2009).
The signature as a means of identification has been in use for a while. Its use cuts across different areas such as banking, education and offices, to mention a few. The probability that a forged signature will be accepted as genuine mainly depends on the amount of care taken when examining it. Many bank card transactions in stores are accepted without even a glance at the specimen signature on the card. But even diligent signature checking does not reduce the risk of fraud to zero. An experiment showed that 105 professional document examiners, who each did 144 pairwise comparisons, misattributed 6.5% of documents. Meanwhile, a control group of 34 untrained people of the same educational level got it wrong 38.3% of the time, and the nonprofessionals' performance could not be improved by giving them monetary incentives. Errors made by professionals are a subject of continuing discussion in the industry but are thought to reflect the examiner's preconceptions and context (Parvinder, 2009).
In this paper, an attempt is made to develop a signature identification system to process and identify both online and offline signatures using principal component analysis and the Stentiford thinning algorithm. Performance analysis of the algorithm is also carried out and reported.

LITERATURE REVIEW
Several systems have been proposed for signature verification, one of which is by Shohel (2007). In that paper, a hand glove was used to extract features and PCA was used to remove noise from the extracted features. In the use of PCA, some valuable information is also removed, which in turn affects the performance of the system. Ravi and Sudhir (2011) used a neural network, which took time to process depending on the number of signatures to train. An underlying assumption is that the values of the feature set or structural description extracted from genuine signatures are more stable than those from forged ones. Abu and Sabbir (2010) used a simplified skeletonization technique for their offline signature identification. The output of the thinned image was poor and this affected the efficiency of the system. Skeletonization is a key preprocessing step for offline signature identification because it reduces the image to a more compact representation. After thinning is performed on the image, a point that is more than one pixel thick is represented with a single pixel. Thinning also removes points that are not connected to another point from the image. This makes the signature image more stable.
Rosario (2010) considered the use of a signature image divided into sectors. Changes in the size of signatures were not considered, although this was a key factor that affected the performance of the system. The concept of graph theory (Tomislav and Miroslav, 2011; Martens, 1996) was used for online signature verification; however, not all graph types were considered and some graph characteristics are not simple. This paper attempts to provide an enhanced algorithm for an online and offline signature identification system that can accept, process and identify a user's signature.
According to Rosario (2010), appreciable work has been done in the area of handwritten signature detection and analysis. Among them are Sayeed, Andrews, Besar, and Kiong (2007), who worked on Forgery Detection in Dynamic Signature Verification by Entailing Principal Component Analysis (DiLecce et al., 2000). They used a hand glove, the 5DT Data Glove 14 Ultra model. The data glove interfaces with the computer via a cable to the platform-independent USB port. This structure can be further simplified by interfacing with the computer wirelessly by means of Bluetooth technology at up to 20 m distance. The limitation of this work is that the offline features of the signature were not used and that some useful information is eliminated when the principal components are taken. The advantage of increased discernment between the original and forged signatures using a 14-electrode glove over a 5-electrode glove was discussed and proved by experiments with many subjects. Calculation of the sum of mean squares of Euclidean distance was used to project the advantage of their proposed method. Equal error rates of 3.1% and 7.5% for 14 and 5 channels further reiterate the effectiveness of this technique. Rigoli and Kosmala (2012) present an extensive investigation of various HMM-based techniques for signature verification. Different feature extraction methods and HMM topologies are compared in order to obtain an optimized high-performance signature verification system. Furthermore, the paper compared online and off-line methods for signature verification. No system was developed, and so real-life conditions (such as emotions) were not really considered in the work. The results of the off-line and on-line systems show a preference for the on-line verification system, though it is not an efficient method.
June 23, 2015
Kumar and Babu (2011) proposed and implemented a new approach for offline signature verification. The proposed signature authentication system is based on global and texture features of a given signature sample (Kalenova, 2003). This method makes use of global features extracted from the skeleton of the signature. While legitimate signatures of the same person may show some differences over a period, the differences between a skilled forgery and an actual signature may be imperceptible. When a genuine sample is given for enrollment, the system automatically trains the network with statistics generated from the given samples. The back propagation network used verifies the global features for validity. The result is a gray level co-occurrence matrix representation of the signature sample, which is obtained from the picture matrix of spatial or texture features extracted. Based on the values obtained, the network decides the appropriateness of the signature. The field of neural networks has provided an excellent way of finding solutions to problems that are difficult to solve by traditional computational methods. Back propagation (Hanmandlu, 2005) is one such algorithm which has contributed hugely to neural networks. In a back propagation network (BPN), whenever the network is being trained, it is given not only the input but also the value that the network is required to produce. The BPN learns by example, which means it must be provided with a learning set that consists of a few input examples and the known-correct output for each case. A major advantage of the neural network approach is that there is no need to understand the solution of the problem. A new feature extraction approach for on-line signature verification based on a circular grid is presented in (Argentina, 2010).
Here, a circular chart enclosing the signature is divided into N identical sectors, and graphometric features are computed for each sector. The circular grid is placed so that the center of the grid matches the center of mass of the binary image of the signature.
Fotak, Baca and Koruga (2011) present previous work in the field of signature and identification to show the historical development of the idea and define a new promising approach in handwritten signature identification based on some basic concepts of graph theory. In this approach, however, not all signatures form a special graph, and dynamic properties of the signature such as time and pressure were not used in the identification process.
Prakash et al. (2010) examined the problem of quick retrieval of offline signatures in the context of a database of signature images. The geometric center of the signature image is located. The proposed methodology retrieves signatures from the database for a given query signature in decreasing order of their spatial similarity with the query. The similarity computed is based on the orientations of corresponding edges drawn between the geometric centers (centroids) of the signature image. The best hypotheses are retrieved in a simple yet efficient way to speed up the subsequent robust recognition stage. The runtime of the signature recognition process is reduced because the scanning of the entire database for a given query is narrowed down to comparing the query with a few top retrieved hypotheses. Experiments conducted on the large MCYT signature database showed promising results.
Abu et al. (2010) proposed a new technique of curve matching for comparing two signatures. This method includes curve and peak detection as features extracted from a signature. The necessary preprocessing was performed on the image of the signature. Peak detection is one of the most important time-domain functions performed in signal monitoring. It is the process of finding the locations and amplitudes of local maxima and minima. Peak detection can be performed in several ways, such as threshold peak detection and curve-fitting-based peak detection. They used curve-fitting-based peak detection. The values of the fitted curve were calculated using the Gaussian elimination method and stored in the database as the training pattern; later, to test a new pattern, they were used together with the values of the new pattern, and the error rate calculation gives the result of the comparison. The value of y is obtained as a summation over the fitted terms, and the values of y for the two signatures are then compared to find the error.
A new warping technique called extreme points warping (EPW) was proposed by Feng and Wah (2003). They used a new technique of curve matching for comparing two signatures. The curve generated in this procedure is compared with the curves of the signatures generated in the same way and stored in the database. For rotational displacement, each signature is normalized with respect to rotation. The scanned portions of the signature are cut and stored in JPEG format at 100x100 pixels. The signature is then converted into matrix form and presented in a viewer as a collection of pixels. The signatures are then processed with skeletonization, rotation, translation, peak detection, comparison and curve matching. Several portions of the curve are taken with respect to the peaks, and each portion is compared with the same portion of the other signatures. The outcomes of the different phases are stored in the database. Curve matching can significantly improve the verification rate. The reported accuracy, measured by False Acceptance Rate (FAR) and False Rejection Rate (FRR), is 1.38% and 13.75% respectively.

PROPOSED ALGORITHM
For any biometric method, image acquisition is the first step. Images can be obtained from online databases, from a digital tablet or by scanning. In this research, signature images were obtained from an online source (the SVC database) and also manually from a Genius tablet. For offline signatures, the image is converted to grey and resized using bicubic interpolation. After resizing, the image is converted to binary using the Otsu threshold. The image is then thinned using the Stentiford thinning algorithm and cropped. Principal Component Analysis (PCA) is then used to extract the features of the signature. The distance between signatures is then computed for matching using Manhattan distance.

3.1 Image Acquisition
For both online and offline signature identification, a primary requirement is the signature itself. Offline signatures were obtained by either scanning the signature image or snapping it with a camera. How clear and sharp the scanned image appears depends on the quality of the scanner as well as the document being scanned. Likewise, if the signature is snapped, the resolution of the camera affects the quality of the image obtained. Several scanners and cameras are available in the market. However, in both cases, no matter how good the devices are, noise is introduced.
An online signature is obtained from a digital tablet that records the movement of the pen on the tablet. Features obtained from the tablet include the coordinate position (in terms of x and y), pen pressure, time and so on. Some features depend on the hardware manufacturer and the type acquired. Several digital tablets are available in the market. For this research, a Genius Easypen i450x tablet was used. It has 1024 pressure levels and a size of 4 x 5.5 inches. The features extracted from the tablet for this research work include the coordinate location of each pixel (x and y) and the pressure at each pixel.

Grey Scale conversion and Resizing
Preprocessing is the stage that follows the acquisition of the signature image. Its purpose is to get rid of unwanted parts of the image.
For a black-and-white image, a light with spectral energy distribution $c(\lambda)$ can be represented by one number $I$ given by

$$I = k \int_{0}^{\infty} c(\lambda)\, s_{BW}(\lambda)\, d\lambda$$

where $s_{BW}(\lambda)$ is the spectral characteristic of the sensor used and $k$ is some scaling constant. The value $I$ is often referred to as the luminance, intensity, or gray level of a black-and-white image. Since $I$ represents power per unit area, it is always nonnegative and finite, that is, $0 \le I \le I_{max}$, where $I_{max}$ is the maximum $I$ possible. In image processing, $I$ is typically scaled such that it lies in some convenient arbitrary range, for example, $0 \le I \le 1$ or $0 \le I \le 255$. In these cases 0 corresponds to the darkest possible level and 1 or 255 corresponds to the brightest possible level. Because of this scaling, the specific radiometric or photometric units associated with $I$ become unimportant. A black-and-white image has, in a sense, only one color. Thus, it is sometimes called a monochrome image.
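A color image can be viewed as three monochrome images. For a color image, a light with spectral energy distribution $c(\lambda)$ is represented by three numbers which are called tristimulus values. One three-number set that is frequently used in practice is R, G, and B, representing the intensity of the red, green, and blue components. The tristimulus values R, G, and B are obtained by

$$R = k \int_{0}^{\infty} c(\lambda)\, s_{R}(\lambda)\, d\lambda, \qquad G = k \int_{0}^{\infty} c(\lambda)\, s_{G}(\lambda)\, d\lambda, \qquad B = k \int_{0}^{\infty} c(\lambda)\, s_{B}(\lambda)\, d\lambda$$

where $s_{R}(\lambda)$, $s_{G}(\lambda)$, and $s_{B}(\lambda)$ are the spectral characteristics of the red, green, and blue sensors (filters) respectively. Like the gray level $I$ in a monochrome image, R, G, and B are non-negative and finite.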
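As an illustration of how the three monochrome channels can be collapsed into a single grey level, the following is a minimal Python sketch (the prototype itself was written in Matlab). The weights 0.299, 0.587 and 0.114 are the commonly used ITU-R BT.601 luma coefficients; they are an assumption here, since the paper does not state which weighting was used, and the function names are hypothetical.

```python
# Assumed luma weights (ITU-R BT.601); the paper does not specify them.
WEIGHTS = (0.299, 0.587, 0.114)


def rgb_to_grey(pixel):
    """Weighted sum of the R, G and B values of one pixel."""
    r, g, b = pixel
    return WEIGHTS[0] * r + WEIGHTS[1] * g + WEIGHTS[2] * b


def image_to_grey(image):
    """Convert a 2-D list of (R, G, B) tuples into a 2-D list of grey levels."""
    return [[rgb_to_grey(p) for p in row] for row in image]


if __name__ == "__main__":
    tiny = [[(255, 255, 255), (0, 0, 0)],
            [(255, 0, 0), (0, 0, 255)]]
    for row in image_to_grey(tiny):
        print(row)
```

Because the grey level depends only on this weighted intensity, the colour of the pen used for signing no longer matters after this step.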
Resizing could either reduce or increase the size of the image, depending on the original image size. The new image size is 128 by 128 pixels. It is often necessary to re-sample a bitmap image, perhaps because it is to be resized, rotated, have its perspective corrected, be intentionally distorted, or have its shape rectified. Bicubic interpolation was used to resize the image. Bicubic interpolation solves for the value at a new point by analyzing the 16 data points surrounding the interpolation region (Shuai, 2006).
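The use of the 16 surrounding data points can be sketched as follows. This is a minimal pure-Python illustration, assuming the cubic convolution kernel with parameter a = -0.5 and clamping at the image border; neither choice is stated in the paper, and the function names are hypothetical.

```python
import math


def cubic_weight(t, a=-0.5):
    """Cubic convolution kernel (the value a = -0.5 is an assumption)."""
    t = abs(t)
    if t <= 1:
        return (a + 2) * t ** 3 - (a + 3) * t ** 2 + 1
    if t < 2:
        return a * t ** 3 - 5 * a * t ** 2 + 8 * a * t - 4 * a
    return 0.0


def bicubic_sample(img, y, x):
    """Interpolate img at (y, x) from the surrounding 4 x 4 = 16 pixels."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    for j in range(y0 - 1, y0 + 3):
        wy = cubic_weight(y - j)
        jj = min(max(j, 0), h - 1)          # clamp rows at the border
        for i in range(x0 - 1, x0 + 3):
            wx = cubic_weight(x - i)
            ii = min(max(i, 0), w - 1)      # clamp columns at the border
            total += wy * wx * img[jj][ii]
    return total


def resize_bicubic(img, new_h, new_w):
    """Resize a 2-D list of grey levels to new_h x new_w."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(new_h):
        y = (r + 0.5) * h / new_h - 0.5     # output pixel centre in input coordinates
        out.append([bicubic_sample(img, y, (c + 0.5) * w / new_w - 0.5)
                    for c in range(new_w)])
    return out
```

For the system described here, the target size would be `resize_bicubic(grey_image, 128, 128)`. Interpolated values can slightly overshoot the original range, so a real implementation would normally clip them back to the valid grey-level range.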

Gaussian Filter
The Gaussian kernel is named after Carl Friedrich Gauss (1777-1855), a German mathematician. The Gaussian kernel is defined in 2D as:

$$G(x, y) = \frac{1}{2\pi\sigma^{2}}\, e^{-\frac{x^{2} + y^{2}}{2\sigma^{2}}} \qquad (8)$$

where $x$ is the distance from the origin along the horizontal axis, $y$ is the distance from the origin along the vertical axis and $\sigma$ is the standard deviation. The value of $\sigma$ determines the width of the Gaussian kernel. In statistics, when the Gaussian probability density function is considered, $\sigma$ is called the standard deviation, and its square, $\sigma^{2}$, is the variance.
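A discrete version of the 2D Gaussian kernel can be generated by sampling the formula on an integer grid, as in the minimal Python sketch below. The kernel size, the renormalisation of the sampled weights so that they sum to one, and the function name are assumptions for illustration; the paper does not give the filter parameters that were used.

```python
import math


def gaussian_kernel(size, sigma):
    """Sample the 2-D Gaussian on a size x size grid (size assumed odd)."""
    half = size // 2
    kernel = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
               / (2.0 * math.pi * sigma * sigma)
               for x in range(-half, half + 1)]
              for y in range(-half, half + 1)]
    # Renormalise so the weights sum to 1 and filtering preserves brightness.
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


if __name__ == "__main__":
    for row in gaussian_kernel(3, 1.0):
        print(["%.4f" % v for v in row])
```

A larger standard deviation spreads the weight away from the centre, which gives stronger smoothing of scanner or camera noise at the cost of blurring thin strokes.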

Binary Image (Otsu Thresholding)
Otsu thresholding was used for binarization. According to Otsu (1979), a way of accomplishing good results is to set the threshold so as to make each cluster as tight as possible, thus minimizing their overlap.
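Let the pixels of a given picture be represented in $L$ gray levels $[1, 2, \ldots, L]$. The number of pixels at level $i$ is denoted by $n_i$ and the total number of pixels by $N = n_1 + n_2 + \cdots + n_L$. In order to simplify the discussion, the gray-level histogram is normalized and regarded as a probability distribution given by:

$$p_i = \frac{n_i}{N}, \qquad p_i \ge 0, \qquad \sum_{i=1}^{L} p_i = 1 \qquad (9)$$

Now suppose that the pixels are dichotomized into two classes $C_0$ and $C_1$ (background and objects, or vice versa) by a threshold at level $k$; $C_0$ denotes pixels with levels $[1, \ldots, k]$, and $C_1$ denotes pixels with levels $[k+1, \ldots, L]$. Then the probabilities of class occurrence and the class mean levels, respectively, are given by

$$\omega_0 = \sum_{i=1}^{k} p_i = \omega(k) \qquad (10)$$

$$\omega_1 = \sum_{i=k+1}^{L} p_i = 1 - \omega(k) \qquad (11)$$

$$\mu_0 = \sum_{i=1}^{k} \frac{i\, p_i}{\omega_0} = \frac{\mu(k)}{\omega(k)} \qquad (12)$$

$$\mu_1 = \sum_{i=k+1}^{L} \frac{i\, p_i}{\omega_1} = \frac{\mu_T - \mu(k)}{1 - \omega(k)} \qquad (13)$$

where

$$\omega(k) = \sum_{i=1}^{k} p_i \qquad (14)$$

$$\mu(k) = \sum_{i=1}^{k} i\, p_i \qquad (15)$$

are the cumulative quantities of the histogram up to level $k$, and $\mu_T = \mu(L) = \sum_{i=1}^{L} i\, p_i$ is the total mean level of the picture. The "goodness" of the threshold at level $k$ is evaluated with the discriminant criterion measures $\lambda = \sigma_B^2 / \sigma_W^2$, $\kappa = \sigma_T^2 / \sigma_W^2$ and $\eta = \sigma_B^2 / \sigma_T^2$, where $\sigma_W^2$, $\sigma_B^2$ and $\sigma_T^2$ are the within-class variance, the between-class variance and the total variance of levels, respectively. This standpoint is motivated by a conjecture that well thresholded classes would be separated in gray levels, and conversely, a threshold giving the best separation of classes in gray levels would be the best threshold.
The discriminant criteria maximizing $\lambda$, $\kappa$, and $\eta$, respectively, for $k$ are, however, equivalent to one another; e.g., $\kappa = \lambda + 1$ and $\eta = \lambda / (\lambda + 1)$ in terms of $\lambda$, because the following basic relation always holds:

$$\sigma_W^2 + \sigma_B^2 = \sigma_T^2 \qquad (25)$$

It is noticed that $\sigma_W^2$ and $\sigma_B^2$ are functions of the threshold level $k$, but $\sigma_T^2$ is independent of $k$. It is also noted that $\sigma_W^2$ is based on the second-order statistics (class variances), while $\sigma_B^2$ is based on the first-order statistics (class means). Therefore, $\eta$ is the simplest measure with respect to $k$. Thus $\eta$ is adopted as the criterion measure to evaluate the "goodness" (or separability) of the threshold at level $k$. The optimal threshold $k^*$ that maximizes $\eta$, or equivalently maximizes $\sigma_B^2$, is selected in the following sequential search by using the simple cumulative quantities (14) and (15), or explicitly using (10)-(13):

$$\eta(k) = \frac{\sigma_B^2(k)}{\sigma_T^2} \qquad (26)$$

$$\sigma_B^2(k) = \frac{[\mu_T\, \omega(k) - \mu(k)]^2}{\omega(k)\,[1 - \omega(k)]} \qquad (27)$$

and the optimal threshold $k^*$ is

$$\sigma_B^2(k^*) = \max_{1 \le k < L} \sigma_B^2(k) \qquad (28)$$

From the problem, the range of $k$ over which the maximum is sought can be restricted to

$$S^* = \{k : \omega_0\, \omega_1 = \omega(k)[1 - \omega(k)] > 0, \ \text{or} \ 0 < \omega(k) < 1\} \qquad (29)$$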
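The sequential search for the optimal threshold can be sketched in Python as follows. This is a minimal illustration rather than the code of the prototype: it assumes integer grey levels numbered from 0 to 255 (instead of 1 to L), updates the cumulative quantities of the histogram as k increases, and treats dark pixels as ink when binarizing; the function names are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the level k that maximises the between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    n = len(pixels)
    prob = [count / n for count in hist]                 # normalised histogram p_i
    mu_total = sum(i * p for i, p in enumerate(prob))    # total mean level
    best_k, best_var = 0, -1.0
    omega = 0.0   # cumulative probability of the class at or below k
    mu = 0.0      # cumulative first-order moment up to k
    for k in range(levels - 1):
        omega += prob[k]
        mu += k * prob[k]
        if omega <= 0.0 or omega >= 1.0:
            continue                                     # restrict the search to 0 < omega < 1
        var_between = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
        if var_between > best_var:
            best_var, best_k = var_between, k
    return best_k


def binarize(grey_image, threshold):
    """1 = ink (dark pixel), 0 = paper; this polarity is an assumption."""
    return [[1 if v <= threshold else 0 for v in row] for row in grey_image]
```

Only the histogram is needed for the search, so the cost of finding the threshold does not depend on how the pixels are arranged in the image.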

Thinning (Stentiford Thinning Algorithm)
Thinning is particularly useful when the thickness of the signature stroke is high, which usually depends on the pen tip. Thinning is a basic method for skeletonization: it is a technique which extracts the skeleton of an object as a result. The steps of the Stentiford thinning algorithm are as follows:
a. Find a pixel location where the pixels in the image match those in template $T_1$. This template matches pixels along the top of the object, moving from left to right and from top to bottom.
b. If the central pixel is not an endpoint, and has connectivity number = 1, then mark this pixel for deletion.
c. Repeat steps (a) and (b) for all pixel locations matching $T_1$.
d. Repeat steps (a) to (c) for the rest of the templates: $T_2$, $T_3$, and $T_4$. $T_2$ will match pixels on the left side of the object, moving from bottom to top and from left to right. $T_3$ will select pixels along the bottom of the image and move from right to left and from bottom to top. $T_4$ locates pixels on the right side of the object, moving from top to bottom and right to left.
e. Set to white the pixels marked for deletion.
Being iterative, the procedure is repeated until no further pixels are deleted.

Endpoint pixel: A pixel is considered an endpoint if it is connected to just one other pixel. That is, if a black pixel has only one black neighbor out of the eight possible neighbors.

Connectivity number:
It is a measure of how many objects are connected with a particular pixel.

$$C_n = \sum_{k \in S} \left( N_k - N_k \cdot N_{k+1} \cdot N_{k+2} \right)$$

where: $S = \{1, 3, 5, 7\}$, $N_k$ is the color of the eight neighbors of the pixel analyzed, $N_0$ is the center pixel, $N_1$ is the color value of the pixel to the right of the central pixel and the rest are numbered in counter clockwise order around the center (with $N_k = N_{k-8}$ for $k > 8$).
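The thinning steps can be sketched in Python as below. This is a hedged illustration under stated assumptions, not the prototype's code: the image is a 2-D list with 1 for black (stroke) and 0 for white; in the connectivity number, N_k is taken as 1 for a white neighbour and 0 for a black one; each template is reduced to "one white neighbour and the opposite black neighbour"; and marked pixels are deleted after each template pass rather than after all four, a variation chosen here so that two-pixel-thick strokes are not erased. All names are hypothetical.

```python
# Neighbour offsets (row, col) for N1..N8: N1 is to the right of the centre,
# the rest follow counter clockwise.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]

# Simplified templates: (offset that must be white, offset that must be black).
TEMPLATES = [
    ((-1, 0), (1, 0)),    # T1: top edge of the object
    ((0, -1), (0, 1)),    # T2: left edge
    ((1, 0), (-1, 0)),    # T3: bottom edge
    ((0, 1), (0, -1)),    # T4: right edge
]


def pixel(img, r, c):
    """Pixel value, treating everything outside the image as white."""
    if 0 <= r < len(img) and 0 <= c < len(img[0]):
        return img[r][c]
    return 0


def neighbours(img, r, c):
    return [pixel(img, r + dr, c + dc) for dr, dc in OFFSETS]


def is_endpoint(img, r, c):
    """A black pixel with exactly one black neighbour."""
    return sum(neighbours(img, r, c)) == 1


def connectivity_number(img, r, c):
    n = [1 - v for v in neighbours(img, r, c)]   # 1 = white neighbour (assumption)
    total = 0
    for k in (0, 2, 4, 6):                       # k = 1, 3, 5, 7 in the formula
        total += n[k] - n[k] * n[(k + 1) % 8] * n[(k + 2) % 8]
    return total


def stentiford_thin(image):
    img = [row[:] for row in image]
    changed = True
    while changed:                               # iterate until nothing is deleted
        changed = False
        for (wr, wc), (br, bc) in TEMPLATES:
            marked = []
            for r in range(len(img)):
                for c in range(len(img[0])):
                    if (img[r][c] == 1
                            and pixel(img, r + wr, c + wc) == 0
                            and pixel(img, r + br, c + bc) == 1
                            and not is_endpoint(img, r, c)
                            and connectivity_number(img, r, c) == 1):
                        marked.append((r, c))
            for r, c in marked:                  # set marked pixels to white
                img[r][c] = 0
            changed = changed or bool(marked)
    return img
```

The endpoint test keeps the ends of strokes from being eaten away, and the connectivity test prevents the deletion of a pixel whose removal would split a stroke in two.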

Cropping
After the signature image has been thinned, it is cropped. Cropping is performed in order to remove unwanted pixels around the signature image. The algorithm used in cropping is stated below:
a. Scan the pixels from left to right, starting from the top.
b. Record the first pixel that is black and end the loop in (a).
c. Repeat (a) from left to right, starting from the bottom.
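The scans above locate the first and last rows that contain a black pixel. A minimal Python sketch of the cropping step is given below; it assumes a binary image with 1 for black and 0 for white, and it also trims the empty columns on the left and right in the same way, which is an assumption since the listed steps only describe the scans from the top and from the bottom. The function name is hypothetical.

```python
def crop_to_signature(img):
    """Return the smallest sub-image that contains every black (1) pixel."""
    rows = [r for r, row in enumerate(img) if any(row)]
    if not rows:
        return []                       # blank image: nothing to keep
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    top, bottom = rows[0], rows[-1]     # first black row from the top / from the bottom
    left, right = cols[0], cols[-1]     # assumed analogous scans over the columns
    return [row[left:right + 1] for row in img[top:bottom + 1]]


if __name__ == "__main__":
    image = [[0, 0, 0, 0, 0],
             [0, 0, 1, 0, 0],
             [0, 1, 1, 1, 0],
             [0, 0, 0, 0, 0]]
    for row in crop_to_signature(image):
        print(row)
```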

Feature Extraction
This involves gathering information that is peculiar to a particular signature. For the extraction of features, Principal Component Analysis (PCA) was used. PCA was first introduced by Pearson (1901), and developed independently by Hotelling (1933). The steps in the calculation of PCA according to (Linsay, 2002) are as follows:
a. Normalization of the dataset. Here the mean of the dataset is subtracted from the dataset.

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

where $\bar{X}$ is used to indicate the mean of the dataset $X$.
Hence, the normalized dataset will be

$$X' = X - \bar{X} \qquad (32)$$

where $X'$ is the normalized dataset and $X$ is the initial dataset (not normalized).
b. The next step is to calculate the covariance matrix. The covariance of two datasets $X$ and $Y$ is obtained with the formula

$$cov(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

where $\bar{X}$ is the mean of $X$, $\bar{Y}$ is the mean of $Y$ and $n$ is the number of elements.
The covariance matrix is obtained by calculating the covariance of every pair of datasets and arranging the results in a matrix, shown below for two datasets.

$$C = \begin{pmatrix} cov(X, X) & cov(X, Y) \\ cov(Y, X) & cov(Y, Y) \end{pmatrix}$$

c. According to Lay (2012), the eigenvalues and eigenvectors of the covariance matrix are computed using:

$$|C - \lambda I| = 0$$

where $C$ is the covariance matrix, $\lambda$ is the eigenvalue and $I$ is the identity matrix. Finally, for each $\lambda$, $(C - \lambda I)v = 0$ is solved for the eigenvector $v$. After getting the eigenvector for each eigenvalue, the eigenvector with the highest eigenvalue is the principal component. In this research work, all the eigenvectors were considered. PCA is a good technique for pattern analysis and for dimension reduction of a large number of interrelated variables (Jolliffe, 2002; Smith, 2002). The dataset includes the following features: the coordinates of each pixel (x and y), the pressure and the time.
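The three PCA steps can be sketched in pure Python as follows. This is a minimal illustration, not the prototype's Matlab code: each row of `data` is one sample and each column one feature, the covariance uses the n - 1 denominator, and the eigenvalues and eigenvectors of the symmetric covariance matrix are found with the Jacobi rotation method, which is a choice made here for a self-contained example. All function names are hypothetical.

```python
import math


def covariance_matrix(data):
    """Mean-centre the data and return (covariance matrix, centred data)."""
    n, m = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(m)]
           for a in range(m)]
    return cov, centred


def jacobi_eigen(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the off-diagonal element a[p][q].
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):      # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):      # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):      # accumulate the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def pca(data):
    """Return eigenvalues (descending), their eigenvectors and the projected data."""
    cov, centred = covariance_matrix(data)
    values, vectors = jacobi_eigen(cov)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    eigenvalues = [values[i] for i in order]
    eigenvectors = [[vectors[r][i] for r in range(len(values))] for i in order]
    scores = [[sum(x * e for x, e in zip(row, vec)) for vec in eigenvectors]
              for row in centred]
    return eigenvalues, eigenvectors, scores
```

In this sketch the projected values in `scores` play the role of the stored template. Keeping all the eigenvectors, as was done in this work, retains every component, whereas dropping the components with the smallest eigenvalues would reduce the dimension.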

Matching (Verification)
The Manhattan distance computes the sum of the differences in each dimension of two vectors in an n-dimensional vector space. According to Ismail (2010), it is the sum of the absolute differences of their corresponding components. Manhattan distance is also called the $L_1$ distance. If $u = (u_1, u_2, \ldots, u_n)$ and $v = (v_1, v_2, \ldots, v_n)$ are two vectors, then

$$d(u, v) = \sum_{i=1}^{n} |u_i - v_i|$$

The templates are saved as the user enrolls his/her signature. When a user wants to verify his/her signature and selects his/her name, the distance is computed against the template of the selected name. If no specific user is selected, the system scans through all the available users for the best match that falls below the threshold.
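The matching rule can be sketched in Python as follows. This is a minimal illustration in which templates are plain lists of numbers stored in a dictionary keyed by user name; the storage format, the function names and the choice to accept a distance equal to the threshold are assumptions, not details taken from the prototype.

```python
def manhattan(u, v):
    """Sum of the absolute differences of corresponding components."""
    if len(u) != len(v):
        raise ValueError("templates must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


def verify(test, template, threshold):
    """Accept the claimed identity unless the distance exceeds the threshold."""
    return manhattan(test, template) <= threshold


def best_match(test, templates, threshold):
    """Scan all enrolled users; return (name, distance) of the closest accepted one."""
    best_name, best_dist = None, None
    for name, template in templates.items():
        d = manhattan(test, template)
        if d <= threshold and (best_dist is None or d < best_dist):
            best_name, best_dist = name, d
    return best_name, best_dist


if __name__ == "__main__":
    enrolled = {"user_a": [1.0, 2.0, 3.0], "user_b": [4.0, 4.0, 4.0]}
    print(best_match([1.0, 2.5, 3.0], enrolled, threshold=1.0))
```

Lowering the threshold makes the system stricter, which tends to reduce false acceptances at the cost of more false rejections.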

3.10 System Prototyping
Matlab, which is a simple and useful high-level language for matrix manipulation, was used for prototyping the system. The homepage is made up of a menu bar, a type selection popup menu and some buttons. The menu bar includes the File menu, the Edit menu and the Help menu. The File menu has the registration submenu for registering users, the verification submenu for verifying signatures and the exit submenu for exiting the application. The Edit menu has the settings submenu for displaying the basic settings interface (window). The new department and new admission year submenus create a new department and a new admission year where applicable. The delete department, delete admission year and delete name (template) submenus delete a department, an admission year and a signature registration respectively.
The offline interface is for enrolling offline signatures (Arif et al., 2010). The necessary details are entered into the personal information panel. The load button changes to load more once an image has been loaded. The ok and cancel buttons close the interface with and without saving the information respectively.
The preprocessing stages for a typical image scanned into the system are displayed in Figure 3.

Grey Image and Resizing
The image is converted to grey after it has been loaded into the system. This is to ensure that the colour of the pen used in signing is irrelevant. After this, the image is resized to 128 by 128 pixels to ensure that the signature images are of the same size prior to the other preprocessing steps (Martin, 2000; Nib, 1986). This is depicted in Figure 4.

Binary Image
The grey image is further converted to black and white (binary image) as shown in Figure 5. This turns the value of each pixel into 1 or 0. It also helps us identify the path that the pen followed, and hence the pixels that should be extracted.

Thinning
In Figure 6, the signature image was thinned. This is to reduce the image into a more compact representation (i.e. reduce the strokes to a width of one pixel).

Cropping
In Figure 7, the images are cropped to fit the preprocessed signature. This helps to get rid of the white space around the signature that is not in any way connected to the actual signature itself. The online interface is for enrolling online signatures. The necessary details are entered into the personal information panel, and the user signs on a pen tablet. The ok and cancel buttons close the interface with and without saving the information respectively.

Feature Extraction
For the extraction of features from the signature (both online and offline), PCA was used. PCA is a good statistical tool for pattern analysis in data. Figure 8 shows the data gathered as a user signed on the tablet. Online test signatures were obtained from the SVC2004 database. Signatures were also obtained from a Genius i459X graphics tablet. A total of 50 users were enrolled and 600 test signatures were used for testing. 300 of the 600 test signatures were skilled forgeries and the remaining 300 were genuine. A forged signature was accepted and 6 genuine signatures were rejected. This gives a FAR of 2% and an FRR of 3.3%. The threshold was kept at 0.5. Applying the Peak and Curve Comparison technique to the same test signatures used for the proposed method, an FRR of 9.67% and a FAR of 9.33% were obtained (for offline). For online signatures, applying the Extreme Points Warping technique gave a FAR of 4.67% and an FRR of 7.33%.
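For reference, the two error rates are conventionally computed as the percentage of forgeries that were accepted (FAR) and the percentage of genuine signatures that were rejected (FRR). The short Python sketch below shows the arithmetic; the counts in the usage example are purely illustrative and are not taken from the experiment reported here, and the function name is hypothetical.

```python
def error_rates(false_accepts, forgeries, false_rejects, genuines):
    """Return (FAR, FRR) as percentages."""
    far = 100.0 * false_accepts / forgeries    # accepted forgeries / all forgeries
    frr = 100.0 * false_rejects / genuines     # rejected genuines / all genuines
    return far, frr


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    far, frr = error_rates(false_accepts=3, forgeries=200,
                           false_rejects=10, genuines=200)
    print("FAR = %.2f%%, FRR = %.2f%%" % (far, frr))
```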

4.0 CONCLUSION
This research work was based on signature identification (both online and offline). The signatures were acquired manually from a digital tablet (online) and also scanned into the system (offline). Signatures were also obtained from an online database of signatures (SVC2004). Offline signatures undergo preprocessing before features are extracted from them. The preprocessing steps involve converting the signature image to grey and resizing it using bicubic interpolation. Furthermore, the signature image is converted into binary using the Otsu threshold. The image is then thinned using the Stentiford thinning algorithm for a compact representation. It is then cropped to rid the image of the unwanted white space surrounding it. After preprocessing, features were extracted from the signature image (coordinate of each pixel, pressure and time). For both online and offline signatures, Principal Component Analysis was used to extract features. The distance between the features extracted from two signatures is used as a basis for authenticity. Manhattan distance (also known as City Block or L1) was used to compute the distance between two signatures. Proper preprocessing helped in getting a good result for offline signatures. A good result was also obtained from the online signatures.
For future work, more dynamic properties such as pen (stroke) angle can be considered to improve the system's performance. Also, use of this method in combination with other biometric methods (multimodal biometric) can be examined in the future.