AN EFFICIENT FUZZY NEURAL NETWORK TRAINING MODEL FOR SUPERVISED PATTERN CLASSIFICATION SYSTEM

Among the existing NN architectures, the Multilayer Feedforward Neural Network (MFNN) with a single hidden layer has been thoroughly scrutinized as one of the best for solving nonlinear classification problems. However, the MFNN training phase consumes considerable time for very large training datasets. In order to reduce the training time, a simple and fast training algorithm called the Exponential Adaptive Skipping Training (EAST) algorithm was presented, which improves the training speed by significantly reducing the total number of training input samples consumed by the MFNN at every single epoch. Although EAST trains faster, its accuracy rate suffers due to the high skipping factor. In order to improve the accuracy rate of the training algorithm, a hybrid system has been suggested in which the neural network is trained with fuzzified data. In this paper, a z-Score Fuzzy Exponential Adaptive Skipping Training (z-FEAST) algorithm is proposed, based on the fuzzification of EAST. The evaluation of the proposed z-FEAST algorithm is demonstrated on the benchmark datasets Iris, Waveform, Heart Disease and Breast Cancer for different learning rates. Simulation studies show that the z-FEAST training algorithm improves the accuracy rate.


INTRODUCTION
Due to its implicit capability of approximating any nonlinear function, the Multilayer Feedforward Neural Network (MFNN) with a single hidden layer has been thoroughly scrutinized as one of the best architectures for solving nonlinear classification problems (Mehra and Wah 1992; Hornik et al 1989). For training this network, the Back Propagation learning algorithm has been practiced (Rumelhart and McClelland 1986; Saman and Bryan 2011). Among the factors affecting training performance, training speed is considered very important, and it highly depends on the dimensionality of the training dataset. In general, training an MFNN with a larger training dataset generalizes the network well, but a larger training dataset needs a lengthy training time [3], which influences the training speed. In order to improve the training speed, the EAST algorithm was proposed [6]. It adaptively skips training input samples, which diminishes the total number of training input samples exponentially and in turn reduces the overall training time, thereby speeding up the training process. But the accuracy rate is greatly affected. Since Fuzzy Logic (FL) enhances the generalization capability of an NN, and neuro-fuzzy hybrid systems are universal approximators (Kosko 1994), a new neuro-fuzzy hybrid system with a z-score function is put forward for improving the accuracy rate of EAST.

RELATED WORKS
Typically, by incorporating the advantages of both the neural network and the fuzzy system, a neuro-fuzzy hybrid system is more impressive than either the neural network or the fuzzy system alone. Anandakumar et al [1] proposed a classification model using a Modified Levenberg-Marquardt learning algorithm that improves the accuracy and also consumes less time for convergence; initially, the statistical ANOVA ranking technique is applied to find the higher-ranked dataset. In order to analyze public transportation system service quality, an ANN model is adapted [3]. Kulkarni and Shinde [5] proposed a neuro-fuzzy classification model for supervised data classification. Using a fuzzification method, a membership value is calculated for each attribute value of the given class in the membership matrix. For ANN training, this matrix is fed as input to the model, which produces the corresponding membership value of each pattern to the target classes. At the end of each iteration, the target class of each pattern is predicted using a defuzzification method. The MFNN has been trained by the Levenberg-Marquardt (LM) algorithm [9], CAST [8], EAST [6] and LAST [7] to develop a fast ANN model for nonlinear pattern classification. Patricia Melin et al [10] applied a competitive neural network trained with the learning vector quantization algorithm for electrocardiogram signal classification. Taskin Kavzoglu et al [13] described ways of representing training datasets to improve the performance of classification methods; the data representation relates the training dataset size and quality, quality analysis is used to identify outliers in the training dataset, and after some refinements the representation data is formed by an iterative training data selection process. Quang Hung Do et al [14] implemented a neuro-fuzzy approach for solving a multiclass classification problem to predict students' academic performance.

PROPOSED z-FEAST METHODOLOGY

Overview of z-FEAST Architecture
The overall architecture of the z-FEAST classifier is represented diagrammatically in Figure 1. This neuro-fuzzy classifier for pattern classification consists of the blocks specified in the figure and comprises three steps: fuzzification, ANN training with the backpropagation algorithm, and defuzzification [5].
Assume that the network contains n input nodes in the input layer, p hidden nodes in the hidden layer and m output nodes in the output layer. Since the network is highly interconnected, the nodes in each layer are connected to all the nodes in the next layer. Let P represent the number of input patterns in the training dataset. The input matrix, X, of size P × n, is presented to the network; the number of nodes in the input layer is equal to the number of columns of X. Each row of X is a real-valued vector x_i ∈ R^(n+1), where 1 ≤ i ≤ P. In the fuzzification process, the given training dataset is fed as input and the z-score function is used as the membership function. The output of this fuzzification process is a membership matrix of size S × D × C, where S is the number of input samples in the training dataset, D is the number of features (attributes) and C is the number of target classes. This membership matrix, i.e. the fuzzified data, is then fed as input to the MFNN. The summed real-valued vector generated from the hidden layer is represented as z ∈ R^(p+1). The estimated output vector generated by the network is denoted as y ∈ R^m and the corresponding target vector as t ∈ R^m. Let it denote the iteration number. The network output, y, is then given as input to the defuzzification process; the MAX defuzzification method is applied, assigning each pattern to the class with the highest membership. The defuzzified vector is compared with the target vector to calculate the error rate.
Let fN(x) be the activation function used in the hidden layer and fL(x) be the activation function used in the output layer. Let V be the n × p weight matrix whose entry vij is the input-to-hidden weight coefficient for the link from input node i to hidden node j, and let v0j be the bias weight to hidden node j. Likewise, let W be the p × m weight matrix whose entry wjk is the hidden-to-output weight coefficient for the link from hidden node j to output node k, and let w0k be the bias weight to output node k.
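As a sketch of the fuzzification step, the S × D × C membership matrix can be built from per-class z-scores. Only the z-score centre (mean) and width (standard deviation) are fixed by the text, so the Gaussian form of the membership value used below is an assumption, and all names are illustrative:

```python
import numpy as np

def zscore_membership(X, y, n_classes):
    """Fuzzify a crisp training set into an S x D x C membership matrix.

    X : (S, D) array of crisp feature values
    y : (S,) array of integer class labels in [0, n_classes)
    Returns g, where g[s, d, c] is the membership of feature d of
    sample s to class c, derived from the per-class z-score.
    """
    S, D = X.shape
    g = np.zeros((S, D, n_classes))
    for c in range(n_classes):
        Xc = X[y == c]                      # samples belonging to class c
        centre = Xc.mean(axis=0)            # membership centre: per-feature mean
        width = Xc.std(axis=0) + 1e-12     # membership width: per-feature std dev
        z = (X - centre) / width            # z-score of every sample w.r.t. class c
        g[:, :, c] = np.exp(-0.5 * z ** 2)  # Gaussian membership (assumed form)
    return g
```

The per-class statistics mean that a sample scores a high membership for the class whose feature distribution it resembles most.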

Proposed z-FEAST Algorithm
The working principle of the z-FEAST algorithm, as incorporated in the BPN algorithm, is summarized below:

Step 1. Weight Initialization: Initialize the weights to small random values.

Step 2. Furnish the input sample: Present to the input layer an input sample vector xk having desired output vector yk.

Step 3. Fuzzification Process: Convert the crisp input vector xk to fuzzy values using the z-score method. The z-score is modeled mathematically as

z = (x − C) / σ (1)

The membership function's centre, C, for the d th feature is given by

Cd = (1/S) Σ(j=1..S) xjd (2)

where xjd is the d th feature of sample j. The membership function's width, σ, is given by

σd = sqrt((1/S) Σ(j=1..S) (xjd − Cd)²) (3)

where x1d, x2d, ..., xSd are the d th feature values of the S patterns and Cd denotes the mean value of the d th feature given in Equation (2). The membership matrix, fx, is generated using Equation (1). In this matrix, gs,c(d) represents the membership value of the d th feature of the s th pattern to the c th class.

Step 4. Forward Phase: Starting from the first hidden layer and propagating towards the output layer:
a. Calculate the activation values for the hidden layer:
i. Estimate the net input value: z_inj = v0j + Σi xi vij
ii. Estimate the actual output: zj = fN(z_inj) (4)
b. Calculate the activation values for the output layer:
i. Estimate the net input value: y_ink = w0k + Σj zj wjk (5)
ii. Estimate the actual output: yk = fL(y_ink) (6)

Step 5. Output errors: Calculate the error terms at the output layer as

δk = (tk − yk) · fL′(y_ink) (7)

Differentiating the activation function in Equation (6) gives

fL′(y_ink) = fL(y_ink) · (1 − fL(y_ink)) = yk · (1 − yk) (8)

Substituting the resultant value of Equation (8) in (7),

δk = yk · (1 − yk) · (tk − yk) (9)

Step 6. Backward Phase: Propagate the error backward to the input layer through the hidden layer using the error term

δj = fN′(z_inj) · Σk δk wjk (10)

Differentiating the activation function in Equation (4) gives

fN′(z_inj) = zj · (1 − zj) (11)

Substituting the resultant value of Equation (11) in (10),

δj = zj · (1 − zj) · Σk δk wjk (12)

Step 7. Weight Amendment: Update the weights using the Delta-Learning Rule with learning rate α:
a. Weight amendment for the output unit: wjk(new) = wjk(old) + α δk zj; w0k(new) = w0k(old) + α δk
b. Weight amendment for the hidden unit: vij(new) = vij(old) + α δj xi; v0j(new) = v0j(old) + α δj

Step 8. Defuzzification: The fuzzy-to-crisp conversion of the output variable is done using the centroid method,

x* = Σ μ(x) · x / Σ μ(x)

Step 9. Class Assignment: Assign each pattern to the class with the highest membership,

class(s) = arg maxc Os,c

where Os,c is the output of the s th pattern to the c th class.

Step 10. Repeat Steps 2-9 until the halting criterion is satisfied, which may be chosen as the Root Mean Square Error (RMSE), elapsed epochs or desired accuracy.
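The forward, backward and weight-amendment steps (Steps 4-7) can be sketched as a single training epoch, under the common assumption that both fN and fL are logistic sigmoids (so f′ = f · (1 − f)); variable names are illustrative and the EAST sample-skipping logic is omitted:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_epoch(Xf, T, v, v0, w, w0, lr):
    """One BPN epoch over fuzzified inputs Xf (S x n) with targets T (S x m).

    v : (n, p) input-to-hidden weights, v0 : (p,) hidden biases
    w : (p, m) hidden-to-output weights, w0 : (m,) output biases
    Weights are updated in place; returns the RMSE over the epoch.
    """
    sq_err = 0.0
    for x, t in zip(Xf, T):
        # Step 4: forward phase
        z_in = v0 + x @ v                     # net input to hidden layer
        z = sigmoid(z_in)                     # hidden activations fN
        y_in = w0 + z @ w                     # net input to output layer
        y = sigmoid(y_in)                     # network outputs fL
        # Step 5: output error terms, delta_k = y(1-y)(t-y)
        delta_k = y * (1 - y) * (t - y)
        # Step 6: backward phase, delta_j = z(1-z) * sum_k delta_k w_jk
        delta_j = z * (1 - z) * (w @ delta_k)
        # Step 7: delta-rule weight amendment
        w += lr * np.outer(z, delta_k)
        w0 += lr * delta_k
        v += lr * np.outer(x, delta_j)
        v0 += lr * delta_j
        sq_err += np.sum((t - y) ** 2)
    return np.sqrt(sq_err / (len(Xf) * T.shape[1]))
```

A MAX defuzzification over the m output activations (per Step 9) then yields the predicted class for each pattern.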

Working Flow of z-FEAST Algorithm
The working flow of the proposed strategy is represented diagrammatically in the following figure.

EXPERIMENTAL SETUP AND RESULT

Experimental Layout
A 3-layer feedforward neural network is adopted for the simulation of all the training algorithms, with the training architecture and training parameters mentioned in Table 1. The simulation of each training algorithm is repeated for two different learning rates, 1e-4 (0.0001) and 1e-3 (0.001). For training the Heart dataset, 13, 5 and 1 neurons are used in the input, hidden and output layers respectively. For the Breast Cancer dataset, an NN architecture with 31, 15 and 1 neurons in the input, hidden and output layers is used. An architecture with 4, 5 and 1 neurons is used for the Iris dataset, and for the Waveform dataset 21, 10 and 1 neurons are used in the input, hidden and output layers respectively.
According to the idea of the Nguyen-Widrow algorithm (Nguyen and Widrow 1990), the NN weight coefficients are initialized with random values within the range -0.5 to +0.5 for faster learning.
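A minimal sketch of this initialization, using the simple uniform range stated in the paper rather than the full Nguyen-Widrow scaling (function and variable names are illustrative):

```python
import numpy as np

def init_weights(n, p, m, seed=None):
    """Initialize all weights uniformly at random in [-0.5, +0.5].

    n, p, m : number of input, hidden and output nodes.
    Returns input-to-hidden weights v, hidden biases v0,
    hidden-to-output weights w, and output biases w0.
    """
    rng = np.random.default_rng(seed)
    v = rng.uniform(-0.5, 0.5, size=(n, p))   # input-to-hidden weights
    v0 = rng.uniform(-0.5, 0.5, size=p)       # hidden-layer bias weights
    w = rng.uniform(-0.5, 0.5, size=(p, m))   # hidden-to-output weights
    w0 = rng.uniform(-0.5, 0.5, size=m)       # output-layer bias weights
    return v, v0, w, w0
```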

Evaluation Method
The dataset is randomly split into five equal-sized disjoint folds. Among these five folds, a single fold is retained as the validation data for testing the network, and the remaining four folds are used as training data. The validation process is then repeated five times, with each of the five folds used exactly once as the validation data. The results from the five folds are then averaged to produce a final result. The advantage of this validation method is that all observations are used for both training and validation, each observation is used for validation exactly once, and over-fitting is avoided (Peterson et al 1995).
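The fold construction described above can be sketched as follows (an illustrative helper; when the sample count is not divisible by five, the folds differ in size by at most one):

```python
import numpy as np

def five_fold_indices(n_samples, seed=0):
    """Split sample indices into five disjoint folds of (near-)equal size.

    Yields five (train_idx, val_idx) pairs, so that each fold serves
    exactly once as the validation set.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)          # random shuffle of indices
    folds = np.array_split(idx, 5)            # five disjoint folds
    for k in range(5):
        val = folds[k]                        # fold k validates
        train = np.concatenate([folds[j] for j in range(5) if j != k])
        yield train, val
```

Averaging the per-fold accuracies over the five runs gives the final result reported for each algorithm.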
The performance measures considered to evaluate a training algorithm are the training time and the classification accuracy. A good training algorithm cuts down the training time while accomplishing better accuracy, which is proved in our proposed work. The classification accuracy is calculated using the following formula:

Accuracy (%) = (Number of correctly classified samples / Total number of samples) × 100
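A minimal sketch of the accuracy measure (names are illustrative):

```python
import numpy as np

def classification_accuracy(predicted, actual):
    """Accuracy (%) = correctly classified samples / total samples x 100."""
    predicted = np.asarray(predicted)
    actual = np.asarray(actual)
    return 100.0 * np.mean(predicted == actual)
```

For example, `classification_accuracy([0, 1, 1, 0], [0, 1, 0, 0])` evaluates to 75.0, since three of the four predicted labels match.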

Dataset Description
The performance of the proposed algorithm is assessed on benchmark two-class and multi-class classification datasets. The real-world benchmark datasets utilized for the two-class classification problem are the Heart and Breast Cancer datasets, and for the multiclass classification problem the Iris and Waveform datasets. The afore-mentioned datasets were fetched from the UCIMLR (University of California at Irvine Machine Learning Repository) (Asuncion and Newman 2007).
The specification of the benchmark datasets utilized for training in the research is summarized in Table 2.

Multiclass Problems

Iris Dataset
In the Iris dataset, the number of iris flower samples is 150, gathered equally from three different flower varieties. The varieties, Iris Setosa, Iris Versicolour and Iris Virginica, are identified using the width and length of the iris sepal and the width and length of the iris petal. Among these varieties, Iris Setosa is easier to separate from the other two, while Iris Virginica and Iris Versicolour partially overlap and are harder to distinguish.

Waveform Dataset
In the Waveform database generator dataset, the total number of wave samples is 5000, with 21 attributes, equally divided into three wave classes (Asuncion and Newman 2007). These samples are generated from combinations of 2 of 3 "base" waves.

Heart Dataset
In the Statlog Heart disease database, samples with 13 attributes are collected from 270 patients. Among these samples, the number of samples with heart disease "absent" is 150 and with heart disease "present" is 120.

Breast Cancer Dataset
In the Wisconsin Breast Cancer Diagnosis Dataset, the samples are collected from 569 patients, each described by 32 features. Among these samples, 357 are diagnosed as benign and 212 as malignant. Tables 3 to 10 show the experimental results of the EAST, FEAST and z-FEAST algorithms observed at each step across five repeats of fivefold cross validation using two different learning rates, 1e-4 and 1e-3. From Tables 3 to 10, the EAST algorithm yields improved computational training speed, in terms of both the total number of trained input samples and the total training time, over FEAST and z-FEAST. However, when the skipping factor goes higher, the accuracy of the EAST system is highly affected, whereas z-FEAST improves the accuracy rate of the system.

RESULT ANALYSIS

Training Samples Comparison
The comparison results of the total number of input samples consumed for training by EAST, FEAST and z-FEAST with the learning rate of 1e-4 and 1e-3 are shown in Fig.3-6.
It is assured from Figure 3 that, under the learning rate of 1e-4, the total number of training samples consumed by the EAST algorithm for training is reduced, relative to the FEAST and z-FEAST algorithms, by an average of nearly 17% and 6% respectively for the Iris dataset, 69% and 8% for the Waveform dataset, 78% and 9% for the Heart dataset, and 87% and 8% for the Breast Cancer dataset. Similarly, it is assured from Figure 4 that, under the learning rate of 1e-3, the total number of training samples consumed by EAST is reduced, relative to FEAST and z-FEAST, by an average of nearly 15% and 3% for the Iris dataset, 79% and 11% for the Waveform dataset, 23% and 5% for the Heart dataset, and 73% and 9% for the Breast Cancer dataset respectively.

Training Time Comparison
It is concluded from Figure 5 that, for training the Iris dataset with the learning rate of 1e-4, the total training time consumed by the EAST algorithm is reduced by an average of 37% relative to the FEAST algorithm and 11% relative to the z-FEAST algorithm; for the Waveform dataset by 35% and 10%, for the Heart dataset by 41% and 9%, and for the Breast Cancer dataset by 16% and 6% respectively. Likewise, it is concluded from Figure 6 that, with the learning rate of 1e-3, the total training time consumed by EAST for the Iris dataset is reduced by an average of 29% relative to FEAST and 8% relative to z-FEAST; for the Waveform dataset by 45% and 10%, for the Heart dataset by 25% and 10%, and for the Breast Cancer dataset by 67% and 13% respectively.

CONCLUSION
Thus, the z-Score Fuzzy Exponential Adaptive Skipping Training (z-FEAST) algorithm has been systematically investigated in order to improve the accuracy rate of the EAST algorithm. It is further concluded that the proposed z-FEAST algorithm is much faster than the standard BPN, LAST, CAST, HOT and EAST algorithms, and that its accuracy rate is highly improved compared to the EAST algorithm. The proposed z-FEAST algorithm can be incorporated into any supervised algorithm used for training real-world supervised pattern classification systems.