IMPLEMENTATION OF SVM USING SEQUENTIAL MINIMAL OPTIMIZATION FOR POWER TRANSFORMER FAULT ANALYSIS USING DGA

Reliable operation of power transformers is necessary for effective transmission and distribution of power supply. During normal functioning of a power transformer, distinct types of faults occur due to insulation failure, oil aging products, overheating of windings, etc., affecting the continuity of power supply and leading to serious economic losses. To avoid interruptions in the power supply, various software fault diagnosis approaches have been developed to detect faults in the power transformer and eliminate their impacts. SVM and SVM-SMO are the software fault diagnosis techniques developed in this paper for the continuous monitoring and analysis of faults in the power transformer. The SVM algorithm is fast, conceptually simple and easy to implement, with good scaling properties for small numbers of training samples. For large training sets, however, SVM becomes complex, subtle and difficult to implement. In order to obtain better fault diagnosis on large training data, SVM is optimized with the SMO technique to achieve high interpretation accuracy in fault analysis of power transformers. The proposed methods use a Dissolved Gas-in-oil Analysis (DGA) data set obtained from 500 kV main transformers of Pingguo Substation in South China Electric Power Company. DGA is an important tool for the diagnosis and detection of incipient faults in power transformers. The Gas Chromatograph (GC), one of the traditional methods of DGA, is utilized to choose the most appropriate gas signatures dissolved in transformer oil to detect the types of faults in the transformer. The simulations are carried out in MATLAB on a PC with an Intel Core i3 processor running at 3 GHz and 2 GB of RAM. The results obtained by the proposed SVM and SVM-SMO are compared with the existing SVM classification techniques. The test results indicate that the SVM-SMO approach significantly improves the classification accuracy and computational time for power transformer fault classification.


INTRODUCTION
Power transformers are normally exposed to electrical, mechanical, thermal and environmental stresses that degrade their insulation quality. To avoid power failure, the development of periodic monitoring and accurate diagnosis systems for transformers is necessary. Detecting faults in the transformer at an early stage yields large savings in operation and maintenance costs and prevents premature breakdown or failure. There are several routine maintenance procedures for power transformers, such as Dissolved Gas Analysis (DGA), moisture analysis in transformer oil [1,2], the oil breakdown voltage test, the tan (delta) test, the resistivity test, the acidity test, the sludge test, the interfacial tension test and partial discharge (PD) acoustic emission sensing. Among these methods, DGA is an important tool for the early detection and diagnosis of incipient faults in transformers [3]. It is well known that overheating, arcing, partial discharge, winding circulating currents and continuous sparking are the main factors in deteriorating transformer condition. These phenomena develop certain dissolved gases in the insulation oil. The gases include hydrocarbons such as methane (CH4), ethane (C2H6), ethylene (C2H4) and acetylene (C2H2), and others such as hydrogen (H2), carbon dioxide (CO2), etc. The gases are extracted from the oil under high vacuum and analyzed by Gas Chromatograph (GC) to obtain individual gas concentrations. By interpreting the gas contents, developing faults in power transformers can be diagnosed. Many diagnostic criteria have been developed to establish relationships between the gases and the fault conditions. The gas concentrations, generation rates, specific gas ratios and the total combustible gas are important parameters for interpreting the results of DGA.
To facilitate the procedure of power transformer fault classification, algorithms like Modified Differential Evolution [3], Multiclass SVM [18], Fast algorithm [22], Self-adaptive RBF NN [25], Artificial Neural Network [20], K Nearest Neighbor [26], Support Vector Machine [19] and Radial Basis Function [27] have been presented in the literature. Presently, the conventional ratio methods, statistical schemes and Artificial Intelligence (AI) methods are the major interpreting approaches for power transformer fault analysis. The conventional ratio methods mainly include Rogers Ratios [4], the Duval Triangle [5], and the International Electrotechnical Commission (IEC) Ratios [6]. Since the boundaries of the conventional ratios are sharp, they are unable to provide an interpretation for every possible combination of ratio values [7]. Artificial Neural Network methods have also been used to explore the nonlinear and complex relation between the gas concentrations and the type of fault. Multilayer back-propagation (MLP) [8], [9], self-organizing map networks [10], the Adaptive Back-propagation learning algorithm [11] and Extension NN [12,13] are important classification algorithms used for power transformer fault classification. ANN training suffers from trapping in local minima; therefore evolutionary training algorithms have excelled in this field [7], [14], [15]. Other methods that have been investigated are wavelet decomposition [15], SVM [15], KNN [15] and fuzzy learning vector quantization networks [16], [12]. In this paper, SVM and SVM-SMO are considered for fault classification in power transformers. SVM has a well-developed theory and excellent performance analysis, and is effective in solving problems with small sample sizes; its ability to handle nonlinear, high-dimensional problems has attracted widespread attention for fault analysis of transformers. When an SVM classifier is used, the underlying optimization over the dataset is quadratic.
Training an SVM requires solving a very large quadratic programming (QP) optimization problem. SMO breaks this large QP problem into a series of smallest possible QP sub-problems [22]. Since these sub-problems are handled analytically by SMO, time-consuming numerical optimization is eliminated. The memory required by SMO is linear in the training set size, and SMO can determine the result up to 1000 times faster than conventional algorithms. The paper is organized as follows: Section 2 presents a brief mathematical description of fault analysis; Section 3 describes the proposed methodology of SVM, SMO and SVM-SMO for fault classification in power transformers; Section 4 provides the experimental results, data set, data preprocessing and feature extraction; Section 5 draws the conclusion.

PROBLEM FORMULATION
The main objective of the proposed problem is to separate the normal state and the fault state from the given input samples. The linear classifier separates the data while maximizing the distance between the separating hyperplane and the nearest data point of each class. The training data set is given by {(x_i, y_i), i = 1, ..., l}, where l is the number of training data, x_i is the i-th training vector and y_i is the class label (1 or -1) for x_i. A nonlinear function φ(·) is adopted to map the original input space R^n into an N-dimensional feature space.
The separating hyperplane is developed in this N-dimensional feature space. The classifier is then represented as f(x) = sign(w·φ(x) + b), where w is the weight vector and b a scalar bias. In order to obtain the optimal classifier, (1/2)||w||^2 + C Σ_i ξ_i should be minimized subject to the constraints y_i(w·φ(x_i) + b) ≥ 1 − ξ_i and ξ_i ≥ 0, i = 1, ..., l. The variables ξ_i are positive slack variables, necessary to allow for misclassification.
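As an illustrative sketch (not part of the original study), the soft-margin objective above can be evaluated numerically; the toy classifier and data below are hypothetical, with the slack recovered as ξ_i = max(0, 1 − y_i(w·x_i + b)):

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Primal soft-margin SVM objective: 0.5*||w||^2 + C * sum(slack).

    Slack xi_i = max(0, 1 - y_i*(w.x_i + b)) measures how far sample i
    falls on the wrong side of its margin (xi_i = 0 when correctly
    classified with functional margin >= 1)."""
    margins = y * (X @ w + b)
    slack = np.maximum(0.0, 1.0 - margins)
    return 0.5 * np.dot(w, w) + C * slack.sum()

# Toy two-class data: two points per class, separable by the sign of x1.
X = np.array([[1.0, 1.0], [2.0, 0.5], [-1.0, -1.0], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = np.array([1.0, 0.0]), 0.0   # hypothetical classifier, not optimized
print(soft_margin_objective(w, b, X, y, C=1.0))
```

For this w all four samples sit at or outside the margin, so every slack term is zero and only the regularization term 0.5*||w||^2 contributes.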

PROPOSED METHODOLOGY
The classification of faults is made effective by optimizing the kernel function of SVM with SMO. The performances of the classifiers are evaluated using classification accuracy and computational time. SVM-SMO classifier 1 separates the normal state from the fault state; SVM-SMO classifier 2 is then employed to distinguish thermal heating from discharge-type fault data; finally, SVM-SMO classifier 3 classifies the discharge data into low energy discharge and high energy discharge. Thus the four transformer fault states are identified by extracting features, training the classifiers and testing the trained networks. The overall methodology of the proposed work is given in Figure 1.

SVM classifier
Support Vector Machine (SVM) is a machine learning approach originating from statistical learning theory that is used to solve binary classification problems. SVM takes input data and assigns it to one of two possible output classes, making it a non-probabilistic binary linear classifier. SVM determines the support vectors from the training samples and maps the data into feature space using kernel functions, with each sample marked as belonging to the normal state or the fault state [23]. Kernel parameters are used in kernel density estimation to estimate the conditional expectation, and the cost factor decides the weighting between the positive and negative data. SVM-based fault diagnosis involves three steps: (i) extracting features containing fault information from the power transformer, (ii) training the SVM, and (iii) identifying the fault types of the power transformer using the trained classifier.
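The three steps can be sketched in Python; the gas values, labels and the use of scikit-learn's SVC with an RBF kernel are illustrative assumptions, not the paper's actual MATLAB implementation:

```python
import numpy as np
from sklearn.svm import SVC

# Step (i): extract features. Here, hypothetical gas concentrations (ppm) for
# H2, CH4, C2H2, C2H4, C2H6 are normalized per sample to relative fractions.
def extract_features(gas_ppm):
    gas_ppm = np.asarray(gas_ppm, dtype=float)
    return gas_ppm / gas_ppm.sum(axis=1, keepdims=True)

# Toy DGA-like data: label -1 = normal state, +1 = fault state (illustrative).
train_gases = [[100, 20, 5, 10, 15], [90, 25, 4, 12, 10],
               [10, 200, 80, 300, 50], [12, 180, 90, 280, 60]]
train_labels = [-1, -1, 1, 1]

# Step (ii): train the SVM (Gaussian RBF kernel, as adopted in the paper).
clf = SVC(kernel="rbf", C=100.0, gamma="scale")
clf.fit(extract_features(train_gases), train_labels)

# Step (iii): identify the state of an unseen sample with the trained classifier.
test_gases = [[95, 22, 5, 11, 12]]   # close to the normal-state samples
print(clf.predict(extract_features(test_gases)))
```

Because the test sample's gas profile nearly matches the normal-state training samples, the classifier assigns it the normal-state label.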

Figure 2 Structure of optimal separating hyper plane of SVM
Consider the classification of two classes of patterns that are linearly separable, i.e., a linear classifier can perfectly separate the classes. The SVM algorithm operates by finding the hyperplane that maximizes the distance between itself and the nearest training samples. The Optimal Separating Hyperplane (OSH) is defined as the linear classifier with the maximum margin for a given finite set of learning patterns. Figure 2 shows the optimal hyperplane, the linear classifier with the maximum margin for exact classification of the input patterns. The objective function is given in Eq. (5):

min (1/2)||w||^2 + C Σ_i ξ_i,   (5)

where C is the margin parameter and w the weight vector. The objective function obeys the principle of structural risk minimization in order to obtain the optimal solution [25]. Following the Lagrangian principle, the objective function in Eq. (5) can be rewritten as

L(w, b, ξ, α, μ) = (1/2)||w||^2 + C Σ_i ξ_i − Σ_i α_i [y_i(w·φ(x_i) + b) − 1 + ξ_i] − Σ_i μ_i ξ_i,   (7)

where α_i ≥ 0 and μ_i ≥ 0 are Lagrange multipliers. The optimality conditions are given in Eq. (8):

∂L/∂w = 0 ⇒ w = Σ_i α_i y_i φ(x_i),  ∂L/∂b = 0 ⇒ Σ_i α_i y_i = 0,  ∂L/∂ξ_i = 0 ⇒ C = α_i + μ_i.   (8)

On substituting Eq. (8) into Eq. (7), the dual problem becomes

max_α Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j k(x_i, x_j)  subject to  0 ≤ α_i ≤ C,  Σ_i α_i y_i = 0.   (12)

Eq. (12) is a quadratic programming problem over a convex constraint set, so a unique solution is obtained.
The optimal separating hyperplane gives the non-linear SVM classifier of Eq. (13):

f(x) = sign( Σ_{i∈sv} α_i y_i k(x_i, x) + b ),   (13)

where sv denotes the set of support vectors and k the kernel function. On close observation of the analysis, the training samples and the kernel function are the important factors that determine classification with SVM [18]. To train the SVM classifier, the cost factor (C) and the kernel parameter (k) are used to parameterize the kernel function. The most common kernel functions are the Gaussian kernel, the linear kernel and the Radial Basis Function (RBF) kernel.
The Gaussian basis function is defined as

k(x, y) = exp(−||x − y||^2 / (2σ^2)),   (15)

where σ is the standard deviation. The linear kernel function, a polynomial kernel of degree one, is defined as

k(x, y) = x · y.   (16)

The radial basis kernel function is defined as

k(x, y) = exp(−||x − y||^2 / σ^2),   (17)

where σ is the standard deviation and x and y are input vectors [26]. Only the best kernel function yields minimum error and highest classification accuracy. Support vector classification with the Gaussian RBF kernel is sensitive to the kernel width: a small width may cause over-fitting and a large one under-fitting, so the optimal kernel width is selected as a tradeoff between under-fitting loss and over-fitting loss. To reduce the coexisting over-fitting and under-fitting loss in support vector classification, the Gaussian RBF kernel, which yields a better feature space distribution than the other kernel functions, is adopted. Hence the Gaussian kernel function is accepted as the kernel function in this work for better fault classification [23,24].
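The kernel functions above can be written compactly; the sample vectors and σ values below are illustrative:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel, Eq. (15): k(x, y) = exp(-||x - y||^2 / (2*sigma^2))."""
    d = np.asarray(x, float) - np.asarray(y, float)
    return float(np.exp(-np.dot(d, d) / (2.0 * sigma ** 2)))

def linear_kernel(x, y):
    """Linear kernel, Eq. (16): k(x, y) = x . y (degree-one polynomial kernel)."""
    return float(np.dot(x, y))

x, y = [1.0, 0.0], [0.0, 1.0]
print(gaussian_kernel(x, y, sigma=1.0))   # exp(-||x-y||^2 / 2) = exp(-1)
print(linear_kernel(x, y))                # orthogonal vectors: 0

# Kernel-width effect discussed above: a small sigma makes distinct points look
# dissimilar (k -> 0, over-fitting risk); a large sigma makes them look alike
# (k -> 1, under-fitting risk).
print(gaussian_kernel(x, y, sigma=0.1), gaussian_kernel(x, y, sigma=10.0))
```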

Sequential Minimal Optimization
Sequential Minimal Optimization (SMO) solves the SVM QP problem without any extra matrix storage and without invoking an iterative numerical routine for each sub-problem. Unlike traditional methods, SMO chooses to solve the smallest possible optimization problem at every step. For the standard SVM QP problem, the smallest possible optimization problem involves two Lagrange multipliers, because the multipliers must jointly obey a linear equality constraint. At every step, SMO resolves the QP problem by solving such a two-multiplier sub-problem analytically, so the inner iterations of a numerical QP optimizer are avoided; repeated execution of these steps solves the overall QP problem.
Consider a binary classification problem with a dataset (x1, y1), ..., (xn, yn), where xi is an input vector and yi ∈ {-1, +1} its binary label. A soft-margin support vector machine is trained by solving a quadratic programming problem, expressed in the dual form as follows:

max_α Σ_i α_i − (1/2) Σ_i Σ_j y_i y_j K(x_i, x_j) α_i α_j
subject to  0 ≤ α_i ≤ C for i = 1, ..., n,  and  Σ_i y_i α_i = 0,

where C is an SVM hyperparameter and K(xi, xj) is the kernel function, both supplied by the user, and the variables α_i are Lagrange multipliers.
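A small numeric sketch (toy data, not from the paper) of evaluating this dual objective and checking its constraints:

```python
import numpy as np

def dual_objective(alpha, y, K):
    """SVM dual: W(alpha) = sum_i alpha_i - 0.5 * sum_ij y_i y_j K_ij alpha_i alpha_j."""
    alpha = np.asarray(alpha, float); y = np.asarray(y, float)
    return float(alpha.sum() - 0.5 * (alpha * y) @ K @ (alpha * y))

def feasible(alpha, y, C, tol=1e-9):
    """Check the dual constraints: 0 <= alpha_i <= C and sum_i y_i alpha_i = 0."""
    alpha = np.asarray(alpha, float); y = np.asarray(y, float)
    return bool(np.all(alpha >= -tol) and np.all(alpha <= C + tol)
                and abs(np.dot(y, alpha)) <= tol)

# Two one-dimensional points with a linear kernel K_ij = x_i . x_j (toy data).
X = np.array([[1.0], [-1.0]])
y = np.array([1.0, -1.0])
K = X @ X.T
alpha = np.array([0.5, 0.5])          # satisfies sum_i y_i alpha_i = 0
print(dual_objective(alpha, y, K))
print(feasible(alpha, y, C=1.0))
```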

Algorithm:
SMO is an iterative algorithm for solving the optimization problem described above through Lagrange multipliers that conform to the linear equality constraint. A series of smallest possible sub-problems, each involving two multipliers, are solved analytically by SMO. For two multipliers α1 and α2, the constraints reduce to

0 ≤ α1, α2 ≤ C,   (21)
y1 α1 + y2 α2 = ζ,   (22)

where ζ is a constant fixed by the remaining multipliers. Eqs. (21) and (22) can be solved analytically.
The algorithm proceeds as follows:
1. Find a Lagrange multiplier α1 that violates the Karush-Kuhn-Tucker (KKT) conditions for the optimization problem.
2. Pick a second multiplier α2 and optimize the pair (α1, α2) analytically.
3. Repeat steps 1 and 2 until convergence.
Recall the geometry of the Lagrange multiplier conditions: the gradient of the objective function must be orthogonal to the tangent plane of the (active) constraints, i.e., the projection of the gradient of f onto the space of directions tangent to the constraint "surface" is zero.

Pick a second multiplier
When all the Lagrange multipliers satisfy the KKT condition, the problem has been solved. Although this algorithm is guaranteed to converge, decision making heuristics are used to choose the pair of multipliers so as to accelerate the rate of convergence.
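The steps above can be sketched as a simplified SMO in Python. This is a teaching sketch that picks the second multiplier at random rather than with Platt's heuristics; the linear kernel, toy data and parameter values are illustrative assumptions:

```python
import numpy as np

def simplified_smo(X, y, C=1.0, tol=1e-3, max_passes=20, seed=0):
    """Minimal SMO sketch (linear kernel): repeatedly pick a multiplier that
    violates the KKT conditions, pick a second one at random, and solve the
    two-variable sub-problem analytically, clipping to the feasible segment."""
    rng = np.random.default_rng(seed)
    n = len(y)
    K = X @ X.T                      # linear kernel matrix
    alpha, b, passes = np.zeros(n), 0.0, 0
    while passes < max_passes:
        changed = 0
        for i in range(n):
            Ei = (alpha * y) @ K[:, i] + b - y[i]
            # Step 1: does alpha_i violate the KKT conditions (within tol)?
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = int(rng.integers(n - 1)); j += (j >= i)   # step 2: pick j != i
                Ej = (alpha * y) @ K[:, j] + b - y[j]
                ai_old, aj_old = alpha[i], alpha[j]
                # Bounds [L, H] keep the pair on the feasible diagonal segment
                if y[i] != y[j]:
                    L, H = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    L, H = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                eta = 2 * K[i, j] - K[i, i] - K[j, j]
                if L == H or eta >= 0:
                    continue
                alpha[j] = np.clip(aj_old - y[j] * (Ei - Ej) / eta, L, H)
                if abs(alpha[j] - aj_old) < 1e-5:
                    continue
                alpha[i] = ai_old + y[i] * y[j] * (aj_old - alpha[j])
                # Update the threshold b from the two updated multipliers
                b1 = b - Ei - y[i]*(alpha[i]-ai_old)*K[i,i] - y[j]*(alpha[j]-aj_old)*K[i,j]
                b2 = b - Ej - y[i]*(alpha[i]-ai_old)*K[i,j] - y[j]*(alpha[j]-aj_old)*K[j,j]
                if 0 < alpha[i] < C:   b = b1
                elif 0 < alpha[j] < C: b = b2
                else:                  b = (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0   # step 3: repeat to convergence
    w = (alpha * y) @ X              # recover w for the linear kernel
    return w, b

# Toy separable data: the class is the sign of the first coordinate.
X = np.array([[2.0, 1.0], [1.5, -0.5], [-2.0, -1.0], [-1.5, 0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = simplified_smo(X, y, C=1.0)
print(np.sign(X @ w + b))
```

On this separable toy set the recovered classifier sign(w·x + b) reproduces all four training labels.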

Optimization of SVM using SMO
The fault data sets are combinations of the contents of five types of diagnostic gases produced by overheating of the transformer oil and used for fault analysis. Initially these gas data are preprocessed by data processing methods and the features are extracted to identify the fault types through the classifiers. This paper improves the fault diagnosis performance of SVM by applying SMO optimization to it; the resulting classifier is known as SVM-SMO. The SVM-SMO technique increases classification accuracy and reduces computational time on the DGA data set. To establish an optimized SVM, the optimization problem has two Lagrange multipliers which must obey the linear equality constraint of Eq. (24), y1 α1 + y2 α2 = constant. At every step, SMO chooses two Lagrange multipliers to optimize jointly, finds the optimal values of these multipliers and updates the input data so that the KKT conditions are satisfied [22]. SMO requires little additional space: storing the previous and current pair (α1, α2) needs only 2×2 matrices, so very large SVM training problems can fit inside memory. The kernel parameter of SVM can be optimized by SMO by updating the two Lagrange multipliers at every step. The kernel function k measures the similarity, or distance, between the input vector x and a stored training vector x_j.

Figure 3 Constraints for Lagrange multipliers
The two Lagrange multipliers of SMO must satisfy the linear equality constraints on updating the kernel parameters. Figure 3 presents the linear equality constraints which cause the two Lagrange multipliers to lie on a diagonal line. SMO finds an optimum of the objective function on a diagonal line segment [22].
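The diagonal-segment constraint of Figure 3 fixes the feasible interval [L, H] for the second multiplier; a minimal sketch, with illustrative multiplier values:

```python
def clip_bounds(a1, a2, y1, y2, C):
    """Feasible segment [L, H] for the second multiplier alpha2 when the pair
    (alpha1, alpha2) is constrained to the diagonal y1*a1 + y2*a2 = const
    inside the box [0, C] x [0, C] (the situation drawn in Figure 3)."""
    if y1 != y2:                  # a2 - a1 is constant along the diagonal
        return max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:                         # a1 + a2 is constant along the diagonal
        return max(0.0, a1 + a2 - C), min(C, a1 + a2)

print(clip_bounds(0.2, 0.5, 1, -1, C=1.0))   # opposite labels
print(clip_bounds(0.2, 0.5, 1, 1, C=1.0))    # equal labels
```

The new value of the second multiplier is clipped into [L, H] before the first multiplier is updated to keep the equality constraint satisfied.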

EXPERIMENTAL RESULTS
In this section, transformer fault analysis is carried out using the DGA dataset on a PC with an Intel Core processor running at 3 GHz and 2 GB of RAM. The features are extracted from the dataset, and these features are trained, validated and tested using MATLAB for the classification of the fault states of the transformers. The performance of the classifiers is evaluated and presented in terms of relative speed and computational time for the 75 sets of power transformer fault data.

ISSN 22773061
The four types of fault states observed from the power transformer are as follows: normal state, thermal heating, low energy discharge and high energy discharge. Due to overheating, the insulating transformer oil degrades and emits gases; the ratios of these emitted gases, which include H2, CH4, C2H2, C2H4 and C2H6, are closely related to the fault types [17,21]. These combustible gases arise from faults in the transformer and are monitored regularly to determine the degree, pattern and abnormality conditions using the SVM and optimized SVM classification methods [25].

Data set
The proposed method uses 75 historical data sets of a 500 kV main transformer, obtained from Pingguo Substation of South China Electric Power Company, for training and testing the SVM and optimized SVM classifiers [24]. The data samples are divided such that 50 data sets are used for training and validation and the remaining 25 data sets for testing. The 50 training and validation data sets comprise 5 sets for the normal state, 25 sets for thermal heating, 5 sets for low energy discharge and 15 sets for high energy discharge. The 25 testing data sets comprise 4 sets for the normal state, 13 sets for thermal heating, 6 sets for low energy discharge and 2 sets for high energy discharge.

Feature extraction
The dissolved diagnostic gases obtained from the transformer oil are pre-processed by data processing methods, and the features are extracted for fault analysis using the SVM and SVM-SMO classifiers [24].
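A hypothetical preprocessing sketch (the paper does not specify its exact data-processing method); here each diagnostic gas is min-max scaled across the data set so that no single gas dominates the kernel distance:

```python
import numpy as np

def preprocess_dga(samples):
    """Scale each gas column (H2, CH4, C2H2, C2H4, C2H6) to [0, 1]
    across the data set. Illustrative assumption, not the paper's method."""
    samples = np.asarray(samples, dtype=float)
    lo = samples.min(axis=0)
    span = samples.max(axis=0) - lo
    span[span == 0] = 1.0            # avoid division by zero for constant gases
    return (samples - lo) / span

# Toy gas concentrations in ppm (illustrative values only).
gases = [[100.0, 20.0, 5.0, 10.0, 15.0],
         [300.0, 60.0, 25.0, 50.0, 15.0],
         [200.0, 40.0, 15.0, 30.0, 15.0]]
features = preprocess_dga(gases)
print(features)
```

The last gas column is constant across samples, so it scales to zeros rather than producing a division-by-zero error.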

Parameters
The parameters of SVM and SMO used for classification of faults with the DGA data set are discussed in this section. The performance of the classifiers mainly depends on the parameter values.

SVM
The challenging aspect of SVM is the best choice of kernel function, kernel parameter and margin parameter. The Gaussian RBF kernel function is the universally accepted kernel function. The kernel parameter is varied over [0.3, 1.0] as the classification accuracy changes. The margin parameter is set according to the kernel value and increased from 75 to 100; the order of the polynomial kernel is set to 3, the learning rate is chosen as 0.01 and the scaling factor of the radial basis function kernel is taken as 1, as provided in Table 1.

SMO
The parameters of SMO are the Karush-Kuhn-Tucker (KKT) tolerance, the kernel cache limit, the number of iterations, the order of the polynomial and the scaling factor, as mentioned in Table 2. The KKT conditions are necessary and sufficient conditions for an optimal point of a positive definite QP problem. A value selected between 0 and 1 specifies the fraction of variables allowed to violate the KKT conditions for the SMO training method. For the Gaussian RBF kernel function, the order of the polynomial is 3 and the scaling factor is set to 1. The kernel cache limit specifies the size of the kernel matrix cache and is fixed in advance at 5000; otherwise SMO will lead to an overflow condition. Max iter is an integer, set to 15000, that specifies the number of iterations of the main loop; if this limit is exceeded before the algorithm converges, the algorithm stops and returns with an error.
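The settings of Tables 1 and 2 can be collected as a configuration sketch; the key names below are illustrative, not taken from a specific library:

```python
# Parameter settings described in the text (Tables 1 and 2), gathered as
# plain dictionaries. Key names are hypothetical; values are from the paper.
svm_params = {
    "kernel": "gaussian_rbf",          # kernel adopted in this work
    "kernel_width_range": (0.3, 1.0),  # kernel parameter varied over [0.3, 1.0]
    "margin_parameter_range": (75, 100),  # margin parameter increased 75 -> 100
    "polynomial_order": 3,
    "learning_rate": 0.01,
    "rbf_scaling_factor": 1,
}
smo_params = {
    "kkt_violation_fraction_range": (0.0, 1.0),  # fraction allowed to violate KKT
    "kernel_cache_limit": 5000,        # fixed in advance to avoid overflow
    "max_iterations": 15000,           # main-loop limit before erroring out
    "polynomial_order": 3,
    "rbf_scaling_factor": 1,
}
print(svm_params["kernel"], smo_params["max_iterations"])
```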

Result Analysis
The effectiveness of the proposed approach, obtained by optimizing the SVM parameters, is demonstrated in the result analysis, where the classified fault states are plotted and the results tabulated. Table 3 presents the results of the proposed SVM methods in terms of training accuracy, standard deviation and training time for various training and testing percentages. The training and testing accuracies of the standard SVM are 86% and 82% for 90% training and 10% testing of the input datasets [25]. The training and testing accuracies of the proposed SVM increase to 94.12% and 96.15% for the same training and testing percentages. The tabulation shows improved performance, with training and testing accuracies of 87% and 90.13%, even when only 30% of the input datasets are used for training. Table 5 provides the overall classification accuracy results in terms of trial time and error rate, comparing the performance of the SVM and SVM-SMO classifiers. The mean computational times for SVM and SVM-SMO are calculated and presented. SVM-SMO takes a longer testing time than SVM because the additional code executed to optimize the parameters lengthens the run in exchange for an improved classification rate. The results indicate that the mean accuracy of SVM-SMO is higher (96.12%), with a reduced error rate, than that of SVM (93%). Table 6 depicts the comparative results of the existing SVM [25], the proposed SVM and SVM-SMO on the DGA dataset, for 90% training and 10% testing samples. By optimizing the SVM parameters, SVM-SMO achieves an accuracy 3% higher than the proposed SVM and 6% higher than the existing SVM. The classification accuracies of SVM and SVM-SMO against various percentages of training and testing samples of the DGA data are plotted in Figure 6. The improved performance of SVM-SMO over the SVM classifier in terms of classification accuracy is shown in Figure 7.

CONCLUSION
In this paper, the kernel function is optimized by continuously updating two Lagrange multipliers to solve SVM's QP problem using an SVM-SMO based approach. The gas chromatograph method is utilized to choose the most appropriate gas signatures from the faulty power transformer, and these data sets are used to evaluate the performance of the optimized SVM. The extracted features are applied as input data to the classifiers for fault classification, evaluated in terms of accuracy and computational time. Test results of the proposed approach demonstrate the effectiveness of the classifiers, with higher efficiency than the standard approach. The classification accuracy of SVM-SMO (96.13%) is better than that of SVM (94.12%), achieving excellent performance in identifying the transformer fault type, i.e., the nature of the fault. Experiments are carried out using real-world data derived from power transformer DGA data, and the results demonstrate the effectiveness of the proposed optimization techniques in terms of classification rate. In future work, the optimization steps applied to SVM can also be used to solve the non-linear QP problem with reduced computational time; the problem can also be extended to advanced optimization methods such as stochastic methods, heuristic methods and response surface methodology based approaches to obtain higher accuracy and improved speed.