Application of the Artificial Neural Network to monitor the quality of treated water

Because the quality of treated water is critical both to public health and to engineering practice, water and environmental engineers have recently used numerous experimental and semi-experimental models to estimate water quality. Among these models, the Artificial Neural Network (ANN), an advantageous black-box approach, has proved highly capable in engineering sciences in general and in water engineering in particular. In this study, an ANN-based method was used to model the quality parameters of potable water. To evaluate the model, water quality data sets from the Zarrineh Rood water treatment plant, recorded before and after treatment, were used. After statistical analysis, the recorded daily data sets were divided into calibration and verification subsets. The temperature, pH, opacity, total hardness, and calcium level measured before the treatment process were taken as the input variables of the model, and the Total Dissolved Solids (TDS) and Electrical Conductivity (EC) after treatment were taken as the output neurons of the ANN. To better interpret the efficiency of the model, its outcomes were compared with those of classical, practical models; the results demonstrated the high merit of the ANN in predicting the parameters of treated water.


INTRODUCTION
Water treatment plants, the primary sources of potable water, play an essential role in human life. The need for potable water increases with population growth. The development of agricultural and industrial activities and the marked increase in wastewater have increased water pollution. As a result, providing an adequate amount of potable water of acceptable quality has become a very sensitive and time-consuming task for water engineers. The raw water entering a treatment plant is characterized by various quality parameters. In accordance with potable water standards, some of these constituents must be removed completely or reduced to the allowable level specified in the standards. The treatment process requires high accuracy, and the case-sensitive nature of the procedure is a major drawback in water quality modeling. Therefore, models must be evaluated and applied under different conditions according to their characteristics. Notably, experimental methods for determining water quality are highly expensive. Classic experimental models can estimate water quality parameters reasonably well, but they need various kinds of information, some of which is not available. Hence, there is a need for a mathematical tool that establishes an accurate relationship between the input and output of the plant in order to determine water quality parameters.
To overcome the shortcomings of experimental models, some intelligent methods (such as fuzzy-based models) have been applied to predict and model the qualitative parameters of water (e.g. [1], [2] and [3]).
In recent years, the Artificial Neural Network (ANN), a self-learning and self-adaptive approximator, has been widely used in modeling and forecasting non-linear processes. The ability of the ANN to relate input and output variables in complex systems without prior knowledge of the physics of the process, together with its capacity to represent timescale variability, has led to many applications of ANNs in simulation tasks. In the context of water quality modeling, ANN models are usually employed to predict or optimize the values of qualitative parameters. Reference [4] analyzed the sensitivity of three types of back propagation ANN models to several sets of inputs in order to forecast the variation of groundwater quality in the blackfoot disease area in Taiwan. Reference [5] used ANN models to optimize the amount of alum and the values of other parameters used in the coagulation stage of water treatment. Reference [6] used an ANN-based method to compute the dissolved oxygen and biochemical oxygen demand levels in the Gomti River, India. Reference [7] predicted stormwater quality via ANN modeling at urbanized catchments located throughout the United States by determining the values of five traditional qualitative parameters. Reference [8] evaluated an ANN, an ensemble ANN (EANN) and a hybrid correlation analysis-EANN model for estimating water quality characteristics at ungauged sites and showed the superiority of the hybrid model. Reference [9] used water quality variables as predictors and described the structure and application of a feed-forward, fully connected, three-layer perceptron neural network model for computing the water quality index of the Kinta River, Malaysia.
In this study, using daily measured data sets from the "Zarrineh Rood" water treatment plant, an ANN-based approach was evaluated as a quality estimator that simulates the relationship between raw and treated water quality. The study also presents a mathematical model that can predict the quality of the output water and reduce the need for daily quality testing. As a result, the model can be used as an intelligent tool for monitoring the performance and efficiency of the plant.

Artificial Neural Network and Efficiency Criteria
ANN offers an effective approach for handling large amounts of dynamic, non-linear and noisy data, especially when the underlying physical relationships are not fully understood. This makes ANN well suited to time series modeling of a data-driven nature.
An ANN is composed of a number of interconnected simple processing elements called neurons (or nodes), with attractive information-processing characteristics such as non-linearity, parallelism, noise tolerance, and learning and generalization capability. Among the applied ANNs, the feed forward neural network (FFNN) with the back propagation (BP) training algorithm is the most commonly used method for solving various engineering problems [10].
An FFNN consists of layers of neurons, with each layer fully connected to the preceding layer by interconnection strengths, or weights. Initial estimated weights are progressively corrected during a training process that compares predicted outputs with known outputs (targets). Learning in the FFNN is generally accomplished by the BP training algorithm [11]. The objective of the BP training algorithm is to find the optimal weights, which generate an output vector as close as possible to the target values of the output vector, within the selected accuracy.

Fig. 1: A three-layered feed-forward neural network with BP training algorithm
As shown in Fig. 1, three-layered FFNNs, which are commonly used for forecasting and simulation tasks, provide a general framework for representing a non-linear functional mapping between a set of input and output variables. For a properly trained BP network, a new input leads to an output similar to the correct output. This property makes it possible to train a network on a representative set of input/target pairs. Clear, systematic descriptions of the BP training algorithm and of methods for designing BP models are given in [12] and [13].
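As a rough illustration of the forward pass through such a three-layered network, the following Python sketch propagates one input vector through a hidden layer and an output layer. The layer sizes (5 inputs, 5 hidden neurons, 1 output) and the tanh transfer function follow the configuration reported later in this paper; the random initial weights, the helper names and the sample values are assumptions made only for illustration, and no training is performed.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) filled with small random initial values."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases


def layer_forward(inputs, weights, biases):
    """Weighted sum of the inputs plus bias, passed through a tanh transfer function."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def ffnn_forward(inputs, hidden, output):
    """Propagate one input vector through the hidden layer, then the output layer."""
    hidden_activations = layer_forward(inputs, *hidden)
    return layer_forward(hidden_activations, *output)


rng = random.Random(0)          # fixed seed so the sketch is repeatable
hidden = init_layer(5, 5, rng)  # 5 inputs -> 5 hidden neurons
output = init_layer(5, 1, rng)  # 5 hidden neurons -> 1 output neuron

sample = [0.42, 0.61, 0.10, 0.55, 0.37]  # made-up, already normalized inputs
prediction = ffnn_forward(sample, hidden, output)
print(prediction)
```

With untrained weights the printed value is meaningless; training would adjust `hidden` and `output` until the predictions approach the targets.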
Before being presented to the network, the data are usually normalized between 0 and 1. The network architecture that yields the best result in terms of root mean square error (RMSE) and determination coefficient (D) in the calibration and verification steps is determined through a trial and error process [14]:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \left(Q_{obs_i} - Q_{com_i}\right)^2}{N}} \qquad (1)$$

$$D = 1 - \frac{\sum_{i=1}^{N} \left(Q_{obs_i} - Q_{com_i}\right)^2}{\sum_{i=1}^{N} \left(Q_{obs_i} - \bar{Q}_{obs}\right)^2} \qquad (2)$$

where $Q_{obs_i}$, $Q_{com_i}$ and $\bar{Q}_{obs}$ are, respectively, the observed data, the predicted values and the mean of the N observed values. The RMSE measures the accuracy of the forecasted values and is always non-negative because the errors are squared. It increases from zero for perfect forecasts to large positive values as the discrepancies between forecasts and observations grow. A high value of D (up to one) and a small value of RMSE indicate high efficiency of the model.
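The normalization and the two efficiency criteria can be sketched in a few lines of Python. This is a minimal illustration that assumes the usual definitions: min-max scaling to the range 0 to 1, RMSE as the square root of the mean squared error, and D as one minus the ratio of the squared error to the squared deviation of the observations from their mean. The function names and the sample values are hypothetical.

```python
import math


def min_max_normalize(values):
    """Scale a list of numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def rmse(observed, computed):
    """Root mean square error between observed and computed values."""
    n = len(observed)
    return math.sqrt(sum((o - c) ** 2 for o, c in zip(observed, computed)) / n)


def determination_coefficient(observed, computed):
    """D = 1 - (sum of squared errors) / (sum of squared deviations from the observed mean)."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - c) ** 2 for o, c in zip(observed, computed))
    squared_deviation = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / squared_deviation


# Made-up values, for illustration only (not data from the plant).
observed = [310.0, 325.0, 298.0, 340.0]
computed = [305.0, 330.0, 300.0, 335.0]

print(min_max_normalize(observed))
print(rmse(observed, computed))
print(determination_coefficient(observed, computed))
```

A perfect forecast gives an RMSE of zero and a D of one, which matches the interpretation given in the text.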

Zarrineh Rood Plant and Data
In this study, the daily measured data sets of the "Zarrineh Rood" water treatment plant, one of the principal water treatment plants in north-west Iran, were used to evaluate an FFNN-based model for predicting the qualitative parameters of potable water. As presented in Fig. 2, the plant is located 15 km north-east of Miandoab city, Iran. It is the main facility supplying potable water to Tabriz City and several surrounding towns. The plant has a nominal capacity of 5.5 cubic meters per second of raw water, which is supplied from Shahid Kazemi Dam, 9 km from the plant. The daily qualitative parameters used here were measured in the plant laboratory from the beginning of 2005 to the end of 2008. Statistical analysis was performed on the data and the results are tabulated in Table 1. The first three years of observed data were used for training the model and the remaining data were employed as the verification set. The target of the current study was to predict the values of Total Dissolved Solids (TDS) and Electrical Conductivity (EC) after the treatment process. Sensitivity analysis of the FFNN models identified temperature, pH, opacity, total hardness and calcium content as the model inputs. The values of these parameters were measured separately for the raw and the treated water.
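The chronological split described above (first three years for training, the remainder for verification) might be expressed as follows. This is only a sketch: it assumes one record per calendar day from 2005 through 2008 with no gaps, which the paper does not state, and the record layout and function name are hypothetical.

```python
from datetime import date, timedelta


def split_by_date(records, first_verification_day):
    """Split (day, measurements) records into calibration and verification sets by date."""
    calibration = [r for r in records if r[0] < first_verification_day]
    verification = [r for r in records if r[0] >= first_verification_day]
    return calibration, verification


# Placeholder daily records: (day, measurements). Real measurements would replace None.
start, end = date(2005, 1, 1), date(2008, 12, 31)
n_days = (end - start).days + 1
records = [(start + timedelta(days=i), None) for i in range(n_days)]

# 2005 to 2007 for calibration, 2008 for verification.
calibration, verification = split_by_date(records, date(2008, 1, 1))
print(len(calibration), len(verification))  # 1095 366
```

Splitting by date rather than at random keeps the verification year entirely unseen during training, which suits time series data.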

Results of FFNN Model
The EC of potable water is directly related to its TDS. Thus, periodic measurement of EC is considered a vital step in monitoring the quality of treated water. Because EC is influenced by temperature, 25 °C has been suggested as the standard temperature for the measurement.
The purpose of the current research is to evaluate an intelligent model that determines the quality of potable water by estimating the values of EC and TDS. Because the experimental measurement of EC and TDS is expensive, these were taken as the outputs of the FFNN-based model, while the temperature, pH, opacity, total hardness and calcium content of the raw water were taken as its inputs. Measuring these input parameters is economically acceptable. The sensitivity analysis performed on the data confirmed the suitability of the selected inputs and outputs.
In this paper, the input layer of the FFNN included 5 neurons (temperature, pH, opacity, total hardness and calcium content), and each model was trained using 5, 6 and 7 hidden neurons in a single hidden layer. Some researchers claim that networks with a single hidden layer can approximate any function to a desired accuracy and are sufficient for most forecasting problems [11]. Among the training algorithms, the Levenberg-Marquardt algorithm was selected because of its fast convergence ([15] and [16]). The tangent sigmoid was selected as the transfer function of both the hidden and output layers [17]. The number of hidden neurons and the number of training epochs were determined by trial and error. The ANN Toolbox of MATLAB was used to apply the FFNN models to the data sets [18]. Training was terminated at the epoch in which the model error in the validation step began to rise. This helps ensure that the network is not overfitted to the calibration data and does not fail to generalize to the unseen verification data. The networks were trained with different maximum numbers of iterations, and 350 epochs was finally identified as the appropriate limit.
Four distinct types of FFNN model were examined to identify the most efficient one. The first and second types had a single output neuron (TDS and EC, respectively). The third type had two output neurons, TDS and EC. The last type was designed to compare the classic experimental equations with the proposed FFNN model; thus, EC and TDS were taken as its input and output parameters, respectively. In the first three types, the values of temperature, pH, opacity, total hardness and calcium content were taken as input parameters. To allow a precise comparison between the proposed model types, the values of the correlation coefficient and the coefficient of determination for the calibration and verification phases are tabulated in Table 2. Table 2 shows that the networks with a single output neuron were more accurate than the network with two output neurons. Furthermore, the third type also required the largest number of epochs and the most complex structure. The time series inherently contain some noise; increasing the number of output neurons magnified this error and led to undesirable outcomes in the verification step. The computed and measured values of the first three models for the verification step are compared in Figs. 3 to 6, respectively.
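The stopping rule described above can be illustrated with a small Python sketch that scans a sequence of validation errors, one per epoch, and reports the last epoch before the error began to rise, subject to a cap of 350 epochs. It is not the MATLAB toolbox implementation: the function name, the `patience` parameter and the error values are assumptions for illustration.

```python
def early_stopping_epoch(validation_errors, max_epochs=350, patience=1):
    """Return (best_epoch, best_error), stopping once the error has risen `patience` times.

    Epochs are counted from 1. At most `max_epochs` epochs are examined.
    """
    best_error = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, error in enumerate(validation_errors[:max_epochs], start=1):
        if error < best_error:
            best_error, best_epoch = error, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation error began to rise: stop training here
    return best_epoch, best_error


# Made-up validation errors: they fall for four epochs, then start to rise.
errors = [0.30, 0.21, 0.15, 0.12, 0.13, 0.16]
print(early_stopping_epoch(errors))  # (4, 0.12)
```

In a real training loop the weights saved at `best_epoch` would be kept, so that the final network is the one with the lowest validation error.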

Comparison between FFNN and Experimental Models
To evaluate the outcomes of the proposed FFNN model, its results were compared with those of conventional experimental models. A few experimental equations have been proposed to relate EC and TDS. In the current study, Eqs. 3 and 4 were used for the comparison [19]:

$$\mathrm{TDS}\ (\mathrm{mg/L}) = 800 \times \mathrm{EC}\ (\mathrm{dS/m}) \quad \text{for } \mathrm{EC} > 5\ \mathrm{dS/m}$$
The results of the equations were compared with those of the fourth type of FFNN model. The outcomes of both models for the verification step are presented in Figs. 7 and 8, respectively.
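For reference, the empirical conversion used as the baseline can be sketched in Python as follows. Only the relation for EC above 5 dS/m (factor 800) is legible in this text, so the factor for lower EC is left as a parameter the caller must supply. The value 640 in the usage line is a commonly quoted figure and is an assumption here, not a value taken from this paper; the function and parameter names are hypothetical.

```python
def tds_from_ec(ec_ds_per_m, high_factor=800.0, low_factor=None, threshold=5.0):
    """Estimate TDS (mg/L) from EC (dS/m) with a piecewise-linear empirical rule.

    Above `threshold` the factor 800 from the text applies. At or below it, the
    caller must pass `low_factor`, because that relation is not given here.
    """
    if ec_ds_per_m > threshold:
        return ec_ds_per_m * high_factor
    if low_factor is None:
        raise ValueError("no conversion factor supplied for EC at or below the threshold")
    return ec_ds_per_m * low_factor


print(tds_from_ec(6.0))                    # 4800.0
print(tds_from_ec(2.0, low_factor=640.0))  # 1280.0 (640 is an assumed factor)
```

Because the rule is a fixed multiple of EC, it cannot adapt to other influences on the measurement, which is the limitation the FFNN comparison is meant to examine.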

CONCLUDING REMARKS
In this study, FFNN models were used to estimate the quality of treated water using the qualitative data of the Zarrineh Rood water treatment plant. The values of temperature, pH, opacity, total hardness and calcium content before the treatment process were taken as model inputs to predict the values of EC and TDS after the treatment process, which indicate the qualitative condition of the treated water. The results demonstrate the ability of the FFNN models to predict TDS and EC, with a correlation coefficient above 0.90 and a coefficient of determination above 0.70 in the calibration and verification steps. The results also showed that an FFNN with a single hidden layer of five neurons is the best structure for predicting the water quality.
The research also showed that the experimental formulas for calculating TDS are not valid when the temperature changes significantly, whereas the FFNN copes with this obstacle well thanks to its non-linear structure. Finally, the results indicate the high ability of the intelligent ANN-based methodology to predict the quality of treated water without the need for expensive instruments.
To complete the current study, future research could apply and compare other methods, such as fuzzy theory, to cope with the uncertainty of the process.