An Intelligent AntNet-Based Algorithm for Load Balancing in Grid Computing

Computational grids comprise a huge number of diverse and scattered resources that are used to handle complex problems. An effective load balancing methodology is needed to utilize grid resources by efficiently distributing tasks for execution on the available computing nodes. Ant colony optimization is a major and popular method for approximate optimization. It works by simulating the behavior of real ants in detecting the best path to food sources. This paper employs ant colony optimization to propose a load balancing technique for computational grids. The performance of the suggested technique is evaluated by simulation and compared with that of a Random Distribution Load Balancing technique. The results reveal that the suggested technique improves the average task response time. They also reveal that the enhancement ratio rises progressively with the system load until the load becomes moderate, where the best enhancement ratio is achieved. After that, the enhancement ratio declines steadily as the system load rises until the system becomes saturated.


INTRODUCTION
A computational grid is an integrated environment of software and hardware that provides users with consistent, dependable, pervasive and inexpensive access to a huge set of computing resources. Such resources may include, but are not limited to, computers, storage space, software applications and data [1]. These resources can be shared and coordinated by grid users, regardless of their type and location, within virtual organizations (VOs) to solve compute-intensive tasks. VOs are composed of individuals, institutions and resources. In grid computing, a common interface is used to link LANs and clusters together. Any user or VO can share the computing clusters. For reliability and authentication purposes, each cluster applies a local security policy that identifies the access rights of every user. This policy is applied through a local resource management system. Fig. 1 demonstrates the clustering process of grid resources.
Grid computing is mainly motivated by the goal of providing its users and applications with widespread and seamless access to a huge set of advanced computing resources. To this end, the illusion of a single system image is created. Consequently, such computational environments are implemented so that their clients do not have to worry about where their tasks are executed [2][3][4][5][6][15]. Grid computing systems offer their users various services, such as application, computation, information, knowledge and data services. These services are delivered by the servers and computing resources available in the grid system. The computing resources and servers are diverse by nature, as they have different storage space, memory sizes, CPU speeds and I/O bandwidths [2,3,7].
The diversity of the grid resources, combined with uneven task arrival patterns, can result in a situation where some computing resources in one cluster become overloaded while others in another cluster are underloaded or even idle. Consequently, it is desirable to shift some jobs from overloaded computing nodes to underloaded ones in order to improve the utilization of existing resources. This redistribution of the system's workload is known as load balancing [8][9][10][11][12][13].
Ant colony optimization (ACO) [16] is a major and recent technique among approximate optimization methodologies. ACO techniques are inspired by real ant colonies; more precisely, by the foraging behavior of ants. An ant deposits a certain quantity of pheromone as it walks, and ants tend to select a path with a probability that is positively correlated with the pheromone density of the trails they find. Over time, the effect of a pheromone trail vanishes. When several ants select the same path and deposit their pheromone, the trail density increases. Such a trail therefore attracts other ants, which eventually leads to a highway of ants along the shortest path. Ants are also able to adjust their behavior dynamically in response to changes in the environment. For example, they are able to discover a new path when the old path is no longer valid because a new barrier has appeared. In essence, ants communicate indirectly through chemical pheromone trails, which gives them the ability to discover the shortest paths between the nest and a food source. These features of real ant societies are exploited in ACO techniques for solving various scientific problems. Recently, ACO techniques have been used for balancing the workload of tasks in computational grids [18][19][20].
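The reinforcement mechanism described above can be illustrated with a small sketch. It is not part of the proposed algorithm; it is a deterministic, mean-field simplification in which a unit flow of ants splits across two paths in proportion to the current pheromone. The path lengths, the evaporation rate, the deposit rule (share of ants divided by path length) and the function name `simulate_two_paths` are all illustrative assumptions.

```python
def simulate_two_paths(lengths=(1.0, 2.0), evaporation=0.1, rounds=100):
    """Mean-field pheromone dynamics on two alternative paths.

    Each round, a unit flow of ants splits across the paths in proportion
    to the current pheromone. Each trail then evaporates and receives a
    deposit equal to its share of ants divided by its length, so the
    shorter path is reinforced more strongly.
    All parameter values are illustrative assumptions.
    """
    pheromone = [1.0 for _ in lengths]  # equal trails at the start
    for _ in range(rounds):
        total = sum(pheromone)
        shares = [p / total for p in pheromone]
        pheromone = [
            (1.0 - evaporation) * p + share / length
            for p, share, length in zip(pheromone, shares, lengths)
        ]
    total = sum(pheromone)
    return [p / total for p in pheromone]


shares = simulate_two_paths()
print(f"share of ants on the short path: {shares[0]:.3f}")
```

Under these assumptions the share of ants on the shorter path grows round after round, which mirrors the "highway" effect described in the text.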
This research uses ACO to develop a load balancing algorithm for computational grids that takes the diversity of the existing grid computing resources into account. The proposed algorithm selects a resource to execute a task according to the estimated task transfer time and the anticipated task processing time if the task is allocated to that resource. It balances the grid workload using local and global pheromone update procedures. The local pheromone update procedure updates the status of the selected resource directly after a task is assigned to it. The global pheromone update procedure, in contrast, updates the status of every resource with respect to all tasks directly after any task finishes. This gives the grid scheduler the latest information about all resources for use in the next task allocation round, which leads to effective utilization of the existing grid resources.
This policy maximizes system utilization and improves the load balancing level; hence, the mean job response time is minimized. A simulation model is built to assess the performance of the suggested algorithm. The results reveal that the suggested technique improves the average job response time compared to the random distribution load balancing algorithm (RDLBA) in all studied cases. The enhancement ratio rises steadily with the system traffic intensity until the system load becomes moderate, at which point the highest enhancement ratio is attained. After that, the enhancement ratio declines steadily as the grid load rises towards the system saturation point.
The remainder of the paper is structured as follows. Related work is introduced in section II. The studied computational grid system is presented in section III. Section IV gives the suggested ant colony load balancing algorithm. Section V discusses the simulation model and explains the results. Finally, section VI concludes the paper and gives some of our future research directions.

RELATED WORK
Problems in all sciences have lately become very difficult and complicated. They require enormous processing power and large storage space. Older systems, such as parallel or cluster computing systems, are inadequate for solving such complicated problems. At the same time, the increasing popularity of the Internet, together with the availability of low-cost advanced computers and very high-speed networks, has changed the way we use computer systems today. These technological advances enable users to employ scattered, multi-owner resources in solving various large-scale and complicated scientific problems. Recent research in these areas has resulted in a new computing technology called grid computing [1].
Effectively utilizing the huge and diverse grid resources is a big challenge to grid designers and software implementers. To achieve this goal, the service level of the grid infrastructure should utilize efficient and effective load balancing and resource management algorithms [1][2][3][4][5][6][7][8]. These algorithms can be categorized into static and dynamic ones. For more information about such categorization and the features of each category, the reader is directed to [10][11][12].
A large number of load balancing algorithms for traditional distributed and parallel systems have been developed [8][9][10][11][12][13]. Unfortunately, load balancing algorithms designed for traditional parallel and distributed systems, which usually run on homogeneous and dedicated resources, cannot work directly in grid environments. Therefore, it is essential to consider the impact of the various dynamic characteristics of the grid when designing and analyzing load balancing algorithms [1-3].
Lately, a number of scholars have used ACO to study the load balancing problem in computational grids [18][19][20][21]. In [18], the authors explained the basic ideas of ACO and their applications in general, and gave some illustrative examples. In [19], the authors presented an ACO policy for computational grids. The scheduler in their policy assigns each task to the best-matching processing node selected from the group of existing processing nodes. The authors performed a variety of exhaustive experiments using different simulation settings. Their results revealed that the suggested technique can be applied in practice and that its performance is much better than that of three earlier techniques. In [20], the authors introduced an ACO policy for balancing load in computational grids. Their algorithm uses the capacity of the existing resources in selecting the best processing node to execute a task, and it balances the workload across all of the existing processing resources. The major goal of this algorithm is to enhance the system throughput and thereby improve the overall grid performance. The authors in [21] developed a heuristic approach to obtain an optimal solution for the resource allocation problem in grid computing. They conducted many experiments using various data sets and settings. Their results reveal that the performance of their technique is better than that of some existing ant techniques. In [22], the authors introduced an ACO algorithm for load balancing in grid computing. Their main contribution is balancing the entire system load while trying to minimize the mean response time of a given set of jobs. According to the experimental results, the algorithm outperforms the other job scheduling algorithms it was compared with. In [23], the authors presented a new security constraint model by formulating the scheduling problem for work-flow requests in scattered and data-intensive systems. They introduced various meta-heuristic modifications to the main swarm optimization techniques for formulating effective schedules, and they introduced an adaptable neighborhood swarm optimization technique. The performance of their technique is computed and compared with that of multi-start genetic and multi-start swarm optimization techniques. The results reveal that their proposed meta-heuristic techniques always give comparable results for scheduling work-flow requests.

GRID COMPUTING MODEL
The grid computing model considered in this paper is shown in Fig. 2. It has six main components: User, Portal, Grid Information Server (GIS), Domains, Grid Scheduler (GS) and Processing Nodes (PNs).
1. User is a person or program that submits jobs to the grid for execution.
2. Portal provides grid applications to grid users.
3. GIS is responsible for periodically collecting grid information such as grid workload, network traffic, etc.
4. Domain is an independent object consisting of one or more computing nodes and a Domain Manager (DM).
5. GS receives jobs, selects a feasible domain for executing them based on the information acquired from the GIS, and finally generates job-to-domain mappings according to the proposed load balancing algorithm.
6. PNs are the machines responsible for executing user jobs.
Every DM has unlimited storage capacity to hold all of the jobs arriving from both external grid users and the domain's local users. The processing nodes have no capacity to hold jobs (i.e., zero buffer capacity); they are used only for execution. The dynamic nature and heterogeneity of the grid resources make status information about the available computing resources essential for the GS to take scheduling decisions properly. The main function of the GIS is to provide this information to the GS. For every DM, it collects the state information of the domain, such as the total domain processing capacity (the sum of the CPU capacities of all processing nodes in the domain), network bandwidth, memory size, software availability and the load of the domain in a certain period. Every DM is in charge of:
1. Supervising a dynamically changing group of computing nodes; that is, any member can join or leave the group at any time.
2. Recording computing nodes that newly join its domain.
3. Gathering all needed information about the active computing nodes in its domain and periodically updating the GIS with this information. This information may include, but is not limited to, each computing node's processing capacity, available memory size, hardware specifications and existing software.

PROPOSED ANT COLONY LOAD BALANCING ALGORITHM (PACLBA)
The proposed ant colony load balancing algorithm (PACLBA) uses the main concepts of ACO techniques to minimize the response times of tasks in computational grids. It takes the current load information of each DM into consideration when making load distribution decisions. In PACLBA, the pheromone density is updated based on the DM status information. The pheromone update process is carried out by a local and a global pheromone update function. The algorithm aims to achieve the minimum response time for every task by redistributing the workload in a way that efficiently utilizes all of the available grid resources. The FCFS scheduling policy is known to guarantee a certain level of fairness, it does not need any advance information about task processing times, its overhead is low, and it is easy to implement. Therefore, each DM uses FCFS as its local scheduling policy. Within this policy, when several processing nodes are free at the time a job is to be started, the DM selects the fastest available processing node to execute the job.
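The local scheduling rule of a DM (FCFS with the fastest available processing node) can be sketched as follows. This is a minimal illustration rather than the authors' implementation; the `Node` class, the `dispatch_next` function and the node speeds are hypothetical names and values introduced only for this example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Tuple


@dataclass
class Node:
    name: str
    speed_mips: float
    busy: bool = False


def dispatch_next(queue: Deque[str], nodes: List[Node]) -> Optional[Tuple[str, str]]:
    """Start the job at the head of the FCFS queue on the fastest free node.

    Returns (job, node name), or None when the queue is empty or every
    node is busy (the job then keeps waiting in the DM queue).
    """
    free = [n for n in nodes if not n.busy]
    if not queue or not free:
        return None
    fastest = max(free, key=lambda n: n.speed_mips)
    job = queue.popleft()  # FCFS: always the oldest waiting job
    fastest.busy = True
    return job, fastest.name


# Hypothetical domain with three processing nodes, one of them busy.
nodes = [Node("pn1", 2.0), Node("pn2", 3.5, busy=True), Node("pn3", 3.0)]
queue = deque(["job-a", "job-b"])
first = dispatch_next(queue, nodes)   # fastest free node is pn3
second = dispatch_next(queue, nodes)  # only pn1 is left
third = dispatch_next(queue, nodes)   # queue is empty
```

Jobs wait in the DM queue rather than at the nodes, which matches the zero-buffer assumption made for the processing nodes.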
In order to map the proposed ant colony model to the grid computing model, their relationships are explained below: 1. An ant: tasks in the grid computing model are represented by ants in the ant colony model. Depending on the type of program, many methods can be used to estimate the program processing time [27]. With that, the pheromone indicator Ph_ij is defined by equation (1), where Ph_ij is the pheromone measure for the j-th job when it is assigned to the i-th DM, M_j is the size of the j-th job, T_j is the CPU time requested for processing the j-th job, and load_i, CPU_speed_i and bandwidth_i are the current status information of the i-th DM.
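As a minimal sketch of an indicator built from these quantities, the function below adds an estimated transfer time to an estimated processing time. The additive form and the scaling of the CPU speed by (1 − load) are assumptions chosen to be consistent with the description (transfer time plus anticipated processing time, smaller is better); they should not be read as the exact form of equation (1). The function name and the units are hypothetical.

```python
def pheromone(job_size, cpu_time, load, cpu_speed, bandwidth):
    """Assumed pheromone indicator for one (DM, job) pair; smaller is better.

    ASSUMPTION: indicator = transfer time + processing time, where
      transfer time   = job_size / bandwidth
      processing time = cpu_time / (cpu_speed * (1 - load))
    Units are hypothetical and only need to be mutually consistent.
    """
    if not 0.0 <= load < 1.0:
        raise ValueError("load must be in the range [0, 1)")
    transfer_time = job_size / bandwidth
    processing_time = cpu_time / (cpu_speed * (1.0 - load))
    return transfer_time + processing_time


# Hypothetical values: the same job looks cheaper on a lightly loaded DM.
busy_dm = pheromone(job_size=10.0, cpu_time=4.0, load=0.5, cpu_speed=2.0, bandwidth=100.0)
idle_dm = pheromone(job_size=10.0, cpu_time=4.0, load=0.2, cpu_speed=2.0, bandwidth=100.0)
```

With this form, a DM becomes less attractive as its load rises or its bandwidth drops, which is the behavior the text attributes to the indicator.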
Based on equation (1), when a job is to be assigned to a DM, the GS considers the DM status, the job size and the anticipated program execution time in selecting the DM for execution. The smaller the value of Ph_ij, the more efficient it is for the i-th DM to execute job j. Assuming there are m DMs and n jobs, the pheromone (Ph) matrix is the m × n matrix whose entry in row i and column j is Ph_ij, so that each row corresponds to a DM and each column to a job. In each round, the smallest value in the Ph matrix is selected. If Ph_ij is the selected value, the j-th job is assigned to the i-th DM for processing. After a job is assigned to a DM, equation (1) is reapplied to that DM for each unallocated job in the Ph matrix. This process is called the local (row) pheromone update. All entries of the Ph matrix are recomputed immediately after any job completes, in a process called the global pheromone update. After that, the row corresponding to the DM that has just finished executing the job is further multiplied by (1−ρ_i), where 0≤ρ_i<1 and ρ_i represents the overhead incurred in the i-th DM after finishing the execution of the j-th job.
Performing the global pheromone update reflects the changes in network conditions and DM status after a job is completed. It incorporates the dynamic nature of the grid into the scheduling algorithm so that the GS can take a better load balancing decision at the next turn. As illustrated above, the suggested technique takes the heterogeneity of the grid computing resources into account. It balances the grid load using the two pheromone update procedures explained earlier. The local update procedure updates the status of the selected DM immediately after a job is allocated to it. The global update procedure, in contrast, updates the status of all DMs with respect to all jobs immediately after a job completes. It provides the GS with the latest information about all DMs, which the GS then uses in taking balancing decisions for the next task allocation round, with the aim of effectively utilizing the available grid resources. This policy maximizes system utilization and improves the load balancing level. Therefore, the mean task response time is improved.
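The allocation round and the two update procedures can be sketched as below. This is an illustrative outline under stated assumptions, not the authors' code: the indicator is assumed to be transfer time plus processing time, the Ph matrix is stored as a dictionary keyed by (DM, job), the DM load after an assignment is passed in as `new_load` (as if reported by the GIS), and all status values are hypothetical.

```python
def indicator(job, dm):
    """ASSUMED indicator: transfer time plus processing time (smaller is better)."""
    transfer = job["size"] / dm["bandwidth"]
    processing = job["cpu_time"] / (dm["cpu_speed"] * (1.0 - dm["load"]))
    return transfer + processing


def build_matrix(dms, pending):
    """Ph matrix as a dict keyed by (dm_id, job_id): rows are DMs, columns are jobs."""
    return {(i, j): indicator(job, dm)
            for i, dm in dms.items() for j, job in pending.items()}


def assign_next(ph, dms, pending, new_load):
    """Assign the job with the smallest entry, then run the local (row) update."""
    dm_id, job_id = min(ph, key=ph.get)
    del pending[job_id]
    for i in dms:                      # the job's column is no longer needed
        del ph[(i, job_id)]
    dms[dm_id]["load"] = new_load      # status reported after the assignment
    for j, job in pending.items():     # local update: only the chosen DM's row
        ph[(dm_id, j)] = indicator(job, dms[dm_id])
    return dm_id, job_id


def global_update(ph, dms, pending, finished_dm, rho):
    """Recompute every entry after a completion, then scale the finished DM's row."""
    ph.update(build_matrix(dms, pending))
    for j in pending:
        ph[(finished_dm, j)] *= (1.0 - rho)


# Hypothetical status values, not taken from the paper's tables.
dms = {1: {"load": 0.5, "cpu_speed": 2.0, "bandwidth": 100.0},
       2: {"load": 0.2, "cpu_speed": 2.0, "bandwidth": 100.0}}
pending = {"J1": {"size": 10.0, "cpu_time": 4.0},
           "J2": {"size": 5.0, "cpu_time": 2.0}}

ph = build_matrix(dms, pending)
first = assign_next(ph, dms, pending, new_load=0.6)   # J2 goes to DM2
after_local = ph[(2, "J1")]                           # DM2 now looks busier
dms[2]["load"] = 0.2                                  # J2 completes
global_update(ph, dms, pending, finished_dm=2, rho=0.1)
after_global = ph[(2, "J1")]
```

In this toy run, the local update raises the DM2 entry for J1 above the DM1 entry, so J1 would go to DM1 if it were dispatched before J2 completes; after the global update, DM2 is attractive again.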
The following example illustrates how a DM is selected to execute a job based on the pheromone level.

Example
Assume that the grid has five jobs (J1, J2, J3, J4 and J5) and five DMs (DM1, DM2, DM3, DM4 and DM5). Also, assume that the sizes of the five jobs are 5MB, 15MB, 10MB, 4MB and 3MB respectively and that the initial status of every DM is as given in Table 1. The numbers of CPU iterations required for the jobs are 4M, 3M, 4.5M, 5M and 3.5M respectively. When a job is to be dispatched, the GS determines the minimum pheromone level in the Ph matrix, which is Ph_25=0.120992. So J5 is scheduled to DM2 for execution. Hence, a local update (row update) of the second row of the Ph matrix is performed for all jobs except J5. Since J5 has been scheduled, column 5 of the Ph matrix is no longer needed. Now, assume that as a result of assigning J5 to DM2, the load of DM2 becomes 22%; the local update then yields a new Ph matrix. If the execution of J5 is completed before the scheduler dispatches the next task, all elements of the Ph matrix are updated by the global update process to estimate new values of the pheromone indicators. These values are used in taking the decision for allocating the next task.
Suppose that the new status of the DMs after J5 completes is as listed in Table 2, and that the overhead incurred in DM2 as a result of executing J5 is 0.1 (i.e., ρ_2=0.1). Note that ρ_i=0 for all other DMs (i.e., i ≠ 2) because no jobs have been allocated to them for processing yet.

Average node Utilization Rate
The utilization rate U_i of the i-th processing node P_i is obtained by dividing the completion time of the tasks at P_i by the highest task completion time over all computing nodes in the whole grid (the makespan), as follows:

$$U_i = \frac{C_i}{\text{Makespan}}$$

where C_i denotes the completion time of the tasks at P_i. Hence, the average utilization rate U of all processing nodes is computed by:

$$U = \frac{1}{M}\sum_{i=1}^{M} U_i$$

where M represents the total number of processing nodes in the system and U is in the range 0-1.

Load Balancing level
It is known that a higher average resource utilization does not guarantee a good load balancing policy [24]. Therefore, the mean square deviation d of the processing node utilization rates U_i is used as a measure of the load balancing level. It is defined by equation (5); the lower the value of d, the more effective the load balancing. The relative deviation α of d with respect to U, which expresses the level of grid load balancing, is then given by α = (1 − d/U) × 100%. Small values of the mean square deviation d lead to higher values of α, which indicates that the entire system workload is balanced between the processing nodes (i.e., a good load balancing level). The best level of load balancing is attained when d equals zero, which gives α equal to 100%.
The previously explained three performance metrics can be applied to the grid environment and they are correlated. For example, if the grid workload is balanced between the processing nodes, then the resource utilization rate will be high and consequently, the response time of tasks will be improved.
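The utilization and load balancing metrics can be computed with a few lines of code. The sketch below is illustrative only: it assumes that d is the square root of the mean squared deviation of the U_i from U and that α = (1 − d/U) × 100%, which is consistent with d = 0 giving α = 100%. The function names and the sample completion times are hypothetical.

```python
import math


def utilization_rates(completion_times):
    """U_i = completion time at node i divided by the makespan."""
    makespan = max(completion_times)
    return [t / makespan for t in completion_times]


def load_balancing_metrics(completion_times):
    """Return (U, d, alpha) for a list of per-node completion times.

    ASSUMPTIONS: d is the root of the mean squared deviation of the U_i
    from their mean U, and alpha = (1 - d / U) * 100.
    """
    rates = utilization_rates(completion_times)
    m = len(rates)
    mean_u = sum(rates) / m
    d = math.sqrt(sum((mean_u - u) ** 2 for u in rates) / m)
    alpha = (1.0 - d / mean_u) * 100.0
    return mean_u, d, alpha


# Hypothetical completion times (seconds) for two small grids.
balanced = load_balancing_metrics([10.0, 10.0, 10.0])
skewed = load_balancing_metrics([5.0, 10.0])
```

A perfectly balanced grid yields U = 1, d = 0 and α = 100%, while the skewed example yields a lower U and a lower α.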

Simulation Tool and Environment
Various simulation tools are available for simulating load balancing algorithms in grid computing systems; the reader is referred to [25] for more details. Among these tools, the GridSim v4.0 simulator [26] is used in our experiments because its facilities make it easy to simulate the various objects of a grid computing system. These objects include users, heterogeneous resources, software applications and resource workload balancers, which are used in assessing the performance of workload balancing methodologies. For the experiments, a heterogeneous grid model was constructed with diverse resource specifications to assess the performance of PACLBA. Gridlet objects are used to simulate tasks because they hold all the information associated with a task and the details of its processing management. The Grid Information Service object, in turn, holds all of the required information about the existing grid computing resources.
The simulation experiments are conducted on a PC with a 3.6 GHz Core i3 processor and 8 GB of RAM, running the Windows 7 OS.

Experimental Setup
The simulated grid environment contains 3 domains (sites) with 60 processing elements in total, which have different characteristics, configurations and capabilities. Every domain has one job waiting queue. The domain local scheduling policy is M/M/n FCFS with the fastest available processing node policy; that is, it selects the fastest PN to execute a job when several PNs are free at decision time. The local and global bandwidths are 1000 Mbps and 100 Mbps respectively. All time units are in seconds. The following assumptions are made for the simulations:
1. Tasks arrive sequentially and randomly to the system following a Poisson process with rate λ.
2. Times between arrivals are independent and follow the exponential distribution.
3. Simultaneous arrivals of tasks are not allowed.
4. Task processing times are assumed to follow the exponential distribution with mean μ.
5. Tasks are assumed to be mutually independent; that is, there are no dependencies or communication between them.
6. Any computing node can be used to execute tasks, and every CPU can perform only one task at any point in time.
7. Tasks are not preemptable; that is, the processing of a task cannot be interrupted or shifted to another computing node.
8. Task length is a uniformly distributed random number in the range 0.1 to 0.5 Million Instructions (MI).
9. Total CPU speeds ranging from 0 to 4 Million Instructions per Second (MIPS) are randomly assigned to the processing elements.
10. Every result listed in this paper is the mean value obtained from five simulation rounds started with different seeds for generating random numbers.
Let ρ denote the mean system traffic intensity in the simulated model. It is computed by dividing the mean arrival rate by the mean processing rate of tasks. Based on this definition, the task processing times μ are adjusted to obtain the requested traffic intensity ρ.
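The workload assumptions above can be sketched as a small generator. This is an illustration using only the Python standard library, not the GridSim setup used in the experiments; the function name `generate_workload`, the tuple layout and the sample parameter values are hypothetical. The processing rate is derived from the requested traffic intensity as arrival rate divided by ρ.

```python
import random


def generate_workload(num_tasks, arrival_rate, traffic_intensity, seed=0):
    """Generate (arrival time, processing time, length in MI) for each task.

    Arrivals form a Poisson process (exponential inter-arrival gaps) and
    processing times are exponential. Because
    traffic_intensity = arrival_rate / processing_rate, the processing
    rate is chosen as arrival_rate / traffic_intensity.
    """
    rng = random.Random(seed)
    processing_rate = arrival_rate / traffic_intensity
    clock = 0.0
    tasks = []
    for _ in range(num_tasks):
        clock += rng.expovariate(arrival_rate)       # exponential inter-arrival gap
        service = rng.expovariate(processing_rate)   # exponential processing time
        length_mi = rng.uniform(0.1, 0.5)            # task length, assumption 8
        tasks.append((clock, service, length_mi))
    return tasks


# Hypothetical parameters: 2 tasks per second at traffic intensity 0.5.
tasks = generate_workload(5, arrival_rate=2.0, traffic_intensity=0.5)
for arrival, service, length in tasks:
    print(f"arrival={arrival:.3f}s service={service:.3f}s length={length:.2f} MI")
```

Varying `traffic_intensity` while keeping `arrival_rate` fixed changes only the mean processing time, which mirrors how the processing times are adjusted to reach a requested ρ.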
The job response time, mean node utilization and load balancing level are the three performance measurements used in evaluating the PACLBA. During the simulations, the average system traffic intensity factor is varied and results are collected to assess the performance of PACLBA under various system parameters setting. The final results of the simulations are presented on an average basis.

Experimental Results
This section evaluates the performance of the PACLBA and compares it with the performance of the Random Distribution Load Balancing Algorithm (RDLBA). In RDLBA, the domain that processes a task is selected randomly. The comparison is based on three performance measures: average job response time, average node utilization and load balancing level, which indicates how much load balancing is achieved. In Fig. 3, the average job response time of the two algorithms is compared. The figure shows that the average job response time of both algorithms rises as the system traffic intensity rises. This is expected, because a higher traffic intensity means that more jobs need to be handled. In addition, the PACLBA outperforms the RDLBA in all cases. This result was anticipated, because the PACLBA selects a DM to execute a job according to the estimated task communication time and the expected task processing time if the job is allocated to that DM. Taking these parameters into consideration leads to effective utilization of the available resources, which in turn minimizes the grid mean job response time. The RDLBA, in contrast, randomly selects a DM to execute a job without taking any performance indicators into account, which leads to an unbalanced distribution of the system's load. As a direct result, the available grid resources are poorly utilized and, consequently, the system performance degrades. To estimate the enhancement achieved in mean task response time, we computed the mean task response time improvement ratio (TR-TP)/TR, where TR and TP are the mean task response times obtained using the RDLBA and the PACLBA respectively. Fig. 4 presents the improvement ratio in the mean job response time. It shows that the enhancement ratio rises steadily as the load (traffic intensity) rises. This increase continues until the system load becomes moderate, where the maximum enhancement ratio is achieved. After that, the enhancement ratio declines steadily as the system load rises until the system's saturation point is reached.
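The improvement ratio (TR-TP)/TR is simple to compute, as the short sketch below shows. The function name and the sample response times are hypothetical and are not results from the simulations.

```python
def improvement_ratio(tr_rdlba, tp_paclba):
    """Relative improvement (TR - TP) / TR of PACLBA over RDLBA."""
    if tr_rdlba <= 0:
        raise ValueError("the RDLBA mean response time must be positive")
    return (tr_rdlba - tp_paclba) / tr_rdlba


# Hypothetical mean response times in seconds (not measured values).
ratio = improvement_ratio(tr_rdlba=10.0, tp_paclba=8.0)
print(f"improvement ratio: {ratio:.0%}")
```

A positive ratio means the PACLBA response time is lower than the RDLBA one; a ratio of zero means both algorithms perform the same.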

Fig. 4. Improvement ratio in mean job response time
Figs. 5 and 6 illustrate the mean utilization and mean square deviation of processing nodes for various grid workloads using RDLBA and PACLBA respectively. These figures show that the average processing node utilization (mean square deviation) obtained using the two algorithms increases (decreases) as the grid workload increases. However, the utilization (mean square deviation) of processing nodes under the PACLBA is always higher (lower) than that under the RDLBA, which means that the PACLBA performs better than the RDLBA, since a low value of mean square deviation means that a good load balancing level is obtained [24]. This confirms the results presented earlier in Figs. 3 and 4. Fig. 7 presents the load balancing level for various grid workloads using RDLBA and PACLBA. It shows that the load balancing level obtained using the PACLBA is higher than that of the RDLBA in all cases, which again confirms the previously presented results. Taken together, the presented results indicate that the PACLBA performs more robustly than the RDLBA.

CONCLUSIONS AND FUTURE WORK
This paper presents an ant colony load balancing technique that selects a suitable domain manager for executing jobs in a computational grid infrastructure. The suggested algorithm takes the heterogeneity of the computing resources into consideration. It selects a domain manager to execute a job according to the estimated task transfer time and the anticipated processing time of the task if it is allocated to that domain manager. The PACLBA balances the grid workload using global and local pheromone update procedures.
To evaluate the performance of the PACLBA, a simulation model is built using the GridSim simulator. The performance of the proposed technique is compared with that of the RDLBA. The obtained results indicate that the PACLBA improves the average task response time in all cases. The enhancement ratio rises steadily as the system load rises. This increase continues until the system load becomes moderate, where the highest enhancement ratio is attained; the enhancement ratio then declines steadily as the system load rises until the system's saturation point is reached.
In the future, we will study the reliability of PACLBA by examining some fault tolerance metrics. In addition, because the proposed algorithm deals only with independent tasks, extending the PACLBA to handle dependent tasks by adding a synchronization mechanism could be studied.