PSO Optimization Algorithm for Task Scheduling on the Cloud Computing Environment

Cloud computing is a recent computing paradigm in which IT services are provided and delivered over the Internet on demand. The scheduling problem in cloud computing environments has received a lot of attention, because application tasks must be mapped to the available resources to achieve good results. One of the main existing algorithms for scheduling tasks on the available cloud resources is based on Particle Swarm Optimization (PSO). In this PSO algorithm, the application's tasks are allocated to the available resources so as to minimize the computation cost only. In this paper, a modified PSO algorithm is introduced and implemented for solving the task scheduling problem in the cloud. The main idea of the modified PSO is that tasks are allocated to the available resources so as to minimize the execution time in addition to the computation cost. This algorithm is called Modified Particle Swarm Optimization (MPSO). The MPSO is evaluated using different time and cost parameters, and their effects on performance measures such as utilization, speedup, and efficiency are examined. According to the implementation results, the MPSO algorithm outperforms the existing PSO.


INTRODUCTION
Cloud computing allows users to use the computational resources and services of data centers (i.e., machines, network, storage, operating systems, application development environments, application programs) over the network to deploy and evaluate their applications [1]. This means that cloud computing provides self-service provisioning, which is considered one of its important features [2]. Cloud computing services are divided into three layers: SaaS (Software as a Service), PaaS (Platform as a Service), and IaaS (Infrastructure as a Service) [3]. The cloud computing architecture is categorized by layers, service model, and deployment model (types). With this classification, users can easily choose the cloud services and types that fit their business [1,3]. Nowadays, it is important for services (resources, applications, ...) to be accessible through the cloud environment because of the benefits of cloud computing, such as cost savings and service availability at any time. On the other hand, although cloud computing has emerged as a commercial reality in the field of information technology, the technology is still not fully developed [4]. Some topics still need attention, such as resource management and task scheduling.
The work in this paper is concerned with the task scheduling problem: minimizing the computation cost and the total execution time of applications that use resources provided by cloud service providers, such as Amazon and GoGrid. We achieve this by introducing a modified Particle Swarm Optimization (MPSO) algorithm.
Particle Swarm Optimization (PSO) is a swarm-based intelligence algorithm inspired by the social behavior of animals, introduced by Kennedy and Eberhart [5]. Each particle has a position and a velocity. The position of a particle at any instant is influenced by its personal best position (pbest) and by the position of the best particle in the global problem space (gbest). The performance of a particle is measured by a fitness value, which depends on the problem specification.
Scheduling is the method by which threads, processes, tasks, or data flows are given access (mapped) to system resources (e.g., processor time, communications bandwidth, utilization of the system) according to the users' requirements [6]. A good scheduling algorithm is important because most modern systems need to perform multitasking (executing more than one process at a time) and multiplexing (transmitting multiple flows simultaneously) [7,8].
The rest of the paper is organized as follows. Section 2 reviews the conducted studies. The task-resource scheduling problem formulation is discussed in Section 3. The principle of the proposed task scheduling algorithm (MPSO) is discussed in Section 4. Section 5 provides the contribution of our proposed algorithm. Finally, Sections 6 and 7 present the conclusions and future propositions.

CONDUCTED STUDIES
Yun Yang, Ke Liu, and Jinjun Chen [9] proposed an innovative transaction-intensive cost-constrained scheduling algorithm that considers both cost and time. Their simulation results demonstrated that this algorithm can achieve lower cost than others while meeting the user-designated deadline. Suraj Pandey et al. [10] proposed a task scheduling heuristic that optimizes the cost of task-resource mapping using the particle swarm optimization (PSO) technique. The PSO-based mapping algorithm has a much lower cost than another algorithm called BRS (Best Resource Selection) based mapping. Their results show that PSO can achieve: a) as much as 3 times cost savings compared to BRS, and b) a good distribution of workload onto resources.
Ke Liu et al. [11] presented a novel compromised-time-cost (CTC) scheduling algorithm. The CTC algorithm considers the characteristics of cloud computing to accommodate instance-intensive cost-constrained workflows by compromising between execution time and cost, both of which can be adjusted by user input on the fly. The simulation results demonstrated that the CTC algorithm can achieve lower cost while meeting the user-designated deadline, or reduce the mean execution time within the user-designated execution cost. Saeed Parsa and Reza Entezari-Maleki [11] proposed a new task scheduling algorithm called the Resource-Aware-Scheduling Algorithm (RASA). It is composed of two traditional scheduling algorithms, Max-min and Min-min. The main feature of RASA is that it combines the advantages of Max-min and Min-min and alleviates their disadvantages. However, the deadline of each task, the arrival rate of the tasks, the cost of task execution on each resource, and the cost of communication are not considered. The experimental results show that RASA outperforms the existing scheduling algorithms in large-scale distributed systems. J. Huang [12] proposed a workflow task scheduling algorithm based on the genetic algorithm (GA) model in the cloud computing environment, which can fulfill the goals of workflow task scheduling. Based on an algebraic analysis and on different population-size settings, the author reported that the proposed algorithm improves the efficiency of task scheduling and can satisfy the users' QoS (Quality of Service) requirements to the greatest extent possible.
Lei Zhang et al. [13] proposed a PSO algorithm that is similar to genetic algorithms (GA). Its aim is to improve the efficiency of resource allocation while minimizing the completion time. They noted that PSO usually takes less time to accomplish the various scheduling tasks and gives better results than the GA. They also showed that the PSO algorithm performs better on large-scale optimization problems.
July 14, 2014
Cui Lin and Shiyong Lu [14] proposed the Scalable Heterogeneous Earliest-Finish-Time (SHEFT) workflow scheduling algorithm to schedule a workflow elastically in a cloud computing environment. The experimental results show that SHEFT not only outperforms several representative workflow scheduling algorithms in optimizing workflow execution time, but also enables resources to scale elastically at runtime.
Visalakshi and Sivanandam [15] presented a Hybrid Particle Swarm Optimization (HPSO) method for solving the Task Assignment Problem (TAP). The algorithm was developed to dynamically schedule heterogeneous tasks onto heterogeneous processors in a distributed setup. The HPSO yields better results than the normal PSO when applied to the task assignment problem. The results of PSO and HPSO are also compared with another popular heuristic optimization technique, the Genetic Algorithm (GA). The results indicate that PSO performs better than the GA.
S. Selvarani and G. Sudha Sadhasivam [16] proposed an improved cost-based scheduling algorithm for efficiently mapping tasks to the available resources in the cloud. The improvement over traditional activity-based costing is a new task scheduling strategy for cloud environments, where there may be no relation between the overhead application base and the way that different tasks cause overhead costs of resources in the cloud. The algorithm divides all user tasks into three different lists according to the priority of each task. It measures both resource cost and computation performance, and it also improves the computation/communication ratio.
Yang et al. [17] highlighted the issue of job scheduling in cloud computing. They argued that there is no well-defined job scheduling algorithm for the cloud that considers the future state of the system. The existing job scheduling algorithms under the utility computing paradigm do not take hardware/software failure and recovery in the cloud into account. To tackle this issue, they proposed a Reinforcement Learning (RL) based algorithm that helps the scheduler make fault-tolerant scheduling decisions while maximizing the utilities attained in the long term.

TASK-RESOURCE SCHEDULING PROBLEM FORMULATION
In the task scheduling problem, the application is represented as a Directed Acyclic Graph (DAG) in which nodes (tasks) represent the required computation and edges represent the communication between tasks. Each node in the DAG is assigned a weight corresponding to its computation cost, and each edge is assigned a weight corresponding to the communication cost between the nodes it connects [18].
Here, {T1, ..., Tn} is the set of tasks, and E represents the data dependencies between these tasks, where Fj,k = (Tj, Tk) ∈ E means that data is produced by Tj and consumed by Tk (see Fig. 1(a)) [10].
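As an illustration of this representation, the sketch below encodes the five-task workflow described later in the experimental section (T1 feeds T2, T3, and T4, which all feed T5) as a dictionary of weighted edges and derives one dependency-respecting execution order. The data size `X`, the helper names, and the use of Python's standard `graphlib` module (Python 3.9 or later) are assumptions made for illustration and are not part of the original implementation.

```python
from graphlib import TopologicalSorter

X = 64  # hypothetical output data size of T1, in MB

# Edge weights e[k1, k2]: amount of data produced by k1 and consumed by k2.
edges = {
    ("T1", "T2"): X, ("T1", "T3"): X, ("T1", "T4"): X,
    ("T2", "T5"): X, ("T3", "T5"): X, ("T4", "T5"): X,
}

def predecessors(edges):
    """Map every task to the set of tasks whose output it consumes."""
    preds = {}
    for src, dst in edges:
        preds.setdefault(src, set())
        preds.setdefault(dst, set()).add(src)
    return preds

def ready_order(edges):
    """Return one execution order that respects all data dependencies."""
    return list(TopologicalSorter(predecessors(edges)).static_order())

order = ready_order(edges)
```

In any order returned by `ready_order`, T1 comes first and T5 comes last, since T5 depends on the outputs of T2, T3, and T4.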
Consider a set of storage sites S = {1, ..., i}, a set of compute sites PC = {1, ..., j}, and a set of tasks T = {1, ..., k}. The 'average' computation time of a task Tk on a compute resource PCj for a certain size of input is assumed to be known. The cost of computation of a task on a compute host is then inversely proportional to the time it takes for computation on that resource. It is also assumed that the cost of unit data access di,j from a resource i to a resource j is known. The access cost is fixed by the service provider (e.g., Amazon CloudFront). The transfer cost can be calculated according to the bandwidth between the sender and receiver sites. The cost of transferring a unit of data between sites per second is one of the task scheduling issues considered here. These costs are non-negative, symmetric, and satisfy the triangle inequality; that is, di,j = dj,i for all i, j ∈ N, and di,j + dj,k ≥ di,k for all i, j, k ∈ N (see Fig. 1(b)) [10].
Given an application DAG with a set of tasks T = {1, ..., k}, a set of storage sites S = {1, ..., i}, and a set of compute sites PC = {1, ..., j}, the problem can be stated as: "Find a task-resource mapping instance M, such that estimating the total cost and the total time for each compute resource PCj, the highest cost and also highest time among all the compute resources is minimized and load balance is achieved." [10].
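The three assumptions on the unit data access costs (non-negative, symmetric, and satisfying the triangle inequality) can be checked mechanically for any candidate cost matrix. The following is a minimal sketch; the function name and the numeric values in `d` are hypothetical and serve only as an example.

```python
from itertools import product

def is_valid_cost_matrix(d):
    """Check that d is non-negative, symmetric, and obeys the triangle inequality."""
    idx = range(len(d))
    non_negative = all(d[i][j] >= 0 for i, j in product(idx, idx))
    symmetric = all(d[i][j] == d[j][i] for i, j in product(idx, idx))
    triangle = all(d[i][j] + d[j][k] >= d[i][k]
                   for i, j, k in product(idx, idx, idx))
    return non_negative and symmetric and triangle

# Hypothetical unit-transfer costs between three sites (illustrative values only).
d = [
    [0.00, 0.17, 0.21],
    [0.17, 0.00, 0.22],
    [0.21, 0.22, 0.00],
]
```

A matrix that passes this check can be used as di,j in the cost model of the next subsection.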

Cost Minimization Problem
The goal is to assign the tasks to the available compute resources so as to minimize the total cost of computation and the total completion time of an application. The cost is minimized such that the application completes within the time (deadline) that a user specifies. The cost is determined by Eqs. 1 to 5 [10].
Cexe(M)j denotes the total cost of all the tasks assigned to a compute resource PCj (Eq. 1). This value is computed by adding the node weights (the cost of execution of a task k on compute resource j) of all tasks assigned to each resource in the mapping M.
Ctx(M)j is the total access cost (including transfer cost) between tasks assigned to a compute resource PCj and those that are not assigned to that resource in the mapping M (Eq. 2). This value is the product of the output file size (given by the edge weight ek1,k2) from a task k1 to a task k2 and the cost of communication from the resource where k1 is mapped (M(k1)) to another resource where k2 is mapped (M(k2)). The average cost of communication of unit data between two resources is given by dM(k1),M(k2). The cost of communication applies only when two tasks have a file dependency between them, that is, when ek1,k2 > 0. For two or more tasks executing on the same resource, the communication cost is zero.
For a given assignment M, the total cost Ctotal(M)j for a compute resource PCj is the sum of the execution cost and the transfer cost (Eq. 3). The total cost over all the assignments is then dominated by the highest cost of a compute resource (Eq. 4), which ensures that not all tasks are mapped to a single compute resource. Hence, the goal of the assignment is to minimize this cost (Eq. 5).
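The displayed forms of Eqs. 1 to 5 are not reproduced in the text above. A reconstruction that follows the prose descriptions term by term is sketched below. The symbol w_{k,j} for the node weight (cost of executing task k on resource j) and the name Cost(M) are assumed notation, so this should be read as one consistent reading of the model rather than as the exact typeset equations of [10].

```latex
\begin{align}
C_{exe}(M)_j   &= \sum_{k:\, M(k)=j} w_{k,j} \tag{1}\\
C_{tx}(M)_j    &= \sum_{k_1:\, M(k_1)=j}\ \sum_{k_2:\, M(k_2)\neq j}
                  d_{M(k_1),M(k_2)}\, e_{k_1,k_2} \tag{2}\\
C_{total}(M)_j &= C_{exe}(M)_j + C_{tx}(M)_j \tag{3}\\
Cost(M)        &= \max_{j \in PC}\; C_{total}(M)_j \tag{4}\\
               &\ \text{Minimize } Cost(M) \text{ over all mappings } M \tag{5}
\end{align}
```

In Eq. 2 the term e_{k_1,k_2} is zero whenever the two tasks have no file dependency, and pairs mapped to the same resource are excluded from the sum, which matches the two conditions stated in the text.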

Time Minimization Problem
In our MPSO algorithm, the total time of task execution is introduced as a second parameter to be minimized besides the cost. The goal of this modification is to assign the tasks to the compute resources such that the time of computation is minimized. The time is determined using the following equations [19]:
... Completion time of (pu) } (11)
According to Eqs. 7 to 11, the dependency between each task and the previous tasks is checked. If there is a dependency between the tasks, the cost of communication of unit data between the two resources is given by dM(k1),M(k2), which applies only when two tasks have a file dependency between them, that is, when ek1,k2 > 0. For two or more tasks executing on the same resource, the communication cost is zero. The product of the communication cost and the file dependency is calculated, and the result is added to the finish time of the previous tasks on resource j to obtain Cdata(M)j (Eq. 7).
CST(M)j is the start time of all tasks on resource j, calculated as the maximum of Cdata(M)j and the finish time of the previous tasks on the same resource j (Eq. 8).
CFT(M)j is the finish time of the tasks on resource j, calculated as the sum of Cexe(M)j (Eq. 1) and CST(M)j (Eq. 9); this is the total time.
The total time over all the assignments is then dominated by the highest time of a compute resource (Eq. 10), which ensures that not all tasks are mapped to a single compute resource. Hence, the goal of the assignment is to minimize this time (Eq. 11).
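Since the displayed forms of Eqs. 7 to 11 are not reproduced above, the block below sketches one possible reading of the prose. The symbols FT^{prev}_j (finish time of the previous tasks on resource j) and Time(M) are assumed notation, and the index set of the sum in Eq. 7 (producers mapped elsewhere, consumers mapped to j) is an interpretation that the text does not state explicitly.

```latex
\begin{align}
C_{data}(M)_j &= FT^{prev}_j + \sum_{k_1:\, M(k_1)\neq j}\ \sum_{k_2:\, M(k_2)=j}
                 d_{M(k_1),M(k_2)}\, e_{k_1,k_2} \tag{7}\\
C_{ST}(M)_j   &= \max\left( C_{data}(M)_j,\ FT^{prev}_j \right) \tag{8}\\
C_{FT}(M)_j   &= C_{exe}(M)_j + C_{ST}(M)_j \tag{9}\\
Time(M)       &= \max_{j \in PC}\; C_{FT}(M)_j \tag{10}\\
              &\ \text{Minimize } Time(M) \text{ over all mappings } M \tag{11}
\end{align}
```

Eqs. 10 and 11 mirror Eqs. 4 and 5 of the cost model, with the per-resource finish time taking the place of the per-resource total cost.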

THE MODIFIED PARTICLE SWARM OPTIMIZATION SCHEDULING ALGORITHM
In PSO, the population is a set of particles in a problem space. Particles are initialized randomly, and each particle has a fitness value, which is evaluated in each generation by the fitness function to be optimized. Each particle knows its best position pbest and the best position found so far among the entire group of particles, gbest. The pbest of a particle is the best result (fitness value) reached so far by that particle, whereas gbest is the best particle in terms of fitness in the entire population. The evaluation is carried out in a loop until the results converge or until a user-specified number of iterations (stopping criterion) is reached [13,20].
Each particle has a velocity, which directs its flight. In each iteration, the velocity and the position of the particles are updated according to Eqs. 12 and 13 [20], where W, c1, and c2 are positive constants that represent the weight of the previous velocity and the weights of the acceleration terms that pull each particle toward pbest and gbest, respectively [20]. The existing PSO task scheduling algorithm provides a mapping of all tasks to a set of given resources based on the model described in the following two algorithms [10].
Algorithm 2: PSO algorithm.
1. Set the particle dimension equal to the size of the ready tasks in {ti} ∈ T.
2. Initialize the particles' positions randomly from PC = 1, ..., j and the velocities vi randomly.
3. For each particle, calculate its fitness value with respect to Cost Minimization presented by Eq. 5.
4. If the fitness value is better than the previous best pbest, set the current fitness value as the new pbest.
5. After Steps 3 and 4 for all particles, select the best particle as gbest.
6. For all particles, calculate the velocity using Equation 12 and update their positions using Equation 13.
7. If the stopping criterion or the maximum number of iterations is not satisfied, repeat from Step 3.
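The seven steps of Algorithm 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Java implementation used in the experiments: the velocity and position updates use the standard PSO form (inertia weight `w`, acceleration weights `c1` and `c2`), continuous positions are rounded to resource indices, the transfer cost is charged to the resource of the producing task, and all numeric values and names (`total_cost`, `decode`, `pso_schedule`) are hypothetical.

```python
import random

def total_cost(mapping, exec_cost, edges, d):
    """Fitness in the spirit of Eq. 5: highest per-resource cost of a mapping."""
    cost = [0.0] * len(d)
    for k, j in enumerate(mapping):            # execution cost per resource
        cost[j] += exec_cost[k][j]
    for (k1, k2), size in edges.items():       # transfer cost, charged to sender
        a, b = mapping[k1], mapping[k2]
        if a != b:
            cost[a] += size * d[a][b]
    return max(cost)

def decode(position, n_res):
    """Round a continuous position to valid resource indices."""
    return [min(n_res - 1, max(0, int(round(x)))) for x in position]

def pso_schedule(exec_cost, edges, d, n_particles=25, iterations=20,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    n_tasks, n_res = len(exec_cost), len(d)
    # Steps 1-2: one dimension per task, random positions and velocities.
    pos = [[rng.uniform(0, n_res - 1) for _ in range(n_tasks)]
           for _ in range(n_particles)]
    vel = [[rng.uniform(-1, 1) for _ in range(n_tasks)]
           for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [float("inf")] * n_particles
    gbest, gbest_fit = pos[0][:], float("inf")
    for _ in range(iterations):                # Step 7: repeat from Step 3
        for i in range(n_particles):           # Steps 3-4: fitness and pbest
            f = total_cost(decode(pos[i], n_res), exec_cost, edges, d)
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
        g = min(range(n_particles), key=lambda i: pbest_fit[i])
        gbest, gbest_fit = pbest[g][:], pbest_fit[g]   # Step 5: gbest
        for i in range(n_particles):           # Step 6: velocity and position
            for t in range(n_tasks):
                r1, r2 = rng.random(), rng.random()
                vel[i][t] = (w * vel[i][t]
                             + c1 * r1 * (pbest[i][t] - pos[i][t])
                             + c2 * r2 * (gbest[t] - pos[i][t]))
                pos[i][t] += vel[i][t]
    return decode(gbest, n_res), gbest_fit

# Hypothetical instance: 5 tasks (T1..T5 as indices 0..4) and 3 resources.
exec_cost = [[1.2, 1.0, 1.5], [1.1, 1.3, 0.9], [1.4, 1.2, 1.0],
             [1.0, 1.1, 1.3], [1.5, 1.4, 1.2]]
edges = {(0, 1): 1.0, (0, 2): 1.0, (0, 3): 1.0,
         (1, 4): 1.0, (2, 4): 1.0, (3, 4): 1.0}
d = [[0.00, 0.17, 0.21], [0.17, 0.00, 0.22], [0.21, 0.22, 0.00]]

mapping, cost = pso_schedule(exec_cost, edges, d)
```

The default of 20 iterations follows the experimental setup reported later; the swarm size and the weights are arbitrary choices. Replacing `total_cost` with a completion-time function, or with a combination of both, is the hook where the modified fitness evaluation of Step 3 would go.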
The existing PSO algorithm is modified by using two fitness functions instead of one. The first fitness function minimizes the cost (as in the existing algorithm, Eq. 5), and the second minimizes the completion time, as given by Eq. 11. These two fitness functions are combined in different ways (i.e., AND, sequence, and Best-To-Best operations in Algorithm 2 of the existing PSO scheduling algorithm). This is implemented by replacing Step 3 of Algorithm 2 with one of the following five combinations:

(e) For each particle, calculate its fitness value with respect to Best Time Minimization presented by Eq. 11 TO Best Cost Minimization presented by Eq. 5.
These combinations have been implemented one after another to determine which combination produces good results.

SIMULATION AND ANALYSIS OF RESULTS
In this section, the experimental setup, the comparison, and the results are presented.

Experimental Environment
The modified task scheduling algorithm was implemented in the Java programming language using Eclipse, on an Intel(R) Core(TM)2 Duo CPU at 1.60 GHz with 2.50 GB of RAM. The experimental setup of the PSO algorithm uses 20 iterations and 30 executions.

Experimental Results
Three matrices are used: the PP-matrix, the TP-matrix, and the Data Size (DS) matrix. The values of the PP-matrix resemble the cost of unit data transfer between resources given by Amazon CloudFront [21]. It is assumed that PC1 is in the US, PC2 in Hong Kong (HK), and PC3 in Japan (JP). The PP-matrix values could be generated randomly for every repeated experiment, but they are kept constant throughout our MPSO task scheduling implementation. The values of the TP-matrix are taken from Amazon EC2's pricing policy for different classes of virtual machine instances [22]. Each task has its own DS matrix. The sum of all the values in the DS matrix varies according to the size of data (e.g., 64-1024 MB). According to Figure 1(a), if x is the output data size of task T1, then tasks T2, T3, and T4 each receive x data as input and produce x data as output. Finally, task T5 consumes 3x data and produces 6x data. These matrices are depicted in Table 1 [10].
By applying the time and cost fitness functions according to combination (a) of our MPSO algorithm, the results are depicted in Figure 2, and the performance parameters are presented in Table 2.
According to the results in Figure 2, applying the cost fitness function and then the time fitness function gives an average cost of 14.86 and an average time of 10.82.
By applying the time fitness function and then the cost fitness function according to combination (b) of our MPSO algorithm, the results are depicted in Figure 3, and the performance parameters are presented in Table 3.

Table 3. Performance parameters for each data size
According to the results in Figure 3, applying the time fitness function and then the cost fitness function gives an average cost of 12.05 and an average time of 9.76.
By applying the time and cost fitness functions according to combination (c) of our MPSO algorithm, the results are depicted in Figure 4, and the performance parameters are presented in Table 4.

Table 4. Performance parameters for each data size
According to the results in Figure 4, applying the cost and time fitness functions together gives an average cost of 6.20 and an average time of 7.04.
By applying the time and cost fitness functions according to combination (d) of our MPSO algorithm, the results are depicted in Figure 5, and the performance parameters are presented in Table 5.

Table 5. Performance parameters for each data size
According to the results in Figure 5, applying the best cost fitness function and then the best time fitness function gives an average cost of 6.15 and an average time of 7.25.
By applying the time and cost fitness functions according to combination (e) of our MPSO algorithm, the results are depicted in Figure 6, and the performance parameters are presented in Table 6.

Table 6. Performance parameters for each data size
According to the results in Figure 6, applying the best time fitness function and then the best cost fitness function gives an average cost of 6.15 and an average time of 6.98.
According to the experimental results (see Figures 2-6) for the five combinations of the time and cost fitness functions, defined as (a) to (e), the average cost and the average time are reduced and the performance measures (i.e., utilization, speedup, efficiency) are increased from one experiment to the next. Therefore, applying combination (e) in our MPSO algorithm produces good results with respect to minimizing both the total cost and the total computation time.
The implementation results of our MPSO using the five combinations (a-e), compared with the existing PSO algorithm, are presented in Figure 7.
With respect to combination (b):
Cost of our MPSO algorithm relative to the existing PSO algorithm = (12.05 / 46.176) * 100 = 26%, i.e., a cost reduction of about 74%.
With respect to combination (c):
Cost of our MPSO algorithm relative to the existing PSO algorithm = (6.2 / 46.176) * 100 = 13.4%, i.e., a cost reduction of about 86.6%.
With respect to combination (e):
Cost of our MPSO algorithm relative to the existing PSO algorithm = (6.15 / 46.176) * 100 = 13.3%, i.e., a cost reduction of about 86.7%.
According to the results in Fig. 7 and the computed cost reduction, our MPSO algorithm always outperforms the existing PSO algorithm.
Generally, considering the computation time of tasks in addition to the cost when allocating tasks to the available resources produces better results than considering the cost only.
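The percentages reported for combinations (b), (c), and (e) can be reproduced with a few lines of arithmetic. The sketch below assumes that 46.176 is the average cost of the existing PSO algorithm, as the formulas above suggest; the function and variable names are hypothetical. Note that the quotient itself is the cost of MPSO relative to PSO, so the saving is its complement.

```python
PSO_AVG_COST = 46.176  # assumed to be the average cost of the existing PSO

# Average MPSO costs reported for combinations (b), (c), and (e).
mpso_avg_cost = {"b": 12.05, "c": 6.20, "e": 6.15}

def relative_cost(mpso_cost, pso_cost=PSO_AVG_COST):
    """MPSO cost as a percentage of the PSO cost."""
    return 100.0 * mpso_cost / pso_cost

def cost_saving(mpso_cost, pso_cost=PSO_AVG_COST):
    """Percentage of the PSO cost that MPSO saves."""
    return 100.0 - relative_cost(mpso_cost, pso_cost)

for name, value in mpso_avg_cost.items():
    print(f"combination ({name}): relative cost {relative_cost(value):.1f}%, "
          f"saving {cost_saving(value):.1f}%")
```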

CONCLUSION
In this paper, a modified task scheduling heuristic based on Particle Swarm Optimization (PSO) is introduced and implemented. This modified algorithm is called MPSO. The aim of the modified PSO is to minimize the total cost and time of execution of application workflows in cloud computing environments, where the total cost of execution is obtained by varying the communication cost between resources and the execution time of compute resources. The main principle of our MPSO algorithm is that two fitness functions, cost and time, are introduced. According to the comparative results, our MPSO algorithm outperforms the existing PSO algorithm.