AN IMPLEMENTATION OF LOAD BALANCING ALGORITHM IN CLOUD ENVIRONMENT

Cloud computing is an emerging computing paradigm. It aims to share data, computation, and services transparently over a network of computing resources.


INTRODUCTION
Cloud computing promises to increase the speed with which applications are deployed, foster innovation, and lower costs, all while increasing business agility. Cloud computing is a model in which many computers are interconnected through a real-time network such as the Internet; it is, in essence, a form of distributed computing. It enables convenient, on-demand, dynamic, and reliable use of distributed computing resources. The cloud is changing our lives by providing users with new kinds of services, and users obtain these services without needing to understand the underlying details. Cloud computing is an on-demand service model in which shared resources work together to complete a task in the minimum possible time by distributing the dataset among all connected processing units. The term also refers to network-based services that give the illusion of real server hardware but are in fact simulated by software running on one or more real machines. Because such virtual servers do not exist physically, they can be scaled up and down at any time [1]. Cloud computing is high-utility software with the potential to transform the IT software industry and make software even more attractive [2]. Hence, it helps accommodate changes in demand and lets organizations avoid the capital costs of software and hardware [3], [4].
There are many problems prevalent in cloud computing [6], [7], such as:
- Ensuring appropriate access control (authentication, authorization, and auditing).
- Network-level migration, so that moving a job requires minimal cost and time.
- Providing proper security for data in transit and for data at rest.
- Data availability issues in the cloud.
- Legal quagmires and transitive trust issues.
- Data lineage and data provenance, and the possibility of inadvertent leakage of sensitive information.

The most prevalent of these is the problem of load balancing.

Necessity of Load Balancing
Load balancing is a computer networking method for distributing workloads across multiple computing resources, for example computers, a computer cluster, network links, central processing units, or disk drives. Load balancing aims to optimize resource use, maximize throughput, minimize response time, and avoid overloading any single resource. Using multiple components with load balancing, instead of a single component, can also increase reliability through redundancy.
Load balancing in the cloud differs from classical thinking on load-balancing architecture and implementation in that it uses commodity servers to perform the load balancing, because it is difficult to predict the number of requests that will be issued to a server. This opens new opportunities and economies of scale, while also presenting its own unique set of challenges. Load balancing is one of the central issues in cloud computing [8]. It is a mechanism that distributes the dynamic local workload evenly across all the nodes in the cloud, avoiding situations where some nodes are heavily loaded while others are idle or doing little work. It helps attain high customer satisfaction and a high resource utilization ratio, consequently improving the overall performance and resource utility of the system. It also ensures that every computing resource is used efficiently and fairly [9], and it prevents bottlenecks that may arise from load imbalance. When one or more components of a service stop working, load balancing keeps the service running by implementing failover, i.e., by provisioning and de-provisioning application instances without interruption. The emerging cloud computing model attempts to address the explosive growth of web-connected devices and to handle massive amounts of data [10] and client demands, which raises the question of whether our cloud model can balance the ever-increasing load effectively.
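The even-distribution goal described above can be made concrete with a small sketch. The following is an illustrative least-loaded dispatcher, not any production balancer's actual design; all class and method names are invented for this example:

```java
import java.util.*;

// Hypothetical sketch of the balancing goal above: always send the next
// request to the node with the fewest in-flight requests, so no node sits
// idle while another is heavily loaded.
class LeastLoadedBalancer {
    private final Map<String, Integer> activeRequests = new HashMap<>();

    LeastLoadedBalancer(List<String> nodes) {
        for (String n : nodes) activeRequests.put(n, 0);
    }

    // Pick the node currently carrying the fewest requests.
    String assign() {
        String best = Collections.min(activeRequests.entrySet(),
                Map.Entry.comparingByValue()).getKey();
        activeRequests.merge(best, 1, Integer::sum);
        return best;
    }

    // Called when a node finishes a request (failover and de-provisioning
    // would hook in here in a fuller design).
    void complete(String node) {
        activeRequests.merge(node, -1, Integer::sum);
    }

    int load(String node) { return activeRequests.get(node); }
}
```

With two nodes and three requests, the dispatcher never leaves one node empty while the other holds all three, which is precisely the imbalance the text warns against.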

PROS OF RUNNING SIMULATION
Use of cloud computing is increasing at a very fast pace everywhere because it turns capital expenditure into operational cost. In addition, simulation tools are considered a better option than experimenting on a real cloud, since performing experiments in a controlled and repeatable environment on real infrastructure is difficult and costly [2]. Moreover, studying effective resource utilization directly on a real cloud is not practical, so we turn to cloud simulation tools. The advantages of running simulation tools for the cloud are:
a. No capital cost involved: As discussed earlier, cloud computing shifts capital expenditure to operational cost; a cloud simulation tool likewise involves no installation or maintenance cost.
b. Leads to better results: Such tools make it easy to change inputs and other parameters, which yields better and more efficient output.
c. Evaluation of risks at an early stage: Because simulation tools cost nothing to run, unlike being on a real cloud, the user can identify and resolve any risk associated with the design or with any parameter.
d. Easy to learn: To work with such simulation tools the user needs only basic programming ability; if the user is well versed in the language, the tools pose no problem [3].

CLOUD SIMULATION TOOLS
There are various simulation tools for the cloud, some of which are as follows:

A. CloudSim
Analyzing performance and policies on a real cloud is difficult because of its changing nature, so in such situations we can opt for CloudSim. CloudSim is a well-known toolkit for the simulation of cloud scenarios [4]. It was developed as part of the CloudBus project at the University of Melbourne, Australia [4], and supports system and behavior modeling of cloud system components such as data centers, virtual machines (VMs), and resource provisioning policies. It implements generic application provisioning techniques that can be extended with ease and limited effort. CloudSim gives users proper insight into cloud scenarios and lets researchers focus on specific system design issues without getting concerned with the low-level details of cloud-based infrastructures and services [5], [7]. At its lowest level, CloudSim is built on SimJava, a toolkit for building working models of complex systems around a discrete-event simulation kernel; SimJava also includes facilities for representing simulation objects as animated icons on screen [7], [8].
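The discrete-event kernel mentioned above is the heart of SimJava and hence of CloudSim. Its principle can be illustrated with a minimal kernel that keeps a time-ordered event queue and advances a simulation clock; the names here are illustrative, not CloudSim's or SimJava's actual API:

```java
import java.util.PriorityQueue;

// Illustrative discrete-event kernel in the spirit of SimJava: events are
// ordered by timestamp and processed one at a time while a simulation
// clock jumps from event to event.
class EventKernel {
    static class Event implements Comparable<Event> {
        final double time;
        final Runnable action;
        Event(double time, Runnable action) { this.time = time; this.action = action; }
        public int compareTo(Event o) { return Double.compare(time, o.time); }
    }

    private final PriorityQueue<Event> queue = new PriorityQueue<>();
    private double clock = 0.0;

    void schedule(double time, Runnable action) { queue.add(new Event(time, action)); }

    // Process all pending events in time order; returns the final clock value.
    double run() {
        while (!queue.isEmpty()) {
            Event e = queue.poll();
            clock = e.time;   // the clock advances directly to the event time
            e.action.run();
        }
        return clock;
    }
}
```

Events scheduled out of order still execute in timestamp order, which is what allows a simulator to model data centers and VMs without real elapsed time.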

B. CDOSim
CDOSim is a cloud deployment option (CDO) simulator that can simulate the response times, SLA violations, and costs of a CDO. A CDO comprises decisions about the choice of cloud provider, specific runtime adaptation strategies, the deployment of components to virtual machine instances, and the configuration of those instances. Component deployment to virtual machine instances includes the possibility of forming new components from already existing ones; virtual machine instance configuration refers to the instance type of the virtual machine instances. CDOSim can simulate cloud deployments of software systems that were reverse-engineered into KDM models, and it represents the user's rather than the provider's perspective. It allows the integration of fine-grained models and is well suited for comparing runtime reconfiguration plans or for determining the trade-off between costs and performance [16]. CDOSim is designed to address the major shortcomings of other existing cloud simulators:
1. It is oriented towards the cloud user's perspective instead of exposing fine-grained internals of a cloud platform.
2. It mitigates the cloud user's lack of knowledge and control concerning the cloud platform's structure.
3. Simulation is independent of concrete programming languages, provided appropriate KDM extractors exist for the particular language.
4. Workload profiles from production monitoring data can be used to replay actual user behavior when simulating CDOs.
C. MDCSim
MDCSim is another data center simulation tool. It helps the user analyze and predict hardware-related parameters of data centers, such as those of servers, switches, and routers. It is also used predominantly because of the low overhead it produces [4].
D. SPECI
SPECI (Simulation Program for Elastic Cloud Infrastructures) analyzes the scalability and performance aspects of future data centers [9]. It is assumed that as data centers grow large, they do so in a non-linear fashion, so there is a need to analyze the behaviour of such data centers; this is where SPECI plays its role.
E. NetworkCloudSim
NetworkCloudSim is an extension of CloudSim that implements a network layer in CloudSim: it reads a BRITE file and generates a topological network. The topology file contains the number of nodes along with the various entities involved in the simulation [4]. In this simulation tool, each entity must be mapped to a single BRITE node for NetworkCloudSim to work properly. NetworkCloudSim can be used to simulate network traffic in CloudSim. These are some of the major cloud simulation tools in use today.

METHODOLOGY
In cloud computing, the platform, computing power, and software can all be consumed as services. It is a form of utility computing in which customers need not own the necessary infrastructure and pay only for what they use. The computing resources are delivered as virtual machines. In such a scenario, task scheduling algorithms play an important role, with the aim of scheduling tasks effectively so as to reduce turnaround time and improve resource utilization. Load balancing in cloud computing mainly impacts the performance of the file system, and a load-balancing technique improves the file system's efficiency. This work mainly concerns better load balancing and cloud partitioning under different situations. The goal is to develop a file system that can execute N jobs on processors in less time while doing more work. A time-sharing approach helps balance the load of a number of jobs across processors and helps allocate each job to a processor that can execute it according to its capacity, which results in lower waiting time for the jobs. The time-sharing technique then executes the jobs allocated by the job-sharing technique, producing lower response times than existing file systems. The space-sharing technique additionally allows a job to be split across different processors: if one processor cannot fulfil the requirements of the job, the job is split across several processors, so that it is executed in less time.
In the workload model, all tasks of a job have equal service demand: the job's cumulative service demand is divided into the maximum number of tasks, each with the minimum demand. This workload shows the advantage of the space-sharing policy.
I) Job Selection: The job selection policy is used to select jobs from the queue. The global scheduler holds the jobs in the queue, and the scheduling policy determines the manner in which a job is taken from it.
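The job selection policy above can be sketched minimally. The source does not name a specific ordering, so the version below assumes plain first-come-first-served selection from the global queue; the class and field names are invented for illustration:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the job selection policy described above: the global scheduler
// holds jobs in a queue, and the selection policy decides which job to pull
// next. This version assumes first-come-first-served; other orderings could
// be plugged in the same way.
class GlobalScheduler {
    static class Job {
        final String id;
        final int serviceDemand; // tasks have equal demand, as in the workload model
        Job(String id, int serviceDemand) { this.id = id; this.serviceDemand = serviceDemand; }
    }

    private final Deque<Job> queue = new ArrayDeque<>();

    void submit(Job j) { queue.addLast(j); }

    // FCFS selection: take the job that has waited longest; null if empty.
    Job selectNext() { return queue.pollFirst(); }
}
```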
The algorithm adds a clustering step so as to divide VMs with similar capacities into groups. K-means clustering is used to divide the VMs into clusters. The load balancer maintains a list of all the clusters with the minimum and maximum resource-specific capacities of each cluster; this is the range specifier list. The load balancer also maintains the list of VMs for each cluster. The approach is dynamic, centralized, and heterogeneous in nature, and considers resource-specific demands. It reduces the overhead of scanning the entire list of VMs from the beginning.
Step 1: Initialize all VMs with their specific resource types, the capacity of each resource, and the status of each VM.
Step 2: Cluster the n VMs into k clusters using K-means clustering over the three resource types as parameters, i.e. CPU processing speed, memory, and network bandwidth.
Step 3: The cloud controller receives a new request.
Step 4: The cloud controller queries the appropriate node controller/load balancer for the next allocation.
Step 5: The load balancer scans the range specifier list of the k clusters to see which cluster can handle the incoming request.
Step 6: The load balancer then assigns the request to the appropriate VM of the chosen cluster by looking into the list of cluster members for one that matches the specific demands of the task and whose status is AVAILABLE. If more than one VM qualifies, the first one found gets the task.
Step 7: The remaining resource quantities of that VM in the cluster's VM list are then updated.
Step 8: The status of that VM is changed from AVAILABLE to BUSY.
Step 9: When the VM finishes processing the request, its status is changed back to AVAILABLE.
Step 10: The load balancer also updates the capacity of that VM in the VM capacities list.
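Steps 3 to 10 above can be sketched as a small assignment routine. For brevity this sketch tracks only the CPU dimension and a binary status; all class and field names are illustrative, not taken from any real framework:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of Steps 3-10: scan the range specifier list of clusters, pick one
// whose capacity range covers the request, assign the first AVAILABLE VM in
// it, and flip its status to BUSY until the request completes.
class ClusterBalancer {
    enum Status { AVAILABLE, BUSY }

    static class Vm {
        final int cpu;                   // capacity, e.g. in MIPS
        Status status = Status.AVAILABLE;
        Vm(int cpu) { this.cpu = cpu; }
    }

    static class Cluster {
        final int minCpu, maxCpu;        // one entry of the range specifier list
        final List<Vm> vms = new ArrayList<>();
        Cluster(int minCpu, int maxCpu) { this.minCpu = minCpu; this.maxCpu = maxCpu; }
    }

    final List<Cluster> clusters = new ArrayList<>();

    // Steps 5-8: find a covering cluster, then the first available VM in it.
    Vm assign(int cpuDemand) {
        for (Cluster c : clusters) {
            if (cpuDemand < c.minCpu || cpuDemand > c.maxCpu) continue;
            for (Vm vm : c.vms) {
                if (vm.status == Status.AVAILABLE && vm.cpu >= cpuDemand) {
                    vm.status = Status.BUSY;   // Step 8
                    return vm;
                }
            }
        }
        return null; // no cluster can handle the request
    }

    // Step 9: the VM finishes processing and becomes available again.
    void release(Vm vm) { vm.status = Status.AVAILABLE; }
}
```

Because only the clusters whose range covers the demand are examined, the balancer avoids scanning the entire VM list, which is the overhead reduction the text claims.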
The Euclidean distance formula has been chosen to assign VMs to the clusters. The value of k, i.e. the number of clusters, is chosen as the highest prime factor of n, where n is the number of VMs. The formula for calculating the Euclidean distance is:

EUD(VM_i, C_j) = sqrt[ (CPU_i - CPU_j)² + (Mem_i - Mem_j)² + (BW_i - BW_j)² ]
The new mean of a cluster, when a machine is assigned to it, is computed as follows:

• CPU_j = (CPU_i + CPU_j) / 2
• Mem_j = (Mem_i + Mem_j) / 2
• BW_j = (BW_i + BW_j) / 2
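The two formulas above transcribe directly into code. The following sketch (names are illustrative) computes the Euclidean distance of a VM from a cluster centroid over the three resource dimensions, and applies the pairwise-mean update when the VM joins the cluster:

```java
// Direct transcription of the formulas above: EUD(VM_i, C_j) over
// (CPU, Mem, BW), and the pairwise-mean centroid update applied when a
// VM is assigned to the cluster.
class Centroid {
    double cpu, mem, bw;
    Centroid(double cpu, double mem, double bw) { this.cpu = cpu; this.mem = mem; this.bw = bw; }

    // EUD(VM_i, C_j) = sqrt((CPU_i-CPU_j)^2 + (Mem_i-Mem_j)^2 + (BW_i-BW_j)^2)
    double distanceTo(double cpuI, double memI, double bwI) {
        return Math.sqrt(Math.pow(cpuI - cpu, 2)
                       + Math.pow(memI - mem, 2)
                       + Math.pow(bwI - bw, 2));
    }

    // New centroid after assignment: the mean of the old centroid and the VM,
    // exactly as in the update rules above.
    void absorb(double cpuI, double memI, double bwI) {
        cpu = (cpuI + cpu) / 2;
        mem = (memI + mem) / 2;
        bw  = (bwI + bw) / 2;
    }
}
```

For example, a centroid at (1000, 512, 100) absorbing a VM with capacities (2000, 1024, 200) moves to (1500, 768, 150).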
The Most-fit policy is used to select the cluster. It reduces resource fragmentation by choosing the cluster that wastes the fewest processors, while taking the other jobs in the queue into account. For each cluster that has enough processors for the waiting job, the file system performs a series of simulated allocations to measure how many immediate subsequent allocations could follow the allocation decision. After every cluster has been checked, the file system selects the cluster with the largest number of immediate subsequent allocations to perform the current job allocation. If no single site has enough free processors, multi-site co-allocation is used: this policy tries to run a parallel job across several sites. The number of iterations needed to create the clusters increases with the number of clusters. In the K-means clustering process, the centroid is calculated as the mean of the bandwidth, MIPS, and RAM of the virtual machines in the cluster. The centroid shifts its position in every iteration, so iterations continue until saturation is reached. The number of iterations for different numbers of clusters is given in Table 2.