A NOVEL APPROACH OF OPTIMIZING PERFORMANCE USING K-MEANS CLUSTERING IN CLOUD COMPUTING

Cloud computing is distributed computing in which data is stored, shared and accessed over the Internet. It provides users with a pool of shared resources on a pay-as-you-go basis, meaning that users pay only for the services they use, according to their access times. Load balancing distributes the workload among multiple nodes and ensures that no single node is overloaded. It helps to improve system performance and resource utilization. We propose an improved load balancing algorithm for job scheduling in the cloud environment that applies K-Means clustering to both cloudlets and virtual machines. All the cloudlets submitted by the user are divided into 3 clusters depending on the client's priority, the cost and the instruction length of the cloudlet. The virtual machines inside the datacenter hosts are also grouped into multiple clusters depending on virtual machine capacity in terms of processor, memory, and bandwidth. Sorting is applied at both ends to reduce latency. A number of experiments have been conducted with different configurations of cloudlets and virtual machines. Parameters such as waiting time, execution time, turnaround time and usage cost have been computed in the CloudSim environment to demonstrate the results. According to the experimental results, the improved load balancing algorithm outperforms the other job scheduling algorithms it was compared with.


INTRODUCTION
Cloud computing combines many computing fields and has gained much popularity in recent years. It provides computing, storage, services, and applications over the Internet. Moreover, cloud computing helps to reduce capital cost, decouples services from the underlying technology, and provides flexibility in resource provisioning. Cloud computing has become very beneficial for business services, applications and other types of consumer requirements. Very large enterprises are moving their hardware and infrastructure to the cloud, and for that reason most of them have already begun consolidating their IT operations and virtualization technologies on the cloud [1].
Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. Cloud computing promises to take enterprise business to a new level by allowing enterprises to cut costs through improved productivity, reduced administration, infrastructure and architecture costs, and quicker deployment cycles.
Cloud computing is a type of computing that relies on sharing computing resources rather than having local servers or personal devices handle applications. In cloud computing, the word cloud (also phrased as "the cloud") is used as a metaphor for "the Internet", so the phrase cloud computing means "a type of Internet-based computing", where different services such as servers, storage and applications are delivered to an organization's computers and devices through the Internet. Cloud computing is comparable to grid computing, a type of computing where unused processing cycles of all computers in a network are harnessed to solve problems too intensive for any stand-alone machine. Cloud computing is also a term that describes the infrastructure, platform, services and different kinds of applications. As a platform, it reconfigures servers or applications, where a server can be a physical machine or a virtual machine. Cloud computing differs from traditional computing paradigms because it is a customizable, scalable, encapsulated, abstract entity that delivers different levels of services and processes to clients, is driven by economies of scale, and offers services that are dynamically and fully configurable [2].

RELATED WORK
Many researchers have worked on load balancing in cloud computing to enhance the overall performance of clouds. Some of this work improves traditional mechanisms to achieve the objective of load balancing. The main contributions are reviewed here to give context for the work that follows.
Al-Rayis et al. [1] explain that load balancers can be deployed using three different architectures. One of these is the centralized load balancing architecture, which includes a central load balancer that decides, for the entire system, which cloud resource should take which workload and based on which algorithm(s).
Bhoi et al. [2] discussed an enhanced Max-Min task scheduling algorithm for cloud computing, which helps to supply high-performance computing based on protocols that allow shared computation and storage over long distances.
Bhadani et al. [3] proposed a Central Load Balancing Policy for Virtual Machines (CLBVM) that balances the load evenly in a distributed virtual machine/cloud computing environment.
Bendiab et al. [4] introduced the MapReduce-based Entity Resolution load balancing technique, which is designed for large datasets. In this technique, two main tasks are performed: the Map task and the Reduce task, both of which the author describes.
Birattari et al. [5] proposed addressing the load balancing problem in cloud computing using Stochastic Hill Climbing.
Buzato et al. [6] proposed the Bee Life algorithm for scheduling in cloud computing. The Bee Life algorithm is inspired by the reproduction of bees and their behavior when searching for food sources. The algorithm evaluates the performance of the resources and aims to reduce the time and complexity of the work.
Babu et al. [7] proposed a Honey Bee Behavior inspired Load Balancing (HBB-LB) technique, which helps to achieve even load balancing across virtual machines to maximize throughput. It considers the priority of the tasks waiting in the queue for execution on the virtual machines.
Dorigo et al. [8] proposed a load balancing technique, based on soft computing, that uses a colony of cooperating ant agents to solve the optimization problem. This technique solves the problem with high probability. It is a simple loop that moves in the direction of increasing value, that is, uphill, and it makes minor changes to the original assignment according to some criteria.
Deldari et al. [9] proposed a novel load balancing algorithm called VectorDot in intelligent ants. It handles the hierarchical complexity of the datacenter and multidimensionality of resource loads across servers, network switches, and storage in an agile data center that has integrated server and storage virtualization technologies.
Desai et al. [10] discuss cloud computing as an emerging technology, a new standard of large-scale distributed and parallel computing. It provides shared resources, information or other resources according to clients' requirements at specific times. Good load balancing techniques are required for better management of the available resources. Better load balancing in the cloud increases performance and gives users better service. The authors therefore discuss many different load balancing techniques used to address this issue in the cloud computing environment.
Elzeki et al. [11] discussed an improved Max-Min algorithm in cloud computing, which deals with the allocation of tasks to resources while observing parameters such as waiting time, average waiting time, turnaround time and processing cost. An improved version of the Max-Min algorithm for load balancing is presented to overcome such problems.
Fahringer et al. [12] introduced a static load balancing technique called Ant Colony Optimization. This technique uses the behavior of ants to collect information about the cloud nodes and assign a task to a particular node. Once a request is initiated, the ant and the pheromone start their forward movement along the pathway from the "head" node.
Fang et al. [13] discussed a two-level task scheduling mechanism based on load balancing to meet dynamic requirements of users and obtain high resource utilization. It achieves load balancing by first mapping tasks to virtual machines and then virtual machines to host resources thereby improving the task response time, resource utilization and overall performance of the cloud computing environment.

OBJECTIVES
The primary objectives of this research work are summarized as follows:
• To study the performance of existing load balancing algorithms.
• To propose a new, efficient load balancing algorithm with clustering on both sides, i.e. the client side and the server side.
• To apply a sorting mechanism to the clusters formed on the client side as well as the cloud side.
• To reduce the overhead of scanning all the VMs in a cluster by arranging them in descending order.
• To implement priority-based execution depending on the client's cost.
• To implement the proposed algorithm in the CloudSim simulator.
• To compare the performance of the proposed algorithm with that of the current algorithm.
The cloud network receives input from multiple users with different requirements, which need to be fulfilled by efficiently utilizing the available resources. There are different ways to fulfill a user's requirements (such as priority). The main objective of this research work is to answer the question: in identical cloud environments, which load balancing architecture (centralized, decentralized or hierarchical) will give the best results in terms of response time and server load? To answer this question, a robust evaluation framework is implemented, which includes the following steps:
• Balance the load equally among the different VMs.
• Fetch all the available virtual machines in the datacenter/host.
• Retrieve the processing capacity of the available virtual machines.
• Cluster the cloudlets on the basis of the user requirements in terms of cloudlet length and cost.
• Assign high, medium or low priority to the tasks. Priority is directly proportional to cost, i.e. the higher the priority, the more cost is charged.
• Assign cloudlets cluster to cluster, which takes less time than assigning cloudlets one by one.
• Apply descending order within the clusters on both sides to maximize the benefit.
• Allocate the tasks with the help of the load balancing algorithm.
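The cluster-to-cluster assignment with descending order on both sides can be sketched as follows. This is a minimal illustration and not the CloudSim implementation used in the experiments: the `Cloudlet` and `Vm` classes, the `dispatch` function, the numeric priority encoding and the wrap-around rule for clusters with more cloudlets than VMs are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Cloudlet:          # hypothetical stand-in for a CloudSim cloudlet
    cid: int
    length: int          # instruction length
    priority: int        # assumed encoding: 3 = high, 2 = medium, 1 = low


@dataclass
class Vm:                # hypothetical stand-in for a CloudSim virtual machine
    vid: int
    mips: int            # processing capacity


def dispatch(cloudlet_clusters, vm_clusters):
    """Pair cloudlet clusters with VM clusters of the same rank.

    Both lists are assumed to be ordered from the most demanding / most
    capable cluster to the least. Inside each pair, cloudlets and VMs are
    sorted in descending order and matched position by position, wrapping
    around when a cluster holds more cloudlets than VMs (an assumption).
    """
    assignment = {}
    for cloudlets, vms in zip(cloudlet_clusters, vm_clusters):
        if not vms:
            raise ValueError("every VM cluster must contain at least one VM")
        ordered_cloudlets = sorted(
            cloudlets, key=lambda c: (c.priority, c.length), reverse=True
        )
        ordered_vms = sorted(vms, key=lambda v: v.mips, reverse=True)
        for i, cloudlet in enumerate(ordered_cloudlets):
            assignment[cloudlet.cid] = ordered_vms[i % len(ordered_vms)].vid
    return assignment


high = [Cloudlet(1, 4000, 3), Cloudlet(2, 9000, 3), Cloudlet(3, 5000, 3)]
low = [Cloudlet(4, 1000, 1)]
fast_vms = [Vm(10, 1000), Vm(11, 2000)]
slow_vms = [Vm(20, 500)]
print(dispatch([high, low], [fast_vms, slow_vms]))
```

Because both sides are already sorted, the longest high-priority cloudlet meets the most capable VM of its cluster without scanning the whole VM list.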
In this strategy, the current system state plays a major role in decision making. Although dynamic load balancing has higher runtime complexity than static load balancing, it performs better because it considers the current load of the system when choosing the next datacenter to serve a request. This provides the best choice among those available for that state of the system. Managing workload in the cloud is typically a multi-objective problem. In this work we highlight some of these problems and their possible solutions, so as to obtain an optimal solution. We assume that every application comprises a number of loosely parallel tasks. Every application has a strict completion deadline. Before this deadline, all computational tasks in the application must be completely executed and the results delivered to the client. Our current application model concentrates on random workloads, with two different client groups: one with higher resource access rights and the other with relatively lower resource access rights.
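The difference between static and dynamic placement can be illustrated with a small sketch. Round robin stands in for a static policy and "least loaded first" for a dynamic one; both function names, the request sizes and the choice of these two policies are assumptions made for the example and are not taken from the proposed algorithm.

```python
from itertools import cycle


def static_round_robin(requests, n_servers):
    """Static policy: the target depends only on arrival order, not on load."""
    servers = cycle(range(n_servers))
    load = [0] * n_servers
    placement = []
    for size in requests:
        target = next(servers)
        load[target] += size
        placement.append(target)
    return placement, load


def dynamic_least_loaded(requests, n_servers):
    """Dynamic policy: each request goes to the server with the lowest current load."""
    load = [0] * n_servers
    placement = []
    for size in requests:
        target = min(range(n_servers), key=lambda s: load[s])
        load[target] += size
        placement.append(target)
    return placement, load


requests = [8, 1, 1, 1]   # hypothetical request sizes
print(static_round_robin(requests, 2))    # loads end up uneven
print(dynamic_least_loaded(requests, 2))  # small requests avoid the busy server
```

The dynamic policy inspects the load vector on every request, which is the extra runtime cost mentioned above, but it keeps the large request from sharing a server with later arrivals.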

RESEARCH METHODOLOGY
Cloud services provide computing on demand in real time. The number of users accessing the cloud environment grows from day to day. The cloud is used for developing applications, providing and managing infrastructure, and patching applications. Users and their requests for cloud infrastructure are highly dynamic and put load on the servers running in the data center. We need an efficient strategy to balance the load on these servers so that they do not crash and can remain in service for a long time. The objective is to achieve accuracy while maintaining the performance of the servers and of the cloud environment.

Steps:
1. Initialize CloudSim in Java.
2. Create the datacenter with a different number of hosts.
3. Each host will have a different number of virtual machines of different capacities.
4. Create the cloudlets of varying length and size.
5. The list containing the virtual machines [18] and cloudlets is given to the Data Center Broker (DCB).
6. The DCB computes the processing capacity of all the virtual machines and divides them into multiple clusters using K-Means clustering, based on parameters such as bandwidth, memory and processing capability.
7. The DCB maintains the list of cloudlets and also divides them into multiple clusters using K-Means clustering. The parameters used here are the cloudlet length, the priority of the cloudlet and the cost associated with it.
8. All the cloudlets and the virtual machines inside the clusters are sorted in descending order.
9. Dispatch each cloudlet to the appropriate virtual machine in the cluster. Since all the cloudlets and virtual machines are sorted in descending order, the cloudlet with the larger instruction size and higher priority is assigned to the virtual machine with more resources.
10. Repeat the same procedure for all the remaining cloudlets in the list.

foreach Cloudlet k in CL
    find the instruction length, priority/deadline and cost of k
end
3. Start the K-Means clustering and divide the cloudlets into 3 clusters.
4. Assume centroids A, B, C.
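The clustering step can be sketched in plain Python as follows. This is a minimal version of the standard K-Means (Lloyd) iteration, not the code used in the experiments: the min-max scaling, the way the three initial centroids are picked and the sample cloudlet values are assumptions, since the text does not specify them.

```python
import math


def normalise(points):
    """Min-max scale each feature to [0, 1] so that no feature dominates the distance."""
    cols = list(zip(*points))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((x - lo) / (hi - lo) if hi > lo else 0.0
              for x, lo, hi in zip(p, lows, highs))
        for p in points
    ]


def kmeans(points, k=3, max_iter=100):
    """Return one cluster label per point using the standard K-Means iteration."""
    n = len(points)
    if n < k:
        raise ValueError("need at least k points")
    # Assumption: seed the centroids with points spread evenly over the data
    # after ordering by feature sum (the text only says "assume centroids A, B, C").
    order = sorted(range(n), key=lambda i: sum(points[i]))
    centroids = [points[order[i * (n - 1) // max(k - 1, 1)]] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


# Hypothetical cloudlets as (instruction length, priority, cost).
cloudlets = [
    (1000, 1, 1.0), (1200, 1, 1.2),
    (5000, 2, 3.0), (5200, 2, 3.1),
    (9000, 3, 6.0), (9500, 3, 6.5),
]
print(kmeans(normalise(cloudlets), k=3))
```

The same function could be applied to virtual machines by passing (processing capability, memory, bandwidth) tuples instead of cloudlet features.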
Various experiments have been conducted, and the results of the existing work and the proposed work are given in Table 1 and Table 2.

EXPERIMENTAL RESULTS

Processing Cost
It is obtained by adding the cost per storage and the cost per memory.
Processing Cost = RT * unit_cost
where
RT = response time
unit_cost = cost per unit time
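As a worked example of this formula, the short sketch below computes the processing cost for a few cloudlets. The function name and the sample response times and unit cost are hypothetical values chosen for illustration; they are not experimental results.

```python
def processing_cost(response_time, unit_cost):
    """Processing Cost = RT * unit_cost."""
    if response_time < 0 or unit_cost < 0:
        raise ValueError("response time and unit cost must be non-negative")
    return response_time * unit_cost


# Hypothetical response times (in time units) and a cost of 0.5 per time unit.
response_times = [20.0, 8.0, 12.0]
unit_cost = 0.5
costs = [processing_cost(rt, unit_cost) for rt in response_times]
print(costs)       # cost of each cloudlet
print(sum(costs))  # total usage cost across the cloudlets
```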