Cloud-Based Cost-Effective Resource Allocation Model for Software-as-a-Service Deployments

Cloud Computing (CC) provides access to resources under a usage-based payment model, allowing application service providers to scale their services seamlessly. In a CC infrastructure, a variable number of virtual machine instances can be created depending on application requirements. The ability to scale a Software-as-a-Service (SaaS) application is very attractive to providers because application resources can be scaled up or down and users pay only for the resources they require. Even though large-scale applications are deployed on cloud infrastructures on a pay-per-use basis, the cost of idle resources (memory, CPU) is still charged to application providers, and the problems of saturation and wastage of cloud resources remain unresolved. This paper proposes resource allocation models for SaaS application deployments over CC platforms and identifies the best-balanced model with respect to cost and user requirements.


INTRODUCTION
Cloud computing (CC) covers two main areas: applications delivered over the Internet as services, and the systems software deployed in datacenters that offers those services, normally under a pay-per-use pricing model [1]. With CC, a variable number of virtual machine instances can be created depending on application requirements; this is the elasticity feature of this computing technique [2]. Applications deployed on CC are known as Software-as-a-Service (SaaS). SaaS is a software delivery paradigm in which the software is hosted off-premises, developed by service providers, delivered via the Internet, and paid for under a subscription model [3]. For SaaS providers, the ability to scale an application up or down so that it consumes and pays for only the resources required at that point in time is an attractive capability and, if done correctly, is less expensive than running on dedicated hardware from a traditional hosting provider [1].
However, in spite of the advantages of using CC to create highly scalable applications, solving performance problems through CC is not a trivial decision once the costs involved are analyzed [4]. For example, Amazon Web Services charges by the hour for each instance occupied, even if the machine is idle. In 2008, the image-processing application Animoto, deployed over the Amazon EC2 infrastructure [5], experienced a demand surge that caused it to grow from 50 servers to 3500 servers in three days; after the peak subsided, traffic fell to a level well below the peak. Hence, scale-up elasticity was not a cost optimization strategy but an operational requirement, while scale-down elasticity allowed the steady-state expenditure to more closely match the steady-state workload. Indeed, the Infrastructure-as-a-Service (IaaS) provider charged for 3500 virtual instances because a peak load occurred within a certain time frame, and once the peak disappeared the SaaS provider would have paid for unused resources [4]. This effect is still a barrier for SaaS providers, whose applications have different peak loads and are therefore highly prone to over- and under-provisioning of resources [6,7].
Over- and underutilization of resources are problems that arise because elasticity in pay-per-use cloud models has not been fully achieved yet [8]. An overprovisioning effect is caused by resource underutilization: even if peak loads are successfully anticipated, resources sit unused during non-peak times. Armbrust et al. [1] provide a calculation of this problem: ''A service has a predictable daily demand where the peak requires 500 servers but only 100 servers are needed most of the time. As long as the average utilization over a whole day is 300 servers, the actual utilization over the whole day is 300 × 24 = 7200 server-hours; but since we must provision to the peak of 500 servers, we pay for 500 × 24 = 12 000 server-hours, a factor of 1.7 more than what is needed.'' Overutilization occurs when potential revenue from customers is lost through poor performance (saturation): customers stop using the application permanently after experiencing poor service, resulting in a permanent loss of the revenue stream [1]. Unfortunately, while current cloud platforms allow the instantiation of new virtual machines, their lack of agility fails to provide users with the full potential of a truly elastic model. Furthermore, current cloud virtualization mechanisms do not provide a cost-effective pay-per-use model for Software-as-a-Service (SaaS) applications, and just-in-time scalability is not achieved by simply deploying SaaS applications to cloud platforms [9]. By imposing per-hour costs, CC encourages SaaS architects to pay extra attention to efficiency (i.e., releasing and acquiring resources only when necessary) [1]. This inefficiency is caused by the traditional approach of scaling applications based only on the number of users. As a result, with current resource allocation models, SaaS providers are charged for global resource usage without taking into account the resources used by each tenant.
Consequently, there exists the need for a truly elastic architecture that charges SaaS providers for actual resource usage [6]. To achieve cost-effective SaaS scalability, a level of automation is necessary, which translates into a more intelligent environment. A SaaS platform and its applications should be aware of how tenants use their resources [10]. In this sense, SaaS applications can improve this scenario through multi-tenancy, the ability to offer one single application instance to several clients/providers (tenants). With the use of CC approaches such as on-demand resource allocation through Simple Object Access Protocol (SOAP) interfaces, it is possible to efficiently create virtualized resources for SaaS applications, which makes it possible to allocate and charge for only the resources consumed in a tenant-based environment.
July 20, 2013

Test bed platforms and architectures
The test bed deployment has two main components: a Java-based SaaS platform and a private cloud platform, configured separately. As a first step, the SaaS platform is deployed over a cloud infrastructure. The SaaS platform is composed of several components that allow the deployment of applications as services (Fig. 1) [4]. Each component is integrated in an Apache Tomcat container as a Web application, a packaged library (.jar), or business services (Web application + Web services). A service application is defined as the software application that will be delivered as a service. Each service manages its own resources, such as data sources, libraries, and views.
As Table 1 [11] outlines, the core components of the SaaS implementation are open-source technologies. Fig. 1 shows a set of business components that are consumed by the platform. These business components were designed, developed, and deployed following a Service-Oriented Architecture (SOA) design in order to be completely decoupled from the SaaS platform [26]. Each business component is developed as a Web application, but it exposes a set of Web services through the WSO2 framework, which integrates Web services deployed through Apache Axis2 and dependency injection with Spring. Each business component application implements its own Web services, which are referenced in the application's Context.xml Spring file.
The SaaS platform provides the App Metering Service, which offers automatic and non-intrusive support for metering applications, tenant-based monitoring, and virtual machine resource status. This service uses Java Management Extensions (JMX) technology to provide information on the performance and resource consumption of applications running in the Java platform. It also uses the SIGAR (System Information Gatherer And Reporter) API, which provides a portable interface for gathering system information such as system memory and CPU load. The App Metering Service exposes a Web service interface that can be consumed by monitors or any other component that requests information about a VM instance.
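The JMX side of such a probe can be sketched with the standard java.lang.management API; the class and method names below are illustrative assumptions (the paper does not list the App Metering Service code), and the SIGAR calls are omitted:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Illustrative sketch (ours, not the paper's API) of the kind of per-VM
// status probe the App Metering Service could expose via JMX.
public class VmStatusProbe {
    /** Heap memory currently used by this JVM, in bytes. */
    public static long usedHeapBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }
    /** Maximum heap available to this JVM, in bytes (-1 if undefined). */
    public static long maxHeapBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
    }
    /** Number of live threads, covering Tomcat workers plus the platform. */
    public static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }
    /** System load average over the last minute (-1 if unavailable). */
    public static double systemLoadAverage() {
        return ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
    }
    public static void main(String[] args) {
        System.out.printf("heap used: %d / %d bytes%n", usedHeapBytes(), maxHeapBytes());
        System.out.printf("live threads: %d, load avg: %.2f%n", liveThreads(), systemLoadAverage());
    }
}
```

In the test bed, values like these would be serialized and returned through the App Metering Service's Web service interface rather than printed.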

Overutilization (saturation)
To define overutilization, the term ''point of exhaustion'' is used. For conventional load testing, the point of exhaustion is typically defined as the moment when a limiting resource (such as CPU, memory, or storage) reaches 100% utilization [7]. In contrast, the point of exhaustion for CC can be defined as the maximum useful payload that can be placed on a single virtual machine without adversely affecting throughput [12]. Saturation or overutilization occurs whenever resource utilization rises above the point of exhaustion. This means that at least one virtual machine must be monitored on each physical tier of the service being tested. In some cases, as workload begins to escalate, so do operating system-level activities such as thread context switching, CPU consumption, and virtual memory management. For experimentation purposes, it suffices to note that when resource utilization skyrockets, throughput (useful work) generally declines [13]. The SaaS platform uses the HTTP request throughput calculated by the JMeter tool (explained later). This throughput value is calculated as requests per unit of time [14]: throughput = (number of requests) / (total time), where the time runs from the start of the first request to the end of the last request. This includes any intervals between requests, as it is supposed to represent the server's load. Previous work [15] uses the throughput to define an inflection point as the percentage of utilization reached when the throughput starts to decline.
Identifying these inflection points is the key to developing an accurate measurement of resource overutilization. The data point of greatest interest in this trend is the one that corresponds to maximum throughput. If the throughput trend is superimposed on the utilization trend, it is possible to highlight the critical turning point where throughput and utilization become inversely related [13]. By recording the percentage of resource utilization at maximum throughput, it is possible to detect when resource utilization is saturated by the workload [15]. This measurement is used in combination with a Tomcat-based cluster in order to determine overutilization within virtual machines. Inflection points are measured for each virtual machine within the cloud-based Tomcat cluster.
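As a rough sketch of this detection step (the data, class, and method names are ours, not the paper's), the inflection point can be located as the sample with maximum throughput in a series of (utilization, throughput) measurements; utilization beyond that sample's level counts as saturation:

```java
import java.util.List;

// Illustrative sketch: locate the inflection point described above as the
// measurement with maximum throughput; utilization above that point's level
// is treated as saturation (overutilization).
public class InflectionPoint {
    /** One (utilization %, throughput in requests/min) sample. */
    public record Sample(double utilization, double throughput) {}

    /** Index of the sample with maximum throughput. */
    public static int maxThroughputIndex(List<Sample> samples) {
        int best = 0;
        for (int i = 1; i < samples.size(); i++) {
            if (samples.get(i).throughput() > samples.get(best).throughput()) best = i;
        }
        return best;
    }

    /** Utilization at which throughput starts to decline (saturation threshold). */
    public static double saturationUtilization(List<Sample> samples) {
        return samples.get(maxThroughputIndex(samples)).utilization();
    }

    public static void main(String[] args) {
        List<Sample> run = List.of(
            new Sample(20, 300), new Sample(45, 620),
            new Sample(70, 810), new Sample(90, 760), new Sample(98, 500));
        System.out.println("saturation above " + saturationUtilization(run) + "% utilization");
    }
}
```

A production monitor would smooth the series before picking the maximum, since raw per-interval throughput is noisy.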

Underutilization (resource wasting)
Resource underutilization occurs whenever some resources are not being used by virtual machines within a CC infrastructure while an application is being executed [16]. It can be measured by the amount of resources left available for potential virtual machines and applications. In this sense, following [17], when the resource utilization (CPU, memory, or storage) of a single VM (original) can be allocated to another VM (destination) without exceeding the maximum quantity allowed for that resource, the resource of the original VM is being wasted (underutilization) [4]. Fig. 2 [14] depicts a scenario where the used heap memory is measured within four virtual machines. According to Fig. 2, at least two VM instances can be released by reallocating their resources (the resource utilization of VM1 and VM2 can be allocated to VM3). In this research, the number of underused resources is obtained through a knapsack approach [18] by calculating the combination of VM instances that can be allocated to another single VM, according to the measured resource (CPU or heap memory).
As it is not the aim of this work to detail or solve the knapsack problem, a simple tree-based Java program is used to calculate the allocation. Each VM is evaluated against the rest at a certain point in time. An algorithm has been developed that takes the knapsack weights from the resources used in a given measurement. The values, or profits, are taken from the available quantity of such resources (maximum allowed minus used) of the remaining VM instances [8,42]. With these weight and value vector assignments, the knapsack implementation returns the maximum number of VM instances that can be released by maximizing resource availability, giving low weight to VM instances with low usage.
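The paper does not list the tree-based program; a minimal greedy sketch of the same idea (our simplification, with illustrative names) counts how many VMs could be released by packing each lightly loaded VM's usage into another VM's free capacity:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified sketch (ours, not the paper's tree-based implementation):
// greedily move each lightly loaded VM's usage into another VM's free
// capacity; every successful move is one VM that could be released.
public class UnderutilizationCounter {
    /**
     * @param used     resource used per VM (e.g. heap MB)
     * @param capacity maximum allowed per VM (e.g. the 800 MB heap cap)
     * @return number of VM instances that can be released
     */
    public static int releasableVms(double[] used, double capacity) {
        List<Double> loads = new ArrayList<>();
        for (double u : used) loads.add(u);
        loads.sort(Comparator.naturalOrder()); // lightest VMs are cheapest to evacuate
        int released = 0;
        while (loads.size() > 1) {
            double lightest = loads.get(0);
            int dest = -1;
            // best-fit: pick the most loaded destination that can still absorb it
            for (int i = loads.size() - 1; i >= 1; i--) {
                if (capacity - loads.get(i) >= lightest) { dest = i; break; }
            }
            if (dest == -1) break; // no VM can absorb the lightest one
            loads.set(dest, loads.get(dest) + lightest);
            loads.remove(0);
            released++;
            loads.sort(Comparator.naturalOrder());
        }
        return released;
    }
    public static void main(String[] args) {
        // With an 800 MB cap, the 100 MB and 150 MB VMs can be absorbed by others
        System.out.println(releasableVms(new double[]{100, 150, 400, 700}, 800)); // prints 2
    }
}
```

This mirrors the Fig. 2 scenario, where two of four VM instances turn out to be releasable.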

Generating workload
The Apache JMeter tool was used to generate workload against the Tomcat cluster installed in the cloud environment where the SaaS platform has been deployed. Apache JMeter is an open-source Java desktop application designed to load-test functional behavior and measure performance [14]. It can be used to simulate a heavy concurrent load on a J2EE application and to analyze overall performance under various load types; it also allows graphical analysis of performance metrics (e.g., throughput and response time) [5]. JMeter can simulate concurrent users from a single computer or from a distributed testing framework [15]. For this work, JMeter is configured to issue distributed requests to simulate workload. Fig. 3 [19] depicts the test bed architecture used in this work. The distributed SaaS platform setup explained in the previous section is stressed with requests from several hosts running JMeter tests. The concept of Resource Consumption State (RCS) is used to define the state of the CPU and memory resources used by the Tomcat servers. Through a mechanism similar to that proposed by [6], while the JMeter machines run the tests, an RCS Monitor collects information about resource utilization through Web service calls to the App Metering Service (explained earlier), accessing the resource status of each virtual machine. The number of concurrent Tomcat users depends on the server hardware (processors, memory), the types of resources being used within applications, and what the applications are actually doing [17]. In Tomcat version 6.0 or newer, as used in the SaaS platform, the number of threads Tomcat supports is configured via the maxThreads attribute of the Executor element in the XML configuration files.
The default setting for this attribute is 200, which should be enough to get most applications started and, according to [11], is enough to support at least a thousand simultaneous users. Since this research uses small virtual machine instances, it is established that each Tomcat server can handle at most 100 users [14]. Assuming different behaviors during twelve months, as presented in [8], different types of workload peaks are stated for the SaaS requirements [1,15]. Two types of workload generation are used [14,18]:
- Incremental. For each time period, the workload starts from the peak of the previous time period and increases until reaching the maximum peak of established users at the end of the current period (see the solid line in Fig. 4 [12]).
- Peak-based. For each time period, the workload starts from zero users and increases until reaching the maximum peak of users at the middle of the period. Then the workload decreases until reaching zero at the end of the period (see the dotted line in Fig. 4).
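The two shapes above can be sketched as simple per-step user-count generators (a sketch of ours for a JMeter-style driver; the class and step granularity are assumptions, not taken from the paper):

```java
// Illustrative generator (ours) for the two workload shapes described above,
// producing a target concurrent-user count for each time step of a period.
public class WorkloadShapes {
    /** Incremental: ramps linearly from the previous period's peak to this period's peak. */
    public static int[] incremental(int previousPeak, int peak, int steps) {
        int[] users = new int[steps];
        for (int i = 0; i < steps; i++) {
            users[i] = previousPeak + (int) Math.round((peak - previousPeak) * (i + 1) / (double) steps);
        }
        return users;
    }
    /** Peak-based: rises from zero to the peak at mid-period, then falls back to zero. */
    public static int[] peakBased(int peak, int steps) {
        int[] users = new int[steps];
        int mid = steps / 2;
        for (int i = 0; i < steps; i++) {
            double frac = i <= mid ? i / (double) mid
                                   : (steps - 1 - i) / (double) (steps - 1 - mid);
            users[i] = (int) Math.round(peak * frac);
        }
        return users;
    }
    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(incremental(100, 300, 4))); // [150, 200, 250, 300]
        System.out.println(java.util.Arrays.toString(peakBased(200, 5)));        // [0, 100, 200, 100, 0]
    }
}
```

Each step's count would then be mapped to JMeter thread-group sizes across the distributed test hosts.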

Test bed results
The test plan results are elaborated as follows. As explained, the RCS Monitor gathered information every 10 s, resulting in a total of 4320 measurements spanning 720 min (representing 30 days, i.e., one simulated month). The following paragraphs present the results of these metrics according to the definition of RCS and each workload behavior. Underutilization and overutilization were metered using the mechanisms explained before. For a given simulated month, the underutilization is the sum of the total wasted virtual machines calculated over all measurements. In the same way, the overutilization represents the sum of all inflection points detected in the measurements of that simulated month. Fig. 5 [4] shows a chart of the throughput measurement results during the incremental (top) and peak-based (bottom) workload simulations. In order to calculate the throughput, the JMeter tool generates a set of HTTP samples during test execution and evaluates the requests per minute that the Tomcat cluster can process. As shown in Fig. 5, the throughput changes over time during the simulation and shows some declines in the efficiency of the Tomcat cluster. Table 3 shows the results of the measurements during both the incremental and peak-based workload tests. The column labeled Combined outlines the number of measurements where both CPU and memory are either saturated or underutilized. The last two columns calculate a percentage value by adapting formulas presented in [20]: the combined overutilization percentage (formula (2)) is the percentage of the total measurements performed that have inflection points. This value is calculated by dividing the combined overutilization Combined OU by the total number of measurements performed, which is obtained by multiplying the number of measurements per month (4320 in the tests) by the number of available virtual machines. It can be observed that a total of 51 840 server-hours were provided for the whole time the SaaS platform was running over the cloud infrastructure.
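The adapted percentage can be written out directly (a numeric sketch of ours; the 2592 combined count below is a made-up example, not a value from Table 3):

```java
// Sketch (ours) of the adapted percentage from [20]: combined saturated
// measurements divided by total measurements (per-month samples x VM count).
public class UtilizationPercentages {
    public static double combinedPercentage(int combinedCount, int measurementsPerMonth, int vmCount) {
        return 100.0 * combinedCount / (measurementsPerMonth * (double) vmCount);
    }
    public static void main(String[] args) {
        // Hypothetical: 2592 combined saturated samples out of 4320 x 12 = 51 840 total
        System.out.println(combinedPercentage(2592, 4320, 12) + "%"); // prints 5.0%
    }
}
```

With 4320 measurements per month, a VM count of 12 reproduces the 51 840 total stated above.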

RESULTS AND DISCUSSION
To address under- and overutilization issues, this work recommends a model for allocating workload when deploying a SaaS platform and its applications over cloud infrastructures. This model comprises three approaches that take advantage of the multi-tenant nature of SaaS applications in order to improve workload distribution and instantiate only the number of cloud resources that are really needed. The first approach is tenant-based isolation, which creates tenant-level granularity and separates execution contexts for different tenants; the isolation implementation is divided into tenant-based persistence and tenant-based authentication. The second approach is tenant-based VM allocation, which implements mechanisms to calculate the number of VM instances needed, given a set of tenants and their weights in terms of active users. The third approach is tenant-based load balancing, which implements a distribution mechanism to process and dispatch the workload requests concerning each tenant.

VM allocation
In order to describe the topology and characteristics of the deployed cloud and server cluster, a profile-based approach, proposed by [9], is implemented. For example, the profile of the test bed in this work is as follows: it uses small virtual machine types (1 CPU core, 1 GB of memory) with 800 MB of Java heap memory for running Tomcat instances. Also, the Tomcat configuration establishes that each server can handle 100 users, with its maxThreads configuration attribute set to 200. Other server deployments can represent different profiles depending on the application provider's needs. Based on the number of Tenant Context objects, their number of currently active users, and the profile of the deployed cluster, the number of virtual machines required to support the current workload is calculated. Each tenant has its own resource requirements depending on the number of its active users and the applications being accessed. In order to assign values for VM allocation, each tenant context is given a weight that is calculated by the Tenant Context Manager component as shown in formula (3):

TenantContext_weight = ActiveUsers × (heapSize / maxThreads).  (3)

The Java memory heap size assigned to Tomcat is used as a profile parameter for the calculation. Active users are those whose session timeout has not expired. This number is multiplied by the average memory size per thread. The VM capacity is determined by subtracting the amount of memory used by the platform from the total Java memory heap size (formula (4)). The Java Management Extensions (JMX) implementation of the App Metering Service is used to calculate the number of threads used by the Tomcat server plus the platform. Using formulas (3) and (4), a TenantBasedVMCalculator component performs the calculations to obtain the required number of VM instances. As with most resource allocation problems, the VM instance allocation problem is related to the knapsack problem [8].
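Formulas (3) and (4) can be sketched directly in code (a sketch of ours; the class name and the 150 MB platform-usage figure in the example are assumptions, while the 800 MB heap and maxThreads = 200 come from the test-bed profile):

```java
// Sketch (ours) of formulas (3)-(4): per-tenant weight from active users and
// the heap-per-thread profile, and VM capacity as the heap left to tenants.
public class TenantWeightCalculator {
    /** Formula (3): weight = activeUsers x (heapSize / maxThreads). */
    public static double tenantWeight(int activeUsers, double heapSizeMb, int maxThreads) {
        return activeUsers * (heapSizeMb / maxThreads);
    }
    /** Formula (4): capacity = total heap minus memory used by the platform itself. */
    public static double vmCapacity(double heapSizeMb, double platformUsedMb) {
        return heapSizeMb - platformUsedMb;
    }
    public static void main(String[] args) {
        // Test-bed profile: 800 MB heap, maxThreads = 200 -> 4 MB per thread;
        // the 150 MB platform usage is a hypothetical metering result.
        double w = tenantWeight(50, 800, 200); // 50 users x 4 MB = 200 MB
        System.out.println("tenant weight: " + w + " MB, VM capacity: "
            + vmCapacity(800, 150) + " MB");
    }
}
```

In the platform itself, the platform-usage term would come from the App Metering Service rather than a constant.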
The problem to solve is to calculate the minimum number of virtual machine instances with specific and homogeneous capacity (same VM type) that can allocate a given set of tenant context weights [J. Espadas et al., Future Generation Computer Systems 29 (2013) 273-286]. This type of allocation problem is known as a multi-objective optimization (MO) problem [22]. The allocation problem can be expressed as shown in formula (5) (adapted from [8,22]). The goal of formula (5) is not to obtain an assignment vector, as traditional allocation mechanisms do, but to determine the minimum number of VM instances needed to allocate the entire weight vector given a homogeneous VM capacity.
In order to solve this calculation, the authors propose a simple iterative algorithm using the same tree-based Java library presented in Section 4.2 for solving simple knapsack allocations. Tenant-based allocation uses a vector of tenant context weights retrieved from the Tenant Context Manager. The values to maximize are the same as the weights, so that the knapsack function allocates the maximum number of tenant context weights and maximizes resource usage. The first iteration of the proposed algorithm allocates as many weights as it can within an initial VM. The remaining weights that could not be allocated in the first iteration are used for a second iteration. Iterations continue until the remaining tenant context weight vector has a length of zero. After setting up the tenant-based components and deploying them over the test bed, all the simulations and tests were run again. Over- and underutilization measurements were performed against the workload tests used with traditional load balancing (incremental and peak-based). Similar to Table 3, Table 4 shows the results of the combined percentages. A main difference among the results is the measurement of server-hours given by the number of VM instances that were created through tenant-based demand.
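The iteration described above can be sketched as a first-fit packing loop (our simplification of the tree-based knapsack; a heuristic, so it may slightly overestimate the true minimum on adversarial inputs):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (ours) of the iterative allocation described above: repeatedly pack
// tenant weights into one fixed-capacity VM until no weights remain; the
// number of iterations is the number of VM instances to provision.
public class TenantBasedVmCalculator {
    /** @return number of homogeneous VMs needed (first-fit-decreasing heuristic). */
    public static int vmInstancesNeeded(List<Double> tenantWeights, double vmCapacity) {
        List<Double> remaining = new ArrayList<>(tenantWeights);
        remaining.sort(java.util.Collections.reverseOrder()); // heaviest tenants first
        int vms = 0;
        while (!remaining.isEmpty()) {
            double free = vmCapacity;
            List<Double> next = new ArrayList<>();
            for (double w : remaining) {
                if (w <= free) free -= w;   // allocate this tenant in the current VM
                else next.add(w);           // defer to a later iteration (another VM)
            }
            if (next.size() == remaining.size())
                throw new IllegalArgumentException("a tenant weight exceeds VM capacity");
            remaining = next;
            vms++;
        }
        return vms;
    }
    public static void main(String[] args) {
        // Hypothetical weights in MB against the 650 MB capacity example
        List<Double> weights = List.of(200.0, 360.0, 120.0, 300.0, 80.0);
        System.out.println(vmInstancesNeeded(weights, 650)); // prints 2
    }
}
```

Each loop iteration plays the role of one knapsack solve in the paper's algorithm; the deferred weights are exactly the "remaining weights" passed to the next iteration.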
In Table 4, the server-hours were reduced from 51 840 to 35 402, a reduction of about 32%. It can also be observed that the averages of over- and underutilization have been reduced, but in order to demonstrate a statistically significant improvement of the previous averages (Table 3, the control group) over the new values in Table 4 (the experimental group), a Student's t-test is carried out. The t-test determines whether two averages are significantly different, in this case [13], whether the averages of Table 4 are statistically less than those of Table 3. With N as the number of months (samples), we have (N1 + N2 − 2) = (12 + 12 − 2) = 22 degrees of freedom, and setting a confidence of 99.5% (α = 0.005 significance), the t-distribution table produces a value of tα = 2.8188 as the base parameter. The next step is to calculate the t values for each corresponding column pair (for example, underutilization for the incremental workload in Tables 3 and 4). The t-test dictates that if the calculated value is greater than the tα parameter, we can say with 99.5% certainty that the second column (Table 4) is statistically less than the first column (Table 3). The calculation for the t-test is represented in formula (6) [21],
where X1 is the average and S1 the standard deviation of the Table 3 (control) results, and X2 and S2 are the average and standard deviation, respectively, when the tenant-based components are used (Table 4). The last row of Table 4 shows the calculated t values. For example, taking the underutilization (UU) columns for the incremental workload of both tables, the calculated t value is 3.2437. This value is greater than the 2.8188 parameter, meaning that the averages for underutilization have been statistically improved by the tenant-based components. The t value for underutilization (UU) during the peak-based workload is also higher than tα (4.7208 > 2.8188), so we can say that this behavior was, statistically speaking, improved as well.
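A small sketch of the test statistic (ours): since the text quotes df = N1 + N2 − 2 = 22, a pooled-variance two-sample t is assumed here; the exact form of formula (6) in the source is not reproduced, and the sample averages in main are hypothetical:

```java
// Sketch (ours) of a pooled two-sample t-statistic, consistent with the
// df = N1 + N2 - 2 = 22 quoted above; whether formula (6) pools the
// variances is an assumption, and the example values are made up.
public class TTest {
    /** Pooled two-sample t: positive when mean1 (control) exceeds mean2 (experimental). */
    public static double pooledT(double mean1, double sd1, int n1,
                                 double mean2, double sd2, int n2) {
        double pooledVar = ((n1 - 1) * sd1 * sd1 + (n2 - 1) * sd2 * sd2) / (n1 + n2 - 2);
        return (mean1 - mean2) / Math.sqrt(pooledVar * (1.0 / n1 + 1.0 / n2));
    }
    public static void main(String[] args) {
        // Hypothetical monthly averages; decision rule taken from the text.
        double t = pooledT(40.0, 6.0, 12, 32.0, 6.0, 12);
        System.out.println("t = " + t
            + (t > 2.8188 ? " > 2.8188: statistically improved" : ": not significant"));
    }
}
```

The decision rule mirrors the text: a computed t above the one-tailed critical value 2.8188 (α = 0.005, 22 df) indicates the Table 4 average is statistically less than the Table 3 average.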