Hybrid Scheduling Scheme for Real Time Systems

Asymmetric multiprocessor platforms are considered power-efficient multiprocessor architectures, and efficient task partitioning (assignment) plays a crucial role in achieving greater energy efficiency on these platforms. This paper addresses the problem of energy-aware static partitioning of periodic real-time tasks on heterogeneous multiprocessor platforms. A hybrid approach that combines a Particle Swarm Optimization (PSO) variant with a priority-assignment-based Min-Min algorithm for task partitioning is proposed. The proposed approach aims to minimize the overall energy consumption while avoiding deadline violations. An energy-aware cost function is proposed for use in this approach. Extensive simulated experiments and comparisons with related approaches are conducted to validate the effectiveness of the proposed technique. The results demonstrate that the proposed partitioning scheme significantly outperforms the related approaches in terms of the number of iterations needed to reach a solution, in addition to the energy savings.


INTRODUCTION
Large applications can cost-effectively exploit the parallelism of the available distributed resources by partitioning the application into multiple independent tasks. The problem is generally addressed in terms of task scheduling, where tasks are the schedulable units of an application and resources are a network of processors. The scheduling of a given number of tasks onto the parallel processors is critical for achieving high performance in distributed systems, especially for high performance computing [1].
Nowadays, embedded systems are involved in most details of our lives, in devices such as smart phones, pocket PCs, Personal Digital Assistants (PDAs), and multimedia devices (Awadalla et al. [2], Awadalla [3]). As the applications on these devices become more complicated, there is a need to increase performance while keeping energy consumption at acceptable levels, especially for portable battery-powered devices. Minimizing energy consumption to prolong battery life while achieving higher performance is therefore a critical issue in the design of portable embedded systems. As the processor is one of the most important power consumers in any computing system, today's chip multiprocessor (CMP) or multiprocessor system-on-chip (MPSoC) platforms can deliver higher performance at lower power consumption than uniprocessor systems [4][5]. Embedded systems today are often implemented upon platforms comprising different kinds of processing units, such as CPUs, DSP chips, graphics co-processors, and math co-processors, with each kind of processing unit specialized to perform a different function most efficiently. Such platforms are commonly referred to as heterogeneous platforms [6][7]. TI's OMAP™ [8] mobile processors are a good example of these heterogeneous platforms.
The multiprocessor scheduling of recurrent real-time tasks can generally be carried out under the partitioned scheme or under the global scheme. In the partitioned scheme, the tasks are statically partitioned among the processors, all instances (jobs) of a task are executed on the same processor, and no job is permitted to migrate among processors. In the global scheme, a task can migrate from one processor to another between the executions of different jobs. Furthermore, an individual job that is preempted on some processor may resume execution on a different processor. Nevertheless, in both schemes, parallelism is prohibited, i.e., no job of any task can be executed at the same time on more than one processor. This paper considers the partitioned scheduling scheme. The main advantage of partitioned scheduling is that, after the tasks are partitioned among the processors, the multiprocessor scheduling problem is reduced to a set of traditional uniprocessor ones.
The problem of partitioning tasks among processors, sometimes referred to as the Task Assignment Problem (TAP), is an intractable NP-hard problem even if the processors are homogeneous [9]. Therefore, approximation algorithms and heuristic techniques are used to solve it. This paper proposes a modified Particle Swarm Optimization (PSO) variant based on the Min-Min technique and a priority assignment algorithm for energy-aware task partitioning on heterogeneous multiprocessor platforms. The rest of this paper is organized as follows: Section 2 reviews existing research on task partitioning upon heterogeneous platforms and related areas. Section 3 defines the problem and describes the task, processor, and power models used in this paper. Section 4 presents PSO, Min-Min, and the priority assignment technique for task partitioning and introduces our proposed approach. Section 5 presents simulation results for the proposed algorithm and discusses these results. Section 6 summarizes our conclusions.
ISSN 2277-3061, Volume 15 Number 6, International Journal of Computers and Technology, 6839 | Page, Council for Innovative Research, April 2016, www.cirworld.com

RELATED WORK
Multiprocessors have become a powerful means of running real-time applications, and their performance depends greatly on the parallel and distributed system environment. Consequently, several methods have been developed to tackle the multiprocessor task scheduling problem, which is NP-hard. To address this issue, this research presents two approaches. Baruah [10] proved that task partitioning among heterogeneous multiprocessors is intractable (strongly NP-hard), represented the problem as an equivalent Integer Linear Programming (ILP) problem, and designed a two-step approximation algorithm for solving it. The idea of LP relaxations to ILP problems is used in the first step to map most tasks, while in the second step the algorithm maps the remaining tasks using exhaustive enumeration. This two-step algorithm takes time polynomial in the number of tasks and exponential in the number of processors. Baruah [10] used tree partitioning in the second step instead of exhaustive enumeration so that the algorithm takes time polynomial in both the number of tasks and the number of processors. In [11], Braun et al. compared 11 heuristics for mapping a set of independent tasks onto heterogeneous distributed computing systems. The best heuristic in terms of makespan, defined as the maximum completion time over all processors, was the Genetic Algorithm (GA), followed by the Min-Min algorithm. Chen and Cheng [12] applied the Ant Colony Optimization (ACO) algorithm and showed that ACO outperforms both GA and LP-based approaches in terms of obtaining feasible solutions as well as processing time.
Kang et al. [1] presented a PSO-based hybrid algorithm to schedule tasks represented by a Directed Acyclic Graph (DAG) onto a bounded number of heterogeneous processors such that the schedule length is optimized. The algorithm first generates feasible initial solutions using an effective list scheduling strategy and then evolves the solutions using crossover and mutation operators. Abdelhalim [9] presented a modified algorithm based on Particle Swarm Optimization (PSO) for solving this problem and showed that his approach outperforms the major existing methods such as GA and ACO. His PSO approach was then extended to reduce energy consumption by minimizing the average utilization of processors (without using any energy or power model). Finally, a tradeoff between minimizing the makespan and minimizing the energy consumption was obtained. Visalakshi and Sivanandam [13] presented a hybrid PSO method for solving the task assignment problem. Their algorithm was developed to dynamically schedule heterogeneous tasks onto heterogeneous processors in a distributed setup. It considers load balancing and handles independent non-preemptive tasks. The hybrid PSO yields better results than the normal PSO when applied to the task assignment problem. The results were also compared with GA and indicate that PSO performs better than GA. Omidi and Rahmani [14] used PSO for task scheduling in multiprocessor systems as an important step toward efficient utilization of resources. They considered independent tasks on homogeneous multiprocessor systems. In contrast to these efforts, this paper integrates the PSO approach with polynomial-time partitioning techniques, namely Min-Min and priority assignment. The proposed approach takes energy efficiency into account during task partitioning among heterogeneous cores in MPSoCs.

SYSTEM MODEL
This paper considers the problem of power-aware task partitioning on heterogeneous multiprocessor platforms. So, models of task, processor, and power are presented.

Task Model
A periodic real-time task τi generates an infinite sequence of task instances (jobs). Each job executes for at most C time units, is generated every T time units, and has a relative deadline of D time units after its arrival.
This paper considers a periodic task set {τ1, τ2, …, τn} of n independent real-time tasks. A task τi is represented as a 3-tuple (Cij, Di, Ti), where Cij is the Worst-Case Execution Time (WCET) of task τi on processor j, Di is the relative deadline, and Ti is the period. Implicit deadlines are considered in this paper, i.e., the relative deadline is assumed to be equal to the period. Each task τi has a utilization Uij = Cij / Ti on processor j. An n × m utilization matrix (Dawei and Wu [6]) can be defined, where each row represents a task and each column represents a processor.
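As an illustration of the task model, the following minimal Python sketch builds the n × m utilization matrix from the WCETs and periods. The function name `utilization_matrix` and the numeric values are hypothetical and are not taken from the paper's experiments.

```python
def utilization_matrix(wcet, periods):
    """Return the n x m matrix U[i][j] = C[i][j] / T[i].

    wcet[i][j] is the WCET of task i on processor j,
    periods[i] is the period (and implicit deadline) of task i.
    """
    if len(wcet) != len(periods):
        raise ValueError("one period per task is required")
    return [[c / t for c in row] for row, t in zip(wcet, periods)]


# Hypothetical example: 3 tasks on 2 heterogeneous processors.
wcet = [[2.0, 4.0], [1.0, 3.0], [5.0, 2.5]]
periods = [10.0, 5.0, 10.0]
U = utilization_matrix(wcet, periods)
```

Each row of `U` then describes how heavily one task would load each candidate processor, which is the only task information the partitioning step needs under implicit deadlines.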

Processor Model
A heterogeneous multiprocessor platform with m preemptive processors based on CMOS technology is defined as {P1, P2, …, Pm}. This paper considers Dynamic Voltage/Frequency Scaling (DVFS) processors that support continuously variable frequency (speed) and voltage levels, i.e., an ideal DVFS processor can operate at any speed/voltage in its range. Of course, practical (non-ideal) DVFS processors support only discrete speed/voltage levels. So, the desired speed/voltage of the ideal DVFS processor is rounded up to the nearest higher speed/voltage level that the practical DVFS processor supports. The time (energy) required to change the processor speed is very small compared to that required to complete a task. It is assumed that the speed/voltage change overhead, like the context switch overhead, is incorporated in the task execution time. In this work, it is assumed that the processor's maximum speed (frequency) is 1 and all other speeds are normalized with respect to the maximum speed. When MPSoC platforms are considered, there are per-core and full-chip DVFS techniques (Kong et al. [7]). In per-core DVFS, each core operates at its own frequency/voltage and has no operating frequency constraint. On the other hand, practical full-chip DVFS designs require all the cores in one chip to operate at the same clock frequency/voltage. For each processor, the tasks are

Power Model
The power consumption in CMOS circuits has two main components: dynamic and static power. The dynamic power consumption, which arises from switching activity, can be represented as in [4]:
$$P_{dynamic} = C_{eff} V_{dd}^{2} F \qquad (1)$$
where Ceff is the effective switching capacitance, Vdd is the supply voltage, and F is the processor clock frequency (speed), which can be expressed in terms of a constant k, the supply voltage Vdd, and the threshold voltage Vth as follows:
$$F = k \frac{(V_{dd} - V_{th})^{2}}{V_{dd}} \qquad (2)$$
The static power consumption is primarily due to leakage currents (Ileak) [7], and the static (leakage) power (Pleak) can be expressed as:
$$P_{leak} = I_{leak} V_{dd} \qquad (3)$$
When the processor is idle, a major portion of the power consumption comes from leakage. Leakage power is rapidly becoming the dominant source of power consumption in circuits and persists whether a computer is active or idle (Koufaty [15]). So, lowering the supply voltage is one of the most effective ways to reduce both dynamic and leakage power consumption. As a result, it reduces energy consumption, since energy is the power dissipated over time. For simplicity, Eq. (1) is reduced to a simplified power model $P = F^{3}$ using normalized values, where F is the processor speed (frequency). Then, a simplified energy model $E = F^{2}$ (using normalized values) can be used.
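The step from the simplified power model to the simplified energy model can be made explicit. As a sketch, assume that a job needs C time units at the maximum speed F = 1 and that its execution time scales ideally to C/F at a lower speed F (this ideal scaling is an assumption of the sketch, not a statement from the source):

```latex
E \;=\; P \cdot t \;=\; F^{3} \cdot \frac{C}{F} \;=\; C\,F^{2}
\quad\Longrightarrow\quad
E \propto F^{2} \ \text{per unit of work.}
```

For instance, under this model a job run at F = 0.5 takes twice as long but consumes one quarter of the energy it would consume at full speed.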

THE PROPOSED APPROACH
Before introducing the proposed approach, background on PSO, priority assignment, and the Min-Min technique is presented.

PSO
Particle swarm optimization (PSO) has been successfully used to optimize nonlinear functions, combinatorial optimization problems, and multi-objective problems because of its simplicity, flexibility, easy operation, and fast convergence. It is an optimization technique simulating the social behavior of flying birds and their methods of information exchange. The PSO algorithm improves search efficiency by combining the local best solution (local search) and the global best solution (global search). PSO has been presented as an optimization technique for the job shop problem [16]. In this paper, we use the PSO strategy to solve the task scheduling problem. In PSO, each individual in the initial solutions is called a flying particle, whose velocity is dynamically changed according to its own best record (local) and the best record of its neighbors (global). During the past few years, several models of the PSO algorithm have been explored by researchers (Bou et al. [17]). Higashino et al. [18] developed the PSO algorithm by simulating the behavior of swarms in nature, such as birds and fish. In PSO, the potential solutions, called particles, fly through the problem space by following the current optimum particles. PSO has been successfully applied in many scientific areas, and there are many variants of the algorithm. A survey of PSO methods and applications can be found in Chou (2014). At the beginning, a set (swarm) of random solutions (particles) is used to initialize the PSO algorithm, which then iterates in search of an optimal solution. During every iteration, each particle is updated using two best values. The first is the personal best, pbest, that the particle has achieved so far. The second is the global best, gbest, obtained by any particle in the swarm. After finding the two best values, the particle updates its velocity and position according to equations (4) and (5), respectively. The typical procedure of PSO is shown in figure 1.
Initialize the population randomly.
DO {
    For each particle {
        Calculate the fitness value.
        If the fitness value is better than the best fitness value (pbest) in history, then set the current value as the new pbest.
    }
    Choose the particle with the best fitness value of all particles as the gbest.
    For each particle {
        Vnew = W × Vold + C1 × R1 × (pbest − Xold) + C2 × R2 × (gbest − Xold)    (4)
        Xnew = Xold + Vnew    (5)
    }
} Until termination criterion is met.
Fig. 1 The typical procedure of PSO

The random numbers R1 and R2 are generated uniformly between 0 and 1, and the constants C1 (self-knowledge factor) and C2 (social-knowledge factor) are usually in the range from 1.5 to 2.5. Finally, the inertia factor W can be fixed, varied with a decreasing value as the algorithm proceeds (Omidi and Rahmani [14]), or restarted as in Abdelhalim [9]. PSO has been applied to solve the problem of task partitioning for homogeneous multiprocessors as in [14], and also for heterogeneous multiprocessors (Abdelhalim [9]; Visalakshi and Sivanandam [13]). Consider a system consisting of m processors and n tasks. A possible solution (particle) is a vector of n elements, where each element is associated with a given task. Each element takes an integer value i, where 1 ≤ i ≤ m, that represents the processor the task is assigned to. Thus, the search space size is m^n. The swarm (population) consists of k particles, which are initialized randomly.
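The particle encoding and the update of Fig. 1 can be sketched in Python for a single particle. The text does not state how real-valued positions are mapped back to processor indices, so the rounding and clamping to the range 1..m used here is an assumption of this sketch, and the function name `pso_step` is hypothetical.

```python
import random


def pso_step(x, v, pbest, gbest, m, w=1.0, c1=2.0, c2=2.0, rng=random):
    """One velocity/position update for a single particle.

    x, pbest, gbest: lists of n processor indices in 1..m (one per task).
    v: list of n real-valued velocities.
    Assumption: new positions are rounded and clamped to 1..m.
    """
    new_x, new_v = [], []
    for xi, vi, pi, gi in zip(x, v, pbest, gbest):
        r1, r2 = rng.random(), rng.random()  # R1, R2 uniform in [0, 1)
        vn = w * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi)
        xn = min(m, max(1, round(xi + vn)))  # keep a valid processor index
        new_v.append(vn)
        new_x.append(xn)
    return new_x, new_v
```

A particle that already coincides with both pbest and gbest and has zero velocity stays where it is, which is the expected fixed point of the update.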

Min-Min and Priority Assignment Algorithm
Min-Min is a simple and fast algorithm capable of good performance. The Min-Min algorithm was designed for mapping tasks in heterogeneous computing systems (Braun et al. [11]). It first finds the minimum completion time of every unmapped task, where the completion time of a task on a machine equals the task's execution time on that machine plus the execution times of all tasks already mapped to that machine. Next, the task with the smallest minimum completion time is selected and mapped to the corresponding machine; a similar technique called Max-Min instead selects the task with the largest minimum completion time. Finally, the newly mapped task is removed from the unmapped set, and the process repeats until all tasks are mapped. The Min-Min algorithm has some limitations. It chooses smaller tasks first, which makes use of the resources with high computational power. As a result, the schedule produced by Min-Min is not optimal when the number of smaller tasks exceeds the number of large ones. Also, one resource can execute only one job at a time, and the size and number of resources are static and must be known in advance [7].
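The Min-Min steps described above can be sketched as follows. This is a minimal illustration of the classic execution-time form of the heuristic, not the authors' implementation; the function name `min_min`, the 0-based machine indices, and the tie-breaking by lowest task and machine index are assumptions of the sketch.

```python
def min_min(exec_time):
    """Map tasks to machines with the Min-Min heuristic.

    exec_time[i][j] is the execution time of task i on machine j.
    Returns (assignment, ready): assignment[i] is the 0-based machine of
    task i and ready[j] is the final completion time of machine j.
    """
    n, m = len(exec_time), len(exec_time[0])
    ready = [0.0] * m          # accumulated load of each machine
    assignment = [None] * n
    unmapped = set(range(n))
    while unmapped:
        # Smallest completion time over all unmapped tasks and machines.
        ct, i, j = min(
            (ready[j] + exec_time[i][j], i, j)
            for i in unmapped
            for j in range(m)
        )
        assignment[i] = j
        ready[j] = ct
        unmapped.remove(i)
    return assignment, ready


# Hypothetical example: 3 tasks, 2 machines.
assignment, ready = min_min([[3, 5], [2, 4], [6, 1]])
```

In this example the third task is mapped first (completion time 1 on the second machine), then the second task, then the first, so the makespan is the largest entry of `ready`.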
To handle real-time tasks on a multiprocessor system, task utilization is considered instead of execution time, and completion utilization is used instead of completion time. Of course, assignments that make a processor's utilization exceed 1 are not accepted. If there is no acceptable alternative, then the task set is infeasible. To overcome the limitations of the Min-Min algorithm, the PSO algorithm is used in a hybrid approach cascaded with Min-Min. To speed up the Min-Min algorithm, the priorities of the tasks are first determined from the Directed Acyclic Graph (DAG) and then assigned to the tasks in such a way that the important task is assigned to the processor that eventually leads to a better schedule. After the priorities of the tasks are obtained and the tasks are sorted into all the possible execution orders, the Min-Min algorithm is invoked to find the order with the minimum completion time. Finally, the PSO algorithm starts from the achieved outcome as an initial particle to optimize the scheduling process.
A Directed Acyclic Graph (DAG) can represent applications executed within multiprocessor systems. A DAG G = (V, E) consists of a set of vertices V representing the tasks to be executed and a set of directed edges E representing communication dependencies among tasks, as shown in figure 2. The edge set E contains a directed edge eij for each task Ti ∈ V that task Tj ∈ V depends on. The computation weight of a task is represented by the number of CPU clock cycles needed to execute the task. Given an edge eij, Ti is called the immediate predecessor of Tj, and Tj is called the immediate successor of Ti. An immediate successor Tj depends on its immediate predecessors such that Tj cannot start execution before it receives results from all of its immediate predecessors. A task without immediate predecessors is called an entry-task, and a task without immediate successors is called an exit-task.
In this paper, real-time tasks are considered. Each task is characterized by the following parameters:
The algorithm starts by assigning levels to the tasks (the root task has level 0). The level of a task in the task graph is defined as:
This Level function indirectly conveys the precedence relations between the tasks. If task Ti is an ancestor of task Tj, then Level(Ti) < Level(Tj). If there is no path between the two tasks, then there is no precedence relation between them, and the order of their execution can be arbitrary.
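A level assignment with the stated properties can be sketched in Python. The definition used here is an assumption of the sketch, since the source formula is not available: entry tasks get level 0 and every other task gets one plus the maximum level of its immediate predecessors. The function name `task_levels` and the small example graph are hypothetical; the graph is not the one in figure 3.

```python
def task_levels(preds):
    """Compute a level for every task of a DAG.

    preds maps each task to the list of its immediate predecessors.
    Assumed definition: level 0 for entry tasks, otherwise
    1 + max(level of immediate predecessors). The graph must be acyclic.
    """
    levels = {}

    def level(task):
        if task not in levels:
            # default=-1 makes a task without predecessors get level 0.
            levels[task] = 1 + max((level(p) for p in preds[task]), default=-1)
        return levels[task]

    for task in preds:
        level(task)
    return levels


# Hypothetical DAG with two entry tasks.
preds = {"T1": [], "T2": [], "T3": ["T1"], "T4": ["T1", "T2"], "T5": ["T3", "T4"]}
levels = task_levels(preds)
```

With this definition every ancestor of a task has a strictly smaller level than the task itself, which matches the property Level(Ti) < Level(Tj) stated above.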
Secondly, the execution sequence of the tasks in each level is determined. For the root level (T1, T2) shown in Figure 3, if there is only one parent task, then it comes first. If there is more than one parent task, the number of children of each parent in the next level is calculated, and the parents are prioritized by that number in descending order.
The parent with the highest number of children comes first (T1 is executed before T2). If two or more parents have the same number of children (T3, T4, and T5), then the parents that share a common child are executed first (T4 and T5 are executed before T3). When two parents have the same common child, they are listed in an arbitrary order (T4 and T5).
For example, in the randomly generated task graph shown in figure 3, the level for each task can be determined.

An illustrative example
For the randomly generated DAG with nine tasks, whose execution times and inter-process communication are shown in figure 3, and based on the priority assignment algorithm, Table 1 shows the possible task orders based on their priorities, where RA is a random order. The Min-Min algorithm is invoked to determine which of these possibilities has the minimum execution time and feeds it to the PSO approach, which then optimizes it to enhance the system performance in terms of minimizing the total execution time and saving power.
Table 1 All task order possibilities based on their priorities

The Proposed Hybrid Approach
The hybrid approach proposed in this paper simply modifies the initialization step of the PSO procedure by assigning priorities to the tasks and then incorporating a Min-Min solution (particle) into the randomly generated population. This gives the PSO algorithm a push to start from a good solution; PSO then tries to optimize that solution and, in the worst case, returns the Min-Min solution. In the illustrated example, PSO starts with task order 6 or 7 as shown in table 1, because they have the minimum execution time. A cost function favoring makespan (maximum accumulated processor utilization) minimization is proposed. Then, a penalty is added to infeasible solutions that exceed the processing capacity of any processor. In other words, the cost is represented as follows (Chen and Thiele [20]):
$$\mathrm{Cost} = \max_{j}(U_j) + \mathrm{Penalty}, \quad j = 1, 2, \ldots, m \qquad (6)$$
$$\mathrm{Penalty} = \mathrm{Sum}(U_j) > 1, \quad j = 1, 2, \ldots, m \qquad (7)$$
Next, the cost function is developed to incorporate energy, so that the proposed PSO approach tries to find energy-efficient solutions. Aydin and Yang [21] considered energy-aware task partitioning for homogeneous multiprocessors and introduced some helpful proven theorems and propositions. Some of them are presented here.
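The makespan cost of Eq. (6) can be illustrated with a short Python sketch that evaluates one particle. The penalty is interpreted here as the sum of the utilizations of the overloaded processors (those with accumulated utilization above 1); this interpretation, the function names, and the example values are assumptions of the sketch.

```python
def processor_utilizations(assignment, U, m):
    """Accumulate per-processor utilization for one particle.

    assignment[i] is the processor (1..m) of task i, and
    U[i][j] is the utilization of task i on processor j+1.
    """
    load = [0.0] * m
    for task, proc in enumerate(assignment):
        load[proc - 1] += U[task][proc - 1]
    return load


def makespan_cost(load):
    """Maximum processor utilization plus an overload penalty.

    Assumption: the penalty sums the utilizations that exceed 1.
    """
    penalty = sum(u for u in load if u > 1.0)
    return max(load) + penalty


# Hypothetical utilization matrix: 3 tasks, 2 processors.
U = [[0.2, 0.4], [0.2, 0.6], [0.5, 0.25]]
feasible = processor_utilizations([1, 1, 2], U, m=2)
overloaded = processor_utilizations([2, 2, 2], U, m=2)
```

Here `feasible` has no processor above 1, so its cost equals its maximum utilization, while `overloaded` places all tasks on the second processor and is pushed away by the penalty term.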
Proposition 1: For a single-processor system and a set of periodic real-time tasks with total utilization U ≤ 1, the optimal speed that minimizes the total energy consumption while meeting all deadlines is constant and equal to the total utilization (Aydin and Yang [21]).
Proposition 2: A task assignment that evenly divides the total load among all the processors, if it exists, minimizes the total energy consumption for any number of tasks.
So, minimizing the makespan minimizes energy consumption. In particular, when full-chip DVFS multiprocessor platforms are considered, the makespan cost function, Eq. (6), is used, since all processors on the chip have to operate at the same frequency, which is the maximum processor utilization [21]. On the other hand, if per-core DVFS multiprocessor platforms are assumed, an energy-aware cost function is needed. A cost function based on the average utilization of processors is energy-aware, but it does not give an accurate measure of energy consumption. Then, a tradeoff between average and maximum utilization is introduced. This paper introduces an energy-aware cost function based on the simplified energy model as follows:
$$\mathrm{Cost} = \frac{1}{m}\sum_{j=1}^{m} U_j^{2} + \mathrm{Penalty}, \quad j = 1, 2, \ldots, m \qquad (8)$$
When applying PSO, the parameters used are a swarm size k = 100, 100 iterations, C1 = C2 = 2 [5], and an inertia W = 1 that, depending on the PSO variant used, may be fixed, may decrease linearly until reaching 0, or may then be restarted (re-excited) to 1 to decrease linearly again.
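The effect of the energy-aware cost function can be checked on a small example. The sketch below reads the cost as the mean of the squared per-core utilizations, which follows from running each core at a speed equal to its utilization (Proposition 1) under the simplified energy model; this reading, the penalty interpretation (sum of the utilizations above 1), and the function name `energy_cost` are assumptions of the sketch.

```python
def energy_cost(load):
    """Mean of squared per-core utilizations plus an overload penalty.

    load[j] is the accumulated utilization of core j.
    Assumption: the penalty sums the utilizations that exceed 1.
    """
    penalty = sum(u for u in load if u > 1.0)
    return sum(u * u for u in load) / len(load) + penalty


# Same total load (1.0) on two cores, divided evenly and unevenly.
balanced = energy_cost([0.5, 0.5])
unbalanced = energy_cost([0.8, 0.2])
```

The balanced assignment yields the lower cost, in line with Proposition 2, whereas a cost based only on the average utilization would not distinguish the two assignments.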

EXPERIMENTS and DISCUSSION
The approaches have been implemented using MATLAB™. Utilization matrices have been generated uniformly for light tasks, with utilizations ranging from 0.05 to 0.25, and for medium tasks, with utilizations ranging from 0.25 to 0.5. The proposed approach gives the PSO algorithm a push toward the best solution using a particle (solution) obtained by Min-Min. This lets PSO give better results within a reasonable number of iterations. In the worst case, our proposed approach gives Min-Min performance if it cannot improve on that solution. Figures 6 and 7 show the performance of our proposed approach with 100 iterations and light tasks assigned to 4 and 10 cores, respectively. The advantage of our proposed approach increases as the search space grows.
When medium tasks are used, the proposed approach behaves the same way and shows better performance especially with large search spaces. Figures 8 and 9 show the case when medium tasks are partitioned on 8 and 16 processors respectively.
As mentioned earlier, when full-chip DVFS is considered the makespan cost function is used. If per-core DVFS is considered, the introduced energy-aware cost function, Eq. (8), is taken into account. Figures 10 and 11 show the case of partitioning light tasks on per-core DVFS platforms of 4 and 10 cores respectively. It is clear that using makespan cost function, Eq. (6), increases the feasibility (schedulability) of the task set more than using Eq. (8) as a cost function which is more energy efficient.
It is worth noting that a Max-Min particle (solution), in addition to the Min-Min particle, may be added to the population in the initialization step when the nature of the task set requires it, i.e., when Max-Min gives better solutions than Min-Min. This occurs when task utilizations are diverse, e.g., when there is one long task in a set of short tasks.
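Seeding the initial population with heuristic particles can be sketched as follows. This is a minimal Python illustration under the particle encoding used in this paper (n processor indices in 1..m); the function name `init_swarm`, its parameters, and the example seed particles are hypothetical.

```python
import random


def init_swarm(n, m, k, seeds=(), rng=random):
    """Build a swarm of k particles, each a list of n indices in 1..m.

    seeds holds heuristic particles (for example a Min-Min and a Max-Min
    assignment) that take the place of the first random particles.
    """
    seeds = [list(s) for s in seeds]
    if len(seeds) > k:
        raise ValueError("more seed particles than swarm size")
    randoms = [
        [rng.randint(1, m) for _ in range(n)]  # random processor per task
        for _ in range(k - len(seeds))
    ]
    return seeds + randoms
```

Because the seeds are part of the first evaluation, the initial gbest is at least as good as the best seeded heuristic solution, which is the worst-case behavior described for the hybrid approach.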

CONCLUSIONS
This paper considered the problem of power-aware task partitioning on heterogeneous multiprocessor platforms. The paper proposed a hybrid approach combining a PSO variant with a priority-assignment Min-Min algorithm that outperformed its counterparts, requiring fewer iterations for the same problem instance. In addition, an energy-aware cost function was introduced that differentiates between full-chip and per-core DVFS processors. As future work, any verified polynomial-time partitioning technique can be added as a particle to the population in the initialization step to give the PSO algorithm a forward push toward better solutions.