Improving Friends Matching in Social Networks Using Graph Coloring

Most people today maintain profiles on several social networks. These profiles usually contain brief information such as a personal picture, family members, home town, career and date of birth, which gives other people a general picture of the profile owner. In social networks, friend recommendation is usually done by finding the people who share the most mutual friends and suggesting that they become friends. In this paper, we introduce an algorithm with linear time complexity that helps people find friends who are not only good candidates but also share the same characteristics.


I. Introduction
A social network can be represented as a graph. A graph G is defined as G = [V,E], where V is the set of vertices and E is the set of edges. Each edge e ∈ E connects two distinct vertices of the graph, Cormen et al. [7], [11]. Graphs can be used to solve many kinds of problems, and social network problems are among them. Here, each vertex v ∈ V represents a person's profile, and an edge connects two vertices if there is a friendship between the two people they represent.
The graph coloring problem is a graph labeling problem in which the elements of a graph are colored under some constraints using as few colors as possible. In vertex coloring, the vertices are colored such that no two adjacent vertices hold the same color. It follows that if two vertices have the same color, there is no edge connecting them. Similarly, edges can be colored by assigning a color to each edge such that no two adjacent edges have the same color. Graph coloring is used in different fields such as biochemistry, electrical engineering and computer science, Rossi, Rayan A, and Nesreen K. Ahmad [20]. It is used specifically in scheduling, resource allocation and timetabling, Cormen et al. [7].
Chalupa (2011) [5] applied a coloring algorithm with a proper heuristic to social networks in order to define the graph G and its complement G̅. Here G = [V,E] represents the social network graph, every v1,v2 ∈ V correspond to two distinct people, and every e ∈ E is an edge implying that v1 knows v2. On the other hand, G̅ = [V̅,E̅] defines a complementary graph in which every v̅1,v̅2 ∈ V̅ correspond to two distinct people and every e̅ ∈ E̅ is an edge indicating that v̅1 does not know v̅2. Essentially, the social network has two sub-groups V and V̅, where V holds the people who are in the friendship list and the complementary group V̅ holds the people who have no friendship with each other. In this paper, we concentrate on the second group V̅ in order to find new friends for people whose identifications match as closely as possible. This helps people get the most suitable friends. Other coloring algorithms can also be used, such as those introduced in [10], [2], [4].
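The vertex coloring constraint described above can be illustrated with a minimal sketch. It uses a simple greedy heuristic, which is an assumption made for illustration only; it is not the algorithm of [14] or [5], and it does not guarantee a minimal number of colors. The function names and the toy network are hypothetical.

```python
from collections import defaultdict


def build_graph(friendships):
    """Return an adjacency-set dict for an undirected friendship graph."""
    adj = defaultdict(set)
    for u, v in friendships:
        if u == v:
            continue  # an edge must connect two distinct vertices
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def greedy_coloring(adj):
    """Give each vertex the smallest color index not used by its colored neighbors."""
    color = {}
    for v in sorted(adj):  # fixed visiting order keeps the result deterministic
        used = {color[n] for n in adj[v] if n in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color


# Hypothetical toy network: a triangle a-b-c plus a vertex d attached to c.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
adj = build_graph(edges)
color = greedy_coloring(adj)
```

In this toy network, a and d end up with the same color, and they are indeed not friends, which is the property the rest of the paper relies on.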
Graph coloring is not an easy task; coloring a graph with the minimum number of colors is an NP-hard problem. In [18], the authors presented an algorithm that can find the Maximum Independent Set (MIS) in large graphs. They applied their algorithm, Finding Maximum Independent Set (FMIS), to several graphs of orders 20, 30, …, 2000. Sharieh and his colleagues in [18] claimed that the complexity of FMIS is O(1.0052^n) for a graph of n nodes, which is better than other algorithms.
On the other hand, in [2], Al-Jaber and Sharieh presented an algorithm that finds the MIS according to the weights of the nodes, where the weight of a node depends on its degree. Its worst-case complexity is O(N^2) for a graph of N nodes.
Finding the MIS can be very helpful in clustering, which partitions a large number of entities into a fixed number of groups based on a number of attributes [17], such as assigning students to proper groups as presented in [16], [12]. Rossi & Musolesi in 2014 [19] demonstrated different techniques for identifying users' social networks from their check-in data. They presented three strategies. The first uses the spatio-temporal trajectory, which relates to the time at which users check in. The second uses the frequency with which they visit the same place. The third is a hybrid of the previous two. With some of the datasets they created, they were able to classify more than 80% of the users correctly.
In [15], the author presented an artificial neuro-fuzzy logic system for detecting human emotions. It can be used in social networks to detect a user's emotion while he/she is checking his/her profile or timeline, and thus to match friends' emotions at check-in places. Although this could serve as another indicator for friend matching, for simplicity this research uses only the three identifications presented in [19].
Dunbar, R. [9] claimed that there is a limit to the number of friends a human being can handle, so finding a good friend who has the same characteristics is not an easy task. It takes a long time to learn a partner's characteristics, such as what he/she likes or dislikes, where he/she likes to go, and why. All of these questions arise when you are searching for a new good friend who matches your personality, or when a person asks you for friendship. Similarly, when people apply for a new job, managers have a good chance to get a brief description of the candidates before calling them for interviews, even if they do not know each other.
In this paper, we present a new algorithm that helps in finding friends whose characteristics match yours more closely than those of the people you may know and who already exist in your friendship list. This algorithm can also be used to find the proper people for a specific position in a company. When a manager knows the characteristics of each person, he/she is able to decide which candidate is the best one to choose.
The rest of this paper is organized as follows. Section II presents the literature review. Section III presents the proposed algorithm. Section IV presents the algorithm analysis. Section V presents the experiments. Finally, the conclusions are presented in Section VI.

II. Literature Review
In [14], Malkawi et al. used a coloring algorithm to produce a new exam scheduling method. Exam scheduling is considered an NP-hard problem, for which no polynomial-time algorithm is known, and it has many strict constraints. One is that a student should not have two exams on the same day. Another is to reduce the gaps between a student's exams. In [14], the authors aimed at accuracy, fairness and an optimal time period. They claimed that the time complexity of their algorithm is linear.
Representing social networks as graphs requires large graphs, because social networks usually have a huge number of participants. In [8], Donderiaa V & K. Janaa P presented a novel scheme for graph coloring that deals with large graph problems. The authors called their algorithm NGC; it colors large graphs using a minimum number of colors. They showed that the time complexity of the NGC algorithm is O(n^3). They compared NGC with the DSATUR algorithm presented by Brelaz D [4], using graph sizes ranging from 7 to 600 nodes. Although NGC used more colors than DSATUR for graphs of 60, 520 and 600 nodes, it showed a lower running time in all cases, according to the comparisons in [8].
Social networks are very large networks. They hold different kinds of personal information about their users, which can be considered good indicators of the users' interests. In [13], H. Lee D & Brusilovsky P presented a study of "CiteULike" which investigated whether a social connection indicates user similarity, and how this depends on the strength of the connection.
In [3], the authors showed that a local algorithm with constant memory and constant lookahead can be arbitrarily worse than the global optimum. They also compared local algorithms and showed that for every local algorithm there is a scenario in which it performs arbitrarily worse than another local algorithm.
Usually, the recommendation of two people (x,y) as friends is built on the common friends who exist in the friendship lists of both x and y, and on common similarities such as the name of the company they work for, as shown in Guy, I et al. [10].
Alvin Chin, Bin Xu and Hao Wang [6] showed a new algorithm that recommends friends using physical locations, such as friends attending the same workshop or conference. They added some social interactions, such as sending and receiving messages and asking each other questions. They based their study on meetings as a measure of proximity, and on common interests and common friends as measures of homophily. This can be used when studying physical locations such as workspaces and conferences.
In their results, they mentioned that in the conference case the two main reasons for recommending a friend are that the people may know each other or may have met before. But what about people who do not know each other and have common identifications? In this paper, we show a new algorithm that helps these people become good friends.

III. Algorithm M(G,б,Ҏ)
In this section, we present a new algorithm that improves friend matching on social networks. The algorithm is implemented in four main stages. Assume that we want to match new friends for a person б; the stages are as follows.
First Stage: The algorithm applies a low-complexity graph coloring algorithm to color the network. One such algorithm was presented by Malkawi and his colleagues [14]; it is used here because it has linear run time complexity. The output is a colored graph, which gives us two distinct groups. The first group L holds the people who know each other and are thus connected in the social network. The second group L̅ holds the people who are not connected to each other, in other words, the people who do not exist in the friendship list of the required person б.
Second Stage: The identifications of the required person б are obtained using the techniques presented in [19]. The same techniques are also applied to all vertices in group L̅ to get their identifications and record them.
Third Stage: The second group L̅ is searched vertex by vertex (each vertex is a personal profile) to get the characteristics of each person ρ and to determine whether any of them matches a characteristic of the required person б. Here we create a new weighted graph that holds the vertices of the people ρ who share a characteristic with б. This is done by adding an edge that connects ρ and б. The weight of this edge indicates the number of matching characteristics between ρ and б: the larger the weight of the edge, the more characteristics ρ and б have in common. Note that the characteristics are obtained by applying the Rossi & Musolesi 2014 [19] algorithm with all the techniques they presented. At the end of this stage, we get a new weighted graph G̿, which represents all people ρ who have at least one characteristic in common with the required person б.
Fourth Stage: We go through the new graph G̿ and find the maximum weighted edge, which connects the required person б with the most matching person Ҏ. These two nodes represent the persons with the largest number of common characteristics.
2. Check all identifications of б using the techniques shown in [19].
3. Assign a vertex to each personal profile in the network, where vi ∈ V.
4. For each vertex vi ∈ V and for all vj in the friendship list of vi, draw an edge ei ∈ E between vi and vj.
5. Apply the graph coloring algorithm presented in [14], which produces two subgroups P and P̅, such that P holds all of б's friends and P̅ holds all vertices of people who are not in б's friendship list (i.e. people who do not know б and vice versa). P̅ is the search area for new matchable friends for б.
6. For each v̅i ∈ P̅, apply all techniques in [19] and find that person's identifications.
7. Compare each identification of б and v̅i. If they have the same identification, draw an edge between б and v̅i and add the number of matching identifications to wi, which is the weight of the edge between б and v̅i.
8. Go through the graph and arrange the edges in descending order of weight.
9. The maximum weight wh leads to the most matchable vertex v̅h, which is suggested as a good friend for б.
Now, to demonstrate the idea, let us take an example. Assume we have an undirected graph G(V,E), where V is a set of vertices {v1,v2,v3,…,v21} and E is a set of edges, each connecting vi and vj where i≠j, as shown in Figure 1. Assume that each vertex vi represents a person's profile on Facebook, and that an edge exists between vi and vj if vi and vj are friends, that is, vj is in vi's friendship list and vice versa. Notice that in real life such a graph would be huge, but for simplicity we use a graph with only 21 vertices to show the idea behind the proposed algorithm.
Applying the Rossi L & Musolesi M [19] algorithm gives the identifications of each person according to the check-in places published through his/her Facebook profile. These person identifications are stored in a matrix M that holds a row for each vertex vi (person) and a column for each identification idj. In this example we assume only 3 identifications, although more algorithms can be used to obtain more identifications, such as people's emotional states. If person vi has identification idj, a 1 is placed in M[vi,idj]; otherwise a 0 is placed in M[vi,idj], as shown in Table 1, which lists the rows of the first 8 vertices.
Now we apply the Malkawi et al. [14] graph coloring algorithm. It colors all vertices of the graph so that adjacent vertices, which correspond to profiles of friends connected by an edge, receive different colors. Consequently, two vertices that share the same color have no friendship between them, as shown in Figure 2.
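The identification matrix M used in this example can be sketched as a mapping from each vertex to a row of 0/1 entries. This is only an illustration under stated assumptions: the identification labels and the profile contents below are hypothetical and are not the values of Table 1, and the identifications are assumed to have already been extracted by the techniques of [19].

```python
# Hypothetical identification labels; the example in the text assumes three.
IDENTIFICATIONS = ["id1", "id2", "id3"]


def build_identification_matrix(profiles, identifications=IDENTIFICATIONS):
    """Map each vertex to a 0/1 row: 1 if the person has that identification."""
    return {
        vertex: [1 if ident in owned else 0 for ident in identifications]
        for vertex, owned in profiles.items()
    }


# Hypothetical profiles (assumed output of the identification step).
profiles = {
    "v1": {"id1", "id2", "id3"},
    "v2": {"id2"},
    "v3": set(),
}
M = build_identification_matrix(profiles)
```

Here M["v2"][1] is 1 because v2 has the second identification, and every other entry of that row is 0.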
We categorize the vertices according to their colors, because all vertices of the same color are non-friends according to the friendship lists in their profiles. In this case, the algorithm colored the graph vertices with three colors: red, green and orange. This is stored in another matrix L[C,Vno] that holds a column for each color, containing the numbers of the vertices that have the color of the column's label, as shown in Table 2. At this point, we check every vertex in each group against all other group members. Remember that each group contains people who are non-friends. We take each vertex and add it to the matrix N[vi, vj, wi], where vi is the person for whom we are searching for new friends, vj is a person in the same group (i.e. their vertices have the same color) and wi, initialized to 0, is the weight of the edge connecting vi with vj. The weight wi is increased by one for each identification that vi and vj have in common. Clearly, it is at least zero and at most 3, as shown in Table 3.
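The construction of the matrix N described above can be sketched as follows. The sketch assumes that a weight counts the identifications that both people have (both entries equal to 1), which is one reading of "the same identification". The color assignment and the rows of M are hypothetical and do not reproduce Tables 1 to 3. Note also that it compares every pair inside a color group, so it is quadratic in the group size, whereas the analysis in Section IV considers a single person б.

```python
from collections import defaultdict
from itertools import combinations


def group_by_color(color):
    """Collect vertices per color (the role of matrix L in the text)."""
    groups = defaultdict(list)
    for vertex, c in color.items():
        groups[c].append(vertex)
    return dict(groups)


def same_color_weights(color, M):
    """Return N as (vi, vj, w) triples for every pair of same-colored vertices."""
    N = []
    for members in group_by_color(color).values():
        for vi, vj in combinations(sorted(members), 2):
            # w starts at 0 and grows by one per identification held by both.
            w = sum(1 for a, b in zip(M[vi], M[vj]) if a == 1 and b == 1)
            N.append((vi, vj, w))
    return N


# Hypothetical colors and identification rows.
color = {"v1": "red", "v8": "red", "v9": "red", "v2": "green"}
M = {"v1": [1, 1, 1], "v8": [1, 1, 1], "v9": [0, 0, 0], "v2": [1, 0, 1]}
N = same_color_weights(color, M)
```

With this data, v2 is never paired with anyone because it is the only green vertex, so only same-colored (non-friend) pairs receive a weight.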
After going over all vertices in the colored graph, we can see the friends to be suggested in each group. Let us take v1 as an example and show the entries related to it in the matrix N[vi, vj, wi], where N is the matrix that holds the combinations between vi and the other vertices. We can decide which people are the best choice to be friends because they have more common identifications (i.e. a higher weight value). This can be inferred from the weight of each pair of vertices: the higher the weight, the more identifications they have in common. Clearly, to suggest friends, we choose the vertices connected to v1 with the highest wi value. These values can be sorted in descending order to bring up the highest ones, using a fast sorting algorithm such as the one presented by Abu Dalhoum and his colleagues in [1]. According to Table 1, v19 and v8 have 3 identifications in common with v1, which implies that they should be suggested as friends ahead of v9, who has no identifications in common with v1. So, this algorithm provides a friend recommendation for people who do not know each other but whose identifications are common, and we believe they will make good friends.
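The selection of suggestions for v1 can be sketched as a ranking of its entries in N by descending weight. The sketch uses Python's built-in sort rather than the algorithm of [1], and the function name and the `top` parameter are hypothetical. The weights for v8, v19 and v9 follow the values stated in the text; the remaining triple is assumed for illustration.

```python
def suggest_friends(N, person, top=2):
    """Rank the same-colored candidates of `person` by descending weight."""
    candidates = []
    for vi, vj, w in N:
        if vi == person:
            candidates.append((vj, w))
        elif vj == person:
            candidates.append((vi, w))
    # Highest weight first; ties are broken by vertex name for determinism.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates[:top]


N = [("v1", "v8", 3), ("v1", "v9", 0), ("v1", "v19", 3), ("v8", "v9", 0)]
suggestions = suggest_friends(N, "v1")
```

The result places v19 and v8 ahead of v9, matching the recommendation described for v1.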
Note that the algorithm excludes all nodes whose color differs from red, even though some of them might also have a high identification weight. For example, v1 is not a friend of v7, v10, v13, v21 and others. In fact, the list of nodes with different colors is larger than the list of nodes with the same color: 7 nodes are orange and 6 are green (13 in total), while only 8 nodes are red. This means that the subset of nodes investigated for suggestions is 8/21 ≈ 38% of the total potential nodes in this example.

IV. Algorithm Analysis
The time complexity of M(G,б,Ҏ) is as follows.
Step (1) takes constant time O(c) because it only checks the profile of the required person б. Steps (2-4) take O(V+E) steps, which is the cost of drawing the required graph from the social network.
Step (6) takes O(c*V) = O(V). Step (7) takes at most |P̅| ≤ |V| steps; the worst case, O(P̅), occurs when б has a common identification with every v̅i ∈ P̅.
Step (8) takes O(P̅) in the worst case, which happens when every v̅i ∈ P̅ has identifications in common with б. The last step takes O(P̅) to get the maximum weight value. So the complexity of the whole algorithm is O(V + P̅) ≈ O(2*V) ≈ O(V), which can be considered linear.
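The O(P̅) bound for arranging the edges in step (8) holds when the ordering avoids a comparison sort. One way to achieve this, sketched below as an assumption rather than as the method used in the paper or in [1], is to exploit the fact that each weight is an integer between 0 and the number of identifications k, and to place the edges in one bucket per weight. This takes O(P̅ + k) time. The function name and the sample data are hypothetical.

```python
def order_by_weight(weighted, max_weight):
    """Return (vertex, weight) pairs in descending weight using weight buckets."""
    buckets = [[] for _ in range(max_weight + 1)]
    for vertex, w in weighted:  # one pass over the candidate edges
        buckets[w].append((vertex, w))
    ordered = []
    for w in range(max_weight, -1, -1):  # one pass over the k + 1 buckets
        ordered.extend(buckets[w])
    return ordered


# Hypothetical edge weights between the required person and five candidates.
weighted = [("p1", 1), ("p2", 3), ("p3", 0), ("p4", 3), ("p5", 2)]
ordered = order_by_weight(weighted, max_weight=3)
```

The first element of the result is a candidate with the maximum weight, so step (9) can read it directly from the ordered list.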

V. Experiments
In this section, we present the experiments carried out for different numbers of identifications and different numbers of personal profiles. Figure 3 presents the execution time required when each personal profile defines three different identifications, and shows that the execution time increases as the number of vertices in the graph increases. We tested the algorithm with up to 6000 vertices, although Facebook rules do not allow a specific user to have more than 5000 friends. Figure 4 shows the results when the proposed algorithm was tested using from 1 to 10 random identifications on graphs of up to 1000 vertices. It is clear that the matching time increases as the number of identifications increases.

VI. Conclusions
In this paper, we presented an algorithm that can be used in social networks to help people get new friends. These new friends are chosen by matching the identifications of both people. Graph coloring is used to find people who are not currently connected with each other although they would be a very good match. The person who has the most identifications in common with the one for whom we want to recommend a friend is the first choice to be suggested. We tested the algorithm using random graphs of 1000 nodes with up to 10 random identifications. The results show that the execution time increases in proportion to the number of identifications required for matching personal profiles. We also tested the proposed algorithm with three different identifications and up to 6000 vertices. As expected, the larger the number of vertices, the more time is required for matching them.