Semantic Link-Based Model for User Recommendation in Online Communities

Recommendation systems have been widely used to overcome the problem of information overload and to help people make the right decisions about needed items such as movies, books, products, or even people. This paper proposes a semantic model for people recommendation in online communities. Using cascaded collaborative filtering (CF), the model predicts significant links that could be established between community members even if they do not know each other. Two major steps in this type of recommendation model are (i) the method used to compute the similarity between people, and (ii) the method used to combine these similarities in order to compute the overall similarity between the target member and the others. Similarity between members is calculated from local attributes of nodes and links. Semantic relatedness between members is derived from connection strength and trust score in order to identify the closeness between them. Extensive experiments on a real dataset from a scientific community were carried out to recommend authors as possible coworkers for a target researcher. The experimental results on this publication network show that the proposed model outperforms other known techniques in ranking recommended collaborators.


INTRODUCTION
Social collaboration refers to processes that help multiple people interact and share information or knowledge in order to achieve a common goal. Social collaboration media can be classified into two main types: social networks and online communities. A social network is a social structure between actors that indicates the ways in which they are connected through various social familiarities, ranging from casual acquaintance to close familial bonds [9]. Online communities form a fundamental part of today's web, and a large portion of Internet traffic is driven by and through them [1]. They are fundamentally defined by repeated interactions over time among a set of nodes representing actors, known as knowledge workers, who use them to share content, seek support, and socialize. In these communities people are often not aware of each other, yet they tend to communicate with peers whom they trust in order to get advice. Accordingly, people-to-people recommendation has recently become an important task in many online communities, where it is important to find the "right" actor to interact or work with. A recommendation service in such communities must not suggest just anybody who is possibly relevant; it also has to check the trust score of each candidate member, since the recommended member is expected to give advice. It is therefore more rational to deliver recommendations within an informal community of users by using their interests and social context. In a research community, for example, people tend to interact with others who have similar research interests [10]. Therefore, it is important to consider the social structure of the user community, together with users' profiles and social behaviors, as an additional source of information when building a people recommendation service. There are two prevalent approaches to building recommender systems: content-based (CB) [20] and collaborative filtering (CF) [8].
A traditional content-based recommender system (RS) recommends items similar to those the user preferred in the past, using the features of the preferred items to find similar ones. Collaborative filtering, in contrast, recommends items that other users with similar preferences have liked in the past [13]; it is therefore necessary to define the community of similar people in order to identify their preferences. Hybrid approaches combine content-based and collaborative filtering methods in several different ways [21] to achieve better filtering performance and take advantage of each. This research aims to help users find appropriate members to communicate with in collaborative environments, such as online communities, where users are not familiar with each other. We therefore propose a cascaded collaborative filtering model for people recommendation (CCFPRE) that predicts candidate users to communicate with in the future. Unlike traditional CF, in people recommendation the "items" are also users who actively participate through community interactions. The mutual nature of these interactions is used to recommend, for a given user, other users they may like to contact. CCFPRE calculates the similarity between the target user and other users using semantic local features of nodes within a community. It starts by modeling information about community members (nodes) and identifying the link-based features used to calculate tie strength and trust score between members. Next, similarity engines calculate the similarity between the target user and the others, using a similarity measurement technique appropriate to each user feature. A new, promising spread similarity technique is proposed here, which considers the length and weight of connections among members in an online community.
The main contribution of this research is an advanced recommendation model that uses the link features of users in an online community to identify expected reciprocity. Section 2 describes recommendation techniques that could be applied to people recommendation. Section 3 illustrates the proposed CF model for people recommendation. The link-based recommendation engine is described in Section 4. Experiments on a real-world online community of researchers are explained in Section 5; they show that the proposed model outperforms previous recommendation techniques. Finally, conclusions and future work are described in Section 6.

RELATED WORK
Recommendation systems have been applied in diverse fields; most of them focus on user-product recommendation, emphasizing the one-way relationship between users and the products they are interested in buying. These recommender systems cannot be directly applied to people recommendation because of the two-way relationship between people and the high sparsity of the data. People-to-people matching systems were initially used for dating services, where users join with the objective of meeting other users with a common need. People are matched on metrics that include personal information such as educational and professional background, personal interests, and hobbies. Real-world examples of such systems include employer-employee (job search networks), mentor-student (university social networks), consumer-to-consumer (marketplaces), and male-female (online dating networks) matching [14]. In social networks, users are "flooded" with information from feed readers and many other sources, so recommending other people that a member may like has recently become an important task in many online social networks. Several systems have been proposed for people recommendation in social networks, such as CollabNet [6], which uses gradient descent to learn the relative contributions of similar users or items to the ranking of recommendations produced by a recommender system. By using weights to represent the contributions of similar users for each active user, it measures similarity among people, which is then used to identify and rank people. SocialCollab [5] adapts the traditional collaborative filtering algorithm to predict, for a given user in a social network, other users they may like to contact. User similarity is calculated based on both attractiveness and taste. A user's taste is defined through their favorites, that is, the other users they actively decide to select, while attractiveness is measured by how often they are selected by other users. Another people-to-people recommendation system, proposed in [15], uses tensor models that can correlate and find latent relationships between similar users based on both user interactions and user attributes in order to generate recommendations. People recommendation can also be considered as a link prediction problem in heterogeneous and reciprocal networks. The system developed in [4] used structural features, structural collaborative information about people, and properties of the links between people to measure positive and negative signs along their paths. Recently, approaches have been proposed that analyze the content of micro-blogs to detect users' interests and explore the topology of the network to find candidate users for recommendation [2,3].
Nov 27, 2013

PEOPLE RECOMMENDATION MODEL IN ONLINE COMMUNITY
Collaborative filtering (CF) methods recommend items based on aggregated user preferences for those items. The main principle of CF is that when people have historically selected the same set of items, they have the same interests and are therefore likely to be interested in the same other items in the future. People recommendation is different, however, since a user plays a dual role as both a "user" and an "item", so the traditional CF assumption of active users and passive items does not apply. People recommendation is therefore based on defining the similarity between users in terms of mutual properties such as attractiveness, interest, and taste. Accordingly, the proposed CF people recommendation model involves two important challenges, which are discussed in this section. The first is accurately identifying user preferences, which relies on link-based attributes that distinguish users within the community. The second is assigning an appropriate similarity measurement mechanism to each type of attribute; the resulting similarities are used to rank a unified set of recommended users.

Semantic User Modeling
A user model contains the information that the system knows about the user, concentrating on the user's knowledge, plans, and preferences in a domain [10]. Modeling the user helps identify the taste of each active user using information collected from connected users as well as the user's own activities. In an online community, this information is either explicitly defined or implicitly inferred. Explicit information describes the user's regular profile data and forms an information network, while implicit knowledge lets us add meta-knowledge (semantic attributes) about users. Attributes used to model a user in an online community can be classified as content-based or link-based, each related to a set of member features. Content-based attributes, such as interests, emotions, and taste, are extracted from posts, comments, documents, and tags published by the user. Link-based attributes, such as connectivity, behavior, and trust, are identified from the type and frequency of interactions between users, their activities, and community patterns. In this research we focus only on link-based attributes, which we classify as direct or indirect (semantic). Direct attributes arise when two members of the community explicitly communicate with each other, for example by sending messages or commenting on each other's posts. Semantic attributes represent hidden relationships extracted from indirect relations; examples include the number of common friends, the frequency of joining the same type of events or activities, and sharing the same social pattern. The proposed user model therefore contains both types of attributes, which are used to measure the strength of ties between users as well as the trust score that the proposed model uses to enhance the coverage of the recommendation process.
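The split between direct and semantic attributes can be illustrated with a small data structure. This is a minimal sketch under our own assumptions: the class `MemberModel`, its field names, and the helper functions are hypothetical, and only one semantic attribute (the number of common neighbors) is derived from the direct links.

```python
from dataclasses import dataclass, field


@dataclass
class MemberModel:
    """Hypothetical link-based user model with direct and semantic attributes."""
    member_id: str
    # Direct attributes: explicit interactions (e.g. messages, comments) per partner.
    direct: dict = field(default_factory=dict)
    # Semantic attributes: name -> {other member id -> value}, inferred from indirect links.
    semantic: dict = field(default_factory=dict)


def common_neighbors(a: MemberModel, b: MemberModel) -> int:
    """Number of members that both a and b interact with directly."""
    return len(set(a.direct) & set(b.direct))


def add_common_neighbor_attribute(members: dict) -> None:
    """Store the common-neighbor count for every pair sharing at least one neighbor."""
    ids = sorted(members)
    for i, x in enumerate(ids):
        for y in ids[i + 1:]:
            n = common_neighbors(members[x], members[y])
            if n:
                members[x].semantic.setdefault("common_neighbors", {})[y] = n
                members[y].semantic.setdefault("common_neighbors", {})[x] = n
```

Other semantic attributes (shared events, similar social patterns) would be added as further keys of `semantic` in the same way.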

User Similarity Measurement
In traditional CF, the most important step is defining similarity. Two items are similar if they are selected together by the same set of users; likewise, two users are similar if they both select the same set of items (i.e., they have similar taste). In online communities, the "items" that receive actions are also users who contribute to network interactions. Traditional CF therefore cannot be directly applied to people recommendation, since there is no explicit rating from one user to another. Instead, similarity among members is calculated from connectivity, behavior, and trust. Motivated by homophily [18], we assume that users in an online community are more easily influenced by the friends they regularly communicate with and trust, so the strength of ties between members and their trust scores are key to assigning similarity values between members. The proposed recommendation model extends traditional CF to handle the specific nature of online communities by recommending users to a target user x under two assumptions: (i) if people similar to target user x are connected to user y, then user x could also be connected to user y; and (ii) if people similar to target user x trust user y, then user x could also trust user y. The proposed framework therefore works as follows. First, it identifies people similar to a target user u_x based on the connective attributes WA_x, using the spread similarity technique. Next, the HITS algorithm is applied to calculate the trust similarity score between people and u_x based on the trust-based attributes TS_x. Motivated by the idea of ranking items in the right order to obtain a Top-N recommendation list [7,11], the potential recommendation list for user u_x is assembled and re-ranked according to the intersection LW_x ∩ TS_x. In other words, phase 1 calculates similarity based on the user's interaction features, while phase 2 calculates trust scores using distinguished potential relations in order to converge the ranking of the recommended users generated in phase 1.
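One possible reading of the two-phase cascade is sketched below. The text does not spell out how the intersection LW_x ∩ TS_x re-ranks the list, so the rule here is an assumption: candidates found by both engines are promoted ahead of those found only by spread similarity; within each group the phase 1 order is kept, and the trust score breaks ties. The function name and its inputs are hypothetical.

```python
def cascade_rerank(spread: dict, trust: dict, k: int) -> list:
    """Hypothetical cascade re-ranking that promotes candidates in LW_x ∩ TS_x.

    spread: candidate -> phase 1 spread-similarity score for the target user
    trust:  candidate -> phase 2 trust (authority) score for the target user
    """
    both = set(spread) & set(trust)
    # Sort key: intersection members first (False < True), then higher spread
    # similarity, then higher trust; the candidate id makes ties deterministic.
    ranked = sorted(
        spread,
        key=lambda c: (c not in both, -spread[c], -trust.get(c, 0.0), c),
    )
    return ranked[:k]
```

Candidates that appear only in the trust list are ignored in this sketch, since phase 2 is described as converging the phase 1 ranking rather than adding new candidates.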

Problem Definition
The proposed recommender engine is applicable to any online community. The main challenge is to identify link-based attributes and to classify them as related to either connectivity or trust. In this paper, we use an academic information network that stores publication and citation data in order to infer authors' future collaborations with others in the same community. In a publication network [23], direct relationships between members take the form of coauthorship, direct citation, and working at the same research institution, while indirect relations are co-citation, bibliographic coupling, and regularly attending the same conferences. The link recommender engine computes similarity between nodes by taking into account both the "connection" and the "trust" strength between members. Accordingly, our model starts by forming a graph from the direct relationships between members. The graph contains two types of nodes, authors and papers, and two types of edges representing the direct relationships mentioned above.
An edge is introduced between two authors if they coauthored at least one paper, and an edge is introduced between an author and a paper if the author cited that paper, as shown in figure1. From this graph we deduce different metrics that quantify the "similarity score" between authors by considering both connectivity and trust. The recommendation process works as follows. First, a starting node (author) is selected, for whom we will recommend the "right" authors to interact or collaborate with in the future. Similarities between the target author and the others are calculated using both spread similarity and the HITS algorithm, and these scores are used to obtain lists of compatible people to recommend to the target author. Next, semantic features are used to re-rank the obtained results and converge the recommendation. Finally, a disjoint set of the top-k most similar authors is recommended to the selected target author.
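The graph construction just described can be sketched with plain dictionaries. The record layout (an `authors` list and a `references` list per paper) is a hypothetical input format chosen for illustration, not the schema of any particular dataset. Coauthor edges are weighted by the number of joint papers, and author-paper citation edges are kept as counts so they can later feed the trust computation.

```python
from collections import defaultdict


def build_publication_graph(papers):
    """Build author-author coauthorship edges and author-paper citation edges.

    `papers` is a hypothetical list of dicts with keys 'authors' (list of
    distinct author ids) and 'references' (ids of papers the paper cites).
    Returns two nested dicts:
      coauthor[a][b] -> number of papers a and b wrote together
      cites[a][p]    -> number of a's papers that cite paper p
    """
    coauthor = defaultdict(lambda: defaultdict(int))
    cites = defaultdict(lambda: defaultdict(int))
    for paper in papers:
        authors = paper["authors"]
        for i, a in enumerate(authors):
            # Undirected coauthor edge, weighted by the number of joint papers.
            for b in authors[i + 1:]:
                coauthor[a][b] += 1
                coauthor[b][a] += 1
            # Directed author -> paper citation edge.
            for ref in paper["references"]:
                cites[a][ref] += 1
    # Convert to plain dicts so later lookups do not silently create entries.
    return ({a: dict(n) for a, n in coauthor.items()},
            {a: dict(p) for a, p in cites.items()})
```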

Spread similarity calculation using connective features
Using the publication graph, the connective similarity between authors is calculated. We adapt the FriendTNS algorithm [22], which is based on the degree of closeness between neighboring nodes that form pathways of maximum length 2 between two nodes. This algorithm has been applied to real-life social networks and outperformed other local link-prediction approaches in terms of accuracy and time. It recommends friends based on the number of common friends each candidate has with the target user. To illustrate the effectiveness of the proposed spread similarity technique, consider the case shown in figure2, which represents eight authors connected through their common publications; the number on each link is the frequency of interaction between the two users (the number of papers they coauthored). When the FriendTNS algorithm is applied to recommend people to Au2, which is connected to both Au4 and Au1, it generates the candidate list Au3, Au5, Au6, Au8. This list is not yet ordered, so FriendTNS applies the following formula, based on the inverse sum of node degrees, to compute the similarity between neighboring nodes and rank the candidate authors.
$$\mathrm{Sim}(x,y) = \frac{1}{\deg(x) + \deg(y) - 1} \qquad (1)$$
where deg(x) is the number of neighbors of node x. However, this formula considers only binary relationships between people (whether a relation exists between two people), so the similarity between all non-neighboring nodes in the graph is set to zero. Furthermore, it does not take into account the semantics of the relationship between people in the same community, which is measured through the frequency of interaction (in general, the number of posts two people sent to each other or the number of replies; here, the number of coauthored papers). In an online community, the frequency of interactions between a pair of users directly affects their relationship strength: the stronger the relationship, the more likely a given type of interaction is to take place between them [24]. Node similarity should therefore reflect the notion of relationship strength between nodes. We thus multiply the above formula by the frequency of interaction and use the following formula to calculate the similarity between two connected nodes.
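A minimal sketch of the FriendTNS neighbor similarity, taken here as 1 / (deg(x) + deg(y) - 1), and of the two-hop candidate step, assuming the graph is an undirected adjacency map (author id -> set of coauthor ids). The toy graph contains only the links named in the text around Au2 (the remaining edges of figure2 are not given), so its scores do not reproduce table1.

```python
def tns_similarity(graph, x, y):
    """Binary-link similarity 1 / (deg(x) + deg(y) - 1) for neighbors x and y."""
    if y not in graph[x]:
        return 0.0  # non-neighbors get zero under this formula
    return 1.0 / (len(graph[x]) + len(graph[y]) - 1)


def two_hop_candidates(graph, x):
    """Authors reachable from x in exactly two hops that are not already linked to x."""
    return {z for y in graph[x] for z in graph[y] if z != x and z not in graph[x]}


# Only the links mentioned in the text for Au2's neighborhood.
G = {
    "Au1": {"Au2", "Au3", "Au5", "Au6", "Au8"},
    "Au2": {"Au1", "Au4"},
    "Au3": {"Au1", "Au4"},
    "Au4": {"Au2", "Au3", "Au8"},
    "Au5": {"Au1"},
    "Au6": {"Au1"},
    "Au8": {"Au1", "Au4"},
}
```

Here `two_hop_candidates(G, "Au2")` returns {"Au3", "Au5", "Au6", "Au8"}, the unordered candidate list given in the text, and `tns_similarity(G, "Au2", "Au4")` is 1 / (2 + 3 - 1) = 0.25.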

Figure1: Publication information networks
$$\mathrm{Sim}(x,y) = \frac{f(x,y)}{\deg(x) + \deg(y) - 1} \qquad (2)$$
where f(x,y) is the number of papers coauthored by x and y. Equation (2) is used to calculate the basic node similarity, from which a similarity matrix between users is created, as shown in table1. This matrix contains the weights between connected nodes (authors). However, the proposed link-based recommender model aims to build a similarity index that weights (predicts) links to unconnected nodes in order to recommend suitable authors to interact with. Thus, after calculating the basic node similarity matrix, the link-based recommender engine identifies the set of neighboring nodes of a target user x. Then, through a progressive process, the other nodes connected to this set are collected together with their predefined basic similarities. Next, the spread similarity between the target user and each of these unconnected nodes is calculated as the product of the basic similarities along the shortest path (of at most two hops) from user x to that node. Finally, the top-k users are recommended to the target user x. To explain this process, recall the case of Au2, which is connected to both Au4 and Au1. Au4 is connected to 2 other nodes (Au3 and Au8), while Au1 is connected to 4 other nodes (Au3, Au5, Au6, Au8). Spread similarity is then calculated for both sets connected to Au4 and Au1 to find the set of recommended users. As the example shows, Au3 appears in the candidate lists of both Au1 and Au4 (as does Au8), so this intersection of recommended users must be considered as well. Accordingly, the spread similarity (Ssim) is defined by equation (3).

Table1: Basic node similarity matrix between authors
Equation (3) is then applied to compute, for example, the spread similarity between nodes Au2 and Au3.
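The weighted similarity f(x,y) / (deg(x) + deg(y) - 1) and a spread similarity in the spirit of equation (3) can be sketched as follows. The aggregation rule is an assumption of this sketch: the products of basic similarities along each two-hop path x-y-z are summed, so a candidate such as Au3, reachable through both Au1 and Au4, collects both contributions. The interaction counts in the toy graph are hypothetical (all 1 except Au3-Au4 = 2).

```python
def basic_similarity(graph, x, y):
    """Weighted similarity f(x, y) / (deg(x) + deg(y) - 1), f = interaction count."""
    f = graph[x].get(y, 0)
    if f == 0:
        return 0.0
    return f / (len(graph[x]) + len(graph[y]) - 1)


def spread_similarity(graph, x):
    """Assumed Ssim: sum over two-hop paths x-y-z of Sim(x, y) * Sim(y, z).

    Only nodes not already linked to x are scored. Returns candidates ordered
    by decreasing score (ties broken by id).
    """
    scores = {}
    for y in graph[x]:
        s_xy = basic_similarity(graph, x, y)
        for z in graph[y]:
            if z == x or z in graph[x]:
                continue
            scores[z] = scores.get(z, 0.0) + s_xy * basic_similarity(graph, y, z)
    return dict(sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])))


# Weighted toy graph: W[a][b] = number of papers a and b coauthored (hypothetical).
W = {
    "Au1": {"Au2": 1, "Au3": 1, "Au5": 1, "Au6": 1, "Au8": 1},
    "Au2": {"Au1": 1, "Au4": 1},
    "Au3": {"Au1": 1, "Au4": 2},
    "Au4": {"Au2": 1, "Au3": 2, "Au8": 1},
    "Au5": {"Au1": 1},
    "Au6": {"Au1": 1},
    "Au8": {"Au1": 1, "Au4": 1},
}
```

With these made-up counts, `spread_similarity(W, "Au2")` orders the candidates Au3, Au8, Au5, Au6 (Au3 scores 1/36 + 1/8). Table2, built from the real link weights of figure2, ranks Au5 and Au6 ahead of Au8; the sketch only shows the mechanics of combining path products.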

Table2: Spread similarity matrix between authors
Applying spread similarity yields the following order of authors recommended to Au2: Au3, Au5, Au6, Au8, as shown in table2. Semantic connective features, such as attending the same conferences, are then used to converge (re-rank) the set of top-ranked authors: the number of conferences attended by both the target author and a recommended author increases that author's score.
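The conference-based re-ranking can be sketched as a multiplicative boost. The boost form and the `alpha` parameter are assumptions for illustration; the text only states that more shared conferences increase a candidate's score. Conference names in the example are placeholders.

```python
def rerank_by_conferences(scores, conferences, target, alpha=0.1):
    """Boost each candidate's score by the number of conferences shared with target.

    scores:      candidate -> spread similarity (phase 1 output)
    conferences: author -> set of conference names (taken from paper venues)
    alpha:       hypothetical boost per shared conference (not from the paper)
    """
    mine = conferences.get(target, set())
    boosted = {
        c: s * (1 + alpha * len(mine & conferences.get(c, set())))
        for c, s in scores.items()
    }
    # Highest boosted score first; the candidate id breaks ties deterministically.
    return sorted(boosted, key=lambda c: (-boosted[c], c))
```

A multiplicative boost leaves candidates with no shared venue unchanged and keeps a zero spread similarity at zero, so venue overlap alone cannot introduce an author who is not already connected through the graph.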

Trust-based node similarity calculation
Traditional recommender systems ignore trust relations among users [16]. In the real world, however, users are easily influenced by the friends they trust. Trust is a measure of the willingness to believe in a user based on their competence and behavior within a specific context at a given time [25]. Trust-based recommendation methods assume additional heuristic knowledge about the trust network among users [17], which concerns the assessment of trust values between individuals. The trust score between nodes is computed from nepotistic relationships [19], which are either explicitly or implicitly defined. Users in an online community can assign trust values to each other explicitly through voting or reputation scores; trust can also be inferred implicitly, for example by tracing who returns positive feedback on eBay, or who follows back a new follower or retweets on Twitter. In a publication network, citation is considered a key attribute in trust measurement, on the assumption that authors tend to trust the work they cite. A citation matrix relating authors to cited papers is therefore created first, and the HITS (Hyperlink-Induced Topic Search) algorithm [12] is then used to identify the candidate list of authors to recommend to a target user. HITS was designed to rank web pages with respect to a specific query: it builds an adjacency matrix and, through an iterative approach, assigns each node two scores, a hub score and an authority score. In our model, the authority score is calculated for the authors identified in phase 1. This score converges after a few iterations (we use 20 iterations). Authors are then ranked by their authority value, which measures the importance of an author with respect to the neighboring nodes.
A good authority is a user who receives many citations from colleagues, which is a measure of their trust score, while a good hub is a user who cites colleagues. The trust-based recommender engine therefore uses the authority measure as an indicator of an author's trust score. In the above example, the authority scores for those authors are {0.020447, 0.063971, 0.000349, 0.590895, 0.795742, 0.112491, 0.021734}, respectively. Co-citation and bibliographic coupling have become standard measures in scientometrics for detecting author similarity [10]. Co-citation is therefore considered here as a semantic feature that increases the trust score among users: the more co-cited papers, the higher the trust score.
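The authority computation can be sketched as a plain power iteration. This is textbook HITS run on an author-to-author citation graph, with one assumption: edges are weighted by citation counts (set every weight to 1 for classic, unweighted HITS). The 20 iterations follow the setting stated above; the function name and input layout are hypothetical.

```python
import math


def hits_authority(cites, iterations=20):
    """Authority scores from HITS on a directed author citation graph.

    cites[u][v] = number of times author u cited author v (edge weight).
    Hubs point to good authorities; authorities are pointed to by good hubs.
    """
    nodes = set(cites) | {v for targets in cites.values() for v in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: weighted sum of the hub scores of the citing authors.
        auth = {n: 0.0 for n in nodes}
        for u, targets in cites.items():
            for v, w in targets.items():
                auth[v] += w * hub[u]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # Hub update: weighted sum of the authority scores of the cited authors.
        hub = {n: 0.0 for n in nodes}
        for u, targets in cites.items():
            hub[u] = sum(w * auth[v] for v, w in targets.items())
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return auth
```

Authors who are never cited keep an authority of exactly zero, and the returned vector has unit Euclidean norm, so scores are comparable only within one run.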

EXPERIMENTAL PROTOCOL AND EVALUATION METRICS
The proposed framework is general and can be applied to any online community. To demonstrate the effectiveness of the proposed model, we conducted a set of experiments on large, publicly available publication datasets. The aim of these experiments is to validate the correctness, accuracy, and effectiveness of the proposed model, as explained in this section.

Experimental Setup and dataset
We performed our experiments on real-world datasets containing an academic coauthor network and a paper citation network, both extracted from the academic search system Arnetminer. The coauthor dataset consists of 1,036,990 authors and 1,554,643 coauthor relations, while the citation dataset contains 1,632,442 papers and 12,710,347 citations between these papers [23]. The dataset also contains 2,166 advisor-advisee coauthor pairs and 3,937 coauthor pairs who are colleagues. For each paper, the name of the conference in which it was published is given, which we use to identify authors who attended the same conference. We use the following attributes to model user connectivity: author ID, list of coauthors with the number of coauthored papers, list of advisors (students), list of colleagues, and list of authors who attended the same conferences. The last three are semantic attributes used to enhance the strength of ties among authors. We use the following attributes to model user trust: author ID, list of cited authors with the number of citing papers, list of co-cited authors, and list of bibliographically coupled authors. The last two are semantic attributes used to enhance the trust score among authors. The experimental setting was as follows. First, to validate the correctness of the model, we split each user's coauthor list into two separate lists, an original set and a test set, in an 80/20 ratio; information in the test set is not used for recommendation. The list of authors recommended using only the original set is then compared with the test set to estimate the relevance of the recommended authors to the coauthors in the test set.
Next, the proposed model is compared with other recommendation mechanisms, and this comparison shows the superiority of CCFPRE. Finally, we compare the results obtained with each of the two similarity measures, spread similarity and trust score, with those of the cascaded version.
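The 80/20 protocol can be sketched as below. The text does not say how coauthors are assigned to the two lists, so a seeded random split is assumed; the function and parameter names are hypothetical.

```python
import random


def split_coauthors(coauthors, train_ratio=0.8, seed=0):
    """Split each author's coauthor set into an original (train) set and a test set.

    coauthors: author id -> set of coauthor ids. Test coauthors must not be used
    when generating recommendations; they serve only as ground truth.
    """
    rng = random.Random(seed)
    original, test = {}, {}
    for author, partners in coauthors.items():
        partners = sorted(partners)  # fixed order so the seed makes the split reproducible
        rng.shuffle(partners)
        cut = round(len(partners) * train_ratio)
        original[author] = set(partners[:cut])
        test[author] = set(partners[cut:])
    return original, test
```

Because coauthorship is symmetric, a per-author split like this can hide a pair from one author while the other author's original list still contains it; splitting coauthor edges rather than per-author lists avoids that leak.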

Evaluation methodology
We use the classic precision measure for people recommendation. The accuracy of the proposed recommender system is evaluated by its ability to retrieve relevant coauthors among the top-N recommended authors. For a target user x who receives a list of k recommended authors (top-k list), precision is the ratio of the number of relevant authors in the top-k list (i.e., those that belong to the test set of coauthors of the target author) to k:
$$\mathrm{precision}(RE) = \frac{|RE \cap T_x|}{k}$$
where RE is the top-k recommendation list and T_x is the test set of coauthors of target user x. Precision thus measures the overlap between the generated recommendation list and the user's actual test set of coauthors. Precision is measured at different points in the ranked list of suggested collaborators: precision at rank k (P@k) is the proportion of the k recommended authors that are relevant, i.e., in the target user's test set. In the following experiments, we evaluate precision for k equal to 5, 10, and 20. We also use the Average Reciprocal Hit-Rank (ARHR) [7] to quantify the accuracy of the recommendation model. This measure rewards each hit based on where it occurs in the top-N list, where the number of hits is the number of coauthors in the test set that also appear in the top-N recommended items returned for each user. If h is the number of hits, occurring at positions p_1, p_2, ..., p_h within the top-N lists (i.e., 1 ≤ p_i ≤ N), then the ARHR is:

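$$\mathrm{ARHR} = \frac{1}{n}\sum_{i=1}^{h}\frac{1}{p_i}$$
where n is the number of users for whom recommendations are evaluated.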
This measure weights authors who occur earlier in the top-N recommendation lists higher than authors who occur later. The highest value of ARHR equals the hit-rate and occurs when all hits are in the first position, whereas the lowest value equals hit-rate/N, when all hits are in the last position of the top-N list.
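Both metrics can be computed directly from their definitions. In this sketch, recommendation lists and test sets are keyed by user id (a hypothetical layout); ARHR sums 1/p_i over every hit and divides by the number of evaluated users, following [7].

```python
def precision_at_k(recommended, relevant, k):
    """P@k: share of the top-k recommended authors that are in the test set."""
    return sum(1 for author in recommended[:k] if author in relevant) / k


def arhr(recommendations, test_sets, top_n):
    """Average Reciprocal Hit-Rank over all users.

    recommendations: user -> ranked list of recommended authors
    test_sets:       user -> set of held-out coauthors
    Each hit at 1-based position p within the top-N list contributes 1/p.
    """
    total = 0.0
    for user, ranked in recommendations.items():
        hidden = test_sets.get(user, set())
        for position, author in enumerate(ranked[:top_n], start=1):
            if author in hidden:
                total += 1.0 / position
    return total / len(recommendations)
```

For example, with recommendations ["a", "b", "c"] and ["d", "e", "f"] for two users whose test sets are {"b"} and {"d", "f"}, ARHR at N = 3 is (1/2 + 1 + 1/3) / 2. Precision divides by k even when fewer than k authors are returned, matching the definition above.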

Experimental results
The first experiment targets the correctness and accuracy of the proposed model. CCFPRE is applied to a set of 10 different authors, a list of the top 20 ranked candidate authors is retrieved for each of them, and precision and ARHR are calculated at different levels to measure the accuracy of the model. Tables 4 and 5 and Figures 3 and 4 show the results in terms of precision and ARHR, which rewards each hit based on where it occurs among the top-N authors. Precision diminishes for longer recommendation lists but is promising for shorter lists, as shown in the first column of table4, where it reaches its maximum for users 2, 6, 7, and 9. Furthermore, the ARHR values shown in table5 and figure4 are promising at

Table5: ARHR of Proposed model
Further experiments compare the accuracy of each proposed similarity measurement technique with state-of-the-art similarity measures, namely FriendTNS and HITS. We first compare spread similarity with FriendTNS using precision and ARHR values at different levels. Tables 6 and 7 present these comparisons; numbers in bold indicate an improvement in precision when spread similarity is used. Notably, P(5) and ARHR(5) for user2 are zero with FriendTNS, meaning that none of the recommended authors appear in the test set, while spread similarity obtains a higher value, indicating its effectiveness in recommendation. Furthermore, across all the results presented so far, ARHR improves at all levels with spread similarity. Table 8 and figure 5 present the precision and ARHR values at each level of the cascaded model. The ARHR values improve at all levels, which shows that CCFPRE outperforms the other techniques in ranking recommended authors and confirms the effectiveness of the cascaded model.

Conclusion
This paper presents a novel and effective model for recommending collaborators in a co-worker environment. The model uses link-based attributes of members in an online community to predict the appropriate actor(s) to communicate with. It depends on effectively identifying suitable features to model users' connectivity and trust attributes. Local link attributes represent relationships between people in the community, allowing the model to handle the mutual nature of their interactions. Different similarity measurement techniques are then used to identify candidate recommendations for a target user, and these candidates are sorted according to different semantic weighting features. The results reported here from several experiments are promising, especially for the spread similarity technique proposed in this paper. Future work will consider the content published by users as an indication of their interests.