Public Key Encryption with Conjunctive Field Free Keyword Search Scheme

Searchable encryption allows a remote server to search over encrypted documents without learning their sensitive contents. Prior searchable symmetric encryption schemes focus on single keyword search, so a user who wants the documents matching several keywords has to run the search protocol repeatedly. Conjunctive keyword search (CKS) schemes improve system usability by retrieving only the documents that match all query keywords. Most existing CKS schemes, however, rely on fixed-position keyword fields, which is not useful for many applications, such as unstructured text. In this paper, we propose a new public key encryption scheme based on bilinear pairings that supports conjunctive keyword search queries on encrypted data without specifying the positions of the keywords, so the keywords can appear in any order. Instead of giving the server one trapdoor for each keyword in the conjunction set, we apply a bilinear map to a set of combined keywords so that they are treated as one keyword. In other words, the proposed method retrieves the data in one round of communication between the user and the server. Furthermore, the search process does not reveal any information about the number of keywords in the query expression. In the analysis section, we show how the scheme can be used to provide fast and secure access to the database.


INTRODUCTION
Cloud computing has become a widespread phenomenon in recent years. More and more cloud services have flourished around the world, such as computing resources, storage space outsourcing and different kinds of software applications. For reasons such as low cost, efficiency, convenience and better connectivity, users often store their data on a remote server. Since many servers are public, the data is exposed to many risks in transit; the user protects the privacy of his data by storing it in encrypted form, and can later search the encrypted data and retrieve it. The first scheme for searching encrypted data by keyword was introduced by Song et al. [1]. To securely search through encrypted data, searchable encryption schemes have been introduced in recent years [2,3,4,5,6,7,8]; these can be divided into two classes: symmetric searchable encryption (SSE) and asymmetric searchable encryption (ASE). To perform a search on a dataset, a user creates an index of the keywords listed in the documents and later executes the search on the index, in a way that allows the server to retrieve only the documents that contain a certain keyword instead of sending back all the encrypted documents, which is impractical in cloud computing scenarios. Recent refinements and extensions to this scheme are given in [7,8].
The drawback of all the follow-up works is that they only allow the remote server to retrieve the documents that match a specific keyword, but they do not allow for Boolean combinations, conjunctive and disjunctive, of such queries.
Most classical searchable encryption works focus only on single keyword search [4,5,6,7,1] or multiple keyword search [9,10,11,12,13]. In the symmetric key schemes, recently some solutions have been introduced for general Boolean queries on encrypted files [14,15], and there are only two related works in the public key setting [16,17].
There are several Boolean operations, such as disjunction, conjunction and negation. In a disjunctive search, the user searches for encrypted documents containing w1 or w2 or … or wn. In a conjunctive search, the user searches for encrypted documents containing w1 and w2 and … and wn. Finally, in a negative search, the user searches for all encrypted documents that do not contain particular words. To illustrate a search over multiple encrypted keywords, such as a conjunction, we consider a mail server, shown in Figure 1, which receives a stream of encrypted email messages. Each email is assigned some keyword fields, like "From", "Date" and "Status". Before sending the message, the sender, for example Jack, encrypts the message content using a public key encryption algorithm with the recipient's public key, and then adds encrypted keywords for the above keyword fields, like "Jack", "11/02/2008" and "secret". When the recipient wants to retrieve the encrypted messages that were sent by "Jack" on "11/02/2008" and have "secret" status, rather than retrieving all messages from "Jack", he sends a "trapdoor" for the multiple keywords "Jack" AND "11/02/2008" AND "secret" to the mail server, which in turn routes the corresponding encrypted emails to the receiver without learning any information.
Existing schemes for conjunctive keyword search ([11] and subsequent works) rely on keyword fields in the index. This setting is not useful in most systems, such as database text and the body of an e-mail, and makes them much more difficult to search.
Despite the efficiency of the Public-key Encryption with Keyword Search (PEKS) scheme [4], there are some important issues relating to the use of PEKS, which were studied in [18]. One of them is that the scheme does not support multiple keyword search.
Our proposed solution to the above problem is to define a secure scheme of public key encryption with keyword field free conjunctive keyword search (PKE-KFF-CKS). It allows conjunctive keyword search queries on encrypted data without specifying the positions of the keywords (the querier does not need to know the keyword positions), so the keywords can appear in any order. Furthermore, instead of giving the server one trapdoor for each keyword in the conjunction, we combine the individual keywords so that they are treated as one keyword. This is done with the concatenation template w1||w2||…||wm, without any conjunctive search marker, so the cloud server cannot learn the number of keywords. In other words, users who want to retrieve the documents that contain a set of m keywords do not have to perform the search protocol m times. We also show that our scheme is secure against adaptive chosen-keyword attacks in the random oracle model (ROM) under the Bilinear Diffie-Hellman assumption.

Main Contributions
Our main contributions can be summarized as follows:
(1) Our scheme deals with keyword field-free conjunctive keyword search. We design a novel algorithm that converts a conjunctive keyword search into a single keyword search; consequently, the model does not need the posting list intersection protocol. With this new scheme, we can greatly reduce the search time and the storage cost of the searchable index.
(2) We create an index that is secure under Indistinguishability against Chosen Keyword Attack (IND-CKA), using one Bloom filter for each file in a collection of files.
(3) The security of our scheme is based on the Bilinear Diffie-Hellman assumption.

Previous Work
Song, Wagner, and Perrig [1] first proposed the notion of searchable encryption for a single user. They introduced a scheme in the symmetric key setting, which encrypts each word of a document separately. Goh [7] proposed a secure index based on Bloom filters. Each keyword is processed using a keyed hash function f as a pseudo-random function and then inserted into a Bloom filter. The trapdoor indicates which bits in the Bloom filter should be tested. In the public key setting, Boneh et al. [4] proposed the first public key scheme for keyword search, where anyone can use the public key to write to the data stored on a remote server, but only authorized users with the secret key can search. However, keyword security cannot be protected in the public key setting, since the remote server can encrypt any keyword with the public key and then use the received trapdoor to test this ciphertext. Moreover, the above approaches focus only on single keyword search. To improve search functionality, many Boolean keyword search schemes over encrypted data have been proposed. There are two naive solutions for conjunctive keyword search: the first is to take the intersection of the sets of documents returned by a search for each keyword in the conjunction; the second is to define a meta-keyword for every possible conjunction of keywords. The first work on conjunctive keyword-searchable encryption was proposed by Golle et al. [11] and consists of two schemes. The first scheme compares two hash codes of the keywords to find the required documents, and the transmission cost of its trapdoors is very high. The second scheme tests two outputs of a bilinear pairing constructed from the input keywords and checks whether the keywords are included in the document. Boneh and Waters [19] introduced a public key CKS scheme from a generalization of anonymous identity based encryption.
Their paper supports comparison queries and general subset queries. Byun et al. [20] presented an efficient scheme using bilinear pairings, which has a constant size of trapdoors and requires two pairing operations per document for searching. The scheme is more efficient than both schemes by Golle et al. [11] in terms of communication overhead, but it has higher computational overhead for the encryption process of each document by requiring as many pairing operations as the number of the associated total keywords. Ryu and Takagi [21] introduced an efficient scheme for conjunctive keyword searches where the size of the trapdoors for several keywords is nearly the same as for a single keyword. They use asymmetric pairings in groups of prime order. The encryption process requires one pairing per document and the server has to perform two pairings per document to search. Hwang and Lee [12] introduced a public key encryption scheme with conjunctive keyword search (PECK) and gave a new concept called multiuser PECKS. The notion of their scheme is to minimize the communication and storage overhead for the remote server and also for the user.
Recently Wang et al. [22] proposed the first keyword-field-free conjunctive keyword search scheme KFF-CKS for dynamic groups that is proven secure in the ST model. The notion is to remove the keyword fields by using a bilinear map per keyword per document index.

Security Requirements
1. Data security [23]: When the data owners encrypt the keywords and the message using the authorized user's public key, only the corresponding secret key can decrypt the content; that is, no one can derive the embedded keywords from the ciphertext.
2. User authentication [24,25,26]: After encryption, no information can be extracted from the trapdoor and the ciphertexts, but the remote server still has to check whether the users who send the trapdoor are authorized users.
3. Trapdoor security [23]: Whenever the receiver wants to search the encrypted data, he sends the trapdoor containing the corresponding keywords to the remote server; other users learn nothing from the trapdoor, even if the trapdoors are obtained by adversaries.
4. Resistance to off-line keyword-guessing attacks [10,12]: Any proposed scheme should withstand both outside adversaries and inside attackers (malicious servers).

Outline
The rest of the paper is organized as follows. Section 2 introduces the preliminaries. Section 3 provides the outline of the proposed work, the notation, the semantic security of the PKE-KFF-CKS scheme and the construction of PKE-KFF-CKS. Section 4 gives the security analysis, performance and comparisons. Finally, Section 5 concludes the paper.

The Bilinear Pairings and Complexity Assumptions
We briefly review the theoretical background and the complexity assumptions that are used throughout our paper.
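(1) Bilinear maps: Let $G_1$ and $G_2$ be two cyclic groups of prime order $q$, and let $\hat{e}: G_1 \times G_1 \rightarrow G_2$ be a map which satisfies the following properties:
- Bilinear: for all $g, h \in G_1$ and $a, b \in \mathbb{Z}_q$, $\hat{e}(g^a, h^b) = \hat{e}(g, h)^{ab}$.
- Non-degenerate: there exists $g \in G_1$ such that $\hat{e}(g, g) \neq 1$, where 1 is the identity of $G_2$.
- Computable: for all $g, h \in G_1$, $\hat{e}(g, h)$ is computable in polynomial time.
(2) Bilinear Diffie-Hellman (BDH) problem: given $g, g^a, g^b, g^c \in G_1$, compute $\hat{e}(g, g)^{abc} \in G_2$. An algorithm $\mathcal{A}$ solves the BDH problem with probability $\varepsilon$ if $\Pr[\mathcal{A}(g, g^a, g^b, g^c) = \hat{e}(g, g)^{abc}] \geq \varepsilon$, where the probability is over the random choice of the generator $g \in G_1$, the random choice of $a, b, c \in \mathbb{Z}_q^*$ and the random coins consumed by $\mathcal{A}$.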
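The bilinearity property can be checked numerically with a toy model. The sketch below is an illustration only and is not secure: it assumes that a group element $g^x$ is represented by its exponent $x$ modulo a prime, so that the pairing reduces to multiplying exponents. The names `Q`, `g1_pow`, `g2_pow` and `pairing` are hypothetical and do not come from any pairing library.

```python
# Toy model of a symmetric bilinear map, for illustration only (NOT secure):
# an element g^x of G1 is represented by its exponent x mod Q, and an element
# e(g, g)^z of G2 by its exponent z mod Q, so the pairing multiplies exponents.
Q = 2**61 - 1  # a prime standing in for the group order q (assumed value)


def g1_pow(x: int, k: int) -> int:
    """Raise the G1 element with exponent x to the power k."""
    return (x * k) % Q


def g2_pow(z: int, k: int) -> int:
    """Raise the G2 element with exponent z to the power k."""
    return (z * k) % Q


def pairing(x: int, y: int) -> int:
    """Toy e(g^x, g^y) = e(g, g)^(x*y), returned as an exponent in G2."""
    return (x * y) % Q


g = 1                    # the generator g = g^1 in exponent form
u, v = 12345, 67890      # two arbitrary elements of G1
a, b = 17, 23            # two arbitrary scalars

lhs = pairing(g1_pow(u, a), g1_pow(v, b))   # e(u^a, v^b)
rhs = g2_pow(pairing(u, v), a * b)          # e(u, v)^(ab)
print(lhs == rhs)                           # bilinearity holds in the toy model
print(pairing(g, g) != 0)                   # e(g, g) is not the identity (exponent 0)
```

In this representation discrete logarithms are trivial, so the model says nothing about the hardness of BDH; it only makes the algebra of the pairing concrete.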

Outline of the Conjunctive Keyword Searchable Encryption [11].
A conjunctive keyword searchable encryption (CKSE) scheme consists of the following four algorithms:
- KeyGen(k): It is run by the data owner to initiate the scheme. It takes a security parameter k and returns a secret key SK.

Outline of the PEKS scheme [4].
A public key encryption with keyword search (PEKS) scheme consists of the following algorithms:
(1) KeyGen($\lambda$): Takes a security parameter $\lambda$ as input, and creates a public/private key pair (Rpub, Rpriv) for the receiver.
(2) PEKS(Rpub,W): Given the receiver's public key Rpub and a word W, computes a searchable encryption S for W.

IND-CKA game:
- KeyGen: The challenger runs the KeyGen($\lambda$) algorithm to create the public key pk and the secret key sk. He gives pk to the attacker $\mathcal{A}$, while sk is kept secret from him.
Such an adversary $\mathcal{A}$ is called an IND-CKA adversary. $\mathcal{A}$'s advantage in attacking the scheme is defined as the following function of the security parameter $\lambda$: $Adv_{\mathcal{A}}(\lambda) = |\Pr[b = b'] - 1/2|$, where $b$ is the bit chosen by the challenger and $b'$ is the guess output by $\mathcal{A}$. The probability is over the random bits used by the challenger and the adversary. A PEKS scheme is IND-CKA secure if, for any polynomial-time adversary $\mathcal{A}$, $Adv_{\mathcal{A}}(\lambda)$ is negligible.

Bloom Filter BF
A Bloom filter is a space-efficient data structure which is used to check whether an element is a member of a set. Burton H. Bloom [27] introduced this data structure in 1970. A BF is used to test whether an element s is a member of a set F = (w1,…,wn). The set F is coded as an array BF of x bits, where all bits are initially set to 0. The filter uses r independent hash functions h1,…,hr to map items into the range between 0 and x-1. For each element $w_i \in F$, where $1 \leq i \leq n$, the array bits at the positions h1(wi),…,hr(wi) are set to 1. Note that a location may be set to 1 multiple times. The elements themselves are not stored in BF; only their membership may be queried by an application. To determine whether a word $s \in F$, we check whether the bits at positions h1(s),…,hr(s) in BF are all 1. If any bit is 0, then $s \notin F$. Otherwise, we say $s \in F$ with high probability.
A false positive is possible; its likelihood can be controlled by changing the filter length x as follows: $x = -\frac{n \ln(FPR)}{(\ln 2)^2}$, where n is the number of elements and FPR is the user-defined false positive rate. The FPR can be approximated as $FPR \approx (1 - e^{-rn/x})^r$. False positive matches are possible, but false negatives are not; thus a Bloom filter has a 100% recall rate.
The amount of space required to store a Bloom filter is significantly less than that required by other data structures, such as self-balancing binary search trees, hash tables, simple arrays or linked lists.
The time required either to add an element or to check whether an element is in the set is completely independent of the number of elements already in the set. We just need to find the r indexes using the r hash functions. In a hardware implementation, the Bloom filter is particularly attractive because its r lookups are independent and can be parallelized.
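The Bloom filter operations described above can be sketched in a few lines. This is a minimal illustration and not the construction used later in the paper: the r hash functions are simulated by prefixing SHA-256 with an index, which is an assumption made here, and `filter_length` uses the standard sizing rule for a target false positive rate under an optimal choice of r. The class and function names are hypothetical.

```python
import hashlib
import math


def filter_length(n: int, fpr: float) -> int:
    """Filter length x for n items and a target false positive rate."""
    return math.ceil(-n * math.log(fpr) / (math.log(2) ** 2))


class BloomFilter:
    def __init__(self, x: int, r: int) -> None:
        self.x, self.r = x, r
        self.bits = [0] * x          # all x bits start at 0

    def _positions(self, item: str) -> list[int]:
        # Simulate h_1..h_r by hashing the item together with the index z.
        return [int(hashlib.sha256(f"{z}:{item}".encode()).hexdigest(), 16) % self.x
                for z in range(1, self.r + 1)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = 1         # a bit may be set to 1 several times

    def __contains__(self, item: str) -> bool:
        # Any 0 bit proves absence; all 1 bits mean "present with high probability".
        return all(self.bits[p] for p in self._positions(item))


words = ["cloud", "secret", "jack"]
bf = BloomFilter(x=filter_length(len(words), 0.01), r=7)
for w in words:
    bf.add(w)
print(all(w in bf for w in words))   # inserted items are never reported absent
```

Both `add` and the membership test touch exactly r positions, which is why their cost does not depend on how many elements the filter already holds.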

Semantic Security of the PKE-KFF-CKS Scheme.
The proposed scheme is semantically secure (indistinguishable) against an adaptive chosen keyword attack (IND-CKA) if every probabilistic polynomial time (PPT) attacker has a negligible advantage. PKE-KFF-CKS consists of two public key encryption algorithms, BuildIndex and DocEncrypt, where the BuildIndex algorithm closely follows the PEKS algorithm. Therefore, we define security for the PKE-KFF-CKS scheme in the sense of the semantic security of [4] as follows. Given the security parameter $\lambda$, the challenger calls the key generation algorithm KeyGenerator($\lambda$) to generate the secret key Usk and the public key Upub, then sends Upub to $\mathcal{A}$ and keeps Usk to itself. Let $\mathcal{A}$ be an adversary that can adaptively ask the challenger for the trapdoor TW for any keyword W of its choice, where W = {w1||w2||…||ws}. First, $\mathcal{A}$ chooses two sets of conjunctive words W0 = {w01||w02||…||w0s} and W1 = {w11||w12||…||w1s}, for which it has not previously asked for the trapdoors TW0 or TW1, and sends them to the challenger. Then the challenger picks a random $b \in \{0,1\}$, creates the secure index Iw for $W_b$ using the BuildIndex algorithm and gives the attacker W = {Upub, Iw}.
$\mathcal{A}$ can continue to ask for trapdoors TW for any keyword W = {w1||w2||…||ws} of its choice as long as $W \neq W_0, W_1$. Finally, $\mathcal{A}$ outputs a guess $b' \in \{0,1\}$ and wins the game if $b = b'$.
We define $\mathcal{A}$'s advantage in breaking the PKE-KFF-CKS scheme as: $Adv_{\mathcal{A}}(\lambda) = |\Pr[b = b'] - 1/2|$.

Construction of PKE-KFF-CKS
In contrast to the scheme of Golle et al. [11], we do not target fixed keyword fields; we rather consider an enhanced query model consisting of Boolean expressions on keywords in conjunctive form, without specifying the positions of the keywords, so the keywords can be in any order. When the data user wants to retrieve the document IDi that contains the keywords (A and B and C), he can create a trapdoor in arbitrary order as one search token; that is, he can send any one of the combined keywords (A||B||C), (A||C||B), (B||A||C), (B||C||A), (C||A||B) or (C||B||A) as a query to the remote server. The server then tests the Bloom filter against the trapdoor and returns the matching document to U without needing the posting list intersection protocol.
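The six combined keywords listed above are simply the permutations of the keyword set joined by the concatenation mark. A minimal sketch, in which the helper name `combined_keywords` is hypothetical:

```python
from itertools import permutations


def combined_keywords(keywords, sep="||"):
    """Return every ordering of the keywords, each joined into one string."""
    return [sep.join(p) for p in permutations(keywords)]


tokens = combined_keywords(["A", "B", "C"])
print(tokens)
# ['A||B||C', 'A||C||B', 'B||A||C', 'B||C||A', 'C||A||B', 'C||B||A']
```

The number of orderings grows as m!, so this enumeration is only practical when the number of keywords combined per document is small.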
Our scheme consists of six algorithms, KeyGenerator, BuildIndex, DocEncrypt, TrapdoorGen, SearchIndex and DocDecrypt, which are divided between two phases: the Sender Phase and the Retrieval Phase.

Sender Phase
This phase includes three algorithms, as detailed below. The encryption of the document identifier ensures that if the same document identifier is encrypted multiple times, it produces different ciphertexts that all decrypt to the same value. Then O creates one Bloom filter BF for each document; this filter consists of an array of x bits and uses r independent hash functions h1,…,hr. The filter allows keyword searches to be performed efficiently, but could result in some false positive retrievals. A classical Bloom filter may reveal information about the contents of the document, since the hash functions are publicly known. In our work, a suitable way to create a searchable index with a Bloom filter is therefore to index each conjunctive keyword by its encrypted image. To do so, we apply bilinear maps on elliptic curves: we use two groups $G_1$ and $G_2$ of prime order q and a bilinear map $\hat{e}: G_1 \times G_1 \rightarrow G_2$, and we also need three hash functions $H_1$, $H_2$ and $H_3$. The owner creates the m! possible permutations of the keyword sequence, P = {per1, per2,…, perm!}, and makes each permutation perj look like one keyword using the concatenation operation, perj = {w1||w2||…||wm}, where $j \in [1, m!]$. We apply the bilinear map to each permutation perj as $Enc_{per_j} = H_2(\hat{e}(U_{pub}, H_1(per_j)^a))$, where a is O's private key. Then BFIDi is constructed by applying the hash functions to the encrypted conjunctive string, hz(Encperj), z = 1,…, r, instead of applying them to perj directly; that is, the array bits at the positions h1(Encperj),…, hr(Encperj) are set to 1. Finally, O stores the encrypted ID EncIDi and the associated Bloom filter BFIDi in IDi.
In a Bloom filter, the number of 1s depends on the number of BF entries, which in this case is the number of different permutations. As a consequence, the scheme reveals the number of keywords in each document. To avoid this problem, a number of dummy keywords may be added as padding so that the number of 1s in the Bloom filter is nearly the same for different documents. Padding is costly compared to the scheme without it because of the higher false positive rate. The final step of the sender phase is sending the ID and the encrypted document set EncDoc to the remote server.
Algorithms 1, 2 and 3 below show the key generation algorithm, the index building algorithm and the document set encryption algorithm, respectively.

Algorithm 1: KeyGenerator($\lambda$)
Given a security parameter $\lambda$ which determines the size of $G_1$ and $G_2$, the algorithm works as follows:
1: generate a prime q, two groups $G_1$, $G_2$ of order q and a bilinear map $\hat{e}: G_1 \times G_1 \rightarrow G_2$.
2: select a random generator $g$ of $G_1$.
3: pick a random $b \in \mathbb{Z}_q^*$ as a private key, Upr = $b$, for the user and calculate the corresponding public key Upub = $g^b$.
4: pick a random $a \in \mathbb{Z}_q^*$ as a private key, Opr = $a$, for the owner and calculate the corresponding public key Opub = $g^a$.
5: pick a random $v \in \mathbb{Z}_q^*$ as an escrow key, and calculate V = $g^v$.

Algorithm 2: BuildIndex(Di,WDi,PP,Opr)
The algorithm is executed by the owner O to encrypt the conjunctive keyword WDi and produce a searchable encrypted index IDi as follows:
1: encrypt IDi using the ElGamal cipher under O's private key and U's public key Upub as EncIDi = IDi ($G^a$), where $G = \hat{e}(U_{pub}, V)$.

Retrieval Phase
This phase includes three algorithms, as detailed below:
1: Trapdoor Generator: To retrieve only the documents containing the keywords Q, the data user U has to ask O for the public key Opub in order to generate trapdoors. If O is offline, this owner's data cannot be retrieved in time. Otherwise, U gets the public key Opub and creates one trapdoor for a conjunctive keyword set Q = {q1, q2,…, ql} using the TrapdoorGen(Q,PP,Upr) algorithm. First, the data user combines the conjunctive query so that it looks like one query, Tq = {q1||q2||…||ql}; then U computes the trapdoor of the concatenated conjunctive keyword Tq under his private key b, $T_w = H_1(T_q)^b$. Finally, U submits Tw to the cloud server.
2: Search Index: Upon receiving the trapdoor Tw, the server S calls the SearchIndex(IDi,Tw,PP) algorithm on each searchable index to obtain the associated Bloom filter BFIDi, then computes $T = H_2(\hat{e}(O_{pub}, T_w))$ and the independent hash functions hi(T), where i = 1…r. S then tests BF at all r locations; if all r locations are 1, the remote server returns the encrypted file corresponding to IDi to U. In other words, the searchable index ID can be used to check set membership without leaking the set items, and for accumulated hashing.

Algorithm 4: TrapdoorGen(Q,PP,Upr)
The algorithm is executed by the user to generate a trapdoor as follows:
1: create the combined keyword set as Tq = {q1||q2||…||ql}.
2: compute the trapdoor under U's private key as $T_w = H_1(T_q)^b$.
3: send the generated trapdoor Tw to the server.
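A trapdoor matches an index entry because of bilinearity: with Upub = $g^b$ and Opub = $g^a$, we have $\hat{e}(g^b, H_1(w)^a) = \hat{e}(g, H_1(w))^{ab} = \hat{e}(g^a, H_1(w)^b)$, so the owner and the server hash the same value. The sketch below checks this end to end under strong simplifying assumptions: group elements are represented by their exponents modulo a prime (a toy pairing that is not secure, since private keys can be read off the public keys), $H_1$ and $H_2$ are simulated with SHA-256, and every function name and parameter value is hypothetical.

```python
import hashlib
from itertools import permutations

Q = 2**61 - 1  # toy prime group order q (assumed value)


def H1(text: str) -> int:
    """Toy hash onto G1; the result is the exponent of the hashed element."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest(), "big") % Q


def pairing(x: int, y: int) -> int:
    """Toy pairing: e(g^x, g^y) = e(g, g)^(x*y), kept as an exponent."""
    return (x * y) % Q


def H2(z: int) -> str:
    """Toy hash from G2 to a string that the Bloom filter can index."""
    return hashlib.sha256(str(z).encode()).hexdigest()


def positions(value: str, x: int, r: int) -> list[int]:
    """Bloom filter positions h_1(value), ..., h_r(value)."""
    return [int(hashlib.sha256(f"{z}:{value}".encode()).hexdigest(), 16) % x
            for z in range(1, r + 1)]


def build_index(keywords, a, user_pub, x=256, r=4):
    """Owner side: insert H2(e(Upub, H1(per_j)^a)) for every permutation per_j."""
    bits = [0] * x
    for perm in permutations(keywords):
        enc = H2(pairing(user_pub, (H1("||".join(perm)) * a) % Q))
        for p in positions(enc, x, r):
            bits[p] = 1
    return bits


def trapdoor(query, b):
    """User side: Tw = H1(q1||...||ql)^b."""
    return (H1("||".join(query)) * b) % Q


def search_index(bits, tw, owner_pub, r=4):
    """Server side: test the positions of T = H2(e(Opub, Tw))."""
    t = H2(pairing(owner_pub, tw))
    return all(bits[p] for p in positions(t, len(bits), r))


a, b = 1234567, 7654321      # toy private keys of owner O and user U
owner_pub, user_pub = a, b   # g^a and g^b in exponent form, with g = g^1
index = build_index(["jack", "11/02/2008", "secret"], a, user_pub)
found = search_index(index, trapdoor(["secret", "jack", "11/02/2008"], b), owner_pub)
print(found)                 # the query order differs from the indexing order
```

A query on different keywords is rejected except with the Bloom filter's false positive probability, which is why the sketch only demonstrates the matching case.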

Algorithm 5: SearchIndex(IDi,Tw,PP)
The algorithm is executed by the server S to determine whether a given index IDi contains a conjunctive keyword Tq as follows:
1: compute $T = H_2(\hat{e}(O_{pub}, T_w))$.

Security Analysis
-More queries. After the above challenge query, $\mathcal{A}$ is again allowed to issue queries, with the same restriction that $W_i \neq W_0, W_1$; $\mathcal{B}$ responds to these queries in the same way as before. Upon receiving the challenge, $\mathcal{A}$ can call the algorithm SearchIndex(ID,Tw,PP) on the secure index ID to determine whether the conjunctive keyword in ID is the same as W or not. $\mathcal{A}$ does not know the private key, which is chosen independently of any conjunctive keyword; that is, at this stage $\mathcal{A}$ has no way to distinguish whether $b$ is 0 or 1.
-Output. Finally, $\mathcal{A}$ returns its guess $b' \in \{0,1\}$, indicating whether the challenge is the result of the encryption process for W0 or W1. At this point, algorithm $\mathcal{B}$ selects a random pair from the $H_2$-list and derives from its left-hand side a guess for $\hat{e}(g, g)^{abc}$. $\mathcal{A}$ must have issued an $H_2$ query on the pairing value associated with either W0 or W1. Hence, with probability 1/2 the $H_2$-list includes a pair whose left-hand side is the pairing value for $W_b$. If $\mathcal{B}$ selects this pair from the $H_2$-list, then the value it derives equals $\hat{e}(g, g)^{abc}$, as required.
To complete the proof of Theorem (4.1), we now use the same approach as in [4] to analyze the probability that $\mathcal{B}$ does not abort during the above experiment. We define the following three events:
-Ev1: $\mathcal{B}$ does not abort during the Trapdoor queries.
-Ev2: $\mathcal{B}$ does not abort during the Challenge queries.
We suppose that both events Ev1 and Ev2 occur with sufficiently high probability. Consider first the event Ev1: the probability of Ev1 is $(1 - 1/(q_T + 1))^{q_T} \geq 1/e$, where $1/(q_T + 1)$ is the probability that a trapdoor query causes $\mathcal{B}$ to abort. By Ev3, if $\mathcal{B}$ does not abort during the simulation, it will choose a correct tuple in the $H_2$-list with probability at least $1/q_{H_2}$, and will produce the correct answer with probability at least $\varepsilon/q_{H_2}$. Overall, by combining Ev1, Ev2 and Ev3, $\mathcal{B}$'s success probability is at least $\varepsilon/(e\, q_T\, q_{H_2})$.
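The bound on Ev1 can be checked numerically. The short sketch below evaluates $(1 - 1/(q_T + 1))^{q_T}$ for a few assumed values of $q_T$ and compares it with $1/e$; it also evaluates the overall bound $\varepsilon/(e\, q_T\, q_{H_2})$ for illustrative inputs. The function names and the sample values are hypothetical and chosen only for demonstration.

```python
import math


def no_abort_probability(q_t: int) -> float:
    """Probability that none of q_t trapdoor queries triggers an abort."""
    return (1 - 1 / (q_t + 1)) ** q_t


def reduction_bound(eps: float, q_t: int, q_h2: int) -> float:
    """Lower bound eps / (e * q_T * q_H2) on the simulator's success probability."""
    return eps / (math.e * q_t * q_h2)


for q_t in (1, 10, 100, 10_000):
    p = no_abort_probability(q_t)
    print(q_t, p >= 1 / math.e)   # the bound (1 - 1/(q_T+1))^q_T >= 1/e holds

print(reduction_bound(0.1, 100, 1000) > 0)
```

The probability decreases toward $1/e$ as $q_T$ grows, which is why $1/e$ serves as a bound that does not depend on the number of trapdoor queries.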

Time Efficiency:
In former works, when the user U wants to retrieve documents containing each of several keywords, he must give the server a trapdoor for each keyword individually and rely on an intersection operation. This solution is not desirable: it requires O(nm) search time, where n is the number of documents and m is the number of keywords in the conjunctive string. In other words, the server needs O(n) search time for each keyword in the conjunctive string. In our work, the user computes one trapdoor for all conjunctive keywords and sends it to the remote server, and the server calls the SearchIndex algorithm once per index with that trapdoor, covering all conjunctive keywords in one round. Documents whose indexes match the trapdoor are returned to U. The overhead of such Boolean queries is linear in the number of keywords in the Boolean expression, but the query can be completed in one round over the documents without an intersection operation, whereas the naive solution involves multiple rounds over the documents. Hence the proposed scheme, based on Bloom filters [7], requires O(n) search time for all keywords in the conjunctive string, and O(1) communication cost. The scheme tests each BF only once per search. In other words, the time required to check whether a conjunctive keyword is present is independent of the number of keywords in the set; we need only O(n) time overall, computing the r positions with the r hash functions for each index.
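The difference between the two search strategies can be made concrete by counting index probes. The sketch below is an illustration on plaintext data, not the encrypted construction: it assumes that each document's index holds every ordering of that document's keyword set, as in the permutation-based index described earlier, and uses ordinary Python sets in place of encrypted Bloom filters. All names and sample data are hypothetical.

```python
from itertools import permutations


def naive_conjunctive_search(docs, query):
    """Scan all n documents once per keyword, then intersect the posting lists."""
    probes, result = 0, None
    for word in query:
        posting = set()
        for doc_id, words in docs.items():
            probes += 1
            if word in words:
                posting.add(doc_id)
        result = posting if result is None else result & posting
    return result, probes


def combined_token_search(index, query):
    """Scan all n indexes once with a single combined token."""
    token, probes, hits = "||".join(query), 0, set()
    for doc_id, tokens in index.items():
        probes += 1
        if token in tokens:
            hits.add(doc_id)
    return hits, probes


docs = {
    1: {"jack", "secret", "2008"},
    2: {"jack", "public"},
    3: {"secret", "2008"},
}
# One entry per ordering of each document's keyword set.
index = {doc_id: {"||".join(p) for p in permutations(words)}
         for doc_id, words in docs.items()}

query = ["secret", "jack", "2008"]
print(naive_conjunctive_search(docs, query))   # ({1}, 9): n * m probes
print(combined_token_search(index, query))     # ({1}, 3): n probes
```

With n = 3 documents and m = 3 query keywords, the naive method performs nm = 9 probes and the combined token performs n = 3, matching the O(nm) and O(n) search times stated above.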
The overall performance of our scheme includes the cost of index construction and the time necessary for searches. Updating or adding documents requires calling the BuildIndex algorithm, whose cost is linear in the size of the documents, while deleting documents requires a constant time computation. A Bloom filter needs only a fixed number of bits, the filter length x, to store all the items. Hence the server storage cost required to store the index ID is O(n), which is very small compared to the amount of data. Figure 3 shows the time consumed by a remote server to perform a search query with both the normal conjunctive search and PKE-KFF-CKS on the encrypted Bloom filter. As the number of files n increases, the time spent using the proposed scheme is much less than the time spent using the normal conjunctive search; in other words, PKE-KFF-CKS is far more efficient than the normal conjunctive search method. Figure 4 shows that the time consumed by the normal conjunctive search grows linearly with the size of the query collection m, while the time consumed by the proposed scheme is barely affected by m, as the number of keywords in the query increases from 5 to 30. Furthermore, while the search cost is linear in the number of query keywords in other conjunctive keyword search schemes [11,19], PKE-KFF-CKS introduces nearly constant overhead as the number of keywords in the query increases.