A Group Collaboratable Proof of Retrievability Scheme for Cloud Data Storage

Cloud computing and cloud data storage have become important applications on the Internet. An important trend in cloud computing and cloud data storage is group collaboration, since it is a strong inducement for an entity, especially an international enterprise, to use a cloud service. In this paper we propose a cloud data storage scheme with a set of protocols that support group collaboration. A group of users can operate on a set of data collaboratively, with dynamic data updates supported. Every member of the group can access, update, and verify the data independently. The verification can also be delegated to a third-party auditor for convenience.


INTRODUCTION
Cloud computing (including cloud data storage) allows users to obtain required services more quickly than ever and to expand the services they need easily, with lower software acquisition and hardware maintenance costs. It also provides a flexible and convenient environment for users to access data and services. Furthermore, cloud computing enables potential brand-new applications, such as automatic data backup and cross-regional group collaboration. Recently, group collaboration has become an important trend in cloud computing and has been acknowledged as one of the top 10 cloud computing trends for the decade [16]. In this paper, we propose a cloud data storage scheme with a set of protocols that support group collaboration. The proposed scheme has the following properties:
• Proof of retrievability: A cloud storage service provider can provide a proof to a user to ensure that the data stored in the cloud servers are intact and retrievable.
• Fully dynamically updatable: Users can make changes to their data, including insertion, modification, deletion, and appending, at will and at any time.
• Fully anonymous: Non-group members cannot identify which group member made a data change, and cannot distinguish whether two data items were modified by the same or by different group members.
• Fully traceable: The group manager can determine which group member made a data change, even if the change was produced by the collusion of multiple members.

RELATED WORKS
Early related works focused on data storage problems in peer-to-peer (P2P) networks. Lillibridge et al. proposed a scheme for P2P data backup that uses (m+k, m)-erasure codes to distribute file blocks to m + k peer hosts [14]. Filho et al. used RSA-based hash functions to verify data integrity, achieving uncheatable data authentication in P2P networks [10].
Ateniese et al. introduced the model of provable data possession (PDP) [1]. The main purpose of PDP is to confirm the integrity of data stored on untrusted storage servers. In follow-up work, Ateniese et al. used traditional symmetric-key cryptography to construct PDP schemes [3] that are more efficient than the previous scheme and support dynamic appending and modification of file blocks. Curtmola et al. extended the PDP model to multiple data replicas across distributed storage systems [6]. Their scheme can ensure the integrity of data without encoding each replica separately. Juels et al. introduced the proof of retrievability (PoR) model to ensure the integrity of remote data [13]. Their scheme uses error-correcting codes and pseudo-randomly dispersed checking blocks to ensure both possession and retrievability of data. Shacham et al. extended this PoR model with a random linear function, called a homomorphic authenticator, and presented a PoR scheme that allows an unlimited number of verifications [17]. Bowers et al. generalized Juels's and Shacham's models and presented an improved PoR scheme [4]. Later, Bowers et al. extended their scheme to a distributed architecture [5]. Dodis et al. linked PoR with the well-studied topic of hardness amplification in complexity theory and defined a purely information-theoretic notion of PoR codes [7]. Some improved PoR codes were also introduced. Recently, Wang et al. proposed a public auditing scheme for cloud data storage with zero-knowledge privacy based on an aggregatable signature based broadcast (ASBB) encryption scheme [18]. Esiner et al. proposed a PoR scheme based on a data structure called FlexDPDP to improve efficiency [9]. Han et al. proposed a PoR scheme with efficient aggregatable operations based on Maximum Rank Distance (MRD) codes [11].
Using erasure-correcting codes and homomorphic tokens, Wang et al. proposed a decentralized scheme that supports dynamic data modification, deletion, and appending (but not insertion) [20]. The scheme detects data errors, locates the erroneous blocks, and recovers from the errors efficiently. In their follow-up study, Wang et al. used the Merkle hash tree (MHT) data structure to improve the PoR model, supporting both public verification and fully dynamic data updates [19].
The MHT is a widely used authentication structure, intended to efficiently authenticate a set of data from an untrusted source with a small amount of trusted information [15]. An MHT is constructed as a binary tree whose leaves are the hashes of the authentic data values. Figure 1 illustrates authentication with an MHT that has seven leaves. A verifier holding the authentic root R requests the data B2 and B5. Apart from providing the data, the prover also prepares the auxiliary authentication information, i.e., the sibling nodes on the paths from the corresponding leaves to the root, with which the verifier recomputes the root and compares it with R.
Around the same time, Erway et al. extended the PDP model to a fully dynamically updatable scheme [8]. They used the skip list data structure to support dynamic data updates, in particular data insertions. Wang et al. extended their scheme to a public-key-based scheme that supports public verification and fully dynamic data updates, and ensures no privacy leakage during public verification [21]. To increase efficiency, this scheme also uses aggregate signatures to combine multiple verifications into one. Ateniese et al. proposed a general construction of public-key homomorphic linear authenticators (HLAs) [2]. Any identification protocol can be transformed into a public-key HLA as long as the protocol satisfies appropriate conditions. Around the same time, Itani et al. presented protocols to ensure the privacy of individual users in cloud data storage [12]. Recently, Zhou et al. proposed an attribute-based cloud data storage scheme for mobile devices [23]. However, they did not consider PDP or PoR.
May 22, 2014
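The MHT authentication described in this section (Figure 1) can be sketched in a few lines of Python. This is a minimal illustration, not the construction of [15] or [19]: it assumes SHA-256 as the node hash, a number of leaves that is a power of two (eight blocks rather than the seven of Figure 1), and no domain separation between leaf and internal nodes. The function names are hypothetical.

```python
from __future__ import annotations

import hashlib


def h(data: bytes) -> bytes:
    """Node hash; SHA-256 is an assumption made for this sketch."""
    return hashlib.sha256(data).digest()


def build_levels(blocks: list[bytes]) -> list[list[bytes]]:
    """Build the MHT bottom-up; assumes len(blocks) is a power of two."""
    level = [h(b) for b in blocks]          # leaves are hashes of the data
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels                           # levels[-1][0] is the root R


def auth_path(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf `index` up to the root, with their side."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1                 # other child of the same parent
        path.append((level[sibling], sibling < index))  # (hash, is_left)
        index //= 2
    return path


def root_from_path(block: bytes, path: list[tuple[bytes, bool]]) -> bytes:
    """Recompute the root from one block and its authentication path."""
    node = h(block)
    for sibling, is_left in path:
        node = h(sibling + node) if is_left else h(node + sibling)
    return node


# The verifier trusts only `root`; the prover returns B2, B5 and their paths.
blocks = [f"B{i}".encode() for i in range(1, 9)]
levels = build_levels(blocks)
root = levels[-1][0]
for i in (1, 4):                            # zero-based positions of B2, B5
    assert root_from_path(blocks[i], auth_path(levels, i)) == root
```

Each path carries one sibling hash per tree level, i.e. $\log_2 n$ hashes, which is why the verifier needs only the root as trusted information.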

ARCHITECTURE
A cloud data storage architecture for group collaboration is illustrated in Figure 2. In this architecture, messages between entities are transmitted over secure channels. The architecture consists of the following entities:
• User: A user is an individual who stores data in the cloud storage system. A user has full access rights to his data and can verify the integrity and retrievability of the data at any time. A user may belong to one or more groups. Each group has a group manager who is able to manage group members and trace a signature of the group to a member. All members of a group may work on a set of data on behalf of the group.
• Cloud Storage Service Provider (CSSP): A CSSP is an organization that possesses abundant resources and expertise to construct and maintain a cloud data storage system.
• Third-Party Auditor (TPA): A TPA is an agency authorized by users or groups to verify the integrity and retrievability of their data. The TPA is an optional entity in this architecture.

Figure 2. Cloud data storage architecture for group collaboration
A CSSP provides a vast amount of storage space that is shared by all users. The storage space is usually built from multiple storage servers in a distributed manner, and all storage servers work simultaneously and collaboratively. To ensure the accuracy of stored data, appropriate redundancy for error-correcting or erasure-correcting codes may be stored in the storage servers to prevent data loss due to accidents or deliberate destruction. Users can access their private data through the interfaces provided by the CSSP and manipulate the data at will with appending, insertion, modification, and deletion operations. The redundancy in the storage servers must be adjusted immediately to reflect the changes made by these operations. For users to feel confident storing their data in the cloud storage system, there must be ways to convince them that the data stored in the cloud are intact, confidential, and retrievable. For this purpose, users need an efficient verification method that imposes little computational load on the storage servers and keeps the amount of data transmitted between the users and the CSSP as small as possible.
When data are shared by a group of users, i.e., under group collaboration, each member can read, modify, and verify the shared data independently. The behavior of a group member should be concealed from outside the group. That is, an entity outside a group cannot identify which group member made a modification, or distinguish whether two modifications were made by the same or by different group members. However, the behavior of a group member should be traceable inside the group, i.e., given a modified data block, the group manager can reveal which group member made the modification.
A TPA is an institution that is trusted by users and has capabilities and expertise that users may not have. Users may not have sufficient resources, e.g., time, computing power, and network bandwidth, to verify the accuracy of the data stored in the cloud. In such a situation, a TPA can be authorized by users to verify their data on demand or periodically and to report the results to the corresponding users.

PROPOSED SCHEME AND PROTOCOLS
As described in the previous section, users and TPAs must have an efficient way to verify specific data stored in the cloud.
In the proposed scheme, verification is carried out by a challenge-response interaction. A user or a TPA submits a request to the CSSP as a challenge. The CSSP then computes a set of values corresponding to the challenge and sends them back to the user or the TPA as a response. If the response is consistent with the verifier's knowledge about the data, this proves that the data stored in the storage servers are intact and retrievable.
We adopt the methods in [19] and [22] to design these procedures. First, the scheme is initialized by a trusted party with Setup, which generates the public parameters. A group manager calls GrpSetup to generate the secret and public keys for a new group. A user calls Join to join a designated group, and calls ReqPermit to request one-time signing permits from the group manager when needed. When a user wants to store a file in the cloud storage, the file is first split into blocks of constant size; SigGen is then called to generate the metadata of each file block, and the file and its metadata are transferred to the server. SigGen calls Enc to encrypt the file and Sign to generate signatures on the file. Enc can be implemented with a standard cryptographic primitive such as AES. When a user reads a file block, he first verifies it with Verify and then decrypts it with Dec. When a user or a TPA wants to check the retrievability of a file, ReqProof is called to send a request to the CSSP. The CSSP uses GenProof to generate a proof according to the request and sends it back to the requester, who then uses ChkProof to validate the proof. The proof includes an aggregated signature, which is generated by Aggregate and can be verified by AggVerify. Aggregated signatures are used to improve verification efficiency and to save communication bandwidth. To update a file, a user sends an update request to the CSSP via ReqUpdate. The CSSP performs the update with ExecUpdate and sends a proof of the update back to the user, who then checks the proof with ChkUpdate. The manager of a group can use Open to find out which group member last updated a file block. The details of these procedures are described as follows.
• Join($P_U$, usrinfo, grpinfo): A user uses this procedure to join a group, where $P_U$ is the user's public key and usrinfo is the user description. Before calling this procedure, the user selects $s_U \in_R \mathbb{Z}_p^*$ as the private key and computes $P_U = s_U P$ in advance. If the manager accepts the request, he computes a certificate $Cert = s_A H_1(grpinfo \,\|\, P_U \,\|\, T)$ for the user, where T is the validity period of the certificate. The manager then records $P_U$, usrinfo, Cert, and T in the database and sends Cert, T, $P_A$, and $s_E$ back to the user.
• SigGen($s_E$, $\{x_i s_U, x_i P_U, S_i\}_{1 \le i \le n}$, F): A user uses this procedure to generate verification metadata for a file $F = \{B_1, B_2, \ldots, B_n\}$. Here we assume that the file has been pre-processed with a Reed-Solomon code and divided into n blocks. The procedure first executes Enc($s_E$, $B_j$) to encrypt each file block $B_j$ to $B_j'$, and then obtains $\sigma_j = \mathrm{Sign}(x_i s_U, x_i P_U, S_i, B_j')$ with a randomly chosen signing permit. Note that each signing permit should be used only once; here we assume k > n. Finally, the procedure constructs the MHT from these $B_j'$ and computes the signature $\sigma_{R_F}$ of the MHT root $R_F$. When this procedure is finished, the user sends the CSSP the file description fileinfo, the encrypted file $F' = \{B_i'\}$, the set of signatures $\Phi = \{\sigma_i\}$, and the signed MHT root $R_F$ of the file. The user does not need to keep the file and its signatures and can delete them from local storage. Note that each file needs only one execution of this procedure.
• GenProof: The CSSP uses this procedure to generate a proof for a challenge chal. The procedure first retrieves the file blocks and their signatures from F′ and Φ according to the set of block indices $I = (i_1, \ldots, i_m)$ contained in chal, and calls Aggregate($\{\sigma_j, H_1(B_j')\}_{j \in I}$, $P_A$, grpinfo) to obtain an aggregated signature $(S, S_H, \{P_{x_j}, H_1(B_j')\}_{j \in I}, P_A)$. It then computes the auxiliary authentication information (AAI) $\{A_j\}_{j \in I}$ based on I and the MHT of the file, where $A_j$ is the set of sibling nodes on the path from the leaf $H_1(B_j')$ to the root $R_F$. Finally, the procedure returns $V = (S, S_H, \{P_{x_j}, H_1(B_j'), A_j\}_{j \in I}, P_A, grpinfo, \sigma_{R_F})$, where $\sigma_{R_F}$ is the signature of $R_F$. After the procedure is complete, the CSSP sends V to the verifier for verification.
• ChkProof(V): A user or a TPA uses this procedure to validate the proof V. The procedure first verifies the aggregated signature with AggVerify. It then recomputes the MHT root $R_F'$ from $\{H_1(B_j'), A_j\}_{j \in I}$ and checks whether $\sigma_{R_F}$ is a valid signature on $R_F'$. The procedure accepts the proof if all these checks succeed, and rejects it otherwise.
• ReqUpdate(fileinfo, inst): A user updates a file with this procedure. The parameter inst consists of an instruction (modify, insert, or delete), a block index, and, optionally, a new block and its signature for the update. Data appending can be accomplished by insertion.
• ExecUpdate(F′, Φ, inst): The CSSP uses this procedure to update the file F according to the parameter inst. The operation stores the new file block and its signature (modify and insert) or removes the designated block and its signature (delete), and then adjusts the MHT of the file accordingly. We adapt the method in [19] to perform this adjustment. After the update is complete, the procedure returns a proof $V = (H_1(B_i'), A_i, \sigma_{R_F}, R_F')$, which is then passed to the requester, where i is the index of the updated block, $B_i'$ is the original block, $\sigma_{R_F}$ is the signature of the MHT root before the update, and $R_F'$ is the MHT root after the update. Thus the AggVerify procedure can also successfully verify the aggregate signature.
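ChkUpdate is not detailed in this section. Following the update-verification idea of [19] for a modify instruction, a user can (1) recompute the old root from $H_1(B_i')$ and $A_i$ and check $\sigma_{R_F}$ against it, then (2) recompute a root from the new block and the same $A_i$ and compare it with the returned $R_F'$. The sketch below is hypothetical: it assumes SHA-256 node hashing, hashes raw bytes rather than encrypted blocks, takes the root-signature check as a callback, and uses HMAC only as a stand-in for that signature in the usage example. Insertions and deletions change the tree shape and are not covered.

```python
import hashlib
import hmac
from typing import Callable, List, Tuple

# Authentication path A_i: (sibling hash, sibling_is_left), ordered leaf to root.
Path = List[Tuple[bytes, bool]]


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()          # assumed node hash


def root_from_leaf(leaf: bytes, aai: Path) -> bytes:
    node = leaf
    for sibling, is_left in aai:
        node = h(sibling + node) if is_left else h(node + sibling)
    return node


def chk_update_modify(old_leaf: bytes, aai: Path, old_root_sig: bytes,
                      new_root: bytes, new_block: bytes,
                      verify_root: Callable[[bytes, bytes], bool]) -> bool:
    """Hypothetical ChkUpdate for a `modify` instruction."""
    # 1. The old leaf hash and A_i must reproduce the root the user signed.
    if not verify_root(old_root_sig, root_from_leaf(old_leaf, aai)):
        return False
    # 2. The new leaf with the same siblings must give the CSSP's new root.
    return root_from_leaf(h(new_block), aai) == new_root


# Usage with a 4-leaf tree; HMAC is only a stand-in for the root signature.
KEY = b"demo-key"


def sign_root(root: bytes) -> bytes:
    return hmac.new(KEY, root, hashlib.sha256).digest()


def verify_root(sig: bytes, root: bytes) -> bool:
    return hmac.compare_digest(sig, sign_root(root))


leaves = [h(b) for b in (b"B1", b"B2", b"B3", b"B4")]
n12, n34 = h(leaves[0] + leaves[1]), h(leaves[2] + leaves[3])
old_root = h(n12 + n34)
aai_b3 = [(leaves[3], False), (n12, True)]        # siblings of B3: B4, then N12
new_b3 = b"B3-modified"
new_root = h(n12 + h(h(new_b3) + leaves[3]))      # root the CSSP would report
assert chk_update_modify(leaves[2], aai_b3, sign_root(old_root), new_root,
                         new_b3, verify_root)
```

Because the siblings on the path do not change during a modify, the same $A_i$ authenticates both the old and the new root.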

SECURITY ANALYSIS
The security of the proposed scheme is mainly based on the hardness of the elliptic curve discrete logarithm problem and some related problems on elliptic curves.
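Elliptic curve discrete logarithm problem (ECDLP): Given two points P and Q in the group of rational points on an elliptic curve, find an integer k such that $kP = Q$; k is called the discrete logarithm of Q to the base P. For a cyclic group $G_1$ of prime order p generated by P, the computational Diffie-Hellman problem (CDHP) is to compute $abP$ given $(P, aP, bP)$, and the decisional Diffie-Hellman problem (DDHP) is to decide whether $c \equiv ab \pmod{p}$ given $(P, aP, bP, cP)$. If the DDHP in $G_1$ can be solved in polynomial time but the ECDLP and CDHP cannot, $G_1$ is called a gap Diffie-Hellman group. Our scheme is based on such a group.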
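To make the ECDLP definition concrete, the sketch below uses a small toy curve, $y^2 = x^3 + 2x + 2$ over $\mathbb{F}_{17}$, whose 19 points form a cyclic group generated by P = (5, 1). At this size, k is recovered from Q = kP by simply stepping through P, 2P, 3P, and so on. The parameters are illustrative only and unrelated to the curves the scheme would use, where the group order is around 170 bits and exhaustive search is infeasible. The function names are hypothetical; `pow(x, -1, m)` needs Python 3.8 or later.

```python
# Toy curve y^2 = x^3 + A*x + 2 over F_17; the constant term 2 does not
# appear in the addition formulas, so only A and the modulus are needed.
P_MOD, A = 17, 2
INF = None  # point at infinity (group identity)


def point_add(p, q):
    """Add two points given in affine coordinates."""
    if p is INF:
        return q
    if q is INF:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF                                            # q == -p
    if p == q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD  # tangent
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD         # chord
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)


def scalar_mult(k, p):
    """Compute kP by double-and-add."""
    result, addend = INF, p
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def ecdlp_brute_force(p, q):
    """Find k with kP = Q by trying k = 1, 2, 3, ... (exhaustive search)."""
    acc, k = p, 1
    while acc != q:
        acc, k = point_add(acc, p), k + 1
        if acc is INF:
            raise ValueError("Q is not in the subgroup generated by P")
    return k


G = (5, 1)                       # generates all 19 points of this curve
Q = scalar_mult(13, G)
assert ecdlp_brute_force(G, Q) == 13
```

The same search over a group of about $2^{170}$ elements would need on the order of $2^{170}$ point additions, which is what the security arguments below rely on.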
The proposed scheme may suffer from several types of attack. The first type of attack is to obtain the private key of a group or a group member. The second type is to forge the data stored in cloud storage. The attacker may be an external attacker, the CSSP itself, an individual within the group, or several colluding members within the group. The third type is to steal the data stored in cloud storage. The fourth type is to destroy the integrity of the data stored in cloud storage.
Type I attack: obtain a private key
1. Obtain the private key of a group. There are two ways to obtain the private key of a group. The first is to compute the private key $s_A$ from the group public key $P_A = s_A P$, which is equivalent to solving the ECDLP. The second is for a member of the group to try to compute $s_A$ from his certificate $Cert = s_A H_1(grpinfo \,\|\, P_U \,\|\, T)$ or from a one-time signing permit $S_i = s_A H_1(grpinfo \,\|\, x_i P_U \,\|\, T)$. This is also equivalent to solving the ECDLP.
2. Obtain the private key of a group member. There are two ways to obtain the private key of a group member.
The first way is to compute the private key $s_U$ from the public key $P_U = s_U P$, which is equivalent to solving the ECDLP. The second way is to derive the private key from a signature $\sigma_i = (S_G, x_i P_U, T)$ generated by the member, where $S_G = x_i s_U H_1(B) + s_A H_1(grpinfo \,\|\, x_i P_U \,\|\, T)$. Since every signature uses a distinct $x_i$, solving this equation is equivalent to solving the ECDLP. This holds even for the group manager, who knows $s_A$: subtracting the permit $S_i = s_A H_1(grpinfo \,\|\, x_i P_U \,\|\, T)$ from $S_G$ leaves $x_i s_U H_1(B)$, and recovering $x_i s_U$ from it is again an ECDLP instance.

Type II attack: forge the data stored in cloud storage
An adversary may fake a data block and then forge the signature of this data block; to fake a data block successfully, a valid signature must be forged. Suppose the signature of a faked data block is $\sigma_i = (S_G, x_i P_U, T)$. This signature must pass the following examination:
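The examination itself is missing from this excerpt. Assuming a symmetric bilinear pairing $e: G_1 \times G_1 \to G_2$ (not named in this section), bilinearity together with $P_U = s_U P$, $P_A = s_A P$, and the signature form $S_G = x_i s_U H_1(B) + s_A H_1(grpinfo \,\|\, x_i P_U \,\|\, T)$ implies that a valid $\sigma_i$ satisfies the equation below. This is a derivation from those definitions rather than the paper's own statement, although its cost (two hashes, three pairings, one multiplication in $G_2$) agrees with the per-block verification cost given in the conclusions.

```latex
\begin{aligned}
e(S_G, P)
  &= e\big(x_i s_U H_1(B) + s_A H_1(grpinfo \,\|\, x_i P_U \,\|\, T),\; P\big) \\
  &= e\big(H_1(B),\, P\big)^{x_i s_U} \cdot e\big(H_1(grpinfo \,\|\, x_i P_U \,\|\, T),\, P\big)^{s_A} \\
  &= e\big(H_1(B),\, x_i P_U\big) \cdot e\big(H_1(grpinfo \,\|\, x_i P_U \,\|\, T),\, P_A\big)
\end{aligned}
```

The right-hand side uses only public values ($B$, $x_i P_U$, $T$, grpinfo, $P_A$), so a forger must produce an $S_G$ satisfying it without knowing $x_i s_U$ or $s_A$.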

DISCUSSIONS AND CONCLUSIONS
Cloud computing and cloud data storage have become important applications on the Internet. Technology giants such as Microsoft, Amazon, Google, IBM, Cisco, and Dell have invested in developing cloud computing and data storage technologies and services. In this development, group collaboration is an important trend, since it is a strong inducement for an entity, especially an international enterprise, to use a cloud service. In this paper we propose a cloud data storage scheme with a set of protocols that support group collaboration. A group of users can operate on a set of data collaboratively, with dynamic data appending, insertion, modification, and deletion operations. Every member of the group can access, update, and verify the data independently. The verification can also be delegated to a TPA for convenience. The TPA cannot learn information about the data or the user. However, the group manager can find out which member last updated a file block.
The security of our scheme is based on the hardness of the ECDLP and the CDHP on elliptic curves over finite fields. To provide sufficient security (approximately that of a standard 1024-bit RSA signature), we can use an elliptic curve over a finite field $\mathbb{F}_q$ with embedding degree 6, where q is approximately 170 bits long. $G_1$, of prime order p, is a subgroup of $E(\mathbb{F}_q)$, where p is also approximately 170 bits long. Through the pairing, the ECDLP in $G_1$ reduces to the DLP in $\mathbb{F}_{q^6}$, a finite field of approximately $6 \times 170 = 1020$ bits, so the security level is approximately that of the DLP in a 1020-bit field. For a data block, the signature is about 350 bits, or 44 bytes, long. For a 1 GB file with 8 KB data blocks, there are $2^{30} / 2^{13} = 131{,}072$ blocks, hence 131,073 signatures including the one for the MHT root, resulting in about 5.5 MB of overhead stored in the cloud for this file. Signing a file block requires one hash operation, one elliptic curve scalar multiplication, and one elliptic curve point addition. Verifying a file block costs two hash operations, three pairing operations, and one finite field multiplication. The size of an aggregate signature is linear in the number of requested data blocks. Aggregating m signatures takes m hash operations and $m - 1$ elliptic curve point additions in total.
Verifying an aggregate of m signatures requires $m + 2$ pairing operations and $2m - 1$ finite field multiplications, and transmits about $132 + 88m$ bytes for the verification. Overall, these costs indicate that the scheme is efficient.
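The storage and communication figures above follow from simple arithmetic. The sketch below reproduces them, taking the 44-byte signature size and the cost formulas from the text and interpreting 1 GB and 8 KB as $2^{30}$ and $2^{13}$ bytes, an assumption consistent with the block count given. The helper names are hypothetical.

```python
def signature_storage_bytes(file_bytes: int, block_bytes: int, sig_bytes: int = 44):
    """Number of blocks and total signature bytes (one per block plus the MHT root)."""
    n_blocks = -(-file_bytes // block_bytes)        # ceiling division
    return n_blocks, (n_blocks + 1) * sig_bytes


def aggregate_verification_cost(m: int) -> dict:
    """Cost of verifying an aggregate of m signatures, per the formulas in the text."""
    return {"pairings": m + 2, "field_mults": 2 * m - 1, "bytes": 132 + 88 * m}


n_blocks, overhead = signature_storage_bytes(2**30, 8 * 2**10)  # 1 GiB, 8 KiB
print(n_blocks, overhead, round(overhead / 2**20, 2))  # 131072 5767212 5.5
```

The "5.5 MB" in the text therefore corresponds to 5.5 MiB (5,767,212 bytes), about 0.54% of the file size.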