Hydrophilic interactions; their Origin, their Magnitude and their Relevance to Protein Folding

A simple model is constructed in which the phenomenon of hydrophilic (HφI) interactions may be studied exactly. This model reveals the origin of the strong HφI interaction and provides an approximate method of estimating its order of magnitude. The results are in agreement with previous theoretical calculations, as well as with experimental data and simulations. The source of the non-additivity of the HφI interactions among three and four HφI groups is examined. A few comments are made on the relevance of the HφI interactions to the process of protein folding.

These findings have a profound effect on our understanding of some fundamental biochemical processes such as protein folding, protein-protein association and molecular recognition. [11,12] Yet, in spite of the overwhelming evidence for these findings (from theory, experiment and simulation), they have been largely ignored. If you open any modern textbook of biochemistry, biophysics or molecular biology, you will find statements about the "major importance of the hydrophobic (HφO) effect" and the "insignificant contribution of hydrogen bonding" in protein folding. [12] I believe that there are two main reasons for the persistence of the myth about the importance of the HφO effect in protein folding.
The first is that most people in the field, including myself, have spent many years studying, teaching and applying the HφO effect, and it is very hard to deviate from the accepted paradigm regarding the dominance of the HφO effect, even after strong evidence favoring the new paradigm involving HφI effects became available.
The second reason is that both HφO and HφI effects are difficult to understand. Since these are solvent-induced interactions, understanding them depends on knowledge of the notoriously difficult statistical mechanical theory of liquid water.
In 1959, Kauzmann [13] proposed a simple, intuitive and very appealing (some even claim convincing [12]) model for the HφO effect. The model was based on the Gibbs energy change for the process of transferring a non-polar solute from water into an organic liquid. It was known that the Gibbs energy change in this transfer process is large and negative. Kauzmann realized the analogy between this transfer process and the process of transferring a HφO group of a protein from being exposed to water into the interior of the protein, Figure 1. If each of these HφO groups contributes about -2 to -3 kcal/mol, then one can expect a large negative contribution to the Gibbs energy change of the process of protein folding. Kauzmann's model captured the imagination and the conviction of most scientists. [14] No doubt this is the reason that most textbooks claim that the HφO effect is the most important one in the process of protein folding.
In 1990, I showed that Kauzmann's model was not adequate for the problem of protein folding. [2] It was also shown [1,5] that HφI effects might be more important to protein folding. Unfortunately, the arguments in favor of the HφI effects were not easy to grasp and to accept. There has also been much confusion in identifying the HφI effects with direct intra-molecular hydrogen bonding (HB).
The purpose of this article is, first, to clarify the difference between various solvent-induced effects involving HφI groups; second, to reveal the origin of the strong HφI effects and to provide an estimate of the order of magnitude of these effects. Finally, we shall comment on the relevance of HφI effects to protein folding.
To achieve this goal, a simple and solvable model has been designed. It is described in section 3. It should be stressed that this is not a model for aqueous solutions. The purpose of the model is to clarify the origin of the HφI interactions. In section 4, we discuss the HφI interactions in real aqueous solutions. We show that the essence of the argument for the existence of strong HφI interactions in real systems is almost the same as in the simple model. Some concluding remarks are presented in section 5.

July 20, 2013

Definitions and motivation for constructing a model for HφI interactions
We consider a system of N water molecules at some temperature T and pressure P, and two solutes 1 and 2 at some fixed positions and orientations in the solvent. The process of interest is to bring these two solutes from infinite separation, $R = \infty$, to some close distance R, Figure 2. The Gibbs energy change for this process is the same as the potential of mean force (PMF) between the two solutes. [1,12] This can be written as

$$\Delta G(\infty \to R) = U(R) + \delta G(R) = U(R) + \Delta G^{*}_{12}(R) - \Delta G^{*}_{1} - \Delta G^{*}_{2} \qquad (2.1)$$

where $U(R)$ is the pair potential between the two solutes in vacuum. Usually, we also need to specify the relative orientations of the two solutes, but this will be done for each of the effects separately. $\delta G(R)$ is referred to as the solvent-induced contribution to the Gibbs energy change. The latter can always be expressed in terms of the solvation Gibbs energies of all the species involved in the process: here $\Delta G^{*}_{12}(R)$ for the pair at distance R, and $\Delta G^{*}_{1}$ and $\Delta G^{*}_{2}$ for the two separated solutes. The second equality can be derived either theoretically [12] or simply from the cyclic process in Figure 2.
When the solutes are hydrophobic (HφO), say methane, ethane and the like, the quantity $\delta G(R)$ may be referred to as the HφO interaction. This effect has been discussed in great detail in references 11 and 12 and will not be discussed here. When the solutes are hydrophilic (HφI), such as methanol, ethanol, etc., we refer to $\delta G(R)$ as the HφI interaction. We also apply the quantity $\delta G(R)$ to two HφI groups, say hydroxyl, carbonyl or amine, attached to some backbone molecule. This is important in the application of the HφI interactions to protein folding.
We shall next specify several HφI effects, some of which will be further studied within the model described in the next section.

A. Formation of a direct hydrogen bond (HB) between the two solutes
The relevant process is depicted in Figure 3. Two solutes are brought from infinite separation to a distance of about 2.8 Å, with the correct orientations such that an HB is formed between the two solutes. The Gibbs energy change in this process is

$$\Delta G(\mathrm{HB}) = U + \delta G \approx \varepsilon_{HB} - \Delta G^{*}(\text{arm of 1}) - \Delta G^{*}(\text{arm of 2}) \qquad (2.2)$$

In this process the direct interaction is approximately equal to a HB energy, denoted $\varepsilon_{HB}$. The solvent-induced part is essentially equal to the loss of the solvation Gibbs energy of one "arm" on solute 1 and one "arm" on solute 2. This approximation is equivalent to viewing the solute as consisting of a few "arms" along which a hydrogen bond may be formed. For instance, an amine group NH has one arm, a carbonyl group C=O has two arms, a hydroxyl group OH has three arms and a water molecule has four arms. In writing the approximation (2.2) it is assumed that only the solvation of the arms involved in the formation of the HB should be taken into account. It is also assumed that the solvations of different arms on a single solute are independent. [15] Other parts of the molecules, the environments of which are unchanged, will not contribute to the Gibbs energy change of the process. [1] Of course, in reality one needs to distinguish between donor and acceptor arms, but for the purpose of this article we ignore this distinction.
The solvation Gibbs energy of one arm was estimated, based on a simple model for water as well as on experimental data, to be about -2.25 kcal/mol. [11,12] Taking $\varepsilon_{HB}$ to be of the order of -6 to -6.5 kcal/mol, we get for $\Delta G$ in (2.2), i.e. the HB energy minus the solvation Gibbs energies of the two arms, a value of about -1.5 to -2 kcal/mol. This is significantly different from what is inferred from the so-called HB inventory argument, which was discussed in great detail in references 11, 12 and 16. It should be noted that most people, when discussing HφI effects in protein folding, are actually referring to the process of formation of an intra-molecular HB by HφI groups. Thus, the HB inventory argument has not only underestimated the contribution of intra-molecular HBing, but has also led to the dismissal of other, possibly more important HφI effects, as discussed below.
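The arithmetic behind this estimate can be sketched in a few lines. This is only an illustration, assuming that approximation (2.2) has the form "HB energy minus the solvation Gibbs energies of the two arms that are lost"; the function name `direct_hb_gibbs` is ours and does not appear in the original.

```python
# Rough arithmetic for the direct-HB process (all energies in kcal/mol).
# Assumption: approximation (2.2) reads  dG ~ eps_HB - 2 * dG_arm,
# i.e. one solvated arm is lost on each of the two solutes.

DG_ARM = -2.25  # solvation Gibbs energy of one arm, value quoted in the text


def direct_hb_gibbs(eps_hb: float, dg_arm: float = DG_ARM) -> float:
    """Gibbs energy change for forming one direct solute-solute HB in water."""
    return eps_hb - 2.0 * dg_arm


if __name__ == "__main__":
    for eps_hb in (-6.0, -6.5):  # range of HB energies quoted in the text
        print(f"eps_HB = {eps_hb:5.2f}  ->  dG = {direct_hb_gibbs(eps_hb):5.2f} kcal/mol")
```

The point of the sketch is that the desolvation of the two arms offsets most, but not all, of the HB energy, so the net Gibbs energy change remains negative.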

B. HφI interaction between two groups at a distance of about 4.5 Å
This is perhaps the most important HφI effect in both protein folding and protein-protein association. [11,12] Here, we consider the process of bringing two solutes from infinite separation to a distance $R \approx 4.5$ Å, within the solvent. We also require that the orientations of these solutes are such that one arm of solute 1 and one arm of solute 2 can form HBs with one water molecule, Figure 4. The Gibbs energy change for this process is $\Delta G = U(R \approx 4.5\,\text{Å}) + \delta G^{(2)}$. In this process we assume that the direct interaction between the two groups is relatively small. As before, the solvent-induced part of $\Delta G$, denoted $\delta G^{(2)}$, is due only to the solvation of those parts of the solutes which are changed in the process, specifically, the two arms on the two solutes. As in the process depicted in Figure 3, we assume that the types of arms (donor or acceptor) match when a HB is formed.
If we look superficially at the process depicted in Figure 4, we see that in both the initial and the final states the arm of solute 1 can form a HB with a water molecule, and the arm of solute 2 can form a HB with a water molecule. It is then tempting to conclude that the solvation of the two solutes will not change in the process, and hence that there is no reason to expect any significant solvent-induced effect. We shall see in the next section that this conclusion is false. It arises from confusing a HB energy with the solvation Gibbs energy of a HBing arm. This is the same error committed in the HB inventory argument, [12] and the main reason that has hindered the understanding of this effect.

C. HφI interaction between three groups
As an extension of the pairwise HφI interaction discussed above, consider the following process. We bring three solutes (or groups on a protein), initially at infinite separation from each other, to a final configuration as shown in Figure 5.

Figure 5. The HφI interaction process between three arms belonging either to three solutes or to three groups on a protein. Only the final configuration is shown.
At this configuration the three arms of the three solutes can form three HBs with a water molecule. The Gibbs energy change for this process is $\Delta G = U + \delta G^{(3)}$. Again, we can assume that the direct interaction $U$ between the three solutes is relatively small and that the main contribution to the Gibbs energy change is the indirect, solvent-induced part, denoted $\delta G^{(3)}$.
A superficial examination of this process could lead to the erroneous conclusion that $\delta G^{(3)}$ should be negligible. The argument is that the three arms of the solutes form three HBs with water molecules in both the initial and the final state; therefore, one would not expect a large contribution from this effect. As before, the fallacy of this conclusion lies in confusing a HB energy with the solvation Gibbs energy of a HBing arm of a solute (or group). We shall see in the next section the origin of this confusion, and also find that $\delta G^{(3)}$ is even larger than $\delta G^{(2)}$. We shall also see that $\delta G^{(3)}$ is not pairwise additive.

D. HφI interaction between four groups
The final process that we shall study in the next section is the HφI interaction between four solutes or groups. We consider the process of bringing four solutes from infinite separation to the final configuration depicted in Figure 6. The four solutes in the final configuration can form four HBs with a water molecule. The corresponding Gibbs energy change is $\Delta G = U + \delta G^{(4)}$. We shall study this HφI interaction in the next section within a simple model system. We shall see that this effect is larger than the HφI interaction between three groups. We shall also see that this interaction is not pairwise additive.

E. Other stronger and longer-range HφI interactions
There is a host of other solvent-induced interactions, some of which are stronger than the ones discussed above and some of which have a longer range. We shall not discuss these in this article; they were discussed in detail in reference 12.

A "minimal" model showing the origin of the HφI interactions
The model described below was specifically designed to be simple enough to be exactly solvable, yet to contain enough features to enable us to study the molecular origin of the HφI effects, to estimate the order of magnitude of these effects, and to examine the source of their non-additivity. It should be emphasized, however, that we do not consider this model to be reliable for estimating the HφI effects in real aqueous solutions. It is used here for demonstrating the origin of the various HφI effects. Estimates of the HφI effects in real systems will be given in the next section (see also reference 12).
As we have seen in the previous section, all of the HφI interactions may be expressed as differences in solvation Gibbs energies of solutes at various configurations. Each solvation Gibbs energy of a solute s has the form [11,12]

$$\Delta G^{*}_{s} = -k_B T \ln \langle \exp(-\beta B_s) \rangle_0 \qquad (3.1)$$

Here, $k_B$ is the Boltzmann constant, T the absolute temperature, and $\beta = (k_B T)^{-1}$. $B_s$ is the binding energy, i.e. the total interaction energy of the solute s with all solvent molecules at a specific configuration,

$$B_s = \sum_{i=1}^{N} U(X_s, X_i) \qquad (3.2)$$

where $U(X_s, X_i)$ is the solute-solvent pair potential.
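The average $\langle\ \rangle_0$ in (3.1) is over all possible configurations of the solvent molecules, with the probability distribution of the water molecules in the absence of the solute s:

$$\langle \exp(-\beta B_s) \rangle_0 = \int dX^N \, P_0(X^N) \exp(-\beta B_s) \qquad (3.3)$$

$$P_0(X^N) = \frac{\exp[-\beta U_N(X^N)]}{\int dX^N \exp[-\beta U_N(X^N)]} \qquad (3.4)$$

where $U_N(X^N)$ is the total interaction energy among the N solvent molecules.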
In (3.3) we have taken the average in the (T, V, N) ensemble, whereas in (3.1) one should take the average in the (T, P, N) ensemble. However, in actual aqueous systems we can neglect the difference between the Gibbs and the Helmholtz energies of solvation. [11] As we can see from (3.1), the solvation Gibbs energy depends on the solute-solvent interaction through $B_s$, and on the solvent-solvent interaction through the distribution function $P_0$ in (3.4). For the study of the HφI interactions we need to examine differences between solvation Gibbs energies. In such differences the more important factor is the solute-solvent interaction. Therefore, in the model described below we take into account only solute-solvent interactions and neglect solvent-solvent interactions. The latter assumption renders the model solvable.

Figure 7. A 2D system of M cells and N water-like molecules.
We consider a lattice system of M cells and N water-like molecules (N<M). Figure 7 shows a 2D depiction of the system. Each cell can be occupied only by one water molecule at a fixed point, i.e. it lacks translational degrees of freedom. Each water molecule has four arms along which it can form HB with an arm of a solute molecule, but we neglect HBing between water molecules. We further assume that each water molecule can be in one of n different orientations.
For example, Figure 8 shows the case of n = 10, where the configurations are described by the angle $\theta_i$ of orientation i. We now place a solute at a fixed cell. The solute has one arm along which it can form a HB with another solute or with one water molecule that is in the correct orientation.
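For a system of N water molecules and one solute at a fixed cell and having a fixed orientation, the partition function (PF) is $Q = \sum \exp(-\beta E)$, where the sum is over all configurations of the water molecules and E is the energy of each configuration. In this system there are only two energy levels: either the solute forms a HB with a water molecule, or it does not. We assume that the water molecule can form a HB with the solute molecule only when it is in one out of the n orientations, say i = 0 in Figure 8. In this case, taking the solute to occupy one cell and its arm to point at one neighboring cell, so that the remaining water molecules are distributed over M − 2 cells, the PF has the form

$$Q(T, M, N; s) = \binom{M-2}{N-1} n^{N-1} e^{-\beta \varepsilon_{HB}} + \binom{M-2}{N-1} n^{N-1} (n-1) + \binom{M-2}{N} n^{N} \qquad (3.8)$$

The first term on the rhs of (3.8) corresponds to all configurations for which a HB is formed between the solute and a water molecule. The second term corresponds to all configurations for which a water molecule is in the cell next to the arm of the solute but does not form a HB. The third term is the total number of configurations for which there is no water molecule at the site next to the arm of the solute.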
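The Gibbs energy change for placing a solute at a fixed cell in this system is thus

$$\Delta G^{*}_{s} = -k_B T \ln \frac{Q(T, M, N; s)}{Q(T, M, N)} \qquad (3.9)$$

where $Q(T, M, N; s)$ is the PF of the system with the solute and $Q(T, M, N) = \binom{M}{N} n^{N}$ is the PF of the pure solvent. For very large N and M we get from (3.9)

$$\Delta G^{*}_{s} = -k_B T \ln(1 - x) - k_B T \ln\left[1 + \frac{x}{n}\left(e^{-\beta \varepsilon_{HB}} - 1\right)\right] \qquad (3.10)$$

where $x = N/M$ is the fraction of cells occupied by water molecules. This is also the probability of finding a water molecule at a specific cell. The first term on the rhs of (3.10) is the Gibbs energy change to form a "cavity" in the system, which in our case is an empty cell. This is also the limit of $\Delta G^{*}_{s}$ in (3.10) for the case $\varepsilon_{HB} = 0$, hence

$$\Delta G^{*}(\text{cavity}) = -k_B T \ln(1 - x) \qquad (3.11)$$

In a real solvent this is the so-called "cavity work". [11,12] Here, it is the work required to find a cell empty. This is zero when $x \to 0$, but it is infinite when $x \to 1$, i.e. when all cells are occupied. Note also that this quantity is essentially entropic, and it does not depend on the interactions in the system.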
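The passage from the finite lattice to the large-system limit can be checked numerically. The following is a minimal sketch, not the author's code. It assumes a three-term partition function in which the solute occupies one cell and its arm points at one neighboring cell (so the other water molecules are spread over M − 2 cells), and it uses the gas constant R so that energies are in kcal/mol. The function names are ours.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K); k_B expressed per mole


def solvation_exact(M: int, N: int, n: int, eps_hb: float, T: float = 298.0) -> float:
    """-RT ln(Q_with_solute / Q_pure) for a finite lattice (assumed three-term PF)."""
    rt = R_KCAL * T
    pure = math.comb(M, N)                   # the n**N orientation factor cancels below
    w_near = math.comb(M - 2, N - 1) / pure  # a water sits in the cell next to the arm
    w_empty = math.comb(M - 2, N) / pure     # the cell next to the arm is empty
    # One of the n orientations forms the HB, the other n - 1 do not.
    ratio = (w_near / n) * (math.exp(-eps_hb / rt) + (n - 1)) + w_empty
    return -rt * math.log(ratio)


def solvation_limit(x: float, n: int, eps_hb: float, T: float = 298.0) -> float:
    """Large-system limit: cavity term plus the solvation of one arm."""
    rt = R_KCAL * T
    cavity = -rt * math.log(1.0 - x)
    arm = -rt * math.log(1.0 + (x / n) * (math.exp(-eps_hb / rt) - 1.0))
    return cavity + arm


if __name__ == "__main__":
    n, eps_hb = 10, -6.0
    for M in (20, 200, 2000, 20000):
        N = (3 * M) // 10  # keep x = N/M fixed at 0.3
        print(M, solvation_exact(M, N, n, eps_hb), solvation_limit(N / M, n, eps_hb))
```

With the HB energy set to zero the limit reduces to the cavity term alone, which is the property used in the text to identify the "cavity work".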
For the analysis that follows we shall not need the "cavity work" term. This term always cancels out in the process of HφI interactions between groups on a protein. We always take into account the conditional solvation Gibbs energy of the HφI group, [11,12] see also section 4.
We now examine the Gibbs energy change for the processes discussed in section 2. To get an order of magnitude for the different terms involved, we rewrite the solvation Gibbs energy of one arm of a solute as

$$\Delta G^{*}(\text{arm}) = -k_B T \ln\left[1 + y\left(e^{-\beta \varepsilon_{HB}} - 1\right)\right], \qquad y = x/n \qquad (3.12)$$

where y is the probability of finding a water molecule next to the arm in the correct orientation to form a HB. First we examine two limiting cases:
1. When $y \to 0$. This can occur either when the density of water is too low ($x \to 0$), or when n is very large ($1/n \to 0$). In this limit, as expected, there is no solvent-induced effect, i.e. $\delta G^{(2)} = 0$.
2. When $y \to 1$. In this limit each arm forms a HB in both the initial and the final configuration; we break two HBs and make two HBs, and therefore the net effect is $\delta G^{(2)} = 0$. This is essentially the origin of the confusion that exists between the formation of a HB and the solvation by HBs.
In reality the HB energy is different from the Gibbs energy of solvation of an arm. To see the origin of the HφI interactions we take the case of $y \ll 1$ but $y e^{-\beta \varepsilon_{HB}} \gg 1$, hence (3.14) reduces to

$$\delta G^{(2)} \approx -k_B T \ln\left[y e^{-2\beta \varepsilon_{HB}}\right] + 2 k_B T \ln\left[y e^{-\beta \varepsilon_{HB}}\right] = k_B T \ln y$$

Thus, for a small value of y (but not too small), we see that the main origin of the HφI effect is that in the initial state the solvation Gibbs energies of the two solutes involve the factor $e^{-2\beta \varepsilon_{HB}}$ and the probability $y^2$, i.e. two water molecules must be in the correct configuration to form HBs with the two solutes. On the other hand, in the final configuration we also have the factor $e^{-2\beta \varepsilon_{HB}}$, but now only one probability factor y.
The case discussed above reveals the main origin of the occurrence of the HφI interaction. We shall further discuss the general case in the next section. Thus, taking the value of y estimated from (3.12), we obtain an estimate of the HφI interaction in this model. This is quite similar to the value of the HφI interaction estimated from either theory or experimental data. [11,12] For more realistic estimates see the next section.

Figure 10 shows the values of $\delta G^{(2)}$ as a function of y for T = 298 K and $\varepsilon_{HB} = -6$ kcal/mol. We see that in both limits, $y \to 0$ and $y \to 1$, $\delta G^{(2)}$ is zero. However, for small values of y, we can obtain strong HφI interactions, which are of the same order of magnitude as the HB energy.

Figure 11. (a) The temperature dependence of $\delta G^{(2)}$ for two values of the model parameter. (b) The same as (a), but for $\delta G^{(3)}$. (c) The same as (a), but for $\delta G^{(4)}$.

Figure 11a shows the dependence of $\delta G^{(2)}$ on the temperature for two values of the model parameter.
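The behavior described for the pair interaction (zero in both limits, strong at small y) can be reproduced with a short calculation. This is a sketch under stated assumptions, not the author's code. Each solvation term is taken as a two-state average: with probability y a water molecule is correctly placed (binding energy m times the HB energy for m arms sharing that water), otherwise the binding energy is zero. The names `shared_water_solvation` and `delta_g2` are ours.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K); k_B expressed per mole


def shared_water_solvation(m: int, y: float, eps_hb: float, T: float = 298.0) -> float:
    """Solvation Gibbs energy (kcal/mol) of m arms that can all bind one water.

    Two-state average: with probability y the water is in the right place and
    orientation (binding energy m * eps_hb); otherwise the energy is zero.
    """
    rt = R_KCAL * T
    return -rt * math.log(1.0 + y * (math.exp(-m * eps_hb / rt) - 1.0))


def delta_g2(y: float, eps_hb: float = -6.0, T: float = 298.0) -> float:
    """Two arms sharing one water, minus two separately solvated arms."""
    final = shared_water_solvation(2, y, eps_hb, T)
    initial = 2.0 * shared_water_solvation(1, y, eps_hb, T)
    return final - initial


if __name__ == "__main__":
    for y in (0.0, 1e-4, 1e-3, 1e-2, 0.1, 0.5, 1.0):
        print(f"y = {y:<7g} delta_G2 = {delta_g2(y):8.3f} kcal/mol")
```

For small y with a large Boltzmann factor, the result approaches RT ln y, which is the limiting form discussed in the text.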
We next examine the HφI interaction between three solutes, or three arms on three groups of a protein. As before, in the two limits $y \to 0$ and $y \to 1$ we have $\delta G^{(3)} = 0$. However, for a small value of y (but not too small) we can write $\delta G^{(3)}$ as

$$\delta G^{(3)} \approx -k_B T \ln\left[y e^{-3\beta \varepsilon_{HB}}\right] + 3 k_B T \ln\left[y e^{-\beta \varepsilon_{HB}}\right] = 2 k_B T \ln y$$

Finally, we address the question of the additivity of the HφI interaction. This is an important aspect of any solvent-induced effect. It has been discussed in connection with solvent-induced interactions in reference 16. Here, we examine the non-additivity of the HφI interaction within the simple model system.
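We define the extent of non-additivity with respect to pairs of groups as the difference between the HφI interaction among three (or four) groups and the sum of the corresponding pairwise HφI interactions. Figure 12 shows the values of the extent of non-additivity for three and four groups. As can be seen from Figure 12, the non-additivities of the HφI interactions are positive and of the same order of magnitude as the HφI interactions themselves.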
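The sign and size of the non-additivity can be illustrated within the same two-state picture. This is a hedged sketch: it assumes that the non-additivity for m groups is the m-group interaction minus the sum over all m(m − 1)/2 pairs of the pair interaction, and that all m arms bind the same water molecule with probability y. Neither the function names nor this exact definition are taken from the original.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K); k_B expressed per mole


def shared_water_solvation(m: int, y: float, eps_hb: float, T: float = 298.0) -> float:
    """Two-state solvation Gibbs energy of m arms binding one shared water."""
    rt = R_KCAL * T
    return -rt * math.log(1.0 + y * (math.exp(-m * eps_hb / rt) - 1.0))


def delta_g(m: int, y: float, eps_hb: float = -6.0, T: float = 298.0) -> float:
    """Interaction among m groups: shared-water state minus m separate arms."""
    return (shared_water_solvation(m, y, eps_hb, T)
            - m * shared_water_solvation(1, y, eps_hb, T))


def non_additivity(m: int, y: float, eps_hb: float = -6.0, T: float = 298.0) -> float:
    """Assumed definition: m-body interaction minus the sum over all pairs."""
    n_pairs = math.comb(m, 2)  # 3 pairs for three groups, 6 for four
    return delta_g(m, y, eps_hb, T) - n_pairs * delta_g(2, y, eps_hb, T)


if __name__ == "__main__":
    y = 1e-3
    for m in (2, 3, 4):
        print(f"m = {m}: delta_G = {delta_g(m, y):8.3f}, "
              f"non-additivity = {non_additivity(m, y):8.3f} kcal/mol")
```

Under these assumptions the interaction grows in magnitude from two to three to four groups, while the sum of pair terms overestimates it, so the non-additivity comes out positive.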
To conclude, we have seen that we can demonstrate the origin of some HφI effects in an extremely simple model. We do not consider this to be a realistic model. However, it is remarkable that the estimated values obtained from this model are of a similar order of magnitude to those in a real system.

The HφI interaction for real solutes in liquid water
We consider again a system of N water molecules in a volume V and at temperature T. The solvation Helmholtz energy of a solute s is given by

$$\Delta A^{*}_{s} = -k_B T \ln \langle \exp(-\beta B_s) \rangle_0 \qquad (4.1)$$

The solute s is assumed to have at least one arm along which it can form a HB with water molecules. We write the solute-solvent pair potential as

$$U(X_s, X_i) = U^{**}(X_s, X_i) + U^{HB}(X_s, X_i) \qquad (4.2)$$

where $U^{HB}(X_s, X_i)$ is the HB part of the interaction associated with the particular arm we are considering, and $U^{**}$, which comprises all other parts of the interaction, is defined in equation (4.2). The total binding energy can be written as

$$B_s = B^{**}_{s} + B^{HB}_{s} \qquad (4.3)$$

and the solvation Helmholtz energy as

$$\Delta A^{*}_{s} = \Delta A^{**}_{s} + \Delta A^{*HB/**}_{s} \qquad (4.4)$$

The two terms on the rhs of (4.4) correspond to splitting the solvation process of the solute s into two steps. First, we solvate the $U^{**}$ part of the interaction; the corresponding Helmholtz energy of solvation is $\Delta A^{**}_{s}$. Next, we solvate the HBing of the arm. The corresponding work is the conditional solvation Helmholtz energy of the arm, given that all other parts of the interaction were already solvated.
A particularly simple case is when the solute s is a hard sphere (HS) plus one arm. In this case $U^{**}$ is simply the hard sphere pair potential. The corresponding solvation Helmholtz energy for this solute is

$$\Delta A^{*}_{s} = \Delta A^{*HS} + \Delta A^{*HB/HS} \qquad (4.5)$$

In (4.5) the first term is the solvation Helmholtz energy of a hard sphere, or the cavity work. This is the analog of the work required to find an empty cell in equation (3.10). The second term is the conditional solvation Helmholtz energy of the HBing of an arm, given that the hard part of the interaction has already been solvated.
As we have done in section 3, we focus on the change in the conditional solvation of one arm on each solute, or on the HφI group of the protein.
Focusing on the second term of either eq. (4.4) or (4.5), we can simplify the average quantity as follows:

$$\exp\left(-\beta \Delta A^{*HB/HS}\right) = \langle \exp(-\beta B^{HB}) \rangle_{HS} = \int dX^N \, P(X^N / HS) \exp(-\beta B^{HB}) \qquad (4.6)$$

where $P(X^N / HS)$ is the conditional distribution of the configurations of the water molecules, given the HS at some specific location in the solvent. For the more general case, the conditional distribution is $P(X^N / BB)$, where the condition is the presence of the backbone (BB) of the protein. The advantage of focusing on the conditional average in (4.6) is that the integral can now be split into two regions. In one, the integration is over all configurations of water molecules such that there is one water molecule at the correct configuration near the arm to form a HB. The second includes all other configurations.
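This can easily be seen by rewriting the average in (4.6) as

$$\langle \exp(-\beta B^{HB}) \rangle_{HS} = \int P(\nu) \exp(-\beta \nu) \, d\nu \qquad (4.7)$$

where $P(\nu)$ is the distribution of the values of the binding energies of the solute, i.e.

$$P(\nu) = \int dX^N \, P(X^N / HS) \, \delta(B^{HB} - \nu) \qquad (4.8)$$

Thus, $P(\nu) d\nu$ is the probability of finding the binding energy of the solute between $\nu$ and $\nu + d\nu$. [11] A water molecule can form a HB with the arm only from a small region of the configurational space. Therefore, we rewrite the integral in (4.6) or (4.7) approximately in two terms

$$\langle \exp(-\beta B^{HB}) \rangle_{HS} \approx y \exp(-\beta \varepsilon_{HB}) + (1 - y) \qquad (4.9)$$

Here we assume that the integrand can have only two values: when the arm is hydrogen bonded we have $B^{HB} = \varepsilon_{HB}$, and when it is not hydrogen bonded $B^{HB} = 0$.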
$y$ is the probability of finding a water molecule in the correct location and the correct orientation to form a HB with the arm. The probability of the complementary event is $1 - y$. As in section 3, the probability $y$ depends both on the density and on solvent-solvent interactions. This has important consequences regarding the entropy and the enthalpy changes associated with the HφI interaction. This aspect will be discussed briefly in the next section.
For estimating the value of $\delta G^{(2)}$, the argument provided in section 3 is essentially the same. Using the expression (4.9) for the conditional solvation in the initial and the final configurations, we get the approximate expression

$$\delta G^{(2)} \approx -k_B T \ln\left[y^{(2)} e^{-2\beta \varepsilon_{HB}} + 1 - y^{(2)}\right] + 2 k_B T \ln\left[y^{(1)} e^{-\beta \varepsilon_{HB}} + 1 - y^{(1)}\right] \qquad (4.10)$$

This is almost the same as equation (3.16) except for one important difference. In the model used in section 3 we have assumed that the probability of finding a water molecule in the correct configuration to form a HB with one arm, $y^{(1)}$, is the same as the probability of finding a water molecule in the correct configuration to form two HBs with the two arms, $y^{(2)}$.
However, in reality these two probabilities are different. Using a very simple model for water (denoted BN3D in reference 11) we have estimated that the ratio $y^{(1)} / y^{(2)}$ is about 360/20, i.e. $y^{(2)}$ is smaller than $y^{(1)}$ by a factor of about 18. This is due to the fact that in $y^{(1)}$ we take into account the full rotation about the axis along which a single bond is formed, see Figure 13a. On the other hand, only a small fraction of these configurations contribute to the formation of two HBs, Figure 13b. Assuming again that $y^{(1)}$ and $y^{(2)}$ are small compared to 1, and that the factor $y^{(1)} e^{-\beta \varepsilon_{HB}}$ is large compared with one, we can get a rough estimate: for a conditional solvation energy of a single arm of about −2 kcal/mol and for $\varepsilon_{HB} = -6.5$ kcal/mol,

$$\delta G^{(2)} \approx -2.6 \ \text{kcal/mol} \qquad (4.11)$$
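The effect of the factor of about 18 can be made concrete with a one-line estimate. This is a sketch under an assumption that is ours: in the small-probability limit the pair interaction behaves as RT ln[(y1)^2 / y2], where y1 and y2 are the one-arm and two-arm probabilities. Relative to the model of section 3 (where y2 = y1), the orientational restriction then adds RT ln(y1 / y2). The function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K); k_B expressed per mole


def orientation_penalty(ratio: float, T: float = 298.0) -> float:
    """RT ln(y1 / y2): shift of the pair interaction (kcal/mol) when the
    two-HB probability y2 is smaller than the one-HB probability y1."""
    return R_KCAL * T * math.log(ratio)


if __name__ == "__main__":
    ratio = 360.0 / 20.0  # ratio quoted in the text for the BN3D model
    print(f"y1/y2 = {ratio:.0f}: shift = +{orientation_penalty(ratio):.2f} kcal/mol")
```

Under this assumption the restriction on the orientation of the bridging water molecule weakens the pair interaction by an amount of order RT ln 18, without changing its sign or its order of magnitude.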