For the GxxxG motif region, there will always be evidence of phylogenetic signal owing to the strongly conserved glycine residues (30.7% identical for GxxxGxxxGxxxG), and there is certainly some conservation in the lengths of the repeats in sequences that are more closely related (Figures 4 and 5). However, the 25% sequence identity cutoff imposed in our data analysis has filtered out most of the apparent sequence similarity in the variable regions of the repeat. This can be seen by comparing the similarity between
any two aligned sequences both within the repeat region (Figure 5) and outside of the repeats (see Additional files 1 and 2). For FliH, we calculated correlation coefficients between all possible pairs of amino acids, in all possible combinations of positions in the repeats, and used statistical methods to determine whether certain pairs of amino acids
in specific positions are found together significantly more often than would be expected by chance. We hypothesized that certain pairs of amino acids in nearby positions, such as positions within the same repeat or in adjacent repeats, would be highly correlated, while amino acids in positions farther away from each other would be unlikely to be strongly correlated, and that such correlations would be due to selective pressure imposed by structural constraints on the GxxxG motifs. For instance, in α-helices, there is a well-known incidence of oppositely charged residues (for example glutamate and lysine) occurring in i, i+4 or i, i+3 pairs, thereby forming stabilizing intra-helical salt bridges, and these are typically not highly conserved interactions. Rather, they appear to be the result of random mutations and selective pressures to stabilize nearby charged residues within the context of the helical structure. Similar results have been found for pair correlations in β-sheets [37].

Figure 4. Number of FliH sequences having primary repeat segments of different lengths. The number on the x-axis represents only the number of GxxxGs; flanking AxxxGs and GxxxAs were not counted.

Figure 5. Multiple alignment of the primary repeat segments from the FliH proteins of different organisms. The primary repeat segments in the FliH proteins were aligned by hand. Only sequences that contained a repeat segment appear in this alignment.

Finally, we sought to determine how prevalent long glycine repeats are in other types of proteins not related to FliH, and to identify a protein of known three-dimensional structure that contains a FliH-like repeat segment that is involved in helix-helix dimerization. To address both goals, a large number of protein structures were downloaded from the Protein Data Bank (PDB; http://www.rcsb.org/pdb).
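As a rough illustration of how protein sequences might be screened for long glycine repeats, the following Python sketch finds the longest chain of consecutive GxxxG motifs in a sequence and tallies chain lengths across a set of sequences. It is not the procedure used in this study. It assumes a strict definition in which only glycines spaced exactly four residues apart are counted (flanking AxxxG and GxxxA motifs are ignored, as in Figure 4), and the function names and example sequences are hypothetical.

```python
from collections import Counter


def longest_gxxxg_run(seq: str) -> tuple[int, int]:
    """Return (start, n_units) for the longest chain of consecutive GxxxG motifs.

    start is the 0-based index of the first glycine in the chain and
    n_units is the number of GxxxG units (GxxxGxxxG counts as 2).
    Returns (-1, 0) if the sequence contains no GxxxG motif.
    Assumption: only glycine is accepted at the repeat positions.
    """
    seq = seq.upper()
    best_start, best_units = -1, 0
    for i, residue in enumerate(seq):
        if residue != "G":
            continue
        # A glycine preceded by another glycine 4 residues earlier only
        # continues a chain that has already been measured from its start.
        if i >= 4 and seq[i - 4] == "G":
            continue
        units = 0
        j = i
        # Walk forward in steps of 4 for as long as a glycine is found.
        while j + 4 < len(seq) and seq[j + 4] == "G":
            units += 1
            j += 4
        if units > best_units:
            best_start, best_units = i, units
    return best_start, best_units


def repeat_length_histogram(sequences: dict[str, str]) -> Counter:
    """Count sequences by the length (in GxxxG units) of their longest chain.

    Sequences without any GxxxG motif are left out of the tally.
    """
    histogram = Counter()
    for seq in sequences.values():
        _, units = longest_gxxxg_run(seq)
        if units > 0:
            histogram[units] += 1
    return histogram


# Hypothetical toy sequences, for illustration only (not real FliH data).
example_sequences = {
    "toy_three_units": "MAGAAAGAAAGAAAGK",
    "toy_one_unit": "MKGLLEGRR",
    "toy_no_repeat": "MKLVEERR",
}
```

Calling `repeat_length_histogram(example_sequences)` on the toy input yields a tally of the kind plotted in Figure 4. For real data, the sequences would first have to be read from the downloaded files, and a looser motif definition may be preferable.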