LARGE-SCALE TAXONOMIC PROFILING OF EUKARYOTIC MODEL ORGANISMS: A COMPARISON OF ORTHOLOGOUS PROTEINS ENCODED BY THE HUMAN, FLY, NEMATODE, AND YEAST GENOMES

Arcady R. Mushegian, James R. Garey, Jason Martin, and Leo X. Liu

Genome Research 8, 590-598

Summary
Comparisons of DNA and protein sequences between humans and model organisms, including the yeast Saccharomyces cerevisiae, the nematode Caenorhabditis elegans, and the fruit fly Drosophila melanogaster, are a significant source of information about the function of human genes and proteins in both normal and disease states. Important questions about cross-species sequence comparison remain unanswered, including: (i) what fraction of the metabolic, signaling and regulatory pathways is shared by humans and the various model organisms, and (ii) how valid functional inferences based on sequence homology are. We addressed these questions by analyzing the available fractions of the human, fly, nematode, and yeast genomes for orthologous protein-coding genes, applying strict criteria to distinguish candidate orthologous proteins from paralogous ones. Forty-two quartets of proteins could be identified as candidate orthologs. Twenty-four Drosophila protein sequences were more similar to their human orthologs than were the corresponding nematode proteins. Analysis of sequence substitutions and evolutionary distances in this dataset revealed that most C. elegans genes are evolving more rapidly than Drosophila genes, suggesting that unequal evolutionary rates may contribute to the differences in similarity to human protein sequences. The available fraction of Drosophila proteins appears to lack representatives of many protein families and domains, reflecting the relative paucity of genomic data from this species.



Figure 1. The three possible topologies for a tree describing the evolutionary relationships between nematodes, arthropods and humans.  Tree A (blue) reflects the conventional interpretation of metazoan phylogeny with nematodes as a "protocoelomate" group basal to arthropods and humans.  This tree was supported by Neighbor-Joining analysis of 24 protein quartets as described in the text.  Tree B (red) represents the "Ecdysozoa" phylogeny derived from 18S rRNA gene sequences of a variety of nematodes and arthropods (Aguinaldo et al. 1997), and is supported by 11 protein quartets.  Tree C (green) is not expected from any metazoan phylogenetic hypothesis and is supported by a single protein quartet.  Average bootstrap values and their standard deviations are shown for each tree.



Figure 2. Relative evolutionary rates of the 36 protein quartets subjected to phylogenetic analysis. Protein quartets supporting Tree A are shown in blue, those supporting Tree B are shown in red, and the quartet supporting Tree C is shown in green. The protein name abbreviations are shown along the Y axis, and the proteins are plotted in order of the mean of the nematode-to-human and arthropod-to-human evolutionary distances, where the nematode is C. elegans and the arthropod is D. melanogaster. Proteins with the highest number of pairwise substitutions (fast evolving) are at the top and those with the lowest number of pairwise substitutions (slow evolving) are at the bottom. The evolutionary distances along the X axis were determined from amino acid alignments using a Poisson correction as described in Methods. The dashed line marks the midpoint, with 18 proteins above it and 18 below. The key to the bars is shown within the figure.
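The Poisson correction mentioned in the Figure 2 caption is commonly written as d = -ln(1 - p), where p is the proportion of aligned amino acid sites that differ. The following is a minimal sketch of that standard formula, not the authors' pipeline: the function names are hypothetical, the sequences in the usage note are made up, and skipping any column that contains a gap is an assumption (the Methods are not reproduced here).

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Assumption: columns where either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # ignore gapped columns
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("distance undefined when every site differs")
    return -math.log(1.0 - p)
```

For example, `poisson_distance("ACDE", "ACDF")` uses p = 0.25 and returns -ln(0.75), slightly more than the raw proportion, because the correction allows for multiple substitutions at the same site.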



Figure 3. Four-way relative rate plot of evolutionary distances for 36 proteins. The ratio of evolutionary distances (human-nematode)/(human-arthropod) for each protein is plotted on the X-axis; a ratio of one would be expected if the proteins were evolving homogeneously in those branches, assuming that arthropods and nematodes are sister taxa. The ratio of evolutionary distances (yeast-nematode)/(yeast-arthropod) is plotted on the Y-axis, and should equal one if a protein evolved homogeneously in the nematode and arthropod lineages. The position where the X-axis and Y-axis both equal one represents the region where genes would fall if they evolved homogeneously in all four taxa, if Tree B is correct. Proteins to the right of the vertical line at X=1 should favor Tree A, proteins to the left should favor Tree C, and proteins falling near the diagonal line should favor Tree B. The distribution of the 36 orthologous proteins is skewed: those that yield Tree B (red, square) are scattered uniformly around the diagonal line (with one exception: CDC42 supports Tree B but falls to the extreme right of the graph), while all of the proteins that yield Tree A (blue, diamond) are scattered to the right of the diagonal. The quartet favoring Tree C is shown in green (triangle).
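The two ratios plotted in Figure 3 can be computed from four pairwise distances per protein. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the distances in the usage note are invented for illustration, and the code only reports where a point lies relative to the X=1 line and the diagonal. It does not reproduce the authors' tree assignments.

```python
def relative_rate_point(d_human_nematode, d_human_arthropod,
                        d_yeast_nematode, d_yeast_arthropod):
    """Return (x, y) for the four-way relative rate plot.

    x = (human-nematode) / (human-arthropod)
    y = (yeast-nematode) / (yeast-arthropod)
    """
    if d_human_arthropod <= 0 or d_yeast_arthropod <= 0:
        raise ValueError("denominator distances must be positive")
    x = d_human_nematode / d_human_arthropod
    y = d_yeast_nematode / d_yeast_arthropod
    return x, y


def describe_position(x, y):
    """Report the side of the X=1 line and the offset from the diagonal y = x."""
    if x > 1:
        side = "right of X=1"
    elif x < 1:
        side = "left of X=1"
    else:
        side = "on X=1"
    return side, y - x  # an offset of zero means the point is on the diagonal
```

With made-up distances, `relative_rate_point(0.5, 0.25, 1.5, 1.0)` gives the point (2.0, 1.5), which `describe_position` reports as right of X=1 with a diagonal offset of -0.5.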