Comparative Genomics of the Archaea (Euryarchaeota): Evolution of Conserved Protein Families, the Stable Core, and the Variable Shell

Kira S. Makarova1,2,4, L. Aravind1,3, Michael Y. Galperin1, Nick V. Grishin1, Roman L. Tatusov1, Yuri I. Wolf1,4, and Eugene V. Koonin1,5

1 National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894 USA; 2 Department of Pathology, F.E. Hebert School of Medicine, Uniformed Services University of the Health Sciences, Bethesda, Maryland 20814-4799 USA; 3 Department of Biology, Texas A&M University, College Station, Texas 70843 USA

Comparative analysis of the protein sequences encoded in the four euryarchaeal species whose genomes have been sequenced completely (Methanococcus jannaschii, Methanobacterium thermoautotrophicum, Archaeoglobus fulgidus, and Pyrococcus horikoshii) revealed 1326 orthologous sets, of which 543 are represented in all four species. The proteins that belong to these conserved euryarchaeal families comprise 31%–35% of the gene complement and may be considered the evolutionarily stable core of the archaeal genomes. The core gene set includes the great majority of genes coding for proteins involved in genome replication and expression, but only a relatively small subset of metabolic functions. For many gene families that are conserved in all euryarchaea, previously undetected orthologs in bacteria and eukaryotes were identified. A number of euryarchaeal synapomorphies (unique shared characters) were identified; these are protein families that possess sequence signatures or domain architectures that are conserved in all euryarchaea but are not found in bacteria or eukaryotes. In addition, euryarchaea-specific expansions of several protein and domain families were detected. In terms of their apparent phylogenetic affinities, the archaeal protein families split into bacterial and eukaryotic families. The majority of the proteins that have only eukaryotic orthologs or show the greatest similarity to their eukaryotic counterparts belong to the core set. The families of euryarchaeal genes that are conserved in only two or three species constitute a relatively mobile component of the genomes whose evolution should have involved multiple events of lineage-specific gene loss and horizontal gene transfer. Frequently these proteins have detectable orthologs only in bacteria or show the greatest similarity to the bacterial homologs, which might suggest a significant role of horizontal gene transfer from bacteria in the evolution of the euryarchaeota.
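
The split between the stable core and the variable shell can be illustrated with a minimal sketch. It assumes each orthologous set is summarized by the species in which it is represented. The function name `classify`, the label "other", and the toy sets below are hypothetical and are not taken from the paper; only the species names and the counts 543 and 1326 come from the abstract.

```python
from collections import Counter

# The four genomes compared in the abstract.
SPECIES = frozenset({
    "Methanococcus jannaschii",
    "Methanobacterium thermoautotrophicum",
    "Archaeoglobus fulgidus",
    "Pyrococcus horikoshii",
})


def classify(species_present):
    """Label one orthologous set by how many of the four species it covers."""
    n = len(set(species_present) & SPECIES)
    if n == len(SPECIES):
        return "core"    # represented in all four species
    if n >= 2:
        return "shell"   # conserved in only two or three species
    return "other"       # fewer than two species: not an orthologous set here


# Made-up phyletic patterns for illustration only (not data from the paper).
toy_sets = {
    "set_A": set(SPECIES),
    "set_B": {"Methanococcus jannaschii", "Pyrococcus horikoshii"},
    "set_C": SPECIES - {"Archaeoglobus fulgidus"},
}

labels = {name: classify(members) for name, members in toy_sets.items()}
counts = Counter(labels.values())

# Counts reported in the abstract: 543 of 1326 orthologous sets span all four species.
core_set_fraction = 543 / 1326

print(labels)
print(dict(counts))
print(f"core sets / all sets = {core_set_fraction:.1%}")
```

The last figure is the share of orthologous sets that are represented in all four species (about 41%). It is a different quantity from the 31%–35% quoted above, which is the share of each genome's genes that belong to those core families.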