Experimental validation of enzyme function is crucial for genome interpretation, but it remains challenging because it cannot be scaled up to accommodate the constant accumulation of genome sequences. We tackled this issue for the MetA and MetX enzyme families, phylogenetically unrelated families of acyl-L-homoserine transferases involved in L-methionine biosynthesis. Members of these families are prone to incorrect annotation because MetX and MetA enzymes are assumed to always use acetyl-CoA and succinyl-CoA, respectively. We determined the enzymatic activities of 100 enzymes from diverse species, and interpreted the results by structural classification of active sites based on protein structure modeling. We predict that >60% of the 10,000 sequences from these families currently present in databases are incorrectly annotated, and suggest that acetyl-CoA was originally the sole substrate of these isofunctional enzymes, which evolved to use exclusively succinyl-CoA in the most recent bacteria. We also uncovered a divergent subgroup of MetX enzymes in fungi that participate only in L-cysteine biosynthesis as O-succinyl-L-serine transferases.
Red seaweeds are key components of coastal ecosystems and are economically important as food and as a source of gelling agents, but their genes and genomes have received little attention. Here we report the sequencing of the 105-Mbp genome of the florideophyte Chondrus crispus (Irish moss) and the annotation of the 9,606 genes. The genome features an unusual structure characterized by gene-dense regions surrounded by repeat-rich regions dominated by transposable elements. Despite its fairly large size, this genome shows features typical of compact genomes, e.g., on average only 0.3 introns per gene, short introns, low median distance between genes, small gene families, and no indication of large-scale genome duplication. The genome also gives insights into the metabolism of marine red algae and adaptations to the marine environment, including genes related to halogen metabolism, oxylipins, and multicellularity (microRNA processing and transcription factors). Particularly interesting are features related to carbohydrate metabolism, which include a minimalistic gene set for starch biosynthesis, the presence of cellulose synthases acquired before the primary endosymbiosis showing the polyphyly of cellulose synthesis in Archaeplastida, and cellulases absent in terrestrial plants as well as the occurrence of a mannosylglycerate synthase potentially originating from a marine bacterium. To explain the observations on genome structure and gene content, we propose an evolutionary scenario involving an ancestral red alga that was driven by early ecological forces to lose genes, introns, and intergenic DNA; this loss was followed by an expansion of genome size as a consequence of activity of transposable elements.
Bananas (Musa spp.), including dessert and cooking types, are giant perennial monocotyledonous herbs of the order Zingiberales, a sister group to the well-studied Poales, which include cereals. Bananas are vital for food security in many tropical and subtropical countries and the most popular fruit in industrialized countries. The Musa domestication process started some 7,000 years ago in Southeast Asia. It involved hybridizations between diverse species and subspecies, fostered by human migrations, and selection of diploid and triploid seedless, parthenocarpic hybrids thereafter widely dispersed by vegetative propagation. Half of the current production relies on somaclones derived from a single triploid genotype (Cavendish). Pests and diseases have gradually become adapted, representing an imminent danger for global banana production. Here we describe the draft sequence of the 523-megabase genome of a Musa acuminata doubled-haploid genotype, providing a crucial stepping-stone for genetic improvement of banana. We detected three rounds of whole-genome duplications in the Musa lineage, independently of those previously described in the Poales lineage and the one we detected in the Arecales lineage. This first monocotyledon high-continuity whole-genome sequence reported outside Poales represents an essential bridge for comparative genome analysis in plants. As such, it clarifies commelinid-monocotyledon phylogenetic relationships, reveals Poaceae-specific features and has led to the discovery of conserved non-coding sequences predating monocotyledon-eudicotyledon divergence.
Polyploidization is an important process in the evolution of eukaryotic genomes, but ensuing molecular mechanisms remain to be clarified. Autopolyploidization or whole-genome duplication events frequently are resolved in resulting lineages by the loss of single genes from most duplicated pairs, causing transient gene dosage imbalance and accelerating speciation through meiotic infertility. Allopolyploidization or formation of interspecies hybrids raises the problem of genetic incompatibility (Bateson-Dobzhansky-Muller effect) and may be resolved by the accumulation of mutational changes in resulting lineages. In this article, we show that an osmotolerant yeast species, Pichia sorbitophila, recently isolated in a concentrated sorbitol solution in industry, illustrates this last situation. Its genome is a mosaic of homologous and homeologous chromosomes, or parts thereof, that corresponds to a recently formed hybrid in the process of evolution. The respective parental contributions to this genome were characterized using existing variations in GC content. The genomic changes that occurred during the short period since hybrid formation were identified (e.g., loss of heterozygosity, unilateral loss of rDNA, reciprocal exchange) and distinguished from those undergone by the two parental genomes after separation from their common ancestor (i.e., NUMT (NUclear sequences of MiTochondrial origin) insertions, gene acquisitions, gene location movements, reciprocal translocation). We found that the physiological characteristics of this new yeast species are determined by specific but unequal contributions of its two parents, one of which could be identified as very closely related to an extant Pichia farinosa strain.
Streptomyces cattleya, a producer of the antibiotics thienamycin and cephamycin C, is one of the rare bacteria known to synthesize fluorinated metabolites. The genome consists of two linear replicons. The genes involved in fluorine metabolism and in the biosynthesis of the antibiotic thienamycin were mapped on both replicons.
Fungi are of primary ecological, biotechnological and economic importance. Many fundamental biological processes that are shared by animals and fungi are studied in fungi due to their experimental tractability. Many fungi are pathogens or mutualists and are model systems to analyse effector genes and their mechanisms of diversification. In this study, we report the genome sequence of the phytopathogenic ascomycete Leptosphaeria maculans and characterize its repertoire of protein effectors. The L. maculans genome has an unusual bipartite structure with alternating distinct guanine and cytosine-equilibrated and adenine and thymine (AT)-rich blocks of homogenous nucleotide composition. The AT-rich blocks comprise one-third of the genome and contain effector genes and families of transposable elements, both of which are affected by repeat-induced point mutation, a fungal-specific genome defence mechanism. This genomic environment for effectors promotes rapid sequence diversification and underpins the evolutionary potential of the fungus to adapt rapidly to novel host-derived constraints.
Legumes (Fabaceae or Leguminosae) are unique among cultivated plants for their ability to carry out endosymbiotic nitrogen fixation with rhizobial bacteria, a process that takes place in a specialized structure known as the nodule. Legumes belong to one of the two main groups of eurosids, the Fabidae, which includes most species capable of endosymbiotic nitrogen fixation. Legumes comprise several evolutionary lineages derived from a common ancestor 60 million years ago (Myr ago). Papilionoids are the largest clade, dating nearly to the origin of legumes and containing most cultivated species. Medicago truncatula is a long-established model for the study of legume biology. Here we describe the draft sequence of the M. truncatula euchromatin based on a recently completed BAC assembly supplemented with Illumina shotgun sequence, together capturing approximately 94% of all M. truncatula genes. A whole-genome duplication (WGD) approximately 58 Myr ago had a major role in shaping the M. truncatula genome and thereby contributed to the evolution of endosymbiotic nitrogen fixation. Subsequent to the WGD, the M. truncatula genome experienced higher levels of rearrangement than two other sequenced legumes, Glycine max and Lotus japonicus. M. truncatula is a close relative of alfalfa (Medicago sativa), a widely cultivated crop with limited genomics tools and complex autotetraploid genetics. As such, the M. truncatula genome sequence provides significant opportunities to expand alfalfa's genomic toolbox.
Bacteria of the Thiomonas genus are ubiquitous in extreme environments, such as arsenic-rich acid mine drainage (AMD). The genome of one of these strains, Thiomonas sp. 3As, was sequenced, annotated, and examined, revealing specific adaptations allowing this bacterium to survive and grow in its highly toxic environment. In order to explore genomic diversity as well as genetic evolution in Thiomonas spp., a comparative genomic hybridization (CGH) approach was used on eight different strains of the Thiomonas genus, including five strains of the same species. Our results suggest that the Thiomonas genome has evolved through the gain or loss of genomic islands and that this evolution is influenced by the specific environmental conditions in which the strains live.
Genomes of animals as different as sponges and humans show conservation of global architecture. Here we show that multiple genomic features including transposon diversity, developmental gene repertoire, physical gene order, and intron-exon organization are shattered in the tunicate Oikopleura, belonging to the sister group of vertebrates and retaining chordate morphology. Ancestral architecture of animal genomes can be deeply modified and may therefore be largely nonadaptive. This rapidly evolving animal lineage thus offers unique perspectives on the level of genome plasticity. It also illuminates issues as fundamental as the mechanisms of intron gain.
Only three biological pathways are known to produce oxygen: photosynthesis, chlorate respiration and the detoxification of reactive oxygen species. Here we present evidence for a fourth pathway, possibly of considerable geochemical and evolutionary importance. The pathway was discovered after metagenomic sequencing of an enrichment culture that couples anaerobic oxidation of methane with the reduction of nitrite to dinitrogen. The complete genome of the dominant bacterium, named 'Candidatus Methylomirabilis oxyfera', was assembled. This apparently anaerobic, denitrifying bacterium encoded, transcribed and expressed the well-established aerobic pathway for methane oxidation, whereas it lacked known genes for dinitrogen production. Subsequent isotopic labelling indicated that 'M. oxyfera' bypassed the denitrification intermediate nitrous oxide by the conversion of two nitric oxide molecules to dinitrogen and oxygen, which was used to oxidize methane. These results extend our understanding of hydrocarbon degradation under anoxic conditions and explain the biochemical mechanism of a poorly understood freshwater methane sink. Because nitrogen oxides were already present on early Earth, our finding opens up the possibility that oxygen was available to microbial metabolism before the evolution of oxygenic photosynthesis.
BACKGROUND: Clostridium sticklandii belongs to a cluster of non-pathogenic proteolytic clostridia which utilize amino acids as carbon and energy sources. Isolated by T.C. Stadtman in 1954, it has been generally regarded as a "gold mine" for novel biochemical reactions and is used as a model organism for studying metabolic aspects such as the Stickland reaction, coenzyme-B12- and selenium-dependent reactions of amino acids. With the goal of revisiting its carbon, nitrogen, and energy metabolism, and comparing studies with other clostridia, its genome has been sequenced and analyzed. RESULTS: C. sticklandii is one of the best biochemically studied proteolytic clostridial species. Useful additional information has been obtained from the sequencing and annotation of its genome, which is presented in this paper. Besides, experimental procedures reveal that C. sticklandii degrades amino acids in a preferential and sequential way. The organism prefers threonine, arginine, serine, cysteine, proline, and glycine, whereas glutamate, aspartate and alanine are excreted. Energy conservation is primarily obtained by substrate-level phosphorylation in fermentative pathways. The reactions catalyzed by different ferredoxin oxidoreductases and the exergonic NADH-dependent reduction of crotonyl-CoA point to a possible chemiosmotic energy conservation via the Rnf complex. C. sticklandii possesses both the F-type and V-type ATPases. The discovery of an as yet unrecognized selenoprotein in the D-proline reductase operon suggests a more detailed mechanism for NADH-dependent D-proline reduction. A rather unusual metabolic feature is the presence of genes for all the enzymes involved in two different CO2-fixation pathways: C. sticklandii harbours both the glycine synthase/glycine reductase and the Wood-Ljungdahl pathways. This unusual pathway combination has retrospectively been observed in only four other sequenced microorganisms. CONCLUSIONS: Analysis of the C. sticklandii genome and additional experimental procedures have improved our understanding of anaerobic amino acid degradation. Several specific metabolic features have been detected, some of which are very unusual for anaerobic fermenting bacteria. Comparative genomics has provided the opportunity to study the lifestyle of pathogenic and non-pathogenic clostridial species as well as to elucidate the difference in metabolic features between clostridia and other anaerobes.
Our knowledge of yeast genomes remains largely dominated by the extensive studies on Saccharomyces cerevisiae and the consequences of its ancestral duplication, leaving the evolution of the entire class of hemiascomycetes only partly explored. We concentrate here on five species of Saccharomycetaceae, a large subdivision of hemiascomycetes, that we call "protoploid" because they diverged from the S. cerevisiae lineage prior to its genome duplication. We determined the complete genome sequences of three of these species: Kluyveromyces (Lachancea) thermotolerans and Saccharomyces (Lachancea) kluyveri (two members of the newly described Lachancea clade), and Zygosaccharomyces rouxii. We included in our comparisons the previously available sequences of Kluyveromyces lactis and Ashbya (Eremothecium) gossypii. Despite their broad evolutionary range and significant individual variations in each lineage, the five protoploid Saccharomycetaceae share a core repertoire of approximately 3300 protein families and a high degree of conserved synteny. Synteny blocks were used to define gene orthology and to infer ancestors. Far from representing minimal genomes without redundancy, the five protoploid yeasts contain numerous copies of paralogous genes, either dispersed or in tandem arrays, that, altogether, constitute a third of each genome. Ancient, conserved paralogs as well as novel, lineage-specific paralogs were identified.
BACKGROUND: Methylotrophy describes the ability of organisms to grow on reduced organic compounds without carbon-carbon bonds. The genomes of two pink-pigmented facultative methylotrophic bacteria of the Alpha-proteobacterial genus Methylobacterium, the reference species Methylobacterium extorquens strain AM1 and the dichloromethane-degrading strain DM4, were compared. METHODOLOGY/PRINCIPAL FINDINGS: The 6.88 Mb genome of strain AM1 comprises a 5.51 Mb chromosome, a 1.26 Mb megaplasmid and three plasmids, while the 6.12 Mb genome of strain DM4 features a 5.94 Mb chromosome and two plasmids. The chromosomes are highly syntenic and share a large majority of genes, while plasmids are mostly strain-specific, with the exception of a 130 kb region of the strain AM1 megaplasmid which is syntenic to a chromosomal region of strain DM4. Both genomes contain large sets of insertion elements, many of them strain-specific, suggesting an important potential for genomic plasticity. Most of the genomic determinants associated with methylotrophy are nearly identical, with two exceptions that illustrate the metabolic and genomic versatility of Methylobacterium. A 126 kb dichloromethane utilization (dcm) gene cluster is essential for the ability of strain DM4 to use DCM as the sole carbon and energy source for growth and is unique to strain DM4. The methylamine utilization (mau) gene cluster is only found in strain AM1, indicating that strain DM4 employs an alternative system for growth with methylamine. The dcm and mau clusters represent two of the chromosomal genomic islands (AM1: 28; DM4: 17) that were defined. The mau cluster is flanked by mobile elements, but the dcm cluster disrupts a gene annotated as chelatase and for which we propose the name "island integration determinant" (iid). 
CONCLUSION/SIGNIFICANCE: These two genome sequences provide a platform for intra- and interspecies genomic comparisons in the genus Methylobacterium, and for investigations of the adaptive mechanisms which allow bacterial lineages to acquire methylotrophic lifestyles.
Diatoms are photosynthetic secondary endosymbionts found throughout marine and freshwater environments, and are believed to be responsible for around one-fifth of the primary productivity on Earth. The genome sequence of the marine centric diatom Thalassiosira pseudonana was recently reported, revealing a wealth of information about diatom biology. Here we report the complete genome sequence of the pennate diatom Phaeodactylum tricornutum and compare it with that of T. pseudonana to clarify evolutionary origins, functional significance and ubiquity of these features throughout diatoms. In spite of the fact that the pennate and centric lineages have only been diverging for 90 million years, their genome structures are dramatically different and a substantial fraction of genes (approximately 40%) are not shared by these representatives of the two lineages. Analysis of molecular divergence compared with yeasts and metazoans reveals rapid rates of gene diversification in diatoms. Contributing factors include selective gene family expansions, differential losses and gains of genes and introns, and differential mobilization of transposable elements. Most significantly, we document the presence of hundreds of genes from bacteria. More than 300 of these gene transfers are found in both diatoms, attesting to their ancient origins, and many are likely to provide novel possibilities for metabolite management and for perception of environmental signals. These findings go a long way towards explaining the incredible diversity and success of the diatoms in contemporary oceans.
BACKGROUND: The dung-inhabiting ascomycete fungus Podospora anserina is a model used to study various aspects of eukaryotic and fungal biology, such as ageing, prions and sexual development. RESULTS: We present a 10X draft sequence of the P. anserina genome, linked to the sequences of a large expressed sequence tag collection. Similar to higher eukaryotes, the P. anserina transcription/splicing machinery generates numerous non-conventional transcripts. Comparison of the P. anserina genome and orthologous gene set with those of its close relative, Neurospora crassa, shows that synteny is poorly conserved, the main result of evolution being gene shuffling within the same chromosome. The P. anserina genome contains fewer repeated sequences and has evolved new genes by duplication since its separation from N. crassa, despite the presence of the repeat-induced point mutation mechanism that mutates duplicated sequences. We also provide evidence that frequent gene loss took place in the lineages leading to P. anserina and N. crassa. P. anserina contains a large and highly specialized set of genes involved in the utilization of natural carbon sources commonly found in its natural biotope. It includes genes potentially involved in lignin degradation and efficient cellulose breakdown. CONCLUSION: The features of the P. anserina genome indicate a highly dynamic evolution since the divergence of P. anserina and N. crassa, leading to the ability of the former to use specific complex carbon sources that match its needs in its natural biotope.
The Bacillus cereus group comprises sporulating soil bacteria, including pathogenic strains that may cause diarrheic or emetic food poisoning outbreaks. Multiple locus sequence typing revealed about 30 clonal complexes of these bacteria in natural samples. The application of genomic methods to this group has, however, been biased by the major interest in representatives closely related to Bacillus anthracis. Although the most important food-borne pathogens have not yet been defined, existing data indicate that they are scattered all over the phylogenetic tree. The preliminary analysis of the sequences of three genomes discussed in this paper narrows the gaps in our knowledge of the B. cereus group. The strain NVH391-98 is a rare but particularly severe food-borne pathogen. Sequencing revealed that the strain should be a representative of a novel bacterial species, for which the name Bacillus cytotoxis or Bacillus cytotoxicus is proposed. This strain has a reduced genome size compared to other B. cereus group strains. Genome analysis revealed the absence of sigma B factor and the presence of genes encoding the diarrheic Nhe toxin, not detected earlier. The strain B. cereus F837/76 represents a clonal complex close to that of B. anthracis. Including F837/76, three such B. cereus strains have been sequenced. Alignment of the genomes suggests that B. anthracis is their common ancestor. Since such strains often emerge from clinical cases, they merit special attention. The third strain, KBAB4, is a typical facultative psychrophile generally found in soil. Phylogenetic studies show that in nature it is the most active group in terms of gene exchange. The genomic sequence revealed a large amount of extra-chromosomal genetic material (about 530 kb) that may account for this phenomenon. Genes coding for an Nhe-like toxin were found on a large plasmid in this strain. This may indicate a potential mechanism of toxicity spread from the psychrophile strain community.
The results of this genomic work and the ecological compartments of the different strains suggest the need to consider creating prophylactic vaccines against bacteria closely related to NVH391-98 and F837/76. The development of such vaccines could presumably be based on the properties of non-pathogenic strains such as KBAB4 or ATCC14579, reported here or earlier. By comparing the protein-coding genes of the strains sequenced in this project with others, we estimate the shared proteome, or core genome, in the B. cereus group to be 3000+/-200 genes and the total proteome, or pan-genome, to be 20,000-25,000 genes.
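The core-genome and pan-genome notions used above can be illustrated with plain set operations. The following is a minimal sketch, not the method used in the study: it assumes genes have already been grouped into families by some upstream clustering step, and the function name, strain labels and family identifiers are all hypothetical.

```python
from typing import Dict, Set, Tuple


def core_and_pan_genome(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (core, pan) gene-family sets for a collection of genomes.

    core: families present in every genome (set intersection).
    pan:  families present in at least one genome (set union).
    """
    family_sets = list(genomes.values())
    if not family_sets:
        return set(), set()
    core = set.intersection(*family_sets)
    pan = set.union(*family_sets)
    return core, pan


# Hypothetical presence/absence data; family IDs are invented for illustration.
toy_genomes = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam2", "fam5", "fam6"},
}

core, pan = core_and_pan_genome(toy_genomes)
print(len(core), len(pan))  # prints "2 6"
```

With real data the core set can only shrink and the pan set can only grow as genomes are added, so an estimate of this kind depends on how many strains are compared.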
Acinetobacter baumannii is the source of numerous nosocomial infections in humans and therefore deserves close attention as multidrug or even pandrug resistant strains are increasingly being identified worldwide. Here we report the comparison of two newly sequenced genomes of A. baumannii. The human isolate A. baumannii AYE is multidrug resistant whereas strain SDF, which was isolated from body lice, is antibiotic susceptible. As reference for comparison in this analysis, the genome of the soil-living bacterium A. baylyi strain ADP1 was used. The most interesting dissimilarities we observed were that i) whereas strain AYE and A. baylyi genomes harbored very few Insertion Sequence elements which could promote expression of downstream genes, strain SDF sequence contains several hundred of them that have played a crucial role in its genome reduction (gene disruptions and simple DNA loss); ii) strain SDF has low catabolic capacities compared to strain AYE. Interestingly, the latter has even higher catabolic capacities than A. baylyi which has already been reported as a very nutritionally versatile organism. This metabolic performance could explain the persistence of A. baumannii nosocomial strains in environments where nutrients are scarce; iii) several processes known to play a key role during host infection (biofilm formation, iron uptake, quorum sensing, virulence factors) were either different or absent, the best example of which is iron uptake. Indeed, strain AYE and A. baylyi use siderophore-based systems to scavenge iron from the environment whereas strain SDF uses an alternate system similar to the Haem Acquisition System (HAS). Taken together, all these observations suggest that the genome contents of the 3 Acinetobacters compared are partly shaped by life in distinct ecological niches: human (and more largely hospital environment), louse, soil.
The analysis of the first plant genomes provided unexpected evidence for genome duplication events in species that had previously been considered as true diploids on the basis of their genetics. These polyploidization events may have had important consequences in plant evolution, in particular for species radiation and adaptation and for the modulation of functional capacities. Here we report a high-quality draft of the genome sequence of grapevine (Vitis vinifera) obtained from a highly homozygous genotype. The draft sequence of the grapevine genome is the fourth one produced so far for flowering plants, the second for a woody species and the first for a fruit crop (cultivated for both fruit and beverage). Grapevine was selected because of its important place in the cultural heritage of humanity beginning during the Neolithic period. Several large expansions of gene families with roles in aromatic features are observed. The grapevine genome has not undergone recent genome duplication, thus enabling the discovery of ancestral traits and features of the genetic organization of flowering plants. This analysis reveals the contribution of three ancestral genomes to the grapevine haploid content. This ancestral arrangement is common to many dicotyledonous plants but is absent from the genome of rice, which is a monocotyledon. Furthermore, we explain the chronology of previously described whole-genome duplication events in the evolution of flowering plants.
Microbial biotransformations have a major impact on contamination by toxic elements, which threatens public health in developing and industrial countries. Finding a means of preserving natural environments-including ground and surface waters-from arsenic constitutes a major challenge facing modern society. Although this metalloid is ubiquitous on Earth, thus far no bacterium thriving in arsenic-contaminated environments has been fully characterized. In-depth exploration of the genome of the beta-proteobacterium Herminiimonas arsenicoxydans with regard to physiology, genetics, and proteomics, revealed that it possesses heretofore unsuspected mechanisms for coping with arsenic. Aside from multiple biochemical processes such as arsenic oxidation, reduction, and efflux, H. arsenicoxydans also exhibits positive chemotaxis and motility towards arsenic and metalloid scavenging by exopolysaccharides. These observations demonstrate the existence of a novel strategy to efficiently colonize arsenic-rich environments, which extends beyond oxidoreduction reactions. Such a microbial mechanism of detoxification, which is possibly exploitable for bioremediation applications of contaminated sites, may have played a crucial role in the occupation of ancient ecological niches on earth.
The duplication of entire genomes has long been recognized as having great potential for evolutionary novelties, but the mechanisms underlying their resolution through gene loss are poorly understood. Here we show that in the unicellular eukaryote Paramecium tetraurelia, a ciliate, most of the nearly 40,000 genes arose through at least three successive whole-genome duplications. Phylogenetic analysis indicates that the most recent duplication coincides with an explosion of speciation events that gave rise to the P. aurelia complex of 15 sibling species. We observed that gene loss occurs over a long timescale, not as an initial massive event. Genes from the same metabolic pathway or protein complex have common patterns of gene loss, and highly expressed genes are over-retained after all duplications. The conclusion of this analysis is that many genes are maintained after whole-genome duplication not because of functional innovation but because of gene dosage constraints.
Anaerobic ammonium oxidation (anammox) has become a main focus in oceanography and wastewater treatment. It is also the nitrogen cycle's major remaining biochemical enigma. Among its features, the occurrence of hydrazine as a free intermediate of catabolism, the biosynthesis of ladderane lipids and the role of cytoplasm differentiation are unique in biology. Here we use environmental genomics--the reconstruction of genomic data directly from the environment--to assemble the genome of the uncultured anammox bacterium Kuenenia stuttgartiensis from a complex bioreactor community. The genome data illuminate the evolutionary history of the Planctomycetes and allow us to expose the genetic blueprint of the organism's special properties. Most significantly, we identified candidate genes responsible for ladderane biosynthesis and biological hydrazine metabolism, and discovered unexpected metabolic versatility.
Pseudomonas entomophila is an entomopathogenic bacterium that, upon ingestion, kills Drosophila melanogaster as well as insects from different orders. The complete sequence of the 5.9-Mb genome was determined and compared to the sequenced genomes of four Pseudomonas species. P. entomophila possesses most of the catabolic genes of the closely related strain P. putida KT2440, revealing its metabolically versatile properties and its soil lifestyle. Several features that probably contribute to its entomopathogenic properties were disclosed. Unexpectedly for an animal pathogen, P. entomophila is devoid of a type III secretion system and associated toxins but rather relies on a number of potential virulence factors such as insecticidal toxins, proteases, putative hemolysins, hydrogen cyanide and novel secondary metabolites to infect and kill insects. Genome-wide random mutagenesis revealed the major role of the two-component system GacS/GacA that regulates most of the potential virulence factors identified.
Lactobacillus delbrueckii ssp. bulgaricus (L. bulgaricus) is a representative of the group of lactic acid-producing bacteria, mainly known for its worldwide application in yogurt production. The genome sequence of this bacterium has been determined and shows signs of ongoing specialization, with a substantial number of pseudogenes and incomplete metabolic pathways and relatively few regulatory functions. Several unique features of the L. bulgaricus genome support the hypothesis that the genome is in a phase of rapid evolution. (i) Exceptionally high numbers of rRNA and tRNA genes with regard to genome size may indicate that the L. bulgaricus genome has undergone a recent phase of substantial size reduction, in agreement with the observed high frequency of gene inactivation and elimination; (ii) a much higher GC content at codon position 3 than expected on the basis of the overall GC content suggests that the composition of the genome is evolving toward a higher GC content; and (iii) the presence of a 47.5-kbp inverted repeat in the replication termination region, an extremely rare feature in bacterial genomes, may be interpreted as a transient stage in genome evolution. The results indicate the adaptation of L. bulgaricus from a plant-associated habitat to the stable protein and lactose-rich milk environment through the loss of superfluous functions and protocooperation with Streptococcus thermophilus.
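The comparison in point (ii), GC content at codon position 3 versus overall GC content, can be sketched as follows. This is an illustrative sketch only: the function names and the toy coding sequence are assumptions, the input is taken to be an in-frame coding sequence, and the abstract does not specify how the expected value was derived.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def gc3_fraction(cds: str) -> float:
    """GC fraction at the third position of each complete codon.

    Assumes the sequence starts in frame; a trailing partial codon is ignored.
    """
    usable = len(cds) - len(cds) % 3
    third_positions = cds[2:usable:3]
    return gc_fraction(third_positions)


# Toy in-frame sequence (hypothetical): codons ATG GCC AAG TTC
toy_cds = "ATGGCCAAGTTC"
print(gc_fraction(toy_cds), gc3_fraction(toy_cds))  # prints "0.5 1.0"
```

In the toy sequence the overall GC fraction is 0.5 while every third codon position is G or C, which is the kind of excess at position 3 that the abstract describes.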
Homeodomain transcription factors are involved in many developmental processes and have been intensely studied in a few model organisms, such as mouse, Drosophila and Caenorhabditis elegans. Homeobox genes fall into 10 classes (ANTP, PRD, POU, LIM, TALE, SIX, Cut, ZFH, HNF1, Prox) and 89 different families/groups, all of which are present in vertebrates. Additional groups may be uncovered by further genome annotation, particularly of complex vertebrate genomes. Eight of these groups have been found only in vertebrates, but not in the genome of the tunicate Ciona intestinalis. The other 81 groups of homeobox genes that have been detected in vertebrates so far probably appeared during the early evolution of bilaterians or earlier, as they are also present outside the chordates. How the homeobox genes evolved during and after the main radiation of the bilaterians remains poorly understood, as only a few animal genomes have been sequenced completely. However, drastic changes have occurred at least in the lineage of C. elegans, such as loss of several Hox genes and Hox cluster fragmentation. Here we report considerable alterations of the homeobox gene complement in the tunicate lineage.
The only natural mechanism of malaria transmission in sub-Saharan Africa is the mosquito, generally Anopheles gambiae. Blocking malaria parasite transmission by stopping the development of Plasmodium in the insect vector would provide a useful alternative to the current methods of malaria control. Toward this end, it is important to understand the molecular basis of the malaria parasite refractory phenotype in An. gambiae mosquito strains. We have selected and sequenced six bacterial artificial chromosome (BAC) clones from the Pen-1 region that is the major quantitative trait locus involved in Plasmodium encapsulation. The sequence and the annotation of five overlapping BAC clones plus one adjacent, but not contiguous clone, totaling 585 kb of genomic sequence from the centromeric end of the Pen-1 region of the PEST strain were compared to the genome sequence of the same strain produced by the whole genome shotgun technique. This project identified 23 putative mosquito genes plus putative copies of the retrotransposable elements BEL12 and TRANSIBN1_AG in the six BAC clones. Nineteen of the predicted genes are most similar to their Drosophila melanogaster homologs while one is more closely related to vertebrate genes. Comparison of these new BAC sequences plus previously published BAC sequences to the cognate region of the assembled genome sequence identified three retrotransposons present in one sequence version but not the other. One of these elements, Indy, has not been previously described. These observations provide evidence for the recent active transposition of these elements and demonstrate the plasticity of the Anopheles genome. The BAC sequences strongly support the public whole genome shotgun assembly and automatic annotation while also demonstrating the benefit of complementary genome sequences and of human curation. Importantly, the data demonstrate the differences in the genome sequence of an individual mosquito compared to that of a hypothetical, average genome sequence generated by whole genome shotgun assembly.
Acinetobacter sp. strain ADP1 is a nutritionally versatile soil bacterium closely related to representatives of the well-characterized Pseudomonas aeruginosa and Pseudomonas putida. Unlike these bacteria, Acinetobacter ADP1 is highly competent for natural transformation, which affords extraordinary convenience for genetic manipulation. The circular chromosome of Acinetobacter ADP1, presented here, encodes 3325 predicted coding sequences, of which 60% have been classified based on sequence similarity to other documented proteins. The close evolutionary proximity of Acinetobacter and Pseudomonas species, as judged by the sequences of their 16S rRNA genes and by the highest level of bidirectional best hits, contrasts with the extensive divergence in the GC content of their DNA (40 versus 62%). The chromosomes also differ significantly in size, with the Acinetobacter ADP1 chromosome <60% of the length of the Pseudomonas counterparts. Genome analysis of Acinetobacter ADP1 revealed genes for metabolic pathways involved in utilization of a large variety of compounds. Almost all of these genes, with orthologs that are scattered in other species, are located in five major 'islands of catabolic diversity', now an apparent 'archipelago of catabolic diversity', within one-quarter of the overall genome. Acinetobacter ADP1 displays many features of other aerobic soil bacteria with metabolism oriented toward the degradation of organic compounds found in their natural habitat. A distinguishing feature of this genome is the absence of a gene corresponding to pyruvate kinase, the enzyme that generally catalyzes the terminal step in conversion of carbohydrates to pyruvate for respiration by the citric acid cycle. This finding supports the view that the cycle itself is centrally geared to the catabolic capabilities of this exceptionally versatile organism.
To evaluate the existing annotation of the Arabidopsis genome further, we generated a collection of evolutionarily conserved regions (ecores) between Arabidopsis and rice. The ecore analysis provides evidence that the gene catalog of Arabidopsis is not yet complete, and that a number of these annotations require re-examination. To improve the Arabidopsis genome annotation further, we used a novel "full-length" enriched cDNA collection prepared from several tissues. An additional 1931 genes were covered by new "full-length" cDNA sequences, raising the number of annotated genes with a corresponding "full-length" cDNA sequence to about 14,000. Detailed comparisons between these "full-length" cDNA sequences and annotated genes show that this resource is very helpful in determining the correct structure of genes, in particular, those not yet supported by "full-length" cDNAs. In addition, a total of 326 genomic regions not included previously in the Arabidopsis genome annotation were detected by this cDNA resource, providing clues for new gene discovery. Because, as expected, the two data sets only partially overlap, their combination produces very useful information for improving the Arabidopsis genome annotation.
Identifying the mechanisms of eukaryotic genome evolution by comparative genomics is often complicated by the multiplicity of events that have taken place throughout the history of individual lineages, leaving only distorted and superimposed traces in the genome of each living organism. The hemiascomycete yeasts, with their compact genomes, similar lifestyle and distinct sexual and physiological properties, provide a unique opportunity to explore such mechanisms. We present here the complete, assembled genome sequences of four yeast species, selected to represent a broad evolutionary range within a single eukaryotic phylum, that after analysis proved to be molecularly as diverse as the entire phylum of chordates. A total of approximately 24,200 novel genes were identified, the translation products of which were classified together with Saccharomyces cerevisiae proteins into about 4,700 families, forming the basis for interspecific comparisons. Analysis of chromosome maps and genome redundancies reveals that the different yeast lineages have evolved through a marked interplay between several distinct molecular mechanisms, including tandem gene repeat formation, segmental duplication, a massive genome duplication and extensive gene loss.
Tetraodon nigroviridis is a freshwater puffer fish with the smallest known vertebrate genome. Here, we report a draft genome sequence with long-range linkage and substantial anchoring to the 21 Tetraodon chromosomes. Genome analysis provides a greatly improved fish gene catalogue, including identifying key genes previously thought to be absent in fish. Comparison with other vertebrates and a urochordate indicates that fish proteins have diverged markedly faster than their mammalian homologues. Comparison with the human genome suggests approximately 900 previously unannotated human genes. Analysis of the Tetraodon and human genomes shows that whole-genome duplication occurred in the teleost fish lineage, subsequent to its divergence from mammals. The analysis also makes it possible to infer the basic structure of the ancestral bony vertebrate genome, which was composed of 12 chromosomes, and to reconstruct much of the evolutionary history of ancient and recent chromosome rearrangements leading to the modern human karyotype.
Tunicate embryos and larvae have small cell numbers and simple anatomical features in comparison with other chordates, including vertebrates. Although they branch near the base of chordate phylogenetic trees, their degree of divergence from the common chordate ancestor remains difficult to evaluate. Here we show that the tunicate Oikopleura dioica has a complement of nine Hox genes in which all central genes are lacking but a full vertebrate-like set of posterior genes is present. In contrast to all bilaterians studied so far, Hox genes are not clustered in the Oikopleura genome. Their expression occurs mostly in the tail, with some tissue preference, and a strong partition of expression domains in the nerve cord, in the notochord and in the muscle. In each tissue of the tail, the anteroposterior order of Hox gene expression evokes spatial collinearity, with several alterations. We propose a relationship between the Hox cluster breakdown, the separation of Hox expression domains, and a transition to a determinative mode of development.
The hyperthermophilic euryarchaeon Pyrococcus abyssi and the related species Pyrococcus furiosus and Pyrococcus horikoshii, whose genomes have been completely sequenced, are presently used as model organisms in different laboratories to study archaeal DNA replication and gene expression and to develop genetic tools for hyperthermophiles. We have performed an extensive re-annotation of the genome of P. abyssi to obtain an integrated view of its phylogeny, molecular biology and physiology. Many new functions are predicted for both informational and operational proteins. Moreover, several candidate genes have been identified that might encode missing links in key metabolic pathways, some of which have unique biochemical features. The great majority of Pyrococcus proteins are typical archaeal proteins, and their phylogenetic pattern agrees with the position of the genus near the root of the archaeal tree. However, proteins probably of bacterial origin, including some from mesophilic bacteria, are also present in the P. abyssi genome.
Prochlorococcus marinus, the dominant photosynthetic organism in the ocean, is found in two main ecological forms: high-light-adapted genotypes in the upper part of the water column and low-light-adapted genotypes at the bottom of the illuminated layer. P. marinus SS120, whose complete genome sequence is reported here, is an extremely low-light-adapted form. The genome of P. marinus SS120 is composed of a single circular chromosome of 1,751,080 bp with an average G+C content of 36.4%. It contains 1,884 predicted protein-coding genes with an average size of 825 bp, a single rRNA operon, and 40 tRNA genes. Together with the 1.66-Mbp genome of P. marinus MED4, the genome of P. marinus SS120 is one of the two smallest genomes of a photosynthetic organism known to date. It lacks many genes that are involved in photosynthesis, DNA repair, solute uptake, intermediary metabolism, motility, phototaxis, and other functions that are conserved among other cyanobacteria. Systems of signal transduction and environmental stress response show a particularly drastic reduction in the number of components, even taking into account the small size of the SS120 genome. In contrast, housekeeping genes, which encode enzymes of amino acid, nucleotide, cofactor, and cell wall biosynthesis, are all present. Because of its remarkable compactness, the genome of P. marinus SS120 might approximate the minimal gene complement of a photosynthetic organism.
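The compactness of the SS120 genome can be made concrete with a rough calculation from the figures quoted in the abstract. The sketch below is an estimate only: it assumes that protein-coding genes do not overlap and that the reported average gene size applies exactly, neither of which is stated in the source. The function name is hypothetical.

```python
# Figures quoted in the abstract for P. marinus SS120.
GENOME_BP = 1_751_080   # single circular chromosome
N_GENES = 1_884         # predicted protein-coding genes
MEAN_GENE_BP = 825      # average gene size


def coding_fraction(n_genes: int, mean_gene_bp: float, genome_bp: int) -> float:
    """Approximate share of the genome covered by protein-coding genes.

    Assumes non-overlapping genes, so the result is an upper-level
    estimate rather than a measured coding density.
    """
    if genome_bp <= 0:
        raise ValueError("genome size must be positive")
    return n_genes * mean_gene_bp / genome_bp


frac = coding_fraction(N_GENES, MEAN_GENE_BP, GENOME_BP)
genes_per_kb = N_GENES / (GENOME_BP / 1000)
print(f"protein-coding fraction ~ {frac:.1%}")
print(f"gene density ~ {genes_per_kb:.2f} genes per kb")
```

Under these assumptions, protein-coding genes account for roughly 89% of the chromosome, or a little more than one gene per kilobase.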
Anopheles gambiae is the principal vector of malaria, a disease that afflicts more than 500 million people and causes more than 1 million deaths each year. Tenfold shotgun sequence coverage was obtained from the PEST strain of A. gambiae and assembled into scaffolds that span 278 million base pairs. A total of 91% of the genome was organized in 303 scaffolds; the largest scaffold was 23.1 million base pairs. There was substantial genetic variation within this strain, and the apparent existence of two haplotypes of approximately equal frequency ("dual haplotypes") in a substantial fraction of the genome likely reflects the outbred nature of the PEST strain. The sequence produced a conservative inference of more than 400,000 single-nucleotide polymorphisms that showed a markedly bimodal density distribution. Analysis of the genome sequence revealed strong evidence for about 14,000 protein-encoding transcripts. Prominent expansions in specific families of proteins likely involved in cell adhesion and immunity were noted. An expressed sequence tag analysis of genes regulated by blood feeding provided insights into the physiological adaptations of a hematophagous insect.
Ralstonia solanacearum is a devastating, soil-borne plant pathogen with a global distribution and an unusually wide host range. It is a model system for the dissection of molecular determinants governing pathogenicity. We present here the complete genome sequence of strain GMI1000 and its analysis. The 5.8-megabase (Mb) genome is organized into two replicons: a 3.7-Mb chromosome and a 2.1-Mb megaplasmid. Both replicons have a mosaic structure providing evidence for the acquisition of genes through horizontal gene transfer. Regions containing genetically mobile elements associated with the percentage of G+C bias may have an important function in genome evolution. The genome encodes many proteins potentially associated with a role in pathogenicity. In particular, many putative attachment factors were identified. The complete repertoire of type III secreted effector proteins can now be studied; over 40 candidates were identified. Comparison with other genomes suggests that bacterial plant pathogens and animal pathogens harbour distinct arrays of specialized type III-dependent effectors.
Lactococcus lactis is a nonpathogenic AT-rich gram-positive bacterium closely related to the genus Streptococcus and is the most commonly used cheese starter. It is also the best-characterized lactic acid bacterium. We sequenced the genome of the laboratory strain IL1403, using a novel two-step strategy that comprises diagnostic sequencing of the entire genome and a shotgun polishing step. The genome contains 2,365,589 base pairs and encodes 2310 proteins, including 293 protein-coding genes belonging to six prophages and 43 insertion sequence (IS) elements. Nonrandom distribution of IS elements indicates that the chromosome of the sequenced strain may be a product of recent recombination between two closely related genomes. A complete set of late competence genes is present, indicating the ability of L. lactis to undergo DNA transformation. The genomic sequence revealed new possibilities for fermentation pathways and for aerobic respiration. It also indicated a horizontal transfer of genetic information from Lactococcus to gram-negative enteric bacteria of the Salmonella-Escherichia group.
Microsporidia are obligate intracellular parasites infesting many animal groups. Lacking mitochondria and peroxisomes, these unicellular eukaryotes were first considered a deeply branching protist lineage that diverged before the endosymbiotic event that led to mitochondria. The discovery of a gene for a mitochondrial-type chaperone combined with molecular phylogenetic data later implied that microsporidia are atypical fungi that lost mitochondria during evolution. Here we report the DNA sequences of the 11 chromosomes of the approximately 2.9-megabase (Mb) genome of Encephalitozoon cuniculi (1,997 potential protein-coding genes). Genome compaction is reflected by reduced intergenic spacers and by the shortness of most putative proteins relative to their eukaryote orthologues. The strong host dependence is illustrated by the lack of genes for some biosynthetic pathways and for the tricarboxylic acid cycle. Phylogenetic analysis lends substantial credit to the fungal affiliation of microsporidia. Because the E. cuniculi genome contains genes related to some mitochondrial functions (for example, Fe-S cluster assembly), we hypothesize that microsporidia have retained a mitochondrion-derived organelle.
Chanarin-Dorfman syndrome (CDS) is a rare autosomal recessive form of nonbullous congenital ichthyosiform erythroderma (NCIE) that is characterized by the presence of intracellular lipid droplets in most tissues. We previously localized a gene for a subset of NCIE to chromosome 3 (designated "the NCIE2 locus"), in six families. Lipid droplets were found in five of these six families, suggesting a diagnosis of CDS. Four additional families selected on the basis of a confirmed diagnosis of CDS also showed linkage to the NCIE2 locus. Linkage-disequilibrium analysis of these families, all from the Mediterranean basin, allowed us to refine the NCIE2 locus to an approximately 1.3-Mb region. Candidate genes from the interval were screened, and eight distinct mutations in the recently identified CGI-58 gene were found in 13 patients from these nine families. The spectrum of gene variants included insertion, deletion, splice-site, and point mutations. The CGI-58 protein belongs to a large family of proteins characterized by an alpha/beta hydrolase fold. CGI-58 contains three sequence motifs that correspond to a catalytic triad found in the esterase/lipase/thioesterase subfamily. Interestingly, CGI-58 differs from other members of the esterase/lipase/thioesterase subfamily in that its putative catalytic triad contains an asparagine in place of the usual serine residue.
Rickettsia conorii is an obligate intracellular bacterium that causes Mediterranean spotted fever in humans. We determined the 1,268,755-nucleotide complete genome sequence of R. conorii, containing 1374 open reading frames. This genome exhibits 804 of the 834 genes of the previously determined R. prowazekii genome plus 552 supplementary open reading frames and a 10-fold increase in the number of repetitive elements. Despite these differences, the two genomes exhibit a nearly perfect colinearity that allowed the clear identification of different stages of gene alterations with gene remnants and 37 genes split into 105 fragments, of which 59 are transcribed. A 38-kilobase sequence inversion was dated shortly after the divergence of the genus.
The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.
Arabidopsis thaliana is an important model system for plant biologists. In 1996 an international collaboration (the Arabidopsis Genome Initiative) was formed to sequence the whole genome of Arabidopsis and in 1999 the sequence of the first two chromosomes was reported. The sequence of the last three chromosomes and an analysis of the whole genome are reported in this issue. Here we present the sequence of chromosome 3, organized into four sequence segments (contigs). The two largest (13.5 and 9.2 Mb) correspond to the top (long) and the bottom (short) arms of chromosome 3, and the two small contigs are located in the genetically defined centromere. This chromosome encodes 5,220 of the roughly 25,500 predicted protein-coding genes in the genome. About 20% of the predicted proteins have significant homology to proteins in eukaryotic genomes for which the complete sequence is available, pointing to important conserved cellular functions among eukaryotes.