To explore the origins and consequences of tetraploidy in the African clawed frog, we sequenced the Xenopus laevis genome and compared it to the related diploid X. tropicalis genome. We characterize the allotetraploid origin of X. laevis by partitioning its genome into two homoeologous subgenomes, marked by distinct families of 'fossil' transposable elements. On the basis of the activity of these elements and the age of hundreds of unitary pseudogenes, we estimate that the two diploid progenitor species diverged around 34 million years ago (Ma) and combined to form an allotetraploid around 17-18 Ma. More than 56% of all genes were retained in two homoeologous copies. Protein function, gene expression, and the amount of conserved flanking sequence all correlate with retention rates. The subgenomes have evolved asymmetrically, with one chromosome set more often preserving the ancestral state and the other experiencing more gene loss, deletion, rearrangement, and reduced gene expression.
Choanoflagellates are the closest known relatives of metazoans. To discover potential molecular mechanisms underlying the evolution of metazoan multicellularity, we sequenced and analysed the genome of the unicellular choanoflagellate Monosiga brevicollis. The genome contains approximately 9,200 intron-rich genes, including a number that encode cell adhesion and signalling protein domains that are otherwise restricted to metazoans. Here we show that the physical linkages among protein domains often differ between M. brevicollis and metazoans, suggesting that abundant domain shuffling followed the separation of the choanoflagellate and metazoan lineages. The completion of the M. brevicollis genome allows us to reconstruct with increasing resolution the genomic changes that accompanied the origin of metazoans.
Trichoderma reesei is the main industrial source of cellulases and hemicellulases used to depolymerize biomass to simple sugars that are converted to chemical intermediates and biofuels, such as ethanol. We assembled 89 scaffolds (sets of ordered and oriented contigs) to generate 34 Mbp of nearly contiguous T. reesei genome sequence comprising 9,129 predicted gene models. Unexpectedly, considering the industrial utility and effectiveness of the carbohydrate-active enzymes of T. reesei, its genome encodes fewer cellulases and hemicellulases than any other sequenced fungus able to hydrolyze plant cell wall polysaccharides. Many T. reesei genes encoding carbohydrate-active enzymes are distributed nonrandomly in clusters that lie between regions of synteny with other Sordariomycetes. Numerous genes encoding biosynthetic pathways for secondary metabolites may promote survival of T. reesei in its competitive soil habitat, but genome analysis provided little mechanistic insight into its extraordinary capacity for protein secretion. Our analysis, coupled with the genome sequence data, provides a roadmap for constructing enhanced T. reesei strains for industrial applications such as biofuel production.
The smallest known eukaryotes, at approximately 1-mum diameter, are Ostreococcus tauri and related species of marine phytoplankton. The genome of Ostreococcus lucimarinus has been completed and compared with that of O. tauri. This comparison reveals surprising differences across orthologous chromosomes in the two species from highly syntenic chromosomes in most cases to chromosomes with almost no similarity. Species divergence in these phytoplankton is occurring through multiple mechanisms acting differently on different chromosomes and likely including acquisition of new genes through horizontal gene transfer. We speculate that this latter process may be involved in altering the cell-surface characteristics of each species. In addition, the genome of O. lucimarinus provides insights into the unique metal metabolism of these organisms, which are predicted to have a large number of selenocysteine-containing proteins. Selenoenzymes are more catalytically active than similar enzymes lacking selenium, and thus the cell may require less of that protein. As reported here, selenoenzymes, novel fusion proteins, and loss of some major protein families including ones associated with chromatin are likely important adaptations for achieving a small cell size.
Crenarchaeota are ubiquitous and abundant microbial constituents of soils, sediments, lakes, and ocean waters. To further describe the cosmopolitan nonthermophilic Crenarchaeota, we analyzed the genome sequence of one representative, the uncultivated sponge symbiont Cenarchaeum symbiosum. C. symbiosum genotypes coinhabiting the same host partitioned into two dominant populations, corresponding to previously described a- and b-type ribosomal RNA variants. Although they were syntenic, overlapping a- and b-type ribotype genomes harbored significant variability. A single tiling path comprising the dominant a-type genotype was assembled and used to explore the genomic properties of C. symbiosum and its planktonic relatives. Of 2,066 ORFs, 55.6% matched genes with predicted function from previously sequenced genomes. The remaining genes partitioned between functional RNAs (2.4%) and hypotheticals (42%) with limited homology to known functional genes. The latter category included some genes likely involved in the archaeal-sponge symbiotic association. Conversely, 525 C. symbiosum ORFs were most highly similar to sequences from marine environmental genomic surveys, and they apparently represent orthologous genes from free-living planktonic Crenarchaeota. In total, the C. symbiosum genome was remarkably distinct from those of other known Archaea and shared many core metabolic features in common with its free-living planktonic relatives.
We report the draft genome of the black cottonwood tree, Populus trichocarpa. Integration of shotgun sequence assembly with genetic mapping enabled chromosome-scale reconstruction of the genome. More than 45,000 putative protein-coding genes were identified. Analysis of the assembled genome revealed a whole-genome duplication event; about 8000 pairs of duplicated genes from that event survived in the Populus genome. A second, older duplication event is indistinguishably coincident with the divergence of the Populus and Arabidopsis lineages. Nucleotide substitution, tandem gene duplication, and gross chromosomal rearrangement appear to proceed substantially more slowly in Populus than in Arabidopsis. Populus has more protein-coding genes than Arabidopsis, ranging on average from 1.4 to 1.6 putative Populus homologs for each Arabidopsis gene. However, the relative frequency of protein domains in the two genomes is similar. Overrepresented exceptions in Populus include genes associated with lignocellulosic wall biosynthesis, meristem development, disease resistance, and metabolite transport.
Microbial methane consumption in anoxic sediments significantly impacts the global environment by reducing the flux of greenhouse gases from ocean to atmosphere. Despite its significance, the biological mechanisms controlling anaerobic methane oxidation are not well characterized. One current model suggests that relatives of methane-producing Archaea developed the capacity to reverse methanogenesis and thereby to consume methane to produce cellular carbon and energy. We report here a test of the "reverse-methanogenesis" hypothesis by genomic analyses of methane-oxidizing Archaea from deep-sea sediments. Our results show that nearly all genes typically associated with methane production are present in one specific group of archaeal methanotrophs. These genome-based observations support previous hypotheses and provide an informed foundation for metabolic modeling of anaerobic methane oxidation.
The compact genome of Fugu rubripes has been sequenced to over 95% coverage, and more than 80% of the assembly is in multigene-sized scaffolds. In this 365-megabase vertebrate genome, repetitive DNA accounts for less than one-sixth of the sequence, and gene loci occupy about one-third of the genome. As with the human genome, gene loci are not evenly distributed, but are clustered into sparse and dense regions. Some "giant" genes were observed that had average coding sequence sizes but were spread over genomic lengths significantly larger than those of their human orthologs. Although three-quarters of predicted human proteins have a strong match to Fugu, approximately a quarter of the human proteins had highly diverged from or had no pufferfish homologs, highlighting the extent of protein evolution in the 450 million years since teleosts and mammals diverged. Conserved linkages between Fugu and human genes indicate the preservation of chromosomal segments from the common vertebrate ancestor, but with considerable scrambling of gene order.
The first chordates appear in the fossil record at the time of the Cambrian explosion, nearly 550 million years ago. The modern ascidian tadpole represents a plausible approximation to these ancestral chordates. To illuminate the origins of chordate and vertebrates, we generated a draft of the protein-coding portion of the genome of the most studied ascidian, Ciona intestinalis. The Ciona genome contains approximately 16,000 protein-coding genes, similar to the number in other invertebrates, but only half that found in vertebrates. Vertebrate gene families are typically found in simplified form in Ciona, suggesting that ascidians contain the basic ancestral complement of genes involved in cell signaling and development. The ascidian genome has also acquired a number of lineage-specific innovations, including a group of genes engaged in cellulose metabolism that are related to those in bacteria and fungi.