The genus Yersinia has been used as a model system to study pathogen evolution. Using whole-genome sequencing of all Yersinia species, we delineate the gene complement of the whole genus and define patterns of virulence evolution. Multiple distinct ecological specializations appear to have split pathogenic strains from environmental, nonpathogenic lineages. This split demonstrates that, contrary to hypotheses that all pathogenic Yersinia species share a recent common pathogenic ancestor, they have evolved independently but followed parallel evolutionary paths, acquiring the same virulence determinants and becoming progressively more limited metabolically. Shared virulence determinants are limited to the virulence plasmid pYV and the attachment invasion locus ail. These acquisitions, together with genomic variations in metabolic pathways, have resulted in the parallel emergence of related pathogens displaying an increasingly specialized lifestyle with a spectrum of virulence potential, an emerging theme in the evolution of other important human pathogens.
        
Title: Draft Genome Sequence of a Clinical Strain of Yersinia enterocolitica (IP10393) of Bioserotype 4/O:3 from France. Savin C, Frangeul L, Ma L, Bouchier C, Moszer I, Carniel E. Ref: Genome Announc, 1:e00150, 2013.
We sequenced the genome of a clinical isolate of Yersinia enterocolitica (IP10393) from France. This strain belongs to bioserotype 4/O:3, which is the most common pathogenic subgroup worldwide. The draft genome has a size of 4,463,212 bp and a G+C content of 47.0%, and it is predicted to contain 4,181 coding sequences.
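The G+C content quoted above is the share of G and C bases in the assembled sequence. The following is a minimal Python sketch of that calculation. It assumes the assembly is available as a plain string. Excluding ambiguous bases such as N from the denominator is an assumption of this sketch, not necessarily the convention the authors used.

```python
def gc_content(seq: str) -> float:
    """Return the G+C content of a DNA sequence as a percentage.

    Assumption (hypothetical convention): only unambiguous A/C/G/T bases
    are counted; ambiguous bases such as N are left out of the denominator.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    called = gc + at
    if called == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / called


# Toy sequence for illustration only, not Yersinia data.
print(gc_content("ATGCGCNNAT"))  # 4 G/C out of 8 called bases -> 50.0
```

For a draft genome split into contigs, the counts would be summed over all contigs before dividing, rather than averaging per-contig percentages.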
Global spread and limited genetic variation are hallmarks of Mycobacterium tuberculosis, the agent of human tuberculosis. In contrast, Mycobacterium canettii and related tubercle bacilli that also cause human tuberculosis and exhibit an unusual smooth colony morphology are restricted to East Africa. Here, we sequenced and analyzed the whole genomes of five representative strains of smooth tubercle bacilli (STB) using Sanger (4-5x coverage), 454/Roche (13-18x coverage) and/or Illumina DNA sequencing (45-105x coverage). We show that STB isolates are highly recombinogenic and evolutionarily early branching, with larger genome sizes, higher rates of genetic variation, fewer molecular scars and distinct CRISPR-Cas systems relative to M. tuberculosis. Despite these differences, all tuberculosis-causing mycobacteria share a highly conserved core genome. Mouse infection experiments showed that STB strains are less persistent and less virulent than M. tuberculosis. We conclude that M. tuberculosis emerged from an ancestral STB-like pool of mycobacteria by gaining persistence and virulence mechanisms, and we provide insights into the molecular events involved.
Legionella pneumophila and L. longbeachae are two species of a large genus of bacteria that are ubiquitous in nature. L. pneumophila is mainly found in natural and artificial water circuits, while L. longbeachae is mainly present in soil. Under appropriate conditions, both species are human pathogens capable of causing a severe form of pneumonia termed Legionnaires' disease. Here we report the sequencing and analysis of four L. longbeachae genomes: one complete genome sequence of L. longbeachae strain NSW150 serogroup (Sg) 1, and three draft genome sequences, one from a second Sg1 strain and two from Sg2 strains. The genome organization and gene content of the four L. longbeachae genomes are highly conserved, indicating strong pressure for niche adaptation. Analysis and comparison of L. longbeachae strain NSW150 with L. pneumophila revealed common but also unexpected features specific to this pathogen. Its interaction with host cells differs from that of L. pneumophila: L. longbeachae possesses a unique repertoire of putative Dot/Icm type IV secretion system substrates, eukaryotic-like and eukaryotic domain proteins, and encodes additional secretion systems. However, analysis of the ability of a dotA mutant of L. longbeachae NSW150 to replicate in Acanthamoeba castellanii and in a mouse lung infection model showed that the Dot/Icm type IV secretion system is also essential for the virulence of L. longbeachae. In contrast to L. pneumophila, L. longbeachae does not encode flagella, which may explain the differences in mouse susceptibility to infection between the two pathogens. Furthermore, transcriptome analysis revealed that L. longbeachae has a less pronounced biphasic life cycle than L. pneumophila, and genome analysis and electron microscopy suggested that L. longbeachae is encapsulated. These species-specific differences may account for the different environmental niches and disease epidemiology of these two Legionella species.
Streptococcus gallolyticus (formerly known as Streptococcus bovis biotype I) is an increasingly frequent cause of endocarditis among streptococci and is frequently associated with colon cancer. S. gallolyticus is part of the rumen flora but also causes disease in ruminants and in birds. Here we report the complete nucleotide sequence of strain UCN34, responsible for endocarditis in a patient also suffering from colon cancer. Analysis of the 2,239 proteins encoded by its 2,350-kb genome revealed features unique among streptococci, probably related to its adaptation to the rumen environment and its capacity to cause endocarditis. S. gallolyticus can use a broad range of carbohydrates of plant origin and, in particular, can degrade polysaccharides derived from the plant cell wall. Its genome encodes a large repertoire of transporters and catalytic activities, such as tannase, phenolic compound decarboxylase, and bile salt hydrolase, that should contribute to the detoxification of the gut environment. Furthermore, S. gallolyticus synthesizes all 20 amino acids and more vitamins than any other sequenced Streptococcus species. Many of the genes encoding these specific functions were likely acquired by lateral gene transfer from other bacterial species present in the rumen. The surface properties of strain UCN34 may also contribute to its virulence. A polysaccharide capsule might be implicated in resistance to innate immune defenses, and glucan mucopolysaccharides, three types of pili, and collagen-binding proteins may play a role in adhesion to tissues in the course of endocarditis.
BACKGROUND: Helicobacter pylori infection is associated with several gastro-duodenal inflammatory diseases of varying severity. To determine whether certain combinations of genetic markers can be used to predict the clinical source of the infection, we analyzed well-documented and geographically homogeneous clinical isolates using a comparative genomics approach. RESULTS: A set of 254 H. pylori genes was used to perform array-based comparative genomic hybridization among 120 French H. pylori strains associated with chronic gastritis (n = 33), duodenal ulcers (n = 27), intestinal metaplasia (n = 17) or gastric extra-nodal marginal zone B-cell MALT lymphoma (n = 43). Hierarchical cluster analyses of the DNA hybridization values allowed us to identify a homogeneous subpopulation of strains that clustered exclusively with cagPAI-negative MALT lymphoma isolates. The genome sequence of B38, a representative of this MALT lymphoma strain cluster, was completed, fully annotated, and compared with the six previously released H. pylori genomes (i.e. J99, 26695, HPAG1, P12, G27 and Shi470). B38 has the smallest H. pylori genome described thus far (1,576,758 base pairs containing 1,528 CDSs); it contains the vacAs2m2 allele and lacks the genes encoding the major virulence factors (cagPAI, babB, babC, sabB, and homB). Comparative genomics identified very few sequences unique to the B38 strain (9 intact CDSs and 7 pseudogenes). Pairwise genomic synteny comparisons between B38 and the six sequenced H. pylori genomes revealed an almost complete colinearity between the genomes of strain Shi470 (a Peruvian isolate) and B38, a degree of colinearity never seen before. CONCLUSION: These isolates lack the main previously characterized H. pylori virulence factors but are nonetheless associated with gastric neoplasia.
The Escherichia coli species represents one of the best-studied model organisms, but also encompasses a variety of commensal and pathogenic strains that diversify by high rates of genetic change. We uniformly (re)annotated the genomes of 20 commensal and pathogenic E. coli strains and one strain of E. fergusonii (the species most closely related to E. coli), including seven that we sequenced to completion. Among the approximately 18,000 families of orthologous genes, we found approximately 2,000 common to all strains. Although recombination rates are much higher than mutation rates, we show, both theoretically and using phylogenetic inference, that this does not obscure the phylogenetic signal, which places the B2 phylogenetic group and one group D strain at the basal position. Based on this phylogeny, we inferred past evolutionary events of gene gain and loss, identifying functional classes under opposite selection pressures. We found an important adaptive role for metabolic diversification within group B2 and Shigella strains, but identified few or no extraintestinal virulence-specific genes, which could make it difficult to develop a vaccine against extraintestinal infections. Genome flux in E. coli is confined to a small number of conserved positions in the chromosome, which most often are not associated with integrases or tRNA genes. Core genes flanking some of these regions show higher rates of recombination, suggesting that a gene, once acquired by a strain, spreads within the species by homologous recombination at the flanking genes. Finally, the genome's long-scale structure of recombination indicates lower recombination rates, but not higher mutation rates, at the terminus of replication. The ensuing effects of background selection and biased gene conversion may thus explain why this region is A+T-rich and shows high sequence divergence but low sequence polymorphism. Overall, despite very high gene flow, genes coexist in an organised genome.
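The contrast above, between roughly 18,000 orthologous families in total and roughly 2,000 shared by all strains, amounts to a union and an intersection over per-strain presence sets. The following Python sketch assumes that orthologous families have already been assigned to each genome. That step, which is the hard part, is not shown. The function name and the strain and family labels are hypothetical.

```python
from functools import reduce


def core_and_pan(families_by_strain: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (core, pan) sets of orthologous gene families.

    core: families present in every strain (intersection of per-strain sets)
    pan:  families present in at least one strain (union of per-strain sets)
    """
    sets = list(families_by_strain.values())
    if not sets:
        return set(), set()
    # Wrap in set() so a single-strain input never returns the caller's own set.
    core = set(reduce(set.intersection, sets))
    pan = set(reduce(set.union, sets))
    return core, pan


# Hypothetical toy presence data, not the 21 genomes from the study.
strains = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam2", "fam3", "fam5"},
}
core, pan = core_and_pan(strains)
accessory = pan - core  # families missing from at least one strain
print(sorted(core), len(pan), sorted(accessory))
```

With real data, the core set can only shrink as strains are added, because a single missing or misannotated gene in any one genome removes that family from the intersection.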
Leptospira biflexa is a free-living saprophytic spirochete present in aquatic environments. We determined the genome sequence of L. biflexa, making it the first saprophytic Leptospira to be sequenced. The L. biflexa genome has 3,590 protein-coding genes distributed across three circular replicons: the major 3,604-kb chromosome, a smaller 278-kb replicon that also carries essential genes, and a third 74-kb replicon. Comparative sequence analysis provides evidence that L. biflexa is an excellent model for the study of Leptospira evolution; we conclude that 2,052 genes (61%) represent a progenitor genome that existed before the divergence of pathogenic and saprophytic Leptospira species. Comparisons of the L. biflexa genome with two pathogenic Leptospira species reveal several major findings. Nearly one-third of the L. biflexa genes are absent in pathogenic Leptospira. We suggest that, once incorporated into the L. biflexa genome, laterally transferred DNA undergoes minimal rearrangement owing to physical restrictions imposed by high gene density and the limited presence of transposable elements. In contrast, the genomes of pathogenic Leptospira species undergo frequent rearrangements, often involving recombination between insertion sequences. The identification of genes common to the two pathogenic species, L. borgpetersenii and L. interrogans, but absent in L. biflexa, is consistent with a role for these genes in pathogenesis. Differences in the environmental sensing capacities of L. biflexa, L. borgpetersenii, and L. interrogans suggest a model in which loss of signal transduction functions in L. borgpetersenii has impaired its survival outside a mammalian host, whereas L. interrogans has retained environmental sensory functions that facilitate disease transmission through water.
Mycobacterium ulcerans is found in aquatic ecosystems and causes Buruli ulcer in humans, a neglected but devastating necrotic disease of subcutaneous tissue that is rampant throughout West and Central Africa. Here, we report the complete 5.8-Mb genome sequence of M. ulcerans and show that it comprises two circular replicons, a chromosome of 5,632 kb and a virulence plasmid of 174 kb. The plasmid is required for production of the polyketide toxin mycolactone, which provokes necrosis. Comparisons with the recently completed 6.6-Mb genome of Mycobacterium marinum revealed >98% nucleotide sequence identity and genome-wide synteny. However, in addition to the plasmid, M. ulcerans has accumulated 213 copies of the insertion sequence IS2404, 91 copies of IS2606, 771 pseudogenes, two bacteriophages, and multiple DNA deletions and rearrangements. These data indicate that M. ulcerans has recently evolved, via lateral gene transfer and reductive evolution, from the generalist, more rapidly growing environmental species M. marinum into a niche-adapted specialist. Predictions based on genome inspection for the production of modified mycobacterial virulence factors, such as the highly abundant phthiodiolone lipids, were confirmed by structural analyses. Similarly, 11 protein-coding sequences identified as M. ulcerans-specific by comparative genomics were verified as such by PCR screening of a diverse collection of 33 strains of M. ulcerans and M. marinum. This work offers significant insight into the biology and evolution of mycobacterial pathogens and is an important component of international efforts to counter Buruli ulcer.
The aspergilli comprise a diverse group of filamentous fungi spanning over 200 million years of evolution. Here we report the genome sequence of the model organism Aspergillus nidulans, and a comparative study with Aspergillus fumigatus, a serious human pathogen, and Aspergillus oryzae, used in the production of sake, miso and soy sauce. Our analysis of genome structure provided a quantitative evaluation of forces driving long-term eukaryotic genome evolution. It also led to an experimentally validated model of mating-type locus evolution, suggesting the potential for sexual reproduction in A. fumigatus and A. oryzae. Our analysis of sequence conservation revealed over 5,000 non-coding regions actively conserved across all three species. Within these regions, we identified potential functional elements including a previously uncharacterized TPP riboswitch and motifs suggesting regulation in filamentous fungi by Puf family genes. We further obtained comparative and experimental evidence indicating widespread translational regulation by upstream open reading frames. These results enhance our understanding of these widely studied fungi as well as provide new insight into eukaryotic genome evolution and gene regulation.
Theileria annulata and T. parva are closely related protozoan parasites that cause lymphoproliferative diseases of cattle. We sequenced the genome of T. annulata and compared it with that of T. parva to understand the mechanisms underlying transformation and tropism. Despite high conservation of gene sequences and synteny, the analysis reveals unequally expanded gene families and species-specific genes. We also identify divergent families of putative secreted polypeptides that may reduce immune recognition, candidate regulators of host-cell transformation, and a Theileria-specific protein domain [frequently associated in Theileria (FAINT)] present in a large number of secreted proteins.
Legionella pneumophila, the causative agent of Legionnaires' disease, replicates as an intracellular parasite of amoebae and persists in the environment as a free-living microbe. Here we have analyzed the complete genome sequences of L. pneumophila Paris (3,503,610 bp, 3,077 genes), an endemic strain that is predominant in France, and Lens (3,345,687 bp, 2,932 genes), an epidemic strain responsible for a major outbreak of disease in France. The L. pneumophila genomes show marked plasticity, with three different plasmids and with about 13% of the sequence differing between the two strains. Only strain Paris contains a type V secretion system, and its Lvh type IV secretion system is encoded by a 36-kb region that is either carried on a multicopy plasmid or integrated into the chromosome. Genetic mobility may enhance the versatility of L. pneumophila. Numerous genes encode eukaryotic-like proteins or motifs that are predicted to modulate host cell functions to the pathogen's advantage. The genome thus reflects the history and lifestyle of L. pneumophila, a human pathogen of macrophages that coevolved with fresh-water amoebae.
Identifying the mechanisms of eukaryotic genome evolution by comparative genomics is often complicated by the multiplicity of events that have taken place throughout the history of individual lineages, leaving only distorted and superimposed traces in the genome of each living organism. The hemiascomycete yeasts, with their compact genomes, similar lifestyle and distinct sexual and physiological properties, provide a unique opportunity to explore such mechanisms. We present here the complete, assembled genome sequences of four yeast species, selected to represent a broad evolutionary range within a single eukaryotic phylum, which after analysis proved to be molecularly as diverse as the entire phylum of chordates. A total of approximately 24,200 novel genes were identified, the translation products of which were classified together with Saccharomyces cerevisiae proteins into about 4,700 families, forming the basis for interspecific comparisons. Analysis of chromosome maps and genome redundancies reveals that the different yeast lineages have evolved through a marked interplay between several distinct molecular mechanisms, including tandem gene repeat formation, segmental duplication, a massive genome duplication and extensive gene loss.