Soil-transmitted nematodes, including the Strongyloides genus, cause one of the most prevalent neglected tropical diseases. Here we compare the genomes of four Strongyloides species, including the human pathogen Strongyloides stercoralis, and their close relatives that are facultatively parasitic (Parastrongyloides trichosuri) and free-living (Rhabditophanes sp. KR3021). A significant paralogous expansion of key gene families-families encoding astacin-like and SCP/TAPS proteins-is associated with the evolution of parasitism in this clade. Exploiting the unique Strongyloides life cycle, we compare the transcriptomes of the parasitic and free-living stages and find that these same gene families are upregulated in the parasitic stages, underscoring their role in nematode parasitism.
Bacteriovorax marinus SJ is a predatory delta-proteobacterium isolated from a marine environment. The genome sequence of this strain provides an interesting contrast to that of the terrestrial predatory bacterium Bdellovibrio bacteriovorus HD100. Based on their predatory lifestyle, Bacteriovorax were originally designated as members of the genus Bdellovibrio but subsequently were re-assigned to a new genus and family based on genetic and phenotypic differences. B. marinus attaches to gram-negative bacteria, penetrates through the cell wall to form a bdelloplast, in which it replicates, as shown using microscopy. Bacteriovorax is distinct, as it shares only 30% of its gene products with its closest sequenced relatives. Remarkably, 34% of predicted genes over 500 nt in length were completely unique with no significant matches in the databases. As expected, Bacteriovorax shares several characteristic loci with the other delta-proteobacteria. A geneset shared between Bacteriovorax and Bdellovibrio that is not conserved among other delta-proteobacteria such as Myxobacteria (which destroy prey bacteria externally via lysis), or the non-predatory Desulfo-bacteria and Geobacter species was identified. These 291 gene orthologues common to both Bacteriovorax and Bdellovibrio may be the key indicators of host-interaction predatory-specific processes required for prey entry. The locus from Bdellovibrio bacteriovorus is implicated in the switch from predatory to prey/host-independent growth. Although the locus is conserved in B. marinus, the sequence has only limited similarity. The results of this study advance understanding of both the similarities and differences between Bdellovibrio and Bacteriovorax and confirm the distant relationship between the two and their separation into different families.
Global spread and limited genetic variation are hallmarks of M. tuberculosis, the agent of human tuberculosis. In contrast, Mycobacterium canettii and related tubercle bacilli that also cause human tuberculosis and exhibit unusual smooth colony morphology are restricted to East Africa. Here, we sequenced and analyzed the whole genomes of five representative strains of smooth tubercle bacilli (STB) using Sanger (4-5x coverage), 454/Roche (13-18x coverage) and/or Illumina DNA sequencing (45-105x coverage). We show that STB isolates are highly recombinogenic and evolutionarily early branching, with larger genome sizes, higher rates of genetic variation, fewer molecular scars and distinct CRISPR-Cas systems relative to M. tuberculosis. Despite the differences, all tuberculosis-causing mycobacteria share a highly conserved core genome. Mouse infection experiments showed that STB strains are less persistent and virulent than M. tuberculosis. We conclude that M. tuberculosis emerged from an ancestral STB-like pool of mycobacteria by gain of persistence and virulence mechanisms, and we provide insights into the molecular events involved.
BACKGROUND: M. africanum West African 2 constitutes an ancient lineage of the M. tuberculosis complex that commonly causes human tuberculosis in West Africa and has an attenuated phenotype relative to M. tuberculosis. METHODOLOGY/PRINCIPAL FINDINGS: In search of candidate genes underlying these differences, the genome of M. africanum West African 2 was sequenced using classical capillary sequencing techniques. Our findings reveal a unique sequence, RD900, that was independently lost during the evolution of two important lineages within the complex: the "modern" M. tuberculosis group and the lineage leading to M. bovis. Closely related to M. bovis and other animal strains within the M. tuberculosis complex, M. africanum West African 2 shares an abundance of pseudogenes with M. bovis but also with M. africanum West African clade 1. Comparison with other strains of the M. tuberculosis complex revealed pseudogenes events in all the known lineages pointing toward ongoing genome erosion likely due to increased genetic drift and relaxed selection linked to serial transmission-bottlenecks and an intracellular lifestyle. CONCLUSIONS/SIGNIFICANCE: The genomic differences identified between M. africanum West African 2 and the other strains of the Mycobacterium tuberculosis complex may explain its attenuated phenotype, and pave the way for targeted experiments to elucidate the phenotypic characteristic of M. africanum. Moreover, availability of the whole genome data allows for verification of conservation of targets used for the next generation of diagnostics and vaccines, in order to ensure similar efficacy in West Africa.
Schistosomiasis is one of the most prevalent parasitic diseases, affecting millions of people in developing countries. Amongst the human-infective species, Schistosoma mansoni is also the most commonly used in the laboratory and here we present the systematic improvement of its draft genome. We used Sanger capillary and deep-coverage Illumina sequencing from clonal worms to upgrade the highly fragmented draft 380 Mb genome to one with only 885 scaffolds and more than 81% of the bases organised into chromosomes. We have also used transcriptome sequencing (RNA-seq) from four time points in the parasite's life cycle to refine gene predictions and profile their expression. More than 45% of predicted genes have been extensively modified and the total number has been reduced from 11,807 to 10,852. Using the new version of the genome, we identified trans-splicing events occurring in at least 11% of genes and identified clear cases where it is used to resolve polycistronic transcripts. We have produced a high-resolution map of temporal changes in expression for 9,535 genes, covering an unprecedented dynamic range for this organism. All of these data have been consolidated into a searchable format within the GeneDB (www.genedb.org) and SchistoDB (www.schistodb.net) databases. With further transcriptional profiling and genome sequencing increasingly accessible, the upgraded genome will form a fundamental dataset to underpin further advances in schistosome research.
Toxoplasma gondii is a zoonotic protozoan parasite which infects nearly one third of the human population and is found in an extraordinary range of vertebrate hosts. Its epidemiology depends heavily on horizontal transmission, especially between rodents and its definitive host, the cat. Neospora caninum is a recently discovered close relative of Toxoplasma, whose definitive host is the dog. Both species are tissue-dwelling Coccidia and members of the phylum Apicomplexa; they share many common features, but Neospora neither infects humans nor shares the same wide host range as Toxoplasma, rather it shows a striking preference for highly efficient vertical transmission in cattle. These species therefore provide a remarkable opportunity to investigate mechanisms of host restriction, transmission strategies, virulence and zoonotic potential. We sequenced the genome of N. caninum and transcriptomes of the invasive stage of both species, undertaking an extensive comparative genomics and transcriptomics analysis. We estimate that these organisms diverged from their common ancestor around 28 million years ago and find that both genomes and gene expression are remarkably conserved. However, in N. caninum we identified an unexpected expansion of surface antigen gene families and the divergence of secreted virulence factors, including rhoptry kinases. Specifically we show that the rhoptry kinase ROP18 is pseudogenised in N. caninum and that, as a possible consequence, Neospora is unable to phosphorylate host immunity-related GTPases, as Toxoplasma does. This defense strategy is thought to be key to virulence in Toxoplasma. We conclude that the ecological niches occupied by these species are influenced by a relatively small number of gene products which operate at the host-parasite interface and that the dominance of vertical transmission in N. caninum may be associated with the evolution of reduced virulence in this species.
Gorillas are humans' closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing the human-chimpanzee and human-chimpanzee-gorilla speciation events at approximately 6 and 10 million years ago. In 30% of the genome, gorilla is closer to human or chimpanzee than the latter are to each other; this is rarer around coding genes, indicating pervasive selection throughout great ape evolution, and has functional consequences in gene expression. A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing. We also compare the western and eastern gorilla species, estimating an average sequence divergence time 1.75 million years ago, but with evidence for more recent genetic exchange and a population bottleneck in the eastern species. The use of the genome sequence in these and future analyses will promote a deeper understanding of great ape biology and evolution.
An emergent clone of Haemophilus influenzae biogroup aegyptius (Hae) is responsible for outbreaks of Brazilian purpuric fever (BPF). First recorded in Brazil in 1984, the so-called BPF clone of Hae caused a fulminant disease that started with conjunctivitis but developed into septicemic shock; mortality rates were as high as 70%. To identify virulence determinants, we conducted a pan-genomic analysis. Sequencing of the genomes of the BPF clone strain F3031 and a noninvasive conjunctivitis strain, F3047, and comparison of these sequences with 5 other complete H. influenzae genomes showed that >77% of the F3031 genome is shared among all H. influenzae strains. Delineation of the Hae accessory genome enabled characterization of 163 predicted protein-coding genes; identified differences in established autotransporter adhesins; and revealed a suite of novel adhesins unique to Hae, including novel trimeric autotransporter adhesins and 4 new fimbrial operons. These novel adhesins might play a critical role in host-pathogen interactions.
Here we discuss the evolution of the northern Australian Staphylococcus aureus isolate MSHR1132 genome. MSHR1132 belongs to the divergent clonal complex 75 lineage. The average nucleotide divergence between orthologous genes in MSHR1132 and typical S. aureus is approximately sevenfold greater than the maximum divergence observed in this species to date. MSHR1132 has a small accessory genome, which includes the well-characterized genomic islands, nuSAalpha and nuSabeta, suggesting that these elements were acquired well before the expansion of the typical S. aureus population. Other mobile elements show mosaic structure (the prophage varphiSa3) or evidence of recent acquisition from a typical S. aureus lineage (SCCmec, ICE6013 and plasmid pMSHR1132). There are two differences in gene repertoire compared with typical S. aureus that may be significant clues as to the genetic basis underlying the successful emergence of S. aureus as a pathogen. First, MSHR1132 lacks the genes for production of staphyloxanthin, the carotenoid pigment that confers upon S. aureus its characteristic golden color and protects against oxidative stress. The lack of pigment was demonstrated in 126 of 126 CC75 isolates. Second, a mobile clustered regularly interspaced short palindromic repeat (CRISPR) element is inserted into orfX of MSHR1132. Although common in other staphylococcal species, these elements are very rare within S. aureus and may impact accessory genome acquisition. The CRISPR spacer sequences reveal a history of attempted invasion by known S. aureus mobile elements. There is a case for the creation of a new taxon to accommodate this and related isolates.
BACKGROUND: The genus Neisseria contains two important yet very different pathogens, N. meningitidis and N. gonorrhoeae, in addition to non-pathogenic species, of which N. lactamica is the best characterized. Genomic comparisons of these three bacteria will provide insights into the mechanisms and evolution of pathogenesis in this group of organisms, which are applicable to understanding these processes more generally. RESULTS: Non-pathogenic N. lactamica exhibits very similar population structure and levels of diversity to the meningococcus, whilst gonococci are essentially recent descendents of a single clone. All three species share a common core gene set estimated to comprise around 1190 CDSs, corresponding to about 60% of the genome. However, some of the nucleotide sequence diversity within this core genome is particular to each group, indicating that cross-species recombination is rare in this shared core gene set. Other than the meningococcal cps region, which encodes the polysaccharide capsule, relatively few members of the large accessory gene pool are exclusive to one species group, and cross-species recombination within this accessory genome is frequent. CONCLUSION: The three Neisseria species groups represent coherent biological and genetic groupings which appear to be maintained by low rates of inter-species horizontal genetic exchange within the core genome. There is extensive evidence for exchange among positively selected genes and the accessory genome and some evidence of hitch-hiking of housekeeping genes with other loci. It is not possible to define a 'pathogenome' for this group of organisms and the disease causing phenotypes are therefore likely to be complex, polygenic, and different among the various disease-associated phenotypes observed.
The 3.1-Mb genome of an outbreak methicillin-resistant Staphylococcus aureus (MRSA) strain (TW20) contains evidence of recently acquired DNA, including two large regions (635 kb and 127 kb). The strain is resistant to a wide range of antibiotics, antiseptics, and heavy metals due to resistance genes encoded on mobile genetic elements and also mutations in housekeeping genes.
BACKGROUND: Trypanosoma brucei gambiense is the causative agent of chronic Human African Trypanosomiasis or sleeping sickness, a disease endemic across often poor and rural areas of Western and Central Africa. We have previously published the genome sequence of a T. b. brucei isolate, and have now employed a comparative genomics approach to understand the scale of genomic variation between T. b. gambiense and the reference genome. We sought to identify features that were uniquely associated with T. b. gambiense and its ability to infect humans. METHODS AND FINDINGS: An improved high-quality draft genome sequence for the group 1 T. b. gambiense DAL 972 isolate was produced using a whole-genome shotgun strategy. Comparison with T. b. brucei showed that sequence identity averages 99.2% in coding regions, and gene order is largely collinear. However, variation associated with segmental duplications and tandem gene arrays suggests some reduction of functional repertoire in T. b. gambiense DAL 972. A comparison of the variant surface glycoproteins (VSG) in T. b. brucei with all T. b. gambiense sequence reads showed that the essential structural repertoire of VSG domains is conserved across T. brucei. CONCLUSIONS: This study provides the first estimate of intraspecific genomic variation within T. brucei, and so has important consequences for future population genomics studies. We have shown that the T. b. gambiense genome corresponds closely with the reference, which should therefore be an effective scaffold for any T. brucei genome sequence data. As VSG repertoire is also well conserved, it may be feasible to describe the total diversity of variant antigens. While we describe several as yet uncharacterized gene families with predicted cell surface roles that were expanded in number in T. b. brucei, no T. b. gambiense-specific gene was identified outside of the subtelomeres that could explain the ability to infect humans.
We report the genome of the facultative intracellular parasite Rhodococcus equi, the only animal pathogen within the biotechnologically important actinobacterial genus Rhodococcus. The 5.0-Mb R. equi 103S genome is significantly smaller than those of environmental rhodococci. This is due to genome expansion in nonpathogenic species, via a linear gain of paralogous genes and an accelerated genetic flux, rather than reductive evolution in R. equi. The 103S genome lacks the extensive catabolic and secondary metabolic complement of environmental rhodococci, and it displays unique adaptations for host colonization and competition in the short-chain fatty acid-rich intestine and manure of herbivores--two main R. equi reservoirs. Except for a few horizontally acquired (HGT) pathogenicity loci, including a cytoadhesive pilus determinant (rpl) and the virulence plasmid vap pathogenicity island (PAI) required for intramacrophage survival, most of the potential virulence-associated genes identified in R. equi are conserved in environmental rhodococci or have homologs in nonpathogenic Actinobacteria. This suggests a mechanism of virulence evolution based on the cooption of existing core actinobacterial traits, triggered by key host niche-adaptive HGT events. We tested this hypothesis by investigating R. equi virulence plasmid-chromosome crosstalk, by global transcription profiling and expression network analysis. Two chromosomal genes conserved in environmental rhodococci, encoding putative chorismate mutase and anthranilate synthase enzymes involved in aromatic amino acid biosynthesis, were strongly coregulated with vap PAI virulence genes and required for optimal proliferation in macrophages. The regulatory integration of chromosomal metabolic genes under the control of the HGT-acquired plasmid PAI is thus an important element in the cooptive virulence of R. equi.
Citrobacter rodentium (formally Citrobacter freundii biotype 4280) is a highly infectious pathogen that causes colitis and transmissible colonic hyperplasia in mice. In common with enteropathogenic and enterohemorrhagic Escherichia coli (EPEC and EHEC, respectively), C. rodentium exploits a type III secretion system (T3SS) to induce attaching and effacing (A/E) lesions that are essential for virulence. Here, we report the fully annotated genome sequence of the 5.3-Mb chromosome and four plasmids harbored by C. rodentium strain ICC168. The genome sequence revealed key information about the phylogeny of C. rodentium and identified 1,585 C. rodentium-specific (without orthologues in EPEC or EHEC) coding sequences, 10 prophage-like regions, and 17 genomic islands, including the locus for enterocyte effacement (LEE) region, which encodes a T3SS and effector proteins. Among the 29 T3SS effectors found in C. rodentium are all 22 of the core effectors of EPEC strain E2348/69. In addition, we identified a novel C. rodentium effector, named EspS. C. rodentium harbors two type VI secretion systems (T6SS) (CTS1 and CTS2), while EHEC contains only one T6SS (EHS). Our analysis suggests that C. rodentium and EPEC/EHEC have converged on a common host infection strategy through access to a common pool of mobile DNA and that C. rodentium has lost gene functions associated with a previous pathogenic niche.
Erwinia amylovora causes the economically important disease fire blight that affects rosaceous plants, especially pear and apple. Here we report the complete genome sequence and annotation of strain ATCC 49946. The analysis of the sequence and its comparison with sequenced genomes of closely related enterobacteria revealed signs of pathoadaptation to rosaceous hosts.
Schistosoma mansoni is responsible for the neglected tropical disease schistosomiasis that affects 210 million people in 76 countries. Here we present analysis of the 363 megabase nuclear genome of the blood fluke. It encodes at least 11,809 genes, with an unusual intron size distribution, and new families of micro-exon genes that undergo frequent alternative splicing. As the first sequenced flatworm, and a representative of the Lophotrochozoa, it offers insights into early events in the evolution of the animals, including the development of a body pattern with bilateral symmetry, and the development of tissues into organs. Our analysis has been informed by the need to find new drug targets. The deficits in lipid metabolism that make schistosomes dependent on the host are revealed, and the identification of membrane receptors, ion channels and more than 300 proteases provide new insights into the biology of the life cycle and new targets. Bioinformatics approaches have identified metabolic chokepoints, and a chemogenomic screen has pinpointed schistosome proteins for which existing drugs may be active. The information generated provides an invaluable resource for the research community to develop much needed new control tools for the treatment and eradication of this important and neglected disease.
Candida species are the most common cause of opportunistic fungal infection worldwide. Here we report the genome sequences of six Candida species and compare these and related pathogens and non-pathogens. There are significant expansions of cell wall, secreted and transporter gene families in pathogenic species, suggesting adaptations associated with virulence. Large genomic tracts are homozygous in three diploid species, possibly resulting from recent recombination events. Surprisingly, key components of the mating and meiosis pathways are missing from several species. These include major differences at the mating-type loci (MTL); Lodderomyces elongisporus lacks MTL, and components of the a1/2 cell identity determinant were lost in other species, raising questions about how mating and cell types are controlled. Analysis of the CUG leucine-to-serine genetic-code change reveals that 99% of ancestral CUG codons were erased and new ones arose elsewhere. Lastly, we revise the Candida albicans gene catalogue, identifying many new genes.
BACKGROUND: The mollicute Mycoplasma conjunctivae is the etiological agent leading to infectious keratoconjunctivitis (IKC) in domestic sheep and wild caprinae. Although this pathogen is relatively benign for domestic animals treated by antibiotics, it can lead wild animals to blindness and death. This is a major cause of death in the protected species in the Alps (e.g., Capra ibex, Rupicapra rupicapra). METHODS: The genome was sequenced using a combined technique of GS-FLX (454) and Sanger sequencing, and annotated by an automatic pipeline that we designed using several tools interconnected via PERL scripts. The resulting annotations are stored in a MySQL database. RESULTS: The annotated sequence is deposited in the EMBL database (FM864216) and uploaded into the mollicutes database MolliGen http://cbi.labri.fr/outils/molligen/ allowing for comparative genomics. CONCLUSION: We show that our automatic pipeline allows for annotating a complete mycoplasma genome and present several examples of analysis in search for biological targets (e.g., pathogenic proteins).
Streptococcus pneumoniae is a human commensal and pathogen able to cause a variety of diseases that annually result in over a million deaths worldwide. The S. pneumoniae(Spain23F) sequence type 81 lineage was among the first recognized pandemic clones and was responsible for almost 40% of penicillin-resistant pneumococcal infections in the United States in the late 1990s. Analysis of the chromosome sequence of a representative strain, and comparison with other available genomes, indicates roles for integrative and conjugative elements in the evolution of pneumococci and, more particularly, the emergence of the multidrug-resistant Spain 23F ST81 lineage. A number of recently acquired loci within the chromosome appear to encode proteins involved in the production of, or immunity to, antimicrobial compounds, which may contribute to the proficiency of this strain at nasopharyngeal colonization. However, further sequencing of other pandemic clones will be required to establish whether there are any general attributes shared by these strains that are responsible for their international success.
The continued evolution of bacterial pathogens has major implications for both human and animal disease, but the exchange of genetic material between host-restricted pathogens is rarely considered. Streptococcus equi subspecies equi (S. equi) is a host-restricted pathogen of horses that has evolved from the zoonotic pathogen Streptococcus equi subspecies zooepidemicus (S. zooepidemicus). These pathogens share approximately 80% genome sequence identity with the important human pathogen Streptococcus pyogenes. We sequenced and compared the genomes of S. equi 4047 and S. zooepidemicus H70 and screened S. equi and S. zooepidemicus strains from around the world to uncover evidence of the genetic events that have shaped the evolution of the S. equi genome and led to its emergence as a host-restricted pathogen. Our analysis provides evidence of functional loss due to mutation and deletion, coupled with pathogenic specialization through the acquisition of bacteriophage encoding a phospholipase A(2) toxin, and four superantigens, and an integrative conjugative element carrying a novel iron acquisition system with similarity to the high pathogenicity island of Yersinia pestis. We also highlight that S. equi, S. zooepidemicus, and S. pyogenes share a common phage pool that enhances cross-species pathogen evolution. We conclude that the complex interplay of functional loss, pathogenic specialization, and genetic exchange between S. equi, S. zooepidemicus, and S. pyogenes continues to influence the evolution of these important streptococci.
Bacterial infections of the lungs of cystic fibrosis (CF) patients cause major complications in the treatment of this common genetic disease. Burkholderia cenocepacia infection is particularly problematic since this organism has high levels of antibiotic resistance, making it difficult to eradicate; the resulting chronic infections are associated with severe declines in lung function and increased mortality rates. B. cenocepacia strain J2315 was isolated from a CF patient and is a member of the epidemic ET12 lineage that originated in Canada or the United Kingdom and spread to Europe. The 8.06-Mb genome of this highly transmissible pathogen comprises three circular chromosomes and a plasmid and encodes a broad array of functions typical of this metabolically versatile genus, as well as numerous virulence and drug resistance functions. Although B. cenocepacia strains can be isolated from soil and can be pathogenic to both plants and man, J2315 is representative of a lineage of B. cenocepacia rarely isolated from the environment and which spreads between CF patients. Comparative analysis revealed that ca. 21% of the genome is unique in comparison to other strains of B. cenocepacia, highlighting the genomic plasticity of this species. Pseudogenes in virulence determinants suggest that the pathogenic response of J2315 may have been recently selected to promote persistence in the CF lung. The J2315 genome contains evidence that its unique and highly adapted genetic content has played a significant role in its success as an epidemic CF pathogen.
BACKGROUND: Streptococcus suis is a zoonotic pathogen that infects pigs and can occasionally cause serious infections in humans. S. suis infections occur sporadically in human Europe and North America, but a recent major outbreak has been described in China with high levels of mortality. The mechanisms of S. suis pathogenesis in humans and pigs are poorly understood. METHODOLOGY/PRINCIPAL FINDINGS: The sequencing of whole genomes of S. suis isolates provides opportunities to investigate the genetic basis of infection. Here we describe whole genome sequences of three S. suis strains from the same lineage: one from European pigs, and two from human cases from China and Vietnam. Comparative genomic analysis was used to investigate the variability of these strains. S. suis is phylogenetically distinct from other Streptococcus species for which genome sequences are currently available. Accordingly, approximately 40% of the approximately 2 Mb genome is unique in comparison to other Streptococcus species. Finer genomic comparisons within the species showed a high level of sequence conservation; virtually all of the genome is common to the S. suis strains. The only exceptions are three approximately 90 kb regions, present in the two isolates from humans, composed of integrative conjugative elements and transposons. Carried in these regions are coding sequences associated with drug resistance. In addition, small-scale sequence variation has generated pseudogenes in putative virulence and colonization factors. CONCLUSIONS/SIGNIFICANCE: The genomic inventories of genetically related S. suis strains, isolated from distinct hosts and diseases, exhibit high levels of conservation. However, the genomes provide evidence that horizontal gene transfer has contributed to the evolution of drug resistance.
BACKGROUND: Of the > 2000 serovars of Salmonella enterica subspecies I, most cause self-limiting gastrointestinal disease in a wide range of mammalian hosts. However, S. enterica serovars Typhi and Paratyphi A are restricted to the human host and cause the similar systemic diseases typhoid and paratyphoid fever. Genome sequence similarity between Paratyphi A and Typhi has been attributed to convergent evolution via relatively recent recombination of a quarter of their genomes. The accumulation of pseudogenes is a key feature of these and other host-adapted pathogens, and overlapping pseudogene complements are evident in Paratyphi A and Typhi. RESULTS: We report the 4.5 Mbp genome of a clinical isolate of Paratyphi A, strain AKU_12601, completely sequenced using capillary techniques and subsequently checked using Illumina/Solexa resequencing. Comparison with the published genome of Paratyphi A ATCC9150 revealed the two are collinear and highly similar, with 188 single nucleotide polymorphisms and 39 insertions/deletions. A comparative analysis of pseudogene complements of these and two finished Typhi genomes (CT18, Ty2) identified several pseudogenes that had been overlooked in prior genome annotations of one or both serovars, and identified 66 pseudogenes shared between serovars. By determining whether each shared and serovar-specific pseudogene had been recombined between Paratyphi A and Typhi, we found evidence that most pseudogenes have accumulated after the recombination between serovars. We also divided pseudogenes into relative-time groups: ancestral pseudogenes inherited from a common ancestor, pseudogenes recombined between serovars which likely arose between initial divergence and later recombination, serovar-specific pseudogenes arising after recombination but prior to the last evolutionary bottlenecks in each population, and more recent strain-specific pseudogenes. CONCLUSION: Recombination and pseudogene-formation have been important mechanisms of genetic convergence between Paratyphi A and Typhi, with most pseudogenes arising independently after extensive recombination between the serovars. The recombination events, along with divergence of and within each serovar, provide a relative time scale for pseudogene-forming mutations, affording rare insights into the progression of functional gene loss associated with host adaptation in Salmonella.
Enteropathogenic Escherichia coli (EPEC) was the first pathovar of E. coli to be implicated in human disease; however, no EPEC strain has been fully sequenced until now. Strain E2348/69 (serotype O127:H6 belonging to E. coli phylogroup B2) has been used worldwide as a prototype strain to study EPEC biology, genetics, and virulence. Studies of E2348/69 led to the discovery of the locus of enterocyte effacement-encoded type III secretion system (T3SS) and its cognate effectors, which play a vital role in attaching and effacing lesion formation on gut epithelial cells. In this study, we determined the complete genomic sequence of E2348/69 and performed genomic comparisons with other important E. coli strains. We identified 424 E2348/69-specific genes, most of which are carried on mobile genetic elements, and a number of genetic traits specifically conserved in phylogroup B2 strains irrespective of their pathotypes, including the absence of the ETT2-related T3SS, which is present in E. coli strains belonging to all other phylogroups. The genome analysis revealed the entire gene repertoire related to E2348/69 virulence. Interestingly, E2348/69 contains only 21 intact T3SS effector genes, all of which are carried on prophages and integrative elements, compared to over 50 effector genes in enterohemorrhagic E. coli O157. As E2348/69 is the most-studied pathogenic E. coli strain, this study provides a genomic context for the vast amount of existing experimental data. The unexpected simplicity of the E2348/69 T3SS provides the first opportunity to fully dissect the entire virulence strategy of attaching and effacing pathogens in the genomic context.
Candida dubliniensis is the closest known relative of Candida albicans, the most pathogenic yeast species in humans. However, despite both species sharing many phenotypic characteristics, including the ability to form true hyphae, C. dubliniensis is a significantly less virulent and less versatile pathogen. Therefore, to identify C. albicans-specific genes that may be responsible for an increased capacity to cause disease, we have sequenced the C. dubliniensis genome and compared it with the known C. albicans genome sequence. Although the two genome sequences are highly similar and synteny is conserved throughout, 168 species-specific genes are identified, including some encoding known hyphal-specific virulence factors, such as the aspartyl proteinases Sap4 and Sap5 and the proposed invasin Als3. Among the 115 pseudogenes confirmed in C. dubliniensis are orthologs of several filamentous growth regulator (FGR) genes that also have suspected roles in pathogenesis. However, the principal differences in genomic repertoire concern expansion of the TLO gene family of putative transcription factors and the IFA family of putative transmembrane proteins in C. albicans, which represent novel candidate virulence-associated factors. The results suggest that the recent evolutionary histories of C. albicans and C. dubliniensis are quite different. While gene families instrumental in pathogenesis have been elaborated in C. albicans, C. dubliniensis has lost genomic capacity and key pathogenic functions. This could explain why C. albicans is a more potent pathogen in humans than C. dubliniensis.
BACKGROUND: Chlamydia trachomatis is the most common cause of sexually transmitted infections globally and the leading cause of preventable blindness in the developing world. There are two biovariants of C. trachomatis: 'trachoma', causing ocular and genital tract infections, and the invasive 'lymphogranuloma venereum' strains. Recently, a new variant of the genital tract C. trachomatis emerged in Sweden. This variant escaped routine diagnostic tests because it carries a plasmid with a deletion. Failure to detect this strain has meant it has spread rapidly across the country provoking a worldwide alert. In addition to being a key diagnostic target, the plasmid has been linked to chlamydial virulence. Analysis of chlamydial plasmids and their cognate chromosomes was undertaken to provide insights into the evolutionary relationship between chromosome and plasmid. This is essential knowledge if the plasmid is to be continued to be relied on as a key diagnostic marker, and for an understanding of the evolution of Chlamydia trachomatis. RESULTS: The genomes of two new C. trachomatis strains were sequenced, together with plasmids from six C. trachomatis isolates, including the new variant strain from Sweden. The plasmid from the new Swedish variant has a 377 bp deletion in the first predicted coding sequence, abolishing the site used for PCR detection, resulting in negative diagnosis. In addition, the variant plasmid has a 44 bp duplication downstream of the deletion. The region containing the second predicted coding sequence is the most highly conserved region of the plasmids investigated. Phylogenetic analysis of the plasmids and chromosomes are fully congruent. Moreover this analysis also shows that ocular and genital strains diverged from a common C. trachomatis progenitor. CONCLUSION: The evolutionary pathways of the chlamydial genome and plasmid imply that inheritance of the plasmid is tightly linked with its cognate chromosome. These data suggest that the plasmid is not a highly mobile genetic element and does not transfer readily between isolates. Comparative analysis of the plasmid sequences has revealed the most conserved regions that should be used to design future plasmid based nucleic acid amplification tests, to avoid diagnostic failures.
BACKGROUND: Pseudomonas fluorescens are common soil bacteria that can improve plant health through nutrient cycling, pathogen antagonism and induction of plant defenses. The genome sequences of strains SBW25 and Pf0-1 were determined and compared to each other and with P. fluorescens Pf-5. A functional genomic in vivo expression technology (IVET) screen provided insight into genes used by P. fluorescens in its natural environment and an improved understanding of the ecological significance of diversity within this species. RESULTS: Comparisons of three P. fluorescens genomes (SBW25, Pf0-1, Pf-5) revealed considerable divergence: 61% of genes are shared, the majority located near the replication origin. Phylogenetic and average amino acid identity analyses showed a low overall relationship. A functional screen of SBW25 defined 125 plant-induced genes including a range of functions specific to the plant environment. Orthologues of 83 of these exist in Pf0-1 and Pf-5, with 73 shared by both strains. The P. fluorescens genomes carry numerous complex repetitive DNA sequences, some resembling Miniature Inverted-repeat Transposable Elements (MITEs). In SBW25, repeat density and distribution revealed 'repeat deserts' lacking repeats, covering approximately 40% of the genome. CONCLUSIONS: P. fluorescens genomes are highly diverse. Strain-specific regions around the replication terminus suggest genome compartmentalization. The genomic heterogeneity among the three strains is reminiscent of a species complex rather than a single species. That 42% of plant-inducible genes were not shared by all strains reinforces this conclusion and shows that ecological success requires specialized and core functions. The diversity also indicates the significant size of genetic information within the Pseudomonas pan genome.
BACKGROUND The continued rise of Clostridium difficile infections worldwide has been accompanied by the rapid emergence of a highly virulent clone designated PCR-ribotype 027. To understand more about the evolution of this virulent clone, we made a three-way genomic and phenotypic comparison of an 'historic' non-epidemic 027 C. difficile (CD196), a recent epidemic and hypervirulent 027 (R20291) and a previously sequenced PCR-ribotype 012 strain (630).
RESULTS:
Although the genomes are highly conserved, the 027 genomes have 234 additional genes compared to 630, which may contribute to the distinct phenotypic differences we observe between these strains relating to motility, antibiotic resistance and toxicity. The epidemic 027 strain has five unique genetic regions, absent from both the non-epidemic 027 and strain 630, which include a novel phage island, a two component regulatory system and transcriptional regulators.
CONCLUSIONS:
A comparison of a series of 027 isolates showed that some of these genes appeared to have been gained by 027 strains over the past two decades. This study provides genetic markers for the identification of 027 strains and offers a unique opportunity to explain the recent emergence of a hypervirulent bacterium.
BACKGROUND: Streptococcus uberis, a Gram positive bacterial pathogen responsible for a significant proportion of bovine mastitis in commercial dairy herds, colonises multiple body sites of the cow including the gut, genital tract and mammary gland. Comparative analysis of the complete genome sequence of S. uberis strain 0140J was undertaken to help elucidate the biology of this effective bovine pathogen. RESULTS: The genome revealed 1,825 predicted coding sequences (CDSs) of which 62 were identified as pseudogenes or gene fragments. Comparisons with related pyogenic streptococci identified a conserved core (40%) of orthologous CDSs. Intriguingly, S. uberis 0140J displayed a lower number of mobile genetic elements when compared with other pyogenic streptococci, however bacteriophage-derived islands and a putative genomic island were identified. Comparative genomics analysis revealed most similarity to the genomes of Streptococcus agalactiae and Streptococcus equi subsp. zooepidemicus. In contrast, streptococcal orthologs were not identified for 11% of the CDSs, indicating either unique retention of ancestral sequence, or acquisition of sequence from alternative sources. Functions including transport, catabolism, regulation and CDSs encoding cell envelope proteins were over-represented in this unique gene set; a limited array of putative virulence CDSs were identified. CONCLUSION: S. uberis utilises nutritional flexibility derived from a diversity of metabolic options to successfully occupy a discrete ecological niche. The features observed in S. uberis are strongly suggestive of an opportunistic pathogen adapted to challenging and changing environmental parameters.
Pseudomonas aeruginosa isolates have a highly conserved core genome representing up to 90% of the total genomic sequence with additional variable accessory genes, many of which are found in genomic islands or islets. The identification of the Liverpool Epidemic Strain (LES) in a children's cystic fibrosis (CF) unit in 1996 and its subsequent observation in several centers in the United Kingdom challenged the previous widespread assumption that CF patients acquire only unique strains of P. aeruginosa from the environment. To learn about the forces that shaped the development of this important epidemic strain, the genome of the earliest archived LES isolate, LESB58, was sequenced. The sequence revealed the presence of many large genomic islands, including five prophage clusters, one defective (pyocin) prophage cluster, and five non-phage islands. To determine the role of these clusters, an unbiased signature tagged mutagenesis study was performed, followed by selection in the chronic rat lung infection model. Forty-seven mutants were identified by sequencing, including mutants in several genes known to be involved in Pseudomonas infection. Furthermore, genes from four prophage clusters and one genomic island were identified and in direct competition studies with the parent isolate; four were demonstrated to strongly impact on competitiveness in the chronic rat lung infection model. This strongly indicates that enhanced in vivo competitiveness is a major driver for maintenance and diversifying selection of these genomic prophage genes.
Clavibacter michiganensis subsp. sepedonicus is a plant-pathogenic bacterium and the causative agent of bacterial ring rot, a devastating agricultural disease under strict quarantine control and zero tolerance in the seed potato industry. This organism appears to be largely restricted to an endophytic lifestyle, proliferating within plant tissues and unable to persist in the absence of plant material. Analysis of the genome sequence of C. michiganensis subsp. sepedonicus and comparison with the genome sequences of related plant pathogens revealed a dramatic recent evolutionary history. The genome contains 106 insertion sequence elements, which appear to have been active in extensive rearrangement of the chromosome compared to that of Clavibacter michiganensis subsp. michiganensis. There are 110 pseudogenes with overrepresentation in functions associated with carbohydrate metabolism, transcriptional regulation, and pathogenicity. Genome comparisons also indicated that there is substantial gene content diversity within the species, probably due to differential gene acquisition and loss. These genomic features and evolutionary dating suggest that there was recent adaptation for life in a restricted niche where nutrient diversity and perhaps competition are low, correlated with a reduced ability to exploit previously occupied complex niches outside the plant. Toleration of factors such as multiplication and integration of insertion sequence elements, genome rearrangements, and functional disruption of many genes and operons seems to indicate that there has been general relaxation of selective pressure on a large proportion of the genome.
BACKGROUND: Stenotrophomonas maltophilia is a nosocomial opportunistic pathogen of the Xanthomonadaceae. The organism has been isolated from both clinical and soil environments in addition to the sputum of cystic fibrosis patients and the immunocompromised. Whilst relatively distant phylogenetically, the closest sequenced relatives of S. maltophilia are the plant pathogenic xanthomonads. RESULTS: The genome of the bacteremia-associated isolate S. maltophilia K279a is 4,851,126 bp and of high G+C content. The sequence reveals an organism with a remarkable capacity for drug and heavy metal resistance. In addition to a number of genes conferring resistance to antimicrobial drugs of different classes via alternative mechanisms, nine resistance-nodulation-division (RND)-type putative antimicrobial efflux systems are present. Functional genomic analysis confirms a role in drug resistance for several of the novel RND efflux pumps. S. maltophilia possesses potentially mobile regions of DNA and encodes a number of pili and fimbriae likely to be involved in adhesion and biofilm formation that may also contribute to increased antimicrobial drug resistance. CONCLUSION: The panoply of antimicrobial drug resistance genes and mobile genetic elements found suggests that the organism can act as a reservoir of antimicrobial drug resistance determinants in a clinical environment, which is an issue of considerable concern.
BACKGROUND: The fish pathogen Aliivibrio salmonicida is the causative agent of cold-water vibriosis in marine aquaculture. The Gram-negative bacterium causes tissue degradation, hemolysis and sepsis in vivo. RESULTS: In total, 4 286 protein coding sequences were identified, and the 4.6 Mb genome of A. salmonicida has a six partite architecture with two chromosomes and four plasmids. Sequence analysis revealed a highly fragmented genome structure caused by the insertion of an extensive number of insertion sequence (IS) elements. The IS elements can be related to important evolutionary events such as gene acquisition, gene loss and chromosomal rearrangements. New A. salmonicida functional capabilities that may have been aquired through horizontal DNA transfer include genes involved in iron-acquisition, and protein secretion and play potential roles in pathogenicity. On the other hand, the degeneration of 370 genes and consequent loss of specific functions suggest that A. salmonicida has a reduced metabolic and physiological capacity in comparison to related Vibrionaceae species. CONCLUSION: Most prominent is the loss of several genes involved in the utilisation of the polysaccharide chitin. In particular, the disruption of three extracellular chitinases responsible for enzymatic breakdown of chitin makes A. salmonicida unable to grow on the polymer form of chitin. These, and other losses could restrict the variety of carrier organisms A. salmonicida can attach to, and associate with. Gene acquisition and gene loss may be related to the emergence of A. salmonicida as a fish pathogen.
The obligate intracellular bacterium Wolbachia pipientis strain wPip induces cytoplasmic incompatibility (CI), patterns of crossing sterility, in the Culex pipiens group of mosquitoes. The complete sequence is presented of the 1.48-Mbp genome of wPip which encodes 1386 coding sequences (CDSs), representing the first genome sequence of a B-supergroup Wolbachia. Comparisons were made with the smaller genomes of Wolbachia strains wMel of Drosophila melanogaster, an A-supergroup Wolbachia that is also a CI inducer, and wBm, a mutualist of Brugia malayi nematodes that belongs to the D-supergroup of Wolbachia. Despite extensive gene order rearrangement, a core set of Wolbachia genes shared between the 3 genomes can be identified and contrasts with a flexible gene pool where rapid evolution has taken place. There are much more extensive prophage and ankyrin repeat encoding (ANK) gene components of the wPip genome compared with wMel and wBm, and both are likely to be of considerable importance in wPip biology. Five WO-B-like prophage regions are present and contain some genes that are identical or highly similar in multiple prophage copies, whereas other genes are unique, and it is likely that extensive recombination, duplication, and insertion have occurred between copies. A much larger number of genes encode ankyrin repeat (ANK) proteins in wPip, with 60 present compared with 23 in wMel, many of which are within or close to the prophage regions. It is likely that this pattern is partly a result of expansions in the wPip lineage, due for example to gene duplication, but their presence is in some cases more ancient. The wPip genome underlines the considerable evolutionary flexibility of Wolbachia, providing clear evidence for the rapid evolution of ANK-encoding genes and of prophage regions. This host-Wolbachia system, with its complex patterns of sterility induced between populations, now provides an excellent model for unraveling the molecular systems underlying host reproductive manipulation.
Plasmodium knowlesi is an intracellular malaria parasite whose natural vertebrate host is Macaca fascicularis (the 'kra' monkey); however, it is now increasingly recognized as a significant cause of human malaria, particularly in southeast Asia. Plasmodium knowlesi was the first malaria parasite species in which antigenic variation was demonstrated, and it has a close phylogenetic relationship to Plasmodium vivax, the second most important species of human malaria parasite (reviewed in ref. 4). Despite their relatedness, there are important phenotypic differences between them, such as host blood cell preference, absence of a dormant liver stage or 'hypnozoite' in P. knowlesi, and length of the asexual cycle (reviewed in ref. 4). Here we present an analysis of the P. knowlesi (H strain, Pk1(A+) clone) nuclear genome sequence. This is the first monkey malaria parasite genome to be described, and it provides an opportunity for comparison with the recently completed P. vivax genome and other sequenced Plasmodium genomes. In contrast to other Plasmodium genomes, putative variant antigen families are dispersed throughout the genome and are associated with intrachromosomal telomere repeats. One of these families, the KIRs, contains sequences that collectively match over one-half of the host CD99 extracellular domain, which may represent an unusual form of molecular mimicry.
The gram-negative enteric bacterium Proteus mirabilis is a frequent cause of urinary tract infections in individuals with long-term indwelling catheters or with complicated urinary tracts (e.g., due to spinal cord injury or anatomic abnormality). P. mirabilis bacteriuria may lead to acute pyelonephritis, fever, and bacteremia. Most notoriously, this pathogen uses urease to catalyze the formation of kidney and bladder stones or to encrust or obstruct indwelling urinary catheters. Here we report the complete genome sequence of P. mirabilis HI4320, a representative strain cultured in our laboratory from the urine of a nursing home patient with a long-term (> or =30 days) indwelling urinary catheter. The genome is 4.063 Mb long and has a G+C content of 38.88%. There is a single plasmid consisting of 36,289 nucleotides. Annotation of the genome identified 3,685 coding sequences and seven rRNA loci. Analysis of the sequence confirmed the presence of previously identified virulence determinants, as well as a contiguous 54-kb flagellar regulon and 17 types of fimbriae. Genes encoding a potential type III secretion system were identified on a low-G+C-content genomic island containing 24 intact genes that appear to encode all components necessary to assemble a type III secretion system needle complex. In addition, the P. mirabilis HI4320 genome possesses four tandem copies of the zapE metalloprotease gene, genes encoding six putative autotransporters, an extension of the atf fimbrial operon to six genes, including an mrpJ homolog, and genes encoding at least five iron uptake mechanisms, two potential type IV secretion systems, and 16 two-component regulators.
Mycobacterium marinum, a ubiquitous pathogen of fish and amphibia, is a near relative of Mycobacterium tuberculosis, the etiologic agent of tuberculosis in humans. The genome of the M strain of M. marinum comprises a 6,636,827-bp circular chromosome with 5424 CDS, 10 prophages, and a 23-kb mercury-resistance plasmid. Prominent features are the very large number of genes (57) encoding polyketide synthases (PKSs) and nonribosomal peptide synthases (NRPSs) and the most extensive repertoire yet reported of the mycobacteria-restricted PE and PPE proteins, and related-ESX secretion systems. Some of the NRPS genes comprise a novel family and seem to have been acquired horizontally. M. marinum is used widely as a model organism to study M. tuberculosis pathogenesis, and genome comparisons confirmed the close genetic relationship between these two species, as they share 3000 orthologs with an average amino acid identity of 85%. Comparisons with the more distantly related Mycobacterium avium subspecies paratuberculosis and Mycobacterium smegmatis reveal how an ancestral generalist mycobacterium evolved into M. tuberculosis and M. marinum. M. tuberculosis has undergone genome downsizing and extensive lateral gene transfer to become a specialized pathogen of humans and other primates without retaining an environmental niche. M. marinum has maintained a large genome so as to retain the capacity for environmental survival while becoming a broad host range pathogen that produces disease strikingly similar to M. tuberculosis. The work described herein provides a foundation for using M. marinum to better understand the determinants of pathogenesis of tuberculosis.
We have determined the complete genome sequences of a host-promiscuous Salmonella enterica serovar Enteritidis PT4 isolate P125109 and a chicken-restricted Salmonella enterica serovar Gallinarum isolate 287/91. Genome comparisons between these and other Salmonella isolates indicate that S. Gallinarum 287/91 is a recently evolved descendent of S. Enteritidis. Significantly, the genome of S. Gallinarum has undergone extensive degradation through deletion and pseudogene formation. Comparison of the pseudogenes in S. Gallinarum with those identified previously in other host-adapted bacteria reveals the loss of many common functional traits and provides insights into possible mechanisms of host and tissue adaptation. We propose that experimental analysis in chickens and mice of S. Enteritidis-harboring mutations in functional homologs of the pseudogenes present in S. Gallinarum could provide an experimentally tractable route toward unraveling the genetic basis of host adaptation in S. enterica.
Chlamydia trachomatis is the most common cause of sexually transmitted infections in the UK, a statistic that is also reflected globally. There are three biovariants of C. trachomatis: trachoma (serotypes A-C) and two sexually transmitted pathovars; serotypes D-K and lymphogranuloma venereum (LGV). Trachoma isolates and the sexually transmitted serotypes D-K are noninvasive, whereas the LGV strains are invasive, causing a disseminating infection of the local draining lymph nodes. Genome sequences are available for single isolates from the trachoma (serotype A) and sexually transmitted (serotype D) biotypes. We sequenced two isolates from the remaining biotype, LGV, a long-term laboratory passaged strain and the recent "epidemic" LGV isolate-causing proctitis. Although the genome of the LGV strain shows no additional genes that could account for the differences in disease outcome, we found evidence of functional gene loss and identified regions of heightened sequence variation that have previously been shown to be important sites for interstrain recombination. We have used new sequencing technologies to show that the recent clinical LGV isolate causing proctitis is unlikely to be a newly emerged strain but is most probably an old strain with relatively new clinical manifestations.
The bacterium Neisseria meningitidis is commonly found harmlessly colonising the mucosal surfaces of the human nasopharynx. Occasionally strains can invade host tissues causing septicaemia and meningitis, making the bacterium a major cause of morbidity and mortality in both the developed and developing world. The species is known to be diverse in many ways, as a product of its natural transformability and of a range of recombination and mutation-based systems. Previous work on pathogenic Neisseria has identified several mechanisms for the generation of diversity of surface structures, including phase variation based on slippage-like mechanisms and sequence conversion of expressed genes using information from silent loci. Comparison of the genome sequences of two N. meningitidis strains, serogroup B MC58 and serogroup A Z2491, suggested further mechanisms of variation, including C-terminal exchange in specific genes and enhanced localised recombination and variation related to repeat arrays. We have sequenced the genome of N. meningitidis strain FAM18, a representative of the ST-11/ET-37 complex, providing the first genome sequence for the disease-causing serogroup C meningococci; it has 1,976 predicted genes, of which 60 do not have orthologues in the previously sequenced serogroup A or B strains. Through genome comparison with Z2491 and MC58 we have further characterised specific mechanisms of genetic variation in N. meningitidis, describing specialised loci for generation of cell surface protein variants and measuring the association between noncoding repeat arrays and sequence variation in flanking genes. Here we provide a detailed view of novel genetic diversification mechanisms in N. meningitidis. Our analysis provides evidence for the hypothesis that the noncoding repeat arrays in neisserial genomes (neisserial intergenic mosaic elements) provide a crucial mechanism for the generation of surface antigen variants. Such variation will have an impact on the interaction with the host tissues, and understanding these mechanisms is important to aid our understanding of the intimate and complex relationship between the human nasopharynx and the meningococcus.
To understand the evolution, attenuation, and variable protective efficacy of bacillus Calmette-Guerin (BCG) vaccines, Mycobacterium bovis BCG Pasteur 1173P2 has been subjected to comparative genome and transcriptome analysis. The 4,374,522-bp genome contains 3,954 protein-coding genes, 58 of which are present in two copies as a result of two independent tandem duplications, DU1 and DU2. DU1 is restricted to BCG Pasteur, although four forms of DU2 exist; DU2-I is confined to early BCG vaccines, like BCG Japan, whereas DU2-III and DU2-IV occur in the late vaccines. The glycerol-3-phosphate dehydrogenase gene, glpD2, is one of only three genes common to all four DU2 variants, implying that BCG requires higher levels of this enzyme to grow on glycerol. Further amplification of the DU2 region is ongoing, even within vaccine preparations used to immunize humans. An evolutionary scheme for BCG vaccines was established by analyzing DU2 and other markers. Lesions in genes encoding sigma-factors and pleiotropic transcriptional regulators, like PhoR and Crp, were also uncovered in various BCG strains; together with gene amplification, these affect gene expression levels, immunogenicity, and, possibly, protection against tuberculosis. Furthermore, the combined findings suggest that early BCG vaccines may even be superior to the later ones that are more widely used.
Comparisons of the 1.84-Mb genome of serotype M5 Streptococcus pyogenes strain Manfredo with previously sequenced genomes emphasized the role of prophages in diversification of S. pyogenes and the close relationship between strain Manfredo and MGAS8232, another acute rheumatic fever-associated strain.
Leishmania parasites cause a broad spectrum of clinical disease. Here we report the sequencing of the genomes of two species of Leishmania: Leishmania infantum and Leishmania braziliensis. The comparison of these sequences with the published genome of Leishmania major reveals marked conservation of synteny and identifies only approximately 200 genes with a differential distribution between the three species. L. braziliensis, contrary to Leishmania species examined so far, possesses components of a putative RNA-mediated interference pathway, telomere-associated transposable elements and spliced leader-associated SLACS retrotransposons. We show that pseudogene formation and gene loss are the principal forces shaping the different genomes. Genes that are differentially distributed between the species encode proteins implicated in host-pathogen interactions and parasite survival in the macrophage.
Clostridium botulinum is a heterogeneous Gram-positive species that comprises four genetically and physiologically distinct groups of bacteria that share the ability to produce botulinum neurotoxin, the most poisonous toxin known to man, and the causative agent of botulism, a severe disease of humans and animals. We report here the complete genome sequence of a representative of Group I (proteolytic) C. botulinum (strain Hall A, ATCC 3502). The genome consists of a chromosome (3,886,916 bp) and a plasmid (16,344 bp), which carry 3650 and 19 predicted genes, respectively. Consistent with the proteolytic phenotype of this strain, the genome harbors a large number of genes encoding secreted proteases and enzymes involved in uptake and metabolism of amino acids. The genome also reveals a hitherto unknown ability of C. botulinum to degrade chitin. There is a significant lack of recently acquired DNA, indicating a stable genomic content, in strong contrast to the fluid genome of Clostridium difficile, which can form longer-term relationships with its host. Overall, the genome indicates that C. botulinum is adapted to a saprophytic lifestyle both in soil and aquatic environments. This pathogen relies on its toxin to rapidly kill a wide range of prey species, and to gain access to nutrient sources, it releases a large number of extracellular enzymes to soften and destroy rotting or decayed tissues.
Toxoplasma gondii is a globally distributed protozoan parasite that can infect virtually all warm-blooded animals and humans. Despite the existence of a sexual phase in the life cycle, T. gondii has an unusual population structure dominated by three clonal lineages that predominate in North America and Europe, (Types I, II, and III). These lineages were founded by common ancestors approximately10,000 yr ago. The recent origin and widespread distribution of the clonal lineages is attributed to the circumvention of the sexual cycle by a new mode of transmission-asexual transmission between intermediate hosts. Asexual transmission appears to be multigenic and although the specific genes mediating this trait are unknown, it is predicted that all members of the clonal lineages should share the same alleles. Genetic mapping studies suggested that chromosome Ia was unusually monomorphic compared with the rest of the genome. To investigate this further, we sequenced chromosome Ia and chromosome Ib in the Type I strain, RH, and the Type II strain, ME49. Comparative genome analyses of the two chromosomal sequences revealed that the same copy of chromosome Ia was inherited in each lineage, whereas chromosome Ib maintained the same high frequency of between-strain polymorphism as the rest of the genome. Sampling of chromosome Ia sequence in seven additional representative strains from the three clonal lineages supports a monomorphic inheritance, which is unique within the genome. Taken together, our observations implicate a specific combination of alleles on chromosome Ia in the recent origin and widespread success of the clonal lineages of T. gondii.
Bordetella avium is a pathogen of poultry and is phylogenetically distinct from Bordetella bronchiseptica, Bordetella pertussis, and Bordetella parapertussis, which are other species in the Bordetella genus that infect mammals. In order to understand the evolutionary relatedness of Bordetella species and further the understanding of pathogenesis, we obtained the complete genome sequence of B. avium strain 197N, a pathogenic strain that has been extensively studied. With 3,732,255 base pairs of DNA and 3,417 predicted coding sequences, it has the smallest genome and gene complement of the sequenced bordetellae. In this study, the presence or absence of previously reported virulence factors from B. avium was confirmed, and the genetic bases for growth characteristics were elucidated. Over 1,100 genes present in B. avium but not in B. bronchiseptica were identified, and most were predicted to encode surface or secreted proteins that are likely to define an organism adapted to the avian rather than the mammalian respiratory tracts. These include genes coding for the synthesis of a polysaccharide capsule, hemagglutinins, a type I secretion system adjacent to two very large genes for secreted proteins, and unique genes for both lipopolysaccharide and fimbrial biogenesis. Three apparently complete prophages are also present. The BvgAS virulence regulatory system appears to have polymorphisms at a poly(C) tract that is involved in phase variation in other bordetellae. A number of putative iron-regulated outer membrane proteins were predicted from the sequence, and this regulation was confirmed experimentally for five of these.
We determined the complete genome sequence of Clostridium difficile strain 630, a virulent and multidrug-resistant strain. Our analysis indicates that a large proportion (11%) of the genome consists of mobile genetic elements, mainly in the form of conjugative transposons. These mobile elements are putatively responsible for the acquisition by C. difficile of an extensive array of genes involved in antimicrobial resistance, virulence, host interaction and the production of surface structures. The metabolic capabilities encoded in the genome show multiple adaptations for survival and growth within the gut environment. The extreme genome variability was confirmed by whole-genome microarray analysis; it may reflect the organism's niche in the gut and should provide information on the evolution of virulence in this organism.
The human enteropathogen, Yersinia enterocolitica, is a significant link in the range of Yersinia pathologies extending from mild gastroenteritis to bubonic plague. Comparison at the genomic level is a key step in our understanding of the genetic basis for this pathogenicity spectrum. Here we report the genome of Y. enterocolitica strain 8081 (serotype 0:8; biotype 1B) and extensive microarray data relating to the genetic diversity of the Y. enterocolitica species. Our analysis reveals that the genome of Y. enterocolitica strain 8081 is a patchwork of horizontally acquired genetic loci, including a plasticity zone of 199 kb containing an extraordinarily high density of virulence genes. Microarray analysis has provided insights into species-specific Y. enterocolitica gene functions and the intraspecies differences between the high, low, and nonpathogenic Y. enterocolitica biotypes. Through comparative genome sequence analysis we provide new information on the evolution of the Yersinia. We identify numerous loci that represent ancestral clusters of genes potentially important in enteric survival and pathogenesis, which have been lost or are in the process of being lost, in the other sequenced Yersinia lineages. Our analysis also highlights large metabolic operons in Y. enterocolitica that are absent in the related enteropathogen, Yersinia pseudotuberculosis, indicating major differences in niche and nutrients used within the mammalian gut. These include clusters directing, the production of hydrogenases, tetrathionate respiration, cobalamin synthesis, and propanediol utilisation. Along with ancestral gene clusters, the genome of Y. enterocolitica has revealed species-specific and enteropathogen-specific loci. This has provided important insights into the pathology of this bacterium and, more broadly, into the evolution of the genus. Moreover, wider investigations looking at the patterns of gene loss and gain in the Yersinia have highlighted common themes in the genome evolution of other human enteropathogens.
BACKGROUND: Rhizobium leguminosarum is an alpha-proteobacterial N2-fixing symbiont of legumes that has been the subject of more than a thousand publications. Genes for the symbiotic interaction with plants are well studied, but the adaptations that allow survival and growth in the soil environment are poorly understood. We have sequenced the genome of R. leguminosarum biovar viciae strain 3841. RESULTS: The 7.75 Mb genome comprises a circular chromosome and six circular plasmids, with 61% G+C overall. All three rRNA operons and 52 tRNA genes are on the chromosome; essential protein-encoding genes are largely chromosomal, but most functional classes occur on plasmids as well. Of the 7,263 protein-encoding genes, 2,056 had orthologs in each of three related genomes (Agrobacterium tumefaciens, Sinorhizobium meliloti, and Mesorhizobium loti), and these genes were over-represented in the chromosome and had above average G+C. Most supported the rRNA-based phylogeny, confirming A. tumefaciens to be the closest among these relatives, but 347 genes were incompatible with this phylogeny; these were scattered throughout the genome but were over-represented on the plasmids. An unexpectedly large number of genes were shared by all three rhizobia but were missing from A. tumefaciens. CONCLUSION: Overall, the genome can be considered to have two main components: a 'core', which is higher in G+C, is mostly chromosomal, is shared with related organisms, and has a consistent phylogeny; and an 'accessory' component, which is sporadic in distribution, lower in G+C, and located on the plasmids and chromosomal islands. The accessory genome has a different nucleotide composition from the core despite a long history of coexistence.
African trypanosomes cause human sleeping sickness and livestock trypanosomiasis in sub-Saharan Africa. We present the sequence and analysis of the 11 megabase-sized chromosomes of Trypanosoma brucei. The 26-megabase genome contains 9068 predicted genes, including approximately 900 pseudogenes and approximately 1700 T. brucei-specific genes. Large subtelomeric arrays contain an archive of 806 variant surface glycoprotein (VSG) genes used by the parasite to evade the mammalian immune system. Most VSG genes are pseudogenes, which may be used to generate expressed mosaic genes by ectopic recombination. Comparisons of the cytoskeleton and endocytic trafficking systems with those of humans and other eukaryotic organisms reveal major differences. A comparison of metabolic pathways encoded by the genomes of T. brucei, T. cruzi, and Leishmania major reveals the least overall metabolic capability in T. brucei and the greatest in L. major. Horizontal transfer of genes of bacterial origin has contributed to some of the metabolic differences in these parasites, and a number of novel potential drug targets have been identified.
The obligately anaerobic bacterium Bacteroides fragilis, an opportunistic pathogen and inhabitant of the normal human colonic microbiota, exhibits considerable within-strain phase and antigenic variation of surface components. The complete genome sequence has revealed an unusual breadth (in number and in effect) of DNA inversion events that potentially control expression of many different components, including surface and secreted components, regulatory molecules, and restriction-modification proteins. Invertible promoters of two different types (12 group 1 and 11 group 2) were identified. One group has inversion crossover (fix) sites similar to the hix sites of Salmonella typhimurium. There are also four independent intergenic shufflons that potentially alter the expression and function of varied genes. The composition of the 10 different polysaccharide biosynthesis gene clusters identified (7 with associated invertible promoters) suggests a mechanism of synthesis similar to the O-antigen capsules of Escherichia coli.
The social amoebae are exceptional in their ability to alternate between unicellular and multicellular forms. Here we describe the genome of the best-studied member of this group, Dictyostelium discoideum. The gene-dense chromosomes of this organism encode approximately 12,500 predicted proteins, a high proportion of which have long, repetitive amino acid tracts. There are many genes for polyketide synthases and ABC transporters, suggesting an extensive secondary metabolism for producing and exporting small molecules. The genome is rich in complex repeats, one class of which is clustered and may serve as centromeres. Partial copies of the extrachromosomal ribosomal DNA (rDNA) element are found at the ends of each chromosome, suggesting a novel telomere structure and the use of a common mechanism to maintain both the rDNA and chromosomal termini. A proteome-based phylogeny shows that the amoebozoa diverged from the animal-fungal lineage after the plant-animal split, but Dictyostelium seems to have retained more of the diversity of the ancestral genome than have plants, animals or fungi.
Plasmodium berghei and Plasmodium chabaudi are widely used model malaria species. Comparison of their genomes, integrated with proteomic and microarray data, with the genomes of Plasmodium falciparum and Plasmodium yoelii revealed a conserved core of 4500 Plasmodium genes in the central regions of the 14 chromosomes and highlighted genes evolving rapidly because of stage-specific selective pressures. Four strategies for gene expression are apparent during the parasites' life cycle: (i) housekeeping; (ii) host-related; (iii) strategy-specific related to invasion, asexual replication, and sexual development; and (iv) stage-specific. We observed posttranscriptional gene silencing through translational repression of messenger RNA during sexual development, and a 47-base 3' untranslated region motif is implicated in this process.
Leishmania species cause a spectrum of human diseases in tropical and subtropical regions of the world. We have sequenced the 36 chromosomes of the 32.8-megabase haploid genome of Leishmania major (Friedlin strain) and predict 911 RNA genes, 39 pseudogenes, and 8272 protein-coding genes, of which 36% can be ascribed a putative function. These include genes involved in host-pathogen interactions, such as proteolytic enzymes, and extensive machinery for synthesis of complex surface glycoconjugates. The organization of protein-coding genes into long, strand-specific, polycistronic clusters and lack of general transcription factors in the L. major, Trypanosoma brucei, and Trypanosoma cruzi (Tritryp) genomes suggest that the mechanisms regulating RNA polymerase II-directed transcription are distinct from those operating in other eukaryotes, although the trypanosomatids appear capable of chromatin remodeling. Abundant RNA-binding proteins are encoded in the Tritryp genomes, consistent with active posttranscriptional regulation of gene expression.
Entamoeba histolytica is an intestinal parasite and the causative agent of amoebiasis, which is a significant source of morbidity and mortality in developing countries. Here we present the genome of E. histolytica, which reveals a variety of metabolic adaptations shared with two other amitochondrial protist pathogens: Giardia lamblia and Trichomonas vaginalis. These adaptations include reduction or elimination of most mitochondrial metabolic pathways and the use of oxidative stress enzymes generally associated with anaerobic prokaryotes. Phylogenomic analysis identifies evidence for lateral gene transfer of bacterial genes into the E. histolytica genome, the effects of which centre on expanding aspects of E. histolytica's metabolic repertoire. The presence of these genes and the potential for novel metabolic pathways in E. histolytica may allow for the development of new chemotherapeutic agents. The genome encodes a large number of novel receptor kinases and contains expansions of a variety of gene families, including those associated with virulence. Additional genome features include an abundance of tandemly repeated transfer-RNA-containing arrays, which may have a structural function in the genome. Analysis of the genome provides new insights into the workings and genome evolution of a major human pathogen.
Aspergillus fumigatus is exceptional among microorganisms in being both a primary and opportunistic pathogen as well as a major allergen. Its conidia production is prolific, and so human respiratory tract exposure is almost constant. A. fumigatus is isolated from human habitats and vegetable compost heaps. In immunocompromised individuals, the incidence of invasive infection can be as high as 50% and the mortality rate is often about 50% (ref. 2). The interaction of A. fumigatus and other airborne fungi with the immune system is increasingly linked to severe asthma and sinusitis. Although the burden of invasive disease caused by A. fumigatus is substantial, the basic biology of the organism is mostly obscure. Here we show the complete 29.4-megabase genome sequence of the clinical isolate Af293, which consists of eight chromosomes containing 9,926 predicted genes. Microarray analysis revealed temperature-dependent expression of distinct sets of genes, as well as 700 A. fumigatus genes not present or significantly diverged in the closely related sexual species Neosartorya fischeri, many of which may have roles in the pathogenicity phenotype. The Af293 genome sequence provides an unparalleled resource for the future understanding of this remarkable fungus.
Theileria annulata and T. parva are closely related protozoan parasites that cause lymphoproliferative diseases of cattle. We sequenced the genome of T. annulata and compared it with that of T. parva to understand the mechanisms underlying transformation and tropism. Despite high conservation of gene sequences and synteny, the analysis reveals unequally expanded gene families and species-specific genes. We also identify divergent families of putative secreted polypeptides that may reduce immune recognition, candidate regulators of host-cell transformation, and a Theileria-specific protein domain [frequently associated in Theileria (FAINT)] present in a large number of secreted proteins.
The obligate intracellular bacterial pathogen Chlamydophila abortus strain S26/3 (formerly the abortion subtype of Chlamydia psittaci) is an important cause of late gestation abortions in ruminants and pigs. Furthermore, although relatively rare, zoonotic infection can result in acute illness and miscarriage in pregnant women. The complete genome sequence was determined and shows a high level of conservation in both sequence and overall gene content in comparison to other Chlamydiaceae. The 1,144,377-bp genome contains 961 predicted coding sequences, 842 of which are conserved with those of Chlamydophila caviae and Chlamydophila pneumoniae. Within this conserved Cp. abortus core genome we have identified the major regions of variation and have focused our analysis on these loci, several of which were found to encode highly variable protein families, such as TMH/Inc and Pmp families, which are strong candidates for the source of diversity in host tropism and disease causation in this group of organisms. Significantly, Cp. abortus lacks any toxin genes, and also lacks genes involved in tryptophan metabolism and nucleotide salvaging (guaB is present as a pseudogene), suggesting that the genetic basis of niche adaptation of this species is distinct from those previously proposed for other chlamydial species.
The genus Coccolithovirus is a recently discovered group of viruses that infect the globally important marine calcifying microalga Emiliania huxleyi. Among the 472 predicted genes of the 407,339-base pair genome are a variety of unexpected genes, most notably those involved in biosynthesis of ceramide, a sphingolipid known to induce apoptosis. Uniquely for algal viruses, it also contains six RNA polymerase subunits and a novel promoter, suggesting this virus encodes its own transcription machinery. Microarray transcriptomic analysis reveals that 65% of the predicted virus-encoded genes are expressed during lytic infection of E. huxleyi.
The bacterial family Enterobacteriaceae is notable for its well studied human pathogens, including Salmonella, Yersinia, Shigella, and Escherichia spp. However, it also contains several plant pathogens. We report the genome sequence of a plant pathogenic enterobacterium, Erwinia carotovora subsp. atroseptica (Eca) strain SCRI1043, the causative agent of soft rot and blackleg potato diseases. Approximately 33% of Eca genes are not shared with sequenced enterobacterial human pathogens, including some predicted to facilitate unexpected metabolic traits, such as nitrogen fixation and opine catabolism. This proportion of genes also contains an overrepresentation of pathogenicity determinants, including possible horizontally acquired gene clusters for putative type IV secretion and polyketide phytotoxin synthesis. To investigate whether these gene clusters play a role in the disease process, an arrayed set of insertional mutants was generated, and mutations were identified. Plant bioassays showed that these mutants were significantly reduced in virulence, demonstrating both the presence of novel pathogenicity determinants in Eca, and the impact of functional genomics in expanding our understanding of phytopathogenicity in the Enterobacteriaceae.
Staphylococcus aureus is an important nosocomial and community-acquired pathogen. Its genetic plasticity has facilitated the evolution of many virulent and drug-resistant strains, presenting a major and constantly changing clinical challenge. We sequenced the approximately 2.8-Mbp genomes of two disease-causing S. aureus strains isolated from distinct clinical settings: a recent hospital-acquired representative of the epidemic methicillin-resistant S. aureus EMRSA-16 clone (MRSA252), a clinically important and globally prevalent lineage; and a representative of an invasive community-acquired methicillin-susceptible S. aureus clone (MSSA476). A comparative-genomics approach was used to explore the mechanisms of evolution of clinically important S. aureus genomes and to identify regions affecting virulence and drug resistance. The genome sequences of MRSA252 and MSSA476 have a well conserved core region but differ markedly in their accessory genetic elements. MRSA252 is the most genetically diverse S. aureus strain sequenced to date: approximately 6% of the genome is novel compared with other published genomes, and it contains several unique genetic elements. MSSA476 is methicillin-susceptible, but it contains a novel Staphylococcal chromosomal cassette (SCC) mec-like element (designated SCC(476)), which is integrated at the same site on the chromosome as SCCmec elements in MRSA strains but encodes a putative fusidic acid resistance protein. The crucial role that accessory elements play in the rapid evolution of S. aureus is clearly illustrated by comparing the MSSA476 genome with that of an extremely closely related MRSA community-acquired strain; the differential distribution of large mobile elements carrying virulence and drug-resistance determinants may be responsible for the clinically important phenotypic differences in these strains.
Burkholderia pseudomallei is a recognized biothreat agent and the causative agent of melioidosis. This Gram-negative bacterium exists as a soil saprophyte in melioidosis-endemic areas of the world and accounts for 20% of community-acquired septicaemias in northeastern Thailand where half of those affected die. Here we report the complete genome of B. pseudomallei, which is composed of two chromosomes of 4.07 megabase pairs and 3.17 megabase pairs, showing significant functional partitioning of genes between them. The large chromosome encodes many of the core functions associated with central metabolism and cell growth, whereas the small chromosome carries more accessory functions associated with adaptation and survival in different niches. Genomic comparisons with closely and more distantly related bacteria revealed a greater level of gene order conservation and a greater number of orthologous genes on the large chromosome, suggesting that the two replicons have distinct evolutionary origins. A striking feature of the genome was the presence of 16 genomic islands (GIs) that together made up 6.1% of the genome. Further analysis revealed these islands to be variably present in a collection of invasive and soil isolates but entirely absent from the clonally related organism B. mallei. We propose that variable horizontal gene acquisition by B. pseudomallei is an important feature of recent genetic evolution and that this has resulted in a genetically diverse pathogenic species.
Aspergillus fumigatus is the most ubiquitous opportunistic filamentous fungal pathogen of human. As an initial step toward sequencing the entire genome of A. fumigatus, which is estimated to be approximately 30 Mb in size, we have sequenced a 922 kb region, contained within 16 overlapping bacterial artificial chromosome (BAC) clones. Fifty-four percent of the DNA is predicted to be coding with 341 putative protein coding genes. Functional classification of the proteins showed the presence of a higher proportion of enzymes and membrane transporters when compared to those of Saccharomyces cerevisiae. In addition to the nitrate assimilation gene cluster, the quinate utilisation gene cluster is also present on this 922 kb genomic sequence. We observed large scale synteny between A. fumigatus and Aspergillus nidulans by comparing this sequence to the A. nidulans genetic map of linkage group VIII.
BACKGROUND:
Whipple's disease is a rare multisystem chronic infection, involving the intestinal tract as well as various other organs. The causative agent, Tropheryma whipplei, is a Gram-positive bacterium about which little is known. Our aim was to investigate the biology of this organism by generating and analysing the complete DNA sequence of its genome.
METHODS:
We isolated and propagated T whipplei strain TW08/27 from the cerebrospinal fluid of a patient diagnosed with Whipple's disease. We generated the complete sequence of the genome by the whole genome shotgun method, and analysed it with a combination of automatic and manual bioinformatic techniques.
FINDINGS:
Sequencing revealed a condensed 925938 bp genome with a lack of key biosynthetic pathways and a reduced capacity for energy metabolism. A family of large surface proteins was identified, some associated with large amounts of non-coding repetitive DNA, and an unexpected degree of sequence variation.
INTERPRETATION:
The genome reduction and lack of metabolic capabilities point to a host-restricted lifestyle for the organism. The sequence variation indicates both known and novel mechanisms for the elaboration and variation of surface structures, and suggests that immune evasion and host interaction play an important part in the lifestyle of this persistent bacterial pathogen.
Corynebacterium diphtheriae is a Gram-positive, non-spore forming, non-motile, pleomorphic rod belonging to the genus Corynebacterium and the actinomycete group of organisms. The organism produces a potent bacteriophage-encoded protein exotoxin, diphtheria toxin (DT), which causes the symptoms of diphtheria. This potentially fatal infectious disease is controlled in many developed countries by an effective immunisation programme. However, the disease has made a dramatic return in recent years, in particular within the Eastern European region. The largest, and still on-going, outbreak since the advent of mass immunisation started within Russia and the newly independent states of the former Soviet Union in the 1990s. We have sequenced the genome of a UK clinical isolate (biotype gravis strain NCTC13129), representative of the clone responsible for this outbreak. The genome consists of a single circular chromosome of 2 488 635 bp, with no plasmids. It provides evidence that recent acquisition of pathogenicity factors goes beyond the toxin itself, and includes iron-uptake systems, adhesins and fimbrial proteins. This is in contrast to Corynebacterium's nearest sequenced pathogenic relative, Mycobacterium tuberculosis, where there is little evidence of recent horizontal DNA acquisition. The genome itself shows an unusually extreme large-scale compositional bias, being noticeably higher in G+C near the origin than at the terminus.
The African trypanosome, Trypanosoma brucei, causes sleeping sickness in humans in sub-Saharan Africa. Here we report the sequence and analysis of the 1.1 Mb chromosome I, which encodes approximately 400 predicted genes organised into directional clusters, of which more than 100 are located in the largest cluster of 250 kb. A 160-kb region consists primarily of three gene families of unknown function, one of which contains a hotspot for retroelement insertion. We also identify five novel gene families. Indeed, almost 20% of predicted genes are members of families. In some cases, tandemly arrayed genes are 99-100% identical, suggesting an active process of amplification and gene conversion. One end of the chromosome consists of a putative bloodstream-form variant surface glycoprotein (VSG) gene expression site that appears truncated and degenerate. The other chromosome end carries VSG and expression site-associated genes and pseudogenes over 50 kb of subtelomeric sequence where, unusually, the telomere-proximal VSG gene is oriented away from the telomere. Our analysis includes the cataloguing of minor genetic variations between the chromosome I homologues and an estimate of crossing-over frequency during genetic exchange. Genetic polymorphisms are exceptionally rare in sequences located within and around the strand-switches between several gene clusters.
Bordetella pertussis, Bordetella parapertussis and Bordetella bronchiseptica are closely related Gram-negative beta-proteobacteria that colonize the respiratory tracts of mammals. B. pertussis is a strict human pathogen of recent evolutionary origin and is the primary etiologic agent of whooping cough. B. parapertussis can also cause whooping cough, and B. bronchiseptica causes chronic respiratory infections in a wide range of animals. We sequenced the genomes of B. bronchiseptica RB50 (5,338,400 bp; 5,007 predicted genes), B. parapertussis 12822 (4,773,551 bp; 4,404 genes) and B. pertussis Tohama I (4,086,186 bp; 3,816 genes). Our analysis indicates that B. parapertussis and B. pertussis are independent derivatives of B. bronchiseptica-like ancestors. During the evolution of these two host-restricted species there was large-scale gene loss and inactivation; host adaptation seems to be a consequence of loss, not gain, of function, and differences in virulence may be related to loss of regulatory or control functions.
Streptomyces coelicolor is a representative of the group of soil-dwelling, filamentous bacteria responsible for producing most natural antibiotics used in human and veterinary medicine. Here we report the 8,667,507 base pair linear chromosome of this organism, containing the largest number of genes so far discovered in a bacterium. The 7,825 predicted genes include more than 20 clusters coding for known or predicted secondary metabolites. The genome contains an unprecedented proportion of regulatory genes, predominantly those likely to be involved in responses to external stimuli and stresses, and many duplicated gene sets that may represent 'tissue-specific' isoforms operating in different phases of colonial development, a unique situation for a bacterium. An ancient synteny was revealed between the central 'core' of the chromosome and the whole chromosome of pathogens Mycobacterium tuberculosis and Corynebacterium diphtheriae. The genome sequence will greatly increase our understanding of microbial life in the soil as well as aiding the generation of new drug candidates by genetic engineering.
The genome of the lower eukaryote Dictyostelium discoideum comprises six chromosomes. Here we report the sequence of the largest, chromosome 2, which at 8 megabases (Mb) represents about 25% of the genome. Despite an A + T content of nearly 80%, the chromosome codes for 2,799 predicted protein coding genes and 73 transfer RNA genes. This gene density, about 1 gene per 2.6 kilobases (kb), is surpassed only by Saccharomyces cerevisiae (one per 2 kb) and is similar to that of Schizosaccharomyces pombe (one per 2.5 kb). If we assume that the other chromosomes have a similar gene density, we can expect around 11,000 genes in the D. discoideum genome. A significant number of the genes show higher similarities to genes of vertebrates than to those of other fully sequenced eukaryotes. This analysis strengthens the view that the evolutionary position of D. discoideum is located before the branching of metazoa and fungi but after the divergence of the plant kingdom, placing it close to the base of metazoan evolution.
Since the sequencing of the first two chromosomes of the malaria parasite, Plasmodium falciparum, there has been a concerted effort to sequence and assemble the entire genome of this organism. Here we report the sequence of chromosomes 1, 3-9 and 13 of P. falciparum clone 3D7--these chromosomes account for approximately 55% of the total genome. We describe the methods used to map, sequence and annotate these chromosomes. By comparing our assemblies with the optical map, we indicate the completeness of the resulting sequence. During annotation, we assign Gene Ontology terms to the predicted gene products, and observe clustering of some malaria-specific terms to specific chromosomes. We identify a highly conserved sequence element found in the intergenic region of internal var genes that is not associated with their telomeric counterparts.
We have sequenced and annotated the genome of fission yeast (Schizosaccharomyces pombe), which contains the smallest number of protein-coding genes yet recorded for a eukaryote: 4,824. The centromeres are between 35 and 110 kilobases (kb) and contain related repeats including a highly conserved 1.8-kb element. Regions upstream of genes are longer than in budding yeast (Saccharomyces cerevisiae), possibly reflecting more-extended control regions. Some 43% of the genes contain introns, of which there are 4,730. Fifty genes have significant similarity with human disease genes; half of these are cancer related. We identify highly conserved genes important for eukaryotic cell organization including those required for the cytoskeleton, compartmentation, cell-cycle control, proteolysis, protein phosphorylation and RNA splicing. These genes may have originated with the appearance of eukaryotic life. Few similarly conserved genes that are important for multicellular organization were identified, suggesting that the transition from prokaryotes to eukaryotes required more new genes than did the transition from unicellular to multicellular organization.
Leprosy, a chronic human neurological disease, results from infection with the obligate intracellular pathogen Mycobacterium leprae, a close relative of the tubercle bacillus. Mycobacterium leprae has the longest doubling time of all known bacteria and has thwarted every effort at culture in the laboratory. Comparing the 3.27-megabase (Mb) genome sequence of an armadillo-derived Indian isolate of the leprosy bacillus with that of Mycobacterium tuberculosis (4.41 Mb) provides clear explanations for these properties and reveals an extreme case of reductive evolution. Less than half of the genome contains functional genes but pseudogenes, with intact counterparts in M. tuberculosis, abound. Genome downsizing and the current mosaic arrangement appear to have resulted from extensive recombination events between dispersed repetitive sequences. Gene deletion and decay have eliminated many important metabolic activities including siderophore production, part of the oxidative and most of the microaerophilic and anaerobic respiratory chains, and numerous catabolic systems and their regulatory circuits.
Neisseria meningitidis causes bacterial meningitis and is therefore responsible for considerable morbidity and mortality in both the developed and the developing world. Meningococci are opportunistic pathogens that colonize the nasopharynges and oropharynges of asymptomatic carriers. For reasons that are still mostly unknown, they occasionally gain access to the blood, and subsequently to the cerebrospinal fluid, to cause septicaemia and meningitis. N. meningitidis strains are divided into a number of serogroups on the basis of the immunochemistry of their capsular polysaccharides; serogroup A strains are responsible for major epidemics and pandemics of meningococcal disease, and therefore most of the morbidity and mortality associated with this disease. Here we have determined the complete genome sequence of a serogroup A strain of Neisseria meningitidis, Z2491. The sequence is 2,184,406 base pairs in length, with an overall G+C content of 51.8%, and contains 2,121 predicted coding sequences. The most notable feature of the genome is the presence of many hundreds of repetitive elements, ranging from short repeats, positioned either singly or in large multiple arrays, to insertion sequences and gene duplications of one kilobase or more. Many of these repeats appear to be involved in genome fluidity and antigenic variation in this important human pathogen.
Campylobacter jejuni, from the delta-epsilon group of proteobacteria, is a microaerophilic, Gram-negative, flagellate, spiral bacterium-properties it shares with the related gastric pathogen Helicobacter pylori. It is the leading cause of bacterial food-borne diarrhoeal disease throughout the world. In addition, infection with C. jejuni is the most frequent antecedent to a form of neuromuscular paralysis known as Guillain-Barre syndrome. Here we report the genome sequence of C. jejuni NCTC11168. C. jejuni has a circular chromosome of 1,641,481 base pairs (30.6% G+C) which is predicted to encode 1,654 proteins and 54 stable RNA species. The genome is unusual in that there are virtually no insertion sequences or phage-associated sequences and very few repeat sequences. One of the most striking findings in the genome was the presence of hypervariable sequences. These short homopolymeric runs of nucleotides were commonly found in genes encoding the biosynthesis or modification of surface structures, or in closely linked genes of unknown function. The apparently high rate of variation of these homopolymeric tracts may be important in the survival strategy of C. jejuni.
Analysis of Plasmodium falciparum chromosome 3, and comparison with chromosome 2, highlights novel features of chromosome organization and gene structure. The sub-telomeric regions of chromosome 3 show a conserved order of features, including repetitive DNA sequences, members of multigene families involved in pathogenesis and antigenic variation, a number of conserved pseudogenes, and several genes of unknown function. A putative centromere has been identified that has a core region of about 2 kilobases with an extremely high (adenine + thymidine) composition and arrays of tandem repeats. We have predicted 215 protein-coding genes and two transfer RNA genes in the 1,060,106-base-pair chromosome sequence. The predicted protein-coding genes can be divided into three main classes: 52.6% are not spliced, 45.1% have a large exon with short additional 5' or 3' exons, and 2.3% have a multiple exon structure more typical of higher eukaryotes.
Countless millions of people have died from tuberculosis, a chronic infectious disease caused by the tubercle bacillus. The complete genome sequence of the best-characterized strain of Mycobacterium tuberculosis, H37Rv, has been determined and analysed in order to improve our understanding of the biology of this slow-growing pathogen and to help the conception of new prophylactic and therapeutic interventions. The genome comprises 4,411,529 base pairs, contains around 4,000 genes, and has a very high guanine + cytosine content that is reflected in the biased amino-acid content of the proteins. M. tuberculosis differs radically from other bacteria in that a very large portion of its coding capacity is devoted to the production of enzymes involved in lipogenesis and lipolysis, and to two new families of glycine-rich proteins with a repetitive structure that may represent a source of antigenic variation.