First identified in 1982 as a human pathogen, enterohemorrhagic Escherichia coli of the O157:H7 serotype is a major cause of food-borne acquired human infections. Here, we report the genome sequence of the first known strain of this serotype isolated in the United States.
BACKGROUND: More than 20% of the world's population is at risk for infection by filarial nematodes and >180 million people worldwide are already infected. Along with infection comes significant morbidity that has a socioeconomic impact. The eight filarial nematodes that infect humans are Wuchereria bancrofti, Brugia malayi, Brugia timori, Onchocerca volvulus, Loa loa, Mansonella perstans, Mansonella streptocerca, and Mansonella ozzardi, of which three have published draft genome sequences. Since all have humans as the definitive host, standard avenues of research that rely on culturing and genetics have often not been possible. Therefore, genome sequencing provides an important window into understanding the biology of these parasites. The need for large amounts of high quality genomic DNA from homozygous, inbred lines; the availability of only short sequence reads from next-generation sequencing platforms at a reasonable expense; and the lack of random large insert libraries has limited our ability to generate high quality genome sequences for these parasites. However, the Pacific Biosciences single molecule, real-time sequencing platform holds great promise in reducing input amounts and generating sufficiently long sequences that bypass the need for large insert paired libraries. RESULTS: Here, we report on efforts to generate a more complete genome assembly for L. loa using genetically heterogeneous DNA isolated from a single clinical sample and sequenced on the Pacific Biosciences platform. To obtain the best assembly, numerous assemblers and sequencing datasets were analyzed, combined, and compared. Quiver-informed trimming of an assembly of only Pacific Biosciences reads by HGAP2 was selected as the final assembly of 96.4 Mbp in 2,250 contigs. This results in ~9% more of the genome in ~85% fewer contigs from ~80% less starting material at a fraction of the cost of previous Roche 454-based sequencing efforts. CONCLUSIONS: The result is the most complete filarial nematode assembly produced thus far and demonstrates the utility of single molecule sequencing on the Pacific Biosciences platform for genetically heterogeneous metazoan genomes.
Three recently sequenced strains isolated from patients during an outbreak of Mycobacterium abscessus subsp. massiliense infections at a cystic fibrosis center in the United States were compared with 6 strains from an outbreak at a cystic fibrosis center in the United Kingdom and worldwide strains. Strains from the 2 cystic fibrosis outbreaks showed high-level relatedness with each other and major-level relatedness with strains that caused soft tissue infections during an epidemic in Brazil. We identified unique single-nucleotide polymorphisms in cystic fibrosis and soft tissue outbreak strains, separate single-nucleotide polymorphisms only in cystic fibrosis outbreak strains, and unique genomic traits for each subset of isolates. Our findings highlight the necessity of identifying M. abscessus to the subspecies level and screening all cystic fibrosis isolates for relatedness to these outbreak strains. We propose 2 diagnostic strategies that use partial sequencing of rpoB and secA1 genes and a multilocus sequence typing protocol.
Helicobacter pylori, an inhabitant of the gastric mucosa of over half of the world's population, with decreasing prevalence in the U.S., has been associated with a variety of gastric pathologies. However, the majority of H. pylori-infected individuals remain asymptomatic, and negative correlations between H. pylori and allergic diseases have been reported. Comprehensive genome characterization of H. pylori populations from different human host backgrounds, including healthy individuals, provides the exciting potential to generate new insights into the open question of whether human health outcome is associated with specific H. pylori genotypes or dependent on other environmental factors. We report the genome sequences of 65 H. pylori isolates from individuals with gastric cancer, preneoplastic lesions, peptic ulcer disease, gastritis, and from asymptomatic adults. Isolates were collected from multiple locations in North America (USA and Canada) as well as from Colombia and Japan. The availability of these H. pylori genome sequences from individuals with distinct clinical presentations provides the research community with a resource for detailed investigations into genetic elements that correlate either positively or negatively with the epidemiology, human host adaptation, and gastric pathogenesis and will aid in the characterization of strains that may favor the development of specific pathology, including gastric cancer.
First identified in 1982, Escherichia coli O157:H7 is the dominant enterohemorrhagic serotype underlying food-borne human infections in North America. Here, we report the genomes of twenty-six strains derived from patients and the bovine reservoir. These resources enable detailed whole-genome comparisons and permit investigations of genotypic and phenotypic plasticity.
Mycobacterium massiliense (Mycobacterium abscessus group) is an emerging pathogen causing pulmonary disease and skin and soft tissue infections. We report the genome sequence of the type strain CCUG 48898.
The Arabidopsis (Arabidopsis thaliana) genome encodes 51 proteins annotated as serine carboxypeptidase-like (SCPL) enzymes. Nineteen of these SCPL proteins are highly similar to one another, and represent a clade that appears to be unique to plants. Two of the most divergent proteins within this group have been characterized to date, sinapoyl-glucose (Glc):malate sinapoyltransferase and sinapoyl-Glc:choline sinapoyltransferase. The fact that two of the least related proteins within this clade are acyltransferases rather than true serine carboxypeptidases suggests that some or all of the remaining members of this group may have similar activities. The gene that encodes sinapoyl-Glc:malate sinapoyltransferase (sinapoyl-Glc accumulator1 [SNG1]: At2g22990) is one of five SCPL genes arranged in a cluster on chromosome 2. In this study, an analysis of deletion mutant lines lacking one or more genes in this SCPL gene cluster reveals that three of these genes also encode sinapoyl-Glc-dependent acyltransferases. At2g23000 encodes sinapoyl-Glc:anthocyanin acyltransferase, an enzyme that is required for the synthesis of the sinapoylated anthocyanins in Arabidopsis. At2g23010 encodes an enzyme capable of synthesizing 1,2-disinapoyl-Glc from two molecules of sinapoyl-Glc, an activity shared by SNG1 and At2g22980. Sequence analysis of these SCPL proteins reveals pairwise percent identities that range from 71% to 78%, suggesting that their differing specificities for acyl acceptor substrates are due to changes in a relatively small subset of amino acids. The study of these SCPL proteins provides an opportunity to examine enzyme structure-function relationships and may shed light on the role of evolution of hydroxycinnamate ester metabolism and the SCPL gene family in Arabidopsis and other flowering plants.
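The pairwise percent identities mentioned above can be illustrated with a minimal Python sketch. It assumes the two protein sequences have already been aligned to equal length with `-` as the gap character, and it takes aligned, non-gap columns as the denominator; other denominator conventions exist, and the abstract does not say which one the authors used. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumption: inputs are rows of an existing pairwise alignment, so
    they must have equal length. This function does not align anything.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real SCPL sequences).
identity = percent_identity("MKT-AL", "MRTQAL")
print(f"{identity:.1f}%")
```

In the toy pair, five columns are compared and four match, so the sketch reports 80.0%.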
African trypanosomes cause human sleeping sickness and livestock trypanosomiasis in sub-Saharan Africa. We present the sequence and analysis of the 11 megabase-sized chromosomes of Trypanosoma brucei. The 26-megabase genome contains 9068 predicted genes, including approximately 900 pseudogenes and approximately 1700 T. brucei-specific genes. Large subtelomeric arrays contain an archive of 806 variant surface glycoprotein (VSG) genes used by the parasite to evade the mammalian immune system. Most VSG genes are pseudogenes, which may be used to generate expressed mosaic genes by ectopic recombination. Comparisons of the cytoskeleton and endocytic trafficking systems with those of humans and other eukaryotic organisms reveal major differences. A comparison of metabolic pathways encoded by the genomes of T. brucei, T. cruzi, and Leishmania major reveals the least overall metabolic capability in T. brucei and the greatest in L. major. Horizontal transfer of genes of bacterial origin has contributed to some of the metabolic differences in these parasites, and a number of novel potential drug targets have been identified.
Whole-genome sequencing of the protozoan pathogen Trypanosoma cruzi revealed that the diploid genome contains a predicted 22,570 proteins encoded by genes, of which 12,570 represent allelic pairs. Over 50% of the genome consists of repeated sequences, such as retrotransposons and genes for large families of surface molecules, which include trans-sialidases, mucins, gp63s, and a large novel family (>1300 copies) of mucin-associated surface protein (MASP) genes. Analyses of the T. cruzi, T. brucei, and Leishmania major (Tritryp) genomes imply differences from other eukaryotes in DNA repair and initiation of replication and reflect their unusual mitochondrial DNA. Although the Tritryp lack several classes of signaling molecules, their kinomes contain a large and diverse set of protein kinases and phosphatases; their size and diversity imply previously unknown interactions and regulatory processes, which may be targets for intervention.
Sequencing and comparative genome analysis of four strains of Campylobacter including C. lari RM2100, C. upsaliensis RM3195, and C. coli RM2228 has revealed major structural differences that are associated with the insertion of phage- and plasmid-like genomic islands, as well as major variations in the lipooligosaccharide complex. Poly G tracts are longer, are greater in number, and show greater variability in C. upsaliensis than in the other species. Many genes involved in host colonization, including racR/S, cadF, cdt, ciaB, and flagellin genes, are conserved across the species, but variations that appear to be species specific are evident for a lipooligosaccharide locus, a capsular (extracellular) polysaccharide locus, and a novel Campylobacter putative licABCD virulence locus. The strains also vary in their metabolic profiles, as well as their resistance profiles to a range of antibiotics. It is evident that the newly identified hypothetical and conserved hypothetical proteins, as well as uncharacterized two-component regulatory systems and membrane proteins, may hold additional significant information on the major differences in virulence among the species, as well as the specificity of the strains for particular hosts.
        
Title: An expression and bioinformatics analysis of the Arabidopsis serine carboxypeptidase-like gene family. Fraser CM, Rider LW, Chapple C. Ref: Plant Physiol, 138:1136, 2005
The Arabidopsis (Arabidopsis thaliana) genome encodes a family of 51 proteins that are homologous to known serine carboxypeptidases. Based on their sequences, these serine carboxypeptidase-like (SCPL) proteins can be divided into several major clades. The first group consists of 21 proteins which, despite the function implied by their annotation, includes two that have been shown to function as acyltransferases in plant secondary metabolism: sinapoylglucose:malate sinapoyltransferase and sinapoylglucose:choline sinapoyltransferase. A second group comprises 25 SCPL proteins whose biochemical functions have not been clearly defined. Genes encoding representatives from both of these clades can be found in many plants, but have not yet been identified in other phyla. In contrast, the remaining SCPL proteins include five members that are similar to serine carboxypeptidases from a variety of organisms, including fungi and animals. Reverse transcription PCR results suggest that some SCPL genes are expressed in a highly tissue-specific fashion, whereas others are transcribed in a wide range of tissue types. Taken together, these data suggest that the Arabidopsis SCPL gene family encodes a diverse group of enzymes whose functions are likely to extend beyond protein degradation and processing to include activities such as the production of secondary metabolites.
We report the genome sequence of Theileria parva, an apicomplexan pathogen causing economic losses to smallholder farmers in Africa. The parasite chromosomes exhibit limited conservation of gene synteny with Plasmodium falciparum, and its plastid-like genome represents the first example where all apicoplast genes are encoded on one DNA strand. We tentatively identify proteins that facilitate parasite segregation during host cell cytokinesis and contribute to persistent infection of transformed host cells. Several biosynthetic pathways are incomplete or absent, suggesting substantial metabolic dependence on the host cell. One protein family that may generate parasite antigenic diversity is not telomere-associated.
Staphylococcus aureus is an opportunistic pathogen and the major causative agent of numerous hospital- and community-acquired infections. Staphylococcus epidermidis has emerged as a causative agent of infections often associated with implanted medical devices. We have sequenced the approximately 2.8-Mb genome of S. aureus COL, an early methicillin-resistant isolate, and the approximately 2.6-Mb genome of S. epidermidis RP62a, a methicillin-resistant biofilm isolate. Comparative analysis of these and other staphylococcal genomes was used to explore the evolution of virulence and resistance between these two species. The S. aureus and S. epidermidis genomes are syntenic throughout their lengths and share a core set of 1,681 open reading frames. Genome islands in nonsyntenic regions are the primary source of variations in pathogenicity and resistance. Gene transfer between staphylococci and low-GC-content gram-positive bacteria appears to have shaped their virulence and resistance profiles. Integrated plasmids in S. epidermidis carry genes encoding resistance to cadmium and species-specific LPXTG surface proteins. A novel genome island encodes multiple phenol-soluble modulins, a potential S. epidermidis virulence factor. S. epidermidis contains the cap operon, encoding the polyglutamate capsule, a major virulence factor in Bacillus anthracis. Additional phenotypic differences are likely the result of single nucleotide polymorphisms, which are most numerous in cell envelope proteins. Overall differences in pathogenicity can be attributed to genome islands in S. aureus which encode enterotoxins, exotoxins, leukocidins, and leukotoxins not found in S. epidermidis.
Pseudomonas syringae pv. phaseolicola, a gram-negative bacterial plant pathogen, is the causal agent of halo blight of bean. In this study, we report on the genome sequence of P. syringae pv. phaseolicola isolate 1448A, which encodes 5,353 open reading frames (ORFs) on one circular chromosome (5,928,787 bp) and two plasmids (131,950 bp and 51,711 bp). Comparative analyses with a phylogenetically divergent pathovar, P. syringae pv. tomato DC3000, revealed a strong degree of conservation at the gene and genome levels. In total, 4,133 ORFs were identified as putative orthologs in these two pathovars using a reciprocal best-hit method, with 3,941 ORFs present in conserved, syntenic blocks. Although these two pathovars are highly similar at the physiological level, they have distinct host ranges; 1448A causes disease in beans, and DC3000 is pathogenic on tomato and Arabidopsis. Examination of the complement of ORFs encoding virulence, fitness, and survival factors revealed a substantial, but not complete, overlap between these two pathovars. Another distinguishing feature between the two pathovars is their distinctive sets of transposable elements. With access to a fifth complete pseudomonad genome sequence, we were able to identify 3,567 ORFs that likely comprise the core Pseudomonas genome and 365 ORFs that are P. syringae specific.
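The reciprocal best-hit idea used above to call putative orthologs can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes similarity scores between ORFs of the two genomes are already available as dictionaries keyed by `(query, subject)`, and all ORF identifiers and scores are hypothetical. Ties are broken by whichever hit is seen first.

```python
def best_hits(scores):
    """Map each query to its single highest-scoring subject.

    scores: dict mapping (query_id, subject_id) -> similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores: genome A searched against genome B, and the reverse.
a_vs_b = {("A1", "B1"): 200, ("A1", "B2"): 50, ("A2", "B2"): 120, ("A3", "B1"): 90}
b_vs_a = {("B1", "A1"): 198, ("B2", "A2"): 110, ("B2", "A3"): 130}

print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Only `A1` and `B1` are paired here: `A2` points to `B2`, but `B2` prefers `A3`, so that pair is not reciprocal.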
Cryptococcus neoformans is a basidiomycetous yeast ubiquitous in the environment, a model for fungal pathogenesis, and an opportunistic human pathogen of global importance. We have sequenced its approximately 20-megabase genome, which contains approximately 6500 intron-rich gene structures and encodes a transcriptome abundant in alternatively spliced and antisense messages. The genome is rich in transposons, many of which cluster at candidate centromeric regions. The presence of these transposons may drive karyotype instability and phenotypic variation. C. neoformans encodes unique genes that may contribute to its unusual virulence properties, and comparison of two phenotypically distinct strains reveals variation in gene content in addition to sequence polymorphisms between the genomes.
Entamoeba histolytica is an intestinal parasite and the causative agent of amoebiasis, which is a significant source of morbidity and mortality in developing countries. Here we present the genome of E. histolytica, which reveals a variety of metabolic adaptations shared with two other amitochondrial protist pathogens: Giardia lamblia and Trichomonas vaginalis. These adaptations include reduction or elimination of most mitochondrial metabolic pathways and the use of oxidative stress enzymes generally associated with anaerobic prokaryotes. Phylogenomic analysis identifies evidence for lateral gene transfer of bacterial genes into the E. histolytica genome, the effects of which centre on expanding aspects of E. histolytica's metabolic repertoire. The presence of these genes and the potential for novel metabolic pathways in E. histolytica may allow for the development of new chemotherapeutic agents. The genome encodes a large number of novel receptor kinases and contains expansions of a variety of gene families, including those associated with virulence. Additional genome features include an abundance of tandemly repeated transfer-RNA-containing arrays, which may have a structural function in the genome. Analysis of the genome provides new insights into the workings and genome evolution of a major human pathogen.
The completion of the 5,373,180-bp genome sequence of the marine psychrophilic bacterium Colwellia psychrerythraea 34H, a model for the study of life in permanently cold environments, reveals capabilities important to carbon and nutrient cycling, bioremediation, production of secondary metabolites, and cold-adapted enzymes. From a genomic perspective, cold adaptation is suggested in several broad categories involving changes to the cell membrane fluidity, uptake and synthesis of compounds conferring cryotolerance, and strategies to overcome temperature-dependent barriers to carbon uptake. Modeling of three-dimensional protein homology from bacteria representing a range of optimal growth temperatures suggests changes to proteome composition that may enhance enzyme effectiveness at low temperatures. Comparative genome analyses suggest that the psychrophilic lifestyle is most likely conferred not by a unique set of genes but by a collection of synergistic changes in overall genome content and amino acid composition.
Dehalococcoides ethenogenes is the only bacterium known to reductively dechlorinate the groundwater pollutants, tetrachloroethene (PCE) and trichloroethene, to ethene. Its 1,469,720-base pair chromosome contains large dynamic duplicated regions and integrated elements. Genes encoding 17 putative reductive dehalogenases, nearly all of which were adjacent to genes for transcription regulators, and five hydrogenase complexes were identified. These findings, plus a limited repertoire of other metabolic modes, indicate that D. ethenogenes is highly evolved to utilize halogenated organic compounds and H2. Diversification of reductive dehalogenase functions appears to have been mediated by recent genetic exchange and amplification. Genome analysis provides insights into the organism's complex nutrient requirements and suggests that an ancestor was a nitrogen-fixing autotroph.
The development of efficient and inexpensive genome sequencing methods has revolutionized the study of human bacterial pathogens and improved vaccine design. Unfortunately, the sequence of a single genome does not reflect how genetic variability drives pathogenesis within a bacterial species and also limits genome-wide screens for vaccine candidates or for antimicrobial targets. We have generated the genomic sequence of six strains representing the five major disease-causing serotypes of Streptococcus agalactiae, the main cause of neonatal infection in humans. Analysis of these genomes and those available in databases showed that the S. agalactiae species can be described by a pan-genome consisting of a core genome shared by all isolates, accounting for approximately 80% of any single genome, plus a dispensable genome consisting of partially shared and strain-specific genes. Mathematical extrapolation of the data suggests that the gene reservoir available for inclusion in the S. agalactiae pan-genome is vast and that unique genes will continue to be identified even after sequencing hundreds of genomes.
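The split of a pan-genome into a core genome and a dispensable genome (partially shared plus strain-specific genes) can be sketched with plain set operations. This is a minimal illustration that assumes genes have already been clustered into families and each strain is represented by the set of family identifiers it carries; the strain names and gene identifiers are hypothetical, and the sketch does not reproduce the mathematical extrapolation described in the abstract.

```python
from collections import Counter


def partition_pan_genome(genomes):
    """Split gene families into core, partially shared, and strain-specific sets.

    genomes: dict mapping strain name -> set of gene family identifiers.
    """
    gene_sets = list(genomes.values())
    if not gene_sets:
        return set(), set(), set()
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)      # present in every strain
    dispensable = pan - core                 # missing from at least one strain
    counts = Counter(gene for genes in gene_sets for gene in genes)
    strain_specific = {g for g in dispensable if counts[g] == 1}
    partially_shared = dispensable - strain_specific
    return core, partially_shared, strain_specific


# Hypothetical gene content for three strains.
genomes = {
    "strain1": {"a", "b", "c", "d"},
    "strain2": {"a", "b", "c", "e"},
    "strain3": {"a", "b", "d", "f"},
}

core, shared, specific = partition_pan_genome(genomes)
for name, genes in genomes.items():
    print(name, f"core fraction = {len(core) / len(genes):.2f}")
```

In this toy data the core is `{a, b}`, so each strain has a core fraction of 0.50, compared with the approximately 80% reported for S. agalactiae.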
The laboratory rat (Rattus norvegicus) is an indispensable tool in experimental medicine and drug development, having made inestimable contributions to human health. We report here the genome sequence of the Brown Norway (BN) rat strain. The sequence represents a high-quality 'draft' covering over 90% of the genome. The BN rat sequence is the third complete mammalian genome to be deciphered, and three-way comparisons with the human and mouse genomes resolve details of mammalian evolution. This first comprehensive analysis includes genes and proteins and their relation to human disease, repeated sequences, comparative genome-wide studies of mammalian orthologous chromosomal regions and rearrangement breakpoints, reconstruction of ancestral karyotypes and the events leading to existing species, rates of variation, and lineage-specific and lineage-independent evolutionary events such as expansion of gene families, orthology relations and protein evolution.
Desulfovibrio vulgaris Hildenborough is a model organism for studying the energy metabolism of sulfate-reducing bacteria (SRB) and for understanding the economic impacts of SRB, including biocorrosion of metal infrastructure and bioremediation of toxic metal ions. The 3,570,858 base pair (bp) genome sequence reveals a network of novel c-type cytochromes, connecting multiple periplasmic hydrogenases and formate dehydrogenases, as a key feature of its energy metabolism. The relative arrangement of genes encoding enzymes for energy transduction, together with inferred cellular location of the enzymes, provides a basis for proposing an expansion to the 'hydrogen-cycling' model for increasing energy efficiency in this bacterium. Plasmid-encoded functions include modification of cell surface components, nitrogen fixation and a type-III protein secretion system. This genome sequence represents a substantial step toward the elucidation of pathways for reduction (and bioremediation) of pollutants such as uranium and chromium and offers a new starting point for defining this organism's complex anaerobic respiration.
Bacillus anthracis is the etiologic agent of anthrax, an acute fatal disease among mammals. It was thought to differ from Bacillus cereus, an opportunistic pathogen and cause of food poisoning, by the presence of plasmids pXO1 and pXO2, which encode the lethal toxin complex and the poly-gamma-d-glutamic acid capsule, respectively. This work describes a non-B. anthracis isolate that possesses the anthrax toxin genes and is capable of causing a severe inhalation anthrax-like illness. Although initial phenotypic and 16S rRNA analysis identified this isolate as B. cereus, the rapid generation and analysis of a high-coverage draft genome sequence revealed the presence of a circular plasmid, named pBCXO1, with 99.6% similarity with the B. anthracis toxin-encoding plasmid, pXO1. Although homologues of the pXO2 encoded capsule genes were not found, a polysaccharide capsule cluster is encoded on a second, previously unidentified plasmid, pBC218. A/J mice challenged with B. cereus G9241 confirmed the virulence of this strain. These findings represent an example of how genomics could rapidly assist public health experts responding not only to clearly identified select agents but also to novel agents with similar pathogenic potentials. In this study, we combined a public health approach with genome analysis to provide insight into the correlation of phenotypic characteristics and their genetic basis.
The genomes of three strains of Listeria monocytogenes that have been associated with food-borne illness in the USA were subjected to whole genome comparative analysis. A total of 51, 97 and 69 strain-specific genes were identified in L. monocytogenes strains F2365 (serotype 4b, cheese isolate), F6854 (serotype 1/2a, frankfurter isolate) and H7858 (serotype 4b, meat isolate), respectively. Eighty-three genes were restricted to serotype 1/2a and 51 to serotype 4b strains. These strain- and serotype-specific genes probably contribute to observed differences in pathogenicity, and the ability of the organisms to survive and grow in their respective environmental niches. The serotype 1/2a-specific genes include an operon that encodes the rhamnose biosynthetic pathway that is associated with teichoic acid biosynthesis, as well as operons for five glycosyl transferases and an adenine-specific DNA methyltransferase. A total of 8,603 and 105,050 high quality single nucleotide polymorphisms (SNPs) were found on the draft genome sequences of strain H7858 and strain F6854, respectively, when compared with strain F2365. Whole genome comparative analyses revealed that the L. monocytogenes genomes are essentially syntenic, with the majority of genomic differences consisting of phage insertions, transposable elements and SNPs.
The complete genome sequence of Burkholderia mallei ATCC 23344 provides insight into this highly infectious bacterium's pathogenicity and evolutionary history. B. mallei, the etiologic agent of glanders, has come under renewed scientific investigation as a result of recent concerns about its past and potential future use as a biological weapon. Genome analysis identified a number of putative virulence factors whose function was supported by comparative genome hybridization and expression profiling of the bacterium in hamster liver in vivo. The genome contains numerous insertion sequence elements that have mediated extensive deletions and rearrangements of the genome relative to Burkholderia pseudomallei. The genome also contains a vast number (>12,000) of simple sequence repeats. Variation in simple sequence repeats in key genes can provide a mechanism for generating antigenic variation that may account for the mammalian host's inability to mount a durable adaptive immune response to a B. mallei infection.
We sequenced the complete genome of Bacillus cereus ATCC 10987, a non-lethal dairy isolate in the same genetic subgroup as Bacillus anthracis. Comparison of the chromosomes demonstrated that B. cereus ATCC 10987 was more similar to B. anthracis Ames than B. cereus ATCC 14579, while containing a number of unique metabolic capabilities such as urease and xylose utilization and lacking the ability to utilize nitrate and nitrite. Additionally, genetic mechanisms for variation of capsule carbohydrate and flagella surface structures were identified. Bacillus cereus ATCC 10987 contains a single large plasmid (pBc10987), of approximately 208 kb, that is similar in gene content and organization to B. anthracis pXO1 but is lacking the pathogenicity-associated island containing the anthrax lethal and edema toxin complex genes. The chromosomal similarity of B. cereus ATCC 10987 to B. anthracis Ames, as well as the fact that it contains a large pXO1-like plasmid, may make it a possible model for studying B. anthracis plasmid biology and regulatory cross-talk.
We present the complete 2,843,201-bp genome sequence of Treponema denticola (ATCC 35405), an oral spirochete associated with periodontal disease. Analysis of the T. denticola genome reveals factors mediating coaggregation, cell signaling, stress protection, and other competitive and cooperative measures, consistent with its pathogenic nature and lifestyle within the mixed-species environment of subgingival dental plaque. Comparisons with previously sequenced spirochete genomes revealed specific factors contributing to differences and similarities in spirochete physiology as well as pathogenic potential. The T. denticola genome is considerably larger in size than the genome of the related syphilis-causing spirochete Treponema pallidum. The differences in gene content appear to be attributable to a combination of three phenomena: genome reduction, lineage-specific expansions, and horizontal gene transfer. Genes lost due to reductive evolution appear to be largely involved in metabolism and transport, whereas some of the genes that have arisen due to lineage-specific expansions are implicated in various pathogenic interactions, and genes acquired via horizontal gene transfer are largely phage-related or of unknown function.
Methanotrophs are ubiquitous bacteria that can use the greenhouse gas methane as a sole carbon and energy source for growth, thus playing major roles in global carbon cycles, and in particular, substantially reducing emissions of biologically generated methane to the atmosphere. Despite their importance, and in contrast to organisms that play roles in other major parts of the carbon cycle such as photosynthesis, no genome-level studies have been published on the biology of methanotrophs. We report the first complete genome sequence to our knowledge from an obligate methanotroph, Methylococcus capsulatus (Bath), obtained by the shotgun sequencing approach. Analysis revealed a 3.3-Mb genome highly specialized for a methanotrophic lifestyle, including redundant pathways predicted to be involved in methanotrophy and duplicated genes for essential enzymes such as the methane monooxygenases. We used phylogenomic analysis, gene order information, and comparative analysis with the partially sequenced methylotroph Methylobacterium extorquens to detect genes of unknown function likely to be involved in methanotrophy and methylotrophy. Genome analysis suggests the ability of M. capsulatus to scavenge copper (including a previously unreported nonribosomal peptide synthetase) and to use copper in regulation of methanotrophy, but the exact regulatory mechanisms remain unclear. One of the most surprising outcomes of the project is evidence suggesting the existence of previously unsuspected metabolic flexibility in M. capsulatus, including an ability to grow on sugars, oxidize chemolithotrophic hydrogen and sulfur, and live under reduced oxygen tension, all of which have implications for methanotroph ecology. The availability of the complete genome of M. capsulatus (Bath) deepens our understanding of methanotroph biology and its relationship to global carbon cycles. We have gained evidence for greater metabolic flexibility than was previously known, and for genetic components that may have biotechnological potential.
We report the complete genome sequence of the model bacterial pathogen Pseudomonas syringae pathovar tomato DC3000 (DC3000), which is pathogenic on tomato and Arabidopsis thaliana. The DC3000 genome (6.5 megabases) contains a circular chromosome and two plasmids, which collectively encode 5,763 ORFs. We identified 298 established and putative virulence genes, including several clusters of genes encoding 31 confirmed and 19 predicted type III secretion system effector proteins. Many of the virulence genes were members of paralogous families and also were proximal to mobile elements, which collectively comprise 7% of the DC3000 genome. The bacterium possesses a large repertoire of transporters for the acquisition of nutrients, particularly sugars, as well as genes implicated in attachment to plant surfaces. Over 12% of the genes are dedicated to regulation, which may reflect the need for rapid adaptation to the diverse environments encountered during epiphytic growth and pathogenesis. Comparative analyses confirmed a high degree of similarity with two sequenced pseudomonads, Pseudomonas putida and Pseudomonas aeruginosa, yet revealed 1,159 genes unique to DC3000, of which 811 lack a known function.
The complete genome sequence of Geobacter sulfurreducens, a delta-proteobacterium, reveals unsuspected capabilities, including evidence of aerobic metabolism, one-carbon and complex carbon metabolism, motility, and chemotactic behavior. These characteristics, coupled with the possession of many two-component sensors and many c-type cytochromes, reveal an ability to create alternative, redundant, electron transport networks and offer insights into the process of metal ion reduction in subsurface environments. As well as playing roles in the global cycling of metals and carbon, this organism clearly has the potential for use in bioremediation of radioactive metals and in the generation of electricity.
The complete 2,343,479-bp genome sequence of the gram-negative, pathogenic oral bacterium Porphyromonas gingivalis strain W83, a major contributor to periodontal disease, was determined. Whole-genome comparative analysis with other available complete genome sequences confirms the close relationship between the Cytophaga-Flavobacteria-Bacteroides (CFB) phylum and the green-sulfur bacteria. Within the CFB phylum, the genomes most similar to that of P. gingivalis are those of Bacteroides thetaiotaomicron and B. fragilis. Outside of the CFB phylum the most similar genome to P. gingivalis is that of Chlorobium tepidum, supporting the previous phylogenetic studies that indicated that the Chlorobia and CFB phyla are related, albeit distantly. Genome analysis of strain W83 reveals a range of pathways and virulence determinants that relate to the novel biology of this oral pathogen. Among these determinants are at least six putative hemagglutinin-like genes and 36 previously unidentified peptidases. Genome analysis also reveals that P. gingivalis can metabolize a range of amino acids and generate a number of metabolic end products that are toxic to the human host or human gingival tissue and contribute to the development of periodontal disease.
The complete genome sequence of Enterococcus faecalis V583, a vancomycin-resistant clinical isolate, revealed that more than a quarter of the genome consists of probable mobile or foreign DNA. One of the predicted mobile elements is a previously unknown vanB vancomycin-resistance conjugative transposon. Three plasmids were identified, including two pheromone-sensing conjugative plasmids, one encoding a previously undescribed pheromone inhibitor. The apparent propensity for the incorporation of mobile elements probably contributed to the rapid acquisition and dissemination of drug resistance in the enterococci.
Bacillus anthracis is an endospore-forming bacterium that causes inhalational anthrax. Key virulence genes are found on plasmids (extra-chromosomal, circular, double-stranded DNA molecules) pXO1 (ref. 2) and pXO2 (ref. 3). To identify additional genes that might contribute to virulence, we analysed the complete sequence of the chromosome of B. anthracis Ames (about 5.23 megabases). We found several chromosomally encoded proteins that may contribute to pathogenicity--including haemolysins, phospholipases and iron acquisition functions--and identified numerous surface proteins that might be important targets for vaccines and drugs. Almost all these putative chromosomal virulence and surface proteins have homologues in Bacillus cereus, highlighting the similarity of B. anthracis to near-neighbours that are not associated with anthrax. By performing a comparative genome hybridization of 19 B. cereus and Bacillus thuringiensis strains against a B. anthracis DNA microarray, we confirmed the general similarity of chromosomal genes among this group of close relatives. However, we found that the gene sequences of pXO1 and pXO2 were more variable between strains, suggesting plasmid mobility in the group. The complete sequence of B. anthracis is a step towards a better understanding of anthrax pathogenesis.
The genome of Chlamydophila caviae (formerly Chlamydia psittaci, GPIC isolate) (1 173 390 nt with a plasmid of 7966 nt) was determined, representing the fourth species with a complete genome sequence from the Chlamydiaceae family of obligate intracellular bacterial pathogens. Of 1009 annotated genes, 798 were conserved in all three other completed Chlamydiaceae genomes. The C.caviae genome contains 68 genes that lack orthologs in any other completed chlamydial genomes, including tryptophan and thiamine biosynthesis determinants and a ribose-phosphate pyrophosphokinase, the product of the prsA gene. Notable amongst these was a novel member of the virulence-associated invasin/intimin family (IIF) of Gram-negative bacteria. Intriguingly, two authentic frameshift mutations in the ORF indicate that this gene is not functional. Many of the unique genes are found in the replication termination region (RTR or plasticity zone), an area of frequent symmetrical inversion events around the replication terminus shown to be a hotspot for genome variation in previous genome sequencing studies. In C.caviae, the RTR includes several loci of particular interest including a large toxin gene and evidence of ancestral insertion(s) of a bacteriophage. This toxin gene, not present in Chlamydia pneumoniae, is a member of the YopT effector family of type III-secreted cysteine proteases. One gene cluster (guaBA-add) in the RTR is much more similar to orthologs in Chlamydia muridarum than those in the phylogenetically closest species C.pneumoniae, suggesting the possibility of horizontal transfer of genes between the rodent-associated Chlamydiae. With most genes observed in the other chlamydial genomes represented, C.caviae provides a good model for the Chlamydiaceae and a point of comparison against the human atherosclerosis-associated C.pneumoniae. 
This crucial addition to the set of completed Chlamydiaceae genome sequences is enabling dissection of the roles played by niche-specific genes in these important bacterial pathogens.
The 1,995,275-bp genome of Coxiella burnetii, Nine Mile phase I RSA493, a highly virulent zoonotic pathogen and category B bioterrorism agent, was sequenced by the random shotgun method. This bacterium is an obligate intracellular acidophile that is highly adapted for life within the eukaryotic phagolysosome. Genome analysis revealed many genes with potential roles in adhesion, invasion, intracellular trafficking, host-cell modulation, and detoxification. A previously uncharacterized 13-member family of ankyrin repeat-containing proteins is implicated in the pathogenesis of this organism. Although the lifestyle and parasitic strategies of C. burnetii resemble those of Rickettsiae and Chlamydiae, their genome architectures differ considerably in terms of presence of mobile elements, extent of genome reduction, metabolic capabilities, and transporter profiles. The presence of 83 pseudogenes indicates an ongoing process of gene degradation. Unlike other obligate intracellular bacteria, 32 insertion sequences are found dispersed in the chromosome, indicating some plasticity in the C. burnetii genome. These analyses suggest that the obligate intracellular lifestyle of C. burnetii may be a relatively recent innovation.
Species of malaria parasite that infect rodents have long been used as models for malaria disease research. Here we report the whole-genome shotgun sequence of one species, Plasmodium yoelii yoelii, and comparative studies with the genome of the human malaria parasite Plasmodium falciparum clone 3D7. A synteny map of 2,212 P. y. yoelii contiguous DNA sequences (contigs) aligned to 14 P. falciparum chromosomes reveals marked conservation of gene synteny within the body of each chromosome. Of about 5,300 P. falciparum genes, more than 3,300 P. y. yoelii orthologues of predominantly metabolic function were identified. Over 800 copies of a variant antigen gene located in subtelomeric regions were found. This is the first genome sequence of a model eukaryotic parasite, and it provides insight into the use of such systems in the modelling of Plasmodium biology and disease.
The complete genome of the green-sulfur eubacterium Chlorobium tepidum TLS was determined to be a single circular chromosome of 2,154,946 bp. This represents the first genome sequence from the phylum Chlorobia, whose members perform anoxygenic photosynthesis by the reductive tricarboxylic acid cycle. Genome comparisons have identified genes in C. tepidum that are highly conserved among photosynthetic species. Many of these have no assigned function and may play novel roles in photosynthesis or photobiology. Phylogenomic analysis reveals likely duplications of genes involved in biosynthetic pathways for photosynthesis and the metabolism of sulfur and nitrogen as well as strong similarities between metabolic processes in C. tepidum and many Archaeal species.
Virulence and immunity are poorly understood in Mycobacterium tuberculosis. We sequenced the complete genome of the M. tuberculosis clinical strain CDC1551 and performed a whole-genome comparison with the laboratory strain H37Rv in order to identify polymorphic sequences with potential relevance to disease pathogenesis, immunity, and evolution. We found large-sequence and single-nucleotide polymorphisms in numerous genes. Polymorphic loci included a phospholipase C, a membrane lipoprotein, members of an adenylate cyclase gene family, and members of the PE/PPE gene family, some of which have been implicated in virulence or the host immune response. Several gene families, including the PE/PPE gene family, also had significantly higher synonymous and nonsynonymous substitution frequencies compared to the genome as a whole. We tested a large sample of M. tuberculosis clinical isolates for a subset of the large-sequence and single-nucleotide polymorphisms and found widespread genetic variability at many of these loci. We performed phylogenetic and epidemiological analysis to investigate the evolutionary relationships among isolates and the origins of specific polymorphic loci. A number of these polymorphisms appear to have occurred multiple times as independent events, suggesting that these changes may be under selective pressure. Together, these results demonstrate that polymorphisms among M. tuberculosis strains are more extensive than initially anticipated, and genetic variation may have an important role in disease pathogenesis and immunity.
The mosquito-borne malaria parasite Plasmodium falciparum kills an estimated 0.7-2.7 million people every year, primarily children in sub-Saharan Africa. Without effective interventions, a variety of factors, including the spread of parasites resistant to antimalarial drugs and the increasing insecticide resistance of mosquitoes, may cause the number of malaria cases to double over the next two decades. To stimulate basic research and facilitate the development of new drugs and vaccines, the genome of Plasmodium falciparum clone 3D7 has been sequenced using a chromosome-by-chromosome shotgun strategy. We report here the nucleotide sequences of chromosomes 10, 11 and 14, and a re-analysis of the chromosome 2 sequence. These chromosomes represent about 35% of the 23-megabase P. falciparum genome.
The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of malaria, and kills more than one million African children annually. Here we report an analysis of the genome sequence of P. falciparum clone 3D7. The 23-megabase nuclear genome consists of 14 chromosomes, encodes about 5,300 genes, and is the most (A + T)-rich genome sequenced to date. Genes involved in antigenic variation are concentrated in the subtelomeric regions of the chromosomes. Compared to the genomes of free-living eukaryotic microbes, the genome of this intracellular parasite encodes fewer enzymes and transporters, but a large proportion of genes are devoted to immune evasion and host-parasite interactions. Many nuclear-encoded proteins are targeted to the apicoplast, an organelle involved in fatty-acid and isoprenoid metabolism. The genome sequence provides the foundation for future studies of this organism, and is being exploited in the search for new drugs and vaccines to fight malaria.
Shewanella oneidensis is an important model organism for bioremediation studies because of its diverse respiratory capabilities, conferred in part by multicomponent, branched electron transport systems. Here we report the sequencing of the S. oneidensis genome, which consists of a 4,969,803-base pair circular chromosome with 4,758 predicted protein-encoding open reading frames (CDS) and a 161,613-base pair plasmid with 173 CDSs. We identified the first Shewanella lambda-like phage, providing a potential tool for further genome engineering. Genome analysis revealed 39 c-type cytochromes, including 32 previously unidentified in S. oneidensis, and a novel periplasmic [Fe] hydrogenase, which are integral members of the electron transport system. This genome sequence represents a critical step in the elucidation of the pathways for reduction (and bioremediation) of pollutants such as uranium (U) and chromium (Cr), and offers a starting point for defining this organism's complex electron transport systems and metal ion-reducing capabilities.
Anopheles gambiae is the principal vector of malaria, a disease that afflicts more than 500 million people and causes more than 1 million deaths each year. Tenfold shotgun sequence coverage was obtained from the PEST strain of A. gambiae and assembled into scaffolds that span 278 million base pairs. A total of 91% of the genome was organized in 303 scaffolds; the largest scaffold was 23.1 million base pairs. There was substantial genetic variation within this strain, and the apparent existence of two haplotypes of approximately equal frequency ("dual haplotypes") in a substantial fraction of the genome likely reflects the outbred nature of the PEST strain. The sequence produced a conservative inference of more than 400,000 single-nucleotide polymorphisms that showed a markedly bimodal density distribution. Analysis of the genome sequence revealed strong evidence for about 14,000 protein-encoding transcripts. Prominent expansions in specific families of proteins likely involved in cell adhesion and immunity were noted. An expressed sequence tag analysis of genes regulated by blood feeding provided insights into the physiological adaptations of a hematophagous insect.
Pseudomonas putida is a metabolically versatile saprophytic soil bacterium that has been certified as a biosafety host for the cloning of foreign genes. The bacterium also has considerable potential for biotechnological applications. Sequence analysis of the 6.18 Mb genome of strain KT2440 reveals diverse transport and metabolic systems. Although there is a high level of genome conservation with the pathogenic Pseudomonad Pseudomonas aeruginosa (85% of the predicted coding regions are shared), key virulence factors including exotoxin A and type III secretion systems are absent. Analysis of the genome gives insight into the non-pathogenic nature of P. putida and points to potential new applications in agriculture, biocatalysis, bioremediation and bioplastic production.
The 3.31-Mb genome sequence of the intracellular pathogen and potential bioterrorism agent, Brucella suis, was determined. Comparison of B. suis with Brucella melitensis has defined a finite set of differences that could be responsible for the differences in virulence and host preference between these organisms, and indicates that phage have played a significant role in their divergence. Analysis of the B. suis genome reveals transport and metabolic capabilities akin to soil/plant-associated bacteria. Extensive gene synteny between B. suis chromosome 1 and the genome of the plant symbiont Mesorhizobium loti emphasizes the similarity between this animal pathogen and plant pathogens and symbionts. A limited repertoire of genes homologous to known bacterial virulence factors was identified.
Comparison of the whole-genome sequence of Bacillus anthracis isolated from a victim of a recent bioterrorist anthrax attack with a reference reveals 60 new markers that include single nucleotide polymorphisms (SNPs), inserted or deleted sequences, and tandem repeats. Genome comparison detected four high-quality SNPs between the two sequenced B. anthracis chromosomes and seven differences among different preparations of the reference genome. These markers have been tested on a collection of anthrax isolates and were found to divide these samples into distinct families. These results demonstrate that genome-based analysis of microbial pathogens will provide a powerful new tool for investigation of infectious disease outbreaks.
The 2,160,267 bp genome sequence of Streptococcus agalactiae, the leading cause of bacterial sepsis, pneumonia, and meningitis in neonates in the U.S. and Europe, is predicted to encode 2,175 genes. Genome comparisons among S. agalactiae, Streptococcus pneumoniae, Streptococcus pyogenes, and the other completely sequenced genomes identified genes specific to the streptococci and to S. agalactiae. These in silico analyses, combined with comparative genome hybridization experiments between the sequenced serotype V strain 2603 V/R and 19 S. agalactiae strains from several serotypes using whole-genome microarrays, revealed the genetic heterogeneity among S. agalactiae strains, even of the same serotype, and provided insights into the evolution of virulence mechanisms.
The complete genome sequence of Caulobacter crescentus was determined to be 4,016,942 base pairs in a single circular chromosome encoding 3,767 genes. This organism, which grows in a dilute aquatic environment, coordinates the cell division cycle and multiple cell differentiation events. With the annotated genome sequence, a full description of the genetic network that controls bacterial differentiation, cell growth, and cell cycle progression is within reach. Two-component signal transduction proteins are known to play a significant role in cell cycle progression. Genome analysis revealed that the C. crescentus genome encodes a significantly higher number of these signaling proteins (105) than any bacterial genome sequenced thus far. Another regulatory mechanism involved in cell cycle progression is DNA methylation. The occurrence of the recognition sequence for an essential DNA methylating enzyme that is required for cell cycle regulation is severely limited and shows a bias to intergenic regions. The genome contains multiple clusters of genes encoding proteins essential for survival in a nutrient poor habitat. Included are those involved in chemotaxis, outer membrane channel function, degradation of aromatic ring compounds, and the breakdown of plant-derived carbon sources, in addition to many extracytoplasmic function sigma factors, providing the organism with the ability to respond to a wide range of environmental fluctuations. C. crescentus is, to our knowledge, the first free-living alpha-class proteobacterium to be sequenced and will serve as a foundation for exploring the biology of this group of bacteria, which includes the obligate endosymbiont and human pathogen Rickettsia prowazekii, the plant pathogen Agrobacterium tumefaciens, and the bovine and human pathogen Brucella abortus.
The 2,160,837-base pair genome sequence of an isolate of Streptococcus pneumoniae, a Gram-positive pathogen that causes pneumonia, bacteremia, meningitis, and otitis media, contains 2236 predicted coding regions; of these, 1440 (64%) were assigned a biological role. Approximately 5% of the genome is composed of insertion sequences that may contribute to genome rearrangements through uptake of foreign DNA. Extracellular enzyme systems for the metabolism of polysaccharides and hexosamines provide a substantial source of carbon and nitrogen for S. pneumoniae and also damage host tissues and facilitate colonization. A motif identified within the signal peptide of proteins is potentially involved in targeting these proteins to the cell surface of low-guanine/cytosine (GC) Gram-positive species. Several surface-exposed proteins that may serve as potential vaccine candidates were identified. Comparative genome hybridization with DNA arrays revealed strain differences in S. pneumoniae that could contribute to differences in virulence and antigenicity.
Here we determine the complete genomic sequence of the Gram-negative, gamma-Proteobacterium Vibrio cholerae El Tor N16961 to be 4,033,460 base pairs (bp). The genome consists of two circular chromosomes of 2,961,146 bp and 1,072,314 bp that together encode 3,885 open reading frames. The vast majority of recognizable genes for essential cell functions (such as DNA replication, transcription, translation and cell-wall biosynthesis) and pathogenicity (for example, toxins, surface antigens and adhesins) are located on the large chromosome. In contrast, the small chromosome contains a larger fraction (59%) of hypothetical genes compared with the large chromosome (42%), and also contains many more genes that appear to have origins other than the gamma-Proteobacteria. The small chromosome also carries a gene capture system (the integron island) and host 'addiction' genes that are typically found on plasmids; thus, the small chromosome may have originally been a megaplasmid that was captured by an ancestral Vibrio species. The V. cholerae genomic sequence provides a starting point for understanding how a free-living, environmental organism emerged to become a significant human bacterial pathogen.
The genome sequences of Chlamydia trachomatis mouse pneumonitis (MoPn) strain Nigg (1 069 412 nt) and Chlamydia pneumoniae strain AR39 (1 229 853 nt) were determined using a random shotgun strategy. The MoPn genome exhibited a general conservation of gene order and content with the previously sequenced C.trachomatis serovar D. Differences between C.trachomatis strains were focused on an approximately 50 kb 'plasticity zone' near the termination origins. In this region MoPn contained three copies of a novel gene encoding a >3000 amino acid toxin homologous to a predicted toxin from Escherichia coli O157:H7 but had apparently lost the tryptophan biosynthesis genes found in serovar D in this region. The C.pneumoniae AR39 chromosome was >99.9% identical to the previously sequenced C.pneumoniae CWL029 genome; however, comparative analysis identified an invertible DNA segment upstream of the uridine kinase gene which was in different orientations in the two genomes. AR39 also contained a novel 4524 nt circular single-stranded (ss)DNA bacteriophage, the first time a virus has been reported infecting C.pneumoniae. Although the chlamydial genomes were highly conserved, there were intriguing differences in key nucleotide salvage pathways: C.pneumoniae has a uridine kinase gene for dUTP production, MoPn has a uracil phosphoribosyl transferase, while C.trachomatis serovar D contains neither gene. Chromosomal comparison revealed that there had been multiple large inversion events since the species divergence of C.trachomatis and C.pneumoniae, apparently oriented around the axis of the origin of replication and the termination region. The striking synteny of the Chlamydia genomes and prevalence of tandemly duplicated genes are evidence of minimal chromosome rearrangement and foreign gene uptake, presumably owing to the ecological isolation of the obligate intracellular parasites. 
In the absence of genetic analysis, comparative genomics will continue to provide insight into the virulence mechanisms of these important human pathogens.
Arabidopsis thaliana is an important model system for plant biologists. In 1996 an international collaboration (the Arabidopsis Genome Initiative) was formed to sequence the whole genome of Arabidopsis and in 1999 the sequence of the first two chromosomes was reported. The sequence of the last three chromosomes and an analysis of the whole genome are reported in this issue. Here we present the sequence of chromosome 3, organized into four sequence segments (contigs). The two largest (13.5 and 9.2 Mb) correspond to the top (long) and the bottom (short) arms of chromosome 3, and the two small contigs are located in the genetically defined centromere. This chromosome encodes 5,220 of the roughly 25,500 predicted protein-coding genes in the genome. About 20% of the predicted proteins have significant homology to proteins in eukaryotic genomes for which the complete sequence is available, pointing to important conserved cellular functions among eukaryotes.
The 2,272,351-base pair genome of Neisseria meningitidis strain MC58 (serogroup B), a causative agent of meningitis and septicemia, contains 2158 predicted coding regions, 1158 (53.7%) of which were assigned a biological role. Three major islands of horizontal DNA transfer were identified; two of these contain genes encoding proteins involved in pathogenicity, and the third island contains coding sequences only for hypothetical proteins. Insights into the commensal and virulence behavior of N. meningitidis can be gleaned from the genome, in which sequences for structural proteins of the pilus are clustered and several coding regions unique to serogroup B capsular polysaccharide synthesis can be identified. Finally, N. meningitidis contains more genes that undergo phase variation than any pathogen studied to date, a mechanism that controls their expression and contributes to the evasion of the host immune system.
The genome of the flowering plant Arabidopsis thaliana has five chromosomes. Here we report the sequence of the largest, chromosome 1, in two contigs of around 14.2 and 14.6 megabases. The contigs extend from the telomeres to the centromeric borders, regions rich in transposons, retrotransposons and repetitive elements such as the 180-base-pair repeat. The chromosome represents 25% of the genome and contains about 6,850 open reading frames, 236 transfer RNAs (tRNAs) and 12 small nuclear RNAs. There are two clusters of tRNA genes at different places on the chromosome. One consists of 27 tRNA(Pro) genes and the other contains 27 tandem repeats of tRNA(Tyr)-tRNA(Tyr)-tRNA(Ser) genes. Chromosome 1 contains about 300 gene families with clustered duplications. There are also many repeat elements, representing 8% of the sequence.
An international consortium has been formed to sequence the entire genome of the human malaria parasite Plasmodium falciparum. We sequenced chromosome 2 of clone 3D7 using a shotgun sequencing strategy. Chromosome 2 is 947 kb in length, has a base composition of 80.2% A + T, and contains 210 predicted genes. In comparison to the Saccharomyces cerevisiae genome, chromosome 2 has a lower gene density, a greater proportion of genes containing introns, and nearly twice as many proteins containing predicted non-globular domains. A group of putative surface proteins was identified, rifins, which are encoded by a gene family comprising up to 7% of the protein-encoding genes in the genome. The rifins exhibit considerable sequence diversity and may play an important role in antigenic variation. Sixteen genes encoded on chromosome 2 showed signs of a plastid or mitochondrial origin, including several genes involved in fatty acid biosynthesis. Completion of the chromosome 2 sequence demonstrated that the A + T-rich genome of P. falciparum can be sequenced by the shotgun approach. Within 2-3 years, the sequence of almost all P. falciparum genes will have been determined, paving the way for genetic, biochemical, and immunological research aimed at developing new drugs and vaccines against malaria.
Arabidopsis thaliana (Arabidopsis) is unique among plant model organisms in having a small genome (130-140 Mb), excellent physical and genetic maps, and little repetitive DNA. Here we report the sequence of chromosome 2 from the Columbia ecotype in two gap-free assemblies (contigs) of 3.6 and 16 megabases (Mb). The latter represents the longest published stretch of uninterrupted DNA sequence assembled from any organism to date. Chromosome 2 represents 15% of the genome and encodes 4,037 genes, 49% of which have no predicted function. Roughly 250 tandem gene duplications were found in addition to large-scale duplications of about 0.5 and 4.5 Mb between chromosomes 2 and 1 and between chromosomes 2 and 4, respectively. Sequencing of nearly 2 Mb within the genetically defined centromere revealed a low density of recognizable genes, and a high density and diverse range of vestigial and presumably inactive mobile elements. More unexpected is what appears to be a recent insertion of a continuous stretch of 75% of the mitochondrial genome into chromosome 2.
The 1,860,725-base-pair genome of Thermotoga maritima MSB8 contains 1,877 predicted coding regions, 1,014 (54%) of which have functional assignments and 863 (46%) of which are of unknown function. Genome analysis reveals numerous pathways involved in degradation of sugars and plant polysaccharides, and 108 genes that have orthologues only in the genomes of other thermophilic Eubacteria and Archaea. Of the Eubacteria sequenced to date, T. maritima has the highest percentage (24%) of genes that are most similar to archaeal genes. Eighty-one archaeal-like genes are clustered in 15 regions of the T. maritima genome that range in size from 4 to 20 kilobases. Conservation of gene order between T. maritima and Archaea in many of the clustered regions suggests that lateral gene transfer may have occurred between thermophilic Eubacteria and Archaea.
The complete genome sequence of the radiation-resistant bacterium Deinococcus radiodurans R1 is composed of two chromosomes (2,648,638 and 412,348 base pairs), a megaplasmid (177,466 base pairs), and a small plasmid (45,704 base pairs), yielding a total genome of 3,284,156 base pairs. Multiple components distributed on the chromosomes and megaplasmid that contribute to the ability of D. radiodurans to survive under conditions of starvation, oxidative stress, and high amounts of DNA damage were identified. Deinococcus radiodurans represents an organism in which all systems for DNA repair, DNA damage export, desiccation and starvation recovery, and genetic redundancy are present in one cell.
The complete genome sequence of Treponema pallidum was determined and shown to be 1,138,006 base pairs containing 1041 predicted coding sequences (open reading frames). Systems for DNA replication, transcription, translation, and repair are intact, but catabolic and biosynthetic activities are minimized. The number of identifiable transporters is small, and no phosphoenolpyruvate:phosphotransferase carbohydrate transporters were found. Potential virulence factors include a family of 12 potential membrane proteins and several putative hemolysins. Comparison of the T. pallidum genome sequence with that of another pathogenic spirochete, Borrelia burgdorferi, the agent of Lyme disease, identified unique and common genes and substantiates the considerable diversity observed among pathogenic spirochetes.
Chromosome 2 of Plasmodium falciparum was sequenced; this sequence contains 947,103 base pairs and encodes 210 predicted genes. In comparison with the Saccharomyces cerevisiae genome, chromosome 2 has a lower gene density, introns are more frequent, and proteins are markedly enriched in nonglobular domains. A family of surface proteins, rifins, that may play a role in antigenic variation was identified. The complete sequencing of chromosome 2 has shown that sequencing of the A+T-rich P. falciparum genome is technically feasible.
The genome of the bacterium Borrelia burgdorferi B31, the aetiologic agent of Lyme disease, contains a linear chromosome of 910,725 base pairs and at least 17 linear and circular plasmids with a combined size of more than 533,000 base pairs. The chromosome contains 853 genes encoding a basic set of proteins for DNA replication, transcription, translation, solute transport and energy metabolism, but, like Mycoplasma genitalium, it contains no genes for cellular biosynthetic reactions. Because B. burgdorferi and M. genitalium are distantly related eubacteria, we suggest that their limited metabolic capacities reflect convergent evolution by gene loss from more metabolically competent progenitors. Of 430 genes on 11 plasmids, most have no known biological function; 39% of plasmid genes are paralogues that form 47 gene families. The biological significance of the multiple plasmid-encoded genes is not clear, although they may be involved in antigenic variation or immune evasion.
Archaeoglobus fulgidus is the first sulphur-metabolizing organism to have its genome sequence determined. Its genome of 2,178,400 base pairs contains 2,436 open reading frames (ORFs). The information processing systems and the biosynthetic pathways for essential components (nucleotides, amino acids and cofactors) have extensive correlation with their counterparts in the archaeon Methanococcus jannaschii. The genomes of these two Archaea indicate dramatic differences in the way these organisms sense their environment, perform regulatory and transport functions, and gain energy. In contrast to M. jannaschii, A. fulgidus has fewer restriction-modification systems, and none of its genes appears to contain inteins. A quarter (651 ORFs) of the A. fulgidus genome encodes functionally uncharacterized yet conserved proteins, two-thirds of which are shared with M. jannaschii (428 ORFs). Another quarter of the genome encodes new proteins indicating substantial archaeal gene diversity.
Helicobacter pylori, strain 26695, has a circular genome of 1,667,867 base pairs and 1,590 predicted coding sequences. Sequence analysis indicates that H. pylori has well-developed systems for motility, for scavenging iron, and for DNA restriction and modification. Many putative adhesins, lipoproteins and other outer membrane proteins were identified, underscoring the potential complexity of host-pathogen interaction. Based on the large number of sequence-related genes encoding outer membrane proteins and the presence of homopolymeric tracts and dinucleotide repeats in coding sequences, H. pylori, like several other mucosal pathogens, probably uses recombination and slipped-strand mispairing within repeats as mechanisms for antigenic variation and adaptive evolution. Consistent with its restricted niche, H. pylori has a few regulatory networks, and a limited metabolic repertoire and biosynthetic capacity. Its survival in acid conditions depends, in part, on its ability to establish a positive inside-membrane potential in low pH.
        
Title: Alanine scanning mutagenesis of conserved arginine/lysine-arginine/lysine-X-X-arginine/lysine G protein-activating motifs on m1 muscarinic acetylcholine receptors Lee NH, Geoghagen NS, Cheng E, Cline RT, Fraser CM Ref: Molecular Pharmacology, 50:140, 1996 : PubMed
Alanine scanning mutagenesis of B-B-X-X-B motifs (where B is a basic residue and X is any nonbasic residue) in m1 muscarinic acetylcholine receptors was performed to determine the relative roles of basic amino acids in receptor coupling. This conserved motif is found in many G protein-coupled receptors and has been implicated in G protein activation. The KKAAR365 motif, located at the carboxyl-terminal third intracellular loop of m1 receptors, was mutated to AAAAA365, thereby generating a triple-substitution mutant devoid of the ability to stimulate either phosphoinositide (PI) hydrolysis or cAMP accumulation. In contrast, a triple-alanine substitution of the KRTPR140 motif in the carboxyl-terminal second intracellular loop, yielding mutant AATPA140, had no effect on receptor coupling to the two independent second messenger pathways. Analysis of a series of single- and double-substitution mutants demonstrated that all three basic residues of the KKAAR365 motif participate in efficient m1 receptor coupling. The presence of the second and third basic residues in this motif was absolutely critical for full agonist recognition of a high and low affinity state of the receptor. Mutation of either Lys362 or Arg365, but not Lys361, abolished guanine nucleotide-dependent conversion of agonist affinity states and correlated with an inability of full agonists to fully activate PI hydrolysis. The different combinatorial double-substitution mutants also revealed that Arg365 was necessary but not sufficient, in the context of the KKAAR365 motif, for efficient receptor coupling. This residue cannot facilitate full agonist-stimulated PI hydrolysis in the absence of both Lys361 and Lys362. In comparison, the critical residue Lys362 was both necessary and sufficient. Substitution of nearby basic residues Lys361 and Arg365 with alanine yielded mutant AKAAA365, which exhibited partial ability to couple PI hydrolysis after full agonist stimulation. Therefore, Arg365 seems to function in a hierarchical (interdependent) manner with nearby basic residues, whereas Lys361 and Lys362 can act independently of surrounding basic residues to facilitate partial m1 receptor coupling after full agonist stimulation. In contrast, all three residues must be present for stimulation of PI hydrolysis by a partial agonist.
An approach for genome analysis based on sequencing and assembly of unselected pieces of DNA from the whole chromosome has been applied to obtain the complete nucleotide sequence (1,830,137 base pairs) of the genome from the bacterium Haemophilus influenzae Rd. This approach eliminates the need for initial mapping efforts and is therefore applicable to the vast array of microbial species for which genome maps are unavailable. The H. influenzae Rd genome sequence (Genome Sequence DataBase accession number L42023) represents the only complete genome sequence from a free-living organism.
The complete nucleotide sequence (580,070 base pairs) of the Mycoplasma genitalium genome, the smallest known genome of any free-living organism, has been determined by whole-genome random sequencing and assembly. A total of only 470 predicted coding regions were identified that include genes required for DNA replication, transcription and translation, DNA repair, cellular transport, and energy metabolism. Comparison of this genome to that of Haemophilus influenzae suggests that differences in genome content are reflected as profound differences in physiology and metabolic capacity between these two organisms.
        
Title: Regulation of muscarinic receptor expression by changes in mRNA stability Fraser CM, Lee NH Ref: Life Sciences, 56(11-12):899, 1995 : PubMed
Regulation of muscarinic acetylcholine receptor (mAChR) subtype mRNAs was investigated in the human neuroblastoma cell line IMR-32 and in transfected CHO cells. IMR-32 cells express both m1 and m3 subtypes of mAChR. Exposure of IMR-32 cells to the muscarinic agonist carbamylcholine (CBC) led to a time-dependent down-regulation of mAChRs that was maximal by 9 hours. mAChR activation resulted in a differential regulation of mAChR subtype mRNAs. m1 mAChR mRNA was down-regulated following 12 hours of agonist treatment, and this was associated with a decreased stability of the receptor transcript. In contrast, the m3 mAChR mRNA was resistant to agonist treatment for up to 24 hours. Using transfected CHO cells, we identified sequence elements within the 3'-untranslated region (3'-UTR) of the m1 mAChR gene that dictate agonist-induced destabilization of the m1 mAChR mRNA. Removal of these sequences abolished the ability of chronic agonist exposure to destabilize m1 mAChR mRNA. These findings suggest that sequence-specific differences between m1 and m3 mAChR subtypes, which both preferentially couple to hydrolysis of phosphoinositides, may be responsible for differences in the regulation of mAChR gene expression.
        
Title: Discrete activation of transduction pathways associated with acetylcholine m1 receptor by several muscarinic ligands Gurwitz D, Haring R, Heldman E, Fraser CM, Manor D, Fisher A Ref: European Journal of Pharmacology, 267:21, 1994 : PubMed
Activation of transfected muscarinic m1 acetylcholine receptors (m1AChR) has been linked to several signal transduction pathways which include phosphoinositide hydrolysis, arachidonic acid release and cAMP accumulation. In Chinese hamster ovary cells stably transfected with the rat m1AChR gene, carbachol elicited all three responses with EC50 values of 2.6, 3.8 and 76 microM, respectively. However, pilocarpine and the selective muscarinic agonist AF102B activated phosphoinositide hydrolysis (by 94 and 27% vs. carbachol, respectively), while antagonizing carbachol-mediated cAMP accumulation. Carbachol also activated (by 4-fold) adenylyl cyclase in membranes prepared from these cells, indicating independence of this signal from intracellular mediators. Moreover, carbachol and AF102B similarly elevated cytosolic Ca2+ in intact m1AChR-transfected cells. The ligand-selective cAMP accumulation, its independence from Ca2+ and the carbachol-activated adenylyl cyclase in membranes suggest that it represents an independent m1AChR-mediated signal, unrelated to phosphoinositide hydrolysis. Selective muscarinic ligands such as AF102B may independently activate distinct signalling pathways, which may be important for designing cholinergic replacement therapy for treating Alzheimer's disease.
        
Title: Agonist-mediated destabilization of m1 muscarinic acetylcholine receptor mRNA. Elements involved in mRNA stability are localized in the 3'-untranslated region Lee NH, Earle-Hughes J, Fraser CM Ref: Journal of Biological Chemistry, 269:4291, 1994 : PubMed
The effects of chronic agonist exposure on receptor number (down-regulation) have been shown, in part, to be due to effects on mRNA levels. Agonist-mediated effects on muscarinic acetylcholine receptor (mAChR) mRNA were investigated in Chinese hamster ovary (CHO) cells stably transfected with m1 mAChR gene constructs containing the open reading frame and a series of deletions of the flanking 3'-untranslated region (3'-UTR). Carbachol (CBC) down-regulated m1 mAChRs encoded by the construct m1C1, an m1 mAChR transcript containing the entire flanking 3'-UTR (nucleotides (nt) 1526-2622), in a time-dependent fashion with maximal decreases occurring by 12 h. Steady-state levels of m1C1 mRNA declined in a parallel fashion beginning 6 h after CBC pretreatment. Similar findings were obtained with m1C2, a construct that is missing all but 261 bases of flanking 3'-UTR (nt 1526-1786). Since the rate of mRNA degradation represents an important potential regulatory mechanism to control the level of gene expression, we investigated the effects of CBC treatment on m1C1 and m1C2 mRNA stability. The half-life of either transcript in untreated cells was approximately 14 h, whereas m1C1 and m1C2 transcript half-lives decreased to approximately 3 h in cells treated with CBC. Agonist-induced destabilization of m1C2 mRNA could be mimicked by phorbol esters in a concentration-dependent manner and blocked by the protein kinase inhibitor H-7. In contrast, m1 mAChR mRNA constructs missing nt 1526-1786 of the 3'-UTR (m1C3 and m1C4) did not undergo agonist- or phorbol ester-induced destabilization. In the neuroblastoma cell line IMR-32, endogenous m1 mAChR mRNA was down-regulated and destabilized following CBC treatment. These results demonstrate that agonist-induced mRNA destabilization is a potential mechanism for regulating m1 mAChR levels. Furthermore, deletion studies identify a 261-base region of the 3'-UTR that has the potential to form stable stem-loop structures and likely harbors the element(s) responsible for message destabilization.
        
Title: Cross-talk between m1 muscarinic acetylcholine and beta 2-adrenergic receptors. cAMP and the third intracellular loop of m1 muscarinic receptors confer heterologous regulation Lee NH, Fraser CM Ref: Journal of Biological Chemistry, 268:7949, 1993 : PubMed
Genes encoding the m1 muscarinic (m1 mAChR) and beta 2-adrenergic receptors (beta 2AR) were stably co-expressed in Chinese hamster ovary (CHO) cells to study receptor regulation and cross-talk. Persistent activation of the beta 2AR/adenylate cyclase pathway by isoproterenol leads to heterologous desensitization, internalization, and down-regulation of the m1 mAChR that is comparable to, but smaller in magnitude than, that seen with persistent activation of the m1 mAChR by carbachol. This heterologous effect was mimicked by dibutyryl cAMP and forskolin and antagonized by the protein kinase A (PKA) inhibitor H-8. A potential consensus sequence for phosphorylation by PKA (Lys351-Arg-Lys-Thr354) exists on the third intracellular loop of the m1 mAChR, suggesting that receptor phosphorylation by PKA may be involved in heterologous regulation. The loss of m1 mAChRs induced by carbachol was not reversed by H-8, indicating that homologous regulation is not dependent on PKA. Recent evidence suggests that muscarinic agonist-mediated internalization of the m1 mAChR involves the third intracellular loop (i3) (Maeda, S., Lameh, J., Mallet, W. G., Philip, M., Ramachandran, J., and Sadee, W. (1990) FEBS Lett. 269, 386-388). Three deletion mutant receptors were constructed in which the majority, or small regions, of i3 were eliminated but the membrane-proximal portions of the loop were left intact. Each of the mutants was co-expressed with the beta 2AR in CHO cells. A small region in i3 was identified that is crucial for carbachol- and isoproterenol-promoted internalization and down-regulation. This region contains a series of 6 serine residues within an 8-amino acid stretch. A similar domain has been identified in the carboxyl tail of the beta 2AR and has been proposed to participate in receptor internalization (Hausdorff, W. P., Campbell, P. T., Ostrowski, J., Yu, S. S., Caron, M. G., and Lefkowitz, R. J. (1991) Proc. Natl. Acad. Sci. U.S.A. 88, 2979-2983).
        
Title: Poster: Post-transcriptional regulation of the m1 muscarinic acetylcholine receptor Lee NH, Fraser CM Ref: Life Sciences, 52(5-6):562, 1993 : PubMed