The current standard of care for treating Alzheimer's disease is acetylcholinesterase inhibitors, which nonselectively increase cholinergic signaling by indirectly enhancing activity of nicotinic and muscarinic receptors. These drugs improve cognitive function in patients, but also produce unwanted side effects that limit their efficacy. In an effort to selectively improve cognition and avoid the cholinergic side effects associated with the standard of care, various efforts have been aimed at developing selective M1 muscarinic receptor activators. In this work, we describe the preclinical and clinical pharmacodynamic effects of the M1 muscarinic receptor positive allosteric modulator, MK-7622. MK-7622 attenuated the cognitive-impairing effects of the muscarinic receptor antagonist scopolamine and altered quantitative electroencephalography (qEEG) in both rhesus macaques and humans. For both scopolamine reversal and qEEG, the effective exposures were similar between species. However, across species the minimum effective exposures to attenuate the scopolamine impairment were lower than for qEEG. Additionally, there were differences in the spectral power changes produced by MK-7622 in rhesus versus human. In sum, these results are the first to demonstrate translation of preclinical cognition and target modulation to clinical effects in humans for a selective M1 muscarinic receptor positive allosteric modulator.
Toxoplasma gondii is a zoonotic protozoan parasite which infects nearly one third of the human population and is found in an extraordinary range of vertebrate hosts. Its epidemiology depends heavily on horizontal transmission, especially between rodents and its definitive host, the cat. Neospora caninum is a recently discovered close relative of Toxoplasma, whose definitive host is the dog. Both species are tissue-dwelling Coccidia and members of the phylum Apicomplexa; they share many common features, but Neospora neither infects humans nor shares the same wide host range as Toxoplasma; rather, it shows a striking preference for highly efficient vertical transmission in cattle. These species therefore provide a remarkable opportunity to investigate mechanisms of host restriction, transmission strategies, virulence and zoonotic potential. We sequenced the genome of N. caninum and transcriptomes of the invasive stage of both species, undertaking an extensive comparative genomics and transcriptomics analysis. We estimate that these organisms diverged from their common ancestor around 28 million years ago and find that both genomes and gene expression are remarkably conserved. However, in N. caninum we identified an unexpected expansion of surface antigen gene families and the divergence of secreted virulence factors, including rhoptry kinases. Specifically, we show that the rhoptry kinase ROP18 is pseudogenised in N. caninum and that, as a possible consequence, Neospora is unable to phosphorylate host immunity-related GTPases, as Toxoplasma does. This defense strategy is thought to be key to virulence in Toxoplasma. We conclude that the ecological niches occupied by these species are influenced by a relatively small number of gene products which operate at the host-parasite interface and that the dominance of vertical transmission in N. caninum may be associated with the evolution of reduced virulence in this species.
Leishmania parasites cause a spectrum of clinical pathology in humans ranging from disfiguring cutaneous lesions to fatal visceral leishmaniasis. We have generated a reference genome for Leishmania mexicana and refined the reference genomes for Leishmania major, Leishmania infantum, and Leishmania braziliensis. This has allowed the identification of a remarkably low number of genes or paralog groups (2, 14, 19, and 67, respectively) unique to one species. These were found to be conserved in additional isolates of the same species. We have predicted allelic variation and find that in these isolates, L. major and L. infantum have a surprisingly low number of predicted heterozygous SNPs compared with L. braziliensis and L. mexicana. We used short read coverage to infer ploidy and gene copy numbers, identifying large copy number variations between species, with 200 tandem gene arrays in L. major and 132 in L. mexicana. Chromosome copy number also varied significantly between species, with nine supernumerary chromosomes in L. infantum, four in L. mexicana, two in L. braziliensis, and one in L. major. A significant bias against gene arrays on supernumerary chromosomes was shown to exist, indicating that duplication events occur more frequently on disomic chromosomes. Taken together, our data demonstrate that there is little variation in unique gene content across Leishmania species, but large-scale genetic heterogeneity can result through gene amplification on disomic chromosomes and variation in chromosome number. Increased gene copy number due to chromosome amplification may contribute to alterations in gene expression in response to environmental conditions in the host, providing a genetic basis for disease tropism.
Candida species are the most common cause of opportunistic fungal infection worldwide. Here we report the genome sequences of six Candida species and compare these and related pathogens and non-pathogens. There are significant expansions of cell wall, secreted and transporter gene families in pathogenic species, suggesting adaptations associated with virulence. Large genomic tracts are homozygous in three diploid species, possibly resulting from recent recombination events. Surprisingly, key components of the mating and meiosis pathways are missing from several species. These include major differences at the mating-type loci (MTL); Lodderomyces elongisporus lacks MTL, and components of the a1/α2 cell identity determinant were lost in other species, raising questions about how mating and cell types are controlled. Analysis of the CUG leucine-to-serine genetic-code change reveals that 99% of ancestral CUG codons were erased and new ones arose elsewhere. Lastly, we revise the Candida albicans gene catalogue, identifying many new genes.
Enteropathogenic Escherichia coli (EPEC) was the first pathovar of E. coli to be implicated in human disease; however, no EPEC strain has been fully sequenced until now. Strain E2348/69 (serotype O127:H6 belonging to E. coli phylogroup B2) has been used worldwide as a prototype strain to study EPEC biology, genetics, and virulence. Studies of E2348/69 led to the discovery of the locus of enterocyte effacement-encoded type III secretion system (T3SS) and its cognate effectors, which play a vital role in attaching and effacing lesion formation on gut epithelial cells. In this study, we determined the complete genomic sequence of E2348/69 and performed genomic comparisons with other important E. coli strains. We identified 424 E2348/69-specific genes, most of which are carried on mobile genetic elements, and a number of genetic traits specifically conserved in phylogroup B2 strains irrespective of their pathotypes, including the absence of the ETT2-related T3SS, which is present in E. coli strains belonging to all other phylogroups. The genome analysis revealed the entire gene repertoire related to E2348/69 virulence. Interestingly, E2348/69 contains only 21 intact T3SS effector genes, all of which are carried on prophages and integrative elements, compared to over 50 effector genes in enterohemorrhagic E. coli O157. As E2348/69 is the most-studied pathogenic E. coli strain, this study provides a genomic context for the vast amount of existing experimental data. The unexpected simplicity of the E2348/69 T3SS provides the first opportunity to fully dissect the entire virulence strategy of attaching and effacing pathogens in the genomic context.
Candida dubliniensis is the closest known relative of Candida albicans, the most pathogenic yeast species in humans. However, despite both species sharing many phenotypic characteristics, including the ability to form true hyphae, C. dubliniensis is a significantly less virulent and less versatile pathogen. Therefore, to identify C. albicans-specific genes that may be responsible for an increased capacity to cause disease, we have sequenced the C. dubliniensis genome and compared it with the known C. albicans genome sequence. Although the two genome sequences are highly similar and synteny is conserved throughout, 168 species-specific genes are identified, including some encoding known hyphal-specific virulence factors, such as the aspartyl proteinases Sap4 and Sap5 and the proposed invasin Als3. Among the 115 pseudogenes confirmed in C. dubliniensis are orthologs of several filamentous growth regulator (FGR) genes that also have suspected roles in pathogenesis. However, the principal differences in genomic repertoire concern expansion of the TLO gene family of putative transcription factors and the IFA family of putative transmembrane proteins in C. albicans, which represent novel candidate virulence-associated factors. The results suggest that the recent evolutionary histories of C. albicans and C. dubliniensis are quite different. While gene families instrumental in pathogenesis have been elaborated in C. albicans, C. dubliniensis has lost genomic capacity and key pathogenic functions. This could explain why C. albicans is a more potent pathogen in humans than C. dubliniensis.
Whereas most nontyphoidal Salmonella (NTS) are associated with gastroenteritis, there has been a dramatic increase in reports of NTS-associated invasive disease in sub-Saharan Africa. Salmonella enterica serovar Typhimurium isolates are responsible for a significant proportion of the reported invasive NTS in this region. Multilocus sequence analysis of invasive S. Typhimurium from Malawi and Kenya identified a dominant type, designated ST313, which currently is rarely reported outside of Africa. Whole-genome sequencing of a multiple drug resistant (MDR) ST313 NTS isolate, D23580, identified a distinct prophage repertoire and a composite genetic element encoding MDR genes located on a virulence-associated plasmid. Further, there was evidence of genome degradation, including pseudogene formation and chromosomal deletions, when compared with other S. Typhimurium genome sequences. Some of this genome degradation involved genes previously implicated in virulence of S. Typhimurium or genes for which the orthologs in S. Typhi are either pseudogenes or are absent. Genome analysis of other epidemic ST313 isolates from Malawi and Kenya provided evidence for microevolution and clonal replacement in the field.
BACKGROUND: Pseudomonas fluorescens are common soil bacteria that can improve plant health through nutrient cycling, pathogen antagonism and induction of plant defenses. The genome sequences of strains SBW25 and Pf0-1 were determined and compared to each other and with P. fluorescens Pf-5. A functional genomic in vivo expression technology (IVET) screen provided insight into genes used by P. fluorescens in its natural environment and an improved understanding of the ecological significance of diversity within this species. RESULTS: Comparisons of three P. fluorescens genomes (SBW25, Pf0-1, Pf-5) revealed considerable divergence: 61% of genes are shared, the majority located near the replication origin. Phylogenetic and average amino acid identity analyses showed a low overall relationship. A functional screen of SBW25 defined 125 plant-induced genes including a range of functions specific to the plant environment. Orthologues of 83 of these exist in Pf0-1 and Pf-5, with 73 shared by both strains. The P. fluorescens genomes carry numerous complex repetitive DNA sequences, some resembling Miniature Inverted-repeat Transposable Elements (MITEs). In SBW25, repeat density and distribution revealed 'repeat deserts' lacking repeats, covering approximately 40% of the genome. CONCLUSIONS: P. fluorescens genomes are highly diverse. Strain-specific regions around the replication terminus suggest genome compartmentalization. The genomic heterogeneity among the three strains is reminiscent of a species complex rather than a single species. That 42% of plant-inducible genes were not shared by all strains reinforces this conclusion and shows that ecological success requires specialized and core functions. The diversity also indicates the significant size of genetic information within the Pseudomonas pan genome.
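The 42% figure in the conclusions above follows from the counts given in the results. A minimal sketch of that arithmetic, assuming "not shared by all strains" means plant-induced SBW25 genes that lack an orthologue in at least one of Pf0-1 and Pf-5 (the variable names are illustrative, not from the paper):

```python
# Counts taken from the RESULTS section of the abstract.
plant_induced_genes = 125     # plant-induced genes defined in SBW25
shared_by_both_strains = 73   # of these, orthologues present in both Pf0-1 and Pf-5

# Assumption: "not shared by all strains" = missing from at least one other strain.
not_shared = plant_induced_genes - shared_by_both_strains
fraction_not_shared = not_shared / plant_induced_genes

print(f"{not_shared} of {plant_induced_genes} plant-induced genes "
      f"({fraction_not_shared:.0%}) are not shared by all three strains")
```

Under this reading, 52 of the 125 genes fall outside the shared set, which rounds to the 42% quoted in the abstract.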
BACKGROUND: The Gram-negative bacterium Photorhabdus asymbiotica (Pa) has been recovered from human infections in both North America and Australia. Recently, Pa has been shown to have a nematode vector that can also infect insects, like its sister species the insect pathogen P. luminescens (Pl). To understand the relationship between pathogenicity to insects and humans in Photorhabdus we have sequenced the complete genome of Pa strain ATCC43949 from North America. This strain (formerly referred to as Xenorhabdus luminescens strain 2) was isolated in 1977 from the blood of an 80-year-old female patient with endocarditis, in Maryland, USA. Here we compare the complete genome of Pa ATCC43949 with that of the previously sequenced insect pathogen P. luminescens strain TT01, which was isolated from its entomopathogenic nematode vector collected from soil in Trinidad and Tobago. RESULTS: We found that the human pathogen Pa had a smaller genome (5,064,808 bp) than that of the insect pathogen Pl (5,688,987 bp) but that each pathogen carries approximately one megabase of DNA that is unique to each strain. The reduced size of the Pa genome is associated with a smaller diversity in insecticidal genes such as those encoding the Toxin complexes (Tc's), Makes caterpillars floppy (Mcf) toxins and the Photorhabdus Virulence Cassettes (PVCs). The Pa genome, however, also shows the addition of a plasmid related to pMT1 from Yersinia pestis and several novel pathogenicity islands including a novel Type Three Secretion System (TTSS) encoding island. Together these data suggest that Pa may show virulence against man via the acquisition of the pMT1-like plasmid and specific effectors, such as SopB, that promote its persistence inside human macrophages. Interestingly, the loss of insecticidal genes in Pa is not reflected by a loss of pathogenicity towards insects.
CONCLUSION: Our results suggest that North American isolates of Pa have acquired virulence against man via the acquisition of a plasmid and specific virulence factors with similarity to those shown to play roles in pathogenicity against humans in other bacteria.
Pseudomonas aeruginosa isolates have a highly conserved core genome representing up to 90% of the total genomic sequence with additional variable accessory genes, many of which are found in genomic islands or islets. The identification of the Liverpool Epidemic Strain (LES) in a children's cystic fibrosis (CF) unit in 1996 and its subsequent observation in several centers in the United Kingdom challenged the previous widespread assumption that CF patients acquire only unique strains of P. aeruginosa from the environment. To learn about the forces that shaped the development of this important epidemic strain, the genome of the earliest archived LES isolate, LESB58, was sequenced. The sequence revealed the presence of many large genomic islands, including five prophage clusters, one defective (pyocin) prophage cluster, and five non-phage islands. To determine the role of these clusters, an unbiased signature tagged mutagenesis study was performed, followed by selection in the chronic rat lung infection model. Forty-seven mutants were identified by sequencing, including mutants in several genes known to be involved in Pseudomonas infection. Furthermore, genes from four prophage clusters and one genomic island were identified, and in direct competition studies with the parent isolate, four were shown to strongly affect competitiveness in the chronic rat lung infection model. This strongly indicates that enhanced in vivo competitiveness is a major driver for maintenance and diversifying selection of these genomic prophage genes.
BACKGROUND: Stenotrophomonas maltophilia is a nosocomial opportunistic pathogen of the Xanthomonadaceae. The organism has been isolated from both clinical and soil environments in addition to the sputum of cystic fibrosis patients and the immunocompromised. Whilst relatively distant phylogenetically, the closest sequenced relatives of S. maltophilia are the plant pathogenic xanthomonads. RESULTS: The genome of the bacteremia-associated isolate S. maltophilia K279a is 4,851,126 bp and of high G+C content. The sequence reveals an organism with a remarkable capacity for drug and heavy metal resistance. In addition to a number of genes conferring resistance to antimicrobial drugs of different classes via alternative mechanisms, nine resistance-nodulation-division (RND)-type putative antimicrobial efflux systems are present. Functional genomic analysis confirms a role in drug resistance for several of the novel RND efflux pumps. S. maltophilia possesses potentially mobile regions of DNA and encodes a number of pili and fimbriae likely to be involved in adhesion and biofilm formation that may also contribute to increased antimicrobial drug resistance. CONCLUSION: The panoply of antimicrobial drug resistance genes and mobile genetic elements found suggests that the organism can act as a reservoir of antimicrobial drug resistance determinants in a clinical environment, which is an issue of considerable concern.
BACKGROUND: The fish pathogen Aliivibrio salmonicida is the causative agent of cold-water vibriosis in marine aquaculture. The Gram-negative bacterium causes tissue degradation, hemolysis and sepsis in vivo. RESULTS: In total, 4,286 protein-coding sequences were identified, and the 4.6 Mb genome of A. salmonicida has a six-partite architecture with two chromosomes and four plasmids. Sequence analysis revealed a highly fragmented genome structure caused by the insertion of an extensive number of insertion sequence (IS) elements. The IS elements can be related to important evolutionary events such as gene acquisition, gene loss and chromosomal rearrangements. New A. salmonicida functional capabilities that may have been acquired through horizontal DNA transfer include genes involved in iron acquisition and protein secretion, which may play roles in pathogenicity. On the other hand, the degeneration of 370 genes and consequent loss of specific functions suggest that A. salmonicida has a reduced metabolic and physiological capacity in comparison to related Vibrionaceae species. CONCLUSION: Most prominent is the loss of several genes involved in the utilisation of the polysaccharide chitin. In particular, the disruption of three extracellular chitinases responsible for enzymatic breakdown of chitin makes A. salmonicida unable to grow on the polymer form of chitin. These and other losses could restrict the variety of carrier organisms A. salmonicida can attach to and associate with. Gene acquisition and gene loss may be related to the emergence of A. salmonicida as a fish pathogen.
Plasmodium knowlesi is an intracellular malaria parasite whose natural vertebrate host is Macaca fascicularis (the 'kra' monkey); however, it is now increasingly recognized as a significant cause of human malaria, particularly in southeast Asia. Plasmodium knowlesi was the first malaria parasite species in which antigenic variation was demonstrated, and it has a close phylogenetic relationship to Plasmodium vivax, the second most important species of human malaria parasite (reviewed in ref. 4). Despite their relatedness, there are important phenotypic differences between them, such as host blood cell preference, absence of a dormant liver stage or 'hypnozoite' in P. knowlesi, and length of the asexual cycle (reviewed in ref. 4). Here we present an analysis of the P. knowlesi (H strain, Pk1(A+) clone) nuclear genome sequence. This is the first monkey malaria parasite genome to be described, and it provides an opportunity for comparison with the recently completed P. vivax genome and other sequenced Plasmodium genomes. In contrast to other Plasmodium genomes, putative variant antigen families are dispersed throughout the genome and are associated with intrachromosomal telomere repeats. One of these families, the KIRs, contains sequences that collectively match over one-half of the host CD99 extracellular domain, which may represent an unusual form of molecular mimicry.
Leishmania parasites cause a broad spectrum of clinical disease. Here we report the sequencing of the genomes of two species of Leishmania: Leishmania infantum and Leishmania braziliensis. The comparison of these sequences with the published genome of Leishmania major reveals marked conservation of synteny and identifies only approximately 200 genes with a differential distribution between the three species. L. braziliensis, contrary to Leishmania species examined so far, possesses components of a putative RNA-mediated interference pathway, telomere-associated transposable elements and spliced leader-associated SLACS retrotransposons. We show that pseudogene formation and gene loss are the principal forces shaping the different genomes. Genes that are differentially distributed between the species encode proteins implicated in host-pathogen interactions and parasite survival in the macrophage.
Toxoplasma gondii is a globally distributed protozoan parasite that can infect virtually all warm-blooded animals and humans. Despite the existence of a sexual phase in the life cycle, T. gondii has an unusual population structure dominated by three clonal lineages that predominate in North America and Europe (Types I, II, and III). These lineages were founded by common ancestors approximately 10,000 yr ago. The recent origin and widespread distribution of the clonal lineages is attributed to the circumvention of the sexual cycle by a new mode of transmission: asexual transmission between intermediate hosts. Asexual transmission appears to be multigenic and although the specific genes mediating this trait are unknown, it is predicted that all members of the clonal lineages should share the same alleles. Genetic mapping studies suggested that chromosome Ia was unusually monomorphic compared with the rest of the genome. To investigate this further, we sequenced chromosome Ia and chromosome Ib in the Type I strain, RH, and the Type II strain, ME49. Comparative genome analyses of the two chromosomal sequences revealed that the same copy of chromosome Ia was inherited in each lineage, whereas chromosome Ib maintained the same high frequency of between-strain polymorphism as the rest of the genome. Sampling of chromosome Ia sequence in seven additional representative strains from the three clonal lineages supports a monomorphic inheritance, which is unique within the genome. Taken together, our observations implicate a specific combination of alleles on chromosome Ia in the recent origin and widespread success of the clonal lineages of T. gondii.
Bordetella avium is a pathogen of poultry and is phylogenetically distinct from Bordetella bronchiseptica, Bordetella pertussis, and Bordetella parapertussis, which are other species in the Bordetella genus that infect mammals. In order to understand the evolutionary relatedness of Bordetella species and further the understanding of pathogenesis, we obtained the complete genome sequence of B. avium strain 197N, a pathogenic strain that has been extensively studied. With 3,732,255 base pairs of DNA and 3,417 predicted coding sequences, it has the smallest genome and gene complement of the sequenced bordetellae. In this study, the presence or absence of previously reported virulence factors from B. avium was confirmed, and the genetic bases for growth characteristics were elucidated. Over 1,100 genes present in B. avium but not in B. bronchiseptica were identified, and most were predicted to encode surface or secreted proteins that are likely to define an organism adapted to the avian rather than the mammalian respiratory tracts. These include genes coding for the synthesis of a polysaccharide capsule, hemagglutinins, a type I secretion system adjacent to two very large genes for secreted proteins, and unique genes for both lipopolysaccharide and fimbrial biogenesis. Three apparently complete prophages are also present. The BvgAS virulence regulatory system appears to have polymorphisms at a poly(C) tract that is involved in phase variation in other bordetellae. A number of putative iron-regulated outer membrane proteins were predicted from the sequence, and this regulation was confirmed experimentally for five of these.
Plasmodium berghei and Plasmodium chabaudi are widely used model malaria species. Comparison of their genomes, integrated with proteomic and microarray data, with the genomes of Plasmodium falciparum and Plasmodium yoelii revealed a conserved core of 4500 Plasmodium genes in the central regions of the 14 chromosomes and highlighted genes evolving rapidly because of stage-specific selective pressures. Four strategies for gene expression are apparent during the parasites' life cycle: (i) housekeeping; (ii) host-related; (iii) strategy-specific related to invasion, asexual replication, and sexual development; and (iv) stage-specific. We observed posttranscriptional gene silencing through translational repression of messenger RNA during sexual development, and a 47-base 3' untranslated region motif is implicated in this process.
Leishmania species cause a spectrum of human diseases in tropical and subtropical regions of the world. We have sequenced the 36 chromosomes of the 32.8-megabase haploid genome of Leishmania major (Friedlin strain) and predict 911 RNA genes, 39 pseudogenes, and 8272 protein-coding genes, of which 36% can be ascribed a putative function. These include genes involved in host-pathogen interactions, such as proteolytic enzymes, and extensive machinery for synthesis of complex surface glycoconjugates. The organization of protein-coding genes into long, strand-specific, polycistronic clusters and lack of general transcription factors in the L. major, Trypanosoma brucei, and Trypanosoma cruzi (Tritryp) genomes suggest that the mechanisms regulating RNA polymerase II-directed transcription are distinct from those operating in other eukaryotes, although the trypanosomatids appear capable of chromatin remodeling. Abundant RNA-binding proteins are encoded in the Tritryp genomes, consistent with active posttranscriptional regulation of gene expression.
Entamoeba histolytica is an intestinal parasite and the causative agent of amoebiasis, which is a significant source of morbidity and mortality in developing countries. Here we present the genome of E. histolytica, which reveals a variety of metabolic adaptations shared with two other amitochondrial protist pathogens: Giardia lamblia and Trichomonas vaginalis. These adaptations include reduction or elimination of most mitochondrial metabolic pathways and the use of oxidative stress enzymes generally associated with anaerobic prokaryotes. Phylogenomic analysis identifies evidence for lateral gene transfer of bacterial genes into the E. histolytica genome, the effects of which centre on expanding aspects of E. histolytica's metabolic repertoire. The presence of these genes and the potential for novel metabolic pathways in E. histolytica may allow for the development of new chemotherapeutic agents. The genome encodes a large number of novel receptor kinases and contains expansions of a variety of gene families, including those associated with virulence. Additional genome features include an abundance of tandemly repeated transfer-RNA-containing arrays, which may have a structural function in the genome. Analysis of the genome provides new insights into the workings and genome evolution of a major human pathogen.
Aspergillus fumigatus is exceptional among microorganisms in being both a primary and opportunistic pathogen as well as a major allergen. Its conidia production is prolific, and so human respiratory tract exposure is almost constant. A. fumigatus is isolated from human habitats and vegetable compost heaps. In immunocompromised individuals, the incidence of invasive infection can be as high as 50% and the mortality rate is often about 50% (ref. 2). The interaction of A. fumigatus and other airborne fungi with the immune system is increasingly linked to severe asthma and sinusitis. Although the burden of invasive disease caused by A. fumigatus is substantial, the basic biology of the organism is mostly obscure. Here we show the complete 29.4-megabase genome sequence of the clinical isolate Af293, which consists of eight chromosomes containing 9,926 predicted genes. Microarray analysis revealed temperature-dependent expression of distinct sets of genes, as well as 700 A. fumigatus genes not present or significantly diverged in the closely related sexual species Neosartorya fischeri, many of which may have roles in the pathogenicity phenotype. The Af293 genome sequence provides an unparalleled resource for the future understanding of this remarkable fungus.
Aspergillus fumigatus is the most ubiquitous opportunistic filamentous fungal pathogen of humans. As an initial step toward sequencing the entire genome of A. fumigatus, which is estimated to be approximately 30 Mb in size, we have sequenced a 922 kb region, contained within 16 overlapping bacterial artificial chromosome (BAC) clones. Fifty-four percent of the DNA is predicted to be coding with 341 putative protein coding genes. Functional classification of the proteins showed the presence of a higher proportion of enzymes and membrane transporters when compared to those of Saccharomyces cerevisiae. In addition to the nitrate assimilation gene cluster, the quinate utilisation gene cluster is also present on this 922 kb genomic sequence. We observed large scale synteny between A. fumigatus and Aspergillus nidulans by comparing this sequence to the A. nidulans genetic map of linkage group VIII.
Since the sequencing of the first two chromosomes of the malaria parasite, Plasmodium falciparum, there has been a concerted effort to sequence and assemble the entire genome of this organism. Here we report the sequence of chromosomes 1, 3-9 and 13 of P. falciparum clone 3D7; these chromosomes account for approximately 55% of the total genome. We describe the methods used to map, sequence and annotate these chromosomes. By comparing our assemblies with the optical map, we indicate the completeness of the resulting sequence. During annotation, we assign Gene Ontology terms to the predicted gene products, and observe clustering of some malaria-specific terms to specific chromosomes. We identify a highly conserved sequence element found in the intergenic region of internal var genes that is not associated with their telomeric counterparts.
We have sequenced and annotated the genome of fission yeast (Schizosaccharomyces pombe), which contains the smallest number of protein-coding genes yet recorded for a eukaryote: 4,824. The centromeres are between 35 and 110 kilobases (kb) and contain related repeats including a highly conserved 1.8-kb element. Regions upstream of genes are longer than in budding yeast (Saccharomyces cerevisiae), possibly reflecting more-extended control regions. Some 43% of the genes contain introns, of which there are 4,730. Fifty genes have significant similarity with human disease genes; half of these are cancer related. We identify highly conserved genes important for eukaryotic cell organization including those required for the cytoskeleton, compartmentation, cell-cycle control, proteolysis, protein phosphorylation and RNA splicing. These genes may have originated with the appearance of eukaryotic life. Few similarly conserved genes that are important for multicellular organization were identified, suggesting that the transition from prokaryotes to eukaryotes required more new genes than did the transition from unicellular to multicellular organization.
Leprosy, a chronic human neurological disease, results from infection with the obligate intracellular pathogen Mycobacterium leprae, a close relative of the tubercle bacillus. Mycobacterium leprae has the longest doubling time of all known bacteria and has thwarted every effort at culture in the laboratory. Comparing the 3.27-megabase (Mb) genome sequence of an armadillo-derived Indian isolate of the leprosy bacillus with that of Mycobacterium tuberculosis (4.41 Mb) provides clear explanations for these properties and reveals an extreme case of reductive evolution. Less than half of the genome contains functional genes but pseudogenes, with intact counterparts in M. tuberculosis, abound. Genome downsizing and the current mosaic arrangement appear to have resulted from extensive recombination events between dispersed repetitive sequences. Gene deletion and decay have eliminated many important metabolic activities including siderophore production, part of the oxidative and most of the microaerophilic and anaerobic respiratory chains, and numerous catabolic systems and their regulatory circuits.
Analysis of Plasmodium falciparum chromosome 3, and comparison with chromosome 2, highlights novel features of chromosome organization and gene structure. The sub-telomeric regions of chromosome 3 show a conserved order of features, including repetitive DNA sequences, members of multigene families involved in pathogenesis and antigenic variation, a number of conserved pseudogenes, and several genes of unknown function. A putative centromere has been identified that has a core region of about 2 kilobases with an extremely high (adenine + thymidine) composition and arrays of tandem repeats. We have predicted 215 protein-coding genes and two transfer RNA genes in the 1,060,106-base-pair chromosome sequence. The predicted protein-coding genes can be divided into three main classes: 52.6% are not spliced, 45.1% have a large exon with short additional 5' or 3' exons, and 2.3% have a multiple exon structure more typical of higher eukaryotes.
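The three gene-structure percentages above can be converted back into approximate gene counts as a consistency check. A minimal sketch, assuming each percentage is a fraction of the 215 predicted protein-coding genes and was rounded to one decimal place (the class labels are shortened from the abstract, and the counts are reconstructions, not figures reported in the source):

```python
# Values taken from the abstract; the derived counts are approximate reconstructions.
total_protein_coding_genes = 215

class_percentages = {
    "not spliced": 52.6,
    "large exon with short additional 5' or 3' exons": 45.1,
    "multiple exon structure": 2.3,
}

# Assumption: each percentage is a share of the 215 predicted protein-coding genes.
approx_counts = {
    name: round(total_protein_coding_genes * pct / 100)
    for name, pct in class_percentages.items()
}

for name, count in approx_counts.items():
    print(f"{name}: ~{count} genes")

print("sum of reconstructed counts:", sum(approx_counts.values()))
```

Under these assumptions the reconstructed counts add up to the 215 genes stated in the abstract, so the three classes appear to partition the full gene set.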
Countless millions of people have died from tuberculosis, a chronic infectious disease caused by the tubercle bacillus. The complete genome sequence of the best-characterized strain of Mycobacterium tuberculosis, H37Rv, has been determined and analysed in order to improve our understanding of the biology of this slow-growing pathogen and to help the conception of new prophylactic and therapeutic interventions. The genome comprises 4,411,529 base pairs, contains around 4,000 genes, and has a very high guanine + cytosine content that is reflected in the biased amino-acid content of the proteins. M. tuberculosis differs radically from other bacteria in that a very large portion of its coding capacity is devoted to the production of enzymes involved in lipogenesis and lipolysis, and to two new families of glycine-rich proteins with a repetitive structure that may represent a source of antigenic variation.