The genome of the unicellular cyanobacterium Synechocystis sp. PCC 6803 consists of a single chromosome and several plasmids of different sizes, and the nucleotide sequences of the chromosome and three small plasmids (5.2 kb, 2.4 kb, and 2.3 kb) have already been determined. We newly determined the nucleotide sequences of four large plasmids that had been identified in our laboratory (pSYSM: 120 kb, pSYSX: 106 kb, pSYSA: 103 kb, and pSYSG: 44 kb). Computer-aided analysis was performed to explore the genetic information carried by these plasmids. A total of 397 potential protein-encoding genes were predicted, but little information was obtained about the functional relationship of the plasmids to the host cell, as a large portion of the predicted genes (77%) were of unknown function. The sets of potential genes carried by the plasmids were divergent, and parA was the only gene common to all four large plasmids. The distribution of a cyanobacterium-specific sequence (HIP1: 5'-GCGATCGC-3') suggested that the respective plasmids could have originated from different cyanobacterial strains.
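The HIP1 distribution mentioned above amounts to counting occurrences of the octamer on each replicon and normalizing by length. The following is a minimal sketch of such a count, not the procedure used by the authors; the function names and the toy sequence are hypothetical. Because 5'-GCGATCGC-3' is its own reverse complement, scanning one strand is sufficient.

```python
HIP1 = "GCGATCGC"  # palindromic octamer quoted in the abstract


def count_motif(seq, motif=HIP1):
    """Count occurrences of motif in seq, including overlapping ones."""
    seq = seq.upper()
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        # Advance by one base so overlapping copies are also found.
        start = seq.find(motif, start + 1)
    return count


def motif_density_per_kb(seq, motif=HIP1):
    """Return motif occurrences per 1000 bases (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return count_motif(seq, motif) * 1000 / len(seq)


if __name__ == "__main__":
    # Toy sequence for illustration only; not real plasmid data.
    toy = "GCGATCGC" + "A" * 992
    print(count_motif(toy), motif_density_per_kb(toy))
```

Comparing the per-kb densities of the chromosome and each plasmid would then indicate whether a replicon shares the host's HIP1 frequency.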
The nucleotide sequence of the entire genome of the cyanobacterium Gloeobacter violaceus PCC 7421 was determined. The genome of G. violaceus was a single circular chromosome 4,659,019 bp long with an average GC content of 62%. No plasmid was detected. The chromosome comprises 4430 potential protein-encoding genes, one set of rRNA genes, 45 tRNA genes representing 44 tRNA species, and genes for tmRNA, the B subunit of RNase P, SRP RNA, and 6Sa RNA. Forty-one percent of the potential protein-encoding genes showed sequence similarity to genes of known function, 37% to hypothetical genes, and the remaining 22% had no apparent similarity to reported genes. Comparison of the assigned gene components with those of other cyanobacteria has unveiled distinctive features of the G. violaceus genome. Genes for PsaI, PsaJ, PsaK, and PsaX for Photosystem I and PsbY, PsbZ, and Psb27 for Photosystem II were missing, and those for PsaF, PsbO, PsbU, and PsbV were poorly conserved. cpcG, for a rod core linker peptide of phycobilisomes, and nblA, related to the degradation of phycobilisomes, were also missing. Potential signal peptides of the presumptive products of petJ and petE for soluble electron transfer catalysts were less conserved than the remaining portions. These observations may be related to the fact that photosynthesis in G. violaceus takes place not in thylakoid membranes but in the cytoplasmic membrane. A large number of genes for sigma factors and transcription factors in the LuxR, LysR, PadR, TetR, and MarR families could be identified, while those for the major elements of the circadian clock, kaiABC, were not found. These differences may reflect the phylogenetic distance between G. violaceus and other cyanobacteria.
The complete nucleotide sequence of the genome of the symbiotic bacterium Bradyrhizobium japonicum USDA110 was determined. The genome of B. japonicum was a single circular chromosome 9,105,828 bp in length with an average GC content of 64.1%. No plasmid was detected. The chromosome comprises 8317 potential protein-coding genes, one set of rRNA genes and 50 tRNA genes. Fifty-two percent of the potential protein genes showed sequence similarity to genes of known function and 30% to hypothetical genes. The remaining 18% had no apparent similarity to reported genes. Thirty-four percent of the B. japonicum genes showed significant sequence similarity to those of both Mesorhizobium loti and Sinorhizobium meliloti, while 23% were unique to this species. A presumptive symbiosis island 681 kb in length, which includes a 410-kb symbiotic region previously reported by Göttfert et al., was identified. Six hundred fifty-five putative protein-coding genes were assigned in this region, and the functions of 301 genes, including those related to symbiotic nitrogen fixation and DNA transmission, were deduced. A total of 167 genes for transposases/104 copies of insertion sequences were identified in the genome. Remarkably, 100 of the 167 transposase genes are located in the presumptive symbiosis island. DNA segments of 4 to 97 kb inserted into tRNA genes were found at 14 locations in the genome, generating partial duplications of the target tRNA genes. These observations suggest plasticity of the B. japonicum genome, which is probably due to complex genome rearrangements such as horizontal transfer and insertion of various DNA elements, and to homologous recombination.
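The concentration of transposase genes in the presumptive island can be put in proportion with a short calculation. This is an illustrative sketch using only the figures quoted above; the variable and function names are hypothetical, and the island length is taken as the rounded 681 kb, so the results are approximate.

```python
# Figures quoted in the abstract above.
GENOME_BP = 9_105_828
ISLAND_BP = 681_000  # "681 kb" is a rounded value, so this is approximate
TRANSPOSASE_TOTAL = 167
TRANSPOSASE_IN_ISLAND = 100


def fraction(part, whole):
    """Return part / whole, rejecting a non-positive denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


# Share of the chromosome occupied by the island.
island_fraction = fraction(ISLAND_BP, GENOME_BP)
# Share of all transposase genes that lie inside the island.
transposase_fraction = fraction(TRANSPOSASE_IN_ISLAND, TRANSPOSASE_TOTAL)
# How over-represented transposase genes are relative to island size.
fold_enrichment = transposase_fraction / island_fraction

if __name__ == "__main__":
    print(f"island share of genome:      {island_fraction:.1%}")
    print(f"transposase genes in island: {transposase_fraction:.1%}")
    print(f"fold enrichment:             {fold_enrichment:.1f}")
```

Under these assumptions the island covers roughly 7.5% of the chromosome yet holds about 60% of the transposase genes, an enrichment of about eightfold.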
The entire genome of a thermophilic unicellular cyanobacterium, Thermosynechococcus elongatus BP-1, was sequenced. The genome consisted of a circular chromosome 2,593,857 bp long, and no plasmid was detected. A total of 2475 potential protein-encoding genes, one set of rRNA genes, 42 tRNA genes representing 42 tRNA species and 4 genes for small structural RNAs were assigned to the chromosome by similarity search and computer prediction. The translated products of 56% of the potential protein-encoding genes showed sequence similarity to experimentally identified and predicted proteins of known function, and the products of 34% of these genes showed sequence similarity to the translated products of hypothetical genes. The remaining 10% lacked significant similarity to genes for predicted proteins in the public DNA databases. Sixty-three percent of the T. elongatus genes showed significant sequence similarity to those of both Synechocystis sp. PCC 6803 and Anabaena sp. PCC 7120, while 22% of the genes were unique to this species, indicating a high degree of divergence of the gene information among cyanobacterial strains. The lack of genes for typical fatty acid desaturases and the presence of more genes for heat-shock proteins in comparison with other mesophilic cyanobacteria may be genomic features of thermophilic strains. A remarkable feature of the genome is the presence of 28 copies of group II introns, 8 of which contained a presumptive gene for maturase/reverse transcriptase. A trace of genome rearrangement mediated by the group II introns was also observed.
The nucleotide sequence of the entire genome of a filamentous cyanobacterium, Anabaena sp. strain PCC 7120, was determined. The genome of Anabaena consisted of a single chromosome (6,413,771 bp) and six plasmids, designated pCC7120alpha (408,101 bp), pCC7120beta (186,614 bp), pCC7120gamma (101,965 bp), pCC7120delta (55,414 bp), pCC7120epsilon (40,340 bp), and pCC7120zeta (5,584 bp). The chromosome bears 5368 potential protein-encoding genes, four sets of rRNA genes, 48 tRNA genes representing 42 tRNA species, and 4 genes for small structural RNAs. The predicted products of 45% of the potential protein-encoding genes showed sequence similarity to known and predicted proteins of known function, and 27% to translated products of hypothetical genes. The remaining 28% lacked significant similarity to genes for known and predicted proteins in the public DNA databases. More than 60 genes involved in various processes of heterocyst formation and nitrogen fixation were assigned to the chromosome based on their similarity to the reported genes. One hundred and ninety-five genes coding for components of two-component signal transduction systems, nearly 2.5 times as many as those in Synechocystis sp. PCC 6803, were identified on the chromosome. Only 37% of the Anabaena genes showed significant sequence similarity to those of Synechocystis, indicating a high degree of divergence of the gene information between the two cyanobacterial strains.
The complete nucleotide sequence of the genome of the symbiotic bacterium Mesorhizobium loti strain MAFF303099 was determined. The genome of M. loti consisted of a single chromosome (7,036,071 bp) and two plasmids, designated pMLa (351,911 bp) and pMLb (208,315 bp). The chromosome comprises 6752 potential protein-coding genes, two sets of rRNA genes and 50 tRNA genes representing 47 tRNA species. Fifty-four percent of the potential protein genes showed sequence similarity to genes of known function, 21% to hypothetical genes, and the remaining 25% had no apparent similarity to reported genes. A 611-kb DNA segment, a highly probable candidate for a symbiotic island, was identified, and 30 genes for nitrogen fixation and 24 genes for nodulation were assigned in this region. Codon usage analysis suggested that the symbiotic island as well as the plasmids originated and were transmitted from other genetic systems. The two plasmids, pMLa and pMLb, contained 320 and 209 potential protein-coding genes, respectively, for a variety of biological functions. These include genes for the ABC-transporter system, phosphate assimilation, two-component system, DNA replication and conjugation, but only one gene for nodulation was identified.
The sequence determination of the entire genome of Synechocystis sp. strain PCC6803 was completed. The total length of the genome finally confirmed was 3,573,470 bp, including the previously reported sequence of 1,003,450 bp from map position 64% to 92% of the genome. The entire sequence was assembled from the sequences of the physical map-based contigs of cosmid clones and of lambda clones and long PCR products, which were used for gap-filling. The accuracy of the sequence was guaranteed by analysis of both strands of DNA through the entire genome. The authenticity of the assembled sequence was supported by restriction analysis of long PCR products, which were directly amplified from the genomic DNA using the assembled sequence data. To predict the potential protein-coding regions, analysis of open reading frames (ORFs), analysis by the GeneMark program, and similarity search to databases were performed. As a result, a total of 3,168 potential protein genes were assigned on the genome, of which 145 (4.6%) were identical to reported genes and 1,257 (39.6%) and 340 (10.8%) showed similarity to reported and hypothetical genes, respectively. The remaining 1,426 (45.0%) had no apparent similarity to any genes in databases. Among the potential protein genes assigned, 128 were related to the genes participating in photosynthetic reactions. The sum of the sequences coding for potential protein genes occupies 87% of the genome length. With the rRNA and tRNA genes added, the genome therefore has a very compact arrangement of protein- and RNA-coding regions. A notable feature of the gene organization of the genome was that 99 ORFs, which showed similarity to transposase genes and could be classified into 6 groups, were found spread all over the genome, and at least 26 of them appeared to remain intact. The result implies that rearrangement of the genome occurred frequently during and after establishment of this species.
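The breakdown of the 3,168 potential protein genes into similarity classes can be checked with a few lines of arithmetic. The sketch below uses only the counts and percentages quoted above; the dictionary layout and function name are hypothetical and do not reflect the annotation pipeline itself.

```python
TOTAL_GENES = 3168

# Category -> (gene count, percentage as quoted in the abstract).
categories = {
    "identical to reported genes": (145, 4.6),
    "similar to reported genes": (1257, 39.6),
    "similar to hypothetical genes": (340, 10.8),
    "no apparent similarity": (1426, 45.0),
}


def percentages(counts, total):
    """Return each count as a percentage of total."""
    return {name: 100 * count / total for name, count in counts.items()}


counts = {name: count for name, (count, _) in categories.items()}
recomputed = percentages(counts, TOTAL_GENES)

if __name__ == "__main__":
    # The four classes should account for every assigned gene.
    print("sum of classes:", sum(counts.values()))
    for name, (count, quoted) in categories.items():
        print(f"{name}: {count} genes, {recomputed[name]:.1f}% (quoted {quoted}%)")
```

The four classes sum to the stated total, and the recomputed shares agree with the quoted percentages to within 0.1 percentage point.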