Rice (Oryza sativa L.) chromosome 3 is evolutionarily conserved across the cultivated cereals and shares large blocks of synteny with maize and sorghum, which diverged from rice more than 50 million years ago. As a first step toward a complete understanding of this chromosome, we sequenced, finished, and annotated 36.1 Mb (approximately 97%) from O. sativa subsp. japonica cv Nipponbare. Annotation features of the chromosome include 5915 genes, of which 913 are related to transposable elements. A putative function could be assigned to 3064 genes, with another 757 genes annotated as expressed, leaving 2094 that encode hypothetical proteins. Similarity searches against the proteome of Arabidopsis thaliana revealed putative homologs for 67% of the chromosome 3 proteins. Further searches of a nonredundant amino acid database, the Pfam domain database, plant Expressed Sequence Tags, and genomic assemblies from sorghum and maize revealed only 853 non-transposable-element-related proteins from chromosome 3 that lacked similarity to other known sequences. Interestingly, 426 of these have a paralog within the rice genome. A comparative physical map of the wild progenitor species, Oryza nivara, with japonica chromosome 3 revealed a high degree of sequence identity and synteny between these two species, which diverged approximately 10,000 years ago. Although no major rearrangements were detected, the deduced size of the O. nivara chromosome 3 was 21% smaller than that of japonica. Synteny between rice and other cereals, assessed using an integrated maize physical map and a wheat genetic map, was strikingly high, further supporting the use of rice and, in particular, chromosome 3, as a model for comparative studies among the cereals.
The genome of the model plant Arabidopsis thaliana has been sequenced by an international collaboration, The Arabidopsis Genome Initiative. Here we report the complete sequence of chromosome 5. This chromosome is 26 megabases long; it is the second largest Arabidopsis chromosome and represents 21% of the sequenced regions of the genome. The sequences of chromosomes 2 and 4 have been reported previously, and those of chromosomes 1 and 3, together with an analysis of the complete genome sequence, are reported in this issue. Analysis of the sequence of chromosome 5 yields further insights into centromere structure and the sequence determinants of heterochromatin condensation. The 5,874 genes encoded on chromosome 5 reveal several new functions in plants, and the patterns of gene organization provide insights into the mechanisms and extent of genome evolution in plants.
The higher plant Arabidopsis thaliana (Arabidopsis) is an important model for identifying plant genes and determining their function. To assist biological investigations and to define chromosome structure, a coordinated effort to sequence the Arabidopsis genome was initiated in late 1996. Here we report one of the first milestones of this project, the sequence of chromosome 4. Analysis of 17.38 megabases of unique sequence, representing about 17% of the genome, reveals 3,744 protein coding genes, 81 transfer RNAs and numerous repeat elements. Heterochromatic regions surrounding the putative centromere, which has not yet been completely sequenced, are characterized by an increased frequency of a variety of repeats, new repeats, reduced recombination, lowered gene density and lowered gene expression. Roughly 60% of the predicted protein-coding genes have been functionally characterized on the basis of their homology to known genes. Many genes encode predicted proteins that are homologous to human and Caenorhabditis elegans proteins.