The nucleotide sequence of the 948,061 base pairs of chromosome XVI has been determined, completing the sequence of the yeast genome. Chromosome XVI was the last yeast chromosome to be identified, and some of the genes mapped early to it, such as GAL4, PEP4 and RAD1 (ref. 2), have played important roles in the development of yeast biology. The architecture of this final chromosome seems to be typical of the large yeast chromosomes, and it shows large duplications with other yeast chromosomes. Chromosome XVI contains 487 potential protein-encoding genes, 17 tRNA genes and two small nuclear RNA genes; 27% of the genes have significant similarities to human gene products, and 48% are new and of unknown biological function. Systematic efforts to explore gene function have begun.
Chromosome XV was one of the last two chromosomes of Saccharomyces cerevisiae to be discovered. It is the third-largest yeast chromosome after chromosomes XII and IV, and is very similar in size to chromosome VII. It alone represents 9% of the yeast genome (8% if ribosomal DNA is included). When systematic sequencing of chromosome XV began, 93 genes or markers had been identified, and most of them had been mapped. However, very little else was known about chromosome XV which, unlike shorter chromosomes, had not been the object of comprehensive genetic or molecular analysis. It was therefore decided to start sequencing chromosome XV only in the third phase of the European Yeast Genome Sequencing Programme, after experience had been gained on chromosomes III, XI and II. The sequence of chromosome XV was determined from a set of partly overlapping cosmid clones derived from a single yeast strain, which had been physically mapped at 3.3-kilobase resolution before sequencing. As well as numerous new open reading frames (ORFs) and genes encoding tRNA or small RNA molecules, the sequence of 1,091,283 base pairs confirms the high proportion of orphan genes and reveals a number of ancestral and successive duplications with other yeast chromosomes.
The complete DNA sequence of the yeast Saccharomyces cerevisiae chromosome IV has been determined. Apart from chromosome XII, which contains the 1-2 Mb rDNA cluster, chromosome IV is the longest S. cerevisiae chromosome. It was split into three parts, which were sequenced by a consortium from the European Community, the Sanger Centre, and groups from St Louis and Stanford in the United States. The sequence of 1,531,974 base pairs contains 796 predicted or known genes, 318 (39.9%) of which had been identified previously. Of the 478 new genes, 225 (28.3% of all genes) are homologous to previously identified genes and 253 (32% of all genes) have unknown functions or correspond to spurious open reading frames (ORFs). On average there is one gene approximately every two kilobases. Superimposed on alternating regional variations in G+C composition, there is a large central domain with a lower G+C content that contains all the yeast transposon (Ty) elements and most of the tRNA genes. Chromosome IV shares with chromosomes II, V, XII, XIII and XV some long clustered duplications, which partly explain its origin.
The yeast Saccharomyces cerevisiae is the pre-eminent organism for the study of basic functions of eukaryotic cells. All of the genes of this simple eukaryotic cell have recently been revealed by an international collaborative effort to determine the complete DNA sequence of its nuclear genome. Here we describe some of the features of chromosome XII.
In 1992 we started assembling an ordered library of cosmid clones from chromosome XIV of the yeast Saccharomyces cerevisiae. At that time, only 49 genes were known to be located on this chromosome, and we estimated that 80% to 90% of its genes were yet to be discovered. In 1993, a team of 20 European laboratories began the systematic sequence analysis of chromosome XIV. The completed and intensively checked final sequence of 784,328 base pairs was released in April 1996. Substantial parts had been published before or had previously been made available on request. The sequence contains 419 known or presumptive protein-coding genes, including two pseudogenes and three retrotransposons, 14 tRNA genes, and three small nuclear RNA genes. For 116 (30%) protein-coding sequences, one or more structural homologues were identified elsewhere in the yeast genome. Half of them belong to duplicated groups of 6-14 loosely linked genes, in most cases with conserved gene order and orientation (relaxed interchromosomal synteny). We have considered the possible evolutionary origins of this unexpected feature of yeast genome organization.
The complete nucleotide sequence of Saccharomyces cerevisiae chromosome VII has 572 predicted open reading frames (ORFs), of which 341 are new. No correlation was found between G+C content and gene density along the chromosome, and the variations in both appear to be random. Of the ORFs, 17% show high similarity to human proteins. Almost half of the ORFs could be classified in functional categories, and there is a slight increase in the proportion of transcription (7.0%) and translation (5.2%) factors when compared with the complete S. cerevisiae genome. Accurate verification procedures demonstrate that there are fewer than two errors per 10,000 base pairs in the published sequence.
The complete DNA sequence of the yeast Saccharomyces cerevisiae chromosome XI has been determined. In addition to a compact arrangement of potential protein coding sequences, the 666,448-base-pair sequence has revealed general chromosome patterns; in particular, alternating regional variations in average base composition correlate with variations in local gene density along the chromosome. Significant discrepancies with the previously published genetic map demonstrate the need for using independent physical mapping criteria.
In the framework of the EU genome-sequencing programmes, the complete DNA sequence of the yeast Saccharomyces cerevisiae chromosome II (807,188 bp) has been determined. At present, this is the largest eukaryotic chromosome entirely sequenced. A total of 410 open reading frames (ORFs) were identified, covering 72% of the sequence. Similarity searches revealed that 124 ORFs (30%) correspond to genes of known function, 51 ORFs (12.5%) appear to be homologues of genes whose functions are known, 52 others (12.5%) have homologues whose functions are not well defined, and another 33 of the novel putative genes (8%) exhibit a degree of similarity that is insufficient to assign function confidently. Of the genes on chromosome II, 37-45% are thus of unpredicted function. Among the novel putative genes, we found several that are related to genes that perform differentiated functions in multicellular organisms or are involved in malignancy. In addition to a compact arrangement of potential protein coding sequences, the analysis of this chromosome confirmed general chromosome patterns but also revealed particular novel features of chromosomal organization. Alternating regional variations in average base composition correlate with variations in local gene density along chromosome II, as observed in chromosomes XI and III. We propose that functional ARS elements are preferentially located in the AT-rich regions, which have a spacing of approximately 110 kb. Similarly, the 13 tRNA genes and the three Ty elements of chromosome II are found in AT-rich regions. In chromosome II, the distribution of coding sequences between the two strands is biased, with a ratio of 1.3:1. An interesting finding regarding the evolution of the eukaryotic genome is that chromosome II has a high degree of internal genetic redundancy, amounting to 16% of the coding capacity.