The complete DNA sequence of the yeast Saccharomyces cerevisiae chromosome IV has been determined. Apart from chromosome XII, which contains the 1-2 Mb rDNA cluster, chromosome IV is the longest S. cerevisiae chromosome. It was split into three parts, which were sequenced by a consortium from the European Community, the Sanger Centre, and groups from St Louis and Stanford in the United States. The sequence of 1,531,974 base pairs contains 796 predicted or known genes, 318 (39.9%) of which have been previously identified. Of the 478 new genes, 225 (28.3%) are homologous to previously identified genes and 253 (32%) have unknown functions or correspond to spurious open reading frames (ORFs). On average there is one gene approximately every two kilobases. Superimposed on alternating regional variations in G+C composition, there is a large central domain with a lower G+C content that contains all the yeast transposon (Ty) elements and most of the tRNA genes. Chromosome IV shares with chromosomes II, V, XII, XIII and XV some long clustered duplications which partly explain its origin.
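The percentages and the gene density quoted in this abstract follow directly from the reported counts. A minimal Python sketch of that arithmetic, using only the figures stated above (the constant and function names are illustrative, not taken from the paper):

```python
# Figures quoted in the chromosome IV abstract.
SEQUENCE_LENGTH_BP = 1_531_974
TOTAL_GENES = 796
PREVIOUSLY_IDENTIFIED = 318
NEW_WITH_HOMOLOGUES = 225
NEW_UNKNOWN_OR_SPURIOUS = 253


def percent_of_total(count: int, total: int = TOTAL_GENES) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# The two classes of new genes should add up to the 478 quoted in the text.
new_genes = NEW_WITH_HOMOLOGUES + NEW_UNKNOWN_OR_SPURIOUS

# Average spacing between genes, in base pairs.
bp_per_gene = SEQUENCE_LENGTH_BP / TOTAL_GENES

print("new genes:", new_genes)
print("previously identified (%):", percent_of_total(PREVIOUSLY_IDENTIFIED))
print("new, with homologues (%):", percent_of_total(NEW_WITH_HOMOLOGUES))
print("new, unknown or spurious (%):", percent_of_total(NEW_UNKNOWN_OR_SPURIOUS))
print("average bp per gene:", round(bp_per_gene))
```

The average spacing comes out a little under 2,000 bp, in line with the statement of one gene approximately every two kilobases.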
The yeast Saccharomyces cerevisiae is the pre-eminent organism for the study of basic functions of eukaryotic cells. All of the genes of this simple eukaryotic cell have recently been revealed by an international collaborative effort to determine the complete DNA sequence of its nuclear genome. Here we describe some of the features of chromosome XII.
In 1992 we started assembling an ordered library of cosmid clones from chromosome XIV of the yeast Saccharomyces cerevisiae. At that time, only 49 genes were known to be located on this chromosome and we estimated that 80% to 90% of its genes were yet to be discovered. In 1993, a team of 20 European laboratories began the systematic sequence analysis of chromosome XIV. The completed and intensively checked final sequence of 784,328 base pairs was released in April 1996. Substantial parts had been published before or had previously been made available on request. The sequence contained 419 known or presumptive protein-coding genes, including two pseudogenes and three retrotransposons, 14 tRNA genes, and three small nuclear RNA genes. For 116 (30%) protein-coding sequences, one or more structural homologues were identified elsewhere in the yeast genome. Half of them belong to duplicated groups of 6-14 loosely linked genes, in most cases with conserved gene order and orientation (relaxed interchromosomal synteny). We have considered the possible evolutionary origins of this unexpected feature of yeast genome organization.
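The 1992 estimate that 80% to 90% of the genes were still undiscovered can be compared with the final gene count. A rough Python sketch, assuming that the 49 previously known genes are all among the 419 protein-coding genes (the abstract does not say so explicitly, and the variable names are illustrative):

```python
KNOWN_GENES_1992 = 49        # genes mapped to chromosome XIV before sequencing
PROTEIN_CODING_GENES = 419   # known or presumptive genes in the final sequence

# Fraction of the final gene set that was unknown in 1992
# (assumption: every previously known gene is counted among the 419).
undiscovered = 1 - KNOWN_GENES_1992 / PROTEIN_CODING_GENES

print(f"{undiscovered:.1%} of the genes were not yet known in 1992")
```

Under that assumption the result falls inside the 80% to 90% range estimated at the start of the project.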
        
Title: Sequence analysis of a 37.6 kbp cosmid clone from the right arm of Saccharomyces cerevisiae chromosome XII, carrying YAP3, HOG1, SNR6, tRNA-Arg3 and 23 new open reading frames, among which several homologies to proteins involved in cell division control and to mammalian growth factors and other animal proteins are found
Verhasselt P, Volckaert G
Ref: Yeast, 13:241, 1997
The nucleotide sequence of 37,639 bp of the right arm of chromosome XII has been determined. Twenty-five open reading frames (ORFs) longer than 300 bp were detected, two of which extend into the flanking cosmids. Only two of the 25 ORFs (L2931 and L2961) correspond to previously sequenced genes (HOG1 and YAP3, respectively). Another ORF is distinct from YAP3 but shows pronounced similarity to it. About half of the remaining ORFs show similarity to other genes or display characteristic protein signatures. In particular, ORF L2952 has striking homology with the probable cell cycle control protein crn of Drosophila melanogaster. L2949 has significant similarity to the human ZFM1 (related to a potential suppressor oncogene) and mouse CW17R genes, though it lacks the carboxy-terminal oligoproline and oligoglutamine stretches encoded by these mammalian genes. The small ORF L2922 is similar to part of the much larger yeast flocculation gene FLO1. Other sequences found in the 37,639 bp fragment are one delta and one solo-sigma element, the tRNA-Arg3 gene, the small nuclear RNA gene SNR6 and three ARS consensus sequences.
        
Title: The sequence of a nearly unclonable 22.8 kb segment on the left arm chromosome VII from Saccharomyces cerevisiae reveals ARO2, RPL9A, TIP1, MRF1 genes and six new open reading frames
Voet M, Defoor E, Verhasselt P, Riles L, Robben J, Volckaert G
Ref: Yeast, 13:177, 1997
The nucleotide sequence of 22,803 bp on the left arm of chromosome VII was determined by polymerase chain reaction-based approaches to compensate for the unstable character of cosmid clones from this region of the chromosome. The coding density of the sequence is particularly high (more than 83%). Twelve open reading frames (ORFs) longer than 300 bp were found, two of which (at the left side) have been described previously (James et al., 1995) after sequencing of an overlapping cosmid. Four other ORFs correspond to published sequences of the known genes ARO2, RPL9A, TIP1 and MRF1. ARO2 codes for chorismate synthetase, RPL9A for protein L9 of the large ribosomal subunit, and MRF1 for a mitochondrial translation release factor. The TIP1 product interacts with Sec20p and is thus involved in transport from the endoplasmic reticulum to the Golgi. Five of the remaining ORFs have not been identified previously, while the sixth (YGL142c) has been partially sequenced as it lies 5' upstream of MRF1. These six ORFs are relatively large (between 933 and 3657 nucleotides). YGL146c, YGL142c, YGL140c and YGL139w have no significant homology to any protein sequence presently available in the public databases, but show two, nine, nine and eight putative transmembrane spans, respectively. YGL144c carries the serine active-site signature of lipases. YGL141w has limited homology to several human proteins, one of which mediates complex formation between papillomavirus E6 oncoprotein and tumor suppressor protein p53.
        
Title: Twelve open reading frames revealed in the 23.6 kb segment flanking the centromere on the Saccharomyces cerevisiae chromosome XIV right arm
Verhasselt P, Aert R, Voet M, Volckaert G
Ref: Yeast, 10:1355, 1994
The nucleotide sequence of 23.6 kb of the right arm of chromosome XIV is described, starting from the centromeric region. Both strands were sequenced with an average redundancy of 4.87 per base pair. The overall G+C content is 38.8% (42.5% for putative coding regions versus 29.4% for non-coding regions). Twelve open reading frames (ORFs) longer than 100 amino acids were detected. Codon frequencies of the twelve ORFs agree with codon usage in Saccharomyces cerevisiae and all show the characteristics of genes expressed at a low level. Five ORFs (N2019, N2029, N2031, N2048 and N2050) correspond to previously sequenced genes (the mitochondrial citrate synthase gene, FUN34, RPC34, PRP2 and URK1, respectively). ORF N2052 shows the characteristics of a transmembrane protein. Other elements in this region are a tRNA(Pro) gene, a tRNA(Asn) gene, a tau 34 and a truncated delta 34 element. Nucleotide sequence comparison relocates the SIS1 gene to the left arm of the chromosome, as confirmed by colinearity analysis.
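The three G+C figures together constrain how much of the segment is coding, because the overall value is a length-weighted average of the coding and non-coding values. A rough Python sketch, assuming the overall figure is exactly such a two-component weighted mean (the abstract does not state the coding fraction, and the function name is illustrative):

```python
def coding_fraction(gc_overall: float, gc_coding: float, gc_noncoding: float) -> float:
    """Solve gc_overall = f * gc_coding + (1 - f) * gc_noncoding for f."""
    if gc_coding == gc_noncoding:
        raise ValueError("coding and non-coding G+C content must differ")
    return (gc_overall - gc_noncoding) / (gc_coding - gc_noncoding)


# G+C percentages quoted in the abstract: overall, coding, non-coding.
f = coding_fraction(38.8, 42.5, 29.4)
print(f"implied coding fraction: {f:.1%}")
```

With the quoted values the implied coding fraction is about 72%. This is an inference from rounded percentages, not a figure reported by the authors.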