Taxonomic lineage: cellular organisms > Eukaryota > Opisthokonta > Metazoa > Eumetazoa > Bilateria > Deuterostomia > Chordata > Craniata > Vertebrata > Gnathostomata > Teleostomi > Euteleostomi > Sarcopterygii > Dipnotetrapodomorpha > Tetrapoda > Amniota > Mammalia > Theria > Eutheria > Boreoeutheria > Euarchontoglires > Primates > Haplorrhini > Simiiformes > Catarrhini > Hominoidea > Hominidae > Homininae > Homo > Homo sapiens
Legend: This sequence has been compared to the family alignment (MSA). Red => minority amino acid; blue => majority amino acid; color intensity => conservation rate; title => sequence position (MSA position), amino acid, rate. Catalytic site; catalytic site in the MSA.
NKPSKLPGSPLILIASSGPSSSVFPTSRRHRFWQSQLSCLGKVIPVATHL LNNGSGVGVLQCLEHMIGAVRSKVLEIHSHFPHKPIILIGWNTGALVACH VSVMEYVTAVVCLGFPLLTVDGPRGDVDDPLLDMKTPVLFVIGQNSLQCH PEAMEDFREKIRAENSLVVVGGADDNLRISKAKKKSEGLTQSMVDRCIQD EIVDFLTGVLTRA
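The majority/minority labelling described in the legend can be illustrated with a minimal sketch. The page does not say how the conservation rate is actually computed, so the code below assumes it is the fraction of non-gap residues in a column that match the most common residue. The function name `classify_residues` and the five-column toy alignment are hypothetical and are not taken from the real family alignment.

```python
from collections import Counter


def classify_residues(query, alignment):
    """Label each query residue as majority or minority for its MSA column.

    Assumption: the query is ungapped and is one of the aligned rows, and the
    conservation rate is the share of the most common residue in the column.
    """
    report = []
    for pos, residue in enumerate(query):
        # Collect the column, ignoring gap characters.
        column = [seq[pos] for seq in alignment if seq[pos] != "-"]
        majority, majority_count = Counter(column).most_common(1)[0]
        rate = majority_count / len(column)
        label = "majority" if residue == majority else "minority"
        report.append((pos + 1, residue, label, round(rate, 2)))
    return report


# Hypothetical toy alignment (four sequences, five columns).
toy_alignment = ["GWSGG", "GESAG", "GHSAG", "GWSAA"]
report = classify_residues(toy_alignment[0], toy_alignment)
for position, residue, label, rate in report:
    print(position, residue, label, rate)
```

In this toy case the fourth query residue differs from the most common residue in its column, so it would be drawn in red under the legend's scheme, with the intensity following the column's rate.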
The evolutionarily conserved NSL complex is a prominent epigenetic regulator controlling expression of thousands of genes. Here we uncover a novel function of the NSL complex members in mitosis. As the cell enters mitosis, KANSL1 and KANSL3 undergo a marked relocalisation from the chromatin to the mitotic spindle. By stabilizing microtubule minus ends in a RanGTP-dependent manner, they are essential for spindle assembly and chromosome segregation. Moreover, we identify KANSL3 as a microtubule minus-end-binding protein, revealing a new class of mitosis-specific microtubule minus-end regulators. By adopting distinct functions in interphase and mitosis, KANSL proteins provide a link to coordinate the tasks of faithful expression and inheritance of the genome during different phases of the cell cycle.
Human MOF (MYST1), a member of the MYST (Moz-Ybf2/Sas3-Sas2-Tip60) family of histone acetyltransferases (HATs), is the human ortholog of the Drosophila males absent on the first (MOF) protein. MOF is the catalytic subunit of the male-specific lethal (MSL) HAT complex, which plays a key role in dosage compensation in the fly and is responsible for a large fraction of histone H4 lysine 16 (H4K16) acetylation in vivo. MOF was recently reported to be a component of a second HAT complex, designated the non-specific lethal (NSL) complex (Mendjan, S., Taipale, M., Kind, J., Holz, H., Gebhardt, P., Schelder, M., Vermeulen, M., Buscaino, A., Duncan, K., Mueller, J., Wilm, M., Stunnenberg, H. G., Saumweber, H., and Akhtar, A. (2006) Mol. Cell 21, 811-823). Here we report an analysis of the subunit composition and substrate specificity of the NSL complex. Proteomic analyses of complexes purified through multiple candidate subunits reveal that NSL is composed of nine subunits. Two of its subunits, WD repeat domain 5 (WDR5) and host cell factor 1 (HCF1), are shared with members of the MLL/SET family of histone H3 lysine 4 (H3K4) methyltransferase complexes, and a third subunit, MCRS1, is shared with the human INO80 chromatin-remodeling complex. In addition, we show that assembly of the MOF HAT into MSL or NSL complexes controls its substrate specificity. Although MSL-associated MOF acetylates nucleosomal histone H4 almost exclusively on lysine 16, NSL-associated MOF exhibits a relaxed specificity and also acetylates nucleosomal histone H4 on lysines 5 and 8.
The analysis of proteome-wide phosphorylation events is still a major analytical challenge because of the enormous complexity of protein phosphorylation networks. In this work, we evaluate the complementarity of Lys-N, Lys-C, and trypsin with regard to their ability to contribute to the global analysis of the phosphoproteome. A refined version of low-pH strong cation exchange was used to efficiently separate N-terminally acetylated, phosphorylated, and nonmodified peptides. A total of 5036 nonredundant phosphopeptides could be identified with a false discovery rate of <1% from 1 mg of protein using a combination of the three enzymes. Our data revealed that the overlap between the phosphopeptide data sets generated with different proteases was marginal, whereas the overlap between two similarly generated tryptic data sets was found to be at least 4 times higher. In this way, the parallel use of Lys-N and trypsin enabled a 72% increase in the number of detected phosphopeptides as compared to trypsin alone, whereas a trypsin replicate experiment only led to a 25% increase. Thus, when focusing solely on the trypsin and Lys-N data, we identified 4671 nonredundant phosphopeptides. Further analysis of the detected sites showed that the Lys-N and trypsin data sets were enriched in significantly different phosphorylation motifs, further evidencing that multiprotease approaches are very valuable in phosphoproteome analyses.
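The percentage increases quoted in the abstract above (72% for adding Lys-N, 25% for a trypsin replicate) are gains in the size of a union of identified phosphopeptides relative to a baseline set. A minimal sketch of that arithmetic follows; the function name `percent_gain` and the peptide identifiers are hypothetical placeholders, not data from the study.

```python
def percent_gain(baseline, additional):
    """Percent increase in unique identifications when a second data set is
    merged with the baseline set."""
    combined = baseline | additional
    return 100.0 * (len(combined) - len(baseline)) / len(baseline)


# Hypothetical identifiers: a second protease overlaps little with trypsin,
# while a replicate tryptic run overlaps heavily.
trypsin = {"p1", "p2", "p3", "p4"}
second_protease = {"p4", "p5", "p6", "p7"}
trypsin_replicate = {"p1", "p2", "p3", "p5"}

gain_second_protease = percent_gain(trypsin, second_protease)
gain_replicate = percent_gain(trypsin, trypsin_replicate)
print(gain_second_protease, gain_replicate)
```

The smaller the overlap between two data sets, the larger the gain, which is why a complementary protease adds more than a repeat run with the same enzyme.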
The eukaryotic cell division cycle is characterized by a sequence of orderly and highly regulated events resulting in the duplication and separation of all cellular material into two newly formed daughter cells. Protein phosphorylation by cyclin-dependent kinases (CDKs) drives this cycle. To gain further insight into how phosphorylation regulates the cell cycle, we sought to identify proteins whose phosphorylation is cell cycle regulated. Using stable isotope labeling along with a two-step strategy for phosphopeptide enrichment and high mass accuracy mass spectrometry, we examined protein phosphorylation in a human cell line arrested in the G1 and mitotic phases of the cell cycle. We report the identification of >14,000 different phosphorylation events, more than half of which, to our knowledge, have not been described in the literature, along with relative quantitative data for the majority of these sites. We observed >1,000 proteins with increased phosphorylation in mitosis including many known cell cycle regulators. The majority of sites on regulated phosphopeptides lie in [S/T]P motifs, the minimum required sequence for CDKs, suggesting that many of the proteins may be CDK substrates. Analysis of non-proline site-containing phosphopeptides identified two unique motifs that suggest there are at least two undiscovered mitotic kinases.
Dosage compensation in Drosophila is dependent on MSL proteins and involves hypertranscription of the male X chromosome, which ensures equal X-linked gene expression in both sexes. Here, we report the purification of enzymatically active MSL complexes from Drosophila embryos, Schneider cells, and human HeLa cells. We find a stable association of the histone H4 lysine 16-specific acetyltransferase MOF with the RNA/protein-containing MSL complex as well as with an evolutionarily conserved complex. We show that the MSL complex interacts with several components of the nuclear pore, in particular Mtor/TPR and Nup153. Strikingly, knockdown of Mtor or Nup153 results in loss of the typical MSL X-chromosomal staining and dosage compensation in Drosophila male cells but not in female cells. These results reveal an unexpected physical and functional connection between nuclear pore components and chromatin regulation through MSL proteins, highlighting the role of nucleoporins in gene regulation in higher eukaryotes.
Human chromosome 2 is unique to the human lineage in being the product of a head-to-head fusion of two intermediate-sized ancestral chromosomes. Chromosome 4 has received attention primarily related to the search for the Huntington's disease gene, but also for genes associated with Wolf-Hirschhorn syndrome, polycystic kidney disease and a form of muscular dystrophy. Here we present approximately 237 million base pairs of sequence for chromosome 2, and 186 million base pairs for chromosome 4, representing more than 99.6% of their euchromatic sequences. Our initial analyses have identified 1,346 protein-coding genes and 1,239 pseudogenes on chromosome 2, and 796 protein-coding genes and 778 pseudogenes on chromosome 4. Extensive analyses confirm the underlying construction of the sequence, and expand our understanding of the structure and evolution of mammalian chromosomes, including gene deserts, segmental duplications and highly variant regions.
Determining the site of a regulatory phosphorylation event is often essential for elucidating specific kinase-substrate relationships, providing a handle for understanding essential signaling pathways and ultimately allowing insights into numerous disease pathologies. Despite intense research efforts to elucidate mechanisms of protein phosphorylation regulation, efficient, large-scale identification and characterization of phosphorylation sites remains an unsolved problem. In this report we describe an application of existing technology for the isolation and identification of phosphorylation sites. By using a strategy based on strong cation exchange chromatography, phosphopeptides were enriched from the nuclear fraction of HeLa cell lysate. From 967 proteins, 2,002 phosphorylation sites were determined by tandem MS. This unprecedented large collection of sites permitted a detailed accounting of known and unknown kinase motifs and substrates.
As a base for human transcriptome and functional genomics, we created the "full-length long Japan" (FLJ) collection of sequenced human cDNAs. We determined the entire sequence of 21,243 selected clones and found that 14,490 cDNAs (10,897 clusters) were unique to the FLJ collection. About half of them (5,416) seemed to be protein-coding. Of those, 1,999 clusters had not been predicted by computational methods. The distribution of GC content of nonpredicted cDNAs had a peak at approximately 58% compared with a peak at approximately 42% for predicted cDNAs. Thus, there seems to be a slight bias against GC-rich transcripts in current gene prediction procedures. The rest of the cDNAs unique to the FLJ collection (5,481) contained no obvious open reading frames (ORFs) and thus are candidate noncoding RNAs. About one-fourth of them (1,378) showed a clear pattern of splicing. The distribution of GC content of noncoding cDNAs was narrow and had a peak at approximately 42%, relatively low compared with that of protein-coding cDNAs.
A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies, a whole-genome assembly and a regional chromosome assembly, were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history.
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.
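The coverage figures quoted in the whole-genome shotgun abstract above can be related to one another with simple arithmetic. The sketch below uses only numbers stated in that abstract; the variable names are illustrative, and because 14.8 billion bp is a rounded value, the derived quantities are approximate rather than exact reproductions of the published figures.

```python
# Values as quoted in the abstract (14.8 billion is a rounded figure).
TOTAL_SEQUENCED_BP = 14.8e9
READ_COUNT = 27_271_853
CONSENSUS_BP = 2.91e9
REPORTED_COVERAGE = 5.11

# Average length of a sequence read.
mean_read_length = TOTAL_SEQUENCED_BP / READ_COUNT

# Fold coverage if the 2.91-billion bp consensus is taken as the genome size.
coverage_vs_consensus = TOTAL_SEQUENCED_BP / CONSENSUS_BP

# Genome size implied by the reported 5.11-fold coverage.
implied_genome_bp = TOTAL_SEQUENCED_BP / REPORTED_COVERAGE

print(f"mean read length: {mean_read_length:.0f} bp")
print(f"coverage against consensus: {coverage_vs_consensus:.2f}-fold")
print(f"implied genome size: {implied_genome_bp / 1e9:.2f} billion bp")
```

The mean read length that falls out of this calculation is close to the 550-bp segment size used when shredding the public data, and the implied genome size is close to the 2.91-billion bp consensus.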
With the complete human genomic sequence being unraveled, the focus will shift to gene identification and to the functional analysis of gene products. The generation of a set of cDNAs, both sequences and physical clones, which contains the complete and noninterrupted protein coding regions of all human genes will provide the indispensable tools for the systematic and comprehensive analysis of protein function to eventually understand the molecular basis of man. Here we report the sequencing and analysis of 500 novel human cDNAs containing the complete protein coding frame. Assignment to functional categories was possible for 52% (259) of the encoded proteins, the remaining fraction having no similarities with known proteins. By aligning the cDNA sequences with the sequences of the finished chromosomes 21 and 22 we identified a number of genes that either had been completely missed in the analysis of the genomic sequences or had been wrongly predicted. Three of these genes appear to be present in several copies. We conclude that full-length cDNA sequencing continues to be crucial also for the accurate identification of genes. The set of 500 novel cDNAs, and another 1000 full-coding cDNAs of known transcripts we have identified, adds up to cDNA representations covering 2%-5% of all human genes. We thus substantially contribute to the generation of a gene catalog, consisting of both full-coding cDNA sequences and clones, which should be made freely available and will become an invaluable tool for detailed functional studies.
        
Title: Prediction of the coding sequences of unidentified human genes. XVI. The complete sequences of 150 new cDNA clones from brain which code for large proteins in vitro
Nagase T, Kikuno R, Ishikawa KI, Hirosawa M, Ohara O
Ref: DNA Research, 7:65, 2000
We have carried out a human cDNA sequencing project to accumulate information regarding the coding sequences of unidentified human genes. As an extension of the preceding reports, we herein present the entire sequences of 150 cDNA clones of unknown human genes, named KIAA1294 to KIAA1443, from two sets of size-fractionated human adult and fetal brain cDNA libraries. The average sizes of the inserts and corresponding open reading frames of cDNA clones analyzed here reached 4.8 kb and 2.7 kb (910 amino acid residues), respectively. From sequence similarities and protein motifs, 73 predicted gene products were functionally annotated and 97% of them were classified into the following four functional categories: cell signaling/communication, nucleic acid management, cell structure/motility and protein management. Additionally, the chromosomal loci of the genes were assigned by using human-rodent hybrid panels for those genes whose mapping data were not available in the public databases. The expression profiles of the genes were also studied in 10 human tissues, 8 brain regions, spinal cord, fetal brain and fetal liver by reverse transcription-coupled polymerase chain reaction, products of which were quantified by enzyme-linked immunosorbent assay.
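The pairing of a 2.7 kb average open reading frame with 910 amino acid residues in the abstract above follows from the three-nucleotide codon. A minimal sketch of that conversion is given below; the function name `orf_length_bp` is hypothetical, and whether the reported ORF size counts the stop codon is an assumption left as a parameter.

```python
def orf_length_bp(residues, include_stop=True):
    """Length in nucleotides of an ORF encoding the given number of residues.

    Each residue is encoded by one three-nucleotide codon; the stop codon adds
    three more nucleotides when it is counted.
    """
    codons = residues + (1 if include_stop else 0)
    return 3 * codons


average_orf_bp = orf_length_bp(910)
print(f"{average_orf_bp} bp, about {average_orf_bp / 1000:.1f} kb")
```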