Zebrafish have become a popular organism for the study of vertebrate gene function. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and, increasingly, the study of human genetic disease. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.
        
Title: Induction of a regenerative microenvironment in skeletal muscle is sufficient to induce embryonal rhabdomyosarcoma in p53-deficient mice. Authors: Camboni M, Hammond S, Martin LT, Martin PT. Ref: Journal of Pathology, 226:40, 2012
We have previously reported that mice with muscular dystrophy, including mdx mice, develop embryonal rhabdomyosarcoma (eRMS) with a low incidence after 1 year of age and that almost all such tumours contain cancer-associated p53 mutations. To further demonstrate the relevance of p53 inactivation, we created p53-deficient mdx mice. Here we demonstrate that loss of one or both p53 (Trp53) alleles accelerates eRMS incidence in the mdx background, such that almost all Trp53(-/-) mdx animals develop eRMS by 5 months of age. To ascertain whether increased tumour incidence was due to the regenerative microenvironment found in dystrophic skeletal muscles, we induced muscle regeneration in Trp53(+/+) and Trp53(-/-) animals using cardiotoxin (Ctx). Wild-type (Trp53(+/+)) animals treated with Ctx, either once every 7 days or once every 14 days from 1 month of age onwards, developed no eRMS; however, all similarly Ctx-treated Trp53(-/-) animals developed eRMS by 5 months of age at the site of injection. Most of these tumours displayed markers of human eRMS, including over-expression of Igf2 and phosphorylated Akt. These data demonstrate that the presence of a regenerative microenvironment in skeletal muscle, coupled with Trp53 deficiency, is sufficient to robustly induce eRMS in young mice. These studies further suggest that consideration should be given to the potential of the muscle microenvironment to support tumourigenesis in regenerative therapies for myopathies.
Chinese hamster ovary (CHO)-derived cell lines are the preferred host cells for the production of therapeutic proteins. Here we present a draft genomic sequence of the CHO-K1 ancestral cell line. The assembly comprises 2.45 Gb of genomic sequence, with 24,383 predicted genes. We associate most of the assembled scaffolds with 21 chromosomes isolated by microfluidics to identify chromosomal locations of genes. Furthermore, we investigate genes involved in glycosylation, which affect therapeutic protein quality, and viral susceptibility genes, which are relevant to cell engineering and regulatory concerns. Homologs of most human glycosylation-associated genes are present in the CHO-K1 genome, although 141 of these homologs are not expressed under exponential growth conditions. Many important viral entry genes are also present in the genome but not expressed, which may explain the unusual viral resistance property of CHO cell lines. We discuss how the availability of this genome sequence may facilitate genome-scale science for the optimization of biopharmaceutical protein production.
        
Title: Mice lacking dystrophin or alpha sarcoglycan spontaneously develop embryonal rhabdomyosarcoma with cancer-associated p53 mutations and alternatively spliced or mutant Mdm2 transcripts. Authors: Fernandez K, Serinagaoglu Y, Hammond S, Martin LT, Martin PT. Ref: American Journal of Pathology, 176:416, 2010
Altered expression of proteins in the dystrophin-associated glycoprotein complex results in muscular dystrophy and has more recently been implicated in a number of forms of cancer. Here we show that loss of either of two members of this complex, dystrophin in mdx mice or alpha sarcoglycan in Sgca(-/-) mice, results in the spontaneous development of muscle-derived embryonal rhabdomyosarcoma (RMS) after 1 year of age. Many mdx and Sgca(-/-) tumors showed increased expression of insulin-like growth factor 2, retinoblastoma protein, and phosphorylated Akt and decreased expression of the phosphatase and tensin homolog gene, much as is found in human RMS. Further, all mdx and Sgca(-/-) RMS analyzed had increased expression of p53 and murine double minute 2 (Mdm2) protein and contained missense p53 mutations previously identified in human cancers. The mdx RMS also contained missense mutations in Mdm2 or alternatively spliced Mdm2 transcripts that lacked an exon encoding a portion of the p53-binding domain. No Pax3:Fkhr or Pax7:Fkhr translocation mRNA products were evident in any tumor. Expression of natively glycosylated alpha dystroglycan and alpha sarcoglycan was reduced in mdx RMS, whereas dystrophin expression was absent in almost all human RMS, both for embryonal and alveolar RMS subtypes. These studies show that absence of members of the dystrophin-associated glycoprotein complex constitutes a permissive environment for spontaneous development of embryonal RMS associated with mutation of p53 and mutation or altered splicing of Mdm2.
The reference sequence for each human chromosome provides the framework for understanding genome function, variation and evolution. Here we report the finished sequence and biological annotation of human chromosome 1. Chromosome 1 is gene-dense, with 3,141 genes and 991 pseudogenes, and many coding sequences overlap. Rearrangements and mutations of chromosome 1 are prevalent in cancer and many other diseases. Patterns of sequence variation reveal signals of recent selection in specific genes that may contribute to human fitness, and also in regions where no function is evident. Fine-scale recombination occurs in hotspots of varying intensity along the sequence, and is enriched near genes. These and other studies of human biology and disease encoded within chromosome 1 are made possible with the highly accurate annotated sequence, as part of the completed set of chromosome sequences that comprise the reference human genome.
The finished sequence of human chromosome 10 comprises a total of 131,666,441 base pairs. It represents 99.4% of the euchromatic DNA and includes one megabase of heterochromatic sequence within the pericentromeric region of the short and long arm of the chromosome. Sequence annotation revealed 1,357 genes, of which 816 are protein coding, and 430 are pseudogenes. We observed widespread occurrence of overlapping coding genes (either strand) and identified 67 antisense transcripts. Our analysis suggests that both inter- and intrachromosomal segmental duplications have impacted on the gene count on chromosome 10. Multispecies comparative analysis indicated that we can readily annotate the protein-coding genes with current resources. We estimate that over 95% of all coding exons were identified in this study. Assessment of single base changes between the human chromosome 10 and chimpanzee sequence revealed nonsense mutations in only 21 coding genes with respect to the human sequence.
Chromosome 13 is the largest acrocentric human chromosome. It carries genes involved in cancer including the breast cancer type 2 (BRCA2) and retinoblastoma (RB1) genes, is frequently rearranged in B-cell chronic lymphocytic leukaemia, and contains the DAOA locus associated with bipolar disorder and schizophrenia. We describe completion and analysis of 95.5 megabases (Mb) of sequence from chromosome 13, which contains 633 genes and 296 pseudogenes. We estimate that more than 95.4% of the protein-coding genes of this chromosome have been identified, on the basis of comparison with other vertebrate genome sequences. Additionally, 105 putative non-coding RNA genes were found. Chromosome 13 has one of the lowest gene densities (6.5 genes per Mb) among human chromosomes, and contains a central region of 38 Mb where the gene density drops to only 3.1 genes per Mb.
Chromosome 9 is highly structurally polymorphic. It contains the largest autosomal block of heterochromatin, which is heteromorphic in 6-8% of humans, whereas pericentric inversions occur in more than 1% of the population. The finished euchromatic sequence of chromosome 9 comprises 109,044,351 base pairs and represents >99.6% of the region. Analysis of the sequence reveals many intra- and interchromosomal duplications, including segmental duplications adjacent to both the centromere and the large heterochromatic block. We have annotated 1,149 genes, including genes implicated in male-to-female sex reversal, cancer and neurodegenerative disease, and 426 pseudogenes. The chromosome contains the largest interferon gene cluster in the human genome. There is also a region of exceptionally high gene and G + C content including genes paralogous to those in the major histocompatibility complex. We have also detected recently duplicated genes that exhibit different rates of sequence divergence, presumably reflecting natural selection.
Chromosome 6 is a metacentric chromosome that constitutes about 6% of the human genome. The finished sequence comprises 166,880,988 base pairs, representing the largest chromosome sequenced so far. The entire sequence has been subjected to high-quality manual annotation, resulting in the evidence-supported identification of 1,557 genes and 633 pseudogenes. Here we report that at least 96% of the protein-coding genes have been identified, as assessed by multi-species comparative sequence analysis, and provide evidence for the presence of further, otherwise unsupported exons/genes. Among these are genes directly implicated in cancer, schizophrenia, autoimmunity and many other diseases. Chromosome 6 harbours the largest transfer RNA gene cluster in the genome; we show that this cluster co-localizes with a region of high transcriptional activity. Within the essential immune loci of the major histocompatibility complex, we find HLA-B to be the most polymorphic gene on chromosome 6 and in the human genome.
The finished sequence of human chromosome 20 comprises 59,187,298 base pairs (bp) and represents 99.4% of the euchromatic DNA. A single contig of 26 megabases (Mb) spans the entire short arm, and five contigs separated by gaps totalling 320 kb span the long arm of this metacentric chromosome. An additional 234,339 bp of sequence has been determined within the pericentromeric region of the long arm. We annotated 727 genes and 168 pseudogenes in the sequence. About 64% of these genes have a 5' and a 3' untranslated region and a complete open reading frame. Comparative analysis of the sequence of chromosome 20 to whole-genome shotgun-sequence data of two other vertebrates, the mouse Mus musculus and the puffer fish Tetraodon nigroviridis, provides an independent measure of the efficiency of gene annotation, and indicates that this analysis may account for more than 95% of all coding exons and almost all genes.