(Below N is a link to NCBI taxonomic web page and E link to ESTHER at designed phylum.) > cellular organisms: NE > Eukaryota: NE > Opisthokonta: NE > Metazoa: NE > Eumetazoa: NE > Bilateria: NE > Deuterostomia: NE > Chordata: NE > Craniata: NE > Vertebrata: NE > Gnathostomata: NE > Teleostomi: NE > Euteleostomi: NE > Sarcopterygii: NE > Dipnotetrapodomorpha: NE > Tetrapoda: NE > Amniota: NE > Mammalia: NE > Theria: NE > Eutheria: NE > Boreoeutheria: NE > Laurasiatheria: NE > Cetartiodactyla: NE > Ruminantia: NE > Pecora: NE > Bovidae: NE > Bovinae: NE > Bos: NE > Bos taurus: NE
LegendThis sequence has been compared to family alignement (MSA) red => minority aminoacid blue => majority aminoacid color intensity => conservation rate title => sequence position(MSA position)aminoacid rate Catalytic site Catalytic site in the MSA MNAMMETSELPAVFDGVKLAAVAAVLYVIVRCLNLKSPTAPPDLYFQDSG LSRFLLKSCPLLTKEYIPPLIWGKSGHIQTALYGKMGRVRSPHPYGHRKF ITMSDGATSTFDLFEPLAEHCVGDDITMVICPGIANHSEKQYIRTFVDYA QKNGYRCAVLNHLGALPNIELTSPRMFTYGCTWEFGAMVNYIKKTYPLTQ LVVVGFSLGGNIVCKYLGETQANQEKVLCCVSVCQGYSALRAQETFMQWD QCRRFYNFLMADNMKKIILSHRQALFGDHVKKPQSLEDTDLSRLYTATSL MQIDDNVMRKFHGYNSLKEYYEEESCMRYLHRIYVPLMLVNAADDPLVHE SLLAIPKSLSEKRENVMFVLPLHGGHLGFFEGSVLFPEPLTWMDKLVVEY ANAICQWERNKSQCSDTELVEADLE
BACKGROUND: Genome assemblies rely on the existence of transcript sequence to stitch together contigs, verify assembly of whole genome shotgun reads, and annotate genes. Functional genomics studies also rely on transcript sequence to create expression microarrays or interpret digital tag data produced by methods such as Serial Analysis of Gene Expression (SAGE). Transcript sequence can be predicted based on reconstruction from overlapping expressed sequence tags (EST) that are obtained by single-pass sequencing of random cDNA clones, but these reconstructions are prone to errors caused by alternative splice forms, transcripts from gene families with related sequences, and expressed pseudogenes. These errors confound genome assembly and annotation. The most useful transcript sequences are derived by complete insert sequencing of clones containing the entire length, or at least the full protein coding sequence (CDS) portion, of the source mRNA. While the bovine genome sequencing initiative is nearing completion, there is currently a paucity of bovine full-CDS mRNA and protein sequence data to support bovine genome assembly and functional genomics studies. Consequently, the production of high-quality bovine full-CDS cDNA sequences will enhance the bovine genome assembly and functional studies of bovine genes and gene products. The goal of this investigation was to identify and characterize the full-CDS sequences of bovine transcripts from clones identified in non-full-length enriched cDNA libraries. In contrast to several recent full-length cDNA investigations, these full-CDS cDNAs were selected, sequenced, and annotated without the benefit of the target organism's genomic sequence, by using comparison of bovine EST sequence to existing human mRNA to identify likely full-CDS clones for full-length insert cDNA (FLIC) sequencing. RESULTS: The predicted bovine protein lengths, 5' UTR lengths, and Kozak consensus sequences from 954 bovine FLIC sequences (bFLICs; average length 1713 nt, representing 762 distinct loci) are all consistent with previously sequenced mammalian full-length transcripts. CONCLUSION: In most cases, the bFLICs span the entire CDS of the genes, providing the basis for creating predicted bovine protein sequences to support proteomics and comparative evolutionary research as well as functional genomics and genome annotation. The results demonstrate the utility of the comparative approach in obtaining predicted protein sequences in other species.
An essential component of functional genomics studies is the sequence of DNA expressed in tissues of interest. To provide a resource of bovine-specific expressed sequence data and facilitate this powerful approach in cattle research, four normalized cDNA libraries were produced and arrayed for high-throughput sequencing. The libraries were made with RNA pooled from multiple tissues to increase efficiency of normalization and maximize the number of independent genes for which sequence data were obtained. Target tissues included those with highest likelihood to have impact on production parameters of animal health, growth, reproductive efficiency, and carcass merit. Success of normalization and inter- and intralibrary redundancy were assessed by collecting 6000-23,000 sequences from each of the libraries (68,520 total sequences deposited in GenBank). Sequence comparison and assembly of these sequences was performed in combination with 56,500 other bovine EST sequences present in the GenBank dbEST database to construct a cattle Gene Index (available from The Institute for Genomic Research at http:\/\/www.tigr.org/tdb/tgi.shtml). The 124,381 bovine ESTs present in GenBank at the time of the analysis form 16,740 assemblies that are listed and annotated on the Web site. Analysis of individual library sequence data indicates that the pooled-tissue approach was highly effective in preparing libraries for efficient deep sequencing.