Bark beetles (Coleoptera: Curculionidae: Scolytinae) are major insect pests of many woody plants around the world. The mountain pine beetle (MPB), Dendroctonus ponderosae Hopkins, is a historically significant pest of western North American pine forests. It is currently devastating these forests, particularly in British Columbia, Canada, and is beginning to expand its host range eastward into the Canadian boreal forest, which extends to the Atlantic coast of North America. Limited genomic resources are available for this and other bark beetle pests, which restricts the use of genomics-based information to help monitor, predict, and manage the spread of these insects. To overcome these limitations, we generated comprehensive transcriptome resources from fourteen full-length-enriched cDNA libraries through paired-end Sanger sequencing of 100,000 cDNA clones and single-end Roche 454 pyrosequencing of three of these libraries. Hybrid de novo assembly of the 3.4 million sequences yielded 20,571 isotigs in 14,410 isogroups, plus 246,848 singletons. In addition, more than 2,300 non-redundant full-length cDNA clones putatively containing complete open reading frames, including 47 cytochrome P450s, were fully sequenced to high quality. This first large-scale genomics resource for bark beetles provides the sequence information needed for gene discovery, functional and population genomics, comparative analyses, and future efforts to annotate the MPB genome. These resources permit the study of this beetle at the molecular level and will inform research in other Dendroctonus spp. and, more generally, in the Curculionidae and other Coleoptera.
The National Institutes of Health's Mammalian Gene Collection (MGC) project was designed to generate and sequence a publicly accessible cDNA resource containing a complete open reading frame (ORF) for every human and mouse gene. The project initially used a random strategy to select clones from a large number of cDNA libraries from diverse tissues. Candidate clones were chosen based on 5'-EST sequences, then fully sequenced to high accuracy and analyzed by algorithms developed for this project. Currently, more than 11,000 human and 10,000 mouse genes are represented in MGC by at least one clone with a full ORF. The random selection approach is reaching saturation, and a transition to protocols targeted at the missing transcripts is now required to complete the mouse and human collections. Comparison of the MGC clone sequences to reference genome sequences reveals that most cDNA clones are of very high sequence quality, although some cDNAs likely carry missense variants introduced by experimental artifacts such as PCR, cloning, or reverse transcriptase errors. Recently, a rat cDNA component was added to the project, and ongoing frog (Xenopus) and zebrafish (Danio) cDNA projects were expanded to take advantage of the high-throughput MGC pipeline.
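To make the notion of a clone "with a full ORF" concrete, the following is a minimal sketch of one way a complete ORF could be detected in a cDNA sequence. It is not the algorithm developed for the MGC project, which the text does not describe. The function name `longest_complete_orf`, the `min_codons` threshold, and the restriction to the forward strand and the standard stop codons are all illustrative assumptions.

```python
# Hypothetical illustration only: NOT the MGC project's analysis pipeline.
# Assumes the standard genetic code and scans the forward strand only.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_complete_orf(seq, min_codons=30):
    """Return (start, end, frame) of the longest ATG...stop ORF, or None.

    start/end are 0-based, end-exclusive; the stop codon is included.
    min_codons is an assumed length cutoff (codons, counting the stop).
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None  # position of the first ATG since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                end = i + 3
                n_codons = (end - start) // 3
                longer = best is None or (end - start) > (best[1] - best[0])
                if n_codons >= min_codons and longer:
                    best = (start, end, frame)
                start = None  # resume searching after this stop codon
    return best


# Toy sequences: one with an in-frame stop, one truncated before any stop.
complete = "CC" + "ATG" + "GCT" * 3 + "TAA" + "GG"
truncated = "ATG" + "GCT" * 3
print(longest_complete_orf(complete, min_codons=3))
print(longest_complete_orf(truncated, min_codons=1))
```

Under these assumptions, a clone whose insert is truncated before the stop codon returns `None`, which corresponds to the kind of incomplete clone that a full-ORF collection would exclude. A production check would also need to handle the reverse strand, ambiguous bases, and comparison against known protein sequences.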