Comprehensive analysis of the Sichuan white goose (Anser cygnoides) transcriptome.
High-throughput RNA sequencing was performed to comprehensively analyze the goose transcriptome. A total of 28,803,759 bp of raw sequence data was generated on the 454 GS FLX+ platform. After removal of adaptor sequences, 28,730,361 bp remained in 117,279 reads with an average length of 244 bases. In parallel, complementary DNA (cDNA) samples from goose ovary, hypothalamus and pituitary tissues at two different reproductive stages were sequenced separately on the Illumina MiSeq platform, generating a total of 12,688,673,148 bp of raw sequence data. After removal of adaptor sequences, 8,198,126,562 bp remained in 60,382,786 clean reads with an average length of 135 bases. Assembly of all reads from both the 454 GS FLX+ and Illumina platforms produced 56,839 contigs ranging from 38 to 28,206 bp, with an average length of 2584 bp and an N50 of 4624 bp. The assembly contained a substantial number of large contigs: 35,545 (62.5%) were longer than 1 kb, of which 8850 (15.6% of all contigs) were longer than 5 kb. The average sequencing depth was 85×. We performed comprehensive functional annotation of the unigenes, including protein sequence similarity searches, Gene Ontology (GO) term classification and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment. Approximately 63% of the contigs had annotation information: of the 35,953 isotigs annotated in the Nr database, 24,783 (68.9%) were assigned one or more GO terms, with 14,634 (40.7%) isotigs assigned to biological process, 10,557 (29.3%) to cellular component and 22,607 (62.9%) to molecular function. KEGG pathway mapping assigned 8926 sequences to 477 pathways. Additionally, 10,685 simple sequence repeat (SSR) markers were identified in the assembled sequences.
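The N50 reported above is a length-weighted summary of the assembly: it is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. The Python sketch below implements that definition and also computes the mean length and the share of contigs above a size cutoff. The function names and toy lengths are hypothetical illustrations. They are not the assembler's output or the authors' pipeline.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Contig length L such that contigs >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    if not ordered:
        raise ValueError("no contig lengths given")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison (2 * running >= total) avoids halving the total.
        if 2 * running >= total:
            return length
    return ordered[-1]  # not reached for non-negative lengths


def assembly_summary(lengths: list[int], cutoff_bp: int = 1000) -> dict:
    """Hypothetical helper: count, mean length, N50 and share longer than cutoff_bp."""
    if not lengths:
        raise ValueError("no contig lengths given")
    count = len(lengths)
    return {
        "count": count,
        "mean": sum(lengths) / count,
        "n50": n50(lengths),
        "fraction_longer": sum(1 for x in lengths if x > cutoff_bp) / count,
    }


# Toy contig lengths (hypothetical, not data from this study).
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
print(assembly_summary([500, 1500, 2500, 4000]))
# {'count': 4, 'mean': 2125.0, 'n50': 2500, 'fraction_longer': 0.75}
```

Sorting in descending order and stopping at the first contig where the cumulative length reaches half the total is the standard way to read N50 off a length list. N50 values therefore depend on which sequence set (contigs, isotigs or unigenes) is summarized.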
Among the identified SSRs, trinucleotide motifs were the most frequent, accounting for 53.03% of all SSRs, followed by dinucleotides (39.9%), tetranucleotides (5.08%), pentanucleotides (1.68%) and hexanucleotides (0.32%). Transcriptome sequencing of mixed goose tissues yielded a substantial number of transcript sequences and potentially useful SSR markers, providing an important data resource for goose research.
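The abstract does not specify the SSR search tool or thresholds. The sketch below is an illustration only. It scans a sequence for perfect di- to hexanucleotide tandem repeats and tallies them by motif length, the same kind of breakdown reported above. The minimum copy numbers in `MIN_REPEATS` and all function names are hypothetical assumptions, not the study's settings. Imperfect and compound repeats are ignored.

```python
from collections import Counter

# Hypothetical minimum tandem copies per motif length (assumption, not the study's values).
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
CLASS_NAMES = {2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def is_primitive(motif: str) -> bool:
    """False if motif is a repeat of a shorter unit (e.g. 'ATAT' or 'AA')."""
    k = len(motif)
    for d in range(1, k):
        if k % d == 0 and motif[:d] * (k // d) == motif:
            return False
    return True


def find_ssrs(seq: str, min_repeats: dict[int, int] = MIN_REPEATS) -> list[tuple[int, str, int]]:
    """Return (start, motif, copies) for perfect tandem repeats, scanning left to right."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        found = False
        for k in sorted(min_repeats):  # try shorter motifs first
            motif = seq[i:i + k]
            if len(motif) < k or not set(motif) <= set("ACGT") or not is_primitive(motif):
                continue
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            if copies >= min_repeats[k]:
                hits.append((i, motif, copies))
                i += copies * k  # jump past this repeat
                found = True
                break
        if not found:
            i += 1
    return hits


def motif_class_percentages(hits: list[tuple[int, str, int]]) -> dict[str, float]:
    """Percentage of SSRs in each motif-length class (di-, tri-, ...)."""
    counts = Counter(CLASS_NAMES[len(motif)] for _, motif, _ in hits)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100 * c / total for name, c in counts.items()}


# Toy sequence (hypothetical): an (AT)6 repeat and a (CAG)5 repeat.
hits = find_ssrs("AT" * 6 + "C" + "CAG" * 5 + "T")
print(hits)  # [(0, 'AT', 6), (13, 'CAG', 5)]
print(motif_class_percentages(hits))  # {'di': 50.0, 'tri': 50.0}
```

The primitivity check stops a dinucleotide run such as (AT)n from also being counted as a tetranucleotide (ATAT)n. The reported motif depends on where the scan first enters the repeat, so different tools may label the same locus with rotated motifs.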