Peatlands harbor more than one-third of terrestrial carbon, leading to the argument that the bryophytes, as major components of peatland ecosystems, store more organic carbon in soils than any other collective plant taxa. Plants of the genus Sphagnum are important components of peatland ecosystems and are potentially vulnerable to changing climatic conditions. However, the response of Sphagnum to rising temperatures, elevated CO2 and shifts in local hydrology has yet to be fully characterized. In this review, we examine Sphagnum biology and ecology and explore the role of this keystone species and its associated microbiome in carbon and nitrogen cycling using literature review and model simulations. Several issues are highlighted, including the consequences of a variable environment on plant-microbiome interactions, uncertainty associated with CO2 diffusion resistances and the relationship between fixed N and that partitioned to the photosynthetic apparatus. We note that the Sphagnum fallax genome is currently being sequenced and outline potential applications of population-level genomics and corresponding plant photosynthesis and microbial metabolic modeling techniques. We highlight Sphagnum as a model organism to explore ecosystem response to a changing climate, and to define the role that Sphagnum can play at the intersection of physiology, genetics and functional genomics.
Organelle sequences have a long history of utility in phylogenetic analyses. Chloroplast sequences when combined with nuclear data can help resolve relationships among flowering plant genera, and within genera incongruence can point to reticulate evolution. Plastome sequences are becoming plentiful because they are increasingly easier to obtain. Complete plastome sequences allow us to detect rare rearrangements and test the tempo of sequence evolution. Chloroplast sequences are generally considered a nuisance to be kept to a minimum in bacterial artificial chromosome libraries. Here, we sequenced two bacterial artificial chromosomes per species to generate complete plastome sequences from seven species. The plastome sequences from Glycine syndetika and six other perennial Glycine species are similar in arrangement and gene content to the previously published soybean plastome. Repetitive sequences were detected in high frequencies as in soybean, but further analysis showed that repeat sequence numbers are inflated. Previous chloroplast-based phylogenetic trees for perennial Glycine were incongruent with nuclear gene-based phylogenetic trees. We tested whether the hypothesis of introgression was supported by the complete plastomes. Alignment of complete plastome sequences and Bayesian analysis allowed us to date putative hybridization events supporting the hypothesis of introgression and chloroplast "capture."
Genetic maps are key tools in genetic research as they constitute the framework for many applications, such as quantitative trait locus analysis, and support the assembly of genome sequences. The resequencing of the two parents of a cross between Eucalyptus urophylla and Eucalyptus grandis was used to design a single nucleotide polymorphism (SNP) array of 6000 markers evenly distributed along the E. grandis genome. The genotyping of 1025 offspring enabled the construction of two high-resolution genetic maps containing 1832 and 1773 markers with an average marker interval of 0.45 and 0.5 cM for E. grandis and E. urophylla, respectively. The comparison between genetic maps and the reference genome highlighted 85% of collinear regions. A total of 43 noncollinear regions and 13 nonsynthetic regions were detected and corrected in the new genome assembly. This improved version contains 4943 scaffolds totalling 691.3 Mb of which 88.6% were captured by the 11 chromosomes. The mapping data were also used to investigate the effect of population size and number of markers on linkage mapping accuracy. This study provides the most reliable linkage maps for Eucalyptus and version 2.0 of the E. grandis genome.
Basidiomycota (basidiomycetes) make up 32% of the described fungi and include most wood-decaying species, as well as pathogens and mutualistic symbionts. Wood-decaying basidiomycetes have typically been classified as either white rot or brown rot, based on the ability (in white rot only) to degrade lignin along with cellulose and hemicellulose. Prior genomic comparisons suggested that the two decay modes can be distinguished based on the presence or absence of ligninolytic class II peroxidases (PODs), as well as the abundance of enzymes acting directly on crystalline cellulose (reduced in brown rot). To assess the generality of the white-rot/brown-rot classification paradigm, we compared the genomes of 33 basidiomycetes, including four newly sequenced wood decayers, and performed phylogenetically informed principal-components analysis (PCA) of a broad range of gene families encoding plant biomass-degrading enzymes. The newly sequenced Botryobasidium botryosum and Jaapia argillacea genomes lack PODs but possess diverse enzymes acting on crystalline cellulose, and they group close to the model white-rot species Phanerochaete chrysosporium in the PCA. Furthermore, laboratory assays showed that both B. botryosum and J. argillacea can degrade all polymeric components of woody plant cell walls, a characteristic of white rot. We also found expansions in reducing polyketide synthase genes specific to the brown-rot fungi. Our results suggest a continuum rather than a dichotomy between the white-rot and brown-rot modes of wood decay. A more nuanced categorization of rot types is needed, based on an improved understanding of the genomics and biochemistry of wood decay.
The process of plant speciation often involves the evolution of divergent ecotypes in response to differences in soil water availability between habitats. While the same set of traits is frequently associated with xeric/mesic ecotype divergence, it is unknown whether those traits evolve independently or if they evolve in tandem as a result of genetic colocalization either by pleiotropy or genetic linkage. The self-fertilizing C4 grass species Panicum hallii includes two major ecotypes found in xeric (var. hallii) or mesic (var. filipes) habitats. We constructed the first linkage map for P. hallii by genotyping a reduced representation genomic library of an F2 population derived from an intercross of var. hallii and filipes. We then evaluated the genetic architecture of divergence between these ecotypes through quantitative trait locus (QTL) mapping. Overall, we mapped QTLs for nine morphological traits that are involved in the divergence between the ecotypes. QTLs for five key ecotype-differentiating traits all colocalized to the same region of linkage group five. Leaf physiological traits were less divergent between ecotypes, but we still mapped five physiological QTLs. We also discovered a two-locus Dobzhansky-Muller hybrid incompatibility. Our study suggests that ecotype-differentiating traits may evolve in tandem as a result of genetic colocalization.
Common bean (Phaseolus vulgaris L.) is the most important grain legume for human consumption and has a role in sustainable agriculture owing to its ability to fix atmospheric nitrogen. We assembled 473 Mb of the 587-Mb genome and genetically anchored 98% of this sequence in 11 chromosome-scale pseudomolecules. We compared the genome for the common bean against the soybean genome to find changes in soybean resulting from polyploidy. Using resequencing of 60 wild individuals and 100 landraces from the genetically differentiated Mesoamerican and Andean gene pools, we confirmed 2 independent domestications from genetic pools that diverged before human colonization. Less than 10% of the 74 Mb of sequence putatively involved in domestication was shared by the two domestication events. We identified a set of genes linked with increased leaf and seed size and combined these results with quantitative trait locus data from Mesoamerican cultivars. Genes affected by domestication may be useful for genomics-enabled crop improvement.
Cultivated citrus are selections from, or hybrids of, wild progenitor species whose identities and contributions to citrus domestication remain controversial. Here we sequence and compare citrus genomes--a high-quality reference haploid clementine genome and mandarin, pummelo, sweet-orange and sour-orange genomes--and show that cultivated types derive from two progenitor species. Although cultivated pummelos represent selections from one progenitor species, Citrus maxima, cultivated mandarins are introgressions of C. maxima into the ancestral mandarin species Citrus reticulata. The most widely cultivated citrus, sweet orange, is the offspring of previously admixed individuals, but sour orange is an F1 hybrid of pure C. maxima and C. reticulata parents, thus implying that wild mandarins were part of the early breeding germplasm. A Chinese wild 'mandarin' diverges substantially from C. reticulata, thus suggesting the possibility of other unrecognized wild citrus species. Understanding citrus phylogeny through genome analysis clarifies taxonomic relationships and facilitates sequence-directed genetic improvement.
Eucalypts are the world's most widely planted hardwood trees. Their outstanding diversity, adaptability and growth have made them a global renewable resource of fibre and energy. We sequenced and assembled >94% of the 640-megabase genome of Eucalyptus grandis. Of 36,376 predicted protein-coding genes, 34% occur in tandem duplications, the largest proportion thus far in plant genomes. Eucalyptus also shows the highest diversity of genes for specialized metabolites such as terpenes that act as chemical defence and provide unique pharmaceutical oils. Genome sequencing of the E. grandis sister species E. globulus and a set of inbred E. grandis tree genomes reveals dynamic genome evolution and hotspots of inbreeding depression. The E. grandis genome is the first reference for the eudicot order Myrtales and is placed here sister to the eurosids. This resource expands our understanding of the unique biology of large woody perennials and provides a powerful tool to accelerate comparative biology, breeding and biotechnology.
The green alga Chlamydomonas reinhardtii is a popular unicellular organism for studying photosynthesis, cilia biogenesis, and micronutrient homeostasis. Ten years since its genome project was initiated, an iterative process of improvements to the genome and gene predictions has propelled this organism to the forefront of the omics era. Housed at Phytozome, the plant genomics portal of the Joint Genome Institute (JGI), the most up-to-date genomic data include a genome arranged on chromosomes and high-quality gene models with alternative splice forms supported by an abundance of whole transcriptome sequencing (RNA-Seq) data. We present here the past, present, and future of Chlamydomonas genomics. Specifically, we detail progress on genome assembly and gene model refinement, discuss resources for gene annotations, functional predictions, and locus ID mapping between versions and, importantly, outline a standardized framework for naming genes.
Common bean (Phaseolus vulgaris) is an important legume crop grown and consumed worldwide. With the availability of the common bean genome sequence, the next challenge is to annotate the genome and characterize functional DNA elements. Transposable elements (TEs) are the most abundant component of plant genomes and can dramatically affect genome evolution and genetic variation. Thus, it is pivotal to identify TEs in the common bean genome. In this study, we performed a genome-wide transposon annotation in common bean using a combination of homology and sequence structure-based methods. We developed a 2.12-Mb transposon database, which includes 791 representative transposon sequences and is available upon request or from www.phytozome.org. Of note, nearly all transposons in the database are previously unrecognized TEs. More than 5,000 transposon-related expressed sequence tags (ESTs) were detected, which indicates that some transposons may be transcriptionally active. Two Ty1-copia retrotransposon families were found to encode the envelope-like protein, which has rarely been identified in plant genomes. Also, we identified an extra open reading frame (ORF), termed ORF2, from 15 Ty3-gypsy families that was located between the ORF encoding the retrotransposase and the 3' LTR. ORF2 was in the opposite transcriptional orientation to the retrotransposase. Sequence homology searches and phylogenetic analysis suggested that ORF2 may have an ancient origin, but its function is not clear. These transposon data provide a useful resource for understanding genome organization and evolution and may be used to identify active TEs for developing a transposon-tagging system in common bean and other related genomes.
Next generation sequence data provides valuable information and tools for genetic and genomic research and offers new insights useful for marker development. This data is useful for the design of accurate and user-friendly molecular tools. Common bean (Phaseolus vulgaris L.) is a diverse crop in which separate domestication events happened in each gene pool followed by race and market class diversification that has resulted in different morphological characteristics in each commercial market class. This has led to essentially independent breeding programs within each market class which in turn has resulted in limited within market class sequence variation. Sequence data from selected genotypes of five bean market classes (pinto, black, navy, and light and dark red kidney) were used to develop InDel-based markers specific to each market class. Design of the InDel markers was conducted through a combination of assembly, alignment and primer design software using 1.6× to 5.1× coverage of Illumina GAII sequence data for each of the selected genotypes. The procedure we developed for primer design is fast, accurate, less error prone, and higher throughput than when they are designed manually. All InDel markers are easy to run and score with no need for PCR optimization. A total of 2687 InDel markers distributed across the genome were developed. To highlight their usefulness, they were employed to construct a phylogenetic tree and a genetic map, showing that InDel markers are reliable, simple, and accurate.
Meiotic recombination rates can vary widely across genomes, with hotspots of intense activity interspersed among cold regions. In yeast, hotspots tend to occur in promoter regions of genes, whereas in humans and mice, hotspots are largely defined by binding sites of the positive-regulatory domain zinc finger protein 9. To investigate the detailed recombination pattern in a flowering plant, we use shotgun resequencing of a wild population of the monkeyflower Mimulus guttatus to precisely locate over 400,000 boundaries of historic crossovers or gene conversion tracts. Their distribution defines some 13,000 hotspots of varying strengths, interspersed with cold regions of undetectably low recombination. Average recombination rates peak near starts of genes and fall off sharply, exhibiting polarity. Within genes, recombination tracts are more likely to terminate in exons than in introns. The general pattern is similar to that observed in yeast, as well as in positive-regulatory domain zinc finger protein 9-knockout mice, suggesting that recombination initiation described here in Mimulus may reflect ancient and conserved eukaryotic mechanisms.
Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders.
Next-generation whole-genome shotgun assemblies of complex genomes are highly useful, but fail to link nearby sequence contigs with each other or provide a linear order of contigs along individual chromosomes. Here, we introduce a strategy based on sequencing progeny of a segregating population that allows de novo production of a genetically anchored linear assembly of the gene space of an organism. We demonstrate the power of the approach by reconstructing the chromosomal organization of the gene space of barley, a large, complex and highly repetitive 5.1 Gb genome. We evaluate the robustness of the new assembly by comparison to a recently released physical and genetic framework of the barley genome, and to various genetically ordered sequence-based genotypic datasets. The method is independent of the need for any prior sequence resources, and will enable rapid and cost-efficient establishment of powerful genomic information for many species.
Despite the central importance of noncoding DNA to gene regulation and evolution, understanding of the extent of selection on plant noncoding DNA remains limited compared to that of other organisms. Here we report sequencing of genomes from three Brassicaceae species (Leavenworthia alabamica, Sisymbrium irio and Aethionema arabicum) and their joint analysis with six previously sequenced crucifer genomes. Conservation across orthologous bases suggests that at least 17% of the Arabidopsis thaliana genome is under selection, with nearly one-quarter of the sequence under selection lying outside of coding regions. Much of this sequence can be localized to approximately 90,000 conserved noncoding sequences (CNSs) that show evidence of transcriptional and post-transcriptional regulation. Population genomics analyses of two crucifer species, A. thaliana and Capsella grandiflora, confirm that most of the identified CNSs are evolving under medium to strong purifying selection. Overall, these CNSs highlight both similarities and several key differences between the regulatory DNA of plants and other species.
The shift from outcrossing to selfing is common in flowering plants, but the genomic consequences and the speed at which they emerge remain poorly understood. An excellent model for understanding the evolution of self fertilization is provided by Capsella rubella, which became self compatible <200,000 years ago. We report a C. rubella reference genome sequence and compare RNA expression and polymorphism patterns between C. rubella and its outcrossing progenitor Capsella grandiflora. We found a clear shift in the expression of genes associated with flowering phenotypes, similar to that seen in Arabidopsis, in which self fertilization evolved about 1 million years ago. Comparisons of the two Capsella species showed evidence of rapid genome-wide relaxation of purifying selection in C. rubella without a concomitant change in transposable element abundance. Overall we document that the transition to selfing may be typified by parallel shifts in gene expression, along with a measurable reduction of purifying selection.
In higher eukaryotes, centromeres are typically composed of megabase-sized arrays of satellite repeats that evolve rapidly and homogenize within a species genome. Despite the importance of centromeres, our knowledge is limited to a few model species. We conducted a comprehensive analysis of common bean (Phaseolus vulgaris) centromeric satellite DNA using genomic data, fluorescence in situ hybridization (FISH), immunofluorescence and chromatin immunoprecipitation (ChIP). Two unrelated centromere-specific satellite repeats, CentPv1 and CentPv2, and the common bean centromere-specific histone H3 (PvCENH3) were identified. FISH showed that CentPv1 and CentPv2 are predominantly located at subsets of eight and three centromeres, respectively. Immunofluorescence- and ChIP-based assays demonstrated the functional significance of CentPv1 and CentPv2 at centromeres. Genomic analysis revealed several interesting features of CentPv1 and CentPv2: (i) CentPv1 is organized into a higher-order repeat structure, named Nazca, of 528 bp, whereas CentPv2 is composed of tandemly organized monomers; (ii) CentPv1 and CentPv2 have undergone chromosome-specific homogenization; and (iii) CentPv1 and CentPv2 are not likely to be commingled in the genome. These findings suggest that two distinct sets of centromere sequences have evolved independently within the common bean genome, and provide insight into centromere satellite evolution.
Rosaceae is the most important fruit-producing clade, and its key commercially relevant genera (Fragaria, Rosa, Rubus and Prunus) show broadly diverse growth habits, fruit types and compact diploid genomes. Peach, a diploid Prunus species, is one of the best genetically characterized deciduous trees. Here we describe the high-quality genome sequence of peach obtained from a completely homozygous genotype. We obtained a complete chromosome-scale assembly using Sanger whole-genome shotgun methods. We predicted 27,852 protein-coding genes, as well as noncoding RNAs. We investigated the path of peach domestication through whole-genome resequencing of 14 Prunus accessions. The analyses suggest major genetic bottlenecks that have substantially shaped peach genome diversity. Furthermore, comparative analyses showed that peach has not undergone recent whole-genome duplication, and even though the ancestral triplicated blocks in peach are fragmentary compared to those in grape, all seven paleosets of paralogs from the putative paleoancestor are detectable.
Switchgrass (Panicum virgatum L.) is a perennial C4 grass with the potential to become a major bioenergy crop. To help realize this potential, a set of RNA-based resources were developed. Expressed sequence tags (ESTs) were generated from two tetraploid switchgrass genotypes, Alamo AP13 and Summer VS16. Over 11.5 million high-quality ESTs were generated with 454 sequencing technology, and an additional 169 079 Sanger sequences were obtained from the 5′ and 3′ ends of 93 312 clones from normalized, full-length-enriched cDNA libraries. AP13 and VS16 ESTs were assembled into 77 854 and 30 524 unique transcripts (unitranscripts), respectively, using the Newbler and pave programs. Published Sanger-ESTs (544 225) from Alamo, Kanlow, and 15 other cultivars were integrated with the AP13 and VS16 assemblies to create a universal switchgrass gene index (PviUT1.2) with 128 058 unitranscripts, which were annotated for function. An Affymetrix cDNA microarray chip (Pvi_cDNAa520831) containing 122 973 probe sets was designed from PviUT1.2 sequences, and used to develop a Gene Expression Atlas for switchgrass (PviGEA). The PviGEA contains quantitative transcript data for all major organ systems of switchgrass throughout development. We developed a web server that enables flexible, multifaceted analyses of PviGEA transcript data. The PviGEA was used to identify representatives of all known genes in the phenylpropanoid-monolignol biosynthesis pathway.
The genomes of five Cochliobolus heterostrophus strains, two Cochliobolus sativus strains, three additional Cochliobolus species (Cochliobolus victoriae, Cochliobolus carbonum, Cochliobolus miyabeanus), and closely related Setosphaeria turcica were sequenced at the Joint Genome Institute (JGI). The datasets were used to identify SNPs between strains and species, unique genomic regions, core secondary metabolism genes, and small secreted protein (SSP) candidate effector encoding genes, with a view towards pinpointing structural elements and gene content associated with specificity of these closely related fungi to different cereal hosts. Whole-genome alignment shows that three to five percent of each genome differs between strains of the same species, while a quarter of each genome differs between species. On average, SNP counts among field isolates of the same C. heterostrophus species are more than 25× higher than those between inbred lines and 50× lower than SNPs between Cochliobolus species. The suites of nonribosomal peptide synthetase (NRPS), polyketide synthase (PKS), and SSP-encoding genes are astoundingly diverse among species but remarkably conserved among isolates of the same species, whether inbred or field strains, except for defining examples that map to unique genomic regions. Functional analysis of several strain-unique PKSs and NRPSs reveals a strong correlation with a role in virulence.
Halophytes are plants that can naturally tolerate high concentrations of salt in the soil, and their tolerance to salt stress may occur through various evolutionary and molecular mechanisms. Eutrema salsugineum is a halophytic species in the Brassicaceae that can naturally tolerate multiple types of abiotic stresses that typically limit crop productivity, including extreme salinity and cold. It has been widely used as a laboratory model for stress biology research in plants. Here, we present the reference genome sequence (241 Mb) of E. salsugineum at 8× coverage, sequenced using the traditional Sanger sequencing-based approach, with comparison to its close relative Arabidopsis thaliana. The E. salsugineum genome contains 26,531 protein-coding genes and 51.4% of its genome is composed of repetitive sequences that mostly reside in pericentromeric regions. Comparative analyses of the genome structures, protein-coding genes, microRNAs, stress-related pathways, and estimated translation efficiency of proteins between E. salsugineum and A. thaliana suggest that halophyte adaptation to environmental stresses may occur via a global network adjustment of multiple regulatory mechanisms. The E. salsugineum genome provides a resource to identify naturally occurring genetic alterations contributing to the adaptation of halophytic plants to salinity and that might be bioengineered in related crop species.
A small fast neutron (FN) mutant population has been established from Phaseolus vulgaris cv. Red Hawk. We leveraged the available P. vulgaris genome sequence and high-throughput next-generation DNA sequencing to examine the genomic structure of five P. vulgaris cv. Red Hawk FN mutants with striking visual phenotypes. Analysis of these genomes identified three classes of structural variation (SV): between-cultivar variation, natural variation within the FN mutant population, and FN-induced mutagenesis. Our analyses focused on the latter two classes. We identified 23 large deletions (>40 bp) common to multiple individuals, illustrating residual heterogeneity and regions of SV within the common bean cv. Red Hawk. An additional 18 large deletions were identified in individual mutant plants. These deletions, ranging in size from 40 bp to 43,000 bp, are potentially the result of FN mutagenesis. Six of the 18 deletions lie near or within gene coding regions, identifying potential candidate genes causing the mutant phenotype.
Chromohalobacter salexigens is one of nine currently known species of the genus Chromohalobacter in the family Halomonadaceae. It is the most halotolerant of the so-called moderately halophilic bacteria currently known and, due to its strong euryhaline phenotype, it is an established model organism for prokaryotic osmoadaptation. C. salexigens strain 1H11(T) and Halomonas elongata are the first and the second members of the family Halomonadaceae with a completely sequenced genome. The 3,696,649 bp long chromosome with a total of 3,319 protein-coding and 93 RNA genes was sequenced as part of the DOE Joint Genome Institute Program DOEM 2004.
Herpetosiphon aurantiacus Holt and Lewin 1968 is the type species of the genus Herpetosiphon, which in turn is the type genus of the family Herpetosiphonaceae, type family of the order Herpetosiphonales in the phylum Chloroflexi. H. aurantiacus cells are organized in filaments which can rapidly glide. The species is of interest not only because of its rather isolated position in the tree of life, but also because Herpetosiphon spp. were identified as predators capable of facultative predation by a wolf pack strategy and of degrading the prey organisms by excreted hydrolytic enzymes. The genome of H. aurantiacus strain 114-95(T) is the first completely sequenced genome of a member of the family Herpetosiphonaceae. The 6,346,587 bp long chromosome and the two 339,639 bp and 99,204 bp long plasmids with a total of 5,577 protein-coding and 77 RNA genes were sequenced as part of the DOE Joint Genome Institute Program DOEM 2005.
Tolumonas auensis Fischer-Romero et al. 1996 is currently the only validly named species of the genus Tolumonas in the family Aeromonadaceae. The strain is of interest because of its ability to produce toluene from phenylalanine and other phenyl precursors, as well as phenol from tyrosine. This is of interest because toluene is normally considered to be a tracer of anthropogenic pollution in lakes, but T. auensis represents a biogenic source of toluene. Other than Aeromonas hydrophila subsp. hydrophila, T. auensis strain TA 4(T) is the only other member in the family Aeromonadaceae with a completely sequenced type-strain genome. The 3,471,292 bp chromosome with a total of 3,288 protein-coding and 116 RNA genes was sequenced as part of the DOE Joint Genome Institute Program JBEI 2008.
Genes underlying repeated adaptive evolution in natural populations are still largely unknown. Stickleback fish (Gasterosteus aculeatus) have undergone a recent dramatic evolutionary radiation, generating numerous examples of marine-freshwater species pairs and a small number of benthic-limnetic species pairs found within single lakes. We have developed a new genome-wide SNP genotyping array to study patterns of genetic variation in sticklebacks over a wide geographic range, and to scan the genome for regions that contribute to repeated evolution of marine-freshwater or benthic-limnetic species pairs. Surveying 34 global populations with 1,159 informative markers revealed substantial genetic variation, with predominant patterns reflecting demographic history and geographic structure. After correcting for geographic structure and filtering for neutral markers, we detected large repeated shifts in allele frequency at some loci, identifying both known and novel loci likely contributing to marine-freshwater and benthic-limnetic divergence. Several novel loci fall close to genes implicated in epithelial barrier or immune functions, which have likely changed as sticklebacks adapt to contrasting environments. Specific alleles differentiating sympatric benthic-limnetic species pairs are shared in nearby solitary populations, suggesting an allopatric origin for adaptive variants and selection pressures unrelated to sympatry in the initial formation of these classic vertebrate species pairs.
Brown rot decay removes cellulose and hemicellulose from wood--residual lignin contributing up to 30% of forest soil carbon--and is derived from an ancestral white rot saprotrophy in which both lignin and cellulose are decomposed. Comparative and functional genomics of the "dry rot" fungus Serpula lacrymans, derived from forest ancestors, demonstrated that the evolution of both ectomycorrhizal biotrophy and brown rot saprotrophy were accompanied by reductions and losses in specific protein families, suggesting adaptation to an intercellular interaction with plant tissue. Transcriptome and proteome analysis also identified differences in wood decomposition in S. lacrymans relative to the brown rot Postia placenta. Furthermore, fungal nutritional mode diversification suggests that the boreal forest biome originated via genetic coevolution of above- and below-ground biota.
The spider mite Tetranychus urticae is a cosmopolitan agricultural pest with an extensive host plant range and an extreme record of pesticide resistance. Here we present the completely sequenced and annotated spider mite genome, representing the first complete chelicerate genome. At 90 megabases T. urticae has the smallest sequenced arthropod genome. Compared with other arthropods, the spider mite genome shows unique changes in the hormonal environment and organization of the Hox complex, and also reveals evolutionary innovation of silk production. We find strong signatures of polyphagy and detoxification in gene families associated with feeding on different hosts and in new gene families acquired by lateral gene transfer. Deep transcriptome analysis of mites feeding on different plants shows how this pest responds to a changing host environment. The T. urticae genome thus offers new insights into arthropod evolution and plant-herbivore interactions, and provides unique opportunities for developing novel plant protection strategies.
Rhodospirillum rubrum (Esmarch 1887) Molisch 1907 is the type species of the genus Rhodospirillum, which is the type genus of the family Rhodospirillaceae in the class Alphaproteobacteria. The species is of special interest because it is an anoxygenic phototroph that produces extracellular elemental sulfur (instead of oxygen) while harvesting light. It contains one of the simplest photosynthetic systems currently known, lacking light harvesting complex 2. Strain S1(T) can grow on carbon monoxide as sole energy source. With currently over 1,750 PubMed entries, R. rubrum is one of the most intensively studied microbial species, in particular for physiological and genetic studies. Next to R. centenum strain SW, the genome sequence of strain S1(T) is only the second genome of a member of the genus Rhodospirillum to be published, but the first type strain genome from the genus. The 4,352,825 bp long chromosome and 53,732 bp plasmid with a total of 3,850 protein-coding and 83 RNA genes were sequenced as part of the DOE Joint Genome Institute Program DOEM 2002.
Many challenges face plant scientists, in particular those working on crop production, such as a projected increase in population, a decrease in water and arable land, and changes in weather patterns and predictability. Advances in genome sequencing and resequencing can and should play a role in our response to meeting these challenges. However, several barriers prevent rapid and effective deployment of these tools to a wide variety of crops. Because of the complexity of crop genomes, de novo sequencing with next-generation sequencing technologies is a process fraught with difficulties that then create roadblocks to the utilization of these genome sequences for crop improvement. The difficulty of collecting rapid and accurate phenotypes in crop plants hinders the integration of genomics with crop improvement, and advances in informatics are needed to put these tools in the hands of the scientists on the ground.
• R(US) is a major dominant gene controlling quantitative resistance, inherited from Populus trichocarpa, whereas R(1) is a gene governing qualitative resistance, inherited from P. deltoides. • Here, we report a reiterative process of concomitant fine-scale genetic and physical mapping guided by the P. trichocarpa genome sequence. The high-resolution linkage maps were developed using a P. deltoides × P. trichocarpa progeny of 1415 individuals. R(US) and R(1) were mapped in a peritelomeric region of chromosome 19. Markers closely linked to R(US) were used to screen a bacterial artificial chromosome (BAC) library constructed from the P. trichocarpa parent, heterozygous at the locus R(US). • Two local physical maps were developed, one encompassing the R(US) allele and the other spanning r(US). The alignment of the two haplophysical maps showed structural differences between haplotypes. The genetic and physical maps were anchored to the genome sequence, revealing genome sequence misassembly. Finally, the R(US) locus was localized within a 0.8-cM interval, whereas R(1) was localized upstream of R(US) within a 1.1-cM interval. • The alignment of the genetic and physical maps with the local reorder of the chromosome 19 sequence indicated that R(US) and R(1) belonged to a genomic region rich in nucleotide-binding site leucine-rich repeat (NBS-LRR) and serine threonine kinase (STK) genes.
Thermostable enzymes and thermophilic cell factories may afford economic advantages in the production of many chemicals and biomass-based fuels. Here we describe and compare the genomes of two thermophilic fungi, Myceliophthora thermophila and Thielavia terrestris. To our knowledge, these genomes are the first described for thermophilic eukaryotes and the first complete telomere-to-telomere genomes for filamentous fungi. Genome analyses and experimental data suggest that both thermophiles are capable of hydrolyzing all major polysaccharides found in biomass. Examination of transcriptome data and secreted proteins suggests that the two fungi use shared approaches in the hydrolysis of cellulose and xylan but distinct mechanisms in pectin degradation. Characterization of the biomass-hydrolyzing activity of recombinant enzymes suggests that these organisms are highly efficient in biomass decomposition at both moderate and high temperatures. Furthermore, we present evidence suggesting that aside from representing a potential reservoir of thermostable enzymes, thermophilic fungi are amenable to manipulation using classical and molecular genetics.
Nocardioides sp. strain JS614 grows on ethene and vinyl chloride (VC) as sole carbon and energy sources and is of interest for bioremediation and biocatalysis. Sequencing of the complete genome of JS614 provides insight into the genetic basis of alkene oxidation, supports ongoing research into the physiology and biochemistry of growth on ethene and VC, and provides biomarkers to facilitate detection of VC/ethene oxidizers in the environment. This is the first genome sequence from the genus Nocardioides and the first genome of a VC/ethene-oxidizing bacterium.
Vascular plants appeared ~410 million years ago, then diverged into several lineages of which only two survive: the euphyllophytes (ferns and seed plants) and the lycophytes. We report here the genome sequence of the lycophyte Selaginella moellendorffii (Selaginella), the first nonseed vascular plant genome reported. By comparing gene content in evolutionarily diverse taxa, we found that the transition from a gametophyte- to a sporophyte-dominated life cycle required far fewer new genes than the transition from a nonseed vascular to a flowering plant, whereas secondary metabolic genes expanded extensively and in parallel in the lycophyte and angiosperm lineages. Selaginella differs in posttranscriptional gene regulation, including small RNA regulation of repetitive elements, an absence of the trans-acting small interfering RNA pathway, and extensive RNA editing of organellar genes.
Rust fungi are some of the most devastating pathogens of crop plants. They are obligate biotrophs, which extract nutrients only from living plant tissues and cannot grow apart from their hosts. Their lifestyle has slowed the dissection of molecular mechanisms underlying host invasion and avoidance or suppression of plant innate immunity. We sequenced the 101-Mb genome of Melampsora larici-populina, the causal agent of poplar leaf rust, and the 89-Mb genome of Puccinia graminis f. sp. tritici, the causal agent of wheat and barley stem rust. We then compared the 16,399 predicted proteins of M. larici-populina with the 17,773 predicted proteins of P. graminis f. sp. tritici. Genomic features related to their obligate biotrophic lifestyle include expanded lineage-specific gene families, a large repertoire of effector-like small secreted proteins, impaired nitrogen and sulfur assimilation pathways, and expanded families of amino acid and oligopeptide membrane transporters. The dramatic up-regulation of transcripts coding for small secreted proteins, secreted hydrolytic enzymes, and transporters in planta suggests that they play a role in host infection and nutrient acquisition. Some of these genomic hallmarks are mirrored in the genomes of other microbial eukaryotes that have independently evolved to infect plants, indicating convergent adaptation to a biotrophic existence inside plant cells.
The plant-pathogenic fungus Mycosphaerella graminicola (asexual stage: Septoria tritici) causes septoria tritici blotch, a disease that greatly reduces the yield and quality of wheat. This disease is economically important in most wheat-growing areas worldwide and threatens global food production. Control of the disease has been hampered by a limited understanding of the genetic and biochemical bases of pathogenicity, including mechanisms of infection and of resistance in the host. Unlike most other plant pathogens, M. graminicola has a long latent period during which it evades host defenses. Although this type of stealth pathogenicity occurs commonly in Mycosphaerella and other Dothideomycetes, the largest class of plant-pathogenic fungi, its genetic basis is not known. To address this problem, the genome of M. graminicola was sequenced completely. The finished genome contains 21 chromosomes, eight of which could be lost with no visible effect on the fungus and thus are dispensable. This eight-chromosome dispensome is dynamic in field and progeny isolates, is different from the core genome in gene and repeat content, and appears to have originated by ancient horizontal transfer from an unknown donor. Synteny plots of the M. graminicola chromosomes versus those of the only other sequenced Dothideomycete, Stagonospora nodorum, revealed conservation of gene content but not order or orientation, suggesting a high rate of intra-chromosomal rearrangement in one or both species. This observed "mesosynteny" is very different from synteny seen between other organisms. A surprising feature of the M. graminicola genome compared to other sequenced plant pathogens was that it contained very few genes for enzymes that break down plant cell walls, which was more similar to endophytes than to pathogens. The stealth pathogenesis of M. graminicola probably involves degradation of proteins rather than carbohydrates to evade host defenses during the biotrophic stage of infection and may have evolved from endophytic ancestors.
We report the 207-Mb genome sequence of the North American Arabidopsis lyrata strain MN47 based on 8.3× dideoxy sequence coverage. We predict 32,670 genes in this outcrossing species compared to the 27,025 genes in the selfing species Arabidopsis thaliana. The much smaller 125-Mb genome of A. thaliana, which diverged from A. lyrata 10 million years ago, likely constitutes the derived state for the family. We found evidence for DNA loss from large-scale rearrangements, but most of the difference in genome size can be attributed to hundreds of thousands of small deletions, mostly in noncoding DNA and transposons. Analysis of deletions and insertions still segregating in A. thaliana indicates that the process of DNA loss is ongoing, suggesting pervasive selection for a smaller genome. The high-quality reference genome sequence for A. lyrata will be an important resource for functional, evolutionary and ecological studies in the genus Arabidopsis.
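The genome sizes and gene counts quoted above can be compared directly. The following is a minimal sketch that uses only the four figures stated in the abstract; the function names are illustrative, and the genes-per-megabase measure is an assumption made here for comparison, not a statistic reported by the authors.

```python
# Figures as quoted in the abstract.
LYRATA_GENOME_MB = 207
LYRATA_GENES = 32_670
THALIANA_GENOME_MB = 125
THALIANA_GENES = 27_025


def size_difference_mb(larger_mb: float, smaller_mb: float) -> float:
    """Difference in genome size, in megabases."""
    return larger_mb - smaller_mb


def gene_density(genes: int, genome_mb: float) -> float:
    """Predicted genes per megabase (illustrative measure, not from the source)."""
    return genes / genome_mb


if __name__ == "__main__":
    diff = size_difference_mb(LYRATA_GENOME_MB, THALIANA_GENOME_MB)
    print(f"Genome size difference: {diff} Mb")
    print(f"A. lyrata density:   {gene_density(LYRATA_GENES, LYRATA_GENOME_MB):.1f} genes/Mb")
    print(f"A. thaliana density: {gene_density(THALIANA_GENES, THALIANA_GENOME_MB):.1f} genes/Mb")
```

On these figures the smaller A. thaliana genome carries more predicted genes per megabase, which fits the abstract's statement that most of the size difference lies in noncoding DNA and transposons.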
Bacteria of the deeply branching phylum Verrucomicrobia are rarely cultured yet commonly detected in metagenomic libraries from aquatic, terrestrial, and intestinal environments. We have sequenced the genome of Opitutus terrae PB90-1, a fermentative anaerobe within this phylum, isolated from rice paddy soil and capable of propionate production from plant-derived polysaccharides.
The social amoebae (Dictyostelia) are a diverse group of Amoebozoa that achieve multicellularity by aggregation and undergo morphogenesis into fruiting bodies with terminally differentiated spores and stalk cells. There are four groups of dictyostelids, with the most derived being a group that contains the model species Dictyostelium discoideum.
The genome of soybean (Glycine max), a commercially important crop, has recently been sequenced and is one of six crop species to have been sequenced. Here we report the genome sequence of G. soja, the undomesticated ancestor of G. max (in particular, G. soja var. IT182932). The 48.8-Gb Illumina Genome Analyzer (Illumina-GA) short DNA reads were aligned to the G. max reference genome and a consensus was determined for G. soja. This consensus sequence spanned 915.4 Mb, representing a coverage of 97.65% of the G. max published genome sequence and an average mapping depth of 43-fold. The nucleotide sequence of the G. soja genome, which contains 2.5 Mb of substituted bases and 406 kb of small insertions/deletions relative to G. max, is approximately 0.31% different from that of G. max. In addition to the mapped 915.4-Mb consensus sequence, 32.4 Mb of large deletions and 8.3 Mb of novel sequence contigs in the G. soja genome were also detected. Nucleotide variants of G. soja versus G. max confirmed by Roche Genome Sequencer FLX sequencing showed a 99.99% concordance in single-nucleotide polymorphism and a 98.82% agreement in insertion/deletion calls on Illumina-GA reads. Data presented in this study suggest that the G. soja/G. max complex may be at least 0.27 million y old, appearing before the relatively recent event of domestication (6,000 to 9,000 y ago). This suggests that soybean domestication is complicated and that more in-depth study of population genetics is needed. In any case, genome comparison of domesticated and undomesticated forms of soybean can facilitate its improvement.
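The figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is a rough consistency check on the rounded published numbers, not the authors' method. It assumes that the reported nucleotide difference is (substituted bases + small indel bases) divided by the consensus length, and that the mapping depth is averaged over the consensus; neither assumption is stated in the abstract, and the function names are illustrative.

```python
# Figures as quoted in the abstract (rounded as published).
READS_GB = 48.8             # total Illumina-GA read volume
CONSENSUS_MB = 915.4        # G. soja consensus length
COVERAGE_FRACTION = 0.9765  # share of the G. max reference covered
MAPPING_DEPTH = 43          # average mapping depth (fold)
SUBSTITUTIONS_MB = 2.5      # substituted bases
SMALL_INDELS_MB = 0.406     # small insertions/deletions (406 kb)


def implied_reference_mb(consensus_mb: float, coverage_fraction: float) -> float:
    """Reference length implied by the consensus length and coverage."""
    return consensus_mb / coverage_fraction


def percent_difference(subs_mb: float, indels_mb: float, consensus_mb: float) -> float:
    """Assumed definition: variant bases as a percentage of the consensus."""
    return 100.0 * (subs_mb + indels_mb) / consensus_mb


def mapped_read_fraction(depth: float, consensus_mb: float, reads_gb: float) -> float:
    """Share of read volume needed to reach the stated depth (assumed definition)."""
    return depth * consensus_mb / (reads_gb * 1000.0)


if __name__ == "__main__":
    print(f"Implied reference length: {implied_reference_mb(CONSENSUS_MB, COVERAGE_FRACTION):.1f} Mb")
    print(f"Nucleotide difference:    {percent_difference(SUBSTITUTIONS_MB, SMALL_INDELS_MB, CONSENSUS_MB):.3f} %")
    print(f"Mapped read fraction:     {mapped_read_fraction(MAPPING_DEPTH, CONSENSUS_MB, READS_GB):.2f}")
```

Under these assumptions the computed difference falls between 0.31% and 0.32%, close to the quoted value; the small gap is expected because the inputs are rounded.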
The multicellular green alga Volvox carteri and its morphologically diverse close relatives (the volvocine algae) are well suited for the investigation of the evolution of multicellularity and development. We sequenced the 138-mega-base pair genome of V. carteri and compared its approximately 14,500 predicted proteins to those of its unicellular relative Chlamydomonas reinhardtii. Despite fundamental differences in organismal complexity and life history, the two species have similar protein-coding potentials and few species-specific protein-coding gene predictions. Volvox is enriched in volvocine-algal-specific proteins, including those associated with an expanded and highly compartmentalized extracellular matrix. Our analysis shows that increases in organismal complexity can be associated with modifications of lineage-specific proteins rather than large-scale invention of protein-coding capacity.
The western clawed frog Xenopus tropicalis is an important model for vertebrate development that combines experimental advantages of the African clawed frog Xenopus laevis with more tractable genetics. Here we present a draft genome sequence assembly of X. tropicalis. This genome encodes more than 20,000 protein-coding genes, including orthologs of at least 1700 human disease genes. Over 1 million expressed sequence tags validated the annotation. More than one-third of the genome consists of transposable elements, with unusually prevalent DNA transposons. Like that of other tetrapods, the genome of X. tropicalis contains gene deserts enriched for conserved noncoding elements. The genome exhibits substantial shared synteny with human and chicken over major parts of large chromosomes, broken by lineage-specific chromosome fusions and fissions, mainly in the mammalian lineage.
Although dimorphic sexes have evolved repeatedly in multicellular eukaryotes, their origins are unknown. The mating locus (MT) of the sexually dimorphic multicellular green alga Volvox carteri specifies the production of eggs and sperm and has undergone a remarkable expansion and divergence relative to MT from Chlamydomonas reinhardtii, which is a closely related unicellular species that has equal-sized gametes. Transcriptome analysis revealed a rewired gametic expression program for Volvox MT genes relative to Chlamydomonas and identified multiple gender-specific and sex-regulated transcripts. The retinoblastoma tumor suppressor homolog MAT3 is a Volvox MT gene that displays sexually regulated alternative splicing and evidence of gender-specific selection, both of which are indicative of cooption into the sexual cycle. Thus, sex-determining loci affect the evolution of both sex-related and non-sex-related genes.
Much remains to be learned about the biology of mushroom-forming fungi, which are an important source of food, secondary metabolites and industrial enzymes. The wood-degrading fungus Schizophyllum commune is both a genetically tractable model for studying mushroom development and a likely source of enzymes capable of efficient degradation of lignocellulosic biomass. Comparative analyses of its 38.5-megabase genome, which encodes 13,210 predicted genes, reveal the species' unique wood-degrading machinery. One-third of the 471 genes predicted to encode transcription factors are differentially expressed during sexual development of S. commune. Whereas inactivation of one of these, fst4, prevented mushroom formation, inactivation of another, fst3, resulted in more, albeit smaller, mushrooms than in the wild-type fungus. Antisense transcripts may also have a role in the formation of fruiting bodies. Better insight into the mechanisms underlying mushroom formation should affect commercial production of mushrooms and their industrial use for producing enzymes and pharmaceuticals.
The living coelacanth is a lobe-finned fish that represents an early evolutionary departure from the lineage that led to land vertebrates, and is of extreme interest scientifically. It has changed very little in appearance from fossilized coelacanths of the Cretaceous (150 to 65 million years ago), and is often referred to as a "living fossil." An important general question is whether long-term stasis in morphological evolution is associated with stasis in genome evolution. To this end we have used targeted genome sequencing for acquiring 1,612,752 bp of high quality finished sequence encompassing the four HOX clusters of the Indonesian coelacanth Latimeria menadoensis. Detailed analyses were carried out on genomic structure, gene and repeat contents, conserved noncoding regions, and relative rates of sequence evolution in both coding and noncoding tracts. Our results demonstrate conclusively that the coelacanth HOX clusters are evolving comparatively slowly and that this taxon should serve as a viable outgroup for interpretation of the genomes of tetrapod species.
Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70% more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78% of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75% of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.
Long terminal repeat (LTR) retrotransposons, the most abundant genomic components in flowering plants, are classifiable into autonomous and nonautonomous elements based on their structural completeness and transposition capacity. It has been proposed that selection is the major force for maintaining sequence (e.g., LTR) conservation between nonautonomous elements and their autonomous counterparts. Here, we report the structural, evolutionary, and expression characterization of a giant retrovirus-like soybean (Glycine max) LTR retrotransposon family, SNARE. This family contains two autonomous subfamilies, SARE(A) and SARE(B), that appear to have evolved independently since the soybean genome tetraploidization event approximately 13 million years ago, and a nonautonomous subfamily, SNRE, that originated from SARE(A). Unexpectedly, a subset of the SNRE elements, which amplified from a single founding SNRE element within the last approximately 3 million years, have been dramatically homogenized with either SARE(A) or SARE(B) primarily in the LTR regions and bifurcated into distinct subgroups corresponding to the two autonomous subfamilies. We uncovered evidence of region-specific swapping of nonautonomous elements with autonomous elements that primarily generated various nonautonomous recombinants with LTR sequences from autonomous elements of different evolutionary lineages, thus revealing a molecular mechanism for the enhancement of preexisting partnership and the establishment of new partnership between autonomous and nonautonomous elements.
Here, we demonstrate how comparative sequence analysis facilitates genome-wide base-pair-level interpretation of individual genetic variation and address two questions of importance for human personal genomics: first, whether an individual's functional variation comes mostly from noncoding or coding polymorphisms; and, second, whether population-specific or globally-present polymorphisms contribute more to functional variation in any given individual. Neither has been definitively answered by analyses of existing variation data because of a focus on coding polymorphisms, ascertainment biases in favor of common variation, and a lack of base-pair-level resolution for identifying functional variants. We resequenced 575 amplicons within 432 individuals at genomic sites enriched for evolutionary constraint and also analyzed variation within three published human genomes. We find that single-site measures of evolutionary constraint derived from mammalian multiple sequence alignments are strongly predictive of reductions in modern-day genetic diversity across a range of annotation categories and across the allele frequency spectrum from rare (<1%) to high frequency (>10% minor allele frequency). Furthermore, we show that putatively functional variation in an individual genome is dominated by polymorphisms that do not change protein sequence and that originate from our shared ancestral population and commonly segregate in human populations. These observations show that common, noncoding alleles contribute substantially to human phenotypes and that constraint-based analyses will be of value to identify phenotypically relevant variants in individual genomes.
The molecular mechanisms underlying major phenotypic changes that have evolved repeatedly in nature are generally unknown. Pelvic loss in different natural populations of threespine stickleback fish has occurred through regulatory mutations deleting a tissue-specific enhancer of the Pituitary homeobox transcription factor 1 (Pitx1) gene. The high prevalence of deletion mutations at Pitx1 may be influenced by inherent structural features of the locus. Although Pitx1 null mutations are lethal in laboratory animals, Pitx1 regulatory mutations show molecular signatures of positive selection in pelvic-reduced populations. These studies illustrate how major expression and morphological changes can arise from single mutational leaps in natural populations, producing new adaptive alleles via recurrent regulatory alterations in a key developmental control gene.
Since its start, the Mammalian Gene Collection (MGC) has sought to provide at least one full-protein-coding sequence cDNA clone for every human and mouse gene with a RefSeq transcript, and at least 6200 rat genes. The MGC cloning effort initially relied on random expressed sequence tag screening of cDNA libraries. Here, we summarize our recent progress using directed RT-PCR cloning and DNA synthesis. The MGC now contains clones with the entire protein-coding sequence for 92% of human and 89% of mouse genes with curated RefSeq (NM-accession) transcripts, and for 97% of human and 96% of mouse genes with curated RefSeq transcripts that have one or more PubMed publications, in addition to clones for more than 6300 rat genes. These high-quality MGC clones and their sequences are accessible without restriction to researchers worldwide.
Sexual antagonism, or conflict between the sexes, has been proposed as a driving force in both sex-chromosome turnover and speciation. Although closely related species often have different sex-chromosome systems, it is unknown whether sex-chromosome turnover contributes to the evolution of reproductive isolation between species. Here we show that a newly evolved sex chromosome contains genes that contribute to speciation in threespine stickleback fish (Gasterosteus aculeatus). We first identified a neo-sex chromosome system found only in one member of a sympatric species pair in Japan. We then performed genetic linkage mapping of male-specific traits important for reproductive isolation between the Japanese species pair. The neo-X chromosome contains loci for male courtship display traits that contribute to behavioural isolation, whereas the ancestral X chromosome contains loci for both behavioural isolation and hybrid male sterility. Our work not only provides strong evidence for a large X-effect on reproductive isolation in a vertebrate system, but also provides direct evidence that a young neo-X chromosome contributes to reproductive isolation between closely related species. Our data indicate that sex-chromosome turnover might have a greater role in speciation than was previously appreciated.
The ascomycetous fungus Nectria haematococca (asexual name Fusarium solani) is a member of a group of >50 species known as the "Fusarium solani species complex". Members of this complex have diverse biological properties including the ability to cause disease on >100 genera of plants and opportunistic infections in humans. The current research analyzed the most extensively studied member of this complex, N. haematococca mating population VI (MPVI). Several genes controlling the ability of individual isolates of this species to colonize specific habitats are located on supernumerary chromosomes. Optical mapping revealed that the sequenced isolate has 17 chromosomes ranging from 530 kb to 6.52 Mb and that the physical size of the genome, 54.43 Mb, and the number of predicted genes, 15,707, are among the largest reported for ascomycetes. Two classes of genes have contributed to gene expansion: specific genes that are not found in other fungi including its closest sequenced relative, Fusarium graminearum; and genes that commonly occur as single copies in other fungi but are present as multiple copies in N. haematococca MPVI. Some of these additional genes appear to have resulted from gene duplication events, while others may have been acquired through horizontal gene transfer. The supernumerary nature of three chromosomes, 14, 15, and 17, was confirmed by their absence in pulsed field gel electrophoresis experiments of some isolates and by demonstrating that these isolates lacked chromosome-specific sequences found on the ends of these chromosomes. These supernumerary chromosomes contain more repeat sequences, are enriched in unique and duplicated genes, and have a lower G+C content in comparison to the other chromosomes. Although the origin(s) of the extra genes and the supernumerary chromosomes is not known, the gene expansion and its large genome size are consistent with this species' diverse range of habitats. Furthermore, the presence of unique genes on supernumerary chromosomes might account for individual isolates having different environmental niches.
Picoeukaryotes are a taxonomically diverse group of organisms less than 2 micrometers in diameter. Photosynthetic marine picoeukaryotes in the genus Micromonas thrive in ecosystems ranging from tropical to polar and could serve as sentinel organisms for biogeochemical fluxes of modern oceans during climate change. These broadly distributed primary producers belong to an anciently diverged sister clade to land plants. Although Micromonas isolates have high 18S ribosomal RNA gene identity, we found that genomes from two isolates shared only 90% of their predicted genes. Their independent evolutionary paths were emphasized by distinct riboswitch arrangements as well as the discovery of intronic repeat elements in one isolate, and in metagenomic data, but not in other genomes. Divergence appears to have been facilitated by selection and acquisition processes that actively shape the repertoire of genes that are mutually exclusive between the two isolates differently than the core genes. Analyses of the Micromonas genomes offer valuable insights into ecological differentiation and the dynamic nature of early plant evolution.
Soybeans grown in the upper Midwestern United States often suffer from iron deficiency chlorosis, which results in yield loss at the end of the season. To better understand the effect of iron availability on soybean yield, we identified genes in two near isogenic lines with changes in expression patterns when plants were grown in iron sufficient and iron deficient conditions.
Sorghum, an African grass related to sugar cane and maize, is grown for food, feed, fibre and fuel. We present an initial analysis of the approximately 730-megabase Sorghum bicolor (L.) Moench genome, placing approximately 98% of genes in their chromosomal context using whole-genome shotgun sequence validated by genetic, physical and syntenic information. Genetic recombination is largely confined to about one-third of the sorghum genome with gene order and density similar to those of rice. Retrotransposon accumulation in recombinationally recalcitrant heterochromatin explains the approximately 75% larger genome size of sorghum compared with rice. Although gene and repetitive DNA distributions have been preserved since palaeopolyploidization approximately 70 million years ago, most duplicated gene sets lost one member before the sorghum-rice divergence. Concerted evolution makes one duplicated chromosomal segment appear to be only a few million years old. About 24% of genes are grass-specific and 7% are sorghum-specific. Recent gene and microRNA duplications may contribute to sorghum's drought tolerance.
Polyploidy often confers emergent properties, such as the higher fibre productivity and quality of tetraploid cottons than diploid cottons bred for the same environments. Here we show that an abrupt five- to sixfold ploidy increase approximately 60 million years (Myr) ago, and allopolyploidy reuniting divergent Gossypium genomes approximately 1-2 Myr ago, conferred about 30-36-fold duplication of ancestral angiosperm (flowering plant) genes in elite cottons (Gossypium hirsutum and Gossypium barbadense), genetic complexity equalled only by Brassica among sequenced angiosperms. Nascent fibre evolution, before allopolyploidy, is elucidated by comparison of spinnable-fibred Gossypium herbaceum A and non-spinnable Gossypium longicalyx F genomes to one another and the outgroup D genome of non-spinnable Gossypium raimondii. The sequence of a G. hirsutum A(t)D(t) (in which t indicates tetraploid) cultivar reveals many non-reciprocal DNA exchanges between subgenomes that may have contributed to phenotypic innovation and/or other emergent properties such as ecological adaptation by polyploids. Most DNA-level novelty in G. hirsutum recombines alleles from the D-genome progenitor native to its New World habitat and the Old World A-genome progenitor in which spinnable fibre evolved. Coordinated expression changes in proximal groups of functionally distinct genes, including a nuclear mitochondrial DNA block, may account for clusters of cotton-fibre quantitative trait loci affecting diverse traits. Opportunities abound for dissecting emergent properties of other polyploids, particularly angiosperms, by comparison to diploid progenitors and outgroups.
Cryptophyte and chlorarachniophyte algae are transitional forms in the widespread secondary endosymbiotic acquisition of photosynthesis by engulfment of eukaryotic algae. Unlike most secondary plastid-bearing algae, miniaturized versions of the endosymbiont nuclei (nucleomorphs) persist in cryptophytes and chlorarachniophytes. To determine why, and to address other fundamental questions about eukaryote-eukaryote endosymbiosis, we sequenced the nuclear genomes of the cryptophyte Guillardia theta and the chlorarachniophyte Bigelowiella natans. Both genomes have >21,000 protein genes and are intron rich, and B. natans exhibits unprecedented alternative splicing for a single-celled organism. Phylogenomic analyses and subcellular targeting predictions reveal extensive genetic and biochemical mosaicism, with both host- and endosymbiont-derived genes servicing the mitochondrion, the host cell cytosol, the plastid and the remnant endosymbiont cytosol of both algae. Mitochondrion-to-nucleus gene transfer still occurs in both organisms but plastid-to-nucleus and nucleomorph-to-nucleus transfers do not, which explains why a small residue of essential genes remains locked in each nucleomorph.
Genomic comparisons of chordates, hemichordates, and echinoderms can inform hypotheses for the evolution of these strikingly different phyla from the last common deuterostome ancestor. Because hox genes play pivotal developmental roles in bilaterian animals, we analyzed the Hox complexes of two hemichordate genomes. We find that Saccoglossus kowalevskii and Ptychodera flava both possess 12-gene clusters, with mir10 between hox4 and hox5, in 550 kb and 452 kb intervals, respectively. Genes hox1-hox9/10 of the clusters are in the same genomic order and transcriptional orientation as their orthologs in chordates, with hox1 at the 3′ end of the cluster. At the 5′ end, each cluster contains three posterior genes specific to Ambulacraria (the hemichordate-echinoderm clade), two forming an inverted terminal pair. In contrast, the echinoderm Strongylocentrotus purpuratus contains a 588 kb cluster of 11 orthologs of the hemichordate genes, ordered differently, plausibly reflecting rearrangements of an ancestral hemichordate-like ambulacrarian cluster. Hox clusters of vertebrates and the basal chordate amphioxus have similar organization to the hemichordate cluster, but with different posterior genes. These results provide genomic evidence for a well-ordered complex in the deuterostome ancestor for the hox1-hox9/10 region, with the number and kind of posterior genes still to be elucidated.
Agaricus bisporus is the model fungus for the adaptation, persistence, and growth in the humic-rich leaf-litter environment. Aside from its ecological role, A. bisporus has been an important component of the human diet for over 200 y and worldwide cultivation of the "button mushroom" forms a multibillion dollar industry. We present two A. bisporus genomes, their gene repertoires and transcript profiles on compost and during mushroom formation. The genomes encode a full repertoire of polysaccharide-degrading enzymes similar to that of wood-decayers. Comparative transcriptomics of mycelium grown on defined medium, casing-soil, and compost revealed genes encoding enzymes involved in xylan, cellulose, pectin, and protein degradation are more highly expressed in compost. The striking expansion of heme-thiolate peroxidases and β-etherases is distinctive from Agaricomycotina wood-decayers and suggests a broad attack on decaying lignin and related metabolites found in humic acid-rich environments. Similarly, up-regulation of these genes together with a lignolytic manganese peroxidase, multiple copper radical oxidases, and cytochrome P450s is consistent with challenges posed by complex humic-rich substrates. The gene repertoire and expression of hydrolytic enzymes in A. bisporus is substantially different from the taxonomically related ectomycorrhizal symbiont Laccaria bicolor. A common promoter motif was also identified in genes very highly expressed in humic-rich substrates. These observations reveal genetic and enzymatic mechanisms governing adaptation to the humic-rich ecological niche formed during plant degradation, further defining the critical role such fungi contribute to soil structure and carbon sequestration in terrestrial ecosystems. Genome sequence will expedite mushroom breeding for improved agronomic characteristics.
Vertebrate sensory systems have evolved remarkable diversity, but little is known about the underlying genetic mechanisms. The lateral line sensory system of aquatic vertebrates is a promising model for genetic investigations of sensory evolution because there is extensive variation within and between species, and this variation is easily quantified. In the present study, we compare the lateral line sensory system of threespine sticklebacks (Gasterosteus aculeatus) from an ancestral marine and a derived benthic lake population. We show that lab-raised individuals from these populations display differences in sensory neuromast number, neuromast patterning, and groove morphology. Using genetic linkage mapping, we identify regions of the genome that influence different aspects of lateral line morphology. Distinct loci independently affect neuromast number on different body regions, suggesting that a modular genetic structure underlies the evolution of peripheral receptor number in this sensory system. Pleiotropy and/or tight linkage are also important, as we identify a region on linkage group 21 that affects multiple aspects of lateral line morphology. Finally, we detect epistasis between a locus on linkage group 4 and a locus on linkage group 21; interactions between these loci contribute to variation in neuromast pattern. Our results reveal a complex genetic architecture underlying the evolution of the stickleback lateral line sensory system. This study further uncovers a genetic relationship between sensory morphology and non-neural traits (bony lateral plates), creating an opportunity to investigate morphological constraints on sensory evolution in a vertebrate model system.
The mammalian Dlx3 and Dlx4 genes are configured as a bigene cluster, and their respective expression patterns are controlled temporally and spatially by cis-elements that largely reside within the intergenic region of the cluster. Previous work revealed that there are conspicuously conserved elements within the intergenic region of the Dlx3-4 bigene clusters of mouse and human. In this paper we have extended these analyses to include 12 additional mammalian taxa (including a marsupial and a monotreme) in order to better define the nature and molecular evolutionary trends of the coding and non-coding functional elements among morphologically divergent mammals. Dlx3-4 regions were fully sequenced from 12 divergent taxa of interest. We identified three theria-specific amino acid replacements in the homeodomain of the Dlx4 gene, which functions in the placenta. Sequence analyses of constrained nucleotide sites in the intergenic non-coding region showed that many of the intergenic conserved elements are highly conserved and have evolved slowly within the mammals. In contrast, a branchial arch/craniofacial enhancer, I37-2, exhibited accelerated evolution at the branch between the monotreme and therian common ancestor despite being highly conserved among therian species. Functional analysis of I37-2 in transgenic mice has shown that the equivalent region of the platypus fails to drive transcriptional activity in branchial arches. These observations, taken together with our molecular evolutionary data, suggest that theria-specific episodic changes in the I37-2 element may have contributed to craniofacial innovation at the base of the mammalian lineage.
Wood is a major pool of organic carbon that is highly resistant to decay, owing largely to the presence of lignin. The only organisms capable of substantial lignin decay are white rot fungi in the Agaricomycetes, which also contains non-lignin-degrading brown rot and ectomycorrhizal species. Comparative analyses of 31 fungal genomes (12 generated for this study) suggest that lignin-degrading peroxidases expanded in the lineage leading to the ancestor of the Agaricomycetes, which is reconstructed as a white rot species, and then contracted in parallel lineages leading to brown rot and mycorrhizal species. Molecular clock analyses suggest that the origin of lignin degradation might have coincided with the sharp decrease in the rate of organic carbon burial around the end of the Carboniferous period.
The oomycete vegetable pathogen Phytophthora capsici has shown remarkable adaptation to fungicides and new hosts. Like other members of this destructive genus, P. capsici has an explosive epidemiology, rapidly producing massive numbers of asexual spores on infected hosts. In addition, P. capsici can remain dormant for years as sexually recombined oospores, making it difficult to produce crops at infested sites, and allowing outcrossing populations to maintain significant genetic variation. Genome sequencing, development of a high-density genetic map, and integrative genomic or genetic characterization of P. capsici field isolates and intercross progeny revealed significant mitotic loss of heterozygosity (LOH) in diverse isolates. LOH was detected in clonally propagated field isolates and sexual progeny, cumulatively affecting >30% of the genome. LOH altered genotypes for more than 11,000 single-nucleotide variant sites and showed a strong association with changes in mating type and pathogenicity. Overall, it appears that LOH may provide a rapid mechanism for fixing alleles and may be an important component of adaptability for P. capsici.
Polynucleobacter necessarius subsp. asymbioticus strain QLW-P1DMWA-1(T) is a planktonic freshwater bacterium affiliated with the family Burkholderiaceae (class Betaproteobacteria). This strain is of interest because it represents a subspecies with cosmopolitan and ubiquitous distribution in standing freshwater systems. The 16S-23S ITS genotype represented by the sequenced strain comprised on average more than 10% of bacterioplankton in its home habitat. While all strains of the subspecies P. necessarius asymbioticus are free-living freshwater bacteria, strains belonging to the only other subspecies, P. necessarius subsp. necessarius, are obligate endosymbionts of the ciliate Euplotes aediculatus. The two subspecies of P. necessarius are an instance of two closely related subspecies that differ in their lifestyle (free-living vs. obligate endosymbiont), and they are the only members of the genus Polynucleobacter with completely sequenced genomes. Here we describe the features of P. necessarius subsp. asymbioticus, together with the complete genome sequence and annotation. The 2,159,490 bp long chromosome with a total of 2,088 protein-coding and 48 RNA genes is the first completed genome sequence of the genus Polynucleobacter to be published and was sequenced as part of the DOE Joint Genome Institute Community Sequencing Program 2006.
Little is known about the mechanisms of adaptation of life to the extreme environmental conditions encountered in polar regions. Here we present the genome sequence of a unicellular green alga from the division chlorophyta, Coccomyxa subellipsoidea C-169, which we will hereafter refer to as C-169. This is the first eukaryotic microorganism from a polar environment to have its genome sequenced.
We generated a high-quality reference genome sequence for foxtail millet (Setaria italica). The ~400-Mb assembly covers ~80% of the genome and >95% of the gene space. The assembly was anchored to a 992-locus genetic map and was annotated by comparison with >1.3 million expressed sequence tag reads. We produced more than 580 million RNA-Seq reads to facilitate expression analyses. We also sequenced Setaria viridis, the ancestral wild relative of S. italica, and identified regions of differential single-nucleotide polymorphism density, distribution of transposable elements, small RNA content, chromosomal rearrangement and segregation distortion. The genus Setaria includes natural and cultivated species that demonstrate a wide capacity for adaptation. The genetic basis of this adaptation was investigated by comparing five sequenced grass genomes. We also used the diploid Setaria genome to evaluate the ongoing genome assembly of a related polyploid, switchgrass (Panicum virgatum).
The perennial grass, switchgrass (Panicum virgatum L.), is a promising bioenergy crop and the target of whole genome sequencing. We constructed two bacterial artificial chromosome (BAC) libraries from the AP13 clone of switchgrass to gain insight into the genome structure and organization, initiate functional and comparative genomic studies, and assist with genome assembly. Together representing 16 haploid genome equivalents of switchgrass, each library comprises 101,376 clones with average insert sizes of 144 (HindIII-generated) and 110 kb (BstYI-generated). A total of 330,297 high quality BAC-end sequences (BES) were generated, accounting for 263.2 Mbp (16.4%) of the switchgrass genome. Analysis of the BES identified 279,099 known repetitive elements, >50,000 SSRs, and 2,528 novel repeat elements, named switchgrass repetitive elements (SREs). Comparative mapping of 47 full-length BAC sequences and 330K BES revealed high levels of synteny with the grass genomes sorghum, rice, maize, and Brachypodium. Our data indicate that the sorghum genome has retained larger microsyntenous regions with switchgrass besides high gene order conservation with rice. The resources generated in this effort will be useful for a broad range of applications.
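The library figures in the switchgrass abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not part of the published analysis: it takes the numbers quoted above (clone counts, average insert sizes, 16 haploid genome equivalents, 263.2 Mbp of BES at 16.4% of the genome) and derives the haploid genome size they imply by two independent routes. The function and constant names are hypothetical, and the calculation assumes every clone carries an insert of the average size.

```python
# Back-of-envelope consistency check using only figures quoted in the abstract.
# All names here are illustrative; this is not the authors' pipeline.

CLONES_PER_LIBRARY = 101_376
AVG_INSERT_KB = {"HindIII": 144, "BstYI": 110}  # average insert size per library
GENOME_EQUIVALENTS = 16        # combined coverage of both libraries (haploid)
BES_COUNT = 330_297            # high-quality BAC-end sequences
BES_TOTAL_MBP = 263.2          # total BES length
BES_GENOME_FRACTION = 0.164    # 16.4% of the genome


def total_cloned_mbp() -> float:
    """Total DNA cloned across both libraries, assuming average-sized inserts."""
    total_kb = sum(CLONES_PER_LIBRARY * kb for kb in AVG_INSERT_KB.values())
    return total_kb / 1000.0


def genome_size_from_coverage_mbp() -> float:
    """Haploid genome size implied by total cloned DNA and stated coverage."""
    return total_cloned_mbp() / GENOME_EQUIVALENTS


def genome_size_from_bes_mbp() -> float:
    """Haploid genome size implied by BES length and its stated genome fraction."""
    return BES_TOTAL_MBP / BES_GENOME_FRACTION


def mean_bes_length_bp() -> float:
    """Average length of one BAC-end sequence."""
    return BES_TOTAL_MBP * 1e6 / BES_COUNT


if __name__ == "__main__":
    print(f"cloned DNA:            {total_cloned_mbp():,.1f} Mbp")
    print(f"genome (coverage):     {genome_size_from_coverage_mbp():,.1f} Mbp")
    print(f"genome (BES fraction): {genome_size_from_bes_mbp():,.1f} Mbp")
    print(f"mean BES length:       {mean_bes_length_bp():,.0f} bp")
```

Both routes give a haploid genome size of roughly 1.6 Gbp, so the coverage and BES figures quoted in the abstract are mutually consistent.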
Marine stickleback fish have colonized and adapted to thousands of streams and lakes formed since the last ice age, providing an exceptional opportunity to characterize genomic mechanisms underlying repeated ecological adaptation in nature. Here we develop a high-quality reference genome assembly for threespine sticklebacks. By sequencing the genomes of twenty additional individuals from a global set of marine and freshwater populations, we identify a genome-wide set of loci that are consistently associated with marine-freshwater divergence. Our results indicate that reuse of globally shared standing genetic variation, including chromosomal inversions, has an important role in repeated evolution of distinct marine and freshwater sticklebacks, and in the maintenance of divergent ecotypes during early stages of reproductive isolation. Both coding and regulatory changes occur in the set of loci underlying marine-freshwater evolution, but regulatory changes appear to predominate in this well known example of repeated adaptive evolution in nature.
Parasitism and saprotrophic wood decay are two fungal strategies fundamental for succession and nutrient cycling in forest ecosystems. An opportunity to assess the trade-off between these strategies is provided by the forest pathogen and wood decayer Heterobasidion annosum sensu lato. We report the annotated genome sequence and transcript profiling, as well as the quantitative trait loci mapping, of one member of the species complex: H. irregulare. Quantitative trait loci critical for pathogenicity, and rich in transposable elements, orphan and secreted genes, were identified. A wide range of cellulose-degrading enzymes are expressed during wood decay. By contrast, pathogenic interaction between H. irregulare and pine engages fewer carbohydrate-active enzymes, but involves an increase in pectinolytic enzymes, transcription modules for oxidative stress and secondary metabolite production. Our results show a trade-off in terms of constrained carbohydrate decomposition and membrane transport capacity during interaction with living hosts. Our findings establish that saprotrophic wood decay and necrotrophic parasitism involve two distinct, yet overlapping, processes.