We sequenced the genome through a combination of short- and long-read approaches, including Illumina, 10X Genomics, and PacBio, totaling 615-fold coverage of the genome. Illumina and 10X Genomics data were assembled and scaffolded with the software package DenovoMAGIC3, which has recently been used to assemble the allotetraploid wheat genome. We further scaffolded the genome to chromosome scale by using Hi-C data in combination with the HiRise pipeline, then performed gap-filling with 43-fold-coverage error-corrected PacBio reads with PBJelly. The total length of the final assembly is 805,488,706 bp, distributed across 28 chromosome-level pseudomolecules and representing ~99% of the estimated genome size, on the basis of flow cytometry measurements. A genetic map for Fragaria×ananassa was used to correct any misassemblies, and comparisons to Fragaria vesca were used to identify homoeologous chromosomes. We annotated 108,087 protein-coding genes along with 30,703 genes encoding long noncoding RNAs (lncRNAs), which were subdivided into 15,621 long intergenic noncoding RNAs, 9,265 antisense overlapping transcripts, and 5,817 sense overlapping transcripts. Gene annotation and genome-assembly quality were evaluated with the Benchmarking Universal Single-Copy Orthologs (BUSCO) v2 method. Most of the 1,440 core genes in the embryophyta dataset were identified in the annotation, thus supporting a high-quality genome assembly.
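The annotation tallies and assembly-size figures in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative, and the implied genome size is a rough back-calculation from the "~99%" figure, not a value reported here.

```python
# Figures quoted in the paragraph above.
assembly_length_bp = 805_488_706
fraction_of_estimated_genome = 0.99  # "~99%" in the text, so only approximate
lncrna_total = 30_703
lncrna_classes = {
    "long intergenic": 15_621,
    "antisense overlapping": 9_265,
    "sense overlapping": 5_817,
}

# The three lncRNA classes should account for every annotated lncRNA gene.
classes_sum = sum(lncrna_classes.values())
print("lncRNA classes sum to total:", classes_sum == lncrna_total)

# Rough genome size implied by the ~99% figure (back-calculated, not reported).
implied_genome_size_bp = assembly_length_bp / fraction_of_estimated_genome
print(f"implied genome size: {implied_genome_size_bp / 1e6:.0f} Mb")
```

Because the 99% figure is itself rounded, the implied size should be read as an order-of-magnitude check only.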
The repetitive components of the nuclear genome were annotated with a custom-repeat-library approach, including DNA transposons, long-terminal-repeat retrotransposons (LTR-RTs), and non-LTR retrotransposons. Transposable element (TE)-related sequences make up ~36% of the total genome assembly, and LTR-RTs are the most abundant TEs. The plastid and mitochondrial genomes were also assembled, annotated, and verified for completeness.
Using the Fragaria×ananassa reference-genome assembly, we sought to identify the extant diploid relatives of each sub-genome donor. Previous phylogenetic studies aimed at identifying these progenitor species, often analyzing a limited number or different sets of molecular markers, have obtained inconsistent results. However, F. vesca has long been suspected to be a progenitor, on the basis of meiotic chromosome pairing; subsequent molecular phylogenetic analyses supported it being one of the diploid progenitors along with Fragaria iinumae and two additional unknown species. We sequenced and de novo assembled 31 transcriptomes representing every described diploid Fragaria species, which we used to identify progenitor species on the basis of the phylogenetic analysis of 19,302 nuclear genes in the genome. To our knowledge, this is the most comprehensive molecular phylogenetic analysis of the genus Fragaria to date, including the greatest number of molecular markers and sampling of diploid species, aimed at identifying the extant relatives of the progenitor species of octoploid strawberry. Our phylogenetic analyses provided strong genome-wide support for the two diploid progenitor species that had been previously hypothesized and identified the two previously unknown diploid progenitors.
This discovery, together with the geographic distributions, natural history, and genomic footprints of the diploid species, provided a model for the chronological formation of intermediate polyploids that culminated in the formation of the octoploid.
Our phylogenetic analyses revealed F. iinumae and Fragaria nipponica as two of the four extant diploid progenitor species, both of which are endemic to Japan and in geographic proximity to all five described tetraploid species in China. The third species identified in our analyses, Fragaria viridis, is geographically distributed in Europe and Asia, and partially overlaps with the sole hexaploid species, Fragaria moschata. Therefore, we hypothesized that these tetraploid and hexaploid species may be evolutionary intermediates between the diploids and the wild octoploid species. This possibility is supported by a previous phylogenetic analysis identifying F. viridis as a possible parental contributor to both F. moschata and the octoploid event. Finally, we identified F. vesca subsp. bracheata, which is endemic to the western part of North America, spanning Mexico to British Columbia, as the fourth parental contributor. Our species sampling also included two other F. vesca subspecies: F. vesca subsp. vesca, which is distributed from Europe to the Russian Far East, and F. vesca subsp. californica, which is endemic to the coast of California. Octoploid strawberry species are geographically restricted to the New World and are largely distributed across North America, with the exception of isolated F. chiloensis populations in Chile and the Hawaiian Islands. Therefore, our phylogenetic analyses combined with the geographic distributions of extant species not only support a North American origin for the octoploid strawberry but also suggest that F. vesca subsp. bracheata was probably the last diploid progenitor species to contribute to the formation of the ancestral octoploid strawberry. This possibility is further supported by a previous study revealing F. vesca subsp. bracheata as the likely maternal donor of the octoploid event, on the basis of the phylogenetic history of the plastid genome. 
This finding is consistent with our analysis of the plastid genome of ‘Camarosa’. Thus, these data suggest that the hexaploid ancestor probably crossed into North America from Asia and hybridized with native populations of F. vesca subsp. bracheata, an event dated at ~1.1 million years before present.
Our phylogenetic analysis also identified related diploid species possibly arising from ancient hybridization and introgression events with putative progenitor species or issues related to incomplete lineage sorting and/or missing data. Future studies will be able to more thoroughly investigate these possibilities after reference-quality genomes are assembled for these other diploid progenitor species.
After most ancient allopolyploid events, one of the sub-genomes, commonly referred to as the ‘dominant’ sub-genome, emerges with significantly greater gene content and more highly expressed homoeologs than those of the other ‘submissive’ sub-genome. Biased fractionation, which results in greater gene content of the dominant sub-genome, was first described in the model plant Arabidopsis thaliana and later described in Zea mays, Brassica rapa, and Triticum aestivum. The dominant sub-genome has also been shown to be under stronger selective constraints and to be heritable through successive allopolyploid events, and, as predicted, it is not observed in ancient autopolyploids. Moreover, sub-genome expression dominance has recently been shown to occur instantly after interspecific hybridization and to increase over successive generations in monkey flower. However, some allopolyploids, including Capsella bursa-pastoris and Cucurbita species, do not exhibit sub-genome dominance. The emergence of a dominant sub-genome may resolve various genetic and epigenetic conflicts that arise from the genomic merger of divergent diploid progenitor species, including mismatches between transcriptional regulators and their target genes. The mechanistic basis of sub-genome dominance, at least in part, appears to be related to sub-genome differences in the content and regulation of TEs. Thus, the merger of sub-genomes with different TE densities results in higher gene expression for the dominant homoeolog with fewer TEs.
The abundance and distribution of TEs can be used to predict gene expression dominance and eventual gene loss at the individual homoeolog level. Having identified the extant diploid relatives of octoploid strawberry, we used this information to investigate the evolutionary dynamics among the four sub-genomes. We identified a dominant sub-genome that was contributed by the F. vesca progenitor and has retained 20.2% more protein-coding genes and 14.2% more lncRNA genes, and has overall 19.5% fewer TEs than the other homoeologous chromosomes. Furthermore, we identified ~40.6% more tandem gene duplications on homoeologous chromosomes of F. vesca compared with the other sub-genomes. The F. vesca sub-genome, compared with the other sub-genomes, also contains a greater number of tandem gene arrays as well as larger average tandem-gene-array sizes on six of seven homoeologous chromosomes. These findings suggest that the dominant F. vesca sub-genome, compared with the other three sub-genomes, has been under stronger selective constraints to retain genes, including tandemly duplicated genes known to be biased toward gene families that encode important adaptive traits. For example, major disease-resistance genes in plants, including nucleotide-binding-site leucine-rich-repeat (NBS-LRR) genes, which are usually clustered in tandem arrays, are biased toward the dominant F. vesca sub-genome. Because strawberry production is threatened by several agriculturally important diseases, we analyzed, in greater depth, the major family of plant resistance genes. Collectively, 423 NBS-LRR genes were identified, including 195 encoding an N-terminal coiled-coil, 79 encoding toll interleukin 1 receptor, and 24 encoding resistance to powdery mildew 8 domains. Recent work has demonstrated that many R proteins recognize pathogen effectors through integrated decoy domains, and the F. vesca genome encodes 20 such protein models.
Fragaria×ananassa has a greatly expanded set of 105 diverse domains that are fused to the R-protein structures and have the potential to function as integrated decoys. Only a few resistance genes have been phenotypically identified in Fragaria×ananassa, but none have been functionally characterized. The annotated genome thus provides a framework for accelerating R-gene discovery, connecting phenotype to genotype, and pyramiding R genes by developing targeted, homoeolog-specific molecular markers.
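The sub-genome comparisons reported earlier in this section (for example, 20.2% more protein-coding genes and 19.5% fewer TEs on the F. vesca sub-genome) are relative differences. The sketch below shows one plausible way to compute such a figure, taking the mean of the other three sub-genomes as the baseline. The text does not state how the baseline was defined, so that choice is an assumption. The function name and the gene counts are hypothetical and were picked only to make the arithmetic easy to follow.

```python
from statistics import mean


def percent_difference(dominant: float, others: list[float]) -> float:
    """Relative difference (%) of `dominant` versus the mean of `others`.

    Assumption: the baseline is the mean of the other sub-genomes; the
    text does not specify the baseline actually used.
    """
    baseline = mean(others)
    return 100.0 * (dominant - baseline) / baseline


# Hypothetical gene counts for one set of four homoeologous chromosomes.
# These are NOT values from the assembly.
gene_counts = {"vesca": 3000, "iinumae": 2500, "nipponica": 2450, "viridis": 2550}

others = [count for name, count in gene_counts.items() if name != "vesca"]
pct = percent_difference(gene_counts["vesca"], others)
print(f"vesca retains {pct:.1f}% more genes than the mean of the others")
```

A negative result from the same function would correspond to a "fewer than" statement, as with the TE comparison.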
Although chromosomes contributed by the F. vesca progenitor retained the most genes overall, certain regions on chromosomes from the other progenitor species retained higher numbers of ancestral genes. Further analysis revealed that these regions are the products of homoeologous exchanges (HEs) or gene-conversion events. Notably, most HEs in octoploid strawberry involved replacements of the submissive homoeologs by corresponding regions of the dominant F. vesca sub-genome. For example, our phylogenetic and comparative genomic analyses showed that HEs are 7.3-fold biased toward the F. vesca sub-genome compared with F. iinumae, but they are not unidirectional as previously reported (ref. 3). HEs were even more biased toward the F. vesca sub-genome compared with the other two sub-genomes. These analyses validate findings from a previous study in wild octoploid strawberry and show that portions of the F. iinumae sub-genome have been replaced with the F. vesca sub-genome. Here, we identified HEs ranging in size from single genes to megabase-sized regions on chromosomes, findings similar to the patterns observed in other allopolyploids including Brassica napus, Gossypium hirsutum, and bread wheat. The observed bias of HEs genome wide may be due to selection favoring the maintenance of proper network stoichiometry and altered dosage of certain gene products during the establishment of the dominant sub-genome. Interestingly, 32.6% of NBS-LRR genes encoded on the three submissive sub-genomes are derived from HE with the F. vesca sub-genome. This result suggests that although the F. vesca sub-genome may also dominate disease resistance in strawberry, the maintained diversity of resistance mechanisms contributed by the other three diploid progenitors may also have been under selection.
Finally, we examined gene expression in diverse organs to test whether the dominant F. vesca sub-genome is more highly expressed than the submissive sub-genomes, as predicted by the sub-genome-dominance hypothesis. The density of TEs near genes was found to be negatively correlated with gene expression across all sub-genomes. Because HEs reshuffled and replaced homoeologs across each of the four parental chromosomes, only homoeolog pairs that had support for sub-genome assignment were evaluated for sub-genome expression dominance. Our analyses revealed that the dominant F. vesca sub-genome, which had the lowest overall TE densities near genes of all sub-genomes, encodes more significantly dominantly expressed homoeologs than the other three submissive sub-genomes combined. This finding supports the hypothesis that sub-genome expression dominance is influenced by overall TE-density differences between sub-genomes. At the individual homoeolog level, many dominantly expressed homoeologs were also contributed by one of the three submissive sub-genomes. This observation was expected, given the variation in TE densities near homoeologs in each of the diploid progenitor genomes. Most HEs in octoploid strawberry resulted in the dominant F. vesca sub-genome replacing the corresponding homoeologous regions of one of the submissive sub-genomes. Thus, the observed homoeolog expression bias toward the F. vesca sub-genome in Fig. 3 is an underestimate of transcriptome-wide expression dominance. This bias has resulted in certain biological pathways being largely controlled by a single dominant sub-genome. Our analyses revealed that certain metabolic pathways, including those that give rise to strawberry flavor, color, and aroma, are largely controlled by the dominant sub-genome. For example, F. vesca homoeologs in octoploid strawberry are responsible for 88.8% of the biosynthesis of anthocyanins, the metabolites responsible for the red pigments in ripening strawberry fruit; 89.2% of the biosynthesis of geranyl acetate, a terpene associated with fruit aroma; and 95.3% of the biosynthesis of fructose associated with sweetness.
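The idea of homoeolog expression bias discussed above can be illustrated with a minimal sketch that compares the expression of two homoeologs in a pair and tallies which copy is dominant. This is not the statistical procedure used in the study, which is not described in this section. The log2-ratio threshold, the pseudocount, the function names, and the expression values are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def log2_ratio(tpm_a: float, tpm_b: float, pseudocount: float = 1.0) -> float:
    """log2 expression ratio of homoeolog A over homoeolog B.

    The pseudocount (an assumption) avoids division by zero for silent genes.
    """
    return math.log2((tpm_a + pseudocount) / (tpm_b + pseudocount))


def classify_pair(tpm_vesca: float, tpm_other: float, min_abs_log2: float = 1.0) -> str:
    """Label a homoeolog pair by which copy is more highly expressed.

    `min_abs_log2` = 1.0 means a twofold difference; this cutoff is an
    illustrative assumption, not a value from the study.
    """
    ratio = log2_ratio(tpm_vesca, tpm_other)
    if ratio >= min_abs_log2:
        return "vesca_dominant"
    if ratio <= -min_abs_log2:
        return "other_dominant"
    return "balanced"


# Hypothetical expression values (TPM) as (F. vesca copy, submissive copy).
pairs = {
    "pair_1": (40.0, 9.0),
    "pair_2": (5.0, 6.0),
    "pair_3": (2.0, 30.0),
    "pair_4": (100.0, 20.0),
}

calls = {name: classify_pair(a, b) for name, (a, b) in pairs.items()}
tally = Counter(calls.values())
for label, count in sorted(tally.items()):
    print(label, count)
```

A real analysis would use replicated expression data and a formal significance test. It would also be restricted, as described above, to homoeolog pairs with confident sub-genome assignment.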