The genomes of the wild octoploid progenitors of cultivated strawberry (Fragaria×ananassa), Fragaria virginiana and Fragaria chiloensis, are the products of polyploid evolution: they were formed by the fusion of, and interactions among, genomes from four diploid progenitor species approximately 1 million years before present. Whereas two of the diploid progenitor species have been identified, the other two have remained unknown. Moreover, the history of events leading to the formation of the octoploid lineage, and the evolutionary dynamics among the four sub-genomes that restabilized cellular processes after ‘genomic shock’ in allopolyploids, remain poorly understood. Here, we present what is, to our knowledge, the first chromosome-scale assembly of an octoploid strawberry genome, the identities of the extant diploid progenitor species of each sub-genome, and novel insights into the collective evolutionary processes involved in establishing a dominant sub-genome in this highly polyploid species. The Rosaceae are a large eudicot family that includes a rich diversity of crops of major economic importance worldwide, such as nuts, ornamentals, pome fruits, stone fruits, and berries. Strawberries are prized by consumers, largely because of their complex array of flavors and aromas. The genus Fragaria was named by the botanist Carl Linnaeus after the Latin word ‘fragrans’, meaning ‘sweet scented’, describing its strikingly aromatic fruit. A total of 22 wild Fragaria species have been described, ranging from diploid to decaploid.
The genus Fragaria is highly interfertile between and within ploidy levels, which has led to the natural formation of higher-polyploid species. Polyploid events, also known as whole-genome duplications, have been an important recurrent process throughout the evolutionary history of eukaryotes and have probably contributed to novel and varied phenotypes. Polyploids are grouped into two main categories, autopolyploids and allopolyploids, which involve a single or multiple diploid progenitor species, respectively. Many crop species are allopolyploids, and allopolyploidy has contributed to the emergence of important agronomic traits such as spinnable fibers in cotton, diversified morphotypes in Brassica, and varied aroma and flavor profiles in strawberry. Allopolyploids face the challenge of organizing, within a single nucleus, distinct parental sub-genomes, each with a unique genetic and epigenetic makeup shaped by an independent evolutionary history. Previous studies have proposed, as part of the ‘sub-genome dominance’ hypothesis, that the establishment of a single dominant sub-genome may resolve various genetic conflicts in allopolyploids. However, our understanding of the underlying mechanisms and ultimate consequences of sub-genome dominance remains largely incomplete. Sub-genome-level analyses in most allopolyploid systems are greatly hindered by the inability to confidently assign parental gene copies to each sub-genome, owing to both large-scale chromosomal changes and homoeologous exchanges that shuffle and replace homoeologs among parental chromosomes. Octoploid strawberry still has a complete set of homoeologous chromosomes from all four parental sub-genomes, which greatly simplifies homoeolog assignment. Furthermore, gene sequences from extant relatives of the diploid progenitor species, which probably still exist for octoploid strawberry, can be used to accurately assign homoeologs to each parental sub-genome.
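To make the homoeolog-assignment logic concrete, the sketch below assigns a single octoploid gene copy to the sub-genome whose extant diploid relative it most resembles. This is a simplified best-match rule swapped in for illustration, not the gene-tree phylogenetic approach used in the study; the function name, the percent-identity input, the `min_margin` threshold, and the example scores are all hypothetical.

```python
from typing import Dict


def assign_subgenome(identity: Dict[str, float], min_margin: float = 0.5) -> str:
    """Assign one homoeolog to the diploid relative it matches best.

    `identity` maps a (hypothetical) diploid label to the percent identity
    between the homoeolog and that relative's ortholog. Returns "ambiguous"
    when the best and second-best scores differ by less than `min_margin`
    percentage points.
    """
    if not identity:
        raise ValueError("no diploid scores supplied")
    # Rank candidate diploid relatives from most to least similar.
    ranked = sorted(identity.items(), key=lambda kv: kv[1], reverse=True)
    best_label, best_score = ranked[0]
    if len(ranked) > 1 and best_score - ranked[1][1] < min_margin:
        return "ambiguous"
    return best_label


# Illustrative scores only; not data from the study.
scores = {"F. vesca": 99.1, "F. iinumae": 95.3, "F. nipponica": 96.0, "F. viridis": 95.8}
print(assign_subgenome(scores))  # F. vesca
```

Treating near-ties as ambiguous, rather than forcing a call, reflects the concern that homoeologous exchanges can make a gene copy resemble more than one progenitor.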
However, a high-quality reference genome for the octoploid is needed to fully exploit strawberry as a model system for studying allopolyploidy, as well as to provide a platform for identifying biologically and agriculturally important genes and applying genomics-enabled breeding approaches. Assembling the octoploid strawberry genome, with an estimated size of 813.4 Mb, has been particularly challenging because of its high heterozygosity and ploidy level. For example, the most recently published version of the octoploid strawberry genome is highly fragmented, with more than 625,000 scaffolds, and largely incomplete, with less than 660 Mb assembled after removal of the numerous gaps. Owing to this fragmentation, that version has not been a useful resource for genome-wide analyses, including the discovery of molecular markers for breeding. Our goal was to obtain a high-quality reference genome for the Fragaria×ananassa cultivar ‘Camarosa’, one of the most historically important and widely grown strawberry cultivars worldwide. We sequenced the genome with a combination of short- and long-read approaches, including Illumina, 10X Genomics, and PacBio, totaling 615-fold coverage of the genome. Illumina and 10X Genomics data were assembled and scaffolded with the software package DenovoMAGIC3, which has recently been used to assemble an allotetraploid wheat genome. We further scaffolded the genome to chromosome scale by using Hi-C data in combination with the HiRise pipeline, then filled gaps with PBJelly using error-corrected PacBio reads at 43-fold coverage. The final assembly totals 805,488,706 bp, distributed across 28 chromosome-level pseudomolecules, and represents ~99% of the genome size estimated by flow cytometry.
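The summary figures above can be checked with simple arithmetic. The sketch below assumes that fold coverage is defined as total sequenced bases divided by the 813.4 Mb genome-size estimate; the variable and function names are illustrative.

```python
# Figures quoted in the text above.
ASSEMBLY_BP = 805_488_706      # total length of the final assembly
GENOME_SIZE_BP = 813_400_000   # estimated genome size (813.4 Mb)
COVERAGE_FOLD = 615            # combined Illumina, 10X Genomics, and PacBio depth


def fraction_assembled(assembly_bp: int, genome_bp: int) -> float:
    """Assembly length as a fraction of the estimated genome size."""
    return assembly_bp / genome_bp


def total_sequenced_bp(fold: float, genome_bp: int) -> float:
    """Assumes fold coverage = total sequenced bases / genome size."""
    return fold * genome_bp


print(f"{fraction_assembled(ASSEMBLY_BP, GENOME_SIZE_BP):.2%}")             # 99.03%
print(f"{total_sequenced_bp(COVERAGE_FOLD, GENOME_SIZE_BP) / 1e9:.1f} Gb")  # 500.2 Gb
```

Under that assumption, 615-fold coverage corresponds to roughly 500 Gb of raw sequence pooled across the three platforms.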
A genetic map for Fragaria×ananassa was used to correct misassemblies, and comparisons to Fragaria vesca were used to identify homoeologous chromosomes. We annotated 108,087 protein-coding genes along with 30,703 genes encoding long noncoding RNAs (lncRNAs), which were subdivided into 15,621 long intergenic noncoding RNAs (lincRNAs), 9,265 antisense overlapping transcripts, and 5,817 sense overlapping transcripts.
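The three lncRNA classes are defined by position relative to protein-coding genes. A minimal sketch of such a rule follows, assuming 0-based half-open coordinates and that any overlap with a same-strand gene takes precedence over antisense overlap; the authors' exact criteria (for example, minimum overlap length) are not stated here, so the `Feature` type and `classify_lncrna` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Feature:
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def overlaps(a: Feature, b: Feature) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_lncrna(lnc: Feature, genes: List[Feature]) -> str:
    """Assign an lncRNA to one of the three classes used in the text."""
    hits = [g for g in genes if overlaps(lnc, g)]
    if not hits:
        return "lincRNA"              # no overlap with any protein-coding gene
    if any(g.strand == lnc.strand for g in hits):
        return "sense_overlapping"    # assumed to take precedence
    return "antisense"                # overlaps genes only on the opposite strand


# The reported classes partition the total: 15,621 + 9,265 + 5,817 = 30,703.
assert 15_621 + 9_265 + 5_817 == 30_703
```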
Gene annotation and genome-assembly quality were evaluated with the Benchmarking Universal Single-Copy Orthologs (BUSCO) v2 method. Most of the 1,440 core genes in the Embryophyta dataset were identified in the annotation, supporting a high-quality genome assembly. The repetitive components of the nuclear genome were annotated with a custom repeat library, including DNA transposons, long-terminal-repeat retrotransposons (LTR-RTs), and non-LTR retrotransposons. Transposable element (TE)-related sequences make up ~36% of the total genome assembly, and LTR-RTs are the most abundant TEs. The plastid and mitochondrial genomes were also assembled, annotated, and verified for completeness. Using the Fragaria×ananassa reference-genome assembly, we sought to identify the extant diploid relatives of each sub-genome donor. Previous phylogenetic studies aimed at identifying these progenitor species, which often analyzed a limited number of, or different sets of, molecular markers, have obtained inconsistent results. However, F. vesca has long been suspected to be a progenitor on the basis of meiotic chromosome pairing, and subsequent molecular phylogenetic analyses supported it as one of the diploid progenitors, along with Fragaria iinumae and two additional unknown species. We sequenced and de novo assembled 31 transcriptomes representing every described diploid Fragaria species, and used them to identify the progenitor species through phylogenetic analysis of 19,302 nuclear genes in the genome. To our knowledge, this is the most comprehensive molecular phylogenetic analysis of the genus Fragaria to date aimed at identifying the extant relatives of the progenitor species of octoploid strawberry, with the greatest number of molecular markers and the broadest sampling of diploid species. Our phylogenetic analyses provided strong genome-wide support for the two previously hypothesized diploid progenitor species and identified the two previously unknown diploid progenitors.
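A genome-wide repeat percentage such as the ~36% above is usually computed from annotated TE intervals, which can overlap or nest. The sketch below, a hypothetical helper rather than the pipeline used in the study, merges intervals before summing so that no base is counted twice. It assumes 0-based half-open coordinates on a single sequence; a genome-wide value would sum covered bases over all pseudomolecules before dividing by the assembly length.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based, half-open


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or abutting intervals so each base appears once."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of double-counting bases.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def repeat_fraction(te_hits: Iterable[Interval], seq_len: int) -> float:
    """Fraction of a sequence covered by at least one TE annotation."""
    covered = sum(end - start for start, end in merge_intervals(te_hits))
    return covered / seq_len


hits = [(0, 10), (5, 20), (30, 40)]  # toy intervals; (0, 10) and (5, 20) overlap
print(repeat_fraction(hits, 100))    # 0.3
```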
This discovery, together with the geographic distributions, natural history, and genomic footprints of the diploid species, provided a model for the chronological formation of the intermediate polyploids that culminated in the formation of the octoploid. Our phylogenetic analyses revealed F. iinumae and Fragaria nipponica as two of the four extant diploid progenitor species; both are endemic to Japan and geographically close to all five described tetraploid species in China. The third species identified in our analyses, Fragaria viridis, is distributed across Europe and Asia, and its range partially overlaps that of the sole hexaploid species, Fragaria moschata. We therefore hypothesized that these tetraploid and hexaploid species may be evolutionary intermediates between the diploids and the wild octoploid species. This possibility is supported by a previous phylogenetic analysis identifying F. viridis as a possible parental contributor to both F. moschata and the octoploid event. Finally, we identified F. vesca subsp. bracteata, which is endemic to western North America, spanning Mexico to British Columbia, as the fourth parental contributor. Our species sampling also included two other F. vesca subspecies: F. vesca subsp. vesca, which is distributed from Europe to the Russian Far East, and F. vesca subsp. californica, which is endemic to the coast of California. Octoploid strawberry species are geographically restricted to the New World and are largely distributed across North America, with the exception of isolated F. chiloensis populations in Chile and the Hawaiian Islands. Therefore, our phylogenetic analyses, combined with the geographic distributions of extant species, not only support a North American origin for octoploid strawberry but also suggest that F. vesca subsp. bracteata was probably the last diploid progenitor species to contribute to the formation of the ancestral octoploid strawberry.
This possibility is further supported by a previous study that identified F. vesca subsp. bracteata as the likely maternal donor of the octoploid event on the basis of the phylogenetic history of the plastid genome, a finding consistent with our analysis of the ‘Camarosa’ plastid genome. Together, these data suggest that the hexaploid ancestor probably crossed into North America from Asia and hybridized with native populations of F. vesca subsp. bracteata, an event dated at ~1.1 million years before present. Our phylogenetic analysis also identified related diploid species whose placement may reflect ancient hybridization and introgression with putative progenitor species, or incomplete lineage sorting and/or missing data. Future studies will be able to investigate these possibilities more thoroughly once reference-quality genomes are assembled for these other diploid progenitor species. After most ancient allopolyploid events, one of the sub-genomes, commonly referred to as the ‘dominant’ sub-genome, emerges with significantly greater gene content and more highly expressed homoeologs than the other, ‘submissive’ sub-genome(s). Biased fractionation, which results in the greater gene content of the dominant sub-genome, was first described in the model plant Arabidopsis thaliana and later in Zea mays, Brassica rapa, and Triticum aestivum. The dominant sub-genome has also been shown to be under stronger selective constraints and to be heritable through successive allopolyploid events, and, as predicted, dominance is not observed in ancient autopolyploids. Moreover, sub-genome expression dominance has recently been shown to arise immediately after interspecific hybridization and to increase over successive generations in monkeyflower. However, some allopolyploids, including Capsella bursa-pastoris and Cucurbita species, do not exhibit sub-genome dominance.
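Expression dominance is typically quantified by asking, for each set of retained homoeologs, which sub-genome contributes the most highly expressed copy. The sketch below tallies this. The input format (a sub-genome label mapped to an expression value), the two-fold threshold for calling a bias, the labels, and the function name are illustrative assumptions, not the criteria used in the study.

```python
from collections import Counter
from typing import Dict, List


def dominance_counts(homoeolog_sets: List[Dict[str, float]],
                     min_fold: float = 2.0) -> Counter:
    """Count how often each sub-genome carries the most highly expressed copy.

    A set counts as biased only if the top copy is expressed at least
    `min_fold` times higher than the runner-up; otherwise it is "balanced".
    """
    wins: Counter = Counter()
    for expr in homoeolog_sets:
        if len(expr) < 2:
            continue  # a single retained copy cannot be compared
        # Take the two most highly expressed copies in this homoeolog set.
        (top_sg, top), (_, runner_up) = sorted(
            expr.items(), key=lambda kv: kv[1], reverse=True)[:2]
        if top > 0 and top >= min_fold * runner_up:
            wins[top_sg] += 1
        else:
            wins["balanced"] += 1
    return wins


# Toy expression values for three homoeolog sets (hypothetical labels).
toy = [{"vesca": 10.0, "iinumae": 2.0},
       {"vesca": 5.0, "viridis": 4.0},
       {"nipponica": 8.0, "vesca": 1.0}]
print(dominance_counts(toy))
```

A consistent excess of wins for one label across many homoeolog sets would be the signature of a dominant sub-genome.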
The emergence of a dominant sub-genome may resolve various genetic and epigenetic conflicts that arise from the genomic merger of divergent diploid progenitor species, including mismatches between transcriptional regulators and their target genes. The mechanistic basis of sub-genome dominance appears to be related, at least in part, to sub-genome differences in the content and regulation of TEs. Thus, the merger of sub-genomes with different TE densities results in higher expression of the dominant homoeolog, the copy with fewer TEs. The abundance and distribution of TEs can be used to predict gene expression dominance and eventual gene loss at the level of individual homoeologs. Having identified the extant diploid relatives of octoploid strawberry, we used this information to investigate the evolutionary dynamics among the four sub-genomes. We identified a dominant sub-genome, contributed by the F. vesca progenitor, that has retained 20.2% more protein-coding genes and 14.2% more lncRNA genes, and has 19.5% fewer TEs overall, than the other homoeologous chromosomes. TE densities near genes were also lowest in the F. vesca sub-genome compared with the other parental sub-genomes. Furthermore, we identified ~40.6% more tandem gene duplications on the F. vesca homoeologous chromosomes than on those of the other sub-genomes. Compared with the other sub-genomes, the F. vesca sub-genome also contains more tandem gene arrays, as well as larger average tandem-gene-array sizes, on six of seven homoeologous chromosomes. These findings suggest that the dominant F. vesca sub-genome has been under stronger selective constraints than the other three sub-genomes to retain genes, including tandemly duplicated genes, which are known to be biased toward gene families that encode important adaptive traits.
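Tandem gene arrays can be identified by scanning genes in chromosomal order for runs that belong to the same gene family. The sketch below uses the simplest possible rule, requiring family members to be strictly adjacent; published pipelines often tolerate a few intervening genes, and the family labels (for example, from a prior gene-clustering step) are assumed inputs. The function names are hypothetical.

```python
from itertools import groupby
from typing import List, Optional, Sequence, Tuple


def tandem_arrays(families: Sequence[Optional[str]]) -> List[List[int]]:
    """Return index runs of two or more adjacent genes from the same family.

    `families` lists each gene's family label in chromosomal order; None marks
    genes with no assigned family.
    """
    arrays: List[List[int]] = []
    for family, run in groupby(enumerate(families), key=lambda pair: pair[1]):
        indices = [i for i, _ in run]
        if family is not None and len(indices) >= 2:
            arrays.append(indices)
    return arrays


def array_summary(families: Sequence[Optional[str]]) -> Tuple[int, int, float]:
    """(number of arrays, genes in arrays, mean array size) for one chromosome."""
    arrays = tandem_arrays(families)
    genes_in_arrays = sum(len(a) for a in arrays)
    mean_size = genes_in_arrays / len(arrays) if arrays else 0.0
    return len(arrays), genes_in_arrays, mean_size


print(array_summary(["A", "A", "B", "C", "C", "C", None, "A"]))  # (2, 5, 2.5)
```

Computing this summary separately for each homoeologous chromosome would support the per-sub-genome comparison of array number and mean array size.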
For example, major disease-resistance genes in plants, including nucleotide-binding-site leucine-rich-repeat (NBS-LRR) genes, which are usually clustered in tandem arrays, are biased toward the dominant F. vesca sub-genome. Because strawberry production is threatened by several agriculturally important diseases, we analyzed this major family of plant resistance genes in greater depth. Collectively, 423 NBS-LRR genes were identified, including 195 encoding an N-terminal coiled-coil (CC) domain, 79 encoding a Toll/interleukin-1 receptor (TIR) domain, and 24 encoding a resistance to powdery mildew 8 (RPW8) domain. Recent work has demonstrated that many resistance (R) proteins recognize pathogen effectors through integrated decoy domains, and the F. vesca genome encodes 20 such protein models. Fragaria×ananassa has a greatly expanded set of 105 diverse domains that are fused to R-protein structures and have the potential to function as integrated decoys.
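The NBS-LRR subclasses and candidate integrated decoys can both be read off a protein's domain architecture. The sketch below assumes each gene is described by an ordered list of domain names (N- to C-terminus), such as might be derived from a Pfam-based scan, with NB-ARC marking the nucleotide-binding site. Any domain outside the canonical CC, TIR, RPW8, NB-ARC, and LRR set is flagged as a possible integrated domain, for example a WRKY domain like the one fused to Arabidopsis RRS1. The function and labels are hypothetical, and a real analysis would also need domain-calling thresholds.

```python
from typing import Sequence, Set, Tuple

N_TERMINAL = ("CC", "TIR", "RPW8")
CANONICAL = {"CC", "TIR", "RPW8", "NB-ARC", "LRR"}


def classify_r_gene(domains: Sequence[str]) -> Tuple[str, Set[str]]:
    """Return (N-terminal class, non-canonical domains) for one protein.

    `domains` lists domain names from N- to C-terminus. Proteins without an
    NB-ARC domain are not NBS-LRR genes under this rule.
    """
    if "NB-ARC" not in domains:
        return "not_NBS", set()
    # Only domains upstream of the NB-ARC domain define the N-terminal class.
    before_nbs = domains[: domains.index("NB-ARC")]
    n_term = next((d for d in before_nbs if d in N_TERMINAL), "none")
    extra = {d for d in domains if d not in CANONICAL}  # candidate integrated decoys
    return n_term, extra


print(classify_r_gene(["TIR", "NB-ARC", "LRR", "WRKY"]))
```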