Interestingly, KNOX functions as a bridge connecting a peripheral gene network module to the core network that includes leaf cell proliferation regulators. Because past evo-devo studies had demonstrated that KNOX expression was recruited repeatedly to influence leaf shape variation in several plant lineages, the bottleneck position of KNOX in the GRN suggests that it might be an evolutionary hotspot. Such GCN studies provide positional information on genes in the GRN, helping us understand their function in morphological evolution. Evolution and development are likely regulated by network modules in which transcription factors function as intramodular hub genes. Identifying such genes and/or network modules that are major contributors to the evolution of organs and organisms is of great interest, and several statistical approaches have been developed to address this. One simple way is to use network properties such as connectivity and modularity to identify key genes in the network. Connectivity reflects how frequently a node interacts with other nodes, and hub genes can be defined as those with an extremely high level of connectivity. In a gene co-expression network, hub genes represent a small proportion of nodes with maximal information exchange with other nodes, and they tend to be associated with essential roles in biological processes.
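The connectivity and modularity measures mentioned above can be made concrete with a small, dependency-free Python sketch. It assumes an unweighted co-expression network built by hard-thresholding the absolute Pearson correlation (the 0.8 cutoff is an arbitrary illustrative choice), treats a node's connectivity as its degree, and scores a given module assignment with Newman's modularity Q. The gene names, edges, and module labels are hypothetical toy values, not data from any study.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return sxy / sqrt(sxx * syy)


def coexpression_edges(profiles, threshold=0.8):
    """Link two genes when |r| >= threshold (hard-thresholded, unweighted)."""
    return {
        (g1, g2)
        for g1, g2 in combinations(sorted(profiles), 2)
        if abs(pearson(profiles[g1], profiles[g2])) >= threshold
    }


def connectivity(genes, edges):
    """Degree of each node: the number of edges that touch it."""
    k = {g: 0 for g in genes}
    for a, b in edges:
        k[a] += 1
        k[b] += 1
    return k


def top_hubs(k, n=1):
    """Genes with the highest connectivity (ties broken by name)."""
    return sorted(k, key=lambda g: (-k[g], g))[:n]


def modularity(genes, edges, module_of):
    """Newman modularity Q = sum over modules of e_c/m - (d_c/2m)^2."""
    m = len(edges)
    k = connectivity(genes, edges)
    q = 0.0
    for c in set(module_of.values()):
        members = {g for g in genes if module_of[g] == c}
        e_c = sum(1 for a, b in edges if a in members and b in members)
        d_c = sum(k[g] for g in members)
        q += e_c / m - (d_c / (2 * m)) ** 2
    return q


# Toy network: two modules joined by one bridge edge (C-D).
genes = ["HUB", "A", "B", "C", "G", "D", "E", "F"]
edges = {("HUB", "A"), ("HUB", "B"), ("HUB", "C"), ("HUB", "G"), ("A", "B"),
         ("D", "E"), ("D", "F"), ("E", "F"), ("C", "D")}
module_of = {g: 1 for g in ["HUB", "A", "B", "C", "G"]}
module_of.update({g: 2 for g in ["D", "E", "F"]})

k = connectivity(genes, edges)
print(top_hubs(k))                                    # ['HUB']
print(round(modularity(genes, edges, module_of), 3))  # 0.364
```

A hard threshold keeps the example readable; weighted networks that use correlation strengths directly are a common alternative and would change both the degree and modularity calculations.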
Modularity is another network property; it measures how well a network is partitioned into subnetworks. Genes that are highly interconnected within a subnetwork are usually involved in the same biological module or pathway. In addition, differential network analysis is a recent bioinformatics tool that detects how interactions between genes and modules in a GRN are rewired. Recent advances in differential network construction used transcriptome data alone to identify the causal transcription factors responsible for network rewiring, and these interactions were subsequently confirmed experimentally. This method defined a ‘regulatory impact factor’ to determine which regulator shows the most altered ability to predict the abundance of differentially expressed genes. In the case of bovine Piedmontese myostatin mutants, the most frequently used bioinformatics approaches, such as transcript abundance, differential expression, and co-expression, could not identify the causal gene, but construction of differential networks could. Fukushima et al. used this differential network approach to characterize systematic differential interactions in a tomato transcriptomic dataset. This analysis demonstrated that duplicated genes had significantly different co-expression among organs, indicating the prevalence of subfunctionalization or neofunctionalization of duplicated genes during evolution.
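The regulatory impact factor idea can be illustrated with a short Python sketch. The two scores below follow the RIF1 and RIF2 forms as they are commonly stated: RIF1 weights the squared change in regulator-target correlation (the "differential wiring") by the target's average expression, and RIF2 contrasts how well the regulator predicts the target's abundance in each condition. Treat these formulas as an assumption to check against the original method before relying on them; the regulator and gene names and the expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant profile."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / sqrt(sxx * syy)


def mean(values):
    return sum(values) / len(values)


def rif_scores(cond1, cond2, regulators, de_genes):
    """RIF1/RIF2-style scores for each regulator (assumed formulas).

    cond1, cond2: dict gene -> list of expression values (samples) in each
    condition. For regulator i and DE gene j:
      RIF1_i = mean_j[ ((e1_j + e2_j) / 2) * (r1_ij - r2_ij) ** 2 ]
      RIF2_i = mean_j[ (e1_j * r1_ij) ** 2 - (e2_j * r2_ij) ** 2 ]
    where e is mean expression and r is the regulator-gene correlation.
    """
    scores = {}
    for tf in regulators:
        rif1 = rif2 = 0.0
        for g in de_genes:
            r1 = pearson(cond1[tf], cond1[g])
            r2 = pearson(cond2[tf], cond2[g])
            e1, e2 = mean(cond1[g]), mean(cond2[g])
            rif1 += ((e1 + e2) / 2) * (r1 - r2) ** 2  # differential wiring
            rif2 += (e1 * r1) ** 2 - (e2 * r2) ** 2   # change in predictive ability
        scores[tf] = (rif1 / len(de_genes), rif2 / len(de_genes))
    return scores


# Toy data: TF1's correlation with gene G flips from +1 to -1; TF2 stays unrelated.
cond1 = {"TF1": [1, 2, 3, 4], "TF2": [1, 0, 0, 1], "G": [2, 4, 6, 8]}
cond2 = {"TF1": [1, 2, 3, 4], "TF2": [1, 0, 0, 1], "G": [8, 6, 4, 2]}
scores = rif_scores(cond1, cond2, ["TF1", "TF2"], ["G"])
print(scores)  # {'TF1': (20.0, 0.0), 'TF2': (0.0, 0.0)}
```

In this toy case TF1 stands out on RIF1 even though G has the same mean abundance in both conditions, which mirrors the point above: a rewired regulator can be invisible to abundance-only comparisons. With real data, one would rank many candidate regulators by these scores and carry only the extreme ones forward to experiments.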
These statistical approaches emphasize the comparative network context rather than transcript abundance alone and will be important tools for gene prioritization in the evo-devo research field. Proof-of-concept studies using experimental approaches can then confirm the relevance of the genes identified through statistical approaches.
To better understand how organisms develop, evolve, respond to biotic and abiotic stimuli, and function in the context of their environment, transcriptomics-level data will have to be integrated with data on translation, DNA sequence variation, environmental or metagenomic data, regulation of gene expression at the level of DNA and chromatin modifications, metabolic readouts, and alterations in gene expression in response to stimuli. Furthermore, new technical advances in isolating specific tissues or cells, using tissue- or cell-specific promoters or microdissection, as well as microfluidic-based cell handling, can provide detailed expression datasets. In addition, single-molecule long-read transcriptomes allow us not only to estimate transcript abundance with high resolution but also to define precise splicing variants. These data should be very informative for understanding evolution and development. Constructing gene networks is one useful method for reducing the complexity of the entirety of biological information into discrete, interpretable pieces that can serve as the starting point for constructing and testing hypotheses.
Current CRISPR technology allows us to edit genomes, providing an opportunity to modulate edges rather than nodes. Once we identify the GRN underlying a particular biological phenomenon, we can assess the significance of each edge in the system and also rewire the GRN to design output phenotypes.
The genetic basis of many qualitative and quantitative phenotypic differences in plants has been associated with sequence polymorphisms and the corresponding changes in gene function. However, differences in steady-state transcript levels, without underlying changes in coding sequences, also significantly influence plant phenotypes. Closely related plant species often show little coding sequence divergence; nonetheless, they often develop unique physiological, metabolic, and developmental characteristics, indicating that patterns of gene expression are important in species-level phenotypic variation. Phenotypic differences attributed to variation in gene expression patterns have been found to influence disease resistance, insect resistance, phosphate sensing, flowering time, circadian rhythm, and plant development. Global transcript level changes across precise genetic backgrounds have been used to define expression quantitative trait loci (eQTL) by identifying genomic regions responsible for the variation in transcript levels. An eQTL is a chromosomal region that drives variation in gene expression patterns between individuals of a genetic mapping population, in which transcript abundance is treated as a heritable quantitative trait. Depending on their proximity to the gene being regulated, eQTL can be classified into two groups: cis-eQTL, when the physical location of the eQTL coincides with the location of the regulated gene, and trans-eQTL, when the eQTL is located at a different position from the gene being regulated.
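The cis/trans distinction reduces to a positional test. The Python sketch below assumes each eQTL is summarized as a single chromosomal interval and each regulated gene by one coordinate; the chromosome names and positions are hypothetical, and real pipelines may add rules (for example, for genes on unanchored scaffolds) that are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A chromosomal interval, inclusive of both ends."""
    chrom: str
    start: int
    end: int

    def contains(self, chrom: str, pos: int) -> bool:
        return self.chrom == chrom and self.start <= pos <= self.end


def classify_eqtl(eqtl_region: Region, gene_chrom: str, gene_pos: int) -> str:
    """'cis' if the regulated gene lies inside the eQTL region, else 'trans'."""
    return "cis" if eqtl_region.contains(gene_chrom, gene_pos) else "trans"


region = Region("chr4", 1_000_000, 3_000_000)  # hypothetical eQTL interval
print(classify_eqtl(region, "chr4", 2_500_000))  # cis
print(classify_eqtl(region, "chr4", 7_000_000))  # trans (same chromosome, outside)
print(classify_eqtl(region, "chr6", 2_500_000))  # trans (different chromosome)
```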
eQTL studies in the model plant Arabidopsis showed that cis-eQTL have a significant effect on local expression levels, whereas trans-eQTL often have global influences on gene regulation. Global eQTL studies also identified trans-acting eQTL hot spots, which contain master regulators controlling the expression of a suite of genes that act in the same biological process or pathway. For example, eQTL hot spots in Arabidopsis colocalize with the ERECTA locus, which has been shown to pleiotropically influence many traits, including those regulating morphology. Similarly, the rice Sub1 locus, which regulates submergence tolerance by controlling internode and leaf elongation, controls the activity of ethylene response factors with significant trans effects. In addition, eQTL identified using pathogen-challenged tissues in barley were enriched for genes related to pathogen response. Thus, eQTL analyses can provide a genome-wide view of the complex genetic architecture of gene expression regulation and the underlying gene regulatory networks, and may also identify master transcriptional regulators. Cultivated tomatoes, along with their wild relatives, harbor broad genetic diversity and large phenotypic variability. Wide interspecific crosses bring together divergent genomes, and hybridization of such diverse genotypes leads to extensive gene expression alterations compared with either parent. Introgression lines (ILs), developed by crossing wild relatives with cultivated tomato to bring discrete wild genomic segments into the cultivated background, have proved to be a useful genetic resource for genomics and molecular breeding studies. ILs vary in the size of the introgressed region, which may range from a few genes to more than a thousand genes. ILs developed from the wild desert-adapted species Solanum pennellii and domesticated Solanum lycopersicum cv M82 are one such resource.
This population has been successfully used to map numerous QTL for metabolites, enzymatic activity, yield, fitness traits, and developmental features such as leaf shape, size, and complexity. Comparative transcriptomics of the two parents enabled identification of transcript abundance variation potentially underlying trait differences between the species. However, the genetic regulators of these transcriptional differences still need to be elucidated. Therefore, we used a genomics approach in combination with statistical methods to identify the genetic basis of transcript level variation in tomato using the S. pennellii introgression lines. Here, we report a comprehensive transcriptome profile of the ILs, a comparison between the transcript abundance patterns of the ILs and the cultivated M82 background, and a global eQTL analysis to identify patterns of genetic regulation of transcript abundance in the tomato shoot apex. We identified more than 7,200 cis- and trans-eQTL in total, which regulate the transcript abundance of approximately 5,300 genes in tomato. Additional analyses using Barnes-Hut t-distributed stochastic neighbor embedding (BH-SNE) identified 42 modules, revealing novel associations between transcript abundance patterns and biological processes. The transcript abundance patterns under strong genetic regulation are related to plant defense, photosynthesis, and plant developmental traits. We also report important eQTL regulating steady-state transcript abundance patterns associated with leaf number, leaf complexity, and hypocotyl length phenotypes.
Transcriptome Profiling and Global eQTL Analysis
RNA-seq reads obtained from the tomato shoot apex with developing leaves and hypocotyl were used to identify differentially expressed (DE) genes at the transcript level between each S. pennellii IL and the cultivated M82.
The total number of genes differentially expressed for each IL in both cis and trans, along with the number of genes in the introgression regions, is presented in Figure 1 and Supplemental Table S1. There was a strong correlation between the number of genes in the introgression regions and the number of DE genes in cis. In contrast, the number of DE genes in trans was poorly correlated with introgression size. For example, IL12.1.1, despite having one of the smallest introgressions, showed approximately 500 DE genes, 96% of which were regulated in trans. Conversely, IL1.1 and IL12.3, the ILs with the highest number of genes in the introgression regions, showed smaller numbers of total and trans DE genes. These examples suggest that specific loci, rather than introgression size, determine gene regulation in trans. This could, in part, be due to the presence in these regions of genes encoding key transcription factors or developmental regulators with a strong influence on transcript expression patterns, as seen in the ERECTA-containing genomic region in Arabidopsis. A total of 7,943 unique tomato genes were DE between the ILs and cv M82, representing approximately one-third of the approximately 21,000 genes with sufficient sequencing depth for DE analysis. Of these, 2,286 genes (more than one-fourth of the unique DE genes) showed transgressive expression patterns; that is, they were differentially expressed at the transcript level in the IL, but not in S. pennellii, compared with cv M82. These data suggest that, in addition to protein-coding differences, transcriptional regulation of less than one-third of all genes accounts for most of the phenotypic and trait differences between the ILs and the cultivated parent. Identifying eQTL localized to subsets of the introgressions, based on overlaps between them, enabled us to narrow down the regions that contain the regulatory loci.
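The overlap-based narrowing described above can be sketched in Python. The sketch assumes introgressions are represented as half-open intervals on a single chromosome, defines a bin as a segment covered by one particular set of ILs, and assigns an eQTL to the bins covered by exactly the ILs in which it was detected. The IL names and coordinates are hypothetical, and this illustrates the overlap idea rather than the procedure used in the study.

```python
def derive_bins(introgressions):
    """Split one chromosome into bins, each covered by a unique set of ILs.

    introgressions: dict IL name -> (start, end), half-open [start, end).
    Returns a list of (start, end, frozenset_of_ILs) for every covered segment.
    """
    cuts = sorted({p for s, e in introgressions.values() for p in (s, e)})
    bins = []
    for left, right in zip(cuts, cuts[1:]):
        covering = frozenset(
            il for il, (s, e) in introgressions.items() if s <= left and right <= e
        )
        if covering:  # skip gaps not covered by any introgression
            bins.append((left, right, covering))
    return bins


def localize(bins, ils_with_eqtl):
    """Bins whose covering ILs are exactly the ILs in which the eQTL was detected."""
    target = frozenset(ils_with_eqtl)
    return [(s, e) for s, e, covering in bins if covering == target]


# Hypothetical overlapping introgressions on one chromosome (arbitrary units).
ils = {"ILa": (0, 60), "ILb": (40, 100), "ILc": (90, 150)}
bins = derive_bins(ils)
for s, e, cov in bins:
    print(s, e, sorted(cov))
print(localize(bins, {"ILa", "ILb"}))  # [(40, 60)]
```

An eQTL seen in both ILa and ILb is pinned to their shared segment, while one seen only in ILa falls in the ILa-only segment; this is why overlapping introgressions give finer resolution than any single IL.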
This analysis brings us one step closer to identifying potential candidates that influence transcript abundance patterns in tomato. We identified 7,225 significant eQTL involving 5,289 unique genes across the 74 ILs. These 7,225 significant eQTL were designated as cis, trans, or chromo0, as defined in the methods and illustrated in Supplemental Figure S3, and as either up or down based on an increase or decrease in transcript levels. This classification resulted in a total of 1,759 cis-up and 1,747 cis-down eQTL, 2,710 trans-up and 920 trans-down eQTL, and 51 chromo0-up and 8 chromo0-down eQTL. The majority of genes are under the regulation of a single eQTL. This observation shows the predominance of cis-eQTL for genetic regulation of transcript abundance in the tomato ILs. A similar correlation between transcript level variation and genome-wide sequence divergence among seven Arabidopsis accessions was reported to be due to cis control of most of the detected variation. Bins on chromosomes 4, 6, and 8, such as 4D, 6B, 6C, 8A, and 8B, contain predominantly trans-eQTL. In contrast, three bins, 1F, 3I, and 8G, each containing more than 100 genes, have no significant trans- or cis-eQTL and are transcriptionally silent. As expected, bins containing more than 100 significant cis-eQTL are scattered across the genome. The abundance of trans-eQTL on chromosomes 4, 6, and 8 strengthens the idea that trans-eQTL hot spots control the expression of large numbers of transcripts, as reported in other organisms. The resolution of this analysis is at the level of bins, and these significant eQTL likely map to a smaller number of genes within the bins.
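The up/down and cis/trans bookkeeping, and one simple way to flag candidate trans-eQTL hot spots, can be sketched in Python. The sketch assumes each eQTL call is a (bin, type, log2 fold change) tuple, with the sign of the IL-versus-M82 fold change giving the direction. The hot-spot rule (at least `min_trans` trans-eQTL making up at least `min_fraction` of a bin's eQTL) and its default thresholds are hypothetical choices for illustration, not the criteria used in the study, and the bin labels and values are toy data.

```python
from collections import Counter


def direction(log2fc):
    """'up' or 'down' for the IL relative to the recurrent parent (M82)."""
    if log2fc == 0:
        raise ValueError("an eQTL should change the transcript level")
    return "up" if log2fc > 0 else "down"


def tally(eqtls):
    """Count eQTL by (type, direction); eqtls are (bin, type, log2fc) tuples."""
    return Counter((kind, direction(fc)) for _, kind, fc in eqtls)


def trans_hotspots(eqtls, min_trans=3, min_fraction=0.6):
    """Bins with many eQTL, most of which act in trans (hypothetical rule)."""
    per_bin = {}
    for bin_id, kind, _ in eqtls:
        per_bin.setdefault(bin_id, Counter())[kind] += 1
    return sorted(
        b for b, c in per_bin.items()
        if c["trans"] >= min_trans and c["trans"] / sum(c.values()) >= min_fraction
    )


# Hypothetical eQTL calls: (bin, type, log2 fold change of IL vs. M82).
eqtls = [
    ("bin_X", "trans", 1.2), ("bin_X", "trans", -0.7), ("bin_X", "trans", 0.9),
    ("bin_X", "cis", 0.5), ("bin_Y", "cis", -1.1), ("bin_Y", "cis", 2.0),
    ("bin_Y", "trans", 0.3),
]
print(tally(eqtls)[("trans", "up")])  # 3
print(trans_hotspots(eqtls))          # ['bin_X']
```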
Functional classification of the genes regulated by these eQTL, and phenotypic association with the relevant ILs, was undertaken to glean insights into the identity of candidate genes in the bins.
To functionally categorize the eQTL-regulated genes, BH-SNE was performed on the 5,289 genes with eQTL to detect novel associations between transcript abundance patterns. This clustering resulted in 42 distinct modules containing 3,592 genes.