
Understanding genomes, piece by piece

Genomes are made up of thousands of individual pieces – genes – which are expressed at different levels. Researchers at EMBL have shed light on how the placement of a gene affects its expression, as well as that of its neighbours.

Gene expression is influenced by neighbouring transcriptional patterns. Genetic rearrangements in a synthetic yeast chromosome revealed that characteristics of a gene's new transcriptional neighbourhood predicted its transcript isoform boundaries and abundance. Credit: Tobias Wüstefeld

The celebrated physicist Richard Feynman is credited with the quote, “What I cannot create, I do not understand.” As well as informing Feynman’s approach to theoretical physics, it’s a good way of describing the motivations of synthetic biologists, with their interest in building genomes from scratch. By designing and building synthetic genomes, they hope to better understand the code of life. Synthetic biology has been organised around the concept of using DNA sequences as ‘parts’ with reproducible functions. Now, through successful collaborations and the use of cutting-edge tools, EMBL’s Steinmetz Group has gained an important insight into the variation of gene expression that results from the position or context of these DNA parts within the genome.

Explaining the underlying question motivating the work, Amanda Hughes, co-lead author and postdoc in the Steinmetz Group, said, “In synthetic biology, you tend to break things down into modular, ‘plug-and-play’ parts. These are promoter parts, coding regions, and terminator parts. We wanted to test whether these pieces really are ‘plug-and-play,’ functioning the same way in any context, or whether their position affects their function. We wanted to better understand how the linear organisation of genes affects their functions and identify general design principles that could be applied to building genomes.”

A synthetic biology toolbox delivers contextual insights

This work, funded by BMBF and the Volkswagen Foundation’s “Life?” initiative, was possible because of two key technologies: synthetic yeast strains from the Sc2.0 consortium and long-read direct RNA sequencing. The strains obtained from the Sc2.0 consortium included a design feature called ‘SCRaMbLE’ that provides the ability to rearrange genes into different locations at a previously unachievable scale. The expertise and tools available in the Genomics Core facility at EMBL, including Oxford Nanopore’s GridION, allowed the team to perform long-read direct RNA sequencing, permitting identification of both the start and end of RNA molecules and their assignment to particular rearrangements. The combination of these cutting-edge technologies was critical to measure full-length RNA molecules from genes across many contexts.

The paper, published in Science, showed that context, and in particular transcriptional context, alters the RNA output of a gene. Using long-read direct RNA sequencing, the team observed changes in the start, end, and amount of full-length RNA molecules expressed from DNA sequences that had been randomly rearranged in synthetic yeast genomes. Relocating a gene affected the length and abundance of its RNA output; however, these changes were not always explained by the new adjacent DNA sequence. It appeared to be the transcription occurring around a gene, rather than the neighbouring sequence itself, that altered its RNA output.

Gleaning general principles from such a large, stochastic dataset was not a trivial task, as the lead author Aaron Brooks explained: “To reach our conclusions, we had to observe genes in many alternative genetic contexts, which were present in the SCRaMbLE strains. Putting the pieces back together, however, was a big effort. We had to generate a massive sequencing dataset, which, in turn, required us to develop new software tools. We had to rely on sophisticated machine learning algorithms to help us understand the complex patterns we were observing.” Modelling a gene’s RNA output based on its new upstream and downstream contexts revealed that features related to surrounding transcriptional patterns predicted RNA boundaries and abundance. For example, if a gene was relocated next to a highly expressed neighbour, its expression also tended to increase.

Defining design principles for building genomes