
A Look At How Genomes Are Reconstructed



Scientists have developed a wide array of technologies to reconstruct complex genomes.

By Patrick James Hibbert 
1 Jul 2019

Genome sequencing has become widely available and less expensive, and it is being used more often in clinical practice to treat cancer. Though many genomes have been sequenced, large and complex genomes, like those of most animals and plants, still pose significant challenges.

In those genomes, genome assembly software often produces incomplete and fragmented reconstructions that require additional, experimentally derived information and manual intervention to reconstruct individual chromosome arms. Newer technologies designed to capture chromatin structure have proven to complement sequencing data effectively, creating more contiguous genome reconstructions than previously possible.

Automating the reconstruction of entire genomes is a difficult task, mostly because of genomic repeats: the ambiguity they produce cannot be resolved with the information contained in the reads alone. Genomes also contain regions that are difficult to sequence because of their unusual base-pair compositions.
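The repeat problem can be illustrated with a toy example. The sketch below is not taken from the study; the genome string, the repeat, and the function name `ambiguous_reads` are all made up for illustration. It lists every possible error-free read of a given length and reports the ones that occur at more than one position, which is the situation an assembler cannot resolve from the reads alone.

```python
from collections import defaultdict

def ambiguous_reads(genome, read_len):
    """Return every read of length read_len that occurs at more than one
    position in the genome, mapped to the list of its start positions."""
    positions = defaultdict(list)
    for start in range(len(genome) - read_len + 1):
        positions[genome[start:start + read_len]].append(start)
    return {read: locs for read, locs in positions.items() if len(locs) > 1}

# Hypothetical toy genome: unique flanking sequence around two copies
# of the same 10-base repeat.
REPEAT = "ACGTACGTAC"
genome = "TTGCA" + REPEAT + "GGATC" + REPEAT + "CCTGA"

# Reads shorter than the repeat can fall entirely inside it, so the same
# read is seen at several positions and its origin is ambiguous.
short_hits = ambiguous_reads(genome, read_len=6)

# Reads longer than the repeat always include some unique flanking
# sequence in this toy genome, so no read occurs twice.
long_hits = ambiguous_reads(genome, read_len=12)

print(sorted(short_hits))
print(long_hits)
```

Real reads also contain sequencing errors and real repeats can span thousands of bases, so this only shows the principle: a read resolves a repeat when it is long enough to reach unique sequence on both sides.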

As a result, typical genome assemblies of eukaryotes are highly fragmented and contain thousands of contiguous genomic segments (contigs). Scaffolding, the process of ordering and orienting these contigs, can draw on any type of information that hints at the relative location of genomic segments along a chromosome. Usually, that information comes from custom genomic technologies designed to analyze the structure of chromosomes.

Recently, researchers from the University of Maryland surveyed technologies and algorithms used to assemble and analyze large eukaryotic genomes and placed them within the historical context of genome scaffolding technologies that have been in existence since the dawn of the genomic era. They published this research in the journal PLOS Computational Biology.

Their work describes how advances on both the experimental and computational sides have dramatically improved the ability to reconstruct the genomes of complex eukaryotic organisms. It explains six sequencing and mapping approaches: physical mapping, subcloning, long-read data, paired reads, chromosome conformation capture, and synteny.



Physical mapping technologies attempt to estimate the location of specific loci along genomic chromosomes. The loci can be short DNA segments that are unique within the genome, as in the case of sequence-tagged sites. 

The approximate location of the markers along chromosomes can be identified through a number of techniques, from fluorescence in situ hybridization (FISH), to the analysis of the random breakage of DNA exposed to X-rays (radiation hybrid mapping), to the direct measurement of restriction fragment sizes, as performed in restriction mapping. Physical mapping data are among the earliest sources of information used to order genomic contigs along a chromosome.

Subcloning involves breaking up the genome into large fragments that are then sequenced separately, retaining the connection between the sequencing reads generated from the same fragment and creating what are subsequently called “linked reads”. The assembly process is then run for each fragment separately, and the resulting assemblies are merged to reconstruct the complete genome sequence.

This technique was used in the early days of genomics to sequence the first human genome. Recently, new technologies have been developed that perform the subcloning process in labs.

Long-read sequencing technologies generate long sequencing reads and can be seen as a special case of subcloning. Genome assemblers are effective at reconstructing genomic contigs from long-read data. They can achieve high-quality assemblies with long-read data alone, but the genome needs to be sequenced at high coverage, which incurs significant costs.

Paired-read technologies are the most common source of information for scaffolding. The technology yields information about the relative placement of pairs of reads along the genome being sequenced. Typically, this information is produced by carefully controlling DNA shearing prior to sequencing in order to obtain fragments of uniform sizes and by tracking the link between DNA sequences “read” from the same fragment.
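How paired reads hint at contig order can be sketched in a few lines. The example below is a simplified illustration, not the method of any particular scaffolding tool: the contig names, the `min_support` threshold, and both function names are assumptions, and the sketch ignores the orientation and distance information that real scaffolders also use. It counts how many read pairs connect each pair of contigs, discards weakly supported links, and walks the resulting chain.

```python
from collections import Counter

def count_links(pairs, min_support=2):
    """Count read pairs whose two mates map to different contigs and keep
    only contig pairs supported by at least min_support read pairs."""
    links = Counter()
    for contig_a, contig_b in pairs:
        if contig_a != contig_b:  # mates in the same contig say nothing about order
            links[frozenset((contig_a, contig_b))] += 1
    return {link: n for link, n in links.items() if n >= min_support}

def chain_contigs(links):
    """Walk the linked contigs from one end to the other.
    Assumes the links form a single unbranched chain (an idealized case)."""
    neighbours = {}
    for link in links:
        a, b = tuple(link)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    ends = sorted(c for c, n in neighbours.items() if len(n) == 1)
    if not ends:
        return []
    order, seen, current = [], set(), ends[0]
    while current is not None:
        order.append(current)
        seen.add(current)
        unvisited = [c for c in sorted(neighbours[current]) if c not in seen]
        current = unvisited[0] if unvisited else None
    return order

# Hypothetical mapping results: each tuple names the contigs hit by the two mates.
pairs = [("c1", "c1"), ("c1", "c2"), ("c2", "c1"),
         ("c2", "c3"), ("c3", "c2"), ("c1", "c3")]

links = count_links(pairs, min_support=2)  # the single c1-c3 pair is dropped as noise
scaffold = chain_contigs(links)
print(scaffold)
```

Requiring several supporting pairs per link is a common way to guard against chimeric or mis-mapped pairs; the threshold of 2 here is arbitrary.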

Chromosomal contact data is a special type of paired-read data generated by techniques recently developed to study the three-dimensional structure of chromosomes inside a cell. 

These techniques, collectively referred to as chromosome conformation capture, generate pairwise linking information between reads that originate from genomic regions that are physically adjacent in a cell. Unlike mate-pair data, the distance and the relative orientation between the paired reads are not known beforehand.

Synteny refers to the co-localization of genes or genomic loci along a chromosome. Although the DNA sequence itself may diverge significantly during evolution, related organisms often preserve synteny and gene order.

The conservation of synteny can be used to help order contigs along a chromosome: each contig is placed according to where the orthologs of its genes are located in a related genome. Despite the rapid increase of complete and draft genomes in public databases, the use of synteny information in genome reconstruction has not been widely adopted.
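The idea of ordering contigs by synteny can be sketched as follows. This is a minimal illustration under strong assumptions: the gene names, contig names, and coordinates are invented, the related genome is treated as a single chromosome, and gene order is assumed to be fully conserved. Each contig is placed at the median reference position of the orthologs of its genes, and contigs are then sorted by that position.

```python
from statistics import median

def order_by_synteny(contig_genes, ortholog_position):
    """Order contigs by the median position, in a related genome, of the
    orthologs of the genes each contig carries. Contigs with no known
    ortholog cannot be placed and are left out."""
    placed = {}
    for contig, genes in contig_genes.items():
        hits = [ortholog_position[g] for g in genes if g in ortholog_position]
        if hits:
            placed[contig] = median(hits)
    return sorted(placed, key=placed.get)

# Hypothetical input: genes annotated on each contig of the new assembly.
contig_genes = {
    "contigA": ["g5", "g6"],
    "contigB": ["g1", "g2", "g3"],
    "contigC": ["g9"],
    "contigD": ["gX"],  # no ortholog in the related genome
}

# Hypothetical positions of the orthologs along one chromosome of a related genome.
ortholog_position = {"g1": 100, "g2": 250, "g3": 400,
                     "g5": 900, "g6": 1200, "g9": 3000}

order = order_by_synteny(contig_genes, ortholog_position)
print(order)
```

Using the median rather than the mean keeps one misplaced ortholog from dragging a contig far from where most of its genes point. Rearrangements between the two species would still mislead this approach, which is one reason synteny is usually treated as supporting evidence.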

Advances in genomic technologies may make the automatic reconstruction of mammalian genomes possible in the near future. Recent advances in nanopore sequencing devices are already creating longer reads. This could lead to the ability to assemble complete eukaryotic genomes from nanopore data alone. 

In the near future, it is also likely that many previously intractable genomes will be reconstructed with the help of long-read sequencing data coupled with paired-read information from chromosome conformation capture technologies, augmented by short-read and short mate-pair technologies aimed at resolving the small-scale structure of genomes.

 
