DNA sequence assembly is the process of aligning and merging fragments of a DNA sequence to reconstruct the original sequence. It is an essential step in genome analysis because current sequencing technology cannot read an entire genome in one pass. Instead, small sections of the genome, each up to 30,000 nucleotide bases long, are read and then assembled to reconstruct the full sequence.
Assembly is difficult because a genome is typically read as many separate fragments, which must be pieced back together in the correct order for the information to make sense. The process is also prone to error because stretches of nucleotide bases are repeated within the genome, which makes it harder to decide where a given fragment belongs.
Alignment of fragments
To obtain the correct assembly, many sequence fragments must be read and then linked back together in the correct order. Because current DNA sequencing technology cannot read the entire genome at once, the fragments are joined by finding overlaps between their ends.
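The overlap step can be sketched in a few lines of Python. This is a minimal illustration, assuming error-free fragments and exact matching; the function names `overlap_length` and `merge_fragments` and the `min_overlap` threshold are hypothetical and chosen only for this example.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Return the length of the longest suffix of `left` that is also a prefix of `right`.

    Returns 0 when no overlap of at least `min_overlap` bases exists.
    """
    longest_possible = min(len(left), len(right))
    # Try the longest candidate first so the first hit is the best one.
    for size in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_fragments(left: str, right: str, min_overlap: int = 3):
    """Join two fragments on their overlap, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    # Keep `left` whole and append only the part of `right` past the overlap.
    return left + right[size:]


print(merge_fragments("ATGCGTAC", "GTACCTGA"))  # ATGCGTACCTGA
```

Real sequencing reads contain errors, so practical tools score approximate overlaps rather than requiring an exact match as this sketch does.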
Correct alignment of the DNA fragments is essential for the assembled sequence to match the original. Only then can conclusions drawn from analysis of the genome be relied upon and applied in practice.
A genome is defined as finished when there is a single continuous sequence of DNA with clearly defined nucleotide bases at each point. There should not be any ambiguity as to the replicons in the sequence.
Approaches
There are two broad types of assembly techniques that may be utilized: de novo and comparative assembly.
De novo assembly is used for new genomes that have not been sequenced before and are not similar to any previously sequenced genome. This type of assembly is usually harder computationally, because the fragments must be compared with one another with no reference to guide their placement.
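One simple way to picture de novo assembly is greedy merging: repeatedly join the two fragments with the longest overlap until nothing more can be joined. The sketch below assumes error-free reads from a single strand and a genome without repeats; `greedy_assemble` and its helper are hypothetical names, and the approach is a teaching device rather than the method any particular assembler uses.

```python
from itertools import permutations


def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Longest suffix of `left` matching a prefix of `right` (0 if below `min_overlap`)."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap: int = 3):
    """Merge reads pairwise, always taking the largest remaining overlap first."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep input order
    while len(contigs) > 1:
        best_size, best_pair = 0, None
        # Compare every ordered pair: this all-against-all step is what makes
        # de novo assembly expensive on real data.
        for a, b in permutations(contigs, 2):
            size = overlap_length(a, b, min_overlap)
            if size > best_size:
                best_size, best_pair = size, (a, b)
        if best_pair is None:
            break  # no remaining overlaps: the result stays in several contigs
        a, b = best_pair
        contigs.remove(a)
        contigs.remove(b)
        contigs.append(a + b[best_size:])
    return contigs


reads = ["CTGAAGT", "ATGCGTAC", "GTACCTGA"]
print(greedy_assemble(reads))  # ['ATGCGTACCTGAAGT']
```

The pairwise comparison grows quadratically with the number of reads, which gives a sense of why de novo assembly of large genomes needs more sophisticated data structures than this.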
Comparative, or mapping, assembly is used when the genome has already been sequenced, or when it is similar to the assembled genome of another organism, which can then be used as a reference.
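As a rough illustration of mapping assembly, the sketch below places each read at the position in a reference where it has the fewest mismatches, then takes a per-position majority vote. It assumes reads differ from the reference only by substitutions (no insertions or deletions); the names `best_position` and `consensus` and the `max_mismatches` cutoff are hypothetical.

```python
from collections import Counter


def best_position(read: str, reference: str):
    """Return (offset, mismatches) for the placement with the fewest mismatches."""
    best = None
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best  # None when the read is longer than the reference


def consensus(reads, reference: str, max_mismatches: int = 1) -> str:
    """Build a consensus sequence by majority vote over the mapped reads."""
    columns = [Counter() for _ in reference]
    for read in reads:
        placement = best_position(read, reference)
        if placement is None or placement[1] > max_mismatches:
            continue  # discard reads that do not map well enough
        offset, _ = placement
        for i, base in enumerate(read):
            columns[offset + i][base] += 1
    # Positions with no coverage are reported as 'N' (unknown base).
    return "".join(c.most_common(1)[0][0] if c else "N" for c in columns)


reference = "ATGCGTACCTGAAGT"
reads = ["ATGCGTTC", "GTTCCTGA", "CTGAAGT"]  # sample has T instead of A at index 6
print(consensus(reads, reference))  # ATGCGTTCCTGAAGT
```

Because each read is compared only with the reference rather than with every other read, the work grows with the number of reads instead of with the number of read pairs.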
Assembler technology
The first sequence assemblers to align DNA fragments from automated sequencing instruments were introduced in the early 1980s. Since then, assembly technology has evolved considerably, as advances in genomics research have required more sophisticated techniques to manage the data produced by genome projects.
Assembly can involve terabytes of sequencing data, which are processed on computing clusters. Identical or very similar sections of the DNA, referred to as repeats, can confound the process and increase the running time and complexity of the algorithms required for assembly. Additionally, if the DNA fragments contain minor errors, whether from incorrect calibration of the sequencing instruments or another cause, it can be very difficult to identify the errors in the data and exclude the affected fragments from the results.
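The effect of repeats can be made concrete with a small sketch. It records, for every stretch of k bases seen in the reads, which bases follow it; a stretch followed by more than one base is a point where the assembler cannot tell which way to continue. The names `successors` and `branch_points` are hypothetical, and the example assumes error-free reads, so that every branch comes from a repeat rather than a sequencing error.

```python
from collections import defaultdict


def successors(reads, k: int):
    """Map each k-base stretch to the set of bases observed immediately after it."""
    following = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            following[read[i:i + k]].add(read[i + k])
    return following


def branch_points(reads, k: int):
    """Return the stretches that are followed by more than one distinct base."""
    return {
        kmer: sorted(bases)
        for kmer, bases in successors(reads, k).items()
        if len(bases) > 1
    }


# "GACG" occurs in both reads but is followed by a different base in each.
reads = ["AAGACGTT", "CCGACGCA"]
print(branch_points(reads, k=4))  # {'GACG': ['C', 'T']}
print(branch_points(reads, k=5))  # {}
```

With k=4 the repeated stretch is ambiguous, while with k=5 each window includes a base outside the repeat and the ambiguity disappears. This is a toy version of why repeats longer than the available read length are hard to resolve.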
New technologies are being developed to improve the process of DNA assembly. It is hoped that these will make DNA sequence assembly easier and faster, so that the results can be put to use in practical applications.