Genome projects typically involve three main phases: DNA sequencing, assembly of the sequenced DNA into a representation of the original chromosome, and analysis of that representation.
DNA sequencing is the process of determining the order of nucleotides in a specific DNA molecule, which is useful for understanding its function and its effects on the organism in which it resides. DNA sequence assembly involves aligning and merging the sequenced DNA fragments to reconstruct the longer original sequence from the smaller sections that were read.
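To illustrate the idea of merging overlapping fragments, here is a minimal sketch of a greedy overlap assembler. It is a toy example under strong assumptions (error-free fragments, forward strand only, no repeats, no fragment contained inside another), and the function names are hypothetical. Production assemblers use far more sophisticated methods.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left to merge; keep the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three made-up fragments that overlap by four bases each.
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
# → ['ATGGCGTACCTGA']
```

If no pair of fragments overlaps by at least `min_len` bases, the function returns several separate contigs rather than forcing a merge.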
Analysis is the final phase of a genome project. It brings together the findings of the previous phases to form conclusions that further our knowledge of the genome and can be applied in practice.
Annotation of DNA
DNA annotation is the process of identifying the locations of genes and coding regions in a genome and inferring the possible functions of those genes. There are three main steps in annotating a genome:
- Identify the portions of the genome that are not involved in coding proteins
- Identify the main elements of the genome (gene prediction)
- Connect the main elements of the genome with biological information
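As a rough illustration of the gene prediction step, the sketch below scans a DNA string for open reading frames (ORFs): stretches that begin with the start codon ATG and run in the same reading frame to a stop codon. This is only a naive stand-in for real gene predictors. It assumes the forward strand only, ignores introns, and uses a hypothetical `min_codons` threshold.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence: str, min_codons: int = 3) -> list[tuple[int, int, str]]:
    """Return (start, end, orf) for forward-strand ORFs.

    Positions are 0-based and `end` is exclusive. `min_codons` counts
    every codon in the ORF, including the start and stop codons.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = pos + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, seq[start:end]))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


# Made-up sequence with one ORF in the third reading frame.
print(find_orfs("CCATGAAATTTTAGGG"))
# → [(2, 14, 'ATGAAATTTTAG')]
```

The regions that fall outside any reported ORF are candidates for the non-coding portions named in the first step, although real annotation pipelines use much richer evidence than start and stop codons alone.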
It is important to consider how the genome is similar to other genomes that are already known, as this can help to establish the role of a gene. Additionally, the plasmids, phages and resistance genes of the genome can reveal information about the nature of the genome.
The traditional curation method uses the Basic Local Alignment Search Tool (BLAST) algorithm to find sequence similarities that can be used to annotate the genome. However, this approach requires expert knowledge and experimental verification.
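The similarity search can be pictured with a small sketch of exact word matching, the kind of seeding idea that BLAST-style tools build on. This is not BLAST itself: it performs no extension, scoring or statistics, and the function names and the word size `k` are assumptions made for illustration.

```python
from collections import defaultdict


def kmer_index(sequence: str, k: int) -> dict[str, list[int]]:
    """Map every length-k word in `sequence` to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index


def shared_seeds(query: str, subject: str, k: int = 4) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a length-k word matches exactly."""
    index = kmer_index(subject, k)
    hits = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            hits.append((q, s))
    return hits


# Made-up sequences; every hit has the same offset (subject_pos - query_pos = 2).
print(shared_seeds("GATTACA", "TTGATTAC"))
# → [(0, 2), (1, 3), (2, 4)]
```

Several hits sharing the same offset suggest a longer similar region, which a full alignment tool would then extend and score.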
There are also several tools that can annotate the genome automatically in silico, which tend to be more efficient than manual curation and can provide additional information. Curation and automated tools are often used together, as their results complement each other.
Technology for genome analysis
Recent advances in technology that allow high-throughput genomic sequencing to be carried out quickly and relatively cheaply have propelled the work of genome analysis forward.
However, this progress also creates a large demand for efficient and robust analysis tools that can turn the data into a form that is usable in practice. The massive data sets produced by projects such as the Human Genome Project remain largely under-utilized, despite the fact that the project concluded more than a decade ago.
It is important to develop techniques to analyze both the information that is currently available and the data that continues to be generated.
There is currently a lack of robust analysis tools that are able to handle the depth of data in these genome projects and assist researchers in making use of the information.
The tools used to analyze genomes need to perform several important functions. They should be able to:
- Compare variant calls between genomes
- Export data into convenient formats for analysis
- Filter and annotate results to increase ease of analysis
- Create a reference genome for successive analyses
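The first three capabilities in this list can be sketched in a few lines. The example below is a minimal illustration rather than the interface of any real tool: the `Variant` record, the quality threshold and the sample data are all hypothetical, and building a reference genome is out of scope here.

```python
import csv
import io
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float


def filter_by_quality(variants: list[Variant], min_qual: float) -> list[Variant]:
    """Keep only the calls whose quality score reaches `min_qual`."""
    return [v for v in variants if v.qual >= min_qual]


def compare_calls(a: list[Variant], b: list[Variant]) -> dict[str, list[tuple]]:
    """Split calls into those shared by both genomes and those unique to each."""
    keys_a = {(v.chrom, v.pos, v.ref, v.alt) for v in a}
    keys_b = {(v.chrom, v.pos, v.ref, v.alt) for v in b}
    return {
        "shared": sorted(keys_a & keys_b),
        "only_a": sorted(keys_a - keys_b),
        "only_b": sorted(keys_b - keys_a),
    }


def to_tsv(comparison: dict[str, list[tuple]]) -> str:
    """Export the comparison as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["category", "chrom", "pos", "ref", "alt"])
    for category, keys in comparison.items():
        for key in keys:
            writer.writerow([category, *key])
    return buffer.getvalue()


# Made-up variant calls for two genomes.
genome_a = [
    Variant("chr1", 100, "A", "G", 50.0),
    Variant("chr1", 200, "C", "T", 10.0),
    Variant("chr2", 300, "G", "A", 60.0),
]
genome_b = [
    Variant("chr1", 100, "A", "G", 45.0),
    Variant("chr2", 400, "T", "C", 70.0),
]

comparison = compare_calls(filter_by_quality(genome_a, 20.0),
                           filter_by_quality(genome_b, 20.0))
print(to_tsv(comparison))
```

Two calls are treated as the same variant only when chromosome, position, reference allele and alternate allele all match; the quality score is used for filtering but not for comparison.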
The development of suitable tools to assist in genome analysis should be a priority for the future, to continue the growth of knowledge and understanding in the field of genomics.