Bioinformatics: Biology Meets Programming

Bioinformatics is a computational science concerned with the collection, storage, analysis and dissemination of data about biological molecules, most often DNA, RNA or protein sequences.

Bioinformatics studies biological information using a computational approach, which depends on knowledge of the underlying biology, physics and chemistry. It focuses on the quantitative analysis of information relating to biological macromolecules (DNA, RNA or protein) with the aid of a computer, making it an interdisciplinary research area at the interface between computer science and biological science. Most biological research now involves applying mathematical, statistical or computational tools to help synthesize recorded data and integrate various types of information in the process of answering particular biological questions.

Bioinformatics focuses on living cells and how they function at the molecular level. By analysing raw data such as molecular sequences, it generates insights and provides a ‘global perspective’ of the cell. The function of cells can be better understood by analysing sequences because genetic information is passed from parents to offspring as DNA, and its flow within the cell is dictated by the “central dogma of molecular biology”, in which DNA is transcribed into RNA and RNA is translated into protein.
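The two steps of the central dogma can be sketched in a few lines of Python. This is a minimal illustration, not a production tool: it assumes the input is the coding strand of the DNA, uses the standard genetic code, and the function names `transcribe` and `translate` are chosen here for illustration only.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Turn a coding-strand DNA sequence into mRNA by replacing T with U."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # the table is keyed on DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, `translate(transcribe("ATGGCCTAA"))` gives `"MA"`: ATG codes for methionine, GCC for alanine, and TAA is a stop codon.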

Subfields of Bioinformatics

Development of computational tools and databases: this includes writing software for sequence, structural and functional analysis, as well as constructing and curating biological databases.

Application of bioinformatics tools and databases to generate biological knowledge and better understand living systems. These tools are used in:

1. Genomics and molecular biology research

2. Molecular structural analysis

3. Molecular functional analysis.

Applications of Bioinformatics

Bioinformatics is not limited to molecular biology research and basic genomics; it cuts across major areas of the biomedical sciences and biotechnology.

The primary roles of bioinformatics include:

Bioinformatics plays a key role in functional, structural and nutritional genomics. It covers aspects of emerging technology and scientific research, and the exploration of omics technologies, e.g. proteomics, transcriptomics, genomics and metabolomics.

Bioinformatics plays an increasingly important role in almost all aspects of drug discovery, drug development and drug commercialization.

Bioinformatics is used to identify and structurally modify natural products, to design compounds with properties of interest, and to assess their therapeutic effects theoretically.

Bioinformatics tools are widely used in predicting, analysing and interpreting preclinical and clinical findings.

Other Applications of Bioinformatics are:

Development of personalised medicine

In the health care system, genomics and bioinformatics are poised to revolutionise the production of personalised and customised drugs. High-speed sequencing and high-quality information technology allow doctors in the clinic to quickly sequence a patient’s genome, detect harmful mutations, and engage in early diagnosis and effective treatment.

Development of molecular medicine

The human genome has profound effects on the fields of clinical medicine and biomedical research.

The complete human genome sequence, together with the use of bioinformatics tools, means that we can search for the genes directly associated with different diseases and begin to understand the molecular basis of these diseases more clearly. This knowledge of the molecular mechanisms of disease enables better treatments, cures and even preventive tests to be developed.

Agricultural biotechnology

Plant genome databases have been analysed using bioinformatics approaches, and gene expression profile analysis has played an important role in the development of new crop varieties that are more productive and more resistant to disease.

Forensic DNA analysis

The results of molecular phylogenetic analysis have been used as evidence in criminal courts. Bayesian statistics and likelihood-based methods for DNA analysis have also been applied in forensic identification.

Knowledge-based drug design

Computational studies of protein-protein interactions help identify novel leads for synthetic drugs. This reduces the time and cost necessary to develop drugs with higher potency, fewer side effects and less toxicity than traditional trial-and-error methods.

Microbial genome applications

Complete genome sequences have the potential to provide useful insight into the microbial world, with implications for health, industrial, environmental and energy applications. In 1994, the US Department of Energy initiated the Microbial Genome Project to sequence the genomes of bacteria useful in industrial processing, environmental cleanup, energy production and toxic waste reduction.

By studying the genetic material of these organisms, scientists can begin to understand these microbes at a very fundamental level and isolate the genes that give them the ability to survive under extreme conditions.

Climate change studies

Rising levels of greenhouse gases such as carbon dioxide and methane, together with other emissions such as carbon monoxide, sulphur dioxide and nitrogen dioxide from the expanding use of fossil fuels for energy, contribute to global climate change.

Recently, the US Department of Energy launched a program to decrease atmospheric carbon dioxide levels. One approach is to study the genomes of microbes that use carbon dioxide as their sole carbon source.

Evolutionary studies 

Sequencing of genomes from all three life domains, Archaea, Bacteria and Eukaryota, means evolutionary studies can be performed in a quest to determine the tree of life and the last universal common ancestor.

Biological Database

A database is a computerised archive used to store and organise data in such a way that information can be retrieved easily through a variety of search criteria. It is composed of computer hardware and software for data management.

A database organises data in a set of structured records to enable easy retrieval of information. Each record, also called an “entry”, contains several fields that hold the actual data items (in a contact database, for example, the fields would be names, phone numbers and addresses). To retrieve a particular record from the database, users specify a particular piece of information, called a “value”, to be found in a particular field, a process known as making a query.
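The idea of records, fields, values and queries can be sketched as follows. This is a toy illustration only: the accession numbers, field names and sequences are made up for the example, and real biological databases use far richer record formats and dedicated database software.

```python
# Each record ("entry") is a dictionary whose keys are the field names.
# All accession numbers and sequences below are invented for illustration.
records = [
    {"accession": "DEMO0001", "organism": "Homo sapiens",
     "molecule": "DNA", "sequence": "ATGGCCATTGTA"},
    {"accession": "DEMO0002", "organism": "Mus musculus",
     "molecule": "DNA", "sequence": "ATGGCGATTGTT"},
    {"accession": "DEMO0003", "organism": "Homo sapiens",
     "molecule": "protein", "sequence": "MAIV"},
]


def query(entries, field, value):
    """Return every entry whose given field holds exactly the given value."""
    return [entry for entry in entries if entry.get(field) == value]


# Making a query: find all entries whose "organism" field is "Homo sapiens".
human_entries = query(records, "organism", "Homo sapiens")
for entry in human_entries:
    print(entry["accession"], entry["molecule"])
```

Here the field is `organism` and the value is `"Homo sapiens"`; the query returns the two matching entries.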

Types of biological database

Primary database

Primary databases contain original biological data. They are archives of raw sequence or structural data submitted by the scientific community. GenBank, the European Molecular Biology Laboratory (EMBL) database and the DNA Data Bank of Japan (DDBJ) are the three major databases that store raw nucleic acid sequence data produced and submitted by researchers worldwide. They collaborate and exchange data daily, and together they form the International Nucleotide Sequence Database Collaboration (INSDC).

Secondary database

Secondary databases contain computationally processed or manually curated information, based on original information from the primary databases. A prominent example is SWISS-PROT, which provides detailed sequence annotation that includes structure, function and protein family assignment.

Specialized database

These are databases that cater to particular research interests, for example FlyBase, the HIV Sequence Database and the Ribosomal Database Project. A specialised database normally serves a specific research community or focuses on a particular organism.

Pairwise Sequence Alignment 

Pairwise sequence alignment is the process of aligning two sequences and is the basis of database similarity searching. In this process, sequences are compared by searching for common character patterns and establishing residue-to-residue correspondence between related sequences. Pairwise sequence alignment is a fundamental component of many bioinformatics applications and is useful in the structural, functional and evolutionary analysis of sequences. Alignment-based database searches are commonly performed using BLAST (Basic Local Alignment Search Tool).
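To make residue-to-residue correspondence concrete, here is a minimal sketch of a global pairwise alignment using the classic Needleman-Wunsch dynamic programming algorithm. Note that this is not how BLAST works internally (BLAST uses a faster heuristic local search); it is shown only because it is the simplest complete alignment method. The scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions chosen for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align sequences a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning residue a[i-1] with residue b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))
```

For example, `needleman_wunsch("ACGT", "AGT")` returns a score of 1 with the alignment `ACGT` over `A-GT`: three matches and one gap.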

BLAST output gives a considerable amount of information about the alignment, including a bit score and an expectation value (E-value). The higher the score, the more similar the two sequences. The E-value is the number of hits with at least that score that would be expected to occur by chance in a search of that database. If the E-value is high, the hit could easily have occurred just by chance. Thus, the lower the E-value, the more significant the score is.
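The relationship between score and E-value can be sketched with the Karlin-Altschul formula on which BLAST statistics are based: E = K · m · n · e^(−λS), where S is the raw alignment score, m is the query length and n is the database length. The default values of `lam` and `k` below are illustrative assumptions only; in practice they depend on the scoring matrix and gap penalties, and BLAST also applies length corrections that this sketch ignores.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score into a normalised bit score."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(raw_score, query_len, db_len, lam=0.318, k=0.134):
    """Expected number of chance hits scoring at least raw_score.

    lam and k are statistical parameters; the defaults here are
    illustrative assumptions, not values to rely on.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Same query and database, two different raw scores.
weak = expect_value(raw_score=30, query_len=250, db_len=10_000_000)
strong = expect_value(raw_score=120, query_len=250, db_len=10_000_000)
print(f"E-value for score 30:  {weak:.3g}")
print(f"E-value for score 120: {strong:.3g}")
```

Because the score sits in a negative exponent, a modest increase in score shrinks the E-value dramatically, while a larger database (bigger n) raises the E-value for the same score.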

BLAST is a family of programs that includes BLASTN, BLASTP and BLASTX.

BLASTN uses nucleotide sequences as queries to search against a nucleotide sequence database. BLASTP uses protein sequences as queries to search against a protein sequence database. BLASTX takes a nucleotide sequence and translates it in all six reading frames to produce translated protein sequences, which are then used to query a protein sequence database. Another well-established program apart from BLAST is FASTA (FAST-All), which uses an alternative algorithm to detect sequence similarity.
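The six-frame translation step used by BLASTX can be sketched as follows: three reading frames on the given strand and three on its reverse complement. This is a simplified illustration using the standard genetic code, not BLASTX's actual implementation; the function names are chosen here for the example, and codons containing unexpected letters are shown as `X`.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate_frame(dna, offset):
    """Translate every complete codon starting at offset; '*' marks a stop."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(offset, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate a nucleotide sequence in all six reading frames."""
    forward = dna.upper()
    reverse = reverse_complement(forward)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate_frame(forward, offset)
        frames[f"-{offset + 1}"] = translate_frame(reverse, offset)
    return frames


for name, protein in sorted(six_frame_translation("ATGGCCTAA").items()):
    print(name, protein)
```

Each of the six translated sequences would then be compared against the protein database, which is why BLASTX can find protein matches even when the correct reading frame of the query is unknown.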
