Can the physicochemical properties of the components of a genome tell us something about their functions? Yes, and with a high level of precision, according to a recent paper by researchers from the Indian Institute of Technology Delhi (IIT-D), who were able to distinguish genes that code for proteins from those that assist in the process by studying their interactions with water.

The researchers were able to accurately distinguish more than two million genes coding for messenger ribonucleic acid (mRNA) from 56,251 genes coding for transfer RNA (tRNA) in completely sequenced prokaryotic genomes from a global database. Prokaryotes are micro-organisms, such as bacteria, that don’t have a cell nucleus.

Scientists use several methods to try to interpret the language of DNA, including bioinformatics tools built on sophisticated pattern recognition models. Using the physicochemical properties of DNA sequences as a guide to understanding genome organisation is an approach that has received relatively little attention, but it could be more accurate than existing methods, says B Jayaram, professor at IIT-D’s department of chemistry, who co-authored the paper with Garima Khandelwal. The research paper was published in the Journal of the American Chemical Society in May.

The question the researchers sought to answer was whether the physicochemical properties of different functional units on genomes differ, and they went about it by studying the way the mRNA and tRNA interacted with their water environment. The conversion of a genome to a transcriptome (DNA to RNA) requires unpacking of the DNA, recognition by regulatory regions, unwinding and strand separation. All this involves processes such as base stacking, hydrogen bonding and DNA-solvent interaction.

The mRNA is the RNA molecule that carries protein-coding information and is translated into protein by the ribosomal machinery. The tRNA does not code for protein but carries the amino acids needed for protein synthesis. According to the research paper, identifying tRNA genes has been difficult, particularly without secondary structural inputs, and most computational methods for gene prediction are designed to locate protein-coding regions.

The IIT-D researchers used data on all known prokaryotic genomes from the US-based National Center for Biotechnology Information and then created a table of hydrogen bond, solvation and stacking energetics using molecular dynamics simulation data from the Ascona B-DNA Consortium, a global group of scientists.
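For readers who want a feel for what scanning a sequence against such a table involves, here is a minimal sketch in Python. It is not the researchers' method or code: the table values are made-up placeholders rather than the energies derived from the consortium's simulations, and the function names, the choice of overlapping two-base steps and the window size are all assumptions made for illustration.

```python
from itertools import product

BASES = "ACGT"


def placeholder_table(offset=-1.0, step=-0.1):
    """Build a made-up energy for each of the 16 two-base steps.

    These numbers are NOT from the study; they only give the scan
    something to add up. A real table would hold simulation-derived values.
    """
    steps = ["".join(pair) for pair in product(BASES, repeat=2)]
    return {s: offset + step * i for i, s in enumerate(steps)}


def mean_step_energy(seq, table):
    """Average the table value over every overlapping two-base step in seq."""
    seq = seq.upper()
    values = [
        table[seq[i:i + 2]]
        for i in range(len(seq) - 1)
        if seq[i:i + 2] in table  # skip steps with ambiguous bases such as N
    ]
    if not values:
        raise ValueError("sequence has no scorable two-base steps")
    return sum(values) / len(values)


def window_profile(seq, table, window=50):
    """Slide a fixed-width window along seq and score each position."""
    return [
        mean_step_energy(seq[i:i + window], table)
        for i in range(len(seq) - window + 1)
    ]


if __name__ == "__main__":
    solvation = placeholder_table()  # hypothetical stand-in for one property
    toy = "ATGGCGTACGTTAGC"          # toy sequence, not a real gene
    print(mean_step_energy(toy, solvation))
    print(window_profile(toy, solvation, window=6))
```

In this sketch, one table would be built per property (hydrogen bonding, stacking, solvation), and the per-gene averages could then be compared between known mRNA and tRNA genes to see whether any property separates the two groups.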

The study of solvation energy suggested that it should be easier to desolvate the tRNA because it has a well-defined structure in which some parts are exposed and others buried. The mRNA, meanwhile, has a loosely defined tertiary structure because it has to pass through the ribosomal machinery to be translated into protein.

“What turns out in this study is that if you scan the DNA for solvation energies, that immediately conveys to you that mRNA and tRNA are easily separable. We can easily distinguish this by highlighting the property of solvation,” says Jayaram. “This is a start, I would say, and it’s very promising.”

He reckons that if the correlation between physicochemical properties and function is taken to a higher level through further studies, it should eventually be possible to interpret a genome sequence directly from its properties. “For instance, take the case of the human genome. Only 3% of the human genome codes for proteins. Then what is the other 97% doing? We have no idea,” he says. “We really don’t know what the functional significance is and which parts are at least the dividers. What the physicochemical approach will do is at least try to put the boundaries. And then, utilising these correlations, we might start to interpret what these boundaries would mean.”

Genomics has produced a deluge of data in recent years, but the global scientific community does not yet have the necessary tools to make full sense of this data, says Surjit Dixit, chief technology officer at Zymeworks Inc, a biotechnology company based in Vancouver, Canada. “The method used by Jayaram and co-workers is a unique approach compared to the current traditional methods of analysis of this data and is a valuable step in filling this gap and expanding our understanding of the complex molecular components of life and how they work.”

Jayaram says that his lab will now attempt to identify combinations of physicochemical properties for each of the functional units already known, such as ribosomal RNA or microRNAs.