Current Research

Origin of Life and the RNA World

Emergence of Ribozyme and tRNA-like structures in an RNA World

The RNA world hypothesis has received much attention from both experimentalists and theorists as a compelling hypothesis for the origin of life on Earth. Despite several lines of indirect evidence for the plausibility of such an epoch that preceded our current DNA-protein world, it is not yet clear how RNA catalysts like ribozymes emerged through non-enzymatic reactions in a prebiotic environment that contained just constituent building blocks in the form of the monomers U, C, A and G. In this paper we use realistic rates of RNA polymerization reactions derived from experiments to examine how long, structurally complex ribozyme-like sequences may have spontaneously assembled with minimal assumptions about the initial state. We consider an initial state that contained either only free monomers or a few short poly-A and poly-G oligomers. By taking into account both the faster, template-directed primer extension and the slower, non-templated polymerization processes, we show that fairly long (up to ~100 nucleotides) sequences with complex secondary structures can be produced under environmental cycling that segregates polymer extension reactions from the hydrolysis reactions that fragment existing polymers. We propose a method for distinguishing between sequences with different secondary structures, classify all RNA sequences on the basis of their secondary structures and identify the most favourable conditions needed for the emergence of ribozyme-like structures and double-stranded RNA molecules. Our work [33] is the first to show that by using experimentally determined reaction rates, even cloverleaf structures characteristic of tRNAs can spontaneously emerge starting from free nucleotides, in addition to single-hairpin, double-hairpin and hammerhead-like structures. Our results indicate that under suitable environmental conditions, non-enzymatic processes would have been sufficient to lead to the emergence of a variety of ribozyme-like molecules with complex secondary structures.

Competition between protocells in an RNA World

It has been speculated that life based on RNA molecules existed prior to the emergence of DNA and protein-based life 4 billion years ago. Despite some indirect evidence, it is not yet clear how "live", replicating, cell-like compartments containing different types of RNA catalysts emerged and eventually proliferated in a primordial world. We used a mechanism of RNA replication originally observed in viruses to suggest how competition between protocells can lead to the preferential selection of protocells containing a functionally diverse set of RNA catalysts. Our work [35] highlights a plausible mechanism of increasing protocellular complexity that might have eventually led to the origin of life in an RNA world.

The formation, growth, division and proliferation of protocells containing RNA strands is an important step in ensuring the viability of a mixed RNA-lipid world. Experiments and computer simulations indicate that RNA encapsulated inside protocells can favor the protocell, promoting its growth while protecting the system from being over-run by selfish RNA sequences. Recent work has also shown that the rolling-circle replication mechanism can be harnessed to ensure the rapid growth of RNA strands and the probabilistic emergence and proliferation of protocells with functionally diverse ribozymes. Despite these advances in our understanding of a primordial RNA-lipid world, key questions remain about the ideal environment for the formation of protocells and its role in regulating the proliferation of functionally complex protocells. The hot spring hypothesis suggests that mineral-rich regions near hot springs, subject to dry-wet cycles, provide an ideal environment for the origin of primitive protocells. We develop a computational model to study protocellular evolution in such environments, which are distinguished by the occurrence of three distinct phases: a wet phase, followed by a gel phase, and subsequently a dry phase. We determine the conditions under which protocells containing multiple types of ribozymes can evolve and proliferate in such regions. We find that diffusion in the gel phase can inhibit the proliferation of complex protocells, with the extent of inhibition being most significant when a small fraction of protocells is eliminated during environmental cycling. Our work [37] clarifies how the environment can shape the evolution and proliferation of complex protocells.

Evolutionary Game Theory and the Evolution of Cooperation

Learning Strategies for Evolution of Altruistic Behaviour

Cooperation and conflict are seen in societies that span all biological scales, ranging from the most primitive microbial communities to the most advanced human societies. Evolutionary game theory provides a simple yet powerful framework to understand the impact of cooperation and conflict. The use of either of these opposing traits by individuals during interactions with other agents often depends on personal aspirations, the number and attributes of connected neighbours and the underlying structure of the population. In a social conflict, individuals employ different learning strategies to update their behaviour over time. Our work [34,36] in this area focuses on understanding how such changes occur by studying the evolution of cooperation and wealth distribution in populations playing a public goods game on static and dynamic social networks. In an information-rich setting [34], we proposed a new decision heuristic, where the propensity of an individual to cooperate depends on the local strategy environment in which she is embedded as well as her wealth relative to that of her neighbours. In an alternative, low-information setting [36], we explored the population dynamics when the strategy-update heuristic does not depend on extraneous information and relies only on a comparison between the actual payoff received and the benefit aspired for. We have shown how restructuring of social ties as well as evolution of individual decisions in these two distinct scenarios affect the population dynamics, often in unpredictable ways, leading to profound consequences for the persistence and proliferation of altruistic traits.
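The low-information update rule can be illustrated with a minimal Python sketch. It is not the model of [36]: the ring network, the synchronous update, the deterministic "switch if dissatisfied" rule and the names pgg_payoffs, aspiration_update and simulate are all assumptions made for illustration. Each agent plays one public goods game with its k nearest neighbours on either side and flips its strategy only when its payoff falls below a fixed aspiration level.

```python
def pgg_payoffs(strategies, k, r, cost=1.0):
    """Payoff of each agent from one public goods game on a ring.

    strategies: list of 1 (cooperate) or 0 (defect).
    Each agent's group is itself plus its k nearest neighbours on each side
    (an assumed, simplified interaction structure).
    """
    n = len(strategies)
    payoffs = []
    for i in range(n):
        group = [(i + d) % n for d in range(-k, k + 1)]
        contributions = cost * sum(strategies[j] for j in group)
        share = r * contributions / len(group)  # pot is multiplied by r and split equally
        payoffs.append(share - cost * strategies[i])
    return payoffs


def aspiration_update(strategies, payoffs, aspiration):
    """Agents whose payoff is below the aspiration switch strategy; others keep theirs."""
    return [1 - s if p < aspiration else s for s, p in zip(strategies, payoffs)]


def simulate(strategies, k, r, aspiration, steps):
    """Repeat play-then-update for a fixed number of synchronous rounds."""
    for _ in range(steps):
        payoffs = pgg_payoffs(strategies, k, r)
        strategies = aspiration_update(strategies, payoffs, aspiration)
    return strategies


if __name__ == "__main__":
    start = [1, 0, 0, 0, 0, 0]
    print(simulate(start, k=1, r=3.0, aspiration=1.0, steps=5))
```

The update uses no information about neighbours' strategies or payoffs, which is the defining feature of the low-information setting; a stochastic switching rule or a rewired network could be substituted without changing the overall structure.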

Bribe and Punishment

Bribe demands present a social conflict scenario where decisions have wide-ranging economic and ethical consequences. Nevertheless, such incidents occur daily in many countries across the globe. Harassment bribes, paid by citizens to corrupt officers for services the former are legally entitled to, constitute one of the most widespread forms of corruption in many countries. Nation states have adopted different policies to address this form of corruption. While some countries make both the bribe giver and the bribe taker equally liable for the crime, others impose a larger penalty on corrupt officers. We examine the consequences of asymmetric and symmetric penalties by developing deterministic and stochastic evolutionary game-theoretic models of bribery. We find that the asymmetric penalty scheme can lead to a reduction in incidents of bribery. However, the extent of reduction depends on how the players update their strategies over time. If the interacting members change their strategies with a probability proportional to the payoff of the alternative strategy option, the reduction in incidents of bribery is less pronounced. Our results indicate that changing from a symmetric to an asymmetric penalty scheme may not suffice in achieving significant reductions in incidents of harassment bribery.
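The contrast between symmetric and asymmetric penalties can be sketched with a toy two-population replicator system. The payoff structure below is a hypothetical simplification chosen for illustration and is not the model analysed in our work: x is the fraction of citizens who complain after paying a bribe, y is the fraction of corrupt officers, and the parameters bribe, complaint_cost, conviction_prob, officer_penalty, citizen_penalty and refund are all assumed. In the asymmetric scheme the citizen is not penalised and the bribe is refunded on conviction; in the symmetric scheme the citizen is penalised as well and receives no refund.

```python
def bribery_replicator(refund, citizen_penalty, bribe=1.0, complaint_cost=0.2,
                       conviction_prob=0.5, officer_penalty=2.0,
                       x0=0.5, y0=0.5, dt=0.01, steps=10000):
    """Euler integration of a toy citizen-officer replicator system.

    x: fraction of citizens who complain; y: fraction of corrupt officers.
    All payoffs are illustrative assumptions, not fitted or published values.
    """
    x, y = x0, y0
    # Payoff advantage of complaining over staying silent when facing a corrupt officer.
    gain_complain = -complaint_cost + conviction_prob * (refund * bribe - citizen_penalty)
    for _ in range(steps):
        # Payoff advantage of being corrupt over being honest, given the share of complainers.
        gain_corrupt = bribe - x * conviction_prob * (officer_penalty + refund * bribe)
        dx = x * (1 - x) * y * gain_complain
        dy = y * (1 - y) * gain_corrupt
        x += dt * dx
        y += dt * dy
    return x, y


# Symmetric scheme: citizen penalised like the officer, no refund of the bribe.
symmetric = bribery_replicator(refund=0.0, citizen_penalty=2.0)
# Asymmetric scheme: citizen not penalised, bribe refunded on conviction.
asymmetric = bribery_replicator(refund=1.0, citizen_penalty=0.0)

if __name__ == "__main__":
    print("corrupt fraction, symmetric :", symmetric[1])
    print("corrupt fraction, asymmetric:", asymmetric[1])
```

This sketch covers only deterministic replicator dynamics. Other strategy-update rules, such as switching with a probability proportional to the payoff of the alternative strategy, would need a separate stochastic implementation.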

We examined such a scenario of social conflict, which is manifest during an interaction between government servants providing a service and citizens who are legally entitled to the service, using evolutionary game theory in structured populations characterized by an inter-dependent network. We investigated the effect of varying the bribe demand made by corrupt officials and the cost of complaining incurred by harassed citizens on the proliferation of corrupt strategies in the population. We also examined how the connectivity of the various constituent networks affects the spread of corrupt officials in the population. We find that incidents of bribery can be considerably reduced in network-structured populations compared to mixed populations. Interestingly, we also find that an optimal range for the connectivity of nodes in the citizen's network (signifying the degree of influence a citizen has in affecting the strategy of other citizens in the network) as well as the interaction network aids in the fixation of honest officers. Our results reveal the important role of network structure and connectivity in asymmetric games.

We also employed an evolutionary game-theoretic framework to analyse the evolution of corrupt and honest strategies in structured populations characterized by an interdependent complex network. The effects of changing network topology, average number of links and asymmetry in size of the citizen and officer population on the proliferation of incidents of bribery are explored. A complex network topology is found to be beneficial for the dominance of corrupt strategies over a larger region of phase space when compared with the outcome for a regular network, for equal citizen and officer population sizes. However, the extent of the advantage depends critically on the network degree and topology. A different trend is observed when there is a difference between the citizen and officer population sizes. Under those circumstances, increasing randomness of the underlying citizen network can be beneficial to the fixation of honest officers up to a certain value of the network degree. Our analysis reveals how the interplay between network topology, connectivity and strategy update rules can affect population level outcomes in such asymmetric games.

Origin and Evolution of the Genetic Code

The genetic code provides a recipe for the synthesis of proteins and is a mapping between the 64 possible codons (triplets of bases) in the messenger RNAs and the 20 amino acids. The canonical genetic code was established before the last common ancestor and was initially believed to be used by all living organisms. However, many deviations from the canonical code have been discovered in a wide variety of living organisms. This occurs when a codon gets reassigned from one amino acid to another during the course of evolution. The challenge is to understand how changes in the canonical code could have taken place without being lethal to the organism. Although two distinct qualitative explanations for code change have been proposed, no quantitative modeling of such changes had been carried out. Moreover, while the detailed molecular events responsible for codon reassignment are known in many cases, the exact mechanism of codon reassignment in all cases remains unclear. Our main goal is to elucidate the mechanism of code change using sequence comparison (phylogenetic) studies as well as numerical modeling of the process.
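The codon-to-amino-acid mapping can be written out explicitly. The Python sketch below builds the canonical code as a dictionary and then applies the codon reassignments of the vertebrate mitochondrial code, used here only as one well-known example of a variant code. The names BASES, standard_code and reassign are illustrative choices.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes ('*' = stop) for codons in the order UUU, UUC, UUA, ..., GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)


def standard_code():
    """Return the canonical genetic code as a dict mapping RNA codon -> amino acid."""
    codons = ("".join(triplet) for triplet in product(BASES, repeat=3))
    return dict(zip(codons, AMINO_ACIDS))


def reassign(code, changes):
    """Return a copy of `code` with the given codon reassignments applied."""
    new_code = dict(code)
    new_code.update(changes)
    return new_code


# Reassignments that distinguish the vertebrate mitochondrial code from the canonical code.
VERT_MITO_CHANGES = {"UGA": "W", "AUA": "M", "AGA": "*", "AGG": "*"}

canonical = standard_code()
vert_mito = reassign(canonical, VERT_MITO_CHANGES)
changed = sorted(c for c in canonical if canonical[c] != vert_mito[c])

if __name__ == "__main__":
    for codon in changed:
        print(codon, canonical[codon], "->", vert_mito[codon])
```

Representing a code as a plain dictionary makes a reassignment a single key update, which is convenient when comparing variant codes against the canonical one.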

Together with Paul Higgs [10], we have presented the first quantitative population genetics model of codon reassignment, which leads to alternative genetic codes. Our model unifies four possible mechanisms of reassignment and is based on the observation that all codon reassignments can be understood in terms of a gain and a loss. The loss could be the loss of function or deletion of a tRNA or release factor which was originally used to recognize the codon in question. The gain could be the gain of a new type of tRNA for the reassigned codon (possibly due to gene duplication) or the gain in function of an existing tRNA due to mutation or base modification. By tracking the sequence in which gain and loss events became fixed in the population, we were able to distinguish between the various possible mechanisms of codon reassignment. Numerical simulations of the gain-loss model indicated that all four mechanisms are viable and enabled us to investigate the different circumstances under which each mechanism becomes dominant.

We have also carried out extensive analysis [12] of gene sequence data for the major eukaryotic groups using both protein and RNA sequences from complete mitochondrial genomes. This allowed us to locate the changes in the genetic code on the phylogenetic tree and to identify other significant changes occurring in the genomes that may be linked to genetic code changes, e.g. the gain and loss of tRNA genes from the genome, the occurrence of mutations in tRNA anticodons, and the variation in codon usage patterns between species. Combining these different types of information, we were able to infer the mechanism responsible for codon reassignments in many instances and relate this to the theoretical predictions based on the gain-loss model discussed earlier.

We proposed in vitro experiments [15] to test the viability of the Ambiguous Intermediate (AI) mechanism of codon reassignment. These experiments rely on differential reaction rates for the various steps in the translation process to distinguish between the effectiveness of the two alternative modes of decoding in the AI stage. We further argued that some of the reassignments of a sense to a stop codon may have occurred as a consequence of low forward reaction rates and premature peptide release resulting from a mispairing between the codon and the anticodon. Finally, we discussed how these in vitro experiments can also be used to shed light on the Unassigned Codon mechanism, which is another possible mechanism of codon reassignment.

The near universality and the non-random, highly optimized structure of the genetic code make it important to understand the evolutionary forces that led a primordial code to its present form. To address this question, we studied the outcome of competition between distinct codes in a finite population.

Revisiting the Physico-Chemical Hypothesis of Code Origin

The origin of the genetic code marked a major transition from a plausible RNA world to the world of DNA and proteins and is an important milestone in our understanding of the origin of life. We examined the efficacy of the physico-chemical hypothesis of code origin by carrying out simulations of code-sequence coevolution in finite populations in stages, leading first to the emergence of ten amino acid code(s) and subsequently to 14 amino acid code(s). We explored two different scenarios of primordial code evolution. In one scenario, competition occurs between populations of equilibrated code-sequence sets, while in the other, new codes compete with existing codes as they are gradually introduced into the population with a finite probability. In either case, we find that natural selection between competing codes distinguished by differences in the degree of physico-chemical optimization is unable to explain the structure of the standard genetic code. The code whose structure is most consistent with the standard genetic code is often not among the codes that have a high fixation probability. However, we find that the composition of the code population affects the code fixation probability. A physico-chemically optimized code gets fixed with a significantly higher probability if it competes against a set of randomly generated codes. Our results [19] suggest that physico-chemical optimization may not be the sole driving force in ensuring the emergence of the standard genetic code.
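The notion of a fixation probability in a finite population can be illustrated with a standard two-type Moran process. This is a textbook stand-in and not the code-sequence coevolution model of [19]: a single mutant with relative fitness 1 + s (an assumed selection coefficient) competes against n - 1 residents, and the function names below are illustrative. The sketch compares the known closed-form result with a Monte Carlo estimate.

```python
import random


def moran_fixation_exact(n, s):
    """Closed-form fixation probability of one mutant with fitness 1+s in a Moran process of size n."""
    if abs(s) < 1e-12:
        return 1.0 / n  # neutral limit
    r = 1.0 + s
    return (1.0 - 1.0 / r) / (1.0 - r ** (-n))


def moran_fixation_simulated(n, s, trials, seed=0):
    """Monte Carlo estimate of the same quantity.

    Only steps that change the mutant count are simulated: conditional on a
    change, the count goes up with probability r / (1 + r), whatever its value.
    """
    rng = random.Random(seed)
    r = 1.0 + s
    p_up = r / (1.0 + r)
    fixed = 0
    for _ in range(trials):
        mutants = 1
        while 0 < mutants < n:
            mutants += 1 if rng.random() < p_up else -1
        fixed += mutants == n
    return fixed / trials


if __name__ == "__main__":
    n, s = 10, 0.1
    print("exact    :", moran_fixation_exact(n, s))
    print("simulated:", moran_fixation_simulated(n, s, trials=20000))
```

In a competition among several codes, the analogous quantity is the fraction of independent runs in which a given code takes over the whole population.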

Effect of HGT on the Origin of the Standard Genetic Code

The origin of a universal and optimal genetic code remains a compelling mystery in molecular biology and marks an essential step in the origin of DNA and protein-based life. We examined a collective evolution model of genetic code origin that allows for unconstrained horizontal transfer of genetic elements within a finite population of sequences, each of which is associated with a genetic code selected from a pool of primordial codes. We find that when horizontal gene transfer (HGT) is incorporated in this more realistic model of code-sequence coevolution in a finite population, it can increase the likelihood of emergence of a more optimal code, eventually leading to its universality through fixation in the population. The establishment of such an optimal code depends on the probability of HGT events. We find that only when the probability of HGT events is above a critical threshold does the ten amino acid code whose structure is most consistent with the standard genetic code (SGC) get fixed in the population with the highest probability. We examined how the threshold is determined by factors like the population size, the length of the sequences and the selection coefficient. Our work [25] reveals the conditions under which sharing of coding innovations through horizontal transfer of genetic elements may have facilitated the emergence of a universal code having a structure similar to that of the SGC.

Our view on the origin and evolution of the Genetic Code has been summarized in a comprehensive review article. There we argue that there have been two distinct phases of evolution of the genetic code: an ancient phase prior to the divergence of the three domains of life, during which the standard genetic code was established and a modern phase, in which many alternative codes have arisen in specific groups of genomes that differ only slightly from the standard code. Here we discuss the factors that are most important in these two phases, and we argue that these are substantially different. In the modern phase, changes are driven by chance events such as tRNA gene deletions and codon disappearance events. Selection acts as a barrier to prevent changes in the code. In contrast, in the ancient phase, selection for increased diversity of amino acids in the code can be a driving force for addition of new amino acids. The pathway of code evolution is constrained by avoiding disruption of genes that are already encoded by earlier versions of the code. The current arrangement of the standard code suggests that it evolved from a four-column code in which Gly, Ala, Asp, and Val were the earliest encoded amino acids.

Past Research


Riboswitches are a type of noncoding RNA that regulate gene expression by switching from one structural conformation to another on ligand binding. The various classes of riboswitches discovered so far are differentiated by the ligand, which on binding induces a conformational switch. Every class of riboswitch is characterized by an aptamer domain, which provides the site for ligand binding, and an expression platform that undergoes conformational change on ligand binding. The sequence and structure of the aptamer domain are highly conserved in riboswitches belonging to the same class. We proposed a method [14] for fast and accurate identification of riboswitches using profile Hidden Markov Models (pHMM). Our method exploits the high degree of sequence conservation that characterizes the aptamer domain and can detect riboswitches in genomic databases rapidly and accurately. Its sensitivity is comparable to that of the method based on the Covariance Model (CM). For six out of ten riboswitch classes, our method detects more than 99.5% of the candidates identified by the much slower CM method while being several hundred times faster. For three riboswitch classes, our method detects 97-99% of the candidates relative to the CM method. Our method works very well for those classes of riboswitches that are characterized by distinct and conserved sequence motifs.
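The idea of detecting a conserved motif by profile scoring can be sketched in a few lines of Python. The example deliberately uses a simpler technique than the pHMM of [14]: a position weight matrix, which corresponds to a profile HMM restricted to match states with no insert or delete states. The toy alignment, the uniform background, the pseudocount and the threshold are all assumptions made for illustration, not real riboswitch data.

```python
import math

ALPHABET = "ACGU"


def build_profile(aligned, pseudocount=1.0):
    """Build per-position log-odds scores (base 2) from equal-length aligned sequences.

    Scores are relative to a uniform background of 0.25 per base (an assumption).
    """
    profile = []
    for pos in range(len(aligned[0])):
        counts = {base: pseudocount for base in ALPHABET}
        for seq in aligned:
            counts[seq[pos]] += 1
        total = sum(counts.values())
        profile.append({b: math.log2((counts[b] / total) / 0.25) for b in ALPHABET})
    return profile


def score(profile, window):
    """Sum of log-odds scores of `window` against the profile."""
    return sum(column[base] for column, base in zip(profile, window))


def scan(profile, sequence, threshold):
    """Return (start, score) for every window scoring at least `threshold`."""
    width = len(profile)
    hits = []
    for start in range(len(sequence) - width + 1):
        s = score(profile, sequence[start:start + width])
        if s >= threshold:
            hits.append((start, s))
    return hits


if __name__ == "__main__":
    toy_alignment = ["GGAUC", "GGAUC", "GGACC"]  # made-up motif instances
    profile = build_profile(toy_alignment)
    print(scan(profile, "AAAAGGAUCAAAA", threshold=5.0))
```

A full profile HMM adds insert and delete states so that motif instances with small length variations still score well, which a fixed-width matrix cannot capture.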

Accurate identification of riboswitches is the first step towards understanding their regulatory and functional roles in the cell. We have developed a new web application named Riboswitch Scanner, which provides an automated pipeline for rapid detection of riboswitches in partial as well as complete genomic sequences with high sensitivity and specificity.

We have also developed a comprehensive database (RiboD) of prokaryotic riboswitches that allows the user to search for riboswitches using multiple criteria and to extract information about the location of a riboswitch and the gene/operon it regulates. RiboD provides a very useful resource for better understanding riboswitch-based gene regulation in bacteria and archaea. A more comprehensive understanding of the origin and evolution of each class of riboswitch requires a detailed picture of the gene-wise distribution of riboswitches across all bacterial species. Using phylogenetic analysis and comparative genomics, we have been able to identify the class of genes/operons regulated by the Purine and Lysine riboswitches and obtain a high-resolution map of Purine and Lysine riboswitch distributions across all bacterial groups. Our analysis [18,28] sheds light on the origin and evolution of different Purine [18] and Lysine [28] riboswitches.

Among the nearly 40 different classes of riboswitches discovered in bacteria so far, only the TPP riboswitch has also been found in algae, plants and fungi, where its presence has been experimentally validated in a few instances. We analyzed [29] all the available complete fungal and related genomes and identified TPP riboswitch-based regulation systems in 138 fungi and 15 oomycetes. We find that TPP riboswitches are most abundant in Ascomycota and Basidiomycota, where they regulate TPP biosynthesis and/or transporter genes. Our comprehensive analysis [29] of TPP riboswitches in fungi provides insights about the phylogenomic distribution, regulatory patterns and functioning mechanisms of TPP riboswitches across diverse fungal species and provides a useful resource that will enhance the understanding of RNA-based gene regulation in eukaryotes.

Modeling Cell Division in E. coli

Cell division in the bacterium E. coli is controlled by the oscillatory dynamics of three Min proteins (MinC/D/E). Several groups have quantitatively modeled the Min protein dynamics in E. coli with a system of coupled reaction-diffusion equations and reproduced the oscillatory dynamics of the Min proteins, which ensure accurate cell division by proper positioning of the tubulin-like FtsZ protein at mid-cell. However, present models have failed to show how such spontaneous oscillations can be regenerated in the two daughter cells after cell division. The reason for this appears to be segregation of Min proteins (seen in all models) into one of the two cells after division, which leaves the other daughter cell with inadequate Min protein density to restart oscillations. In collaboration with A.D. Rutenberg, we have investigated the septation process in detail [13] to determine the cause of the asymmetric partitioning of Min proteins between daughter cells. We find that this partitioning problem arises at certain phases of the MinD and MinE oscillations with respect to septal closure and that it persists independently of parameter variation. At most 85% of the daughter cells exhibit Min oscillation following septation. Enhanced MinD binding at the static polar and dynamic septal regions, consistent with cardiolipin domains, does not substantially increase this fraction of oscillating daughters. We believe that this problem will be shared among all existing Min models and discuss possible biological mechanisms that may minimize partitioning errors of Min proteins following septation.

We have also developed a 3D off-lattice stochastic polymerization model [17] to study subcellular oscillation of Min proteins in the bacterium Escherichia coli, and used it to investigate the experimental phenomenon of Min oscillation stuttering. We tuned processivity, the rate of immediate rebinding of MinE released from depolymerizing filament tips, protection of depolymerizing filament tips from MinD binding, and fragmentation of MinD filaments due to MinE. We also investigated the effect of a heterogeneous distribution of membrane phospholipids, with polar distributed cardiolipin, on Min oscillations. We found that each of the effects of processivity, protection, and fragmentation can reduce stuttering, speed up oscillations, and reduce MinD filament lengths. Nevertheless, no single mechanism was sufficient to recover fast stutter-free oscillations.

Classification of HIV Sequences

Accurate classification of HIV-1 subtypes is essential for studying the dynamic spatial distribution pattern of HIV-1 subtypes and also for developing effective methods of treatment that can be targeted to attack specific subtypes. We proposed a classification method based on profile Hidden Markov Models that can accurately identify an unknown strain. We first showed that a standard method that relies on the construction of a positive training set only, to capture unique features associated with a particular subtype, can accurately classify sequences belonging to all subtypes except B and D. The failure of the standard method to distinguish between the closely related subtypes B and D can be attributed to two drawbacks of the method: an arbitrary choice of threshold to distinguish between true positives and true negatives, and the inability to discriminate between closely related subtypes. We then demonstrated how an improved classification method, based on the construction of a positive as well as a negative training set, improves the ability to discriminate between closely related subtypes like B and D. Finally, we also demonstrated how the improved method can be used to accurately determine the subtype composition of Circulating Recombinant Forms of the virus that are made up of two or more subtypes. Our method [16] provides a simple and highly accurate alternative to other classification methods and will be useful in accurately annotating newly sequenced HIV-1 strains.


Last Updated on 06.02.2023