The Origin of Life
(Thanks to all who took the time to comment on my previous post. All the comments were helpful; as one who is attempting to summarize and draw conclusions from diverse areas of science, I sometimes struggle to find the right word or right example. The comments on entropy were particularly helpful.)
The simplest living biological organisms that we know about are bacteria. (I am omitting viruses and infectious proteins because most biologists do not classify them as living due to their dependence on living cells.) Bacteria are one-celled organisms without a well-defined nucleus. Cells without a well-defined nucleus are called prokaryotes. Nucleated cells, like those found in multi-celled organisms, are called eukaryotes. Bacteria are typically one tenth the size of eukaryotic cells; they are less than 10 micrometers in length.
One of the smallest and simplest bacteria is an organism called Mycoplasma genitalium. This bacterium infects the urinary tract of humans and other primates. It is less than 300 nanometers long, or about one tenth the size of typical bacteria. It also has one of the smallest genomes. The size of a genome can be measured by the number of “base pairs.” “Base pairs” are a count of the letters of the genetic code that make up the DNA of the organism: A (adenine), C (cytosine), G (guanine) and T (thymine). They are called “pairs” because each letter is paired with another letter in the double-stranded DNA molecule: A is paired with T and C is paired with G. The DNA of Mycoplasma genitalium contains about 583,000 base pairs. The human genome, by comparison, contains about 3.2 billion base pairs.
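The pairing rule is simple enough to write down as a lookup table. Here is a minimal Python sketch; the names PAIRS and complement and the six-letter example sequence are my own illustrations, not taken from any real genome. It pairs letters position by position and ignores the fact that the two strands of real DNA run in opposite directions.

```python
# Pairing rule from the text: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the partner letter for each position of a DNA strand."""
    return "".join(PAIRS[base] for base in strand)


example = "ATGCGT"  # hypothetical sequence, for illustration only
partner = complement(example)

print(example)                      # ATGCGT
print(partner)                      # TACGCA
print(len(example), "base pairs")   # 6 base pairs
```

Counting base pairs is then just counting positions along one strand, which is how the figure of about 583,000 should be read.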
The DNA of Mycoplasma genitalium (M. genitalium) contains code for 482 proteins. Recall that the genetic code for proteins uses groups of three base pairs, each of which refers to a specific amino acid among the twenty amino acids that make up all proteins. The average length of these proteins in M. genitalium is 366 amino acids, with a large range from smallest (37) to largest (1805). The amount of DNA needed to code 482 proteins with an average length of 366 is 482 × 366 × 3 = 529,236 base pairs, or about 91% of the genome. For humans, the corresponding percentage is about 1.5%. In humans, the overwhelming majority of DNA does not directly code for proteins and its function remains somewhat uncharted. (Although recently some light has been shed on this part of our genome by the ENCODE project.)
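For anyone who wants to check the coding-fraction arithmetic, here is a short Python sketch. The variable names are mine, and the calculation is deliberately simplified: it uses the average protein length and ignores details such as the stop signal at the end of each gene.

```python
# Figures quoted in the text for M. genitalium.
GENOME_BP = 583_000        # approximate genome size in base pairs
PROTEIN_COUNT = 482        # protein-coding genes
AVG_PROTEIN_LENGTH = 366   # average amino acids per protein
BP_PER_AMINO_ACID = 3      # three base pairs code for one amino acid

coding_bp = PROTEIN_COUNT * AVG_PROTEIN_LENGTH * BP_PER_AMINO_ACID
coding_fraction = coding_bp / GENOME_BP

print(coding_bp)                  # 529236
print(f"{coding_fraction:.1%}")   # 90.8%
```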
One of the smaller proteins of M. genitalium is known as “P47633” and functions as part of a protein complex known as a protein folding chaperone or “chaperonin.” P47633 is 110 amino acids long and is the “cap” to a much larger chaperonin, P47632 (543 amino acids). The complete chaperonin, made up of both P47633 and P47632, provides an isolated, barrel-shaped environment in which proteins can fold properly. (Some proteins will fold properly without a chaperonin.) Proper folding is absolutely essential to protein function, and misfolded proteins in humans have been correlated with certain diseases.
If nature had attempted to form a molecule as simple as P47633 with 110 amino acids by random chance, she would have had to search for one unique sequence among approximately 20^110 possibilities (each of 110 positions can be filled by any of 20 amino acids)! That is a huge number: approximately 1 followed by 143 zeroes! A more realistic calculation would take into account that some amino acids can be substituted for others, but also that amino acids in nature come in two varieties (left and right handed) and biological molecules are only formed from left handed versions. But the number would still be huge. If nature could search at the rate of one combination in the smallest unit of time possible (Planck time), it would take about 10^100 seconds to find the protein. That is well beyond the age of the universe (about 10^17 seconds). Even if the search were taking place at multiple locations (say 10^80 different locations, the number of protons and neutrons in the universe), the length of time would still exceed the age of the universe. And that’s just for one small protein.
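The numbers in this estimate are too large to hold in ordinary floating point comfortably, so the sketch below works with base-10 logarithms (that is, with the count of zeroes). It is a rough check under stated assumptions, not a model of real chemistry: the Planck time of about 5.39e-44 seconds is a value I am supplying, the search is assumed to try every sequence once, and the variable names are my own.

```python
import math

AMINO_ACIDS = 20              # choices per position
PROTEIN_LENGTH = 110          # length of P47633
PLANCK_TIME_S = 5.39e-44      # assumed value, in seconds (not from the text)
LOG10_AGE_OF_UNIVERSE_S = 17  # about 10^17 seconds, as quoted in the text
LOG10_LOCATIONS = 80          # about 10^80 parallel search locations

# Number of possible sequences, expressed as a power of ten.
log10_sequences = PROTEIN_LENGTH * math.log10(AMINO_ACIDS)

# Seconds needed to try them all at one sequence per Planck time.
log10_seconds = log10_sequences + math.log10(PLANCK_TIME_S)

# Seconds needed if 10^80 locations search in parallel.
log10_parallel_seconds = log10_seconds - LOG10_LOCATIONS

print(f"sequences: about 10^{log10_sequences:.1f}")           # about 10^143.1
print(f"one location: about 10^{log10_seconds:.1f} s")        # about 10^99.8 s
print(f"10^80 locations: about 10^{log10_parallel_seconds:.1f} s")  # about 10^19.8 s
print(log10_parallel_seconds > LOG10_AGE_OF_UNIVERSE_S)       # True
```

Even with the generous parallel search, the time comes out several hundred times the age of the universe.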
This simple calculation plus the fact that life requires numerous proteins, many of them much longer than 110 amino acids, have led many questers after the origin of life to discount the role of random chance. Some think that the laws of nature are favorable to life, as I do. Some think that the earliest organisms must have been much simpler and that more complex organisms such as M. genitalium would have been the product of natural selection. The natural question to ask at this point is how simple were the initial forms of life? Or, how simple could they have been?
If we knew how complex the initial living cells were, we could then evaluate whether such cells were the likely product of random molecular encounters. When I first heard of the Miller-Urey experiment in high school chemistry class, I had the impression that soon we would know how life began. That was over 50 years ago and we don’t appear to be much closer to solving the mystery of the beginning of life. The Miller-Urey experiment showed that some amino acids could be produced in an atmosphere of water vapor, ammonia, methane, and hydrogen by passing an electric spark through those chemicals. The concept presented to me then was that life arose from a primordial chemical soup formed by such chance events. That idea has since been discredited. The current thinking is that life began deep underground or underwater near a thermal source of energy. But the key question remains: did it arise by chance or do the laws of physics and chemistry favor the creation of life?
We don’t know how complex the initial forms of life were, so I am using the simplest example of life that we now have to illustrate a point about the role for random chance in the beginning of life. M. genitalium is one of the simplest living organisms that biologists have studied. It has the additional advantage of being the subject of the Minimal Genome Project which seeks to find the simplest possible genome. Toward that objective, each of the 482 protein coding genes of M. genitalium was individually and systematically deleted until a viable cell with 382 proteins was created in the laboratory. The Minimal Genome Project provides a lower bound on the complexity of a viable living organism capable of both metabolism and replication.
Metabolism is the ability of cells to produce and use energy for homeostasis, that is, the ability of a cell to maintain itself in its typical environment. Replication is the ability of a cell to pass along essential information to its progeny. Metabolism is primarily protein-driven biochemistry, and replication is primarily DNA/RNA-driven cell division. Since cell division involves metabolism (energy must be expended for a cell to divide), replication requires some minimal functioning protein-based chemistry. Conversely, metabolism in the modern cell requires DNA-directed protein creation, completing what is known as the “chicken and egg” conundrum for origin of life researchers. Physicist Paul Davies sums up the puzzle: “It is hard enough to imagine one of them forming by chance, but to suppose both nucleic acids [DNA] and proteins were happy chemical accidents occurring at the same time and place stretches credulity.”
Not all scientists agree that the simplest original living organism had the capability for both metabolism and replication. There appear to be two camps: one favoring metabolism priority and one favoring replication priority. For example, Freeman Dyson has put forth an abstract mathematical model for metabolism priority that requires only 8 to 10 monomers. In Dyson’s model, monomers could be amino acid molecules, so, if the model is predictive, it would indicate that cells with a stable metabolism could be achieved from proteins built from about half the amino acids we now have. Dyson also indicates that “a few hundred polymers [proteins]” would be sufficient. The problem with metabolism priority is that the resulting cells have no way to reliably pass on the precise composition of their proteins to their progeny (cell division would occur through external events and splitting due to growth). Dyson argues that imprecise replication would be sufficient and actually better than an error-prone, directed replication.
If Dyson’s model is accurate, then the earliest cells would be appreciably less complex than M. genitalium, with each protein needing between 10 and 100 amino acid molecules drawn from a maximum of 10 kinds of amino acid. This would reduce the probability of finding a given protein by random chance to between 1 chance in 10^10 and 1 chance in 10^100, a large range with the lower number (1 chance in 10 billion) within reach of a reasonable random search. Still, Dyson’s model needs “a few hundred polymers [proteins],” and the combination of over 100 proteins, with the simplest protein needing 10 amino acids, will give a large space for random combinations. Dyson doesn’t say how many varieties of proteins there might be.
But Dyson’s model doesn’t completely rely on random chance. The model contains provisions for nature to favor life. Part of his model is a table of probabilities that the correct “proteins” will be formed from monomers (amino acids). These factors are called catalyst “discrimination factors” and Dyson characterizes them as “reasonable for the discrimination factor of primitive enzymes.” Their values in his model range from 60 to 100. He goes on to say:
A modern polymerase enzyme typically has a discrimination factor of 5000 or 10000. The modern enzyme is a highly specialized structure perfected by three thousand million years of fine-tuning. It is not to be expected that the original enzymes would have come close to modern standards of performance. On the other hand, simple inorganic catalysts frequently achieve discrimination factors of fifty. It is plausible that a simple peptide catalyst with an active site containing four or five amino acids would have a discrimination factor in the range preferred by the model from sixty to one hundred.
This is significant because metabolism requires enzymes (a type of protein; Dyson’s “peptide catalyst”) which act as catalysts to speed up reaction rates. Without enzymes, reaction rates would be too slow to sustain life. One example is the way that we metabolize sugar for energy. Table sugar is called sucrose and is a combination of glucose and fructose, but glucose is the sugar that we best metabolize. So sucrose is first split into glucose and fructose. Most table sugar comes from either sugarcane or sugar beets which create the sugar through photosynthesis.
The chemical reaction to split sucrose simply requires water and can occur spontaneously, but table sugar placed in a glass of water would not dissociate fast enough to be useful. The reaction is slow because there is a cost in energy to break the bond between glucose and fructose so that the water can interact. Given enough time and heat, thermal activity would eventually begin to break the bonds between glucose and fructose. However, we have an enzyme in our small intestine named “sucrase-isomaltase” which is able to greatly speed up the reaction.
Sucrase-isomaltase is a dual enzyme with the sucrase portion able to split sucrose into glucose and fructose while the isomaltase portion breaks apart the starch from grains. The entire enzyme is 1877 amino acids long with the sucrase portion being 820 in length. The sucrase portion works by locking onto the dual sugar sucrose and by proximity to its own molecular structure lowers the energy cost of breaking the covalent bond between glucose and fructose. A water molecule is then able to intervene and complete the disassociation of the two sugars. There are thousands of enzymes needed for metabolism; life would not be possible without them, so the enzyme efficiency is a key factor in any theory about the beginning of life. Each enzyme is incredibly specific so that, in the case of sugar metabolism, a separate enzyme is needed for sugars from grain (the isomaltase portion). Enzyme specificity is created by the sequence of amino acids and the shape of the protein. Protein folding is a key factor.
Freeman Dyson is a physicist who has turned his attention to the chemistry of life’s beginning. Frank Anet, a professor of biochemistry at UCLA, has criticized Dyson’s model as too simple. This is a charge that Dyson freely admits since his objective was to start the conversation on the metabolism first approach, which is clearly a minority position. The main benefit of Dyson’s model is that it gives mathematical results and Dyson insists that the real proof will be in experimental results. But I fail to see how Dyson’s model could be sufficiently convincing to generate much interest from research labs.
Professor Anet goes on to level a more serious criticism of Dyson’s model:
The range of required discrimination factors is comfortably less than the discrimination factor of several thousands in modern enzymes, as would be expected, and it is similar to the discrimination factor of simple inorganic catalyst. However, the nature of these inorganic catalysts is not given, nor are the catalysed reactions, nor is any reference to the literature provided. No reference is made to any experimental discrimination factors by oligopeptides [small protein enzymes] in catalytic reactions involving closely similar compounds, such as amino acids, which would be the appropriate reference systems. Such a large discrimination factor, it must be stressed, is far more difficult to achieve than mere catalysis. Dyson’s oligopeptides have on the order of 20 monomers, with an ‘active site’ of perhaps five monomers. However, the other fifteen monomers are important in determining the folding of the polymer and therefore also the catalytic efficiency of the active site. Additionally, with such small oligopeptides, the folding is likely to be poorly defined. Thus, it can be concluded that Dyson has no good experimental evidence for choosing high discrimination factors, which are probably too high by at least an order of magnitude. Unfortunately, this destroys his model.
I’m sure that Dyson would repeat his caveat that he is only pointing the way and that empirical results are the domain of chemists, not physicists. Nevertheless, until there are definitive results that lend credibility to small effective organic enzymes, I think metabolism priority will continue to be ignored. Professor Anet also reviews several other researchers who do attempt to find more specific results, but finds them all lacking.
Professor Anet is a proponent of the “RNA world” approach. This is currently the majority position for origin of life research. It is a form of replication priority, but has the advantage that RNA has been demonstrated to function as a catalyst in some situations. Only 6 monomers are needed to form RNA. There is ribose, a sugar that has been demonstrated to form from a reasonable pre-biotic environment on earth. There is a phosphate group that combines with ribose to form the RNA backbone. And there are the four bases for RNA, similar to the four bases for DNA: A (adenine), C (cytosine), G (guanine) and U (uracil). The basic idea is that these 6 simple molecules were available in the pre-biotic environment and that by random movement they came together to form RNA. Further chance encounters led to larger RNA polymers that could replicate themselves and catalyze protein formation. Once replication begins, natural selection can operate, leading to the more efficient and stable environment of protein enzymes and DNA code.
The RNA approach is an attractive picture, but strong doubts have been cast on the RNA scenario by Robert Shapiro (1935-2011; no relation to James Shapiro), previously Professor Emeritus of chemistry at NYU. Professor Shapiro has argued that the basic components of RNA were extremely unlikely to have formed in the early earth environment. In particular, the spontaneous formation of ribose cannot proceed in the presence of nitrogen which the four RNA bases need for their formation. Both ribose and one of the bases (cytosine) have a relatively short half-life and it is therefore unlikely that they could be formed at separate locations and then brought together by chance. If life began near high temperature thermal vents, then formation of RNA is even more unlikely. Robert Shapiro became an advocate of the metabolism priority approach after criticizing the RNA world for several years.
Professor Anet was well aware of Shapiro’s criticism and commented in 2004:
From [my] analysis . . ., it does not seem that the metabolism-first theories are ‘robust’ (or to be recommended), as claimed by Shapiro. On the other hand, Shapiro has stressed some very serious weaknesses of the replication-first theories. But this does not mean that a satisfactory replication-first theory is impossible, although theories . . . that require activated nucleotide monomers to be available prebiotically are not really acceptable. The replication-first approach does not require the existence of a primitive organic soup, it should be stressed, and local conditions on Earth may have been quite varied. Shapiro admits that new discoveries or ideas could lead to more optimistic conclusions on the viability of the replication-first approaches. Some new developments that have appeared after the publication of Shapiro’s paper will now be outlined briefly.
Anet goes on to catalog several recent developments, but, in 2004, the most convincing results were not yet available. I am speaking of the momentous 2009 experiments by Tracy Lincoln and Gerald Joyce that showed that RNA could replicate itself in the lab. True, there were several constraints and limitations on the test, but it did show that sustained replication could take place with RNA alone, albeit under artificial conditions. As positive as these results are, they required relatively long RNA molecules of 189 nucleotide bases. Some researchers think this can be shortened to 100 bases, but even 100 bases gives a very large search space for assembly by random chance: 4^100, or about 1 followed by 60 zeroes. That space looms even larger because the half-life of RNA is measured in hours; RNA degrades relatively quickly, leaving little time for any search.
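The size of the RNA search space can be checked the same way as for proteins, with 4 bases per position instead of 20 amino acids. The Python sketch below is only an arithmetic check; the function name log10_search_space is my own, and it assumes every base is equally likely at every position.

```python
import math

RNA_BASES = 4  # A, C, G and U


def log10_search_space(alphabet_size: int, length: int) -> float:
    """Return the exponent x such that alphabet_size**length equals 10**x."""
    return length * math.log10(alphabet_size)


# 189 bases as in the Lincoln-Joyce molecules, 100 as the hoped-for shorter length.
for length in (100, 189):
    exponent = log10_search_space(RNA_BASES, length)
    print(f"{length} bases: about 10^{exponent:.1f} sequences")

# 100 bases: about 10^60.2 sequences
# 189 bases: about 10^113.8 sequences
```

Shortening the molecule from 189 to 100 bases helps enormously in relative terms, but 10^60 is still far beyond any plausible number of trials.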
Professor Shapiro gets the last word on this even though he spoke in 2007, 2 years before the Lincoln-Joyce results. Dr. Shapiro asked his audience of scientists to imagine a large pile of Scrabble letters. Then he added, “If you scooped into that heap, and you flung them on the lawn there, and the letters fell into a line which contained the words, ‘To be or not to be, that is the question,’ that is roughly the odds of an RNA molecule, given no feedback —and there would be no feedback, because it wouldn’t be functional until it attained a certain length and could copy itself—appearing on earth.” (If you do the math, Shapiro’s odds are about 1 in 10^57!)
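The 10^57 figure depends on how the Scrabble draw is modeled. The Python sketch below shows two simple readings, both my own assumptions: counting only the 30 letters with 26 equally likely choices each, or counting all 40 characters (letters, spaces and the comma) with 27 equally likely symbols each. Real Scrabble tiles are not equally frequent, so this is only an order-of-magnitude check.

```python
import math

PHRASE = "To be or not to be, that is the question"

total_chars = len(PHRASE)                       # letters, spaces and comma
letters = sum(ch.isalpha() for ch in PHRASE)    # letters only


def log10_odds(symbols: int, positions: int) -> float:
    """Exponent x such that the odds are 1 in 10**x for a uniform random draw."""
    return positions * math.log10(symbols)


print(letters, total_chars)                                        # 30 40
print(f"letters only: 1 in 10^{log10_odds(26, letters):.1f}")      # 1 in 10^42.4
print(f"all characters: 1 in 10^{log10_odds(27, total_chars):.1f}")  # 1 in 10^57.3
```

The second reading lands close to 1 in 10^57, which is also close to the 4^100 estimate for a 100-base RNA molecule.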
Given the ongoing and provisional status of research into the origin of life, what can we know with certainty? And what does that knowledge bring to bear on the central theme of my writing: that there is a conscious, rational power at work in the universe without recourse to supernatural abilities? Another way to frame the question is to ask: are the laws of physics and chemistry favorable to life and consciousness? The one thing we can count on is that all of modern life is based on the central dogma of molecular biology: 1) Proteins, the primary workhorses of the cell, must be composed of the correct sequence of amino acids and folded into the correct shape for them to be effective. 2) Proteins are created through the intermediary of RNA, acting along with other proteins. 3) The instructions for assembling proteins come from messenger RNA, which is transcribed from DNA, which contains both the genetic code for proteins and instructions for expressing that code. The key point: the importance of DNA for protein assembly is its information content, not its chemical characteristics.
The modern cell is a protein manufacturing and information processing organism. DNA contains the coded sequence of amino acids for proteins. Information in the DNA drives the protein manufacturing work. I worked in information processing for my entire career of more than 35 years. There is no power in nature other than consciousness and intelligence that can create an information processing system. A complete “artificial intelligence” solution to the software development bottleneck has been the holy grail of software management for decades. It has not appeared nor will it appear. There will be improvements in automated design, but there are sound mathematical reasons to think that software development cannot be totally replaced by automation. Modern computer systems have the mathematical property of Gödel incompleteness: they will always need an intelligent agent to make additions and improvements in them. The bottom line is that a conscious, intelligent power that expresses itself through the laws of nature is the ultimate power behind all of life. In other words, decisional consciousness is a property of nature. This has been my conclusion from physics as well. Random chance alone cannot account for the origin of life.
I think that research will eventually find smaller and possibly more primitive life forms, if not on Earth, if not in the lab, then possibly on Mars or some other nearby planet or moon. I have taken the position that the laws of physics are favorable to life and therefore I think that life developed gradually and incrementally from ordinary matter that I think has been imbued with consciousness from the beginning. Once life began, the power of consciousness in ordinary matter became expressed as natural selection. Once natural selection began, something like DNA would essentially be a requirement for life so it could store the code for the wonderful protein inventions discovered through natural selection. Once evolution became advanced, then advanced consciousness would be a natural consequence of this information-rich system.