Most of us use the term “information” to describe some kind of knowledge. When we say that so-and-so passed on such-and-such a piece of information, we mean that so-and-so told us something we did not know before, but that we now know thanks to what we were told. Information equals knowledge, in other words.

The first definition of information in Webster’s dictionary reflects this idea: information is “the communication or reception of knowledge or intelligence.” This idea of information often confuses people at first when we begin to talk about information stored in a molecule. It can be said that DNA stores the “know-how” for building molecules in the cell. Yet since neither DNA nor the cellular machinery that receives its instruction set is a conscious agent, equating biological information with knowledge in this way does not quite fit that first Webster’s definition. The dictionary points to another common meaning of the term that does apply to DNA. Webster’s has a second definition that describes information as “the attribute inherent in and communicated by alternative sequences or arrangements of something that produce specific effects.” Information, according to this definition, is an arrangement or string of characters, specifically one that accomplishes a particular outcome or performs a communication function.

In common usage, we refer not only to a sequence of English letters in a sentence, but also to a block of binary code in a software program, as information. This is a simple concept, but one you need to grasp before going on in this series. In this particular sense, information does not require a conscious recipient of a message; it refers to a sequence of characters that produces some specific effect. We do not need to know what that effect is, although determining what a sequence of characters ultimately means is our ultimate goal. This definition identifies a distinct sense in which DNA contains **INFORMATION**: DNA contains “alternative sequences” of nucleotide bases that can produce a specific effect.

Now, for the conceptual point of this article. Neither DNA nor the cellular machinery that uses the DNA information is conscious. However, neither is a paragraph in a book, or a section of software, or the hardware in the computer that “reads” it. Clearly, software contains some kind of information. I first began to think about the DNA enigma in about 1995, after having the opportunity to design software for a medical company. At the time, as I considered the vast amount and variety of information that computers process and store, and how it is retrieved in a recognizable form, I began a study of the science of information storage, processing, and transmission called “information theory.”

Information theory was developed in the 1940s by a young MIT engineer and mathematician named Claude Shannon. Shannon had been studying an obscure branch of algebra to which few, if any, people were paying attention. He took nineteenth-century mathematician George Boole’s system for putting logical expressions in mathematical form (the source of Boolean algebra) and applied its categories of “true” and “false” to the switches found in electronic circuits. His master’s thesis has been called “possibly the most important, and also the most famous, master’s thesis of the century.”[i] It eventually became the foundation for digital-circuit and digital-computer theory. Nor was Shannon finished laying foundations. He continued to develop his ideas and in 1948 published “A Mathematical Theory of Communication.” Scientific American later called it “the Magna Carta of the information age.”[ii] Shannon’s theory of information provided a set of mathematical rules for analyzing how symbols and characters are transmitted across communication channels.

As my interest in the origin of life increased through more contact with the medical field, I read more and more about Shannon’s theory of information. I learned that his mathematical theory could be applied to DNA, but there was an unusual catch. Shannon’s theory of information is based upon a fundamental intuition: information and uncertainty are inversely related. The more informative a statement is, the more uncertainty it eliminates. For example, I live in West Texas. If you were to tell me that it might snow in January, that would not be a very informative statement. In the 22 years I have lived here, we have had anywhere from a dusting of snow to ten inches in 17 of those 22 years, so telling me that does not reduce my uncertainty. However, I know very little about what the weather is like in Boise, Idaho. If you were to tell me that on May 18 last year Boise had an unseasonably cold day that produced a light dusting of snow, that would be an informative statement. It would tell me something I could not have predicted from what I already know; it would reduce my uncertainty about Boise weather on that day.

Claude Shannon wanted to develop a theory that could quantify the amount of information stored in or conveyed across a communication channel. He did this in two steps: first by linking the concepts of information and uncertainty, and second by linking these concepts to measures of probability. According to Shannon, the amount of information conveyed (and the amount of uncertainty reduced) by a particular event, symbol, or character is inversely related to the probability of that event, symbol, or character occurring. Let us think for a minute about what this actually means; a couple of examples will help.

Imagine rolling a six-sided die (“die” being the singular of “dice”). Also, think about flipping a coin. The die comes up on the number 6.

The coin lands on tails.

- Before rolling the die, there were six possible outcomes.
- Before flipping the coin, there were two possible outcomes.

The throw (or, formally, “the cast”) of the die eliminated more uncertainty and, in Shannon’s theory, conveyed more information than the coin toss. Notice that the more improbable event (the die coming up 6) conveys more information; this becomes a central point in the DNA enigma later. By equating information with the reduction of uncertainty, Shannon’s theory implies a mathematical relationship between information and probability. Specifically, it shows that the amount of information conveyed by an event is inversely related to the probability of its occurrence. The greater the number of possibilities, the greater the improbability of any one event actually occurring, and therefore the more information that is transmitted when that particular possibility occurs.
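The die-versus-coin comparison can be made concrete with a few lines of Python (my illustration, not part of the original article), using Shannon’s standard surprisal formula, I = −log₂(p):

```python
import math

def shannon_info(probability):
    """Shannon information (surprisal) of an event, in bits: I = -log2(p)."""
    return -math.log2(probability)

# Rolling a fair six-sided die: each face has probability 1/6.
die_info = shannon_info(1 / 6)    # about 2.585 bits

# Flipping a fair coin: each side has probability 1/2.
coin_info = shannon_info(1 / 2)   # exactly 1 bit

print(f"Die roll:  {die_info:.3f} bits")
print(f"Coin toss: {coin_info:.3f} bits")
```

The die roll, being the less probable outcome, carries more bits than the coin toss, which is exactly the inverse relationship described above.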

Shannon’s theory also implies that information increases as the sequence of characters grows. The probability of getting heads in a single flip of a fair coin is 1 in 2. The probability of getting four heads in a row is 1/2 × 1/2 × 1/2 × 1/2, that is, (1/2)⁴ or 1/16. Therefore, the probability of attaining a specific sequence of heads and tails decreases as the number of trials increases. The amount of information provided increases correspondingly.[iii]
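The four-heads example can be checked directly (again, my sketch rather than the article’s): multiplying probabilities corresponds to adding bits of information, because the logarithm of a product is a sum.

```python
import math

def shannon_info(probability):
    """Shannon information in bits: I = -log2(p)."""
    return -math.log2(probability)

# One fair coin flip: p = 1/2, so exactly 1 bit of information.
one_flip = shannon_info(1 / 2)

# Four heads in a row: p = (1/2)**4 = 1/16, so exactly 4 bits.
four_flips = shannon_info((1 / 2) ** 4)

# Multiplied probabilities become added information.
print(one_flip, four_flips)   # 1.0 and 4.0
```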

Think of it this way; it might help. A paragraph contains more information than the individual sentences that make it up; a sentence contains more information than the individual words in that sentence. All other things being equal, short sequences (sentences) have less information than long sequences (sentences). Shannon’s theory explains why in mathematical terms: improbabilities multiply as the number of characters (and combinations of possibilities) grows. The important thing for Shannon was that his theory provided a way of measuring the amount of information in a system of symbols or characters.

His equations for calculating the amount of information present in a communication system could be readily applied to any sequence of symbols or coding system that used elements that functioned in a manner similar to alphabetic characters. Within any given alphabet of x possible characters (where each character has an equal chance of occurring), the probability of any one of the characters occurring is 1 chance in x. For instance, if a monkey could bang randomly on a simplified typewriter possessing only keys for the 26 English letters, and assuming he was a perfectly random little monkey, there would be 1 chance in 26 that he would hit any particular letter at any particular moment.

The greater the number of alphabetic characters in use in the system (the greater the value of x), the greater the amount of information conveyed by the occurrence of a specific character in a sequence. In systems where the value of x is known, as in a code or language, mathematicians can generate precise measures of information using Shannon’s equations. The greater the number of possible characters that can be at each place in the sequence and the longer the sequence of characters, the greater the Shannon information associated with the sequence.
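The monkey-at-the-typewriter arithmetic above can be sketched as follows (a minimal illustration of my own, assuming 26 equally probable keys):

```python
import math

# Simplified typewriter: 26 equally likely English letters,
# so each keystroke has probability 1/26.
alphabet_size = 26
per_character = math.log2(alphabet_size)   # about 4.70 bits per character

# A specific random 10-character string has probability (1/26)**10;
# its Shannon information is therefore 10 * log2(26) bits.
sequence_length = 10
sequence_info = sequence_length * per_character

print(f"{per_character:.2f} bits per character")
print(f"{sequence_info:.1f} bits for a {sequence_length}-character sequence")
```

Both factors the paragraph names are visible here: a larger alphabet raises the bits per character, and a longer sequence multiplies that per-character figure.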

Remember I said there was a catch? Well, here it is. Shannon’s theory and his equations provide a powerful way to measure the amount of information stored in a system or transmitted across a communication channel, but they have an important limit. That limit is that Shannon’s theory does not, and cannot, distinguish merely improbable sequences of symbols from those that convey a message or “produce a specific effect,” as Webster’s second definition puts it.

As one of Shannon’s collaborators, Warren Weaver, explained in 1949, “The word information in this theory is used in a special mathematical sense that must not be confused with its ordinary usage. In particular, information must not be confused with meaning.”[iv] What does that mean, I hear you asking. Consider two sequences of alphabetic characters:

“In the beginning God created”

“kd lse ebmgtxodq Pmw wpfuzjf”

Both of these sequences have an equal number of characters. Both are composed from the same 26-letter English alphabet, the amount of uncertainty eliminated by each letter (or space) is identical, and the probability of producing each of those two sequences at random is identical. Therefore, both sequences have an equal amount of information as measured by Shannon’s theory. So what is the difference? One of these sequences communicates something, while the other does not. Why is that?

Clearly, the difference has something to do with the way the alphabetic characters are arranged. In the first instance, the characters are arranged in a precise way to take advantage of a preexistent convention or code, that of English vocabulary, in order to communicate something. When those words were written in that specific sequence, it was to invoke specific concepts (the concept of “in,” the concept of “beginning,” and so on) that have long been associated with specified arrangements of sounds and characters among English speakers and writers. That specific arrangement allows those characters to perform a communication function. In the second sequence, the letters are not arranged according to any established convention or code (except perhaps the one known as gobbledygook), and the sequence is therefore meaningless.

Since both sequences are composed of the same number of equally improbable characters, both sequences have the same quantifiable amount of information as calculated by Shannon’s theory. Nevertheless, the first of the two sequences has something, a specificity of arrangement, that enables it “to produce a specific effect” or to perform a function, whereas the second sequence does not. That is the catch. Shannon’s theory cannot distinguish functional or message-bearing sequences from random or useless ones. The theory can only measure the improbability of the sequence as a whole. It can quantify how much information a given sequence of symbols or characters could carry, but it cannot determine whether the sequence in question “produces a specific effect” or is in fact meaningful.
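The catch can be demonstrated numerically. The sketch below (mine, not the article’s, and assuming for simplicity a 27-symbol alphabet of 26 letters plus the space, each equally likely, with case ignored) computes the Shannon measure for both example sequences:

```python
import math

def carrying_capacity(sequence, alphabet_size=27):
    """Shannon information-carrying capacity in bits, assuming every
    symbol is drawn from an equiprobable 27-symbol alphabet
    (26 letters plus space); meaning plays no role in the calculation."""
    return len(sequence) * math.log2(alphabet_size)

meaningful = "In the beginning God created"
gibberish = "kd lse ebmgtxodq Pmw wpfuzjf"

# Same length, same alphabet: the Shannon measures are identical,
# even though only one sequence communicates anything.
print(carrying_capacity(meaningful), carrying_capacity(gibberish))
```

Nothing in the formula looks at *which* characters appear or how they are arranged, only at how many there are and how large the alphabet is, which is exactly why the theory cannot see meaning.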

For this reason, information scientists will often say that Shannon’s theory measures only the “information-carrying capacity” of a sequence of characters or symbols, as opposed to its functionally specified information or “information content.” This generates an interesting paradox: as measured by Shannon’s theory, long meaningless sequences of alphabetic characters can have more information than shorter meaningful ones.

This suggests that there are important distinctions to be made when talking about information in DNA. It is important to distinguish information defined as “a piece of knowledge known by a person” from information defined as “a sequence of characters or arrangements of something that produce a specific effect.” The first of these two definitions does not apply to DNA; the second does. However, it is also necessary to distinguish Shannon information from information that performs a function or conveys a meaning. We must distinguish sequences of characters that are (a) merely improbable from sequences that are (b) improbable and specifically arranged to perform a function. In other words, we must distinguish information-carrying capacity from functional information. So what kind of information does DNA possess, Shannon information or some other?

For that we go back to: Will be completed in a week

[i] Gardner, The Mind’s New Science, 11.

[ii] Horgan, “Unicyclist, Juggler and Father of Information Theory.”

[iii] Shannon, “A Mathematical Theory of Communication.” Information theorists found it convenient to measure information additively rather than multiplicatively. Thus, the common mathematical expression for calculating information, I = −log₂(p), converts probability values into informational measures through a negative logarithmic function, where the negative sign expresses the inverse relationship between information and probability.

[iv] Shannon and Weaver, The Mathematical Theory of Communication, 8.