Heroes of Cyberspace: Claude Shannon

by Charles A. Gimon

for INFO NATION

"We want information...information..." --Number Two, in "The Prisoner"

Claude Shannon isn't well known to the public at large, but he is one of a handful of scientists and thinkers who made our world of instant communications possible. Born in Petoskey, Michigan in 1916 and raised in nearby Gaylord, in a fairly well-educated and intellectually stimulating environment, he spent his younger days working with radio kits and Morse code, an early start to a promising career. (Later, he would remember Edgar Allan Poe's story "The Gold Bug", with its simple cryptogram, as another early influence.)

In the late 1930s at MIT, he did important work showing how logic could be applied to the design of relay circuits--in short, that the true-and-false of Boolean logic could be the same as the on-and-off of an electric switch. This pioneering work, important for both phones and computers, was his master's thesis; Shannon went on to receive his doctorate from MIT in 1940.

Shannon then spent 31 years at Bell Labs, starting in 1941. Among the many things Shannon worked on there, one great conceptual leap stands out. In 1948, Shannon published "A Mathematical Theory of Communication" in the Bell System Technical Journal; the following year it appeared as a book, "The Mathematical Theory of Communication", together with an essay by Warren Weaver. This surprisingly readable (for a technical paper) document is the basis for what we now call information theory--a field that has made all modern electronic communications possible, and could lead to astounding insights about the physical world and ourselves. Among the people who have led twentieth-century science to the limits of knowledge, names like Einstein, Heisenberg, or even Kurt Goedel are better known, but Shannon's information theory makes him at least as important as they were, if not as famous. Yet when he started out, his goal was simply to find a way to clear up noisy telephone connections.

Now, the subject "information theory" may sound like a postmodern literary thing, or something to do with media criticism. Not at all--information theory has nothing to do with meaning. In information theory, the "content" of the information is irrelevant. Information theory turns all information into quantities, into the on-or-off bits that flip through our computers--not to mention our phones, TV sets, microwave ovens, musical greeting cards, or anything else with a chip in it. As Shannon puts it: "These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. The system must be designed to operate for each possible selection, not just the one which will actually be chosen, since this is unknown at the time of design."

So what is information? Information is what you don't know. If the receiver already had the information, you couldn't say that a communication had taken place. We're used to thinking about "information" as facts, data, evidence--but in information theory, information is uncertainty. You have to get a message that you don't know the content of. Information theory talks about the messages you could possibly get, or the number of messages you could choose between. Or if you want to generalize even more, information theory talks about the statistical properties of a message, averaged out over the whole message--without any regard to what the message says.

The immediate benefit of information theory is that it gives engineers the math tools needed to figure out channel capacity--how much information can go from A to B without errors. The information you want is the "signal". The information you don't want is "noise". It sounds simple enough (and engineers had worked on the problem for years before Shannon), but Shannon brought a new, deeper understanding to the situation. (The next time someone complains about the signal-to-noise ratio on Usenet, think of Dr. Shannon.)

In information theory, the more bits of information you have, the more uncertainty you have. The number of possible messages you can make with x bits is two to the x power, since each bit has only two possible values: on and off. Turn this same idea around, and the number of bits you need to transmit a message is the base-two logarithm of the number of possible messages, M.

$$2^x = M$$
$$x = \log_2 M$$
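To make the two equations concrete, here is a small Python sketch. The function names are hypothetical, chosen for illustration. When the number of possible messages is not a power of two, the logarithm is not a whole number, so a practical code has to round up to the next whole bit.

```python
import math

def messages_from_bits(x: int) -> int:
    """How many distinct messages x on/off bits can represent: 2 to the x."""
    return 2 ** x

def bits_for_messages(m: int) -> int:
    """Whole bits needed to give each of m possible messages its own code.

    This is log2(m) rounded up. (m - 1).bit_length() computes exactly that
    with integer arithmetic, avoiding floating-point rounding for large m.
    """
    if m < 1:
        raise ValueError("there must be at least one possible message")
    return (m - 1).bit_length()

print(messages_from_bits(8))      # 256
print(bits_for_messages(256))     # 8
print(f"{math.log2(26):.2f}")     # 4.70: not a whole number of bits
print(bits_for_messages(26))      # 5
```

With 26 possible messages (say, the letters of the alphabet), 4 bits give only 16 combinations, so a fixed-length code needs 5.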

Shannon gave the value x here the name "entropy". Entropy is usually measured in "bits per symbol" or a similar unit: if you are using a set of symbols to transmit your message, the entropy is how many bits it takes to represent one symbol. For example, the extended ASCII character set has 256 characters. The base-two logarithm of 256 is 8 (2^8 = 256), so if every character is equally likely, there are 8 bits of entropy per symbol. If some symbols are more likely than others, or more likely to follow certain others (as "u" usually follows "q"), the math gets more complicated, but the relation is still logarithmic.
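The "more complicated" case uses Shannon's general formula, H = -Σ p log2 p, which averages the logarithm over the probability p of each symbol. The sketch below (function names are illustrative, not from any library) shows that 256 equally likely symbols give exactly 8 bits, while a lopsided source gives far fewer. It only counts single-character frequencies, so it does not capture context such as "u" following "q"; modelling that would lower the estimate further.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(probabilities) -> float:
    """Shannon entropy H = -sum(p * log2 p), in bits per symbol.

    Symbols with probability 0 contribute nothing, by the usual convention.
    """
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def empirical_entropy(text: str) -> float:
    """First-order estimate of bits per symbol from character frequencies."""
    counts = Counter(text)
    total = len(text)
    return entropy_bits_per_symbol(c / total for c in counts.values())

# 256 equally likely characters, as in extended ASCII.
print(entropy_bits_per_symbol([1 / 256] * 256))   # 8.0

# A source that almost always sends "a" carries much less information.
print(round(empirical_entropy("aaaaaaab"), 3))    # about 0.54 bits per symbol
```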

Shannon used mathematical proofs to show that you could use this measure of entropy to calculate the actual capacity of a channel. Not only that, but he proved that every channel has a definite capacity, and that as long as you transmit at a rate below that capacity, you can make the transmission as free from errors as you like. There is always a probability of error--noise--but by using the tools of information theory, you can squeeze that probability to an arbitrarily small value. From long distance calls to telnetting across the Internet to satellite uplinks, anyone using modern communications owes a debt to Shannon. Even the everyday compact disc wouldn't be possible without error correction based on information theory.
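A hedged illustration, assuming the simplest textbook channel: a "binary symmetric channel" that flips each bit independently with probability p (the 10% figure below is an assumption). Its capacity is 1 - H(p) bits per use. The sketch compares that with a naive repetition code (send each bit n times, take a majority vote). Repetition does drive errors down, but only by pushing the rate toward zero; Shannon's theorem says that cleverer codes can make errors arbitrarily rare at any fixed rate below capacity. The repetition code is just a simple stand-in, not the construction Shannon used.

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy in bits of a source that emits 1 with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p: float) -> float:
    """Capacity, in bits per channel use, of a binary symmetric channel
    that flips each transmitted bit independently with probability p."""
    return 1.0 - binary_entropy(p)

def repetition_error(p: float, n: int) -> float:
    """Chance that majority voting over n copies (n odd) decodes wrongly,
    i.e. that more than half of the n copies were flipped."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )

p = 0.1  # assumed: the channel flips 10% of bits
print(f"capacity: {bsc_capacity(p):.3f} bits per use")
for n in (1, 3, 5, 11):
    print(f"repeat x{n}: rate {1 / n:.3f}, error {repetition_error(p, n):.2e}")
```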

The startling thing about Shannon's insight is that his equation for information entropy shows the same relation that the Boltzmann equation for thermodynamic entropy does: a logarithmic one. Let's leave the world of telephone switches and network packets for a while and look at the harder world of physics. The entropy of thermodynamics can be described in many ways: as a measure of disorder, or of how irreversible a process is. (It's easy to mix blue and yellow paint to get green, but it's not easy to separate them back out again.) Think about a steam engine. Pistons go up and down, a crank turns, one kind of work is turned into another, but always at the cost of a certain amount of waste heat. Some coherent work (the atoms of the piston all moving in the same direction) turns into incoherent heat (hot atoms bouncing around at random). You can't run the process backwards, any more than you could make a broken glass jump off the floor and reassemble itself on a table again. And you can't make an engine that will run forever; the engine runs in the first place only because the process is fundamentally unbalanced. This is the Second Law of Thermodynamics: in a closed system, entropy always increases; otherwise nothing would ever happen.
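The shared logarithm can be written out side by side. This is a worked sketch, assuming a system with W equally likely microstates (the setting of Boltzmann's formula) and writing k_B for Boltzmann's constant. Saying which microstate the system is in is then a choice among W possibilities, just as picking a message is a choice among M:

$$S = k_B \ln W$$
$$H = \log_2 W \text{ bits}$$
$$S = k_B \ln W = (k_B \ln 2)\,\log_2 W = (k_B \ln 2)\,H$$

In this idealized case the two entropies differ only by a constant factor: each bit of missing information corresponds to k_B ln 2 of thermodynamic entropy.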

In information theory, the more possible messages you could be receiving, the more uncertainty you have about what actual message you're getting. If you had to track the motion of each atom in the piston of an engine, you could just say they're all moving in the same direction. If you had to track the motion of each atom in the hot exhaust, you would be stuck with lots more information. More information means more entropy.

(Thought experiments about Shannon's information, meaning, and entropy can lead to all sorts of places. You could think of an urban legend, such as the poodle in the microwave, the $250 cookie recipe, or the scuba diver scooped up out of a lake and dropped on a forest fire, as a low-entropy engine: it chugs its way around the Internet quickly and efficiently at first, but eventually wears itself out as its entropy increases.)

Like Boltzmann in the late nineteenth century, Shannon took the broad observations people had made about information exchange and put them on a solid mathematical, statistical footing. Shannon's information theory, rather than eliminating noise, allows us to live with it--to keep it out of our hair, and possibly even to make it useful. You can use the entropy of thermodynamics to find out what percentage of your available energy can be turned into useful work; you can use the entropy of information theory to find out how much of a channel can be used to transmit useful information--the signal.

Information theory has been important for the statistical analysis of languages, and especially for cryptography. In 1949, Shannon published a paper vital to modern cryptography in the Bell System Technical Journal: "Communication Theory of Secrecy Systems". Shannon's precisely defined entropy lets you measure how many possible decodings there could be for a message. If you are designing a program to encrypt secret messages, you want more entropy, since anyone who wanted to break the encryption would then have to work through that many more possible messages. One-way mathematical functions--operations that are easy to do one way but hard to undo, such as multiplying two large prime numbers together (easy) versus factoring the product back into those primes (hard)--are at the heart of the best modern cryptosystems. Trying to break the encryption is like trying to put that broken glass back together again perfectly, atom by atom--not absolutely impossible, just astronomically improbable.
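Two small sketches of the ideas above, with illustrative function names. First, a key whose symbols are chosen uniformly at random has length × log2(alphabet size) bits of entropy, so every added bit doubles the number of keys an attacker must try. Second, the asymmetry of a one-way function: multiplying two primes is a single step, while recovering them by naive trial division means a long search. Real factoring algorithms are far cleverer than this, but the best known methods still remain impractical at the key sizes modern cryptosystems use. The primes 7919 and 104729 are just small examples.

```python
import math

def key_entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy of a key whose symbols are drawn uniformly and independently:
    length * log2(alphabet_size) bits."""
    return length * math.log2(alphabet_size)

def smallest_factor(n: int) -> int:
    """Naive trial division: the slow 'backwards' direction of multiplication."""
    if n < 2:
        raise ValueError("n must be at least 2")
    if n % 2 == 0:
        return 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return f
        f += 2
    return n  # no smaller factor found, so n itself is prime

# 8 random lowercase letters versus a random 128-bit key.
print(f"{key_entropy_bits(26, 8):.1f} bits")    # 37.6 bits
print(f"{key_entropy_bits(2, 128):.0f} bits")   # 128 bits

# Easy direction: one multiplication. Hard direction: a search.
p, q = 7919, 104729   # the 1,000th and 10,000th primes
n = p * q
print(n, smallest_factor(n))
```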

Researchers have tried to apply information theory in many other fields. Psychologists have looked into the question: how much information can a human take in, process, and output? Some experiments have shown that response times to flashing lights vary logarithmically with the number of lights involved--just like you might expect from information theory. The problem is that humans are mightily complex: a little practice, and the lag in response times disappears.
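The logarithmic relation in those reaction-time experiments is usually called Hick's law (or the Hick-Hyman law): reaction time grows as a + b·log2(n + 1) for n equally likely choices. The constants a and b below are hypothetical placeholders; real values are measured for each person and task, and, as noted above, practice can change them. The point of the sketch is the shape: each doubling of the options (plus one) adds the same fixed amount of time, one more "bit" of decision.

```python
import math

# Hypothetical constants for illustration only.
A_SECONDS = 0.2   # assumed base reaction time
B_SECONDS = 0.15  # assumed extra time per bit of choice

def hick_reaction_time(n_choices: int, a: float = A_SECONDS,
                       b: float = B_SECONDS) -> float:
    """Hick-Hyman law: RT = a + b * log2(n + 1) for n equally likely choices."""
    if n_choices < 1:
        raise ValueError("need at least one choice")
    return a + b * math.log2(n_choices + 1)

for n in (1, 3, 7, 15):
    print(n, round(hick_reaction_time(n), 3))
```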

Other researchers are using the tools of information theory to measure the capacity of nerve tissue. On an even deeper level, DNA itself is just a way to store and forward information--in this case, genetic information. Newer evidence suggests that DNA has built-in error correcting protocols. If DNA contains a program to correct errors in itself, how could random mutations, and so evolution, happen? Work remains to be done in this area. Aging might be understood as a breakdown in DNA's ability to carry information along, an increase in noise as our bodies fall into physical entropy.

If you're really bold, you could even come up with a new way of looking at the universe, where existence lies on a foundation of information, instead of matter and energy.

The entropy of thermodynamics involves decay. If you assume that the universe is a closed system, the whole thing is sliding further into entropy every time anything happens. Every breath you take increases the entropy of the universe by just a little. Life itself could be seen as just a local phenomenon in the slow death of the sun. Yes, the second law of thermodynamics is pretty brutal...but in information theory, more entropy means more information. To some, an eternal increase in entropy means an ever-increasing capacity for information. But remember: in information theory, information is entropy, and entropy is uncertainty.

Put it together, and it's almost a parable for twentieth-century science: the more you know, the less certain you are. As information increases, the universe slips away from our capacity to understand it. By combining Shannon's and Boltzmann's concepts of entropy, you can calculate that there is a certain defined amount of energy that you must use to transmit one bit of information. There is no free lunch when you transmit information. You have to spend energy to do it; in fact, when you get information about a physical system, you increase its physical entropy at the same time (just as Heisenberg said: by observing a system, you change it). For those of us in this universe, this means that there are ultimate limits on communication. There is only so much information that you can transmit in a certain period of time, or over a certain astronomical distance. More profoundly, to quote John R. Pierce: "the energy needed to transmit info about the state of a physical system keeps us from ever knowing about the past in complete detail". Information theory suggests pretty strongly that stories of time-travel and alternate worlds will remain science fiction for a long time to come--possibly forever.
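The "certain defined amount of energy" per bit is usually quoted as k_B·T·ln 2 joules, where k_B is Boltzmann's constant and T is the temperature of the thermal noise on the channel. This is the limit the Shannon-Hartley capacity formula approaches as bandwidth grows without bound, and it matches the "bits times k_B ln 2" conversion between the two entropies. A quick sketch of the arithmetic; the temperatures chosen are illustrative assumptions.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact in the 2019 SI)

def min_energy_per_bit(temperature_k: float) -> float:
    """Minimum energy, in joules, to send one bit over a channel limited by
    thermal noise at the given temperature: k_B * T * ln 2. This bound is
    only approached as bandwidth grows without limit."""
    if temperature_k <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    return K_B * temperature_k * math.log(2)

print(f"{min_energy_per_bit(300.0):.3e} J per bit at 300 K")  # 2.871e-21
print(f"{min_energy_per_bit(3.0):.3e} J per bit at 3 K")
```

The number is tiny, but it is never zero: a colder receiver lowers the cost per bit, yet no receiver can make it free.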

It's fairly sobering stuff. For most of us, the sterner implications of information theory are less important than the human-scale devices that it has made possible. Information theory started out as an engineering project. There is a danger that people could be tempted to throw away meaning altogether if information theory becomes a sort of universal explanation for everything. It would be a dull world if everything we know were reduced to strings of ones and zeroes, and the data in our heads were no different from the data in a machine.

Shannon himself has made statements about machines of the future possibly being superior to humans, placing him in a camp with certain other dour late-twentieth-century writers and scientists such as Vernor Vinge and Hans Moravec. But as a private individual, Shannon was no wet blanket. In his comfortable semi-retirement, he took up all sorts of interesting pastimes, notably juggling. Legend has it that he could be seen from time to time juggling his way down the hall to his office, and he even wrote a mathematical analysis of juggling. Some minds never rest, even at play.

There would be no Internet without Shannon's information theory--and Internet users benefit from it in every packet they send. Every new modem upgrade, every Zmodem download, every compressed file (which includes any image in .gif or .jpg format), every error-correcting protocol owes something to information theory. Your reading of this article was made possible by the work of Dr. Shannon.
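Compression is entropy at work: predictable data has low entropy per byte and shrinks, while random data is already near 8 bits per byte and cannot. The sketch below uses Python's standard zlib module (the DEFLATE method also used by gzip and PNG) rather than the LZW of .gif or the lossy JPEG scheme of .jpg; the sample data is an illustrative assumption.

```python
import os
import zlib

def compressed_size(data: bytes) -> int:
    """Size in bytes after DEFLATE compression at the highest level (9).

    Also confirms that decompression returns exactly the original bytes,
    since this kind of compression is lossless.
    """
    packed = zlib.compress(data, 9)
    if zlib.decompress(packed) != data:
        raise RuntimeError("round trip failed")
    return len(packed)

# Highly predictable text: low entropy per byte, so it shrinks a lot.
redundant = b"information " * 1000
# Random bytes: close to the maximum 8 bits of entropy per byte.
noise = os.urandom(len(redundant))

print(len(redundant), compressed_size(redundant))
print(len(noise), compressed_size(noise))
```

The random input typically comes out slightly larger than it went in, because the compressor has nothing to remove and still adds a small header.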

"The Mathematical Theory of Communication" by Shannon and Weaver is still in print from the University of Illinois Press. For an excellent, broader summary, John R. Pierce's "An Introduction to Information Theory: Symbols, Signals and Noise" is available from Dover Publications. Or follow one of our links:

Duy Nguyen has a nice biography of Claude Shannon at:
http://canyon.ucsd.edu/infoville/schoolhouse/class_html/duy.html

Chris Hillman's Entropy on the World Wide Web page:
http://math.washington.edu/~hillman/entropy.html
has links to lots of advanced pages on information theory.

Jared Saia has an excellent essay on entropy and its implications at:
http://www.cs.unm.edu/~saia/infotheory.html

Steve Mizrach has a provocative essay on information theory at:
http://www.clas.ufl.edu/anthro/noetics/flesh-made-word.html

Tom Schneider's Theory of Molecular Machines page:
http://www-lmmb.ncifcrf.gov/~toms/
leads to pages on his and other sites about the application of information theory to the biological sciences.

The IEEE even has an Information Theory Society--its web page is at:
http://it.ucsd.edu/

Shannon's ideas on juggling are featured at:
http://www.juggling.org/papers/science-1/

Charles A. Gimon teaches an Introduction to the PC class at the English Learning Center in south Minneapolis. He can be reached at gimonca@skypoint.com or ay778@freenet.carleton.ca.