Saturday, September 1, 2007

human memory

This is another ‘blog-of-one’ – a blog destined to be read only by its writer – of the kind that probably constitutes 99% of all blogs. According to Lotka’s law, only a minuscule proportion of all bloggers will attract a significant readership.
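
As a rough sketch of what Lotka’s inverse-square law would imply here – transposing it, as an assumption of my own, from papers-per-author to readers-per-blog, with an equally arbitrary cut-off of 100 readers – a few lines of Python:

    # Assume (my assumption, not Lotka's original claim) that the number of
    # blogs with n readers falls off as 1/n^2, up to some large n.
    N = 10_000                                            # readership levels 1..N
    weights = [1.0 / n ** 2 for n in range(1, N + 1)]     # blogs with n readers ~ 1/n^2
    popular = sum(weights[99:])                           # blogs with 100 or more readers
    print(f"share of blogs with >= 100 readers: {popular / sum(weights):.2%}")
    # prints roughly 0.6% - well under one percent, consistent with the claim above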

The upside of being a blog-of-one is that you enjoy considerably more freedom in your thought processes than if you were trying to communicate effectively with others. Since you are talking to yourself, you can think aloud about a lot of zany ideas and ride your hobbyhorse untroubled by extraneous considerations. You can be brief to the point of being cryptic, assuming that if you ever read this stuff again, you will be able to retrace your paths through the tangled thickets.

The question that I want to pose here is this: when will the amount of information exceed the capacity of humans to comprehend it? Or have we already passed this milestone? The motivation for asking this question is simple: we are assured that information grows at an exponential rate, with a doubling time of about 5 years (?). Human population also grows exponentially with a doubling time of about 30 years – and some demographers would argue that population should stabilize within the next century or so (?). Thus, at some point one would expect information overload. A few centuries ago, the so-called Renaissance Man was a colossus who could span the entire range of human knowledge. Today a researcher finds it difficult to keep up with even one field of study.
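
Taking the two doubling times above at face value, a crude projection in Python (the starting values are arbitrary normalizations, not measurements, and eventual stabilization of population is ignored):

    info_doubling = 5.0    # years, for information (the figure quoted above)
    pop_doubling = 30.0    # years, for population (ignoring eventual stabilization)

    for year in range(0, 101, 20):
        info = 2 ** (year / info_doubling)   # information, normalized to 1 at year 0
        pop = 2 ** (year / pop_doubling)     # population, normalized to 1 at year 0
        print(f"year {year:3d}: information x{info:>9,.0f}, population x{pop:4.1f}, "
              f"per-person information x{info / pop:,.0f}")

On these assumptions the amount of information per person grows roughly a hundred-thousand-fold in a century – the overload in crude numerical form.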

Note, however, that this question can be answered in several different ways: there is no unique answer. Another peculiarity is that it requires one to ask and answer several related questions. What do we mean by information? What is the capacity of an individual human (this capacity, if it is limited, may also vary from person to person)? Does it vary with lifespan? How much do we rely on computers to search for information? Can we rely on computers? How do we store all this information?

Do we define information as raw data or as processed data? If the former, it would include all the bits and bytes used to digitize images (photos, videos, movies), the 0’s and 1’s of machine-level computer programs, many forms of digital data (e-mails, web pages, instant messages, phone calls etc.) and a huge amount of analog data that would have to be digitized at some resolution. This is the approach reported in an article in The Hindu (3 March 2007), entitled “Time to learn your exabytes and zettabytes”, which quotes a report by the technology firm IDC. IDC determined that the world generated 161 billion gigabytes (161 exabytes) of digital information last year – reckoned to be three million times the information in all books ever written. Earlier, UC Berkeley researchers had estimated global information production in 2003 at 5 exabytes. That estimate included analog data (which IDC’s did not), such as memos and radio broadcasts, but counted only original data, excluding data that was merely copied (unlike IDC). Had IDC likewise excluded copies, its figure would come down to about 40 exabytes. IDC also estimated the world’s storage capacity at 185 exabytes last year, projected to grow to 601 exabytes by 2010 – as against the 161 exabytes generated last year and a projected 988 exabytes (close to 1 zettabyte) generated in 2010. The gap may be reduced if not all information is stored: e.g. if phone call records more than 5 years old are deleted.
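
Using only the IDC figures quoted above, a back-of-the-envelope Python sketch of the gap between generation and storage; the smooth exponential interpolation between 2006 and 2010 is my assumption, not IDC’s:

    gen_2006, gen_2010 = 161.0, 988.0    # exabytes generated per year (IDC figures above)
    sto_2006, sto_2010 = 185.0, 601.0    # exabytes of storage capacity (IDC figures above)

    years = 4
    gen_rate = (gen_2010 / gen_2006) ** (1.0 / years) - 1   # roughly 57% per year
    sto_rate = (sto_2010 / sto_2006) ** (1.0 / years) - 1   # roughly 34% per year
    print(f"implied annual growth, data generated:   {gen_rate:.0%}")
    print(f"implied annual growth, storage capacity: {sto_rate:.0%}")

    for y in range(2006, 2011):
        gen = gen_2006 * (1 + gen_rate) ** (y - 2006)
        sto = sto_2006 * (1 + sto_rate) ** (y - 2006)
        print(f"{y}: generated ~{gen:4.0f} EB, storable ~{sto:4.0f} EB, "
              f"surplus/shortfall {sto - gen:+5.0f} EB")

On these assumptions a surplus of about 24 exabytes of storage in 2006 turns into a shortfall of a few hundred exabytes by 2010 – the gap the article alludes to.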

Clearly, this type of digital data is going to increase enormously. However, if we count only processed data as information, the amount is going to be considerably less. The question then changes to: how do we compress data so as to answer specific queries or search for particular patterns? For example, once we identify a weather phenomenon as El Nino, the cluster of data that repeats in space and time qualifies as information – not the raw record of sea surface temperatures. This argument has two problems. First, the same primary data can be analyzed in different ways; the number of different ways is not indefinitely large, but each time one has to return to the primary data for validation. Secondly, there may be cases where the data are inherently incompressible: e.g. phenomena described by irrational numbers or by fractals. This second class of problems may not be serious in most cases, because of the noise and fluctuations present in the real world, but it cannot be ruled out in principle in all cases.
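
As a toy illustration of the point – entirely synthetic data, with a threshold that only loosely mimics a common El Nino criterion – a few lines of Python showing how thousands of raw values reduce to a handful of numbers once the repeating pattern has been identified:

    import random

    random.seed(1)
    months = 1200                               # 100 years of monthly temperature anomalies
    raw = [random.gauss(0.0, 0.3) for _ in range(months)]
    raw[600:615] = [1.2 + random.gauss(0.0, 0.2) for _ in range(15)]   # one planted warm episode

    episodes = []       # the "information": (start month, length, mean anomaly) per episode
    start = None
    for i, t in enumerate(raw + [0.0]):         # trailing 0.0 closes any open run
        if t > 0.5 and start is None:
            start = i
        elif t <= 0.5 and start is not None:
            if i - start >= 6:                  # keep only runs of 6 or more warm months
                run = raw[start:i]
                episodes.append((start, i - start, round(sum(run) / len(run), 2)))
            start = None

    print(len(raw), "raw values reduced to", len(episodes), "episode summary:", episodes)

The compression is drastic, but – as argued above – one still has to keep (or be able to regenerate) the raw record to validate any alternative analysis.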

In a sense, the information explosion may be considered alongside Malthus’s prediction of a population explosion: there could be a mapping of food onto population, and of population onto information. Contrary to Malthus, human population looks set to stabilize without major disasters. Information, however, continues to be added at an exponential rate and shows no signs of stabilizing.

It may be asked: does information always increase monotonically? Examples to the contrary include the burning of the Library of Alexandria, storage formats becoming unreadable, scripts becoming undecipherable, and whole languages being lost irretrievably (the Cretan and Indus Valley scripts are instances). The overall trend, despite such setbacks, nevertheless seems to be one of increasing information. The so-called Dark Ages were dark in Europe, but not in the rest of the world. Today information is scattered world-wide, but the places where it is generated fastest may still shift around.

As the human species pushes the envelope into newer environments (outer space and the oceans, perhaps man-made cities or worldlets in those places), the amount of information generated is bound to increase as we try to understand these new possibilities. These new places may also require new sensory modalities or new means of motor control (remote or autonomous), which will further expand horizons.
