A Brief History of Data Storage

This useful infographic tracks the timeline of data storage progress.

For the next 5000 years, this was as close as humanity would get to a flash drive.

The first storage medium was baked clay tablets, invented by the Sumerians in 3400 BC. However, virtually no progress was made on this front for over five millennia (the only real improvement in the pre-computer age was the invention of microfiche).

The 20th century, with the rise of electromechanical calculating machines (and eventually computers), brought about several inventions, such as magnetic tape, compact discs, DVDs, Blu-ray discs, platter hard drives, and solid-state drives. Clay tablets have been replaced by electronic tablets with tens of gigabytes of storage capacity. The world’s largest data warehouse holds 12.1 petabytes of data.

I haven’t delved heavily into speculative future storage technologies, since I find technology speculation an inherently tiresome and frustrating task. Suffice it to say, brighter minds than mine say we are not even close to reaching the maximum theoretical data storage density.

How “big” are we talking? What exactly is a big dataset?

As an easy cop-out, I could just give you the honest answer: everyone has their own definition. But in the interest of keeping my readers entertained and informed, I will press forth with more inquiry. According to Wikipedia, data storage requirements increase year by year, so what counts as a “large dataset” one year is old news the next. Most datasets are small enough for a single PC to handle. Larger datasets, however, require parallelized computer arrays to handle effectively. During my PhD work, I handled datasets of several gigabytes. I recall a talk at my university from a computational biologist who said she routinely worked with datasets of 70 terabytes. The Large Hadron Collider produces over 700 TB a year. Large corporations deal with data in the hundred-terabyte range. The interesting thing is that, despite the enormous advances made in storage capacity, the same fundamental technology of storing information on hard drive platters is still in use (alongside the relatively recent advance of solid-state drives).

Rising Waters, Sinking Hope

But all is not well in Camelot. The 21st century will have vastly higher data requirements than the 20th, a demand driven by the proliferation of cheap sensor technologies. Not only are the sensors cheap, but the data they collect is very trustworthy. These sensor platforms are the real driving force behind the explosion of data taking place today. According to this TechRadar article, the rate of data production is exceeding the rate of capacity production. According to the fellow interviewed in that article, simply building more data warehouses with current technology would cost “in the hundreds of billions”, and thus is not a very realistic solution to the data storage problem. As a result, Big Tech is investing in a host of new technologies, none of which I shall bother recounting here until they are actually reduced to practice.

More Email Flooding?

I predict in the future we will need sophisticated, personal AI “adjutants” to help us process all of the data coming into our senses on a daily (or even hourly) basis. Such technologies are already under development for mundane tasks such as scheduling meetings. While the AMY bot could have its uses, I do not think it will be adequate for the enormous datastreams people will be contending with in the future. Heck, scheduling meetings isn’t even that difficult. Actually extracting useful knowledge from your email deluge is a more worthwhile (and profitable) goal.

Silicon Valley and the US Government – a Match Made in Hell

When I was a kid, I often admired the hacking scene and its commitment to freedom. Hackers from a prior age had that “Don’t Tread On Me” spirit, and had the fortitude to question authority. Of course, I went in a different direction in high school and college, as I was more interested in chemistry than technical computing.

Obviously, things have changed. I am skeptical of Silicon Valley’s intentions regarding their huge stores of data. Abandoning their counter-culture roots, the tech titans have willingly collaborated with the government to turn this country into East Germany writ large. In retrospect, it isn’t really all that surprising. Silicon Valley had no qualms about turning Asia into the world’s biggest sweatshop – what compunction would they have about turning America into the world’s largest jail? What was once a proud, free republic has collapsed into an electronic police state – all with the people’s silent, obedient consent.

Metadata = Real Data

We are told “it’s just metadata”, and therefore of limited worth to the government. This, of course, coming as it does from the mouths of our politicians, is balderdash. I recall from Simon Singh’s “The Code Book” several direct examples of metadata being used during WW2 to aid in prosecution of the war effort. Encrypted German radio messages could be deciphered without effort, simply by identifying where they came from and what action the enemy took after the message was emitted. Such methods could also be used to break the German encryption, by comparing the ciphertext with the highly probable plaintext of the message.
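To make that last point concrete, here is a toy sketch of the idea of matching ciphertext against a guessed plaintext (a "crib"). It is emphatically not how the German machines worked: I am assuming a simple repeating-key Vigenère cipher as a stand-in, and the key, the message, and the function names are all made up for illustration. The only point is that if you can guess what a message says, for instance because you know who sent it and that it is a routine weather report, the ciphertext hands you the key.

```python
from itertools import cycle

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def vigenere(text, key, decrypt=False):
    """Shift each letter of `text` by the matching key letter (A=0 ... Z=25).

    Toy cipher used as a stand-in; expects uppercase A-Z only.
    """
    sign = -1 if decrypt else 1
    out = []
    for ch, k in zip(text, cycle(key)):
        shift = sign * ALPHABET.index(k)
        out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
    return "".join(out)


def recover_key_stream(ciphertext, crib):
    """Given a guessed plaintext for the start of the message,
    work out which key letters must have produced the ciphertext."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) - ALPHABET.index(p)) % 26]
        for c, p in zip(ciphertext, crib)
    )


# Hypothetical sender side: the key is secret, the message is routine.
secret_key = "LEMON"
message = "WETTERBERICHTNORDSEE"
ciphertext = vigenere(message, secret_key)

# Hypothetical eavesdropper side: only the ciphertext is known, plus a
# guess (from who sent it and when) that it opens with "WETTERBERICHT".
crib = "WETTERBERICHT"
key_stream = recover_key_stream(ciphertext, crib)
print(key_stream)  # the repeating key shows up in the recovered letters

# Once the repeat is spotted, the rest of the message falls out.
print(vigenere(ciphertext, key_stream[:5], decrypt=True))
```

The guess about the plaintext came entirely from context, which is to say from metadata. The cryptanalysis itself is the easy part.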

The government’s “defense” that the metadata is not as important is a willful misdirection.

Consider two cases, one where the government has just the message, and the other case where the government has full metadata on a phone call.

Just-the-message case

“I’ll meet you at the 7-11.”

Metadata case

  • Who called whom?
  • When, where, what direction they were walking in?
  • How many rings before the call was answered?

Which contains more information? The first case is the “Why”, the second case is “Who? What? Where? When?” Is it that difficult to infer the “Why?” from those other variables?

Not really, friends.
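If you want to see how little effort the inference takes, here is a minimal sketch. The call records, names, and tower labels below are entirely invented, and the `summarize` function is my own hypothetical helper rather than anything a real agency is known to run. It simply counts who a person calls, when, and from where, without ever touching the content of a call.

```python
from collections import Counter
from datetime import datetime

# Entirely made-up records: (caller, callee, start time, cell tower)
calls = [
    ("alice", "bob", "2015-03-02 23:41", "tower_17"),
    ("alice", "bob", "2015-03-03 23:55", "tower_17"),
    ("alice", "clinic", "2015-03-04 09:02", "tower_04"),
    ("alice", "bob", "2015-03-04 23:48", "tower_17"),
    ("alice", "lawyer", "2015-03-05 08:30", "tower_04"),
]


def summarize(records, person):
    """Tally a person's outgoing calls by contact, by late-night contact,
    and by cell tower. No call content is used anywhere."""
    contacts = Counter()
    late_night = Counter()
    towers = Counter()
    for caller, callee, when, tower in records:
        if caller != person:
            continue
        hour = datetime.strptime(when, "%Y-%m-%d %H:%M").hour
        contacts[callee] += 1
        towers[tower] += 1
        # Treat 22:00 through 04:59 as "late night" (an arbitrary cutoff).
        if hour >= 22 or hour < 5:
            late_night[callee] += 1
    return contacts, late_night, towers


contacts, late_night, towers = summarize(calls, "alice")
print("Most-called contact:", contacts.most_common(1))
print("Late-night contacts:", dict(late_night))
print("Most-used tower:", towers.most_common(1))
```

Five rows of "just metadata" already suggest a nightly habit, a location, a medical appointment, and a legal worry. Scale that up to every call a person makes in a year and the "Why?" is not much of a mystery.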


