The Metro New York City chapter of ARMA International has a fabulous program designed to help records and information management professionals develop skills in speaking and presenting, and last night they asked me to share a few thoughts on the topic. Here is a handout that I created for my discussion.
Here are the slides from the webinar we just completed on “Solving Shared Drives.” I personally don’t find slides divorced from their presentation that useful, but this will give you a flavor of what we talked about. Also, we will be shortly following up with a whitepaper on the topic as well as the recording of the webinar, so look for that too.
In today’s slide, we explore the simple but powerful statistic that, for every 1,044 pages of evidence preserved, captured, copied, collated, reviewed and handled in e-discovery, only 1 is actually produced. What does this metric tell us? Well, certainly it tells us that there are . . . ahem . . . some problems with e-discovery. But of greater interest – at least to me – is that it shows how profoundly we are failing at information governance. After all, if all of this stuff didn’t exist in the first place, we wouldn’t be spending millions of dollars unnecessarily to handle this junk (like a TSA agent?). How much of this content is duplicate or near-duplicate? How much of it could have been – and should have been – thrown away years before litigation commenced?
On the first question, some estimate that 75% of the information in the known digital universe is duplicate (IDC). On the second question, when we assess a client’s information environment, it is not uncommon to find that over 50% of all the content the client has in storage is past its due date, and should have been – according to the client’s own policies – tossed long ago.
Email me if you would like the original PowerPoint file (btblair at vialumina.com). You can find the report these numbers come from here (opens a PDF).
“Data growth is the biggest data center hardware infrastructure challenge for large enterprises, according to a new survey by research firm Gartner Inc.”
“Data growth remains IT’s biggest challenge, Gartner says,” Computerworld, November 2, 2010
Building on the theme I started with last week’s PowerPoint slide (which has been viewed hundreds of times in a few days thanks mostly to a posting on an Australian records management listserv [thanks, Andrew]), I thought I would share another slide that I use to tell the information governance story.
We have all seen the studies that attempt to quantify the amount of information on the planet. The first one that I was aware of (and used extensively in my first book, Information Nation) was the “How Much Information” study done at the University of California, Berkeley in 2000. The latest one I know of is an IDC study published in May 2010. I dig into the numbers behind this study in this week’s PowerPoint slide.
There are three “stacks” in this slide, each representing a different dimension of the study, with the relevance to information governance increasing from left to right. Let’s start with the first stack.
The first stack shows the expected overall growth of digital information on the planet between 2009 and 2020. The study projects that this will grow by 44 times, from 0.8 zettabytes to 35 zettabytes (1 zettabyte = 1 trillion gigabytes). Although I find the scale of these numbers impressive, and intellectually know that this is an incredible amount of information, the numbers are almost too big to be meaningful. Even attempts to analogize these numbers, like “a stack of books from here to the moon,” don’t really help me. Perhaps this could form the basis of a successful Zen Buddhist koan – “a story, dialogue, question, or statement, the meaning of which cannot be understood by rational thinking but may be accessible through intuition.” (Wikipedia) What is the sound of one hand clapping? How big is a zettabyte?
However, moving to the middle stack of hard drives, we get to some numbers that mean something to me. According to the study, the number of individual files or “containers” of data will grow at a faster rate than the overall raw volume of data. In fact, it will grow by 67 times in the same period, a growth factor roughly 50% larger than that of the overall volume.
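For readers who like to check the arithmetic, here is a minimal sketch of the numbers in the first two stacks. It uses only the figures quoted above (0.8 zettabytes, 44 times, 67 times); the variable names are my own and purely illustrative, and nothing here comes from the IDC study beyond those three figures.

```python
# Figures quoted in the text; variable names are illustrative only.
START_ZB = 0.8        # digital universe in 2009, in zettabytes
VOLUME_GROWTH = 44    # projected growth factor for raw volume, 2009-2020
FILE_GROWTH = 67      # projected growth factor for files ("containers")

# 1 zettabyte = 10^21 bytes and 1 gigabyte = 10^9 bytes.
gb_per_zb = 10**21 // 10**9

# Projected volume in 2020, in zettabytes.
end_zb = START_ZB * VOLUME_GROWTH

# How much larger the file growth factor is than the volume growth factor.
extra_pct = (FILE_GROWTH / VOLUME_GROWTH - 1) * 100

print(f"Gigabytes per zettabyte: {gb_per_zb:,}")
print(f"Projected 2020 volume: {end_zb:.1f} ZB")
print(f"File growth exceeds volume growth by about {extra_pct:.0f}%")
```

The last line is the one that matters for information governance: the count of things to manage outpaces the raw volume by roughly half again.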
Aha, now we are getting somewhere. The problem of unstructured data (or at least, “not well structured” data) is at the core of the information governance problem. All of my clients have the same three problems at or near the top of their problem list: 1) email, 2) unstructured files in shared drives and C drives, and 3) backup tapes. According to this study, these kinds of problems are going to get at least 67 times worse over the next decade. Now, in the fog of all this data growth, the information governance problem really starts to take shape.
The final stack, on the right, takes us even further in understanding how the information governance problem is growing faster than the problem of data volume itself. As we complete the transition from paper to digital, the kinds of data we are creating and the kind of management they require are changing. According to the study, the amount of data requiring some type of information governance (i.e., for “privacy, compliance, custodial protection, confidentiality, or absolute lock down” purposes) by 2020 will nearly double. Moreover, the portion requiring the highest levels of information governance control will grow 100 times. Furthermore, when viewed from the perspective of the number of files rather than absolute volume, over 90% of files will require some kind of information governance.
This is the heart of the information governance problem: not only is overall data volume growing at an astonishing rate, but the number of individual pieces of data we have to manage is growing at a faster rate, and the amount of data that we have to manage and control in a special way is growing even faster.
Email me if you would like the original PowerPoint file. (btblair at vialumina.com)
Rather than keep them locked away in my private treasure trove, and rather than simply dump them into a SlideShare queue where they make no sense without the verbal presentation that they were designed to enhance, I thought I would start sharing some of my PowerPoint slides with you, along with my thinking behind them.
Here’s the first one.
I created this slide to illustrate the information governance problem.
Let’s start on the bottom tine of the trident in this graphic, which shows us that the cost of raw hard disk space is about 100 times lower now than it was 10 years ago (it costs less than 1% of what it did ten years ago). Dramatic, but obvious to anyone who has purchased a computer in the last decade.
Now, look at the middle tine – it shows us that the money we spend on enterprise storage equipment has remained relatively unchanged over those same ten years. At first, this doesn’t seem all that dramatic. In fact, when I first show this to people, they are surprised that the enterprise storage hardware numbers haven’t gone up dramatically. The reality is that the numbers fluctuate significantly with economic conditions, like any commodity. The other factor is that we are starting the comparison near the peak of an unprecedented boom in IT spending (i.e., the dot com years). However, this misses the larger point, which is quite startling: we are spending as much on storage 10 years later, when the price of the raw materials – disk drives – has dropped to 1% of what it was.
Let’s put this in perspective with an analogy. The average American drives 12,000 miles each year. At 30 mpg, that takes 400 gallons of fuel, which at current prices of $3.00 per gallon comes to $1,200 each year on gas. Now, if the price of gas dropped the way the price of hard drives has – from $3.00 per gallon to 3 cents per gallon – that same $1,200 would buy enough gas to drive 1.2 million miles per year, not 12,000. And that is exactly what we have been doing with digital information: as the cost of hard drives has dropped to less than 1% of what it was, we have continued to spend the same amount of money. Clearly, we are “driving” more.
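The analogy is easy to verify. The sketch below simply replays the arithmetic from the paragraph above; the function and variable names are hypothetical, chosen only for this illustration, and the inputs are the figures already given in the text.

```python
# Inputs from the analogy in the text; names are illustrative only.
MILES_PER_YEAR = 12_000
MILES_PER_GALLON = 30
PRICE_PER_GALLON = 3.00   # dollars
PRICE_DROP_FACTOR = 100   # price falls to 1% of what it was


def annual_fuel_budget(miles, mpg, price):
    """Dollars spent on fuel to drive `miles` at `mpg` and `price` per gallon."""
    return miles / mpg * price


def miles_for_budget(budget, mpg, price):
    """Miles that `budget` dollars of fuel will cover at `mpg` and `price`."""
    return budget / price * mpg


budget = annual_fuel_budget(MILES_PER_YEAR, MILES_PER_GALLON, PRICE_PER_GALLON)
cheap_price = PRICE_PER_GALLON / PRICE_DROP_FACTOR
miles_after_drop = miles_for_budget(budget, MILES_PER_GALLON, cheap_price)

print(f"Annual fuel budget: ${budget:,.0f}")
print(f"Miles for the same budget after the price drop: {miles_after_drop:,.0f}")
```

Hold the budget constant, divide the unit price by 100, and the quantity consumed multiplies by 100. Swap gallons for gigabytes and you have the storage story.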
The third tine at the top of the graphic shows a natural consequence of this: the market for software to manage all this data is growing dramatically, more than doubling in the same decade. This tracks well with the growth in interest and investment in information governance. Managing all this information is no longer a storage problem – it’s about how well we can manage, harness, and govern that information.
If you find this graphic useful, you are free to use it in your own presentations under the Creative Commons Attribution-ShareAlike 3.0 Unported License.
I have also included the original PPT file here.