The Metro New York City chapter of ARMA International has a fabulous program designed to help records and information management professionals develop skills in speaking and presenting, and last night they asked me to share a few thoughts on the topic. Here is a handout that I created for my discussion.
Here are the slides from the webinar we just completed on “Solving Shared Drives.” I personally don’t find slides divorced from their presentation that useful, but this will give you a flavor of what we talked about. We will also shortly be following up with a whitepaper on the topic as well as the recording of the webinar, so look for those too.
In today’s slide, we explore the simple but powerful statistic that, for every 1044 pages of evidence preserved, captured, copied, collated, reviewed, and handled in e-discovery, only 1 is actually produced. What does this metric tell us? Well, certainly it tells us that there are . . . ahem . . . some problems with e-discovery. But of greater interest – at least to me – is that it shows how profoundly we are failing at information governance. After all, if all of this stuff didn’t exist in the first place, we wouldn’t be spending millions of unnecessary dollars to handle this junk (like a TSA agent?). How much of this content is duplicate or near-duplicate? How much of it could have been – and should have been – thrown away years before litigation commenced?
On the first question, some estimate that the amount of duplicate information in the known digital universe is 75% (IDC). On the latter question, when we assess our clients’ information environments, it is not uncommon to find that over 50% of all content they have in storage is past its due date and should have been – according to the client’s own policies – tossed long ago.
Email me if you would like the original PowerPoint file. (btblair at vialumina.com). You can find the report these numbers come from here (opens a PDF).
“Data growth is the biggest data center hardware infrastructure challenge for large enterprises, according to a new survey by research firm Gartner Inc.”
Data growth remains IT’s biggest challenge, Gartner says, Computerworld, November 2, 2010
Building on the theme I started with last week’s PowerPoint slide (which has been viewed hundreds of times in a few days, thanks mostly to a posting on an Australian records management listserv [thanks, Andrew]), I thought I would share another slide that I use to tell the information governance story.
We have all seen the studies that attempt to quantify the amount of information on the planet. The first one that I was aware of (and used extensively in my first book, Information Nation) was the “How Much Information” study done at the University of California, Berkeley in 2000. The latest one I know of is an IDC study published in May 2010. I dig into the numbers behind this study in this week’s PowerPoint slide.
There are three “stacks” in this slide, each representing a different dimension of the study, with the relevance to information governance increasing from left to right. Let’s start with the first stack.
The first stack shows the expected overall growth of digital information on the planet between 2009 and 2020. The study projects that this will grow by 44 times, from 0.8 Zettabytes to 35 Zettabytes (1 Zettabyte = 1 trillion Gigabytes). Although I find the scale of these numbers impressive, and intellectually know that this is an incredible amount of information, the numbers are almost too big to be meaningful. Even attempts to analogize these numbers, like “a stack of books from here to the moon,” don’t really help me. Perhaps this could form the basis of a successful Zen Buddhist koan – “a story, dialogue, question, or statement, the meaning of which cannot be understood by rational thinking but may be accessible through intuition.” (Wikipedia) What is the sound of one hand clapping? How big is a Zettabyte?
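For readers who like to see the arithmetic, here is a quick back-of-the-envelope sketch (in Python, my addition, not part of the original slide) of how the volume figures cited from the IDC study fit together:

```python
# Rough arithmetic behind the cited volume figures.
# 1 zettabyte = 10**21 bytes; 1 gigabyte = 10**9 bytes.
ZB = 10**21
GB = 10**9

# 10**12 gigabytes per zettabyte, i.e., 1 trillion GB per ZB.
gb_per_zb = ZB // GB
print(gb_per_zb)

# The projected growth from 0.8 ZB (2009) to 35 ZB (2020):
growth = 35.0 / 0.8
print(round(growth))  # matches the study's "44 times" projection
```

The exact quotient is 43.75, which the study rounds to the headline figure of 44x.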
However, moving to the middle stack of hard drives, we get to some numbers that mean something to me. According to the study, the number of individual files or “containers” of data will grow at a faster rate than the overall raw volume of data. In fact, it will grow by 67 times in the same period, or almost 50% more than the overall volume.
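The “almost 50% more” comparison can be checked the same way – a small sketch (again my addition, using the two growth multiples the study reports):

```python
# File count is projected to grow 67x while raw volume grows 44x
# over the same 2009-2020 period.
volume_growth = 44
file_growth = 67

# How much faster is file growth than volume growth?
excess = file_growth / volume_growth - 1
print(f"{excess:.0%}")  # roughly 52%, i.e., "almost 50% more"
```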
Aha, now we are getting somewhere. The problem of unstructured data (or at least, “not well structured” data) is at the core of the information governance problem. All of my clients have the same three problems at or near the top of their problem list: 1) email, 2) unstructured files on shared drives and C drives, and 3) backup tapes. According to this study, these kinds of problems are going to get at least 67 times worse over the next decade. Now, in the fog of all this data growth, the information governance problem really starts to take shape.
The final stack, on the right, takes us even further in understanding how the information governance problem is growing faster than the problem of data volume itself. As we complete the transition from paper to digital, the kinds of data we are creating and the kind of management they require are changing. According to the study, the amount of data requiring some type of information governance (i.e., for “privacy, compliance, custodial protection, confidentiality, or absolute lock down” purposes) will nearly double by 2020. Moreover, the portion requiring the highest levels of information governance control will grow 100 times. Furthermore, when viewed from a file-count – rather than an absolute-volume – perspective, the share of files requiring some kind of information governance will be over 90%.
This is the heart of the information governance problem: not only is overall data volume growing at an astonishing rate, but the number of individual pieces of data we have to manage is growing at a faster rate, and the amount of data that we have to manage and control in a special way is growing even faster.
Email me if you would like the original PowerPoint file. (btblair at vialumina.com)