What is Big Data to Information Governance Professionals?

Spring is in the air in New York City. Here’s a picture of the beautiful Magnolia trees in Prospect Park.

Last week, I was pleased to help lead the discussion at The Cowen Group’s Leadership Breakfast in Manhattan. I’ve been spending a lot of time thinking and writing about Big Data lately, and jumped at the chance to hear what this community was thinking about it. Then, this week we did it again in Washington, DC.

It was a great group of breakfasters – predominantly law firm attendees, with a mix of in-house lawyers, consultants, and at least one journalist. The discussion was a fast ride through a landscape of emotional responses to Big Data: excitement, skepticism, curiosity, confusion, optimism, confusion, and ennui. Just like every other discussion I have had about Big Data.

We spent a lot of time talking about what, exactly, Big Data is. The problem with this discussion is that, like most technology marketing terms, it can mean something or nothing at all. How can a bunch of smart people having breakfast in the same room one morning be expected to define Big Data when the people who are paid to create such definitions leave us feeling . . .  confused?

Here’s how Gartner defines Big Data:

Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

Here’s how McKinsey defines it:

‘Big data’ refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze. This definition is intentionally subjective . . .

Here’s how Forrester defines it:


Big Data is the frontier of a firm’s ability to store, process, and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers.

Huh? No wonder we were confused as we scarfed our bacon and eggs.

Big Data is a squishy term, and for lawyers without a serious technology or data science background it is even squishier.

The concepts behind it are not new. However, there are some relatively new elements. The first is the focus on unstructured data (e.g., documents, email messages, social media) instead of data stored in enterprise databases (the traditional focus of “Business Intelligence”). The second is technologies that store, manage, and process data in ways that are not just incrementally better, bigger, or faster, but profoundly different (new file systems; aggregating massive pools of unstructured data instead of databases; storage on cheap connected hard drives, etc.). The third is newly commercialized tools and methods for performing analysis on these pools of unstructured data (even data that you don’t own) to draw business conclusions. There is a lot of skepticism about the third point – specifically about the ease with which truly insightful and accurate predictions can be generated from Big Data. Even Nate Silver – famous for accurately predicting the outcome of the 2012 US Presidential Election with data – cautions that even though data is growing exponentially, the “amount of useful information almost certainly isn’t.” Also, correlative insights often get sold as causative insights.

Big Data is a lot of things to a lot of people. But what is it to e-discovery professionals? I think there are three pieces to the Big Data discussion that are relevant for this community.

  1. Is Data Good or Bad? In the world of Big Data, all data is good and more data is better. A well-known data scientist was recently quoted in the New York Times as saying, “Storing things is cheap. I’ve tended to take the attitude, ‘Don’t throw electronic things away.’” To a data scientist this makes sense. After all, statistical analysis gets better with more (good) data. However, e-discovery professionals know that storage is not cheap when its full potential lifecycle cost is calculated, such as a company spending “$900,000 to produce an amount of data that would consume less than one-quarter of the available capacity of an ordinary DVD.” Data itself is of course neither good nor bad, but e-discovery professionals need to help Big Data proponents understand that data most definitely can have a downside. I wrote about this tension extensively here.

  2. Data Analytics for E-Discovery. Though not often talked about, I believe there is serious potential for some parties in the e-discovery process to analyze the data flowing through their processes and to monetize that analysis. What correlations could a smart data scientist investigate between the nature of the data collected and produced across multiple cases and their outcomes and costs? Could useful predictions be made? Could e-discovery processes be improved and routinized? I have some ideas, but no firm answers. We should dig into this further, as a community.

  3. Privacy and Accessibility. What does “readily available” mean in our age — an age where a huge chunk of all human knowledge can be accessed in seconds using a device you carry around in your pocket? Does better access to information simply offer speed and convenience, or does it offer something more profound? When a local newspaper posted the names and addresses of gun permit holders on an interactive map in the wake of the Sandy Hook Elementary School shooting, there was a huge outcry –  despite the fact that this information is publicly available, by law. This is a critical emerging issue as the pressure to consolidate and mine unstructured information to gain business insight collides with expectations of privacy and confidentiality.

Simply put, legal and e-discovery professionals need to be at the table when Big Data discussions are happening. They bring a critical perspective that no one else offers.

Monica Bay provided an overview of the event, artfully putting it in the context of what is going on across the legal industry.

By the way, my article about accessing and getting rid of information in the Big Data era has been syndicated to the National Law Journal, under the title, “Data’s Dark Side, and How to Live With It.” Check it out here. You can also check out my podcast discussion with Monica Bay about the article here.


  1. Jim Mullen

    So, if no one agrees on what it is, and it’s left to interpretation of vendors with a solution product, are we left to be sold a fad as opposed to using the knowledge, techniques, and tools we already have to deal with this supposed problem?

  2. Melv

    Recently attended a conference here in Sydney, Australia, held by a Group ICT company, whose name I’ll leave out for now. At that event, the presenter for the Big Data section described it (as I understood it) as a process of consulting with an individual or group to help identify their interests across a mix of things, be it procurement of products or requirements for services, then recording and using that data to help that individual or group identify other requirements and/or make decisions about those requirements – almost automatically.

    A lawyer in the audience raised a question about privacy. I can’t remember what the response was, but another presenter who was in the audience answered that they already had customers who allowed them to help with these kinds of decision-making processes.

    I was confused at the end, because in my early reading on Big Data there was all this talk of infrastructure topics, which understandably are a part of the above process, with the assumption that enterprises can procure Big Data products or services to help them better understand the information they have about their customer base and prospects, which can result in the need for solutions to store and analyse all of it. I see decision-making processes around Big Data starting with Marketing and ending with Finance.

  3. Andrew

    Great forum of ideas, concepts and potential learnings… As an advocate of speed to insight and someone who embraces customer-first thinking, I do also question ‘Big Data’ and its definition. I have come to accept and promote the idea that ‘Big Data’ is purely a source; you need to mine and understand it. Hence, rather than focus on ‘Big Data’, I like to focus on outcomes and learnings, something I call ‘Big Analytics’. One of the great benefits of ‘Big Analytics’ is that you can blend all sorts of ‘Big Data’ feeds, structured or unstructured, and gain insight or evaluate a learning. Speech is the last frontier, and there are many great companies and research professionals breaking ground in this space. Unlocking a conversation by converting it to text, ingesting it into a ‘Big Analytics Engine’, and understanding the emotion and sentiment associated with the true context of that conversation is incredibly exciting.

    Hence, have we transitioned to Big Analytics, or are we still stuck on Big Data? Voice is the largest untapped volume of Big Data and it will remain so until you unlock it and turn it into Big Analytics… My humble observation.
