Speaking engagement: Information Governance and Big Data

I will be giving the keynote address at a half-day seminar hosted by Sita Corp, SAP, and HP at the New York Athletic Club on October 15, 2013, from 8:30 to 10:30 am.

I am going to be talking about the challenges of Information Governance in a Big Data world.

Register now at: http://ow.ly/po2mm

Briefing Notes: 5 Questions about Big Data for Attorneys and E-Discovery Professionals

I recently provided a briefing to a group of e-discovery professionals about Big Data and why it matters to them, and I thought there might be some value in sharing my notes.

1. What is Big Data?

  • Gartner: Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision-making.

  • McKinsey: ‘Big data’ refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze. This definition is intentionally subjective . . .

  • It is subjective, but has definable elements

    • The data itself: large, unstructured information

    • The infrastructure: “Internet scale” in the enterprise

    • The analysis: Asking questions using very large data sets

2. Why Does Big Data Matter to E-Discovery Professionals?

  • Data scientists and technologists do not understand the risk side of information

  • You need to be at the table to educate them on:

    • The legal and business value of deleting information

    • The privacy requirements and implications

    • E-Discovery implications of too much data

  • The technologies of Big Data may process and manipulate information in a way that affects its accessibility and evidentiary value – you need to be aware of this and guide your clients appropriately

3. Does Big Data offer value to the legal community?

  • Performing sophisticated analysis on large pools of data is not exclusive to any particular industry – there is no reason it could not be applied to the legal community (and it already is being used in some limited ways)

  • Relatively speaking, most law firms do not generate massive amounts of data in their day-to-day operations

  • In e-discovery, the technology innovations of Big Data could be helpful in very large cases to help with storage and processing tasks

4. What are some examples of Big Data in action?

  • President Obama’s data-driven election campaign.

  • An online travel company showing more expensive travel options to those who used higher-priced Macintosh computers to access its website.

  • Tracking unreported side effects of drugs using search data (Journal of American Medical Association). Also Google Flu Trends: tracking the spread of the flu using search trends.

  • NYPD Compstat.

  • Fraud Detection: Targeting $3.5 trillion in fraud from banking, healthcare, utilities, and government.

  • The City of New York finding those responsible for dumping cooking oil and grease into the sewers by analysing data from the Business Integrity Commission, a city agency that certifies that all local restaurants have a carting service to haul away their grease. With a few quick calculations, comparing restaurants that did not have a carter with geo-spatial data on the sewers, the city generated a list of statistically likely suspects and tracked down dumpers with a 95% success rate.
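
To make the grease-dumping example concrete, here is a minimal sketch of that kind of comparison. It is not the city's actual method: the restaurant records, sewer blockage coordinates, field names, and the 250-metre radius are all invented for illustration. The idea is simply to keep the restaurants with no carter on file and rank them by how many sewer blockages sit nearby.

```python
import math

# Hypothetical sample data; names, coordinates, and fields are made up.
SAMPLE_RESTAURANTS = [
    {"name": "Diner A", "lat": 40.7000, "lon": -73.9900, "has_carter": False},
    {"name": "Cafe B", "lat": 40.7001, "lon": -73.9901, "has_carter": True},
    {"name": "Grill C", "lat": 40.7500, "lon": -73.9800, "has_carter": False},
    {"name": "Bistro D", "lat": 40.8000, "lon": -73.9500, "has_carter": False},
]
SAMPLE_BLOCKAGES = [
    {"lat": 40.7002, "lon": -73.9899},
    {"lat": 40.6999, "lon": -73.9902},
    {"lat": 40.7501, "lon": -73.9801},
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def rank_suspects(restaurants, blockages, radius_km=0.25):
    """Rank restaurants without a carter by the number of nearby blockages."""
    suspects = []
    for r in restaurants:
        if r["has_carter"]:
            continue  # a carter on file means the grease is accounted for
        nearby = sum(
            1 for b in blockages
            if haversine_km(r["lat"], r["lon"], b["lat"], b["lon"]) <= radius_km
        )
        if nearby:
            suspects.append((r["name"], nearby))
    # Most nearby blockages first; ties broken alphabetically by name.
    return sorted(suspects, key=lambda s: (-s[1], s[0]))


if __name__ == "__main__":
    for name, count in rank_suspects(SAMPLE_RESTAURANTS, SAMPLE_BLOCKAGES):
        print(f"{name}: {count} blockage(s) within 250 m")
```

A list like this only points inspectors toward likely suspects. It is a correlation, not proof that any particular restaurant dumped anything.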

5. What professional and career opportunities does Big Data represent for e-discovery professionals?

  • Organizations need people who understand the risk side of the equation and who can provide practical guidance

  • Your clients may have Big Data projects that right now, today, are creating unmonitored, unmitigated risk; you need to be able to help them identify and manage that risk

  • Big Data focuses on unstructured information, i.e., the documents, email messages and other information that the e-discovery community knows well. These same skills and techniques can be very useful to business-led Big Data projects.

Video: Fear and Greed in Information Governance

Last week I was lucky enough to spend a day working with my friend Jay Brudz (a partner at Drinker Biddle who also runs their e-discovery sub with IG guru Bennett Borden). Whenever we had a spare moment our conversation would drift back to our favorite topic: is Information Governance about risk or value?

The correct answer, of course, is both. But not always, and not at the same time.

First of all, on a macroeconomic level, the pendulum is always swinging between fear and greed, between risk and value. Sometimes organizations circle the wagons, trim the fat, and break out a litany of clichés to cloak the fact that they are running scared. At other times, organizations light fat cigars, buy fancy umbrella holders, and spend money like the value of an American house will never decline.

So, there are macroeconomic factors that determine, generally, what motivates corporate spending and where management attention is focused.

This tension between risk and value is also driven by corporate culture. Some companies are simply more conservative than others – even those of similar size in the same vertical – and as such are more concerned about understanding and managing risk. Some companies build this conservatism into their marketing – especially in the financial services industry, where risk management and compliance typically has its own department (although the power of that office varies widely).

CEOs sometimes have a mandate to change existing cultures – including attitudes towards risk – as Paul O’Neill did at Alcoa by announcing, “I intend to make Alcoa the safest company in America,” at his first investor meeting as new CEO (as detailed by Charles Duhigg in “The Power of Habit”). Sometimes, of course, new CEOs mistake coupon-clipping clothing buyers for Apple fanboys and flame out spectacularly.

Industry vertical and market focus, however, are probably the biggest determinants. Predictably, large, regulated companies that are frequently involved in litigation generally spend more time and money on understanding and managing risk.

But fear only gets you so far, even in the most risk-aware organization. Fear alone will not drive employees to change their information habits. It won’t stop them from hoarding information in “their” shared drive (or shared drive in the cloud) or their email account. It will not stop them from classifying all their documents in the multi-million dollar document management system as “Misc-Other.” It will not stop them from using the cloud service recommended to them by their neighbor to share customer documents with a service provider. Fear will not change information governance behavior in a sustainable way.

So what will?

In Duhigg’s telling it wasn’t fear of safety failures that ultimately changed and sustained the safety culture at Alcoa, but rather the subsequent growth and the success of the company under O’Neill’s reign that corresponded with his focus on safety.

“I knew I had to transform Alcoa . . . but you can’t order people to change. That’s not how the brain works. So I decided I was going to start by focusing on one thing. If I could start disrupting the habits around one thing, it would spread throughout the entire company.” Paul O’Neill, as quoted by Charles Duhigg.

Driving sustainable change is about changing habits, but it is also about appealing to employee self-interest: providing employees with tools that help them do their jobs better, faster, and smarter so that they can succeed and be rewarded, while also ensuring that these solutions take care of the company’s business and legal needs – ideally in the background.

I addressed this question recently in the short video below.  

What is Big Data to Information Governance Professionals?

Spring is in the air in New York City. Here’s a picture of the beautiful Magnolia trees in Prospect Park.

(Image: Magnolia trees in Prospect Park)

Last week, I was pleased to help lead the discussion at The Cowen Group’s Leadership Breakfast in Manhattan. I’ve been spending a lot of time thinking and writing about Big Data lately, and jumped at the chance to hear what this community was thinking about it. Then, this week we did it again in Washington, DC.

It was a great group of breakfasters – predominantly law firm attendees, with a mix of in-house lawyers, consultants, and at least one journalist. The discussion was a fast ride through a landscape of emotional responses to Big Data: excitement, skepticism, curiosity, confusion, optimism, confusion, and ennui. Just like every other discussion I have had about Big Data.

We spent a lot of time talking about what, exactly, Big Data is. The problem with this discussion is that, like most technology marketing terms, it can mean something or nothing at all. How can a bunch of smart people having breakfast in the same room one morning be expected to define Big Data when the people who are paid to create such definitions leave us feeling . . .  confused?

Here’s how Gartner defines Big Data:

Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

Here’s how McKinsey defines it:

‘Big data’ refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze. This definition is intentionally subjective . . .


And here’s how Forrester defines it:

Big Data is the frontier of a firm’s ability to store, process, and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers.

Huh? No wonder we were confused as we scarfed our bacon and eggs.

Big Data is a squishy term, and for lawyers without a serious technology or data science background it is even squishier.

The concepts behind it are not new. However, there are some relatively new elements. One is the focus on unstructured data (e.g., documents, email messages, social media) instead of data stored in enterprise databases (the traditional focus of “Business Intelligence”). Two is technologies that store, manage, and process data in ways that are not just incrementally better, bigger, or faster, but profoundly different (new file systems; aggregating massive pools of unstructured data instead of databases; storage on cheap connected hard drives, etc.). Three is newly commercialized tools and methods for performing analysis on these pools of unstructured data (even data that you don’t own) to draw business conclusions. There is a lot of skepticism about the third point – specifically about the ease with which truly insightful and accurate predictions can be generated from Big Data. Even Nate Silver – famous for accurately predicting the outcome of the 2012 US Presidential Election with data – cautions that even though data is growing exponentially, the “amount of useful information almost certainly isn’t.” Also, correlative insights often get sold as causative insights.

Big Data is a lot of things to a lot of people. But what is it to e-discovery professionals? I think there are three pieces to the Big Data discussion that are relevant for this community.

  1. Is Data Good or Bad? In the world of Big Data, all data is good and more data is better. A well-known data scientist was recently quoted in the New York Times as saying, “Storing things is cheap. I’ve tended to take the attitude, ‘Don’t throw electronic things away.’” To a data scientist this makes sense. After all, statistical analysis gets better with more (good) data. However, e-discovery professionals know that storage is not cheap when its full potential lifecycle is calculated, such as a company spending “$900,000 to produce an amount of data that would consume less than one-quarter of the available capacity of an ordinary DVD.” Data itself is of course neither good nor bad, but e-discovery professionals need to help Big Data proponents understand that data most definitely can have a downside. I wrote about this tension extensively here.

  2. Data Analytics for E-Discovery. Though not often talked about, I believe there is serious potential for some parties in the e-discovery process to analyse the data flowing through their processes and to monetize that analysis. What correlations could a smart data scientist investigate between the nature of the data collected and produced across multiple cases and the outcomes and costs of those cases? Could useful predictions be made? Could e-discovery processes be improved and routinized? I have some ideas, but no firm answers. We should dig into this further, as a community.

  3. Privacy and Accessibility. What does “readily available” mean in our age – an age where a huge chunk of all human knowledge can be accessed in seconds using a device you carry around in your pocket? Does better access to information simply offer speed and convenience, or does it offer something more profound? When a local newspaper posted the names and addresses of gun permit holders on an interactive map in the wake of the Sandy Hook Elementary School shooting, there was a huge outcry – despite the fact that this information is publicly available, by law. This is a critical emerging issue as the pressure to consolidate and mine unstructured information to gain business insight collides with expectations of privacy and confidentiality.

Simply put, legal and e-discovery professionals need to be at the table when Big Data discussions are happening. They bring a critical perspective that no one else offers.

Monica Bay provided an overview of the event, artfully putting it in the context of what is going on across the legal industry.

By the way, my article about accessing and getting rid of information in the Big Data era has been syndicated to the National Law Journal, under the title, “Data’s Dark Side, and How to Live With It.” Check it out here. You can also check out my podcast discussion with Monica Bay about the article here.