“We need to take automation to another level, leaving human or manual efforts behind, to increase productivity and lower cost for clients in all areas of the information governance spectrum.”
Jason R. Baron, Of Counsel, Information Governance & eDiscovery Group, Drinker Biddle & Reath LLP
Jason R. Baron, Director of Litigation for NARA and a widely recognized and highly respected authority on e-discovery and electronic records, has left NARA to join the information governance and e-discovery practice of Drinker Biddle & Reath. He joins an already stacked deck at a group chaired by Bennett B. Borden and Jay Brudz.
I have known Jason for many years, and not only is he a class act, he is one of the few people who can truly be credited with driving and changing our thinking about e-discovery and information governance. Jason has a long list of accomplishments, but most significant for me is the tireless academic and evangelism work he has done to drive understanding of advanced search, predictive coding, and other techniques that help to automate information governance. Automation is the future of information governance, and it is a future that only exists because of people like Jason.
I had the pleasure of interviewing Jason about his big career change (he was at NARA for 13 years), and loved seeing how excited he is about the future of information governance.
Highlights of our discussion include:
- Jason was NARA’s first Director of Litigation, which speaks both to the changes to the information landscape in the past decade and to Jason’s expertise.
- Jason played a key role in developing a directive that requires all federal agencies to move to all digital form for permanent electronic records by the end of the decade.
- NARA will soon be managing upwards of a billion White House email messages – forever.
- Jason believes that predictive coding and other advanced search and document review methods will drive significant automation of information governance in the coming years.
My Interview with Jason R. Baron
Why now? Why are you leaving your role at NARA to go into private practice?
Well, I can tell you it has nothing to do with being placed on furlough! For the past 13 years, I have considered my time at NARA to be a dream job for any lawyer. As NARA’s first appointed Director of Litigation, I have had the opportunity to work with high-ranking officials and lawyers throughout government, including in the White House Counsel’s Office, on landmark cases involving electronic recordkeeping and e-discovery issues.
I also have been particularly privileged to work with Archivist David Ferriero and others in crafting a number of high-visibility initiatives in the records and information governance space, including the Archivist’s Managing Government Records Directive (August 2012), which includes an “end of the decade” mandate to federal agencies requiring that all permanent electronic records created after 2019 are preserved in electronic or digital form. With this background and experience, I think I can now be of even greater help in facilitating adoption of industry best practices that meet the Archivist’s various mandates. I also wanted to work on cutting edge e-discovery and information governance matters in a wider context.
What was it that attracted you to Drinker Biddle & Reath? Did you consider other firms or other career paths?
The biggest attraction was knowing that I share the same vision with Bennett B. Borden and Jay Brudz, Co-chairs of Drinker Biddle’s Information Governance and eDiscovery Group. Collectively, we see e-discovery challenges as only part of a more systemic “governance” problem. Big Data is only getting bigger, and I believe our group at Drinker Biddle is on the leading edge of law firms in recognizing the challenge and offering innovative solutions to clients. Of course, there are any number of other firms in e-discovery and other “hot” areas, and I have friends and colleagues at a number of firms and corporations who I have had discussions with. I’d like to think that my closest peers in this area will act as strategic partners with me in any number of educational forums, and I look forward to that prospect.
What will your role at Drinker Biddle be? What will you focus on?
As Of Counsel to the Information Governance and eDiscovery Group, I expect to be most heavily involved in helping to build out three areas of practice. First, providing legal services to private sector actors that are involved in large IT-related engagements with the federal sector and wish to optimize information governance requirements. Second, consulting on records and information governance initiatives in the private sector, especially those employing cutting-edge automated technologies (predictive coding, auto-categorization, and the like). Third, taking on special master assignments in the area of e-discovery as the need arises, which I would consider a great honor.
What do you think about the future of NARA and its role as the federal government transitions to the digital world?
As I said earlier, NARA is leading the way in issuing policies that will result in electronic capture of all e-mail records by the end of 2016, as well as ensuring that all electronic records appraised as “permanent” are preserved in future federal digital archives. NARA has shown leadership in issuing an important joint directive with OMB in 2012, which followed on the heels of President Obama’s Memorandum on Managing Government Records dated November 2011.
If NARA doesn’t lead in the area of setting information governance policies for federal applications, including in the cloud, it risks becoming an irrelevant player in the digital age. The present Archivist of the US and other senior leaders inside NARA are committed to doing everything they can to avoid that fate.
What are the key initiatives that you are working on right now?
My plate is full: Along with a few others, I have been involved in finishing up an update of The Sedona Conference’s 2007 Search Commentary and 2009 Commentary on Achieving Quality in E-Discovery. Over the next few weeks I will be criss-crossing the United States to participate in some excellent forums, including the upcoming EDI Summit in Santa Monica in October, where I am moderating a panel on “Beyond ISO 9001,” all about standards in the e-discovery and information governance space, and the inaugural IT-Lex Conference in Orlando, where I will join Ralph Losey and Maura Grossman to speak on the future of predictive coding.
You will also find me at ARMA 2013 in Las Vegas, at Georgetown’s Advanced E-Discovery Institute, and of course at LegalTech next February, all wonderful venues to get a message out about cutting edge issues in these areas.
What do you think is the most interesting thing happening in the IG space today?
I am most excited about bringing the “good news” of predictive coding and other advanced search and document review methods to a wider records and information governance audience, and intend to speak at any number of upcoming forums on how to do so. We need to take automation to another level, leaving human or manual efforts behind, to increase productivity and lower cost for clients in all areas of the information governance spectrum.
Do you think that organizations will ever achieve the promise of IG? What will it take to get there?
Woody Allen says there are two types of people in the world: those who believe the glass is half full, and those who say it is half poison.
I am optimistic about us doing better in the space – if lawyers can think outside of the box in adopting best practices from other disciplines, including artificial intelligence and information retrieval. A reality check is in order, however, given that predictions about the “future” of anything tend to be overly optimistic (where are the cars that glide over highways, or the cities on the moon, both of which the 1964 World’s Fair predicted would have happened by now?).
And the first mention of “yottabytes” by an op-ed columnist in the New York Times occurred in the last couple of weeks. As I mentioned earlier, the world of big data is only getting bigger and more complex. I think lawyers in this area can give solid guidance to help clients do better in this “real” world, and certainly hope to do so with the great team already in place at Drinker Biddle.
What was the biggest structural or philosophical change that you observed at NARA during your career there?
I recall going to what was billed as an “e-mail summit” meeting a half decade ago, in which the really great people assembled could not believe that most end users failed to print out email for placement in traditional hard copy files. Archivists and records managers by their very nature are just too good at doing so! However, NARA has come a long way since then, in pushing capture and filter policies for email (the so-called recent “Capstone” initiative), as well as the digital mandate by 2019 I mentioned earlier. These really do represent policy shifts that hold out the potential for leading many agencies to adopt new ways of doing business.
What do you think that private organizations can learn from NARA’s experiences in trying to manage and control the information explosion?
NARA certainly has unique challenges. For example, it needs to preserve and provide access on a permanent basis to what I have estimated will soon be upwards of a billion White House emails. What the private sector can learn from NARA’s (and the White House’s) experience in this area is that in an era when massive and ever-increasing volumes of data flow through corporate networks, there need to be technological solutions put into place to filter out low-value data, to guard privacy interests, and to provide greater access through advanced means of search and categorization.
NARA knows that it needs to confront all of these issues, and is now engaging in outreach to the private sector in an effort to find solutions in the public space (BB note: I recently attended one of these meetings, and will be writing about it soon.) Corporations of all sizes also need to confront information governance issues before a black swan event occurs that materially affects the bottom line.
What was the most interesting challenge or case you faced at NARA?
I have written and spoken at length about dealing with U.S. v. Philip Morris (the RICO tobacco case), and so won’t repeat what I have said about my experience searching through 20 million White House emails, and starting on my quest in search of better search methods. My time at NARA has just been one fascinating experience after another, and not just involving electronic records of course, so it’s hard to choose.
At one point I found myself in the back room of Christie’s auction house in Manhattan with a senior archivist, poring over a massive Excel spreadsheet that listed 5,000 documents taken from Franklin Roosevelt’s White House by his trusted secretary Grace Tully. We had to decide which documents should have ended up at the Roosevelt Library in Hyde Park. An auction of paintings worth millions was about to take place and all around us people were shouting, “Where are the Picassos?” and “What about the Matisses?” It was definitely surreal.
And yes, after drafting a Complaint and working with the US Attorney’s Office in the Southern District, we ended up settling the dispute over the Grace Tully collection (where the owners were represented by, among others, former Rep. Elizabeth Holtzman working at a mid-Manhattan law firm), with timely assistance from passage of a special bill in Congress allowing for a favorable valuation of the collection. From one week to the next, I never knew what new disputes involving the history of the 19th and 20th century I would be involved with.
I have been doing some research into data remediation and I came across this interesting model that I think fits pretty well. But it is not from the information governance or even the IT world. The first person to tell me in the comments precisely where this model originates will get a copy of my latest book.
Last week I attended a “Predictive Coding Boot Camp” produced by the E-Discovery Journal and presented by Karl Schieneman of Review Less and Barry Murphy. I’ve participated in many workshops, seminars, discussions, and webinars on the topic, but this half-day seminar went the deepest of any of them into the legal, technology, and case strategy implications of using technology to minimize the cost of human document review in e-discovery. It was a solid event.
(But, I wasn’t there to learn about e-discovery. I’ll tell you why I was there in a moment.)
You see how I snuck in an implied definition above? Because, whatever you call it – predictive coding, technology-assisted review, computer-assisted review, or magic – isn’t that the problem that we are trying to solve? To defensibly reduce the number of documents that a human needs to review during e-discovery? There are a number of ways to get there using technology, but the goal is the same.
What does e-discovery have to do with IG?
To review, in civil litigation, both sides have an obligation to produce information to the other side that is potentially relevant to the lawsuit. In the old days, this was mostly a printing, photocopying, and shipping problem. Today it is primarily a volume, complexity, and cost problem. Although discovery of physical evidence and paper records is obviously still part of the process, electronic evidence naturally dominates.
So, how does a litigant determine whether a given document is potentially relevant and must be produced, or if it is irrelevant, privileged, or otherwise does not need to be produced to the other side?
If I sue my mechanic because he screwed up my transmission repair, the process is pretty simple. I will bring bills, receipts, and other stuff I think is relevant to my lawyer, my mechanic will do the same, our attorneys will examine the documents, determine a case strategy, produce responsive evidence to the other side, perhaps conduct some depositions, and – in real life – a settlement offer will likely be negotiated. In a case like this, there are probably only one or two people who have responsive information, there isn’t much information, and the information is pretty simple.
Now, what happens if 10,000 people want to sue a vehicle manufacturer because their cars seemingly have a habit of accelerating on their own, causing damage, loss, and even death? In a case like this, the process of finding, selecting, and producing responsive information will likely be a multi-year effort costing millions of dollars. The most expensive part of this process has traditionally been the review process. Which of the millions of email messages the manufacturer has in its email archive are related to the case? Which CAD drawings? Which presentations that management used to drive key quality control decisions? Which server logs?
Before we started applying smart software to this problem, the process was linear, i.e., we made broad cuts based on dates, custodians, departments, etc., and then human reviewers – expensive attorneys, in fact – would look at each document and make a classification decision. The process was slow, incredibly expensive, and not necessarily that accurate.
Today, we have the option to apply software to the problem. Software that is based on well-known, studied and widely used algorithms and statistical models. Software that, used correctly, can defensibly bring massive time and cost savings to the e-discovery problem. (There are many sources of the current state of case law on predictive coding, such as this.) Predictive coding software, for example, uses a small set of responsive documents to train the coding engine to find similar documents in the much larger document pool. The results can be validated through sampling and other techniques, but the net result is that the right documents can potentially be found much more quickly and cheaply.
Of course, predictive coding is just a class of technology. It is a tool. An instrument. And, as many aspiring rock gods have learned, owning a vintage Gibson Les Paul and a Marshall stack will not in and of itself guarantee that your rendition of Stairway to Heaven at open mic night will, like, change the world, man.
So why did I go to the Predictive Coding Bootcamp? I went because I believe that Information Governance will only be made real when we find a way to apply the technologies and techniques of predictive coding to IG. In other words, to the continuous, day-to-day management of business information. Here’s why:
Human classification of content at scale is a fantasy.
I have designed, implemented, and advocated many different systems for human-based classification of business records at dozens of clients over the last decade. In some limited circumstances, they do work, or at least they improve upon an otherwise dismal situation. However, it has become clear to me (and certainly others) that human-based classification methods alone will not solve this problem for most organizations in most situations moving forward. Surely by now we all understand why. There is too much information. The river is flowing too quickly, and the banks have gotten wider. Expecting humans to create dams in the river and siphon off the records is, frankly, unrealistic and counterproductive.
Others have come to the same conclusion. For example, yesterday I was discussing this concept with Bennett B. Borden (Chair of the Information Governance and eDiscovery practice at Drinker Biddle & Reath) at the MER Conference in Chicago, where he provided the opening keynote. Here’s what Bennett had to say:
“We’ve been using these tools for years in the e-discovery context. We’ve figured out how to use them in some of the most exacting and high-stakes situations you can imagine. Using them in an IG context is an obvious next step and quite frankly probably a much easier use case in some ways. IG does present different challenges, but they are primarily challenges of corporate culture and change management, rather than legal or technical challenges.”
The technology has been (and continues to be) refined in a high-stakes environment.
E-discovery is often akin to gladiatorial combat. It is conducted under incredible time pressures, with extreme scrutiny of each decision and action by both an adversary and a judge. The context of IG in most organizations is positively pastoral by comparison. Yes, there are of course enormous potential consequences for failure in IG, but most organizations have wide legal latitude to design and implement reasonable IG programs as they see fit. Records retention schedules and policies, for example, are rarely scrutinized by regulators outside of a few specific industries.
I recently talked about this issue with Dean Gonsowski, Associate General Counsel at Recommind. Recommind is a leader in predictive coding software for the e-discovery market and is now turning its attention to the IG market in a serious way. Here’s what Dean had to say:
“E-discovery is the testing ground for cutting-edge information classification technology. Predictive coding technology has been intensively scrutinized by the bench and the bar. The courts have swung from questioning if the process was defensible to stating that legal professionals should be using it. The standard in IG is one of reasonableness, which may be a lower standard than the one you must meet in litigation.”
There is an established academic and scientific community.
The statistical methods, algorithms, and other techniques embodied by predictive coding software are the product of a mature and developing body of academic research and publishing. The science is well understood (at least by people much, much smarter than I am). TREC is a great example of this. It is a program sponsored by the US government and overseen by a program committee consisting of representatives from government, industry, and academia. It conducts research and evaluation of the tools and techniques at the heart of predictive coding. The way that this science is implemented by the software vendors who commercialize it varies widely, so purchasers must learn to ask intelligent questions. TREC and other groups help with this as well.
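The core metrics that TREC-style evaluations report, precision and recall, are simple to compute once you have ground-truth labels. A toy illustration, with entirely made-up document IDs:

```python
# Precision: what fraction of the documents the system flagged are truly
# responsive. Recall: what fraction of all truly responsive documents the
# system found. Both are central to evaluating review quality.
def precision_recall(retrieved, responsive):
    """Both arguments are sets of document IDs."""
    true_pos = len(retrieved & responsive)
    precision = true_pos / len(retrieved) if retrieved else 0.0
    recall = true_pos / len(responsive) if responsive else 0.0
    return precision, recall

retrieved = {"d1", "d2", "d3", "d4"}   # what the system flagged
responsive = {"d1", "d2", "d5"}        # ground truth from expert review
p, r = precision_recall(retrieved, responsive)
print(round(p, 2), round(r, 2))  # → 0.5 0.67
```

In practice the ground-truth set is itself estimated by sampling, which is why the statistical rigor of the sampling design matters so much in these studies.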
I will soon be writing more about the application of predictive coding technology to IG, but today I wanted to provide an introduction to the concept and the key reasons why I think it points the way forward for IG. Let me know your thoughts.
As you know, we recently published the five central videos from our interview series, “5 Questions About Information Governance in 5 Minutes.” In this series, we asked 30 IG experts a number of definitional and serious questions about IG. Our experts were prepared for that. However, right at the end of the interview, I slipped in one surprise question. Since many of our interviewees have experience with records management, and records management isn’t known as the most light-hearted, spontaneous profession, I thought I would ask, “What’s your favorite records management joke?”
So that’s what I did. I love this video because it shows that this community has a great sense of humor and does not take itself so seriously – a sure sign of health.
Next week we will be posting full interviews with each interviewee as well. You can also check out the six videos we have published so far on our YouTube channel.
And by the way, if you know any good records management jokes, please send them to me, or add them in the comments.
Here is the fifth and final (except for a bonus video coming soon) in our five-part video series where I asked 30 Information Governance experts the same 5 questions. This video is the longest of the five, as I ask our interviewees to tell us their favorite story about IG – something that illustrates what it is, why it is hard, challenges they have faced, and so on. There are some great stories, so get yourself a fresh cup of coffee and a snack and enjoy.
I have been doing some research into the origin of the animal drive to cache (generally healthy) or hoard (generally not healthy) things, especially important things like food or information. I am specifically interested in the evolutionary roots of this behavior, and also what it looks like when it goes wrong. Reality television fans are of course intimately familiar with hoarding thanks to shows like TLC’s Hoarding: Buried Alive and A&E’s Hoarders (with A&E presumably elevating the “E” in its name above the “A” with this show). Currently, “hoarding” as a psychiatric condition is officially a subset of Obsessive Compulsive Personality Disorder, but thanks to a coming revision of the standard psychiatric diagnostic manual, it is getting an entry all its own in 2013 (is reality TV now actually creating reality?).
My working hypothesis is that organizational information hoarding behavior is almost indistinguishable from individual object hoarding behavior. After all, organizations are merely collections and reflections of individuals.
Is this true? Well, let’s take a look at the newly proposed “working diagnostic criteria for compulsive hoarding” and do a kind of search-and-replace exercise to see if the criteria also illuminate organizational dysfunction around information retention and management.
Working Diagnostic Criteria For Compulsive Hoarding (of Information)
| # | Diagnostic Criteria for Compulsive Hoarding Proposed for DSM-V | Diagnostic Criteria for Corporate Information Hoarding Proposed by Me |
| --- | --- | --- |
| A | Persistent difficulty discarding or parting with personal possessions, even those of apparently useless or limited value, due to strong urges to save items, distress, and/or indecision associated with discarding. | Persistent difficulty discarding or parting with information, even information of apparently useless or limited value, due to strong urges to cover one’s ass, paper the file to catch your manager in a lie, and give e-discovery review attorneys something to do. |
| B | The symptoms result in the accumulation of a large number of possessions that fill up and clutter the active living areas of the home, workplace, or other personal surroundings (e.g., office, vehicle, yard) and prevent normal use of the space. If all living areas are uncluttered, it is only because of others’ efforts (e.g., family members, authorities) to keep these areas free of possessions. | The symptoms result in the accumulation of a large volume of information that fills up and clutters active information systems. If information systems are uncluttered, it is only because of others’ efforts (e.g., family members, sleep-deprived paralegals, “accidental” overwriting incidents) to keep these areas free of information. |
| C | The symptoms cause clinically significant distress or impairment in social, occupational, or other important areas of functioning (including maintaining a safe environment for self and others). | The symptoms cause economically significant distress or impairment of business functions. |
| D | The hoarding symptoms are not due to a general medical condition (e.g., brain injury, cerebrovascular disease). | The information hoarding symptoms are not simply due to generally half-assed company management techniques (e.g., incompetent CEO, brother-in-law of CEO running IT). |
| E | The hoarding symptoms are not restricted to the symptoms of another mental disorder (e.g., hoarding due to obsessions in Obsessive Compulsive Disorder (OCD), lack of motivation in Major Depressive Disorder, delusions in Schizophrenia or another Psychotic Disorder, cognitive deficits in Dementia, restricted interests in Autistic Disorder, food storing in Prader–Willi Syndrome). | The hoarding symptoms are not restricted to the symptoms of another organizational disorder (e.g., Big Data Kool-Aid Drinking, Overly Broad Legal Holditis, Keep Everything Forever Spectrum Disorder). |
I recently completed a webinar about defensible deletion with Anthony Diana of Mayer Brown, Katey Wood of Enterprise Strategy Group, and Stephen Stewart of Nuix. We had a good discussion focused on the role of inside counsel in supporting and driving efforts to get rid of unnecessary data. You can check out the recording here.
The blurb for the webinar is below:
For many years, organizations have kept all their data, often beyond the mandated retention period. But with data volumes growing to hundreds of terabytes – or even petabytes – this is no longer an option. The financial and time costs of maintaining storage systems for so much data are prohibitive. In addition, much of this data is unknown, posing significant business risks and adding to the time and expense of discovery or investigation exercises. Defensible deletion allows organizations to identify, categorize and manage all their data across multiple geographical locations, applications and storage and archive systems. With this knowledge, an organization can delete any data that has no business value or legal hold requirements. Deleting unneeded data allows organizations to reduce storage management costs, speed up discovery and investigations, switch off obsolete storage systems and tame the Big Data beast.
Join information governance thought leaders for a step-by-step guide to developing and implementing a defensible deletion program for your organization.
This session will discuss how you can:
- Make content-driven decisions to identify which data you can delete and which you must retain
- Create sound document retention, deletion and archiving policies
- Select a knowledgeable external counsel who can work with you to create and implement a defensible deletion process
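The first bullet, making content-driven decisions about what to delete and what to retain, can be sketched as a simple rules engine that checks a retention schedule and any active legal holds before anything is disposed of. The categories, retention periods, and hold names below are hypothetical illustrations, not recommendations for any real program.

```python
# Hypothetical disposition logic for a defensible deletion program:
# a legal hold always trumps the schedule, unknown categories default
# to retention, and expired records become candidates for deletion.
from datetime import date

RETENTION_YEARS = {"invoice": 7, "marketing_draft": 2, "contract": 10}
LEGAL_HOLDS = {"acme_litigation"}  # matters with active holds

def disposition(doc, today):
    """Return 'retain' or 'delete' for a document record (a dict)."""
    if doc.get("hold") in LEGAL_HOLDS:
        return "retain"  # legal hold overrides the retention schedule
    years = RETENTION_YEARS.get(doc["category"])
    if years is None:
        return "retain"  # unknown category: default to keeping it
    expiry = doc["created"].replace(year=doc["created"].year + years)
    return "delete" if today >= expiry else "retain"

docs = [
    {"category": "marketing_draft", "created": date(2009, 1, 1), "hold": None},
    {"category": "invoice", "created": date(2012, 6, 1), "hold": None},
    {"category": "marketing_draft", "created": date(2009, 1, 1),
     "hold": "acme_litigation"},
]
today = date(2013, 5, 1)
print([disposition(d, today) for d in docs])  # → ['delete', 'retain', 'retain']
```

A real program layers process on top of logic like this: documented policies, sign-off by counsel, and an audit trail for every deletion decision.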
Author: Barclay T. Blair