I Think We are Moving to a World Where We Will Keep Everything Forever. Here’s Why.


  1. Steve

    The value increases as data is kept. Perhaps, but the value has to be linked to the ability to find information within this data lake of ever increasing size. Also, I would not say it is a permanent change in the high water level. Permanent means it stops at one point or another. If the storage war is over and unlimited storage is made available, by definition, the “high water level” is continuously rising.

    • Barclay T. Blair

      In this case, Steve, what I mean by “permanent” is that the water level is not going back down as it does in a flood or tsunami. Just as NYC, for example, has to prepare for a permanently rising coastline as opposed to simply preparing for another Hurricane Sandy.

    • Barclay T. Blair

      That is the trillion dollar question, right? Moving forward, “delete” becomes just another category, not the unspoken but driving force behind corporate efforts to “retain” information.

  2. Tony Laino

    There are many instances in Canadian law where the requirement reads: “thou shalt keep a permanent record…” In these cases the meaning of the term “permanent” has nothing to do with time. It’s referring to the permanency of the record.

  3. David

    Hard to disagree with the basic point that we are moving in a save-everything direction, but as Jaron Lanier has suggested, the entire ecommerce model upon which technology companies are built is not self-sustaining. As the middle part of the economy hollows out, there will not be enough customer revenue to support the model.

    • Barclay T. Blair

      Yes, I have read Jaron. However, on this point, what I am describing here is much more profound than ad-supported cloud services. The industrial Internet (from which GE made $1B last year), quantified life, sensor data, etc. are all changing our world, and the idea of throwing anything away is anathema to those in data science. A data scientist recently analogized throwing data away to the destruction of the Buddha statues in Afghanistan, for example.

  4. Gary Rylander

    Ah, I see that you have now joined Lossy in the ‘keep everything forever’ camp. When two people whose opinions I respect have started whistling this tune, it does make me rethink my position. I remain unconvinced for two reasons: First, keeping everything forever mandates that one have a means to locate everything forever. There is significant cost to do this, either capital cost, or OPEX if you use some type of SaaS model. Then too, there is the time, cost and expertise required to train the software to do this if using TAR-like methods for unstructured data, or if building rule sets for structured and semi-structured data. When you have finished the effort to build the tools necessary to find everything forever, guess what? You have now built the tools necessary for automated, defensible deletion. In that scenario, there is no added cost for deleting rather than keeping. If you keep everything forever without building the infrastructure to find what you need in the digital haystack, then you can save money in the short term, but your speed and cost for everything from eDiscovery to business-related data analytics will suffer.

    Second, while I love big data and big data analytics, I think that these methods will turn out to be like every other IT boost for the past 50 years: a handful will use them to build competitive differentiators, but most will end up with ‘me too’ implementations and most will not exhibit the organizational or cultural elements necessary to fully exploit the potential. When this happens, the choir singing for data lakes and more magical black algorithmic boxes may lose some of its volume and applause. At any rate, for most organizations, the most valuable insights are those about what will happen next week based upon what has happened over the past week, month or year. 50 years of all of the data will likely not result in a higher confidence level in the output than will a statistically significant sample. Finally, the first time a cyber thief manages to hijack one of these ‘me too’ major companies’ repository of everything that has been kept forever, the risk calculus may well change for that company.

    • Barclay T. Blair

      Hi Gary, thanks for taking the time to comment.

      First off, I disagree that I have joined Ralph’s “camp,” as well-stocked and outfitted as that camp might be. I delivered this particular keynote in September 2014, which was not the first time (or last time) I delivered it, and he advanced his blog thesis in November 2014. Anyway . . . Also, I would say that although we may have both observed some of the same trends, we come to very different conclusions about them. IG is not equivalent to human-based classification of information (formerly known as records management) as Ralph seems to imply in his writings. I also do not believe that putting data in a big pile and searching is the present or the future. In a big data world, the ability to understand your information in some detail (say, through automated classification) only grows in importance.

      Also, I don’t believe I staked (or at least I did not intend to stake) a position as to whether or not keeping everything forever is a good or a bad thing (my opinion on that will come later). I am just pointing out the forces that are taking us there. I am also providing my opinion that these forces are inexorable and this change will happen whether we think it is good or bad – like most technological change.

      The cost of storage of course will never be zero and it will continue to be a growing, multi-billion dollar market. However, the cost of storage is becoming invisible. It is being bundled into the cost of things we want to pay for, like collaboration and communication (Google) or business insight (Pivotal). That changes things.

      As to your other points, the raison d’être of big data infrastructure is exactly what you describe – bringing Internet scale to enterprise infrastructure so that formerly inaccessible archives are now accessible, then layering analytics on top of that to put it to some use. Of course that is a fantasy for most organizations today, but I would find it hard to bet against the moneyed ecosystem around it.

      Beyond the big data hype, which I think I see through as well, we do have to ask ourselves – what is really different or what is really changing? Anyone who believes that storage costs will factor at all into the ambitions of data-driven organizations is clearly not paying attention. In the coming battle between keep everything and throw everything away, debacles like Sony certainly do give powerful ammunition to the latter. However, fears of data breaches, fears of privacy invasions, fears of all the bad things that can happen have had no perceivable effect on the integration of information technology and data into our world, and I see nothing that will slow it in any significant way.

      Other than a giant sunspot EMP of course.

  5. Pingback: Keep everything: the New Age of Information Governance - Hanzo Archives
  6. Larry


    So much for “defensible destruction”, eh?

    As you and I have discussed many times, Barclay, the problem is exacerbated in the digital world, with the requirements for ensuring persistent access to content for the length of time you elect to/are required to retain it.

    There was an article discussing “paperless” that linked why we wouldn’t get there to a lack of C-suite execs willing to accept the issues relative to security and legal concerns. Similarly, these same execs have to buy into determining how to manage their digital haystacks if they elect to keep everything. As others have said, the volume isn’t (really) the problem, it’s the lack of order.

    “Automagical categorization” (PLEASE, don’t use the term classification) hasn’t been very successful for a number of reasons, and when you get into workflow-supported systems that utilize ‘digital approvals’, it only makes the problem of linkages worse.

    Folders, boxes, labels and indelible pens, please =) Either that or develop a common lexicon, taxonomy, bulletproof file naming structure, formats that live forever, full text indexing and an ever accessible media form and we’ll be set.

