Response to NARA’s Capstone Email Bulletin

On June 6, 2013, the US National Archives and Records Administration published a call for comments on its draft Bulletin regarding a proposed “Capstone” approach to email retention at federal agencies.  NARA was having technical problems with its comment system when I tried to submit my comments, so based on their instructions I have submitted my comments to them directly by email, and I am also posting them here. 

You can find the request for comment and the draft Bulletin on NARA’s website.

Feedback on NARA’s Capstone Email Records Management Bulletin

As requested, I am providing comments on the “Capstone” approach to email management outlined in the June 6, 2013 draft NARA Bulletin provided above. Thank you for the opportunity to provide input on this important issue.

I am the founder and principal of an information governance consulting firm based in New York. Since 2001 I have advised many organizations and government agencies on the development and implementation of email retention strategies.

Based on my experience and research, I believe that most organizations currently fall into one of two email records management camps.

The first camp does very little. While they may impose mailbox size limitations, they provide sparse guidance to employees who are forced to delete messages to meet these quotas. Consequently, business records are likely lost – especially if no storage space is allocated for retention of records that simply happen to reside in the email system.  Others allow – or turn a blind eye to – the practice of employees exporting email messages out of the corporate email system so they can be tucked away in shared drives, thumb drives, or taken home for “safekeeping.” This practice results in an effective loss of management control over records found in the email system, and can greatly increase collection costs and increase spoliation risk in e-discovery.

The second camp “manages” email, but treats all email messages equally, regardless of their content. Some – seeking to minimize the cost and potential risk of email – automatically purge all email older than 30, 60, or 90 days. In the absence of a method to capture email messages containing record content, records are surely lost – violating laws that require retention of specified records, regardless of their form. Others – perhaps inspired by SEC Rules 17a-3 and 17a-4 and the email archiving software industry that those Rules singlehandedly created – capture a copy of all messages sent and received and keep them in a separate archive for a fixed period of time. This approach ignores the reality that such an archive will undoubtedly contain both trivial content and critical business records. From a compliance perspective, this may be just fine if you are a broker-dealer subject to these unique, email-specific Rules, but is less fine if you are, like most of the business world, subject to retention rules that do not exempt email or treat it in a special way, but rather require identification and retention of business records regardless of the form they take.

There are of course other approaches to email retention, one of which is outlined in your draft Bulletin. As I understand it, Capstone is a role-based method that uses the role of the email creator/recipient as a predictor of the content of that user’s account. In the past I have advocated such an approach to clients as a pragmatic method for improving otherwise nascent email records management practices.

NARA should certainly be commended for embracing such pragmatism, and in recognizing that complex user classification systems are often impractical and lightly adopted.

However, I would like to share two additional ideas that may be helpful as NARA finalizes its guidance.

First, while a knowledge worker’s role can certainly be a predictor of an email message’s content, our research has shown us the limits of this approach. We have assessed role-based approaches at client organizations by analyzing actual email accounts sampled from a range of user roles. We have then estimated the percentage of email content that would require retention under the client’s own retention rules. Across a range of users we have found as little as 5% and as much as 95% record content. There is certainly some correlation between the percentage of record content and the role of the user, but it is not always categorical. For example, some users are mostly information processors, and thus may have an extremely high percentage of email records in their inboxes.

Consider, for example, a claims processor who receives a partially completed claims form attached to an email message, opens that form and completes it using information they possess, and then sends the completed form to an employee who represents the next link in the processing chain. This scenario is very common, even in large organizations. Assuming that these completed claim forms are records, and that they are not otherwise captured in a content management system, this user’s email account is quite important from a records management perspective.

However, a Capstone system based solely on seniority (i.e., “officials at or near the top of an agency,” as described in the Bulletin) may miss this important account and result in such records disappearing as “temporary” records. Conversely, senior officials may have a relatively low percentage of record content in their email system when they use other official or formal systems to communicate their decisions, document those decisions, or otherwise complete their work. Capture and permanent retention of their entire account, then, would result in retention of largely trivial content.

These issues can in part be addressed by careful examination of the way email is used by each agency and its users, as mentioned in the Bulletin.

Second, I wonder if NARA is turning away from a content-based approach to record identification and retention too soon – in fact, just at the time in history when technology to enable semi-automated, content-based approaches is becoming widely available. Our clients are currently evaluating and implementing technology from OpenText and Recommind (there are other providers in the market as well) that marries human and machine intelligence to remove the classification burden from the user. Such systems are by no means trivial to implement and configure, but I believe that they point the way forward for email records management. The effectiveness of automated statistical methods for content classification has been demonstrated in the intensely observed world of US civil litigation, a demonstration that I believe provides a foundation for their application to the records management problem.

Further, while the Capstone method would seem – as noted in your Memo – to foster compliance with the “OMB/NARA M-12-18 Managing Government Records Directive” requirement to “manage both permanent and temporary email records in an accessible electronic format,” I wonder to what extent it addresses the spirit of Section A3 of the Directive to “investigate and stimulate applied research in automated technology to reduce the burden of records management responsibilities”?

Once again, thank you for the opportunity to provide feedback on this important Bulletin. I am confident that NARA will continue to provide leadership as federal agencies continue this critical transition.

14 comments

  1. Tod Chernikoff, CRM, CIP

    Barclay:

    I see a great deal of validity in your assessment/comments. My experience in working with agencies that are subject to NARA’s regulations follows your research findings. Those individuals and groups that deal with the mission of the organization – the widget-builders, I call them (the claims processor in your example) – tend to have the greatest amounts of records and information, and it tends to be much more routinized, whereas the high-level managers, directors, and the like tend to have a very small percentage of record content – much of it controlled or filtered through their assistants and subordinates.

    The Capstone approach, based solely on seniority, is a skewed approach that may not properly capture truly important content, while sweeping up a great deal of trivial or transitory content into long-term repositories. Not the best use of this type of resource, and as you rightly note, not totally in line with the requirements of Section A3. This approach may relieve the classification burden from those in the upper levels, but that is not where the bulk of the burden exists; further, where the bulk of the burden exists is where it is likely simpler to use automation to collect and classify the records and information.

  2. Marty Heinrich

    Very good comments, and I agree that, if proven and viable, autoclassification and cleanup technology should be considered as an alternative to grabbing and preserving everything from a user’s account. I would like to see documentation and statistics of the proven use of auto-categorization on super-high-volume email environments (similar to those in large federal agencies) to accurately identify and classify records against a disposition schedule.

  3. Barclay T. Blair

    Thanks for taking the time to comment, Tod and Marty. I think anyone who is working in or interested in this space should weigh in, as you have, because whatever NARA decides to do will not only be important for federal agencies but will also be influential in the private sector. Marty, I have been having similar conversations about the need for data around auto-classification in this use case. Lots of data exists in the e-discovery context, but not as much in information governance. We need it, and the industry in general needs it to support the next generation of email (and information) governance and management.

    • Marty Heinrich

      Interesting comments by Tod regarding the volume of true records at the senior/executive level vs lower levels and how this may be a flaw in Capstone’s approach.
      I see large agencies strongly considering this approach (even before NARA announced it) because efforts to have end-users involved in the selecting, clicking, dragging/dropping to create and classify emails have failed miserably.

  4. Ron Layel

    Barclay, thanks for posting your response to NARA on this; and for providing a space here for others to share comments and reactions. I’ve submitted some comments directly to NARA; and I’ll copy part of it here to add to the conversation with your followers:

    My primary comments and questions on this Capstone approach Bulletin are:

    1. I wonder if this does not require a revision to Federal Regulation (36 CFR 1236) because –

    a. It modifies the basic definition of Federal Record (at least for Records in Email media) by allowing agencies to manage all Emails of certain specified user accounts as either Permanent or Temporary records for purpose of retention and disposition;

    b. It contradicts the fundamental premise of Federal Records as being “media neutral” by singling out this particular media type (Email) to be managed differently from all other Record media types. If an exception is needed for Email, this should be explicitly stated as an exception to existing Regulation and NARA guidance; and it begs the question of when/how similar exceptions will be made in the future (not far off) for other high-volume electronic record information such as that created in social media, instant messaging and various electronic collaboration/communications platforms.

    2. The apparent intent that this Bulletin and the Capstone approach apply to Temporary email records as well as Permanent email records is not made very clear in the draft; and it gives no clear indication of how this approach could be implemented within agencies for managing Temporary email records. For example –

    a. Is it intended that this Bulletin will give agencies authority to establish other “Capstone-like” and other groups, defined on basis of “the work and/or position of the email account owner”, and that all emails in these accounts will also be captured and then managed as Temporary Records for retention/disposition?

    b. If “yes” to above, are the emails of all employees and contractors with an Agency email account to be handled in this manner? This would mean that “Non-Capstone-temporary” groups must be defined, established and maintained for every category or sub-category of work and/or position in the Agency; and –

    c. That new agency record retention schedules will need to be developed and submitted for NARA approval for each of these defined work/position category groups to govern retention/disposition of their emails as Temporary Records – presumably with different retention periods for each group determined by some logical relationship to email record content that is typical of employees in particular work/position categories. Examples –

    i. All email accounts for departmental managers (not “Capstone” senior executives): cutoff annually and retain as temporary records for 5 years because most managers’ records are administrative, requiring -year retention under current agency schedules.

    ii. All email accounts for those working in Contracting/Procurement: retain as temporary records for 10 years because contract period of performance is average 5 years, and current schedules for contract records require 5-year retention after contract close.

    iii. All email accounts for those working in Human Resources: retain as temporary records for 50 years because many HR records are active until subject employee leaves federal service; average length of employment is 20 years, and some schedules for HR records require 30-year retention after employee separation from service.

    3. This Bulletin also deviates from current NARA guidance in that it excludes email (by media type, not content) from the existing requirement that agencies must declare “Official Federal Record” information and manage it throughout its lifecycle separately from non-record information. If this is the intent (as I believe it is and agree is the only practical approach in this case) it is precedent setting and may set the stage for a more general acknowledgement and direction away from Record “declaration”. I think the Bulletin should address this, and possibly state that an exception is deliberately being made now for Email information, due to the infeasibility of making these distinctions based on email message content.

    • Barclay T. Blair

      Ron, thanks for sharing the comments you provided to NARA here – I think it will help to foster the kind of discussion we need to have about this approach. You have done an excellent job of identifying the significant practical and regulatory impacts of a fundamental change from content-based classification and retention to media-based classification, which is what Capstone represents. Really insightful.

  5. christianpwalker

    Barclay – in addition to the over / under retention risks that you’ve highlighted, there’s also the loss of the thread (i.e., context) that’s bound to happen. Another potential issue is the burden on users when they’re trying to find information. NARA is addressing only the file-ability of the information; they’re completely missing the find-ability piece.

  6. Scott Burt

    Hi Barclay, thanks for this thread. Interesting that there is so much activity on your blog on Capstone, yet not on NARA’s. I posted some comments to NARA on their blog as well; they have yet to appear there, so I am providing them here too.

    First off, my compliments to NARA for their leadership and, on this topic specifically, continuing its investments and guidance for handling this important, yet difficult, problem of email management.

    With investments and costs in mind, I would like to inquire as to the research performed by NARA on this topic specifically as to the costs of implementing Capstone. Could you please provide comment to this?

    Specifically, every decision or policy such as this has an up-front cost and an operational, or long-term cost. The benefits of Capstone for the up-front costs are more readily apparent. It’s an easy policy and technology implementation. The challenges in Capstone will lie in the future. The long-term costs of managing billions of email messages that are of no value, but were over-retained due to the role-based retention, will be staggering.

    We see this every day as we advise clients that made similar ‘easy’ decisions to ‘just keep everything’. Today, organizations that have a small fraction of the employees that the Federal Government has are being overwhelmed with many hundreds of terabytes of email. ‘Storage is Cheap’ is a common misunderstanding. Just ask any IT manager who routinely has to purchase million-dollar storage systems to handle the ever-growing storage footprint in their enterprises. But the cost of storage begins to pale when compared to the manpower and technology costs as old systems are retired and these volumes of email must be migrated – without loss of integrity and with a chain-of-custody – to new hardware and software systems.

    As Barclay Blair asserted in his response to Capstone, “I wonder if NARA is turning away from a content-based approach to record identification and retention too soon.” Yes, a lot of organizations have failed with classification of email based on each individual message’s content and value. Today, we are able to stand on the shoulders of those that have preceded us, and through a simple and intelligent combination of process, technology, and people we are seeing companies be very successful with content-aware classification. It is some more effort up front, but the long-term savings are huge.

    This approach does not require NARA to rewrite Federal Regulation (36 CFR 1236), as suggested by Ron Layel (also on Barclay Blair’s blog and sent directly to NARA). Records are media neutral, and it should matter not if the record occurs in email, paper, or other form.

    To serve the needs of government and business to properly manage records based upon each message’s value, our company developed a solution that does work. The solution lies not at the extremes of 100% auto classification or 100% user classification, but instead at the logical intersection of various approaches. This hybrid model brings out the best, and mitigates the negatives, of various approaches to email management. The keys are these:
    > Exception based processing (users need only take action on 0% to 2% of their email on average)
    > Auto classification with human oversight
    > Simple, and integrated, User Interface with key capabilities including personalization

    Back to the Capstone recommendation of role-based retention, the long-term costs must be considered. These costs of over retention include the following:
    > Storage
    > Hardware and Software upgrades, and content migrations
    > FOIA requests across the huge volumes of email
    > eDiscovery costs including preservation and review
    > Manpower for management of these large data stores
    > Privacy and Personal Information concerns
    > Reputation and legal risk costs

    Thank you for requesting comments, and thank you for your consideration,

    Scott Burt
    Integro.com

  7. pegduncan

    Barclay, thanks for providing an opportunity to comment on Capstone, and thanks to others providing their reactions and thoughts on your blog.

    I’m not sure that the Capstone approach is based solely on seniority. Rather, in item 6 of the draft bulletin, it says, “When adopting the Capstone approach, agencies must identify those email accounts most likely to contain records that should be preserved as permanent. Agencies will determine Capstone accounts based on their unique business needs. Capstone accounts must capture the range of individuals who, by virtue of their office or position, are likely to engage in activities that create or receive permanently valuable Federal records.” I would include your claims processor in that definition.

    I’m also not sure that NARA is turning away from technology-based solutions, for in item 7, it says, “Evolving technologies, such as auto-categorization and advanced search capabilities, may enable agencies to cull out transitory, non-record, and personal email.” NARA may have concluded that auto-classification for -records-, as opposed to identification of ROT, may still be too imprecise to be endorsed in a bulletin.

