Defensible Disposition of Structured Data, Part 2

To begin with, there is no “Easy Button” for the time-consuming, labor-intensive categorization of information in structured data systems. This is because structured data presents the following challenges:

  • A disproportionately large percentage of “Dark Data” and “Big Data” is contained within structured data. For instance, one global client of ours deployed SAP for one of its divisions, and within months of the deployment there was a severe spike in data volume. The client called it the “hockey stick” effect on a growth chart, as the chart below illustrates.

[Chart: IBM structured data growth]

  • Identifying the “record” is comparatively easy; identifying all other non-record data is not. Such non-record data may have only a temporary business use, whereas the “record” is prescribed and defined.
  • It is difficult to apply business rules to structured data because the rules may not be uniform from system to system, for example Oracle/SAP versus IBM/DB2 versus HP and other database-driven software.
  • The conditions that must be in place to defensibly dispose of this data are not well understood. This stems in part from the challenge described in the previous bullet, but it is also fueled by a lack of understanding of, or fear of, regulations and legal repercussions.
  • The “Right to be Forgotten” should be an important driver for Defensible Disposition of Structured Data, particularly web-based data. U.S. and foreign data disposition requirements mean that you cannot retain all data simply because storage is cheap.

Besides these big-picture considerations, some sobering metrics also make the case for devoting time and attention to this problem. For instance, the volume of information will have quadrupled between 2011 and 2015. As of 2013, business users had to wade through the following to find useful information:

  • 5,170,526,409 documents, of which 1,551,157,923 were over 3 years old
  • 40,130,881,407 application records

As of the end of 2015, this will balloon by more than 3 billion documents and more than 50 billion application records, slowing our people and systems:

  • 8,441,675,770 documents, 60% of which will be over 3 years old and 1.5 billion of which will be over 6 years old
  • 92,718,388,402 application records

There is currently no mechanism for the business to indicate when it no longer needs data and documents. The business will drown in debris, and the cost of managing that debris will waste budget. Furthermore, as of 2015, 106 petabytes of data will cost $355 million ($150 million more than in 2013), thereby stressing our operations, as the figures below and the short sketch that follows them illustrate:

  • In 2013, we backed up 12x the production volume, so 72,387,369,729 documents were stored (35 PB of backup)
  • In 2015, this will bloat to 118,183,460,783 documents and 57 PB of backup
  • 40,130,881,407 database records in production should more than double to 92,718,388,402 by the end of 2015
  • There will be 564,298,316,608 database records across our full environment as of the end of 2015 (more than 2x the 2013 figure and xx the 2011 figure), presenting considerable application, infrastructure, data quality, security, and operations challenges
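
To put rough unit costs on these figures, the short Python sketch below derives an implied cost per petabyte and the 2013-to-2015 growth multiples from the numbers quoted in this list. The derived values are our own back-of-the-envelope calculations from the quoted totals, not figures supplied by IBM.

    # Back-of-the-envelope unit costs derived from the figures quoted above.
    # These derived values are illustrative only; they are not IBM figures.
    PB_2015 = 106                          # petabytes under management in 2015
    COST_2015 = 355_000_000                # stated 2015 cost in USD
    COST_2013 = COST_2015 - 150_000_000    # "$150 million more than 2013"

    BACKUP_PB_2013, BACKUP_PB_2015 = 35, 57
    DOCS_STORED_2013, DOCS_STORED_2015 = 72_387_369_729, 118_183_460_783

    print(f"Implied 2015 cost per PB: ${COST_2015 / PB_2015:,.0f}")                 # ~$3.35M
    print(f"2013 cost baseline:       ${COST_2013:,.0f}")                           # $205M
    print(f"Backup volume growth:     {BACKUP_PB_2015 / BACKUP_PB_2013:.2f}x")      # ~1.63x
    print(f"Stored document growth:   {DOCS_STORED_2015 / DOCS_STORED_2013:.2f}x")  # ~1.63x

That backup volume and stored document counts both grow by roughly the same 1.63x multiple suggests the backup-to-production ratio stays about constant between the two years.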

Data centers and staff will handle 4x more data in 2015 and almost 10x more data in 2017 than they did in 2011.
Finally, in terms of e-discovery, costs are currently estimated at $8.14 per document. Therefore, 67,216,843,320 documents represent a financial exposure of $547,145,104,625, as the following figures and the sketch after them illustrate:

  • 1% of this exposure is $5,471,451,046 in potential unplanned expense
  • In 2015, document volume balloons to 109,741,785,013 and total exposure to $768,192,495,091; even if discovery costs decline 10% every year, a 1% discovery would cost $7,341,725,417
  • As the scope of e-discovery continues to expand, the 92 billion application records on hand in 2015 will also come into play
  • Failures in legal holds and much longer lead times to collect and process this data volume are highly likely
  • In government-driven e-discovery, very high volumes combined with very high expectations for response speed lead to extreme cost at shareholder expense
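
The short sketch below reproduces the exposure arithmetic behind these bullets: total exposure is simply the document count multiplied by the per-document review cost, and a partial discovery is a fraction of that exposure. The document counts, the $8.14 per-document figure, and the 2015 total come from the text above; the implied 2015 per-document cost is our own derivation from those stated figures.

    # Exposure = documents x per-document review cost; a partial discovery is a
    # fraction of that exposure. Document counts, the $8.14 figure and the 2015
    # total come from the text above; the implied 2015 cost/doc is our own math.
    COST_PER_DOC_2013 = 8.14               # stated e-discovery cost per document
    DOCS_2013 = 67_216_843_320
    DOCS_2015 = 109_741_785_013
    EXPOSURE_2015 = 768_192_495_091        # stated total 2015 exposure

    exposure_2013 = DOCS_2013 * COST_PER_DOC_2013
    print(f"2013 total exposure:   ${exposure_2013:,.0f}")          # ~$547.1 billion
    print(f"1% discovery in 2013:  ${0.01 * exposure_2013:,.0f}")   # ~$5.47 billion

    # Backing the per-document cost out of the stated 2015 total shows the
    # assumed decline from $8.14: roughly $7.00 per document.
    print(f"Implied 2015 cost/doc: ${EXPOSURE_2015 / DOCS_2015:.2f}")

The 2015 bullet's $7,341,725,417 figure for a 1% discovery appears to assume a further decline in the per-document cost, which is why it comes in somewhat below 1% of the stated total exposure.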

The above scenarios raise the question: how can companies reduce their IT costs while ensuring proper governance of their data throughout its lifecycle? This is the topic we will cover in the next installment of our blog.

Information Governance Solutions would like to acknowledge and thank IBM for much of the enclosed data.

By: John Isaza, Esq., FAI and Tom Reding, CRM
Information Governance Solutions