By Matthew Bernstein, Information Management Strategist, Bernstein Data
IT solutions that discover, classify, and govern information at massive scale are now readily available and can be powerful tools for Information Management professionals. But the diversity and complexity of available solutions that extract and analyze data are overwhelming, and the cost and challenges of implementation are difficult to anticipate.
How does the Information Management professional advise their organization on choosing such a tool?
Implementing any IT solution raises the standard reliability and security concerns, and deployment cost and effort must be weighed carefully. Adequate security and business continuity are table stakes for a solution to be considered at all.
But Information Management (or Governance) “functionality” should be the paramount consideration. Fundamentally, the question is: does the solution deliver the data insights, and support the actions, that will enable the organization to achieve its Information Management objectives (such as records retention, privacy, and information security)?
Some of the key “functional” questions to be considered are: Which “Data Discovery” solutions can find data in the relevant data repositories and with meaningful results? Will these tools support the use cases that an Information Management professional is addressing? Can the solution implement “governance” decisions: taking action on data discovered and classified?
These overarching objectives of insight and action may be assessed by considering four dimensions of functionality, as outlined below. While these four categories are comprehensive, the list of questions and examples does not exhaust the issues. But a framework for assessment such as this can help clarify and support decision-making.
Discovery
- Can the tool examine all the relevant data repositories in use and necessary to meet the organization’s objectives? These could include network file shares, Microsoft 365 (such as Outlook online) and Outlook on prem, enterprise SaaS applications (such as Salesforce), Mac and Windows environments and applications, Google Workspace, online collaboration tools (such as Slack and Teams), and desktop machines.
- Can the tool analyze the diversity of file types in use, not just the obvious Office 365 applications? This might include email content and attachments, PSTs, PDFs with content that is hard to “OCR,” and electronic messaging, to name just a few.
- How does the tool initially discover potentially relevant data: does it look only at metadata or does it examine content?
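To make the metadata-versus-content distinction concrete, here is a minimal sketch. The file names, the file text, the keyword list, and the SSN-like pattern are all hypothetical illustrations and do not reflect how any particular product works.

```python
import re

# Illustrative pattern for US Social Security number-like strings.
# This is an assumption for the example, not a production-grade detector.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical terms a metadata-only scan might look for in file names.
NAME_TERMS = ("payroll", "ssn")


def metadata_hit(filename: str) -> bool:
    """Metadata-only discovery: flag a file based on its name alone."""
    name = filename.lower()
    return any(term in name for term in NAME_TERMS)


def content_hit(text: str) -> bool:
    """Content discovery: flag a file if its text contains an SSN-like string."""
    return SSN_LIKE.search(text) is not None


# Made-up sample files: name -> extracted text.
files = {
    "payroll_2021.txt": "Summary totals only; no personal identifiers.",
    "meeting_notes.txt": "Follow up with J. Doe, SSN 123-45-6789, on benefits.",
}

for name, text in files.items():
    print(f"{name}: metadata={metadata_hit(name)} content={content_hit(text)}")
```

In this made-up sample the two approaches disagree on both files: the name-based check flags a file with no sensitive content and misses the one that has it. That is the kind of gap worth probing when a vendor describes how discovery works.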
Classification
- Do the solution’s search algorithms incorporate predefined parameters for the Information Management or Governance use case in question? In other words, is there an “out of the box” baseline set of keywords, regular expressions, and patterns that serves your purpose, or is the tool so “flexible” that you must do all the work?
- Can the tool conduct “policy simulation” (testing current compliance) by easily incorporating key terms, taxonomies, and retention periods from existing policies and schedules and evaluating where and how that data is managed?
- What approaches does the tool use to “classify” files and data? What are the NLP (“natural language processing”) techniques employed and why? Does the solution incorporate Machine Learning to discover and classify files, documents, and data that cannot be discerned with NLP techniques (e.g., due to lack of regularly occurring keywords)?
- Can multiple Discovery and Classification techniques be combined in the development of use cases, creating a comprehensive and integrated approach to classifying information?
- Can the user adjust the sensitivity of classification to vary between assuring all potential examples are captured (“recall”) versus finding only results that are highly relevant (“precision”)?
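The recall-versus-precision trade-off in the last question can be illustrated with a small sketch. The scores and reviewer labels below are invented for the example, and a single confidence threshold is only one assumed way a tool might expose "sensitivity."

```python
# Hypothetical classifier output: (confidence score, reviewer says it is relevant).
scored_files = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.55, False), (0.40, True), (0.30, False), (0.10, False),
]


def precision_recall(scored, threshold):
    """Return (precision, recall) when files scoring >= threshold are flagged."""
    tp = sum(1 for score, relevant in scored if score >= threshold and relevant)
    fp = sum(1 for score, relevant in scored if score >= threshold and not relevant)
    fn = sum(1 for score, relevant in scored if score < threshold and relevant)
    # Precision: share of flagged files that are truly relevant.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    # Recall: share of truly relevant files that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


for threshold in (0.35, 0.60, 0.85):
    p, r = precision_recall(scored_files, threshold)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
```

With this sample data, lowering the threshold captures every relevant file but admits more false positives, while raising it returns only relevant files but misses half of them. A tool that lets the user move this dial per use case supports both a broad privacy sweep and a narrow, high-confidence disposal run.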
Governance
- What is the ability of the tool to apply “tags” or “labels” to files, and can they be applied both “on prem” and in the cloud?
- Can the solution act on an identified “class” of data, by copying, moving, archiving, or deleting that data, or are specialized tools (and organization resources) required? If an archive is to be created, can automated “retention” be applied, such that actions such as deletion will be triggered at a later date?
- Will the tool continue to monitor and classify data, automatically identifying information of concern on an ongoing basis and reporting this data for review and governance actions?
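As a rough illustration of automated retention, the sketch below computes when an archived item becomes due for disposition. The record classes and retention periods are hypothetical, and real schedules often start the clock from an event (such as contract end) rather than the archive date assumed here.

```python
from datetime import date

# Hypothetical retention schedule: record class -> retention period in years.
RETENTION_YEARS = {"invoice": 7, "hr_record": 6, "marketing_draft": 1}


def disposition_date(record_class: str, archived_on: date) -> date:
    """Return the date on which the retention period for an item expires."""
    years = RETENTION_YEARS[record_class]
    try:
        return archived_on.replace(year=archived_on.year + years)
    except ValueError:
        # Archived on Feb 29 and the target year is not a leap year: use Feb 28.
        return archived_on.replace(year=archived_on.year + years, day=28)


def is_due(record_class: str, archived_on: date, today: date) -> bool:
    """True once the retention period has fully elapsed."""
    return today >= disposition_date(record_class, archived_on)


print(disposition_date("invoice", date(2020, 1, 15)))
print(is_due("marketing_draft", date(2023, 6, 1), date(2024, 6, 1)))
```

A tool that supports this natively would evaluate something like `is_due` on a schedule and queue the resulting items for review or deletion, rather than requiring a separate script or manual export.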
Reporting
- Are users able to easily see and report consolidated results of multiple searches and classifications applied to large amounts of data?
- Can results be reported for use cases or classes of data that integrate metadata, NLP, and Machine Learning techniques?
- Are visualization tools integrated in the solution or are significant data export, manipulation, and reporting resources and efforts required?
By applying a framework such as the one above to evaluate “core functionality” at the same time as security, cost, and deployment issues are being considered, the Information Management professional can avoid some of the major pitfalls that result in unsuccessful projects. Choosing the right tool, and not buying a “bazooka to kill a mosquito,” is critical to achieving an organization’s objectives.