Data Preparation: Meeting the Challenges of Analytics-Ready Data

Posted by Kelly Schupp, VP, Data-Driven Marketing on Feb 16, 2017 4:54:06 PM

Excerpt from Eckerson Report, Big Data Management Software for the Data-Driven Enterprise, by David Wells, Research Analyst with Eckerson Group.

The once-simple world of data preparation (ETL for operational data integration) has become increasingly complex. Terms such as data wrangling and data blending indicate some of the challenges. The exciting work of analytics cannot go well until the data is ready for meaningful analysis. The scope of big data, the variety of data uses, and the emergence of business-friendly data visualization and analysis tools all contribute to the complexity. Intuitive, non-technical user interfaces combined with machine intelligence minimize complexity, reduce waste and rework, and accelerate the data preparation process. Data preparation tools can shift the balance of analyst time from 80% preparing data and 20% analyzing it to 20% preparing and 80% analyzing.

Data Preparation Tool Users

The driving force of data preparation technology is self-service analytics. The user base, however, reaches beyond those engaged in self-service. Business people and data people alike, non-technical and technical, find value in the technology.

  • Data scientists use data preparation tools to understand data and prepare it to conform to the preferences of specific analytic algorithms.
  • Data analysts, though typically more technical than business analysts, have a similar appreciation for speed and ease of use and rely extensively on data discovery functions.
  • Business analysts appreciate data preparation tools that provide easy-to-use, non-technical capabilities for self-service data preparation.
  • Information workers often need to turn data into information quickly, without deep technical knowledge and without depending on IT projects when new information is needed.
  • IT staff, such as developers and data providers, use fast and easy data discovery and data transformation to accelerate projects, deliver data to the business, and quickly generate test data.
  • Data engineers who design and build data structures such as warehouses and data lakes must have good knowledge of data sources, a need that data discovery supports. Based on those sources, they design target data structures, aided by structural discovery and machine-learning-based suggestions and recommendations.

Data Preparation Functions

The three primary functions of data preparation tools are discovery to understand data content, transformation to reorganize or restructure data, and governance for useful and trusted data. Data discovery processes find meaning, patterns and specific items in a collection of data. Data transformation processes change data to improve, enrich, combine, format, restructure, or otherwise shape and organize data for a specific use. Governance processes perform data validation and data protection functions and manage the metadata essential to know and trace data lineage.

Data Discovery

Data discovery encompasses the activities of exploring datasets to understand and profile content, evaluating and selecting data, and sourcing data for analysis.
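
As a rough illustration of what profiling content can mean in practice, the sketch below summarizes a single column of a small dataset. It is only a minimal example using the Python standard library, not how any particular data preparation tool works; the function name `profile_column` and the sample records are hypothetical.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarize one column: row count, missing values, distinct values, top value."""
    values = [row.get(column) for row in rows]
    # Treat None and empty strings as missing (an assumption for this sketch).
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


# Hypothetical sample records, invented for illustration only.
customers = [
    {"id": 1, "region": "East"},
    {"id": 2, "region": "West"},
    {"id": 3, "region": ""},
    {"id": 4, "region": "East"},
]

print(profile_column(customers, "region"))
```

A profile like this helps an analyst decide whether a column is complete and varied enough to be worth selecting for analysis.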

Data Transformation

Data transformation is the process of changing data to meet specific needs and goals: getting the right data in the right forms and formats for analytics. Data transformation activities are performed to improve data, enrich data, format data, and blend data from multiple sources.
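
The four activities named above can be sketched in a few lines. This is a simplified example under stated assumptions, not a description of any vendor's tool: the field names, the `clean_name` and `blend` helpers, and the sample records are all hypothetical.

```python
def clean_name(raw):
    """Improve and format: collapse stray whitespace and normalize capitalization."""
    return " ".join(raw.split()).title()


def blend(orders, customers):
    """Blend two sources on customer_id and enrich each order with a derived total."""
    by_id = {c["customer_id"]: c for c in customers}
    blended = []
    for order in orders:
        customer = by_id.get(order["customer_id"])
        if customer is None:
            continue  # no matching customer: drop the order (inner-join behavior)
        blended.append({
            "order_id": order["order_id"],
            "customer": clean_name(customer["name"]),
            # Enrich: a value derived from two existing fields.
            "total": round(order["quantity"] * order["unit_price"], 2),
        })
    return blended


# Hypothetical sample data from two separate sources.
customers = [{"customer_id": 10, "name": "  ada   LOVELACE "}]
orders = [
    {"order_id": "A1", "customer_id": 10, "quantity": 3, "unit_price": 2.5},
    {"order_id": "A2", "customer_id": 99, "quantity": 1, "unit_price": 4.0},
]

print(blend(orders, customers))
```

Dropping unmatched orders is one design choice among several; a real preparation flow might instead keep them and flag the missing customer for review.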

Data Governance

Governance functions of data preparation tools focus on data validation, data protection, and managing data lineage and traceability.
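
To make these three concerns concrete, the sketch below validates a record, masks a sensitive field, and attaches simple lineage metadata. It is an illustrative assumption about how such checks might look, not the report's method: the rules, the `_lineage` field, and all function names are hypothetical, and an unsalted hash is used only to keep the example short.

```python
import hashlib


def validate(record):
    """Validation: return a list of rule violations for one record."""
    problems = []
    if "@" not in (record.get("email") or ""):
        problems.append("email is missing or malformed")
    age = record.get("age")
    if age is None or not 0 <= age <= 120:
        problems.append("age is missing or out of range")
    return problems


def mask_email(email):
    """Protection: replace the raw email with a shortened one-way hash."""
    return hashlib.sha256(email.lower().encode("utf-8")).hexdigest()[:12]


def govern(record, source):
    """Return (governed_record, problems); the record is None if validation fails."""
    problems = validate(record)
    if problems:
        return None, problems
    governed = dict(record, email=mask_email(record["email"]))
    # Lineage: record where the data came from and which steps touched it.
    governed["_lineage"] = {"source": source, "steps": ["validate", "mask_email"]}
    return governed, []


# Hypothetical records and source name, invented for illustration only.
good, issues = govern({"email": "Pat@example.com", "age": 41}, source="crm_export")
bad, bad_issues = govern({"email": "nope", "age": 200}, source="crm_export")
print(good)
print(bad_issues)
```

Keeping the lineage alongside the record means a later consumer can trace a value back to its source without consulting a separate system.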

Read the full report here and register for the upcoming webinar, "Everyone is a Stakeholder in a Data-Driven Enterprise," with Kelly Schupp and Dave Wells.