The insights offered by Big Data are key to many businesses today. Getting the information that’s hidden within it isn’t easy but there are plenty of companies set up to help organisations do just that.
Trifacta is one such company. It specialises in cleaning and preparing data so that it is ready to be mined for key information or used to train machine learning algorithms.
One of its key products, Cloud Dataprep, makes use of Google’s large cloud infrastructure footprint, which puts the data preparation tool in the hands of all manner of companies. It launched as a public beta in September 2017, and is already being used by some of the biggest names in healthcare, finance and insurance.
Internet of Business spoke to Sachin Chawla, VP of Engineering at Trifacta, to find out more.
Internet of Business: Trifacta focuses on preparing both structured and unstructured data for analysis. What does ‘preparation’ mean in this context?
Sachin Chawla: “Examples of data preparation include (though this is certainly not an exhaustive list) cleaning and conforming data, extracting specific parts of text fields for analysis, or blending data that comes from multiple sources.
“To ‘Clean’ or ‘Conform’ data means to edit multiple values that mean the same thing. For example, USA, US of A, and United States may appear in a country field. Before doing any reporting, you’d want only one value representing USA. Other examples could include dates, and formats of numeric values – you’d want to align to one format.
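To illustrate the conforming step Chawla describes, here is a minimal Python sketch. It is not Trifacta's method: the alias table and the function name `conform_country` are assumptions invented for this example, and a real dataset would need a far longer list of variants.

```python
# Hypothetical lookup of known variants, keyed by a lower-cased,
# whitespace-normalised form of the raw value.
COUNTRY_ALIASES = {
    "usa": "USA",
    "us of a": "USA",
    "united states": "USA",
}


def conform_country(value: str) -> str:
    """Return the canonical country name, or the trimmed input if unknown."""
    key = " ".join(value.strip().lower().split())
    return COUNTRY_ALIASES.get(key, value.strip())


rows = ["USA", "US of A", " United States ", "Canada"]
conformed = [conform_country(r) for r in rows]
```

Values not in the table pass through unchanged (apart from trimming), so unknown variants stay visible for review rather than being silently rewritten.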
“As an example of extracting relevant values, you might want to do analysis by area code, so would want to pull out the area code in a phone number field. Or you might have log records or XML or JSON data with important information you need to pull out into separate fields.
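The extraction examples could look something like the sketch below. It assumes North American-style phone numbers and a made-up JSON record; the pattern, the field names and the helper `extract_area_code` are illustrative assumptions, not anything taken from Trifacta's product.

```python
import json
import re

# Assumed format: optional +1 prefix, then a 3-digit area code, 3 digits, 4 digits,
# separated by spaces, dots or hyphens, with optional parentheses around the area code.
PHONE_PATTERN = re.compile(
    r"^\s*(?:\+?1[\s.-]*)?\(?(\d{3})\)?[\s.-]*\d{3}[\s.-]*\d{4}\s*$"
)


def extract_area_code(phone: str):
    """Return the 3-digit area code, or None if the value does not match."""
    match = PHONE_PATTERN.match(phone)
    return match.group(1) if match else None


# Pulling a nested value out of a (hypothetical) JSON record into its own field.
record = json.loads('{"user": {"id": 7, "phone": "(415) 555-0123"}}')
flat = {
    "user_id": record["user"]["id"],
    "area_code": extract_area_code(record["user"]["phone"]),
}
```

Returning `None` for values that do not match makes malformed phone numbers easy to count and inspect before any analysis by area code.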
“When it comes to blending, data is often produced in transactional systems or log files that are purpose-built for some business function, but your analytic needs cross functions. For example, you might use Marketo to run marketing campaigns and Salesforce to sell to qualified, marketing-generated leads.
“In order to do any analysis on which campaigns lead to more successful deals, you’d have to blend the marketing data from Marketo with sales data from Salesforce.
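A blend of this kind is essentially a join on a shared key. The sketch below assumes both systems export leads and deals with an email address in common; the records and field names are invented for illustration and do not reflect the real Marketo or Salesforce schemas.

```python
from collections import defaultdict

# Hypothetical exports; every field name and value here is made up.
marketing_leads = [
    {"email": "ana@example.com", "campaign": "Spring Webinar"},
    {"email": "bo@example.com", "campaign": "Spring Webinar"},
    {"email": "cy@example.com", "campaign": "Trade Show"},
]
sales_deals = [
    {"email": "ana@example.com", "stage": "won", "amount": 12000},
    {"email": "cy@example.com", "stage": "lost", "amount": 8000},
    {"email": "bo@example.com", "stage": "won", "amount": 5000},
]


def won_revenue_by_campaign(leads, deals):
    """Join deals to campaigns on email and total the value of won deals."""
    campaign_by_email = {lead["email"].lower(): lead["campaign"] for lead in leads}
    totals = defaultdict(int)
    for deal in deals:
        campaign = campaign_by_email.get(deal["email"].lower())
        if campaign is not None and deal["stage"] == "won":
            totals[campaign] += deal["amount"]
    return dict(totals)


revenue = won_revenue_by_campaign(marketing_leads, sales_deals)
```

Matching on a lower-cased email is a simplification; in practice the join key usually needs its own cleaning pass before the two sources line up.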
“Once the above preparation is done, data is ready to be put into a visualisation tool or sliced and diced to get insights.”
Do you think that the ‘democratisation’ of data preparation and analytics beyond those with technical skills is important?
“Absolutely. Analysts need answers to their questions rapidly in order to compete and operate at the speed of modern businesses. In addition, translating data requirements for those with technical skills, getting the data back, and correcting and iterating upon the output is time-consuming and expensive.
“Most of the time, analysts are doing this work themselves already, but are using antiquated tools that aren’t fit for purpose. Typically, analysts in teams that perform risk modelling, compliance testing and regulatory reporting are good targets.
“One company working on trials in the pharmaceuticals industry cut the time needed to normalise and process lab data from months to hours by moving data preparation to its scientists and analysts, using the Trifacta data preparation platform.”
What level of technical competence do people need in order to get the most from data? How do teams use their individual roles and skills to make data work harder for them?
“People with the best context of the data are the best at deriving value from it. They simply need the right technologies in order to make the data work hard for them.
“Self-service technologies oriented towards the people who have the right business context, but perhaps aren’t the most technically minded, have been on the rise over the past few years.
“Visualisation tools, for example, have allowed analysts to see and understand data with greater clarity. Data preparation platforms are solving the biggest bottleneck in the analytic process by allowing analysts to clean and prepare data for analysis themselves.
“Meanwhile, organisations typically have only a small number of technical resources, whereas knowledge workers abound. By leveraging these people, organisations can truly get the most from their data at scale.”
What is the ideal composition of a data team for the modern organisation?
“A sound data strategy requires organisations to consider how to appropriately leverage the different skills of their team. In our experience in deploying data preparation platforms, we’ve found that successful organisations leverage some combination of data analysts, data engineers, data scientists, data architects, and analytics executives.
“Data analysts used to be accountable only for reporting against data, but increasingly they are also expected to prepare and cleanse it. With the rise of new data preparation solutions, data scientists and IT organisations are no longer completing data preparation on behalf of analysts.
“Instead, these solutions have empowered analysts to own the entire analytics process end-to-end. Analysts are the front line of an organisation’s analytic efforts, and the number of them preparing data will continue to grow, as long as organisations have the right people overseeing this work, so that others in the organisation can leverage it.
“Data engineers play a growing and critical role in tying business and data preparation processes together. Data engineers are no longer just devoted to architecting databases and developing ETL processes.
“Organisations have recognised that data engineers’ unique combination of technical skills and data know-how allows them to empower their more business-focused colleagues by helping them streamline and automate data-related processes.
“Meanwhile, data architects decide how data (and the tools that access it) will be configured, integrated, scaled, and governed across different organisations. Their broad interests mean they have a direct and important stake in any business project that uses data owned or touched by IT.
“Analytics initiatives need the buy-in of data architects to succeed, since they typically both govern and control the data that analysts and other stakeholders will use in these projects.
“The efforts of the data analysts, scientists, engineers, and architects are meant to fuel insights for analytics leaders. It is these insights that help to determine business strategy, new ventures, and the future growth of the organisation.
“In today’s data-driven world, they are essential. The more insights that analytics leaders have, the better armed they are to make the right decisions.”