Artificial intelligence (AI) is one of the biggest technology trends of the coming decade. In an increasingly digital world, propagating and collecting data is the default state of modern business and of all internet activity. The problem for businesses is no longer a lack of data but an excess of it. Despite the enormous volume of data available to industrial companies, most find that their AI systems are not delivering the insights they expected. The solution lies in filtering data so that the right data gets to AI systems. This “smart data” approach allows AI systems to generate the kind of insights that companies have been expecting.

What is Smart Data?

AI is a key component of the fourth digital revolution. AI unearths insights from Big Data that no human being could possibly find. The more data AI has, the more variables that data covers, the longer its timescales and the finer its granularity, the greater the potential insights.

AI can draw on years of data to find the optimal settings for the variables that control an industrial process. These insights can then be fed back into the process so that it runs better than it did before.

Despite the promise of AI, many industrial companies are yet to see the benefits of propagating and collecting so much information. According to McKinsey, although 75% of industrial companies have tried some kind of AI system, only 15% have enjoyed any meaningful, scalable impact from it. McKinsey attributes this to AI being applied with little operational insight. This approach can be successful, but usually only within very specific parameters, often with frequent retraining and lots of inputs, and it sometimes leads to nonphysical or unrealistic results. Such AI models cannot really be used in the real world or deliver the kind of meaningful change that their users expect. What you get are teams that become frustrated with the system and lose faith in AI.

Smart data is the solution. For big data to deliver the insights expected of it, the data needs fewer variables, chosen through feature engineering based on first principles. Re-engineering the data into smart data, combined with more appropriate training, can lead to superior returns of between 5% and 15%.

Smart data has been defined in a number of ways, but the essential idea is that data is prepared and organized at the point of collection so that it is ready and optimized for analytics of higher quality, speed and insight.

At a 2018 conference, Donna Roy, then executive director of the U.S. Department of Homeland Security’s Information Sharing and Services Office, said her “teams spend about 80% of their time just searching, ingesting, and getting data ready for analysis”. The smart data approach has helped federal agencies optimize their processes, speed up their operations, and make them more intelligent. As Wired put it, “Smart data means information that actually makes sense”.

How Do You Generate Smart Data?

Get your Energize the Data! t-shirt out and let’s look at five steps to creating smart data. 

  1. Define the Data

The first step toward creating smart data is defining the process. This means breaking the process down into clearly outlined steps with the company’s plant engineers and experts, with physical and chemical changes sketched out. The business’s critical instruments and sensors must be identified, along with their limits, maintenance timeframes, measurement units, and controllability. Physical systems contain deterministic elements governed by clear equations. These equations and their variables must be noted. Teams should also study the literature around these equations to deepen their own understanding.
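One way to make this inventory concrete is to record each instrument's metadata in a small structure. The sketch below is only an illustration: the field names, tag names, limits, and maintenance intervals are all hypothetical and would come from your own plant documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorTag:
    """Metadata for one instrument. Every field name here is illustrative."""
    name: str
    unit: str
    low_limit: float
    high_limit: float
    controllable: bool              # can operators set this value directly?
    maintenance_interval_days: int

    def in_range(self, value: float) -> bool:
        """True when a reading lies within the instrument's stated limits."""
        return self.low_limit <= value <= self.high_limit


# Hypothetical tags for an imaginary process
TAGS = {
    "feed_mass_flow": SensorTag("feed_mass_flow", "kg/s", 0.0, 50.0, True, 90),
    "outlet_temp": SensorTag("outlet_temp", "degC", 20.0, 400.0, False, 180),
}


def controllable_tags(tags):
    """Names of the tags that operators can actually adjust, sorted."""
    return sorted(name for name, tag in tags.items() if tag.controllable)
```

Keeping controllability next to units and limits makes it easy, later on, to restrict model recommendations to variables that operators can actually change.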

  2. Enrich the Data

We’ve all heard the expression, “Bad data in, bad data out”, but in reality all data is in some sense bad data: raw process data always has some deficiencies. So your task is to improve the quality of the dataset rather than to increase the amount of data available. Non-steady-state information, such as readings taken during start-up, shutdown, or transitions between operating points, must be weeded out aggressively.
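As a rough illustration of weeding out non-steady-state data, the sketch below flags a reading as steady only when the readings in a short trailing window barely vary. The window length, the threshold, and the sample readings are assumptions chosen for the example, not recommended values.

```python
from statistics import pstdev


def steady_state_mask(values, window=5, max_std=0.5):
    """Return one boolean per sample: True when the trailing window is steady.

    A sample counts as steady when the population standard deviation of the
    last `window` readings (including itself) is at most `max_std`. The first
    window - 1 samples are marked False because they lack enough history.
    """
    mask = []
    for i in range(len(values)):
        if i + 1 < window:
            mask.append(False)
            continue
        recent = values[i + 1 - window : i + 1]
        mask.append(pstdev(recent) <= max_std)
    return mask


# Made-up readings: a steady period, a ramp, then a new steady period
readings = [100.0, 100.1, 99.9, 100.0, 100.1, 120.0,
            140.0, 140.1, 139.9, 140.0, 140.1, 140.0]
mask = steady_state_mask(readings, window=4, max_std=0.5)
steady = [v for v, keep in zip(readings, mask) if keep]
```

Here the readings taken during the ramp from about 100 to about 140 are dropped, along with the first few samples of each period while the window fills with steady values.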

  3. Reduce the Dimensionality

AI builds models by matching observables (the outputs) to features (the inputs). To get a generalized model, the number of observations must far exceed the number of features. A common approach is to combine inputs to generate new features; given the wealth of sensors in a typical plant, this multiplies the feature count and demands a vast number of observations. A better approach is to keep only the inputs that describe the physical processes involved and funnel them through deterministic equations, which reduces dimensionality while creating features that combine sensor information intelligently.
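To illustrate, suppose a plant measures both mass flow and volumetric flow. Dividing one by the other gives density, a single physically meaningful feature that replaces two raw inputs. The choice of density, the column names, and the sample values are assumptions made for this sketch.

```python
def density_feature(mass_flow_kg_s, volumetric_flow_m3_s):
    """Combine two sensor readings into one physical feature.

    density (kg/m^3) = mass flow (kg/s) / volumetric flow (m^3/s)
    """
    if volumetric_flow_m3_s <= 0:
        raise ValueError("volumetric flow must be positive")
    return mass_flow_kg_s / volumetric_flow_m3_s


# Made-up raw rows with three inputs each
rows = [
    {"mass_flow": 10.0, "vol_flow": 0.01, "temp": 80.0},
    {"mass_flow": 12.0, "vol_flow": 0.012, "temp": 82.0},
]

# Engineered rows with two features each: density replaces both flow readings
features = [
    {"density": density_feature(r["mass_flow"], r["vol_flow"]), "temp": r["temp"]}
    for r in rows
]
```

Each row goes from three raw inputs to two features, so the same number of observations now supports the model better.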

  4. Apply Machine Learning

Industrial processes have deterministic and stochastic components. First-principles-based features supply the deterministic components, and machine learning captures the stochastic ones. Features should be evaluated to assess their importance and explanatory power. Ideally, the most important features should be the expert-engineered ones.

Models should focus on plant improvements rather than on maximum predictive accuracy. High correlations are a feature of all process data, so a correlation on its own can be meaningless. What is needed is to isolate the causal elements and the controllable variables.
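One simple way to picture the split between deterministic and stochastic components is a hybrid model: a first-principles equation makes the base prediction, and a learned term corrects whatever the equation misses. The sketch below assumes an energy-balance equation, a straight-line correction on ambient temperature, and synthetic samples; all of these are illustrative choices, not a prescribed method.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def physics_estimate(heat_input_kw, mass_flow_kg_s, cp_kj_per_kg_k=4.18):
    """Deterministic part: temperature rise from an energy balance, dT = Q / (m * cp)."""
    return heat_input_kw / (mass_flow_kg_s * cp_kj_per_kg_k)


def fit_hybrid(samples):
    """Fit the learned correction on what the physics leaves unexplained.

    samples: list of (heat_input_kw, mass_flow_kg_s, ambient_temp_c, measured_dT).
    """
    residuals = [dT - physics_estimate(q, m) for q, m, _, dT in samples]
    ambient = [a for _, _, a, _ in samples]
    slope, intercept = fit_line(ambient, residuals)

    def predict(q, m, a):
        return physics_estimate(q, m) + slope * a + intercept

    return predict, slope, intercept


# Synthetic samples for illustration only:
# measured rise = physics estimate + 0.1 * ambient - 1.0
samples = [
    (418.0, 10.0, ambient, physics_estimate(418.0, 10.0) + 0.1 * ambient - 1.0)
    for ambient in (10.0, 20.0, 30.0)
]
predict, slope, intercept = fit_hybrid(samples)
```

Because the learned part only has to explain the residual, it needs far fewer inputs than a model that must rediscover the energy balance from raw data.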

  5. Implement and Validate Models

To deliver the meaningful impact that is expected, models must actually be implemented. Results need to be continuously assessed by examining key features to check that they match the physical processes. Partial dependence plots must also be reviewed to learn about causality and to confirm which elements are controllable.
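For readers unfamiliar with partial dependence, the idea can be sketched in a few lines: force one feature to each value on a grid, leave the other features as observed, and average the model's predictions. The toy model and data below are made up for the example; a real review would use the plant's fitted model.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average model prediction as `feature` is forced to each grid value.

    predict: callable taking a dict of feature values and returning a number.
    rows: list of dicts holding the observed feature values.
    Returns a list of (grid_value, average_prediction) pairs.
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)        # copy so the observed data is untouched
            modified[feature] = value
            total += predict(modified)
        curve.append((value, total / len(rows)))
    return curve


def toy_model(row):
    """Stand-in for a fitted model (purely illustrative)."""
    return 2.0 * row["density"] + 0.5 * row["temp"]


rows = [{"density": 1.0, "temp": 10.0}, {"density": 2.0, "temp": 30.0}]
curve = partial_dependence(toy_model, rows, "density", [0.0, 1.0, 2.0])
```

If the resulting curve moves in a direction that contradicts the known physics of the process, that is a signal to revisit the features before trusting the model.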

Operations teams must be consulted and made a critical part of the process, so that everyone understands what is implementable and what performance expectations make sense. Operators in control rooms need to receive model results as they are generated, or teams must conduct on-off testing so that management can determine whether it is worth investing capital in full-scale solutions.
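An on-off test can be summarized by comparing a key performance indicator (KPI) between periods when the model's recommendations were followed and periods when they were not. The sketch below reports the difference in means with a Welch-style standard error; the KPI values are invented for illustration, and a real test would also need to control for feed quality, ambient conditions, and similar factors.

```python
from math import sqrt
from statistics import mean, stdev


def on_off_effect(kpi_on, kpi_off):
    """Difference in mean KPI (on minus off) and its Welch standard error."""
    diff = mean(kpi_on) - mean(kpi_off)
    se = sqrt(stdev(kpi_on) ** 2 / len(kpi_on) + stdev(kpi_off) ** 2 / len(kpi_off))
    return diff, se


# Invented KPI readings, e.g. throughput per shift
kpi_on = [102.0, 104.0, 103.0, 105.0]
kpi_off = [100.0, 99.0, 101.0, 100.0]
diff, se = on_off_effect(kpi_on, kpi_off)
```

A difference that is several standard errors away from zero gives management a stronger basis for a full-scale investment than a difference that is within the noise.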

Conclusion

AI has enormous promise, and with the wealth of data that is propagated and collected today, it seems counterintuitive to suggest that limits or guardrails need to be placed around that data. Yet Big Data often fails to yield meaningful AI insights. Smart data can help AI deliver the meaningful impact that we expect.