Visualizations Created from Machine Learning Predictions

In addition to adding the calculated column to the data table, the application creates the following templates for you:

Model Analysis

When the calculations are complete, Analytics Explorer opens on the Model Analysis tab. This tab has two charts:

Prediction Correlation—shows the correlation between the original data and the predicted data.
Importance per Variable—shows the importance of each input variable in the building of the model.
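As a rough illustration of what a prediction correlation measures, the sketch below computes the Pearson correlation between original and predicted values. This is an assumption for illustration only: the documentation does not state which correlation measure the application uses, and the function name and sample values are hypothetical.

```python
import math

def pearson_correlation(actual, predicted):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_a = sum(actual) / len(actual)
    mean_p = sum(predicted) / len(predicted)
    # Sum of products of deviations from the mean (unnormalized covariance)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_p)

# Hypothetical original and predicted values
original = [10.0, 12.0, 15.0, 11.0, 18.0]
predicted = [10.5, 11.8, 14.2, 11.9, 17.1]
r = pearson_correlation(original, predicted)
```

A value of r close to 1 means the predictions track the original data closely.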

Input Distributions

Displays a histogram for each input variable showing the number of rows of input data for each value in the variable.
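The counts behind such a histogram can be sketched as a simple tally of rows per value. This is a minimal example, assuming a column with discrete values; the column contents are hypothetical, and a continuous variable would first need to be grouped into bins.

```python
from collections import Counter

def value_counts(column):
    """Return {value: number_of_rows}, ordered by value."""
    counts = Counter(column)
    return dict(sorted(counts.items()))

# Hypothetical input column with one entry per row of input data
stage_counts = value_counts([20, 25, 20, 30, 25, 20])
```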

Predicted Optimal Input Summary

Displays a box chart for each input column showing key statistical measures: the Q1, median, and Q3 values. The actual values are listed below each chart.
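The three measures shown on a box chart can be reproduced with the Python standard library. This is a sketch only: quartile definitions vary between tools, and the "inclusive" method used here is an assumption rather than the application's documented method. The sample values are hypothetical.

```python
import statistics

def box_summary(values):
    """Return the Q1, median, and Q3 of a numeric column."""
    # n=4 splits the data into quartiles and returns the three cut points
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3}

# Hypothetical input column
summary = box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
```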

Note that each template produces its own visualizations tailored to specific workflows. Online resources and templates include the following:

Predicted Optimal Input Distributions

The predicted optimal distributions start from an output column that you want to maximize (for example, crude output over a well’s first 12 months). If you choose to perform the feature analysis, the algorithm gives you a numeric range (shown on the box chart) and a visual distribution for each important variable (defined as being inside the top 90% of total importance). These distributions show the predicted input values that lead to the highest output values for the selected label column.
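One way to read the 90% rule is as a cumulative cutoff: rank the variables by importance and keep them until their combined share reaches 90% of the total. The sketch below follows that reading. It is an assumption, not the application's documented selection logic, and the variable names and scores are hypothetical.

```python
def select_important(importances, threshold=0.9):
    """Return the variable names whose cumulative share of importance
    first reaches the threshold, ranked from most to least important."""
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("total importance must be positive")
    selected = []
    cumulative = 0.0
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    for name, score in ranked:
        if cumulative >= threshold:
            break  # the variables kept so far already cover the threshold
        selected.append(name)
        cumulative += score / total
    return selected

# Hypothetical importance scores per input variable
scores = {"proppant": 0.5, "lateral_length": 0.3, "stages": 0.15, "fluid": 0.05}
important = select_important(scores)
```

With these scores, the first three variables together cover 95% of the total importance, so the fourth is left out.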

The numeric range is useful at a glance, while the distributions show how strong the relationship is. The tighter the distribution, the smaller the confidence interval, and the more confident you can be that those values will produce high output values. If the distribution is very wide or multimodal, the underlying relationship might not be as strong.
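A simple way to compare how tight two distributions are is the interquartile range (Q3 minus Q1). The sketch below uses it as a rough stand-in for spread. This is an illustrative assumption, not the measure the application reports, and it does not detect multimodal distributions. The sample values are hypothetical.

```python
import statistics

def interquartile_range(values):
    """Return Q3 - Q1 as a rough measure of how tight a distribution is."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical predicted optimal values for two input variables
tight = [9.8, 9.9, 10.0, 10.1, 10.2]
wide = [2.0, 6.0, 10.0, 14.0, 18.0]

# A smaller interquartile range suggests a stronger, more reliable relationship
tight_is_narrower = interquartile_range(tight) < interquartile_range(wide)
```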