Auto Imputation

Auto imputation is the process of automatically filling in missing or incomplete values in a dataset. Auto imputation techniques use statistical and machine learning algorithms that analyze the patterns and relationships in the available data to infer the missing values. Analytics Explorer uses the XGBoost algorithm to impute the data.
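The idea of model-based imputation can be illustrated with a minimal sketch: fit a model on the rows where the target value is present, then predict the target for the rows where it is missing. The sketch below is not the Analytics Explorer implementation. To stay dependency-free it swaps XGBoost for a one-feature least-squares line, and the column names (col_a, col_b) and function names are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        # All x values are identical: fall back to predicting the mean.
        return y_bar, 0.0
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar, slope

def impute(rows, target, support):
    """Return copies of rows with missing `target` values predicted from `support`.

    Stand-in for the real model: a straight line instead of XGBoost.
    """
    # Train only on rows where both the target and the supporting value exist.
    known = [r for r in rows if r[target] is not None and r[support] is not None]
    if len(known) < 2:
        return [dict(r) for r in rows]  # not enough data to fit anything
    intercept, slope = fit_line([r[support] for r in known],
                                [r[target] for r in known])
    filled = []
    for r in rows:
        r = dict(r)  # leave the input rows untouched
        if r[target] is None and r[support] is not None:
            r[target] = intercept + slope * r[support]
        filled.append(r)
    return filled

# Hypothetical table: col_b has one missing value, col_a is the supporting column.
rows = [
    {"col_a": 1000.0, "col_b": 0.20},
    {"col_a": 2000.0, "col_b": 0.15},
    {"col_a": 3000.0, "col_b": None},
    {"col_a": 4000.0, "col_b": 0.05},
]
result = impute(rows, target="col_b", support="col_a")
```

A gradient-boosted model such as XGBoost plays the same role as fit_line here, but can use many supporting columns at once and capture non-linear relationships.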

The Auto Imputation function generates the following visualizations for the selected table:

  • Auto-Imputed Selected Attributes

  • Data comparison

  • Missing Value Map

  • Auto-Imputed Results

Before running auto imputation, know your data. The column you are predicting needs to contain at least two values. The percentage of valid data for each table is listed in the right column.

To run auto imputation:

  1. In Spotfire, load the data table or dxp file: File > Open.

  2. Still in Spotfire, select Analytics Explorer > Data Preparation > Auto Imputation.

  3. The Auto Imputation dialog box has 3 sections:

    • Selected data table - displays the selected table name

    • Columns to auto-impute - a list of the attributes selected for imputation

    • Supporting columns with high data density - columns that will be used to train the model.

Select the columns that you want to auto-impute and the supporting high data density columns to use in the auto-imputation process. The supporting data is used to train the machine learning models. Note that auto imputation runs only on numeric data columns with a data density greater than 25%.
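To check in advance which columns are likely to qualify, the rules stated above can be sketched in a few lines of Python. This is an illustration only, not how Analytics Explorer evaluates columns. It assumes that data density means the fraction of non-missing values in a column and that "at least two values" means two distinct values. The table contents and function names are hypothetical.

```python
def data_density(values):
    """Fraction of entries that are present (not None). Assumed definition."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)

def is_numeric(values):
    """True if the column has at least one value and every present value is a number."""
    present = [v for v in values if v is not None]
    return bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )

def eligible_columns(table, min_density=0.25, min_distinct=2):
    """Names of numeric columns with density above min_density and enough distinct values."""
    names = []
    for name, values in table.items():
        distinct = {v for v in values if v is not None}
        if (is_numeric(values)
                and data_density(values) > min_density  # strictly greater than 25%
                and len(distinct) >= min_distinct):
            names.append(name)
    return names

# Hypothetical table, stored as column name -> list of values.
table = {
    "col_a": [1.0, 2.0, None, 4.0],    # numeric, density 0.75
    "col_b": [None, None, None, 5.0],  # density 0.25, not greater than 25%
    "col_c": ["x", "y", None, "z"],    # not numeric
    "col_d": [7.0, 7.0, 7.0, None],    # only one distinct value
}
eligible = eligible_columns(table)
```

In this example only col_a passes all three checks.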