Cluster Analysis
Cluster Analysis, or clustering, is the process of dividing a population or set of data points into groups so that data points in the same group have similar values for user-selected attributes or characteristics. For example, this cluster analysis workflow can identify areas on a map that are related to the best-producing wells. This can in turn help you propose new wells on an exploratory basis and interpret and analyze geological facies.
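Cluster Analysis currently offers one algorithm, the Self-Organizing Map (SOM). To illustrate how a SOM places data points with similar feature values in the same map cell, here is a minimal NumPy sketch. It is not the product's implementation: the function name `train_som`, the rectangular grid (whose `rows` and `cols` are assumed to play a role similar to the Map dimension setting), and the linearly decaying learning rate and Gaussian neighborhood are all assumptions chosen for the example.

```python
import numpy as np


def train_som(data, rows=3, cols=3, epochs=200, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM (hypothetical sketch).

    data: (n_samples, n_features) array, ideally normalized per feature.
    Returns (weights, cell_ids): one prototype per map cell, and the
    id of the best-matching cell for every sample.
    """
    rng = np.random.default_rng(seed)
    n, _ = data.shape
    # One prototype (weight vector) per map cell, initialized from random samples.
    weights = data[rng.integers(0, n, rows * cols)].astype(float)
    # Grid coordinates of each cell, used for neighborhood distance on the map.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)

    for t in range(epochs):
        # Linearly shrink the learning rate and neighborhood radius.
        frac = t / epochs
        lr = lr0 * (1.0 - frac)
        sigma = sigma0 * (1.0 - frac) + 0.01
        for x in data[rng.permutation(n)]:
            # Best-matching unit (BMU): the cell whose prototype is closest to x.
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            # Gaussian neighborhood around the BMU on the 2-D grid.
            dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-dist2 / (2.0 * sigma**2))
            # Pull the BMU and its grid neighbors toward x.
            weights += lr * h[:, None] * (x - weights)

    # Assign every sample to the id of its best-matching cell.
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return weights, np.argmin(d2, axis=1)


if __name__ == "__main__":
    # Two well-separated groups of 2-feature points (synthetic data).
    rng = np.random.default_rng(1)
    data = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
    weights, cell_ids = train_som(data, epochs=50)
    print("cells used by group A:", sorted(set(cell_ids[:50].tolist())))
    print("cells used by group B:", sorted(set(cell_ids[50:].tolist())))
```

Because every feature contributes to the Euclidean distance used to pick the best-matching cell, features on very different scales should usually be normalized first; otherwise the feature with the largest range dominates which cell a data point maps to.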
To run Cluster Analysis:
- In Spotfire, with your data loaded, go to Analytics Explorer > Machine Learning > Cluster Analysis.
- In the Cluster Analysis dialog box, select the data table to analyze and the features to include in the analysis. To limit the analysis to marked rows, check Limit data using markings. By default, all data in the selected table is used.
Other settings include the Algorithm (currently SOM), the Algorithm Settings, and the Visualization Settings. See Cluster Analysis dialog box for details.
- The only algorithm currently available is Self-Organizing Map (SOM). The default algorithm settings were selected and tested by our data analytics team, but you can adjust them if required. See Algorithm Settings for a brief description of each setting. Mastering the settings is beyond the scope of this document.
- Specify the visualization settings: the attributes for the X and Y axes of the feature scatter plot. If the table contains X and Y columns, the application uses them by default. You can also change the axis attributes on the chart itself after the analysis has run.
- (Optional) Save your selections as a template to load for additional runs. The template includes the data table, input attributes, and all settings.
- Click Run. Cluster Analysis produces the following charts:
- Star Chart: each star represents one cluster and shows the relative value of each feature in that cluster. The longer a leg of the star, the larger the relative value of that feature. Stars are matched to the scatter plot by cell ID. The XY values in the chart reflect the range determined by the Map dimension specified in the Algorithm Settings. Highlight one or more clusters to view the corresponding cells and bars in the other visualizations.
- Self-organizing map: each cell represents a particular cluster. The cell color corresponds with the cell color on the feature scatter plot. Each cell on the scatter plot is mapped to a cluster, which is indicated by its color.
- Item properties: this chart shows the pattern of the selected cells, which helps you understand the underlying relationships between associated data points. The data points are normalized to better highlight the patterns.
- Feature scatter plot: each cell in the scatter plot is mapped by cell ID to a cluster on the self-organizing map and the star chart. The colors in the scatter plot do not match the colors of the cells in the self-organizing map. The X and Y axes are determined by the Visualization Settings specified in the Cluster Analysis dialog box.
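The Star Chart shows the relative value of each feature per cluster, and the Item properties chart shows normalized data points. This page does not say how those values are computed, so the following NumPy sketch assumes one common choice: average the member rows of each cluster, then min-max scale each feature across clusters so that its largest cluster value gets leg length 1 and its smallest gets 0. The function names `cluster_profiles` and `star_leg_lengths` are hypothetical, and the product may normalize differently.

```python
import numpy as np


def cluster_profiles(data, cell_ids):
    """Return the sorted cluster ids and the mean feature vector of each cluster.

    data: (n_samples, n_features); cell_ids: the cluster/cell id of each sample.
    """
    data = np.asarray(data, dtype=float)
    cell_ids = np.asarray(cell_ids)
    labels = np.unique(cell_ids)
    # One row per cluster: the average of that cluster's member rows.
    means = np.vstack([data[cell_ids == c].mean(axis=0) for c in labels])
    return labels, means


def star_leg_lengths(profiles):
    """Min-max scale each feature (column) to [0, 1] across rows.

    Assumed normalization: a feature that is constant across rows
    gets length 0 everywhere instead of dividing by zero.
    """
    p = np.asarray(profiles, dtype=float)
    lo = p.min(axis=0)
    span = p.max(axis=0) - lo
    return (p - lo) / np.where(span > 0, span, 1.0)


if __name__ == "__main__":
    # Hypothetical rows with two features on very different scales.
    rows = [[10.0, 100.0], [12.0, 100.0], [20.0, 300.0], [22.0, 300.0]]
    labels, means = cluster_profiles(rows, [0, 0, 1, 1])
    print(labels)
    print(means)
    print(star_leg_lengths(means))
```

In this example cluster 1 has the larger value of both features, so it gets full-length legs for both and cluster 0 gets zero-length legs, even though the two features differ in scale by an order of magnitude. Passing raw rows instead of cluster means to `star_leg_lengths` gives one possible form of the normalized per-row patterns described for the Item properties chart.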
For a visual tour, see the Golden horizon example.