Fault diagnosis is an important component of process monitoring
and is relevant to the broader goal of developing safer, cleaner
and more cost-efficient industrial processes. Data-driven (feature
extraction) approaches to fault diagnosis exploit the many
measurements available on modern plants. Some current feature
extraction approaches are limited by their linearity assumptions,
which motivates the investigation of nonlinear methods. This work
examines the use of random forests in fault diagnosis frameworks.
Random forests are a recently proposed statistical learning tool
that derives its predictive accuracy from the nonlinearity of its
constituent decision trees and from aggregating many such trees
into an ensemble.
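The two ingredients of a random forest, nonlinear decision trees and ensemble aggregation, can be illustrated with a minimal pure-Python sketch. It follows the usual recipe of bootstrap samples, a random subset of features at each split, and majority voting. It assumes Gini-impurity splits and fully grown trees. All function names are hypothetical, and this is not the implementation used in this work.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Return (weighted_gini, feature, threshold) of the best split over `features`, or None."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        # Candidate thresholds lie midway between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def grow_tree(X, y, max_features, rng):
    """Grow an unpruned tree; each split searches a random subset of features."""
    if len(set(y)) == 1:
        return y[0]  # pure leaf
    n_features = len(X[0])
    split = best_split(X, y, rng.sample(range(n_features), max_features))
    if split is None:  # sampled features are constant in this node: try all features
        split = best_split(X, y, range(n_features))
    if split is None:  # identical inputs with mixed labels: majority leaf
        return Counter(y).most_common(1)[0][0]
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], max_features, rng),
            grow_tree([X[i] for i in right], [y[i] for i in right], max_features, rng))


def predict_tree(node, x):
    """Follow the splits (f, t, left, right) down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Each tree is grown on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], max_features, rng))
    return forest


def predict_forest(forest, x):
    """Majority vote over the trees of the ensemble."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]


# XOR-like pattern on an 8 x 8 grid: no single linear boundary separates the classes.
X = [(a, b) for a in range(8) for b in range(8)]
y = [int((a < 4) != (b < 4)) for a, b in X]
forest = fit_forest(X, y)
train_accuracy = sum(predict_forest(forest, x) == yi for x, yi in zip(X, y)) / len(X)
```

The XOR-like grid shows why the nonlinearity of the individual trees matters. Axis-aligned splits can carve out the four quadrants, which no single linear boundary can do. For real plant data, a library implementation such as scikit-learn's `RandomForestClassifier` would be the practical choice.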
Random forest committees provide more than just predictions: the
proximity between two samples, i.e. the fraction of trees in which
they reach the same terminal node, can be exploited to construct
random forest features. Variable importance measures show which
variables are closely associated with a chosen response variable,
while partial dependencies indicate how these important variables
relate to that response.
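Random forest proximities can be computed from terminal-node assignments alone. This minimal sketch assumes Breiman's definition: the proximity of two samples is the fraction of trees in which they land in the same terminal node. The input format `leaves` (one terminal-node id per sample and tree) is an assumption. scikit-learn's `RandomForestClassifier.apply` returns a matrix of this shape. How the proximities are turned into features is not specified here. Embedding the dissimilarity `1 - P` with multidimensional scaling is one common option and is not shown.

```python
def proximity_matrix(leaves):
    """leaves[i][t] is the terminal-node id that sample i reaches in tree t.

    Returns P with P[i][j] = fraction of trees in which samples i and j share a leaf.
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Node ids are compared tree by tree, so ids only need to be unique per tree.
            shared = sum(a == b for a, b in zip(leaves[i], leaves[j]))
            P[i][j] = P[j][i] = shared / n_trees
    return P


def dissimilarity(P):
    """Convert proximities into a distance-like matrix (0 on the diagonal)."""
    return [[1.0 - p for p in row] for row in P]


# Hypothetical leaf assignments: 3 samples passed through 3 trees.
leaves = [[0, 3, 1],
          [0, 3, 2],
          [4, 5, 2]]
P = proximity_matrix(leaves)
D = dissimilarity(P)
```

Samples 0 and 1 share a leaf in two of the three trees, so their proximity is 2/3. Samples 0 and 2 never share a leaf, so theirs is 0.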
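Variable importance and partial dependence can both be written against a fitted model's prediction function. This sketch assumes two standard formulations. The first is permutation importance, the drop in accuracy after shuffling one variable. Breiman's original random forest measure computes this on out-of-bag samples, whereas here it uses whatever data is passed in. The second is partial dependence in Friedman's sense: the model output averaged over the data while one variable is held at a fixed value. The `predict` callable and the toy model are hypothetical stand-ins for a trained forest.

```python
import random


def accuracy(predict, X, y):
    """Fraction of samples whose prediction matches the label."""
    return sum(predict(x) == yi for x, yi in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when one column is shuffled, breaking its link to y."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for f in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[f] for row in X]
            rng.shuffle(column)
            X_perm = []
            for row, v in zip(X, column):
                new_row = list(row)
                new_row[f] = v  # only variable f is permuted
                X_perm.append(new_row)
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def partial_dependence(predict, X, feature, grid):
    """Average model output with `feature` fixed at each grid value for every sample."""
    curve = []
    for value in grid:
        total = 0.0
        for row in X:
            modified = list(row)
            modified[feature] = value
            total += predict(modified)
        curve.append(total / len(X))
    return curve


# Hypothetical toy model whose output depends only on variable 0.
def toy_model(x):
    return int(x[0] > 0.5)


X = [(i / 10, j / 10) for i in range(10) for j in range(10)]
y = [toy_model(x) for x in X]
importances = permutation_importance(toy_model, X, y)
pd_x0 = partial_dependence(toy_model, X, 0, [0.0, 1.0])
pd_x1 = partial_dependence(toy_model, X, 1, [0.0, 1.0])
```

Shuffling variable 1 leaves every prediction unchanged, so its importance is exactly zero and its partial dependence curve is flat. Variable 0 drives the response, and its curve steps from 0 to 1.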