TY - JOUR
T1 - A comprehensive comparison of two variable importance analysis techniques in high dimensions
T2 - Application to an environmental multi-indicators system
AU - Wei, Pengfei
AU - Lu, Zhenzhou
AU - Song, Jingwen
N1 - Publisher Copyright: © 2015 Elsevier Ltd.
PY - 2015/8/1
Y1 - 2015/8/1
N2 - The permutation variable importance measure (PVIM) based on random forest and Morris' screening design are two effective techniques for measuring variable importance in high dimensions. The former was developed in the machine learning discipline and is widely used in bioinformatics, while the latter is popular in scientific computing. We present three main contributions to variable importance analysis (VIA). First, through theoretical derivation, we show that the PVIM converges to twice the non-standardized Sobol' total effect index. This result indicates that the PVIM is especially useful for variable screening, as it captures both individual and interaction effects. Second, three numerical examples with different types of model behavior are presented to compare the performance of the two techniques. The main conclusions are as follows. For high-dimensional additive or approximately additive models, the PVIM is much more efficient than Morris' screening design for both variable importance ranking and variable screening. For high-dimensional models mainly governed by interaction effects, the performance of the PVIM degrades, but it remains a competitive technique. Finally, the two techniques are applied to an environmental multi-indicators system to improve the robustness of the partial order structure of this system.
KW - High-dimensional model
KW - Morris' screening design
KW - Permutation variable importance measure
KW - Random forest
UR - http://www.scopus.com/inward/record.url?scp=84930204692&partnerID=8YFLogxK
U2 - 10.1016/j.envsoft.2015.04.015
DO - 10.1016/j.envsoft.2015.04.015
M3 - Article
AN - SCOPUS:84930204692
SN - 1364-8152
VL - 70
SP - 178
EP - 190
JO - Environmental Modelling and Software
JF - Environmental Modelling and Software
ER -