Doctoral Degrees (School of Public Leadership)
- Label-dependent splitting for multi-label data (Stellenbosch : Stellenbosch University, 2023-12) Muller, Annegret; Steel, S. J.; Sandrock, T.; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.
ENGLISH SUMMARY: Multi-label classification problems arise in scenarios where each data case can be associated with multiple labels simultaneously. Compared to single-label data, multi-label data possess unique characteristics that pose additional challenges for analysis. This dissertation addresses two of these challenges. The first is the exploitation of label correlations to achieve accurate classification of unseen data cases. The second is input variable ranking within multi-label data, which allows for more interpretable results.
Effective exploitation of correlations amongst labels can be a vital attribute of an accurate multi-label classification method. However, label correlations are not necessarily shared globally by all data cases. Despite this, existing methods mostly focus on global exploitation of label correlations. This dissertation therefore proposes a new tree-based ensemble method for multi-label classification, Label-Dependent splitting (LDsplit). LDsplit aims to implicitly exploit local higher-order label correlations within multi-label data by dividing the data into subgroups. The algorithm fits an ensemble of trees based on differently ordered label subsets. In each tree, different labels are used at different levels, as determined by the label order assigned to that tree. Each tree level consists of nodes that are split using any binary classifier. Since a tree level depends both on its own label and on the splits made when its parent nodes were formed using other labels, higher-order label correlations are incorporated into the model implicitly and in a simple manner.
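The tree structure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the dissertation's implementation: a nearest-centroid rule stands in for "any binary classifier", cases are routed to child nodes by the parent node's predicted label (the summary does not say whether true or predicted labels are used for routing), each tree receives a random label order, and the ensemble combines trees by a per-label majority vote. The names `NearestCentroid`, `LabelLevelTree`, `fit_random_ldsplit` and `predict_ensemble` are hypothetical.

```python
import random
from statistics import mean


class NearestCentroid:
    """Hypothetical stand-in for 'any binary classifier' used to split a node."""

    def fit(self, X, y):
        pos = [x for x, t in zip(X, y) if t == 1]
        neg = [x for x, t in zip(X, y) if t == 0]
        # If a node holds only one class (or no cases), predict a constant.
        self.constant = None if (pos and neg) else (1 if pos else 0)
        if self.constant is None:
            self.c1 = [mean(col) for col in zip(*pos)]
            self.c0 = [mean(col) for col in zip(*neg)]
        return self

    def predict_one(self, x):
        if self.constant is not None:
            return self.constant
        d1 = sum((a - b) ** 2 for a, b in zip(x, self.c1))
        d0 = sum((a - b) ** 2 for a, b in zip(x, self.c0))
        return 1 if d1 < d0 else 0


class LabelLevelTree:
    """Level d splits on label order[d]; deeper levels only see the subgroup
    of cases routed to them by the splits above (local label dependence)."""

    def __init__(self, order):
        self.order = order

    def _build(self, X, Y, depth):
        j = self.order[depth]
        clf = NearestCentroid().fit(X, [y[j] for y in Y])
        node = {"label": j, "clf": clf, "children": {}}
        if depth + 1 < len(self.order):
            for branch in (0, 1):
                # Assumption: cases are routed by the node classifier's prediction.
                idx = [i for i, x in enumerate(X) if clf.predict_one(x) == branch]
                node["children"][branch] = self._build(
                    [X[i] for i in idx], [Y[i] for i in idx], depth + 1)
        return node

    def fit(self, X, Y):
        self.root = self._build(X, Y, 0)
        return self

    def predict_one(self, x, n_labels):
        pred = [0] * n_labels
        node = self.root
        while node is not None:
            branch = node["clf"].predict_one(x)
            pred[node["label"]] = branch
            node = node["children"].get(branch)
        return pred


def fit_random_ldsplit(X, Y, n_trees=10, subset_size=None, seed=0):
    """Each tree gets its own random order over a random subset of the labels."""
    rng = random.Random(seed)
    q = len(Y[0])
    k = subset_size or q
    return [LabelLevelTree(rng.sample(range(q), k)).fit(X, Y) for _ in range(n_trees)]


def predict_ensemble(trees, x, n_labels):
    """Majority vote per label over the trees whose order contains that label."""
    votes = [[] for _ in range(n_labels)]
    for tree in trees:
        pred = tree.predict_one(x, n_labels)
        for j in tree.order:
            votes[j].append(pred[j])
    return [1 if v and mean(v) >= 0.5 else 0 for v in votes]


# Toy data: label 0 is "x0 > 0", label 1 is "x1 > 0".
X = [[-2, -2], [-1.5, -2.5], [-2, 2], [-2.5, 1.5],
     [2, -2], [1.5, -2.5], [2, 2], [2.5, 1.5]]
Y = [[int(x[0] > 0), int(x[1] > 0)] for x in X]
trees = fit_random_ldsplit(X, Y, n_trees=10, seed=0)
print(predict_ensemble(trees, [3, 3], 2), predict_ensemble(trees, [3, -3], 2))
```

Because the classifier at a deeper level is fit only on the cases routed to its node, its decision for one label is conditioned on the earlier splits for other labels; this is the sense in which the sketch captures local, higher-order dependence without modelling it explicitly.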
Depending on whether random or predetermined label orders are used to fit the ensemble, the resulting method is either Random LDsplit or Conditional LDsplit. An extensive empirical study is performed on a range of multi-label benchmark datasets. The empirical evidence shows that, despite their simple framework, both Random LDsplit and Conditional LDsplit offer very competitive classification performance compared with existing multi-label classification methods.
For multi-label data, an input variable is globally important if it is deemed important for several or all labels. However, an input variable can also be locally important for a specific label. Few proposals for input variable ranking within multi-label data consider both the global and the local importance of variables. Moreover, existing methods mostly neglect to exploit label dependencies within the data. Therefore, several ways are outlined in which an LDsplit ensemble can produce global and local input variable rankings, allowing for better interpretation of the data. Results obtained on synthetically generated multi-label datasets demonstrate that both the novel global and the novel local importance measures perform favourably.
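The summary does not give the form of the LDsplit-based importance measures, so the sketch below does not reproduce them. Purely to illustrate the distinction between local and global importance, it swaps in a different, well-known technique: permutation importance computed separately for each label (local), then averaged over labels (global). It works with any multi-label predictor passed in as a function, and unlike the measures described above it does not exploit label dependencies. The helper names, the averaging rule and the toy predictor are all hypothetical.

```python
import random
from statistics import mean


def label_accuracy(predict, X, Y, j):
    """Accuracy of the multi-label predictor on label j alone."""
    return mean(1.0 if predict(x)[j] == y[j] else 0.0 for x, y in zip(X, Y))


def local_importance(predict, X, Y, n_repeats=5, seed=0):
    """imp[v][j] = mean drop in label-j accuracy after permuting variable v."""
    rng = random.Random(seed)
    p, q = len(X[0]), len(Y[0])
    base = [label_accuracy(predict, X, Y, j) for j in range(q)]
    imp = [[0.0] * q for _ in range(p)]
    for v in range(p):
        for _ in range(n_repeats):
            col = [x[v] for x in X]
            rng.shuffle(col)  # break the link between variable v and the labels
            Xp = [x[:v] + [c] + x[v + 1:] for x, c in zip(X, col)]
            for j in range(q):
                imp[v][j] += (base[j] - label_accuracy(predict, Xp, Y, j)) / n_repeats
    return imp


def global_importance(imp):
    """Hypothetical aggregation: average each variable's local scores over labels."""
    return [mean(row) for row in imp]


def ranking(scores):
    """Variable indices ordered from most to least important."""
    return sorted(range(len(scores)), key=lambda v: scores[v], reverse=True)


# Toy predictor (hypothetical): label 0 uses x0, label 1 uses x1, x2 is noise.
def toy_predict(x):
    return [int(x[0] > 0), int(x[1] > 0)]


data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(40)]
Y = [toy_predict(x) for x in X]
imp = local_importance(toy_predict, X, Y)
print("local ranking, label 0:", ranking([row[0] for row in imp]))
print("global ranking:", ranking(global_importance(imp)))
```

In this toy setting, x0 is locally important only for label 0 and x1 only for label 1, yet both appear near the top of the global ranking, while the noise variable x2 scores zero everywhere. This is the kind of distinction a ranking that reports only global importance would hide.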