Doctoral Degrees (Statistics and Actuarial Science)
Browsing Doctoral Degrees (Statistics and Actuarial Science) by browse.metadata.advisor "Steel, S. J."
Now showing 1 - 7 of 7
- ItemEdgeworth-corrected small-sample confidence intervals for ratio parameters in linear regression(Stellenbosch : Stellenbosch University, 2002-03) Binyavanga, Kamanzi-wa; Maritz, J. S.; Steel, S. J.; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistical and Actuarial Science. ENGLISH ABSTRACT: In this thesis we construct a central confidence interval for a smooth scalar non-linear function of the parameter vector β in a single general linear regression model Y = Xβ + ε. We do this by first developing an Edgeworth expansion for the distribution function of a standardised point estimator; the confidence interval is then constructed from this expansion. Simulation studies reported at the end of the thesis show the interval to perform well in many small-sample situations. Central to the development of the Edgeworth expansion is our use of the index notation which, in statistics, has been popularised by McCullagh (1984, 1987). The contributions made in this thesis are of two kinds. First, we revisit the complex McCullagh index notation, modify and extend it in certain respects, and repackage it in a manner that is more accessible to other researchers. Second, in addition to introducing a new small-sample confidence interval, we extend the theory of stochastic polynomials (SP) in three respects: a method, which we believe to be the simplest and most transparent to date, is proposed for deriving their cumulants; the theory of the cumulants of SPs is developed both in the context of Edgeworth expansion and in the regression setting; and our new method enables us to propose a natural alternative to the method of Hall (1992a, 1992b) regarding skewness reduction in Edgeworth expansions.
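For orientation, the generic one-term Edgeworth expansion that such corrections start from is the standard textbook form below (the thesis itself derives regression-specific higher-order terms in index notation; this is only the familiar first correction):

```latex
% One-term Edgeworth expansion for a standardised estimator T_n,
% with \kappa_3 the (standardised) third cumulant:
P(T_n \le x) = \Phi(x) \;-\; \phi(x)\,\frac{\kappa_3\,(x^2 - 1)}{6\sqrt{n}} \;+\; O(n^{-1})
```

Here Φ and φ are the standard normal distribution and density functions; an Edgeworth-corrected confidence interval is obtained by inverting this expansion rather than the plain normal approximation Φ.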
- ItemFeature selection for multi-label classification(Stellenbosch : Stellenbosch University, 2020-12) Contardo-Berning, Ivona E.; Steel, S. J.; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Economics. ENGLISH ABSTRACT: The field of multi-label learning is a popular new research focus. In the multi-label setting, a data instance can be associated simultaneously with a set of labels instead of only a single label. This dissertation reviews the subject of multi-label classification, emphasising some of the notable developments in the field. The nature of multi-label datasets typically means that these datasets are complex, and dimensionality reduction might aid in their analysis. The notion of feature selection is therefore introduced and discussed briefly in this dissertation. A new procedure for multi-label feature selection is proposed. This new procedure, relevance pattern feature selection (RPFS), utilises the methodology of the graphical technique of Multiple Correspondence Analysis (MCA) biplots to perform feature selection. An empirical evaluation of the proposed technique is performed using a benchmark multi-label dataset and synthetic multi-label datasets. For the benchmark dataset it is shown that the proposed procedure achieves results similar to the full model, while using significantly fewer features. The empirical evaluation on the synthetic datasets shows that, for the majority of the methods, the results achieved with the reduced sets of features are better than those achieved with the full set of features. The proposed procedure is then compared to two established multi-label feature selection techniques using the synthetic datasets. The results again show that the proposed procedure is effective.
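Comparing a reduced feature set against the full model, as in the evaluation above, requires a multi-label performance measure. A standard choice is Hamming loss, the fraction of instance-label pairs predicted incorrectly; a minimal sketch (illustrative only, not the evaluation protocol used in the dissertation):

```python
import numpy as np

def hamming_loss(Y_true, Y_pred):
    """Fraction of instance-label pairs predicted incorrectly.

    Y_true, Y_pred: binary arrays of shape (n_instances, n_labels).
    """
    Y_true = np.asarray(Y_true)
    Y_pred = np.asarray(Y_pred)
    return float(np.mean(Y_true != Y_pred))

# Toy illustration: 3 instances, 4 labels.
Y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0],
                   [1, 1, 0, 1]])
Y_pred = np.array([[1, 0, 0, 0],   # one label wrong
                   [0, 1, 0, 0],   # all labels correct
                   [1, 0, 0, 1]])  # one label wrong
print(hamming_loss(Y_true, Y_pred))  # 2 mismatched pairs out of 12
```

Because every label of every instance contributes equally, a reduced feature set "achieving results similar to the full model" can be stated precisely as the two Hamming losses being close.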
- ItemA framework for estimating risk(Stellenbosch : Stellenbosch University, 2008-03) Kroon, Rodney Stephen; Steel, S. J.; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. We consider the problem of model assessment by risk estimation. Various approaches to risk estimation are considered in a unified framework. This framework is an extension of a decision-theoretic framework proposed by David Haussler. Point and interval estimation based on test samples and training samples is discussed, with interval estimators being classified based on the measure of deviation they attempt to bound. The main contribution of this thesis is in the realm of training sample interval estimators, particularly covering number-based and PAC-Bayesian interval estimators. The thesis discusses a number of approaches to obtaining such estimators. The first type of training sample interval estimator to receive attention is estimators based on classical covering number arguments. A number of these estimators were generalized in various directions. Typical generalizations included: extension of results from misclassification loss to other loss functions; extending results to allow arbitrary ghost sample size; extending results to allow arbitrary scale in the relevant covering numbers; and extending results to allow arbitrary choices in the use of symmetrization lemmas. These extensions were applied to covering number-based estimators for various measures of deviation, as well as for the special cases of misclassification loss estimators, realizable case estimators, and margin bounds. Extended results were also provided for stratification by (algorithm- and data-dependent) complexity of the decision class. In order to facilitate application of these covering number-based bounds, a discussion of various complexity dimensions and approaches to obtaining bounds on covering numbers is also presented. The second type of training sample interval estimator discussed in the thesis is Rademacher bounds. These bounds use advanced concentration inequalities, so a chapter discussing such inequalities is provided. Our discussion of Rademacher bounds leads to the presentation of an alternative, slightly stronger, form of the core result used for deriving local Rademacher bounds, by avoiding a few unnecessary relaxations. Next, we turn to a discussion of PAC-Bayesian bounds. Using an approach developed by Olivier Catoni, we develop new PAC-Bayesian bounds based on results underlying Hoeffding's inequality. By utilizing Catoni's concept of "exchangeable priors", these results allowed the extension of a covering number-based result to averaging classifiers, as well as its corresponding algorithm- and data-dependent result. The last contribution of the thesis is the development of a more flexible shell decomposition bound: by using Hoeffding's tail inequality rather than Hoeffding's relative entropy inequality, we extended the bound to general loss functions, allowed the use of an arbitrary number of bins, and introduced between-bin and within-bin "priors". Finally, to illustrate the calculation of these bounds, we applied some of them to the UCI spam classification problem, using decision trees and boosted stumps.
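Several of the results described in this abstract build on Hoeffding's inequality; for reference, its standard tail form for i.i.d. losses bounded in [0, 1] is:

```latex
% Hoeffding's tail inequality for i.i.d. losses L_i \in [0, 1]:
P\!\left(\frac{1}{n}\sum_{i=1}^{n} L_i \;-\; \mathbb{E}[L] \;\ge\; t\right) \;\le\; \exp(-2nt^2)
```

Inverting this at confidence level 1 − δ yields the familiar deviation term of order √(log(1/δ)/(2n)), the baseline that covering number-based, Rademacher, and PAC-Bayesian training sample interval estimators refine.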
- ItemInfluential data cases when the C-p criterion is used for variable selection in multiple linear regression(Stellenbosch : Stellenbosch University, 2003) Uys, Daniel Wilhelm; Steel, S. J.; Van Vuuren, J. O.; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistical and Actuarial Science. ENGLISH ABSTRACT: In this dissertation we study the influence of data cases when the Cp criterion of Mallows (1973) is used for variable selection in multiple linear regression. The influence is investigated in terms of the predictive power of, and the predictor variables included in, the model resulting from variable selection. In particular, we focus on the importance of identifying and dealing with these so-called selection influential data cases before model selection and fitting are performed. For this purpose we develop two new selection influence measures, both based on the Cp criterion. The first measure is specifically developed to identify individual selection influential data cases, whereas the second identifies subsets of selection influential data cases. The success with which these influence measures identify selection influential data cases is evaluated on example data sets and in simulation. All results are derived in the coordinate-free context, with special application to multiple linear regression.
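The Cp criterion underlying both influence measures is straightforward to compute. A minimal sketch of subset scoring with Mallows' Cp = SSE_p/σ̂² − n + 2p, with σ̂² taken from the full model as usual (this illustrates only the selection criterion, not the influence diagnostics developed in the dissertation; the data are synthetic):

```python
import numpy as np
from itertools import combinations

def mallows_cp(X, y, subset, sigma2_full):
    """Mallows' Cp = SSE_p / sigma2_full - n + 2p for an intercept-plus-subset model."""
    n = len(y)
    Xs = np.column_stack([np.ones(n), X[:, list(subset)]])
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ beta
    p = Xs.shape[1]  # number of fitted parameters, intercept included
    return float(resid @ resid) / sigma2_full - n + 2 * p

rng = np.random.default_rng(0)
n, k = 50, 4
X = rng.normal(size=(n, k))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)  # only features 0 and 2 matter

# sigma^2 estimated from the full model, as Mallows (1973) prescribes.
Xf = np.column_stack([np.ones(n), X])
beta_f, *_ = np.linalg.lstsq(Xf, y, rcond=None)
sigma2_full = float((y - Xf @ beta_f) @ (y - Xf @ beta_f)) / (n - Xf.shape[1])

# Score every non-empty subset; for the full model Cp equals p exactly,
# and good subsets have Cp close to their own p.
scores = {s: mallows_cp(X, y, s, sigma2_full)
          for r in range(1, k + 1) for s in combinations(range(k), r)}
best = min(scores, key=scores.get)
```

A selection influential case, in the sense studied by the dissertation, is one whose removal changes which subset minimises these Cp scores; recomputing `best` with each case deleted in turn is the naive version of that check.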
- ItemMulti-label feature selection with application to musical instrument recognition(Stellenbosch : Stellenbosch University, 2013-12) Sandrock, Trudie; Steel, S. J.; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. ENGLISH ABSTRACT: An area of data mining and statistics that is currently receiving considerable attention is the field of multi-label learning. Problems in this field arise in scenarios where each data case can be associated with a set of labels instead of only one. In this thesis, we review the field of multi-label learning and discuss the lack of suitable benchmark data available for evaluating multi-label algorithms. We propose a technique for simulating multi-label data, which allows good control over different data characteristics and which could be useful for conducting comparative studies in the multi-label field. We also discuss the explosion in data in recent years, and highlight the need for some form of dimension reduction in order to alleviate some of the challenges of working with large datasets. Feature (or variable) selection is one way of achieving dimension reduction, and after a brief discussion of different feature selection techniques, we propose a new technique for feature selection in a multi-label context, based on the concept of independent probes. This technique is empirically evaluated using simulated multi-label data, and it is shown that a reduced set of features can achieve classification accuracy similar to that achieved with the full set of features. The proposed technique for feature selection is then also applied to the field of music information retrieval (MIR), specifically the problem of musical instrument recognition. An overview of the field of MIR is given, with particular emphasis on the instrument recognition problem.
The particular goal of (polyphonic) musical instrument recognition is to automatically identify the instruments playing simultaneously in an audio clip, which is not a simple task. We specifically consider the case of duets – in other words, where two instruments are playing simultaneously – and approach the problem as a multi-label classification one. In our empirical study, we illustrate the complexity of musical instrument data and again show that our proposed feature selection technique is effective in identifying relevant features and thereby reducing the complexity of the dataset without negatively impacting on performance.
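The independent-probes idea mentioned in this abstract can be illustrated in a stripped-down univariate form: append deliberately irrelevant "probe" features (permuted copies of real columns) and keep only the features whose relevance score beats the strongest probe. This is a hypothetical simplification using plain correlation as the score, not the thesis' actual criterion:

```python
import numpy as np

def probe_select(X, Y, n_probes=20, rng=None):
    """Keep features whose max |correlation| with any label exceeds the
    best score of any 'probe' (a permuted, hence irrelevant, feature).

    X: (n, p) feature matrix; Y: (n, L) binary label matrix.
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape

    def score(col):
        # max absolute correlation of one feature column across all labels
        return max(abs(np.corrcoef(col, Y[:, l])[0, 1]) for l in range(Y.shape[1]))

    real = np.array([score(X[:, j]) for j in range(p)])
    probes = np.array([score(rng.permutation(X[:, rng.integers(p)]))
                       for _ in range(n_probes)])
    return np.flatnonzero(real > probes.max())

rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 10))
# Two labels, each driven by one informative feature (0 and 3); rest is noise.
Y = np.column_stack([(X[:, 0] + 0.3 * rng.normal(size=n) > 0).astype(int),
                     (X[:, 3] + 0.3 * rng.normal(size=n) > 0).astype(int)])
selected = probe_select(X, Y, rng=2)
```

The permuted probes calibrate, from the data themselves, how large a score an irrelevant feature can reach by chance, so no externally chosen threshold is needed.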
- ItemRegularised Gaussian belief propagation(Stellenbosch : Stellenbosch University, 2018-12) Kamper, Francois; Steel, S. J.; Du Preez, J. A.; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. ENGLISH SUMMARY: Belief propagation (BP) has been applied as an approximation tool in a variety of inference problems. BP does not necessarily converge in loopy graphs and, even if it does, is not guaranteed to provide exact inference. Even so, BP is useful in many applications due to its computational tractability. At a high level this dissertation is concerned with addressing the issues that arise when BP is applied to loopy graphs. To address these issues we formulate the principle of node regularisation on Markov graphs (MGs) within the context of BP. The main contribution of this dissertation is to provide mathematical and empirical evidence that the principle of node regularisation can achieve convergence and good inference quality when applied to an MG constructed from a Gaussian distribution in canonical parameterisation. There is a rich literature surrounding BP on Gaussian MGs (labelled Gaussian belief propagation or GaBP), which is known to suffer from the same problems as general BP on graphs. GaBP is known to provide the correct marginal means if it converges (this is not guaranteed), but it does not provide the exact marginal precisions. We show that our regularised BP will, with sufficient tuning, always converge while maintaining the exact marginal means. This is true for a graph where nodes are allowed to have any number of variables. The selection of the degree of regularisation is addressed through the use of heuristics. Our variant of GaBP is tested empirically in a variety of settings. We show that our method outperforms other variants of GaBP available in the literature, both in terms of convergence speed and quality of inference.
These improvements suggest that the principle of node regularisation in BP should be investigated in other inference problems. A by-product of GaBP is that it can be used to solve linear systems of equations; the same is true for our variant and in this context we make an empirical comparison with the conjugate gradient (CG) method.
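The linear-system use of GaBP noted above is easy to demonstrate: for p(x) ∝ exp(−½xᵀAx + bᵀx), the marginal means are the solution of Ax = b, and at convergence GaBP recovers them exactly. A minimal scalar-node solver with the standard synchronous updates (the plain algorithm, not the regularised variant developed in the dissertation; convergence is only guaranteed here because the example matrix is diagonally dominant):

```python
import numpy as np

def gabp_solve(A, b, iters=100):
    """Scalar Gaussian belief propagation for A x = b, A symmetric.

    The message i->j carries a precision P[i, j] and a linear term z[i, j];
    at convergence the marginal means equal the exact solution of A x = b.
    """
    n = len(b)
    P = np.zeros((n, n))   # message precisions
    z = np.zeros((n, n))   # message linear terms
    for _ in range(iters):
        P_new, z_new = np.zeros_like(P), np.zeros_like(z)
        for i in range(n):
            for j in range(n):
                if i == j or A[i, j] == 0:
                    continue
                # node i's local factor plus all incoming messages except j's
                alpha = A[i, i] + P[:, i].sum() - P[j, i]
                beta = b[i] + z[:, i].sum() - z[j, i]
                P_new[i, j] = -A[i, j] ** 2 / alpha
                z_new[i, j] = -A[i, j] * beta / alpha
        P, z = P_new, z_new
    # marginal means from the final beliefs
    return np.array([(b[i] + z[:, i].sum()) / (A[i, i] + P[:, i].sum())
                     for i in range(n)])

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 5.0, 2.0],
              [0.0, 2.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
x = gabp_solve(A, b)
```

On loopy graphs that are not diagonally dominant this iteration can fail to converge, which is exactly the failure mode the dissertation's node regularisation is designed to remove while preserving the exact means.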
- ItemVariable selection for kernel methods with application to binary classification(Stellenbosch : University of Stellenbosch, 2008-03) Oosthuizen, Surette; Steel, S. J.; University of Stellenbosch. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. The problem of variable selection in binary kernel classification is addressed in this thesis. Kernel methods are fairly recent additions to the statistical toolbox, having originated approximately two decades ago in machine learning and artificial intelligence. These methods are growing in popularity and are already frequently applied in regression and classification problems. Variable selection is an important step in many statistical applications: it yields a better understanding of the problem being investigated, and subsequent analyses of the data frequently give more accurate results once irrelevant variables have been eliminated. It is therefore clearly important to investigate aspects of variable selection for kernel methods. Chapter 2 of the thesis is an introduction to the main part, presented in Chapters 3 to 6. In Chapter 2 some general background material on kernel methods is first provided, along with an introduction to variable selection. Empirical evidence is presented substantiating the claim that variable selection is a worthwhile enterprise in kernel classification problems. Several aspects which complicate variable selection in kernel methods are discussed. An important property of kernel methods is that the original data are effectively transformed before a classification algorithm is applied to them. The space in which the original data reside is called input space, while the transformed data occupy part of a feature space. In Chapter 3 we investigate whether variable selection should be performed in input space or rather in feature space. A new approach to selection, so-called feature-to-input space selection, is also proposed.
This approach has the attractive property of combining information generated in feature space with easy interpretation in input space. An empirical study reveals that effective variable selection requires utilisation of at least some information from feature space. Having confirmed in Chapter 3 that variable selection should preferably be done in feature space, the focus in Chapter 4 is on two classes of selection criteria operating in feature space: criteria which are independent of the specific kernel classification algorithm and criteria which depend on this algorithm. In this regard we concentrate on two kernel classifiers, viz. support vector machines and kernel Fisher discriminant analysis, both of which are described in some detail in Chapter 4. The chapter closes with a simulation study showing that two of the algorithm-independent criteria are very competitive with the more sophisticated algorithm-dependent ones. In Chapter 5 we incorporate a specific strategy for searching through the space of variable subsets into our investigation. Evidence in the literature strongly suggests that backward elimination is preferable to forward selection in this regard, and we therefore focus on recursive feature elimination. Zero- and first-order forms of the new selection criteria proposed earlier in the thesis are presented for use in recursive feature elimination, and their properties are investigated in a numerical study. It is found that some of the simpler zero-order criteria perform better than the more complicated first-order ones. Up to the end of Chapter 5 it is assumed that the number of variables to select is known. We do away with this restriction in Chapter 6 and propose a simple criterion which uses the data to identify this number when a support vector machine is used. The proposed criterion is investigated in a simulation study and compared to cross-validation, which can also be used for this purpose. We find that the proposed criterion performs well.
The thesis concludes in Chapter 7 with a summary and several suggestions for further research.
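The backward-elimination strategy this abstract settles on, recursive feature elimination, can be sketched generically: fit a model, drop the feature with the smallest weight magnitude, and repeat. The sketch below uses a ridge-regularised linear scorer on ±1 labels as a stand-in ranking criterion (the thesis works with kernel classifiers and its own zero-/first-order criteria, which are not reproduced here; the data are synthetic):

```python
import numpy as np

def rfe(X, y, n_keep, lam=1e-2):
    """Recursive feature elimination: repeatedly fit a ridge-regularised
    linear scorer and drop the active feature with the smallest absolute
    weight, until n_keep features remain.
    """
    active = list(range(X.shape[1]))
    while len(active) > n_keep:
        Xa = X[:, active]
        # ridge solution w = (Xa'Xa + lam I)^{-1} Xa'y
        w = np.linalg.solve(Xa.T @ Xa + lam * np.eye(len(active)), Xa.T @ y)
        active.pop(int(np.argmin(np.abs(w))))
    return active

rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 8))
# +/-1 labels driven by features 1 and 6 only
y = np.sign(1.5 * X[:, 1] - 2.0 * X[:, 6] + 0.5 * rng.normal(size=n))
kept = rfe(X, y, n_keep=2)
```

Refitting after every removal is what distinguishes recursive elimination from one-shot ranking: a feature whose weight was masked by a correlated competitor can recover importance once that competitor is gone. The `n_keep` argument is exactly the "number of variables to select" that Chapter 6 of the thesis proposes to estimate from the data.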