Masters Degrees (Statistics and Actuarial Science)
Browsing Masters Degrees (Statistics and Actuarial Science) by Advisor "Hofmeyr, David"
- Improving hyperplane based density clustering solutions with applications in image processing (Stellenbosch : Stellenbosch University, 2019-04) Kenyon, Jacob Bradley; Hofmeyr, David; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. ENGLISH SUMMARY: Minimum Density Hyperplane (MDH) clustering is a recently proposed method that seeks the location of an optimal low-density separator by directly minimising the integral of the empirical density function on the separating surface. This approach learns underlying clusters within the data in an efficient and scalable way using projection pursuit. The main limitation of MDH is that it defines clusters using a linear hyperplane. In recent research, MDH was applied to data that had been non-linearly embedded in a high-dimensional feature space using Kernel Principal Component Analysis. While this method has been shown to be an effective way of extending the linear hyperplane to a non-linear form, it does not scale well. A procedure is needed that can improve the hyperplane solution efficiently. We propose a novel approach to improve upon MDH by reassigning observations in a neighbourhood around a hyperplane solution using a gradient ascent procedure, Mean Shift. While Mean Shift is shown to provide promising results, the computation required to reassign observations becomes prohibitive as the size of the dataset increases. To reduce computation, a single-step gradient heuristic is proposed whereby observations are reassigned based on the initial gradient evaluated at each point in relation to the hyperplane. This study critically reviews the validity of these approaches through applications to simulated and real-world datasets, with a focus on image segmentation. We show that these approaches have the potential to improve hyperplane solutions.
- Projected naive bayes (Stellenbosch : Stellenbosch University, 2020-03) Melonas, Michail C.; Hofmeyr, David; Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. ENGLISH SUMMARY: Naïve Bayes is a well-known statistical model that is recognised by the Institute of Electrical and Electronics Engineers (IEEE) as being among the top ten data mining algorithms. It performs classification by making the strong assumption of class conditional mutual statistical independence. Although this assumption is unlikely to be an accurate representation of the true statistical dependencies, naïve Bayes nevertheless delivers accurate classification in many domains. This success can be related to that of linear regression providing reliable estimation in problems where exact linearity is not realistic. There is a rich body of literature on the topic of improving naïve Bayes. This dissertation is concerned with doing so via a projection matrix that provides an alternative representation for the data of interest. We introduce Projected Gaussian naïve Bayes and Projected Kernel naïve Bayes as naïve-Bayes-type classifiers that rely on Gaussianity and kernel density estimation, respectively. The proposed method extends the flexibility of standard naïve Bayes while maintaining its simplicity and efficiency and improving its accuracy. Our method is shown to be competitive with several popular classifiers on real-world data. In particular, its classification accuracy is compared to that of linear and quadratic discriminant analysis, the support vector machine and the random forest. There is a close connection between our proposal and the application of naïve Bayes to a class conditionally conducted independent component analysis. In addition to improving classification accuracy, the proposed method provides a tool for visually representing data in low-dimensional space. This visualisation aspect of our method is discussed with respect to the connection to independent component analysis. Our method is shown to give a better visual representation than linear discriminant analysis on a number of real-world datasets.
- Validation of independent components using a hypothesis testing approach (Stellenbosch : Stellenbosch University, 2020-12) De Koker, Corine; Hofmeyr, David; Bakker, Hans-Peter. ENGLISH ABSTRACT: The main focus of this thesis is the validation of Independent Component Analysis (ICA), a popular technique used in signal processing. In a typical application, the purpose of ICA is to extract non-Gaussian signals representing the source signals from observed signals that are mixtures of the source signals, in the case where the source signals are unavailable or unknown. This thesis only considers the FastICA implementation of ICA in the case where the number of source signals is equal to the number of mixture signals, and where any additive noise can be neglected. The FastICA algorithm extracts non-Gaussian signals through the maximisation of negentropy. The more non-Gaussian the source signals, the more closely the signals extracted using FastICA represent the source signals. Amongst other things, this thesis demonstrates a novel approach using hypothesis testing with negentropy as a test statistic to determine the degree of non-Gaussianity of the source signals. The results of this test were compared to the results of a second hypothesis test, which uses a measure suggested by Himberg et al. (2004) that measures the compactness of the clusters of estimates of ICA components. The clustering visualisation methods proposed by Himberg et al. (2004) were also applied in this thesis and provided visual support for the results of the hypothesis tests. Both hypothesis tests were performed on three different datasets. The first dataset contained mixtures of only non-Gaussian signals. The second dataset contained mixtures of three non-Gaussian and three Gaussian signals, while the third dataset contained mixtures of only Gaussian signals. When applied to the first dataset, both hypothesis tests rejected the null hypothesis that each of the source signals contained in the dataset is Gaussian, which is in line with our expectations. The results of both hypothesis tests indicated the presence of three Gaussian and three non-Gaussian source signals in the second dataset. For the third dataset, both hypothesis tests rejected about 5% of the signals extracted by the FastICA algorithm, which was as expected since a significance level of 5% was used. Therefore, our results provide evidence that hypothesis testing could potentially be used as an alternative method to indicate the degree of non-Gaussianity of mixtures of source signals. Key words: ICA; Hypothesis testing; non-Gaussianity
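The idea in the last abstract of using negentropy as a test statistic for non-Gaussianity can be illustrated with a minimal sketch. This is not the procedure from the thesis, whose details are not given in the abstract. It assumes the log cosh approximation to negentropy commonly used with FastICA, J(y) ≈ (E[G(y)] − E[G(ν)])² with G(u) = log cosh(u) and ν standard normal, and it assumes a Monte Carlo null distribution obtained by simulating Gaussian signals of the same length. The function names `negentropy_logcosh` and `gaussianity_test` and all parameter defaults are hypothetical.

```python
import numpy as np


def negentropy_logcosh(y, gauss_ref):
    """Approximate negentropy of y as (E[G(y)] - E[G(v)])^2 with G = log cosh.

    y is standardised first; gauss_ref is E[G(v)] for v ~ N(0, 1).
    """
    y = (y - y.mean()) / y.std()
    # log cosh(y) written as logaddexp(y, -y) - log(2) for numerical stability
    g = np.logaddexp(y, -y) - np.log(2.0)
    return (g.mean() - gauss_ref) ** 2


def gaussianity_test(signal, n_sim=2000, alpha=0.05, seed=0):
    """Monte Carlo test of H0: the signal is Gaussian (assumed setup).

    Returns the observed statistic, the p-value and the reject decision.
    """
    rng = np.random.default_rng(seed)
    n = signal.shape[0]

    # E[log cosh(v)] for v ~ N(0, 1), estimated from a large Gaussian sample
    v = rng.standard_normal(1_000_000)
    gauss_ref = np.mean(np.logaddexp(v, -v) - np.log(2.0))

    observed = negentropy_logcosh(signal, gauss_ref)

    # Null distribution: the same statistic on simulated Gaussian signals
    null = np.array([
        negentropy_logcosh(rng.standard_normal(n), gauss_ref)
        for _ in range(n_sim)
    ])

    # One-sided p-value with the usual +1 correction for Monte Carlo tests
    p_value = (1 + np.sum(null >= observed)) / (n_sim + 1)
    return observed, p_value, p_value < alpha


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    for name, x in [("laplace", rng.laplace(size=5000)),
                    ("gaussian", rng.standard_normal(5000))]:
        stat, p, reject = gaussianity_test(x, n_sim=500)
        print(f"{name}: statistic={stat:.6f}, p={p:.4f}, reject H0={reject}")
```

In an ICA setting one would apply such a test to each extracted component in turn. At a 5% significance level, roughly 5% of truly Gaussian components would be expected to be rejected by chance, which matches the behaviour the abstract reports for the all-Gaussian dataset.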