Exploratory and inferential multivariate statistical techniques for multidimensional count and binary data with applications in R

dc.contributor.advisorLamont, M. M. C.en_ZA
dc.contributor.authorNtushelo, Nombasa Sherolineen_ZA
dc.contributor.otherStellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science.en_ZA
dc.date.accessioned2011-12-01T12:34:36Zen_ZA
dc.date.accessioned2011-12-05T13:14:10Z
dc.date.available2011-12-01T12:34:36Zen_ZA
dc.date.available2011-12-05T13:14:10Z
dc.date.issued2011-12en_ZA
dc.descriptionThesis (MComm)--Stellenbosch University, 2011.en_ZA
dc.description.abstractENGLISH ABSTRACT: The analysis of multidimensional (multivariate) data sets is a very important area of research in applied statistics. Over the decades many techniques have been developed to deal with such datasets. The multivariate techniques that have been developed include inferential analysis, regression analysis, discriminant analysis, cluster analysis and many more exploratory methods. Most of these methods deal with cases where the data contain numerical variables. However, there are powerful methods in the literature that also deal with multidimensional binary and count data. The primary purpose of this thesis is to discuss the exploratory and inferential techniques that can be used for binary and count data. In Chapter 2 of this thesis we give the detail of correspondence analysis and canonical correspondence analysis. These methods are used to analyze the data in contingency tables. Chapter 3 is devoted to cluster analysis. In this chapter we explain four well-known clustering methods and we also discuss the distance (dissimilarity) measures available in the literature for binary and count data. Chapter 4 contains an explanation of metric and non-metric multidimensional scaling. These methods can be used to represent binary or count data in a lower dimensional Euclidean space. In Chapter 5 we give a method for inferential analysis called the analysis of distance. This method use a similar reasoning as the analysis of variance, but the inference is based on a pseudo F-statistic with the p-value obtained using permutations of the data. Chapter 6 contains real-world applications of these above methods on two special data sets called the Biolog data and Barents Fish data. The secondary purpose of the thesis is to demonstrate how the above techniques can be performed in the software package R. Several R packages and functions are discussed throughout this thesis. The usage of these functions is also demonstrated with appropriate examples. Attention is also given to the interpretation of the output and graphics. The thesis ends with some general conclusions and ideas for further research.en_ZA
dc.description.abstractAFRIKAANSE OPSOMMING: Die analise van meerdimensionele (meerveranderlike) datastelle is ’n belangrike area van navorsing in toegepaste statistiek. Oor die afgelope dekades is daar verskeie tegnieke ontwikkel om sulke data te ontleed. Die meerveranderlike tegnieke wat ontwikkel is sluit in inferensie analise, regressie analise, diskriminant analise, tros analise en vele meer verkennende data analise tegnieke. Die meerderheid van hierdie metodes hanteer gevalle waar die data numeriese veranderlikes bevat. Daar bestaan ook kragtige metodes in die literatuur vir die analise van meerdimensionele binêre en telling data. Die primêre doel van hierdie tesis is om tegnieke vir verkennende en inferensiële analise van binêre en telling data te bespreek. In Hoofstuk 2 van hierdie tesis bespreek ons ooreenkoms analise en kanoniese ooreenkoms analise. Hierdie metodes word gebruik om data in gebeurlikheidstabelle te analiseer. Hoofstuk 3 bevat tegnieke vir tros analise. In hierdie hoofstuk verduidelik ons vier gewilde tros analise metodes. Ons bespreek ook die afstand maatstawwe wat beskikbaar is in die literatuur vir binêre en telling data. Hoofstuk 4 bevat ’n verduideliking van metriese en nie-metriese meerdimensionele skalering. Hierdie metodes kan gebruik word om binêre of telling data in ‘n lae dimensionele Euclidiese ruimte voor te stel. In Hoofstuk 5 beskryf ons ’n inferensie metode wat bekend staan as die analise van afstande. Hierdie metode gebruik ’n soortgelyke redenasie as die analise van variansie. Die inferensie hier is gebaseer op ’n pseudo F-toetsstatistiek en die p-waardes word verkry deur gebruik te maak van permutasies van die data. Hoofstuk 6 bevat toepassings van bogenoemde tegnieke op werklike datastelle wat bekend staan as die Biolog data en die Barents Fish data. Die sekondêre doel van die tesis is om te demonstreer hoe hierdie tegnieke uitgevoer word in the R sagteware. Verskeie R pakette en funksies word deurgaans bespreek in die tesis. Die gebruik van die funksies word gedemonstreer met toepaslike voorbeelde. Aandag word ook gegee aan die interpretasie van die afvoer en die grafieke. Die tesis sluit af met algemene gevolgtrekkings en voorstelle vir verdere navorsing.af
dc.format.extentix, 122 p. : ill. (some col.)en_ZA
dc.identifier.urihttp://hdl.handle.net/10019.1/17949en_ZA
dc.language.isoen_ZAen_ZA
dc.publisherStellenbosch : Stellenbosch Universityen_ZA
dc.rights.holderStellenbosch Universityen_ZA
dc.subjectStatistical techniquesen_ZA
dc.subjectBinary and count dataen_ZA
dc.subjectExploratory analysisen_ZA
dc.subjectInferential analysisen_ZA
dc.subjectTheses -- Statistics and actuarial scienceen_ZA
dc.subjectDissertations -- Statistics and actuarial scienceen_ZA
dc.titleExploratory and inferential multivariate statistical techniques for multidimensional count and binary data with applications in Ren_ZA
dc.typeThesisen_ZA
Files
Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
ntushelo_exploratory_2011.pdf
Size:
1.61 MB
Format:
Adobe Portable Document Format
Description:
License bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
license.txt
Size:
1.98 KB
Format:
Plain Text
Description: