Classification of social media event-discussions using interaction patterns : a social network analysis approach

Date
2020-12
Publisher
Stellenbosch : Stellenbosch University
Abstract
ENGLISH SUMMARY : This thesis uses social network analysis to explore the classification of social media discussions, utilising network structure derived from interactions on Twitter, while requiring minimal domain knowledge. In academia and industry, researchers strive to understand the patterns of interaction between actors on social media platforms, and how their actions may relate to particular events, topics, network characteristics, personalities and characters, among other factors. The literature shows that researchers in a wide range of disciplines lack the tools to classify a variety of event-discussions. Because topics of interest to researchers on social media can overlap, and users are often engaged in a multitude of topics simultaneously, an approach to classification is needed that requires minimal prior domain knowledge of the contents of the datasets. This study is a proof of concept for the use of network metrics to characterise and classify a diverse set of events that were discussed on social media. To classify social media data, one can utilise unsupervised machine learning methods. The literature shows that a multitude of clustering methods for social media have been explored, in multimedia, network, textual and other contexts. However, only limited approaches to classifying social media data, specifically Twitter data, in terms of their network structure have been explored. This study does not aim to replace those methods but to add to an array of tools that can be used by researchers, both in academia and in industry, to maximise the value obtained from social media data. In order to obtain metrics with which to perform classification, a novel approach to modelling interactions with the data source, Twitter, was developed and a set of network measures and data descriptors that characterise the data was explored. 
The network measures and data descriptors were subjected to dimensionality reduction to account for covariance in the measurements and to evaluate the contribution of each network measure, in order to expand the literature on what they define in the context of this study. The resulting principal components were used to classify the discussions of diverse events, and the quality and quantity of clusters were evaluated. Finally, a set of tests and criteria was defined with which the research question was addressed. The study found that the approach produced an optimal number of clusters with reasonable structure quality without requiring any domain knowledge to produce them. Although the method proposed in this study is effective in finding underlying patterns and similarities, it mainly serves to point researchers in the right direction; more detailed analysis is necessary for definite conclusions and labelled categorisation. The study recognises the prior work performed in classifying social media data and recommends that future work include a wide variety of user features, sentiment, topic, and network measures. Furthermore, the study can be expanded upon by testing alternative dimensionality reduction and clustering methods at each stage of the proposed approach. The study furthered the understanding of classifying social media data in terms of social network analysis and the various network measures and data descriptors that were discussed.
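The pipeline summarised above (network measures, dimensionality reduction to principal components, then clustering) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the thesis: the toy interaction graphs, the three measures chosen (density, reciprocity and a Freeman-style in-degree centralisation), the use of only the first principal component, and the simple two-cluster k-means-style step.

```python
import math

def network_measures(edges):
    """Density, reciprocity and in-degree centralisation of a directed graph."""
    edge_set = {(u, v) for u, v in edges if u != v}  # drop self-loops and duplicates
    nodes = {node for edge in edge_set for node in edge}
    n = len(nodes)
    density = len(edge_set) / (n * (n - 1))
    reciprocity = sum((v, u) in edge_set for u, v in edge_set) / len(edge_set)
    in_degree = {node: 0 for node in nodes}
    for _, target in edge_set:
        in_degree[target] += 1
    top = max(in_degree.values())
    # 1.0 for a perfect in-star, 0.0 when every node has the same in-degree
    centralisation = sum(top - d for d in in_degree.values()) / (n - 1) ** 2
    return [density, reciprocity, centralisation]

def standardise(rows):
    """Z-score each column so that no single measure dominates the covariance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def first_principal_component(rows, iterations=200):
    """Leading eigenvector of the covariance matrix, found by power iteration."""
    k = len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / len(rows) for j in range(k)]
           for i in range(k)]
    vec = [1.0] * k
    for _ in range(iterations):
        vec = [sum(cov[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
    return vec

def two_means_1d(scores, iterations=20):
    """Two-cluster k-means on one-dimensional scores, seeded at the extremes."""
    centres = [min(scores), max(scores)]
    labels = [0] * len(scores)
    for _ in range(iterations):
        labels = [0 if abs(s - centres[0]) <= abs(s - centres[1]) else 1
                  for s in scores]
        for c in (0, 1):
            members = [s for s, lab in zip(scores, labels) if lab == c]
            if members:
                centres[c] = sum(members) / len(members)
    return labels

def star(n):
    """Hypothetical broadcast-style discussion: everyone mentions user 0."""
    return [(i, 0) for i in range(1, n)]

def conversation(n):
    """Hypothetical conversational discussion: a ring of mutual replies."""
    edges = []
    for i in range(n):
        j = (i + 1) % n
        edges += [(i, j), (j, i)]
    return edges

# Toy event-discussions (invented data, for illustration only)
discussions = {
    "broadcast_a": star(8),
    "broadcast_b": star(12),
    "chat_a": conversation(6),
    "chat_b": conversation(9),
}

names = list(discussions)
features = standardise([network_measures(discussions[name]) for name in names])
pc1 = first_principal_component(features)
scores = [sum(w * x for w, x in zip(pc1, row)) for row in features]
labels = two_means_1d(scores)

for name, score, label in zip(names, scores, labels):
    print(f"{name}: PC1 score {score:+.2f}, cluster {label}")
```

In this sketch the discussions are grouped purely from their interaction structure, with no reference to their content, which is the property the summary emphasises. A fuller treatment would retain several principal components and score cluster quality with a measure such as the silhouette coefficient.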
AFRIKAANS SUMMARY : This thesis uses social network analysis to investigate the classification of social media discussions, using the network structure derived from interactions on Twitter, without the need for domain knowledge. In academia and industry, researchers strive to understand the interaction patterns between actors on social media platforms, and how their actions relate to, among other things, specific events, topics, network characteristics, personalities and characters. The literature shows that researchers in a wide variety of disciplines lack the tools to classify a variety of discussion topics on social media. It is a common scenario that topics of interest to researchers on social media can overlap and that users are often engaged in a multitude of topics at the same time. These cases motivate an approach to classification that requires minimal domain knowledge of the contents of the datasets. This study is a proof of concept for the use of network metrics to characterise and classify a diverse range of events that were discussed on social media. To classify social media data, one can make use of unsupervised machine learning methods. The literature shows that a multitude of clustering methods for social media have been explored, in multimedia, network, textual and other contexts. Only a limited number of approaches to classifying social media data, specifically Twitter data, have been explored. This study does not aim to replace these methods, but adds to a variety of tools that researchers, both in academia and in industry, can use to maximise the value obtained from social media data. 
To obtain metrics with which classification can be performed, a new approach to modelling interactions with the data source (Twitter) was developed, and a set of network measures and data descriptors that characterise the data was explored. The network measures and data descriptors were subjected to dimensionality reduction to account for the covariance of the measurements and to evaluate the contribution of each network measure. The resulting principal components were used to classify the discussions of diverse events, and the quality and number of clusters were assessed. Finally, a set of tests and criteria was defined with which the research question was addressed. The study found that the approach produced an optimal number of groups with reasonable structural quality without any domain knowledge being needed to produce them. Although the method proposed in this study is effective at finding underlying patterns and similarities, it serves mainly to point researchers in the right direction, and more detailed analysis is needed for definitive conclusions and labelled categorisation. The study acknowledges the previous work done on classifying social media data and recommends that future work include a wide variety of user features, sentiment, topic and network measures. Furthermore, the study can be extended by testing alternative dimensionality reduction and clustering methods at each stage of the proposed approach. The study advanced the understanding of the classification of social media data in terms of social network analysis and the various network measures and data descriptors.
Description
Thesis (MA)--Stellenbosch University, 2020.
Keywords
Social sciences -- Network analysis, Social structure -- Research, Social media -- Research, Actors -- Attitudes, Twitter (Firm), UCTD