Department of Computer Science
Browsing Department of Computer Science by advisor "Cleophas, Loek"
- Automatic Prediction of Comment Quality (Stellenbosch : Stellenbosch University, 2016-03)
  Brand, Dirk Johannes; Van der Merwe, Brink; Kroon, R. S. (Steve); Cleophas, Loek; Stellenbosch University. Faculty of Science. Department of Mathematical Sciences (Computer Science)
  ENGLISH ABSTRACT: The problem of identifying and assessing the quality of short texts (e.g. comments, reviews or web searches) has been intensively studied. There are great benefits to being able to analyse short texts. As an example, advertisers might be interested in the sentiment of product reviews on e-commerce sites to more efficiently pair marketing material to content. Analysing short texts is a difficult problem, because traditional machine learning models generally perform better on data sets with larger samples, which often translates to more features. More data allow for better estimation of parameters for these models. Short texts generally do not have much content, but still carry high variability in that they may still consist of a large corpus of words. This thesis investigates various methods for feature extraction for short texts in the context of online user comments. These methods include the leading manual feature extraction techniques for short texts, N-gram models and techniques based on word embeddings. The effect of using different kernels for a support vector classifier is also investigated. The investigation is centred around two data sets, one provided by News24 and the other extracted from Slashdot.org. It was found that N-gram models performed relatively well, mostly outperforming manual feature extraction techniques.
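
As a rough illustration of the N-gram feature extraction the abstract refers to, the sketch below counts word unigrams and bigrams in a single comment. It is not the thesis's implementation: the function names (`word_ngrams`, `ngram_features`), the whitespace tokenisation, and the example comment are all assumptions made for illustration.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams (as tuples) of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    # A text with fewer than n tokens yields an empty list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, max_n=2):
    """Count every word n-gram for n = 1..max_n.

    The resulting counts act as a sparse feature vector for one comment.
    """
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(word_ngrams(text, n))
    return counts


# Hypothetical comment, not taken from the News24 or Slashdot data sets.
features = ngram_features("great article great points")
```

Each distinct n-gram becomes one feature, which is why even short comments can produce a large, sparse feature space. Vectors like these could then be passed to a classifier such as a support vector machine, where the choice of kernel is a separate decision.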