Browsing by Author "Van der Merwe, Brink"
- Comment classification for an online news domain (2014-12) Brand, Dirk; Van der Merwe, Brink
  ENGLISH ABSTRACT: In online discussion forums, comment moderation systems are often faced with the problem of establishing the value of an unseen online comment. By knowing the value of comments, the system is empowered to establish rank and to enhance the user experience. It is also useful for identifying malicious users who consistently show behaviour that is detrimental to the community. In this paper, we investigate and evaluate various machine learning techniques for automatic comment scoring. We derive a set of features that aim to capture various comment quality metrics (like relevance, informativeness and spelling) and compare it to content-based features. We investigate the correlation of these features with the community popularity of the comments. Through investigation of supervised learning techniques, we show that content-based features serve better as predictors of popularity, while quality-based features are better suited for predicting user engagement. We also evaluate how well our classifier-based rankings correlate with community preference.
- Comparing leaf and root insertion (South African Institute of Computer Scientists and Information Technologists, 2009) Geldenhuys, Jaco; Van der Merwe, Brink
  We consider two ways of inserting a key into a binary search tree: leaf insertion, which is the standard method, and root insertion, which involves additional rotations. Although the respective costs of constructing leaf and root insertion binary search trees, in terms of comparisons, are the same in the average case, we show that in the worst case the construction of a root insertion binary search tree needs approximately 50% of the number of comparisons required by leaf insertion.
- N-gram representations for comment filtering (ACM, Inc., 2015-09) Brand, Dirk; Kroon, Steve; Van der Merwe, Brink; Cleophas, Loek
  Accurate classifiers for short texts are valuable assets in many applications. Especially in online communities, where users contribute to content in the form of posts and comments, an effective way of automatically categorising posts proves highly valuable. This paper investigates the use of N-grams as features for short text classification, and compares it to manual feature design techniques that have been popular in this domain. We find that the N-gram representations greatly outperform manual feature extraction techniques.
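
For readers unfamiliar with the term, the following is a minimal sketch of what an N-gram feature representation of a short text might look like. It is not the authors' pipeline: the function names (`word_ngrams`, `char_ngrams`, `ngram_features`), the lowercasing, and the whitespace tokenisation are all assumptions made purely for illustration.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word-level n-grams of `text` as a list of tuples.

    Assumption: tokens are obtained by lowercasing and splitting on whitespace.
    """
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return the character-level n-grams of `text` as a list of strings."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def ngram_features(text, n=2):
    """Map each word n-gram to its count; a sparse bag-of-n-grams feature vector."""
    return Counter(word_ngrams(text, n))


if __name__ == "__main__":
    comment = "great article great article indeed"
    print(word_ngrams(comment, 2))
    print(char_ngrams("news", 3))
    print(ngram_features(comment, 2))
```

Counts like these could then be passed to any standard classifier. Which values of N, which tokenisation, and which classifier work best are empirical questions that this sketch does not address.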