Using test data to evaluate rankings of entities in large scholarly citation networks

Dunaiski, Marcel

Files

dunaiski_test_2019.pdf (12.34 MB)

Date

2019-04

Authors

Dunaiski, Marcel

Publisher

Stellenbosch : Stellenbosch University

Abstract

ENGLISH ABSTRACT: A core aspect of bibliometrics is the formulation, refinement, and verification of metrics that rate entities in the science domain based on the information contained within the scientific literature corpus. Since these metrics play an increasingly important role in research evaluation, continued scrutiny of current methods is crucial. For example, metrics that are intended to rate the quality of papers should be assessed by correlating them with peer assessments. I approach the problem of assessing metrics with test data based on other objective ratings provided by domain experts, which I use as proxies for peer-based quality assessments. This dissertation is an attempt to fill some of the gaps in the literature concerning the evaluation of metrics through test data. Specifically, I investigate two main research questions: (1) what are the best practices when evaluating rankings of academic entities based on test data, and (2) what can we learn about ranking algorithms and impact metrics when they are evaluated using test data? Besides the use of test data to evaluate metrics, the second recurring theme of this dissertation is the application and evaluation of indirect ranking algorithms as an alternative to metrics based on direct citations. I present the results of this investigation through five published journal articles.
AFRIKAANSE OPSOMMING (English translation): Core activities in the field of bibliometrics are the formulation, refinement, and verification of metrics that determine rankings for scientific entities based on the information contained in the scientific literature corpus. Since these metrics play an increasingly important role in the evaluation of research, it is critical that they are continually and carefully examined. For example, metrics that are supposed to estimate the quality of papers must be correlated with peer assessments. I tackle the evaluation of metrics using test data based on another type of objective ranking (provided by experts in a field), and use this to stand in for peer assessments of quality. This dissertation attempts to fill some of the gaps when it comes to the evaluation of metrics using test data. More specifically, I investigate two questions: (1) what are the best practices for evaluating rankings of academic entities based on test data, and (2) what can we learn about the ranking algorithms and about impact metrics when we evaluate them with the test data? Besides the use of test data, there is a second recurring theme in this dissertation: the application and evaluation of indirect ranking algorithms as an alternative to metrics that use direct citations. The results of my investigation are described in five already-published journal articles.

Description

Thesis (PhD)--Stellenbosch University, 2019.

Keywords

Informetrics, Information retrieval, Citation analysis, Information science -- Statistical methods, Ranking and selection (Statistics), UCTD, Bibliographical citations

URI

http://hdl.handle.net/10019.1/105866

Collections

Doctoral Degrees (Computer Science)