Concept-based exploration of rich semi-structured data collections

Greene, Gillian J.

dc.contributor.advisor	Fischer, Bernd	en_ZA
dc.contributor.author	Greene, Gillian J.	en_ZA
dc.contributor.other	Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences (Computer Science)	en_ZA
dc.date.accessioned	2017-01-26T07:35:57Z
dc.date.accessioned	2017-03-29T11:39:22Z
dc.date.available	2017-01-26T07:35:57Z
dc.date.available	2017-03-29T11:39:22Z
dc.date.issued	2017-03
dc.description	Thesis (PhD)--Stellenbosch University, 2017	en_ZA
dc.description.abstract	ENGLISH ABSTRACT : Search has become one of the fundamental operations in computer science, allowing users to extract data, and ultimately information, from datasets. However, when users have no prior knowledge of a dataset, or have not clearly defined their search task and are therefore unable to formulate a direct query, their task becomes one of exploratory search or browsing rather than focused search or retrieval. While search and retrieval tools are widely available, support for browsing large and especially rich semi-structured datasets is lacking. Semi-structured data is particularly difficult to explore and query: treating it entirely as free text loses important additional information encoded in the structured portions of the data, while considering only the structured fields loses important free-text information. We therefore develop a framework that supports the exploration of semi-structured data, from which insights are otherwise difficult to gather, without requiring the user to have prior knowledge of the dataset or to have formulated a specific query. Our approach uses a novel combination of tag clouds and concept lattices to facilitate data exploration, analysis and visualization by allowing the user to continuously update (i.e., add and remove) a set of keywords that the searched documents must contain. Rather than returning the document set directly as the result of a specific query, the framework provides aggregated information about, and properties of, the relevant documents. We apply our framework to data contained in software repositories in two different ways, for different goals, to highlight the flexibility of the approach and the range of tasks that the same underlying dataset can support. We also instantiate our framework to support the exploration of a large collection of academic publication data.
We evaluate the instantiations of our framework through user and case studies, which indicate that our approach is usable and allows users to gather valuable information from semi-structured data archives.	en_ZA
dc.description.abstract	AFRIKAANSE OPSOMMING (Afrikaans summary, translated) : Search is one of the fundamental operations in computer science. It allows users to extract data, and ultimately information, from datasets. However, when users have no prior knowledge of a dataset, or have not clearly defined their search, and are therefore unable to formulate a direct query, their task becomes one of exploratory search, or browsing, rather than focused search or retrieval. While search and retrieval tools are widely available, support for exploring large and especially rich semi-structured datasets is lacking. Semi-structured data is particularly difficult to explore and query because treating it purely as free text loses important supplementary information embedded in the structured parts of the data, while considering only the structured fields in turn loses important free-text information. We therefore develop a framework to support the exploration of semi-structured data, from which insights are otherwise difficult to obtain, without requiring the user to have prior knowledge of the dataset or to have already formulated a specific query. Our approach uses a new combination of tag clouds and concept lattices to facilitate data exploration, data analysis and data visualization by allowing the user to continuously update (i.e., add or remove) a set of keywords that the documents being searched for must contain. The document set is not provided directly as the result of a specific query; instead, aggregated information and properties of relevant documents are provided.
We apply our framework to data stored in software repositories in two different ways, for different goals, to highlight the flexibility of the approach and the different tasks that can be supported using the same underlying dataset. We also instantiate our framework to support the exploration of a large collection of academic publication data. We evaluate the instances of our framework through user and case studies, which indicate that our approach is usable and enables users to gather valuable information from semi-structured data archives.	af_ZA
dc.format.extent	xvii, 169 pages : illustrations (chiefly colour)	en_ZA
dc.identifier.uri	http://hdl.handle.net/10019.1/100859
dc.language.iso	en_ZA	en_ZA
dc.publisher	Stellenbosch : Stellenbosch University	en_ZA
dc.rights.holder	Stellenbosch University	en_ZA
dc.subject	Data analysis	en_ZA
dc.subject	Data visualization	en_ZA
dc.subject	Browsing (Databases)	en_ZA
dc.subject	Tag clouds	en_ZA
dc.subject	Concept lattices	en_ZA
dc.subject	UCTD	en_ZA
dc.title	Concept-based exploration of rich semi-structured data collections	en_ZA
dc.type	Thesis	en_ZA
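The abstract describes the core interaction only in prose: the user adds and removes keywords that documents must contain, and sees aggregated properties of the matching documents rather than the documents themselves. The sketch below is one minimal way to illustrate that idea with formal concept analysis, assuming each document has already been reduced to a flat set of keywords (the structured fields the thesis also exploits are not modelled here). The toy corpus and the names `DOCS`, `extent`, `intent` and `explore` are hypothetical and are not taken from the thesis. `extent` returns the documents containing every selected keyword, `intent` returns the keywords those documents all share, and the resulting (extent, intent) pair is a formal concept, i.e. one node of a concept lattice; the counts of the remaining keywords play the role of a tag cloud of possible refinements.

```python
from collections import Counter

# Hypothetical toy corpus (illustrative only, not data from the thesis):
# each document is assumed to be reduced to a set of keywords.
DOCS: dict[str, frozenset[str]] = {
    "d1": frozenset({"parser", "bug", "java"}),
    "d2": frozenset({"parser", "java", "test"}),
    "d3": frozenset({"ui", "bug", "java"}),
    "d4": frozenset({"parser", "bug", "python"}),
}


def extent(keywords: frozenset[str]) -> set[str]:
    """Documents that contain every selected keyword."""
    return {d for d, kws in DOCS.items() if keywords <= kws}


def intent(docs: set[str]) -> frozenset[str]:
    """Keywords shared by all given documents (all keywords if docs is empty)."""
    shared = set(frozenset().union(*DOCS.values()))
    for d in docs:
        shared &= DOCS[d]
    return frozenset(shared)


def explore(selected: frozenset[str]):
    """Return (matching docs, closed keyword set, tag cloud) for a selection."""
    docs = extent(selected)
    # Closure: every keyword shared by all matching documents.
    closed = intent(docs)
    # Keywords in `closed` occur in every matching document, so adding them
    # would not narrow the result; count only the other keywords.
    cloud = Counter(kw for d in docs for kw in DOCS[d] if kw not in closed)
    return docs, closed, cloud


if __name__ == "__main__":
    for sel in (frozenset({"parser"}), frozenset({"parser", "java"}), frozenset({"test"})):
        docs, closed, cloud = explore(sel)
        print(sorted(sel), "->", sorted(docs), sorted(closed), dict(cloud))
```

Selecting `test` alone, for example, matches a single document whose closed keyword set also contains `parser` and `java`, so in this toy corpus the concept reveals that `test` implies the other two keywords. Adding or removing a keyword simply calls `explore` again with the new selection, which moves to a neighbouring node of the lattice.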

Files

Original bundle

Name: greene_concept_2017.pdf
Size: 10.2 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Plain Text

Collections

Doctoral Degrees (Computer Science)