Browsing by Author "Greene, Gillian J."
- Concept-based exploration of rich semi-structured data collections
(Stellenbosch : Stellenbosch University, 2017-03)
Greene, Gillian J.; Fischer, Bernd; Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences (Computer Science)
ENGLISH ABSTRACT: Search has become one of the fundamental operations in computer science, allowing users to extract data, and ultimately information, from datasets. However, when users have no previous knowledge of a dataset, or have not clearly defined their search task and are therefore unable to formulate a direct query, their task becomes one of exploratory search or browsing rather than focused search or retrieval. While search and retrieval tools are widely available, support for browsing large and especially rich semi-structured datasets is lacking. Semi-structured data is particularly difficult to explore and query: treating it entirely as free text loses the information encoded in the structured portions of the data, while considering only the structured fields loses the free-text information. We therefore develop a framework that supports exploration of semi-structured data without requiring the user to have prior knowledge of the dataset or to have formulated a specific query. Our approach uses a novel combination of tag clouds and concept lattices to facilitate data exploration, analysis and visualization by allowing the user to continuously update (i.e., add to and remove from) a set of keywords that the searched documents must contain. Rather than returning the document set directly as the result of a specific query, the framework presents aggregated information and properties of the relevant documents.
We apply our framework to data contained in software repositories in two different ways, each with a different goal, to highlight the flexibility of the approach and the range of tasks that the same underlying dataset can support. We also instantiate our framework to support the exploration of a large collection of academic publication data. We evaluate these instantiations through user studies and case studies, which indicate that our approach is usable and allows users to gather valuable information from semi-structured data archives.
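
The keyword-refinement idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy collection `DOCS` and the function names `matching_docs`, `shared_keywords` and `tag_cloud` are hypothetical, chosen only to show how a selected keyword set could narrow a document set, how the keywords common to those documents can be derived (the basic derivation step behind concept lattices), and how counts of the remaining keywords could weight a tag cloud.

```python
from collections import Counter

# Hypothetical toy collection: document id -> set of keywords (tags).
DOCS = {
    "d1": {"parser", "bug", "java"},
    "d2": {"parser", "feature", "java"},
    "d3": {"ui", "bug", "java"},
    "d4": {"parser", "bug", "python"},
}


def matching_docs(selected, docs=DOCS):
    """Return the ids of documents that contain every selected keyword."""
    return {doc_id for doc_id, tags in docs.items() if selected <= tags}


def shared_keywords(doc_ids, docs=DOCS):
    """Return the keywords common to all given documents.

    Simplification: an empty document set yields an empty keyword set.
    """
    tag_sets = [docs[doc_id] for doc_id in doc_ids]
    return set.intersection(*tag_sets) if tag_sets else set()


def tag_cloud(selected, docs=DOCS):
    """Count the not-yet-selected keywords across the matching documents."""
    counts = Counter()
    for doc_id in matching_docs(selected, docs):
        counts.update(docs[doc_id] - selected)
    return counts


if __name__ == "__main__":
    selection = {"parser"}
    print(sorted(matching_docs(selection)))
    print(tag_cloud(selection).most_common())
```

Adding a keyword to `selection` shrinks the matching set and removing one widens it, which mirrors the continuous update of the keyword set described above. The counts returned by `tag_cloud` are aggregated properties of the matching documents, not the documents themselves.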