Scaling the ConceptCloud browser to very large semi-structured data sets: architecture and data completion

dc.contributor.advisor: Fischer, Bernd (en_ZA)
dc.contributor.advisor: Britz, K. (en_ZA)
dc.contributor.author: Berndt, Joshua (en_ZA)
dc.contributor.other: Stellenbosch University. Faculty of Science. Dept. of Mathematical Sciences. Division Computer Science. (en_ZA)
dc.date.accessioned: 2020-11-27T09:08:58Z
dc.date.accessioned: 2021-01-31T19:44:22Z
dc.date.available: 2020-11-27T09:08:58Z
dc.date.available: 2021-01-31T19:44:22Z
dc.date.issued: 2020-12
dc.description: Thesis (MSc)--Stellenbosch University, 2020. (en_ZA)
dc.description.abstract: ENGLISH ABSTRACT: Semi-structured data sets such as product reviews or event log data are simultaneously becoming more widely used and ever larger. This thesis describes ConceptCloud, a flexible, interactive browser for semi-structured datasets, with a focus on the improvements made to accommodate larger datasets, provide more intuitive data representation, and enrich the underlying data by way of data imputation. ConceptCloud combines an intuitive tag cloud viewer with an underlying concept lattice to provide a formal structure for navigating datasets without prior knowledge of the structure of the data and without compromising scalability. This scalability is achieved through architectural changes that increase the system's resource efficiency. These changes are demonstrated by way of a case study on a dataset of wine reviews. Semi-structured data sets such as product reviews or event log data often contain a geolocation aspect: for example, the location of the winery for wine reviews, or the accident location for traffic data. In this thesis, I describe ConceptCloud extensions which allow for the rendering of specialised geolocation data while providing alternative navigation paths through the dataset. I show that using biclusters can make the navigation bidirectional, and demonstrate this approach on a crime data set using a specialised geolocation map viewer. Semi-structured data often contains implicit information which would be useful in driving data exploration if made explicit. I take advantage of domain ontologies both to make implicit data in each input data set explicit and to verify and correct inconsistencies, allowing for better data exploration. I demonstrate this approach with a continuation of the wine case study. (en_ZA)
dc.description.abstract: AFRIKAANS SUMMARY: Semi-structured data sets such as product reviews or event log data are simultaneously being used more and more and becoming ever larger. This thesis describes ConceptCloud, a flexible, interactive browser for semi-structured data sets, with a focus on the improvements made to accommodate larger data sets, achieve more intuitive data representation, and enrich the underlying data by means of data imputation. ConceptCloud uses an intuitive tag cloud visualisation viewer in combination with an underlying concept lattice to build a formal structure for navigating data sets without prior knowledge of the structure of the data and without compromising scalability. This scalability is achieved by implementing architectural changes that increase the system's resource efficiency. These improvements are demonstrated by way of a case study on a data set of wine reviews. Semi-structured data sets such as product reviews or event log data contain a geolocation aspect: for example, the location of the winery for wine reviews, or the accident location for traffic data. In this thesis we describe a ConceptCloud extension that provides for specialised geolocation data while offering alternative navigation paths through the data set. We show that using biclusters can make the navigation bidirectional, and demonstrate this approach on a crime data set that makes use of a specialised geolocation map viewer. Semi-structured data often contains implicit information that would be useful in driving data exploration if it could be made explicit. We use domain ontologies both to make implicit data in each input data set explicit and to verify and correct inconsistencies, which enables better data exploration. We demonstrate this approach through a case study with wine data. (af_ZA)
dc.description.version: Masters (en_ZA)
dc.format.extent: xi, 108 pages : illustrations (en_ZA)
dc.identifier.uri: http://hdl.handle.net/10019.1/109315
dc.language.iso: en_ZA (en_ZA)
dc.publisher: Stellenbosch : Stellenbosch University (en_ZA)
dc.rights.holder: Stellenbosch University (en_ZA)
dc.subject: ConceptCloud Browser (en_ZA)
dc.subject: Big data -- Scalability (en_ZA)
dc.subject: Architecture and Data Completion (en_ZA)
dc.subject: UCTD
dc.title: Scaling the ConceptCloud browser to very large semi-structured data sets: architecture and data completion (en_ZA)
dc.type: Thesis (en_ZA)
Files
Original bundle
Name: berndt_scaling_2020.pdf
Size: 9.42 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Plain Text