Open-set learning with augmented category by exploiting unlabelled data (Open-LACU)

dc.contributor.advisor Du Preez, Johan en_ZA
dc.contributor.author Engelbrecht, Emile en_ZA
dc.contributor.other Stellenbosch University. Faculty of Engineering. Dept. of Electrical and Electronic Engineering. en_ZA
dc.date.accessioned 2024-03-04T08:36:01Z en_ZA
dc.date.accessioned 2024-04-26T21:04:29Z en_ZA
dc.date.available 2024-03-04T08:36:01Z en_ZA
dc.date.available 2024-04-26T21:04:29Z en_ZA
dc.date.issued 2024-03 en_ZA
dc.description Thesis (PhD)--Stellenbosch University, 2024. en_ZA
dc.description.abstract ENGLISH ABSTRACT: Neural network classifiers provide a scalable means to analyse categorical patterns within datasets. However, current machine learning policies fail to consider certain nuances that arise in real-world applications. The vast number of patterns represented in certain datasets and the continual collection of new data mean that classifiers must be aware of both the observed-novel category and the unobserved-novel category. To address these challenges, this dissertation combines semi-supervised learning and novelty detection into a single learning framework called open-set learning with augmented category by exploiting unlabelled data, or Open-LACU. Although Open-LACU requires further development, we show and argue that Open-LACU classifiers will have reduced annotation cost, improved practicality and enhanced safety. Semi-supervised learning trains models using partially labelled datasets to reduce annotation costs. Novelty detection ensures that classifiers are able to separate all data samples outside the domain of interest for enhanced safety. When working with partially labelled datasets in a domain where novel patterns exist, several inconsistencies appear in the existing literature. More specifically, there is no distinction between those novel patterns that are unrepresented during training but appear during testing, and those novel patterns that are represented in unlabelled training data. Considering the unique properties of these different novel category types, we argue that classifiers must generalise them separately. In Open-LACU, classifiers must generalise 1) the K > 1 source categories for which labels are provided, 2) an additional (K + 1)-th observed-novel category for those novel patterns in the unlabelled training data, and 3) an additional (K + 2)-th unobserved-novel category that encapsulates all those novel patterns unobserved during training but seen during testing.
To introduce Open-LACU, we pursue several objectives that integrate different learning frameworks. For each of these integrating steps, we experiment on small-scale vision datasets to simulate different categorical scenarios. Our results both confirm the feasibility of Open-LACU and reveal several insights into the challenges that future research must address. en_ZA
dc.description.abstract AFRIKAANS SUMMARY (in English translation): Neural network classifiers offer a scalable method for analysing patterns within large datasets. However, current machine learning strategies do not consider specific nuances of large datasets. The large number of patterns in some datasets and the continual collection of new data samples mean that classifiers must be aware of unknown or novel categories. This dissertation combines semi-supervised learning and unknown-category detection into a single framework that we call open-set learning with augmented category by exploiting unlabelled data (Open-LACU). Open-LACU classifiers reduce annotation cost, improve practicality and increase safety. Semi-supervised learning uses partially labelled datasets to train models, with the aim of reducing annotation cost. Detecting novel categories ensures that classifiers separate all data samples outside the domain of interest for increased safety. Enabling unknown-category detection in semi-supervised classifiers combines the cost and practical benefits. When partially labelled datasets are used in an environment where unknown categories exist, the existing literature currently offers no consistent definitions. More specifically, the existing literature does not distinguish between the unknown patterns that are unrepresented during training but do appear during testing, and the unknown patterns that are present in unlabelled training data. Given the unique properties of these different unknown categories, we argue that classifiers must generalise the two types of unknown category separately.
In Open-LACU, classifiers must generalise the following categories: 1) the K source categories for which labels are provided, 2) an additional augmented (K + 1)-th present-unknown category that covers all those unknown patterns observed in the unlabelled training data, and 3) an additional augmented (K + 2)-th unrepresented-unknown category that covers all the unknown patterns that first appear during testing. To introduce Open-LACU, we investigate several objectives that integrate different machine learning strategies. For each objective, we experiment on small-scale vision datasets to simulate different category schemes. Our results confirm the usefulness of Open-LACU and expose several insights on which further research must focus. af_ZA
dc.description.version Doctorate en_ZA
dc.format.extent xx, 94 pages : illustrations. en_ZA
dc.identifier.uri https://scholar.sun.ac.za/handle/10019.1/130533 en_ZA
dc.language.iso en_ZA en_ZA
dc.publisher Stellenbosch : Stellenbosch University en_ZA
dc.subject Open-set learning; augmented category; exploiting unlabelled data (Open-LACU) en_ZA
dc.subject.lcsh Machine learning en_ZA
dc.subject.lcsh Neural networks (Computer science) en_ZA
dc.subject.lcsh Data sets en_ZA
dc.subject.lcsh Augmented reality in education en_ZA
dc.subject.lcsh UCTD en_ZA
dc.title Open-set learning with augmented category by exploiting unlabelled data (Open-LACU) en_ZA
dc.type Thesis en_ZA
Files
Original bundle
Name: engelbrecht_open_2024.pdf
Size: 3.7 MB
Format: Adobe Portable Document Format