Large language models and software testing

DC Field | Value | Language
dc.contributor.advisor | Inggs, Cornelia P. | en_ZA
dc.contributor.advisor | Visser, Willem | en_ZA
dc.contributor.author | Dewey, Marco | en_ZA
dc.contributor.other | Stellenbosch University. Faculty of Science. Dept. of Computer Science. | en_ZA
dc.date.accessioned | 2024-03-04T17:21:04Z
dc.date.accessioned | 2024-04-26T07:44:57Z
dc.date.available | 2024-03-04T17:21:04Z
dc.date.available | 2024-04-26T07:44:57Z
dc.date.issued | 2024-03
dc.description | Thesis (MSc)--Stellenbosch University, 2024. | en_ZA
dc.description.abstract | ENGLISH ABSTRACT: This thesis examines the viability of using transformer-based large language models, exemplified by Codex, for the automated generation of test suites for production software. By leveraging the abilities that large language models exhibit for understanding and generating natural and programming languages, these models can analyze code and comments to generate contextually relevant test cases. Using these models in the domain of automatic software testing presents a potential solution to the oracle problem. The research involves a comparative analysis between Codex and a prominent automatic testing tool, EvoSuite, using the Commons-Lang library from the Defects4J benchmark. This comparison yields insights into Codex’s efficacy in generating coverage tests and identifying faulty behavior within production code. The findings of this thesis argue that Codex, while demonstrating promise, exhibits limitations as an automatic testing tool in achieving high test coverage and uncovering software bugs. Moreover, the study highlights potential challenges associated with utilizing open-source repositories for training and testing code generation by large language models, including the risk of incorporating inconsistent coding conventions and suboptimal software testing practices into these models. | en_ZA
dc.description.abstract | AFRIKAANS SUMMARY: This thesis investigates how practical it is to use transformer-based large language models, such as Codex, for the automatic generation of test cases for production software. By making use of the abilities of large language models to understand and generate natural languages and programming languages, these models can analyze code and comments to generate contextually relevant test cases. The use of these models in the field of automatic software testing offers a potential solution to the oracle problem. The research involves a comparative analysis between Codex and a prominent automatic testing tool, EvoSuite, using the Commons-Lang library, which is part of the Defects4J benchmark. This comparison provides insights into the effectiveness of Codex in generating coverage tests and identifying faulty behavior within production code. The findings of this thesis argue that Codex, although it appears promising, shows limitations as an automatic testing tool in achieving high test coverage and exposing software bugs. Furthermore, the study highlights potential challenges associated with the use of open-source repositories for training and testing large language models to generate code, including the risk of incorporating inconsistent coding conventions and suboptimal software testing practices into these models. | af_ZA
dc.description.version | Masters | en_ZA
dc.format.extent | viii, 90 pages | en_ZA
dc.identifier.uri | https://scholar.sun.ac.za/handle/10019.1/130167
dc.language.iso | en_ZA | en_ZA
dc.publisher | Stellenbosch : Stellenbosch University | en_ZA
dc.rights.holder | Stellenbosch University | en_ZA
dc.subject.lcsh | Large language model -- Testing | en_ZA
dc.subject.lcsh | Computational linguistics -- Evaluation | en_ZA
dc.subject.lcsh | Language and languages -- Data processing | en_ZA
dc.subject.lcsh | Natural language processing (Computer science) -- Testing | en_ZA
dc.subject.lcsh | Computer software -- Testing | en_ZA
dc.subject.name | UCTD | en_ZA
dc.title | Large language models and software testing | en_ZA
dc.type | Thesis | en_ZA
Files
Original bundle
Name: dewey_large_2024.pdf
Size: 1.74 MB
Format: Adobe Portable Document Format