Application of data mining and machine learning on occupational health and safety struck-by incidents on South African construction sites: a CRISP-DM approach.
dc.contributor.advisor | Wium, Jan Andries | en_ZA |
dc.contributor.author | Adams, Logan Charl | en_ZA |
dc.contributor.other | Stellenbosch University. Faculty of Engineering. Dept. of Civil Engineering. | en_ZA |
dc.date.accessioned | 2023-02-27T12:45:09Z | en_ZA |
dc.date.accessioned | 2023-05-18T07:10:25Z | en_ZA |
dc.date.available | 2023-02-27T12:45:09Z | en_ZA |
dc.date.available | 2023-05-18T07:10:25Z | en_ZA |
dc.date.issued | 2023-02 | en_ZA |
dc.description | Thesis (MEng)--Stellenbosch University, 2023. | en_ZA |
dc.description.abstract | ENGLISH ABSTRACT: Occupational Health and Safety (OHS) in the South African construction industry faces many performance challenges that result in potentially avoidable incidents. The study proposes the use of data mining and classification machine learning models to improve data understanding, promote knowledge and information extraction, and encourage prediction capabilities through classification methods. A mixed research approach was applied in the study to enable a holistic usage of data and its applications. Interviews (the qualitative research component) identified the current state of OHS data and data management in the South African construction industry, as well as data considerations for the quantitative research component (Exploratory Data Analysis and classification machine learning models). Data sourced from Federated Employers Mutual Assurance Company (an insurance database) and additional databases (sourced from the Federal Reserve Bank of St. Louis and the Organisation for Economic Cooperation and Development) enabled a quantitative Exploratory Data Analysis and the development of multiple classification machine learning models. The Exploratory Data Analysis provided insights into data understanding and the potential of using it to enable data-driven safety decision-making. The classification models provided insights into the possibility of an industry-wide classification prediction model based on existing data, as well as into the fundamental concerns and limitations. The qualitative and quantitative components of the study highlighted several concerns regarding data, data management, and data innovations across OHS in the South African construction industry. At the core was a lack of understanding of the possibilities of data and a misaligned view of its value proposition.
Furthermore, notable limitations in the quality of data and the mechanisms that influence its quality were highlighted, including the effects of ineffective incident investigations for fact-finding and the prominent underreporting experienced in the construction industry. Data mining and machine learning offered the ability to extract deeper insights from incidents and enable improvements in OHS performance through data-driven safety decision-making. Three output variables were evaluated across several machine learning algorithms in terms of the models' ability to predict and classify the state of an incident, namely (1) Injury Location (the physical injury location on the affected individual's body), (2) Nature of Injury (the type of injury the affected individual experienced), and (3) Days off (the number of days off work required for recovery). The results obtained from the machine learning models demonstrate the capability to predict the Days off variable with high accuracy (average of 81.8%), Nature of Injury with moderate accuracy (average of 37.4%), and Injury Location with low accuracy (average of 17.8%). The performance of the various machine learning models is directly influenced by the underlying correlation between the output and input variables and by the number of classifications required within the output variable itself. The largest correlation coefficient and the number of classifications were, respectively, (0.07, 20) for Injury Location, (0.14, 9) for Nature of Injury, and (0.07, 3) for Days off. The study recommends collaborative efforts between industry, Government, and academia for the successful implementation of data mining and machine learning. | en_ZA |
dc.description.abstract | AFRIKAANS ABSTRACT (English translation): Occupational Health and Safety (OHS) in the South African construction industry faces many challenges that lead to potentially preventable incidents. The study aims to propose the use of data mining and classification machine learning models to improve data understanding, promote knowledge and information extraction, and encourage prediction capabilities through classification methods. A mixed research approach was applied in the study to enable a holistic use of data and its applications. Interviews (the qualitative research component) were conducted to identify the current state of OHS data and data management in the South African construction industry, while data considerations for the quantitative research component (Exploratory Data Analysis and classification machine learning models) were identified. Data originating from Federated Employers Mutual Assurance Company (an insurance database) and additional databases (obtained from the Federal Reserve Bank of St. Louis in the USA and the Organisation for Economic Cooperation and Development) enabled a quantitative Exploratory Data Analysis and the development of multiple classification machine learning models. The Exploratory Data Analysis provided insights into data understanding and identified the potential of using it for data-driven safety decision-making. The classification models provided insights into the possibility of an industry-wide classification prediction model based on existing data, while also providing valuable insights into the fundamental shortcomings and limitations. The qualitative and quantitative components of the study highlighted several shortcomings regarding data, data management, and data innovations in OHS in the South African construction industry. At the core was the lack of understanding of the possibilities of data and of its value.
Furthermore, the notable limitations in the quality of data and the mechanisms that influence its quality were highlighted, including the effect of ineffective incident investigations for fact-finding and the underreporting experienced in the construction industry. Data mining and machine learning offer the ability to extract deeper insights from incidents and enable improvements in OHS performance through data-driven safety decision-making. Three output variables were evaluated across several machine learning algorithms in terms of the models' ability to predict and classify the state of an incident, namely (1) Injury Location (the physical injury location on the affected individual's body), (2) Nature of Injury (the type of injury the affected individual experienced), and (3) Days off (the number of days off work required for recovery). The results obtained from the machine learning models demonstrate the ability to predict the Days off variable with high accuracy (average of 81.8%), Nature of Injury with moderate accuracy (average of 37.4%), and Injury Location with low accuracy (average of 17.8%). The success of the various machine learning models is directly influenced by the underlying correlation between the output and input variables and by the number of classifications required within the output variable itself. The largest correlation coefficient and the number of classifications were, respectively, (0.07, 20) for Injury Location, (0.14, 9) for Nature of Injury, and (0.07, 3) for Days off. It is recommended that the successful implementation of data mining and machine learning be enabled through collaboration between industry, the government, and academia. | en_ZA |
dc.description.version | Masters | en_ZA |
dc.format.extent | xviii, 189 pages : illustrations. | en_ZA |
dc.identifier.uri | http://hdl.handle.net/10019.1/127219 | en_ZA |
dc.language.iso | en_ZA | en_ZA |
dc.publisher | Stellenbosch : Stellenbosch University | en_ZA |
dc.rights.holder | Stellenbosch University | en_ZA |
dc.subject.lcsh | Data mining | en_ZA |
dc.subject.lcsh | Industrial safety | en_ZA |
dc.subject.lcsh | Machine learning | en_ZA |
dc.title | Application of data mining and machine learning on occupational health and safety struck-by incidents on South African construction sites: a CRISP-DM approach. | en_ZA |
dc.type | Thesis | en_ZA |
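The abstract ties model accuracy to the number of classes in each output variable. As a rough illustration only, and not a calculation taken from the thesis, the sketch below compares the reported average accuracies with a uniform random-guess baseline of 100/k percent for k classes. The assumption of equally likely classes is ours; the abstract does not state the class distribution, and a majority-class baseline could be considerably higher. The names `reported`, `uniform_baseline`, `lift`, and `summary` are hypothetical.

```python
# Figures quoted in the abstract: (average accuracy in %, number of classes).
reported = {
    "Injury Location": (17.8, 20),
    "Nature of Injury": (37.4, 9),
    "Days off": (81.8, 3),
}


def uniform_baseline(n_classes: int) -> float:
    """Accuracy (%) expected from guessing uniformly among n_classes labels.

    Assumption: classes are equally likely. The abstract does not say so.
    """
    return 100.0 / n_classes


def lift(accuracy: float, n_classes: int) -> float:
    """Ratio of a reported accuracy to the uniform random-guess baseline."""
    return accuracy / uniform_baseline(n_classes)


# For each output variable: (baseline accuracy in %, lift over that baseline).
summary = {
    name: (uniform_baseline(k), lift(acc, k))
    for name, (acc, k) in reported.items()
}

for name, (base, ratio) in summary.items():
    print(f"{name}: uniform baseline {base:.1f}%, lift {ratio:.2f}x")
```

Under this simplified baseline, raw accuracies for outputs with different class counts are not directly comparable, which is consistent with the abstract's remark that the number of classifications influences model performance.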
Files
Original bundle
- Name: adams_data_2023.pdf
- Size: 5.27 MB
- Format: Adobe Portable Document Format