Integrating Bayesian network structure into normalizing flows and variational autoencoders

Date
2023-03
Publisher
Stellenbosch : Stellenbosch University
Abstract
ENGLISH ABSTRACT: Deep generative models have become more popular in recent years due to their good scalability and representation capacity. However, these models do not typically incorporate domain knowledge. In contrast, probabilistic graphical models specifically constrain the dependencies between the variables of interest as informed by the domain. In this work, we therefore consider integrating probabilistic graphical models and deep generative models in order to construct models that are able to learn complex distributions, while remaining interpretable by leveraging prior knowledge about variable interactions. We specifically consider the type of domain knowledge that can be represented by Bayesian networks, and restrict our study to the deep generative frameworks of normalizing flows and variational autoencoders. Normalizing flows (NFs) are an important family of deep neural networks for modelling complex distributions as transformations of simple base distributions. Graphical flows add further structure to NFs, allowing one to encode non-trivial variable dependencies in these distributions. Previous graphical flows have focused primarily on a single flow direction: either the normalizing direction for density estimation, or the generative direction for inference and sampling. However, to use a single flow to perform tasks in both directions, the model must exhibit stable and efficient flow inversion. This thesis introduces graphical residual flows (GRFs), graphical flows based on invertible residual networks, which ensure stable invertibility through spectral normalization of their weight matrices. Experiments confirm that GRFs provide performance competitive with other graphical flows on both density estimation and inference tasks. Furthermore, our model provides stable and accurate inversion that is also more time-efficient than alternative flows with similar task performance.
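The core mechanism described above, stable inversion of a residual flow step via spectral normalization, can be sketched in a few lines. The following is a minimal NumPy illustration under our own assumptions, not the thesis's implementation: the weight masking that encodes the Bayesian network's dependency structure is omitted, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # dimensionality of the data

def spectral_normalize(W, coeff=0.9):
    # Scale W so its largest singular value is at most coeff < 1.
    s = np.linalg.svd(W, compute_uv=False)[0]
    return W * (coeff / s) if s > coeff else W

# One residual block x -> x + g(x), where g is a small network whose
# weight matrices are spectrally normalized. Then Lip(g) <= 0.9^2 < 1,
# so the block is a strict contraction plus identity, hence invertible.
W1 = spectral_normalize(rng.standard_normal((d, d)))
W2 = spectral_normalize(rng.standard_normal((d, d)))

def g(x):
    return W2 @ np.tanh(W1 @ x)

def forward(x):
    return x + g(x)

def inverse(y, n_iter=100):
    # Banach fixed-point iteration x_{k+1} = y - g(x_k); it converges
    # geometrically because g is a contraction.
    x = y.copy()
    for _ in range(n_iter):
        x = y - g(x)
    return x

x = rng.standard_normal(d)
y = forward(x)
x_rec = inverse(y)
print(np.max(np.abs(x - x_rec)))  # reconstruction error is tiny
```

The same fixed-point scheme inverts the block in either flow direction, which is what makes a single residual flow usable for both density estimation and sampling.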
We therefore recommend the use of GRFs over other graphical flows when the model may be required to perform reliably in both directions. Since flows employ a bijective transformation, the base or latent distribution must have the same dimensionality as the observed data. Variational autoencoders (VAEs) address this shortcoming by allowing practitioners to specify any number of latent variables. Initial work on VAEs assumed independent latent variables with simple prior and variational distributions. Subsequent work has explored incorporating more complex distributions and dependency structures: including NFs in the encoder network allows latent variables to entangle non-linearly, creating a richer class of distributions for the approximate posterior, and stacking layers of latent variables allows more complex priors to be specified. In this vein, this thesis also explores incorporating arbitrary dependency structures, as specified by Bayesian networks, into VAEs. This is achieved by extending both the prior and the inference network with the above GRF, resulting in the structured invertible residual network (SIReN) VAE. We specifically consider GRFs, since the application of the flow in the VAE prior necessitates stable inversion. We compare our model's performance on several datasets to models that encode no special dependency structures, and show its potential to provide a more interpretable model as well as better generalization performance in data-sparse settings. We also identify posterior collapse, where some latent dimensions become inactive and are effectively ignored by the model, as an issue with SIReN-VAE, as it is linked with the encoded structure. As such, we employ various combinations of existing approaches to alleviate this phenomenon.
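The role a flow-based prior plays in the VAE described above can be made concrete with the standard evidence lower bound (ELBO). The notation below is the conventional one and is an illustrative sketch, not the thesis's exact formulation:

```latex
% Standard ELBO; the prior p_\theta(z) is where a graphical flow can
% inject Bayesian-network dependencies between the latent variables.
\log p_\theta(x) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x \mid z) \right]
  \;-\; \mathrm{KL}\!\left( q_\phi(z \mid x) \,\middle\|\, p_\theta(z) \right)

% With an invertible flow f (e.g. a GRF) defining the prior by change of
% variables from a simple base density p_u:
\log p_\theta(z) \;=\; \log p_u\!\big(f(z)\big)
  \;+\; \log \left| \det \frac{\partial f(z)}{\partial z} \right|
```

Posterior collapse corresponds to the KL term being driven to zero in some latent dimensions, so that the approximate posterior matches the prior there regardless of the input.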
AFRIKAANS OPSOMMING: Deep generative models have become popular in recent years due to their good scalability and representation capacity. However, these models do not typically incorporate domain knowledge. In contrast, probabilistic graphical models specifically constrain the conditional independencies between the variables of interest as informed by the domain. In this work, we therefore consider the integration of probabilistic graphical models and deep generative models so as to construct models that can learn complex distributions, while remaining interpretable by leveraging knowledge about variable interactions. We specifically consider the type of domain knowledge that can be represented by Bayesian networks, and restrict our study to the deep generative frameworks of normalizing flows and variational autoencoders. Normalizing flows (NFs) are an important family of deep neural networks for modelling complex distributions as transformations of simple base distributions. Graphical flows add further structure to NFs, enabling one to encode non-trivial variable dependencies in these distributions. Previous graphical flows have focused mainly on a single flow direction: the normalizing direction for density estimation, or the generative direction for statistical inference and sampling. However, to use a single flow to perform tasks in both directions, the model must exhibit stable and efficient inversion. This thesis introduces graphical residual flows (GRFs), which are based on invertible residual networks. GRFs ensure stable invertibility through spectral normalization of their weight matrices. Experiments confirm that GRFs offer modelling ability competitive with other graphical flows for both density estimation and inference tasks.
Furthermore, our model offers stable and accurate inversion that is also more time-efficient than alternative flows with similar task performance. We therefore recommend the use of GRFs when the model is required to operate reliably in both flow directions. Since flows employ a bijective transformation, the base or latent distribution must have the same dimensionality as the observed data. Variational autoencoders (VAEs) address this shortcoming by allowing practitioners to specify any number of latent variables. Initial work on VAEs assumed independent latent variables with simple distributions. Subsequent work investigated the inclusion of more complex distributions and dependency structures: including NFs in the encoder network allows latent variables to become entangled non-linearly, creating a richer class of distributions for the approximate posterior, and stacking layers of latent variables allows more complex priors to be specified. In a similar vein, this thesis investigates the incorporation of arbitrary dependency structures, as specified by Bayesian networks, into VAEs. This is accomplished by extending both the prior and the inference network with the above-mentioned GRF, leading to the structured invertible residual network (SIReN) VAE. We specifically consider GRFs, since the application of the flow in the VAE prior requires stable inversion. We compare our model's modelling ability on several datasets against models that incorporate no specific dependency structures, and show its potential to provide a more interpretable model as well as better generalization ability when limited data is available.
We also identify posterior collapse, where some latent dimensions are ignored by the model, as a problem with SIReN-VAE, since it is linked to the encoded structure. As such, we evaluate various combinations of existing techniques to prevent this phenomenon.
Description
Thesis (MSc)--Stellenbosch University, 2023.