diff --git a/README.md b/README.md index 189ccfb..d362799 100644 --- a/README.md +++ b/README.md @@ -1,142 +1,58 @@ -# Open Bayesian Trees Project -This repository includes new developments with Bayesian Additive Regression Trees and extends the original OpenBT repository created by Matt Pratola (https://bitbucket.org/mpratola/openbt/src/master/). -Such extensions include Bayesian Model Mixing and Bayesian Calibration. -All of the Bayesian Tree code is written in C++. User interfaces constructed in R and Python allow one to easily run the software. -The BART Model Mixing software has been implemented in the [Taweret](https://github.com/bandframework/Taweret/tree/main) Python package in conjunction with the [BAND](https://bandframework.github.io/) collaboration. - - -# Installation -The heart of OpenBT is a set of C++ tools that can be used directly via -the command line or indirectly through the Python and R packages, which wrap -them. Typically these tools are built with an implementation of the Message -Passing Interface (MPI) such as [Open MPI](https://www.open-mpi.org) or -[MPICH](https://www.mpich.org) to enable distributed parallelization of -computations. In particular, the Python wrapper package is always built with -MPI support. - -The software and its distribution scheme have been developed to allow users to -use OpenBT with the MPI installation of their choice. For instance, it -can be built with MPI installations on leadership class platforms and clusters -that were installed by experts and optimized for their specific platform. As a -result, however, the software is not distributed as prebuilt binaries or wheels, -but rather must be built for each case with the compiler suite and matching MPI -implementation provided by the user. - -## Requirements -Before building and installing the Python package, users -must provide -* a compiler suite that includes a C++ compiler that supports the C++14 - standard, -* an MPI installation that is compatible with the compiler suite, and -* optionally the [Eigen software package](https://gitlab.com/libeigen/eigen). - -Note that if installing MPI using a package manager, related developer library -packages such as ``libopenmpi-dev`` or ``libmpich-dev`` might need to be -installed in addition to the base MPI packages such as ``openmpi-bin`` or -``mpich``. - -To build and install just the bare C++ tools, users must provide in addition to -the above -* the [Meson build system](https://mesonbuild.com) and its prerequisites such as - Python 3 and [ninja](https://ninja-build.org). - -While both Meson and ninja are used internally to build the Python package, they -are installed automatically just for building the package. - -The Meson build system is setup to automatically detect the compiler suite and -MPI installation to use. If Eigen already exists in the system and Meson can -find it, then Meson will use it for the build. Otherwise, Meson will -automatically obtain a copy of Eigen for internal use. - -We presently test OpenBT with both Open MPI and MPICH. In addition, we -have successfully tested with the Intel MPI implementation and have used the -Python package with MPI implementations installed -* via package managers such as Ubuntu's Advanced Packaging Tool (`apt`) and - [homebrew](https://brew.sh) on macOS; -* by experts on clusters and that are available as modules; and -* with `conda` from the prebuilt conda forge - [openmpi](https://anaconda.org/conda-forge/openmpi) package. - -## Meson installation -The Meson build system documentation suggests installing Meson via package -manager when possible. Please refer to that documentation for detailed and -up-to-date installation information. - -If Meson cannot be installed by package manager or the manager's version is too -old, the following is contrary to Meson suggestions but has been used -successfully to install Meson with Python into a dedicated virtual environment -as well as to install `meson` in the `PATH` for use without needing to activate -that virtual environment. -``` -$ /path/to/target/python -m venv ~/local/venv/meson -$ . ~/local/venv/meson/bin/activate -$ which python -$ python -m pip install --upgrade pip -$ python -m pip install meson -$ python -m pip list -$ ln -s ~/local/venv/meson/bin/meson ~/local/bin - -$ deactivate -$ which meson -$ meson --version -``` -Note that this `meson` virtual environment is for installing **just** the Meson -build system. Attempts to install `openbt` into this virtual environment -will likely fail with an error that the `mesonbuild` module cannot be found. - -## Python package -The OpenBT Python package is **not** currently distributed on PyPI since the -[PyPI OpenBT package](https://pypi.org/project/openbt/) already exists. This -package will eventually be transferred to this project so that distribution of -modern versions of this package will be enabled by PyPI under the name `openbt`. - - +# OpenBT -Instead the package should be built and installed from a clone of this repository with -``` -$ cd /path/to/OpenBT/openbt_pypkg -$ python -m pip install . -``` -Developers can install in developer/editable mode with verbose logging of the -build process and installation with -``` -$ cd /path/to/OpenBT/openbt_pypkg -$ python -m pip install -v -e . -``` -In this latter case, the command line tools are built automatically and -installed at `/path/to/OpenBT/openbt_pypkg/src/openbt/bin`. The -Python package is hardcoded to use those tools so that the existence of another -set of tools in the system and in `PATH` should not cause issues. +TODO: Copy info from landing page of sphinx docs here. -OpenBT package installations can be minimally tested with -``` -$ python ->>> import openbt ->>> openbt.__version__ -'' ->>> openbt.test() -``` +### Repository +TODO: Add general badges + +### Python +TODO: Add Python-specific badges + +## License & Copyright -## C++ library & command line tool interface -Developers and C++ users can directly build and install the command line tools, -an OpenBT library, and library tests with `tools/build_openbt_clt.sh`. -Note that these do **not** need to be built in order to use the Python package. +TODO: Copy license & copyright info from landing page of sphinx docs here -## R package -**TODO**: Needs to be written based on current state of affairs. +## Support -# Examples +To -The examples from the article "Model Mixing Using Bayesian Additive Regression Trees" are reproduced in the jupyter notebook BART_BMM_Technometrics.ipynb. This notebook can be run locally or in a virtual environment such as google colab. +* report potential problems with OpenBT or any of the packages derived from it, +* propose a change, or +* request a new feature, + +please check if a related [Issue](https://github.com/bandframework/OpenBT/issues) +already exists before creating a new issue. For all other communication, please +send an email to the OpenBT development team + + * TODO: Matt's contact info + * TODO: John's contact info + +## Documentation + +[User and Developer Guides](https://openbt.readthedocs.io) are hosted on +ReadTheDocs. Please refer to those documents for information regarding +examples. + +## Installation & Testing + +Refer to the getting started sections in the User Guide related to the tool or +package that you intend to use. + +## Contributing to IBCDFO + +Contributions are welcome in a variety of forms; see +[Contributing](https://openbt.readthedocs.io/en/latest/contributing.html) in the +Developer Guide. + +## Cite OpenBT + +``` +@techreport{openbt2026, + author = {Matt Pratola and John Yannotty}, + title = {{OpenBT 1.2.0} Users Manual}, + institution = {TBD}, + number = {Version 1.2.0}, + year = {2026}, + url = {https://openbt.readthedocs.io/} +} +``` diff --git a/Ropenbt/DESCRIPTION b/Ropenbt/DESCRIPTION new file mode 100644 index 0000000..af7dad7 --- /dev/null +++ b/Ropenbt/DESCRIPTION @@ -0,0 +1,15 @@ +Package: Ropenbt +Type: Package +Title: The Open Bayesian Trees Project +Version: 0.0.0 +Date: 2026-06-23 +Authors@R: c(person('Matthew', 'Pratola', role=c('aut','cre','cph'),email='mpratola@iu.edu')) +URL: http://www.github.com/bandframework/OpenBT +Description: + This R package provides and R interface to the OpenBT library of Bayesian Tree models (http://www.github.com/bandframework/OpenBT). For more information and examples, refer to the README page at http://www.github.com/bandframework/OpenBT/README.md. +License: GPL (>= 2) +Depends: R (>= 2.10) +Imports: data.table, zip, emoa, Hmisc +Suggests: +SystemRequirements: +NeedsCompilation: no diff --git a/Ropenbt/NAMESPACE b/Ropenbt/NAMESPACE new file mode 100644 index 0000000..3a870f8 --- /dev/null +++ b/Ropenbt/NAMESPACE @@ -0,0 +1,29 @@ +export(openbt) +S3method(predict, openbt) +export(vartivity.openbt) +export(sobol.openbt) +export(mopareto.openbt) +export(openbt.save) +export(openbt.load) +export(makecuts.openbt) +export(setvarcuts.openbt) +export(calcMBD) +export(calcRS) +importFrom(stats,cor) +importFrom("graphics", "boxplot", "par") +importFrom("graphics", "abline", "lines", "plot", "points") +importFrom("stats", "quantile", "sd","qqplot") +importFrom("stats", "rnorm", "runif") +importFrom("stats","dist") +importFrom("utils","read.table") +importFrom("stats","qnorm") +importFrom("zip","unzip") +importFrom("zip","zipr") +importFrom("graphics","title") +importFrom("graphics","rect") +importFrom("grDevices","gray") +importFrom("utils","combn") +importFrom("Hmisc","wtd.mean") +importFrom("Hmisc","wtd.var") +importFrom("Hmisc","wtd.quantile") + diff --git a/Ropenbt/R/openbt.R b/Ropenbt/R/openbt.R new file mode 100644 index 0000000..cfe2cb1 --- /dev/null +++ b/Ropenbt/R/openbt.R @@ -0,0 +1,3179 @@ +## openbt.R: R script wrapper functions for OpenBT. + + +# Load/Install required packages +required <- c("zip","data.table","Hmisc") +tbi <- required[!(required %in% installed.packages()[,"Package"])] +if(length(tbi)) { + cat("***Installing OpenBT package dependencies***\n") + install.packages(tbi,repos="https://cloud.r-project.org",quiet=TRUE) +} +library(zip,quietly=TRUE,warn.conflicts=FALSE) +library(data.table,quietly=TRUE,warn.conflicts=FALSE) +library(Hmisc,quietly=TRUE,warn.conflicts=FALSE) #for weighted mean/var/quantile used in repredict.*() +# library(foreach,quietly=TRUE,warn.conflicts=FALSE) #only used in repredict.openbt() method to speedup R-based reweighting calculations +# library(doMC,quietly=TRUE,warn.conflicts=FALSE) #only used in repredict.openbt() method to speedup R-based reweighting calculations + + +openbt = function( +x.train, +y.train, +f.train = matrix(1,nrow = length(y.train), ncol = 2), +ntree=NULL, +ntreeh=NULL, +ndpost=1000, nskip=100, +k=NULL, +power=2.0, base=.95, +tc=2, +sigmav=rep(1,length(y.train)), +f.sd.train = NULL, # rename +wts.prior.info = NULL, # keep/rename +fmean=mean(y.train), +overallsd = NULL, +overallnu= NULL, +chv = cor(x.train,method="spearman"), +pbd=.7, +pb=.5, +stepwpert=.1, +probchv=.1, +minnumbot=5, +printevery=100, +numcut=100, +xicuts=NULL, +nadapt=1000, +adaptevery=100, +summarystats=FALSE, +truncateds=NULL, +model=NULL, +modelname="model" +) +{ +#-------------------------------------------------- +# model type definitions +modeltype=0 # undefined +MODEL_BT=1 +MODEL_BINOMIAL=2 +MODEL_POISSON=3 +MODEL_BART=4 +MODEL_HBART=5 +MODEL_PROBIT=6 +MODEL_MODIFIEDPROBIT=7 +MODEL_MERCK_TRUNCATED=8 +MODEL_MIXBART=9 +if(is.null(model)) +{ + cat("Model type not specified.\n") + cat("Available options are:\n") + cat("model='bt'\n") + cat("model='binomial'\n") + cat("model='poisson'\n") + cat("model='bart'\n") + cat("model='hbart'\n") + cat("model='probit'\n") + cat("model='modifiedprobit'\n") + cat("model='merck_truncated'\n") + cat("model='mixbart'\n") + + stop("missing model type.\n") +} +if(model=="bart") +{ + modeltype=MODEL_BART + if(is.null(ntree)) ntree=200 + if(is.null(ntreeh)) ntreeh=1 + if(is.null(k)) k=2 + if(is.null(overallsd)) overallsd=sd(y.train) + if(is.null(overallnu)) overallnu=10 + pbd=c(pbd,0.0) +} +if(model=="hbart") +{ + modeltype=MODEL_HBART + if(is.null(ntree)) ntree=200 + if(is.null(ntreeh)) ntreeh=40 + if(is.null(k)) k=5 + if(is.null(overallsd)) overallsd=sd(y.train) + if(is.null(overallnu)) overallnu=10 +} + +if(model=="probit") +{ + modeltype=MODEL_PROBIT + if(is.null(ntree)) ntree=200 + if(is.null(ntreeh)) ntreeh=1 + if(is.null(k)) k=1 + if(is.null(overallsd)) overallsd=1 + if(is.null(overallnu)) overallnu=-1 + if(length(pbd)==1) pbd=c(pbd,0.0) +} +if(model=="modified-probit") +{ + modeltype=MODEL_MODIFIEDPROBIT + if(is.null(ntree)) ntree=200 + if(is.null(ntreeh)) ntreeh=40 + if(is.null(k)) k=1 + if(is.null(overallsd)) overallsd=1 + if(is.null(overallnu)) overallnu=-1 +} +if(model=="merck_truncated") +{ + modeltype=MODEL_MERCK_TRUNCATED + if(is.null(ntree)) ntree=200 + if(is.null(ntreeh)) ntreeh=1 + if(is.null(k)) k=2 + if(is.null(overallsd)) overallsd=sd(y.train) + if(is.null(overallnu)) overallnu=10 + if(is.null(truncateds)) { + miny=min(y.train)[1] + truncateds=(y.train==miny) + } +} +if(model=="mixbart") +{ + modeltype=MODEL_MIXBART + if(is.null(ntree)) ntree=20 + if(is.null(ntreeh)) ntreeh=1 + if(is.null(k)) k=1 + if(is.null(overallsd)) overallsd=sd(y.train) + if(is.null(overallnu)) overallnu=10 + pbd=c(pbd,0.0) +} +#-------------------------------------------------- +nd = ndpost +burn = nskip +m = ntree +mh = ntreeh +#-------------------------------------------------- +#data +if(!is.matrix(x.train)){x.train = matrix(x.train, ncol = 1)} +n = length(y.train) +p = ncol(x.train) +#np = nrow(x.test) +x = t(x.train) +#xp = t(x.test) +if(modeltype==MODEL_BART || modeltype==MODEL_HBART || modeltype==MODEL_MERCK_TRUNCATED) +{ + y.train=y.train-fmean + fmean.out=paste(0.0) +} +if(modeltype==MODEL_PROBIT || modeltype==MODEL_MODIFIEDPROBIT) +{ + fmean.out=paste(qnorm(fmean)) + uniqy=sort(unique(y.train)) + if(length(uniqy)>2) stop("Invalid y.train: Probit requires dichotomous response coded 0/1") + if(uniqy[1]!=0 || uniqy[2]!=1) stop("Invalid y.train: Probit requires dichotomous response coded 0/1") +} + +#Set mix discrepancy to FALSE if we use a different model +nsprior = FALSE +wtsprior = FALSE +if(modeltype==MODEL_MIXBART) +{ + #Center the response + fmean.out=paste(0.0) + + #Check to see if any discrepancy data has been passed into the function -- if so, we will use the discrepancy model mixing + if(!is.null(f.sd.train)){ + nsprior = TRUE + } + if(!is.null(wts.prior.info)){ + wtsprior = TRUE + if(ncol(wts.prior.info) != 2 | nrow(wts.prior.info)!=ncol(f.train)) stop("Invalid wts.prior.info: Required dimensions of num.models x 2") + } +} +#-------------------------------------------------- +#cutpoints +if(!is.null(xicuts)) # use xicuts +{ + xi=xicuts +} +else # default to equal numcut per dimension +{ + xi=vector("list",p) + minx=floor(apply(x,1,min)) + maxx=ceiling(apply(x,1,max)) + for(i in 1:p) + { + xinc=(maxx[i]-minx[i])/(numcut+1) + xi[[i]]=(1:numcut)*xinc+minx[i] + } +} + +if(is.null(xicuts) & modeltype==MODEL_MIXBART){ + xi=vector("list",p) + minx_temp=apply(x,1,min) + maxx_temp=apply(x,1,max) + + maxx = round(maxx_temp,1) + ifelse((round(maxx_temp,1)-maxx_temp)>0,0,0.1) + minx = round(minx_temp,1) - ifelse((minx_temp - round(minx_temp,1))>0,0,0.1) + for(i in 1:p) + { + xinc=(maxx[i]-minx[i])/(numcut+1) + xi[[i]]=(1:numcut)*xinc+minx[i] + } +} +#-------------------------------------------------- +if(modeltype==MODEL_BART || modeltype==MODEL_HBART || modeltype==MODEL_MERCK_TRUNCATED || modeltype==MODEL_MIXBART) +{ + rgy = range(y.train) +} +if(modeltype==MODEL_PROBIT || modeltype==MODEL_MODIFIEDPROBIT) +{ + rgy = c(-2,2) +} + +if(modeltype==MODEL_MIXBART){ + tau = (1)/(2*sqrt(m)*k) + beta0 = 1/(2*m) + overallsd = sqrt((overallnu+2)*overallsd^2/overallnu) +}else{ + tau = (rgy[2]-rgy[1])/(2*sqrt(m)*k) + beta0=0 +} + +if(modeltype==MODEL_MIXBART & nsprior){ + #tau = 1/(sqrt(m)*k) + #tau = 1/(2*sqrt(m)*k) + tau = 1/(2*(m)*k) + beta0 = 1/m +} + +#-------------------------------------------------- +overalllambda = overallsd^2 +#-------------------------------------------------- +powerh=power +baseh=base +if(length(power)>1) { + powerh=power[2] + power=power[1] +} +if(length(base)>1) { + baseh=base[2] + base=base[1] +} +#-------------------------------------------------- +pbdh=pbd +pbh=pb +if(length(pbd)>1) { + pbdh=pbdh[2] + pbd=pbd[1] +} +if(length(pb)>1) { + pbh=pb[2] + pb=pb[1] +} +#-------------------------------------------------- +if(modeltype==MODEL_BART) +{ + cat("Model: Bayesian Additive Regression Trees model (BART)\n") +} +#-------------------------------------------------- +if(modeltype==MODEL_HBART) +{ + cat("Model: Heteroscedastic Bayesian Additive Regression Trees model (HBART)\n") +} +#-------------------------------------------------- +if(modeltype==MODEL_PROBIT) +{ + cat("Model: Dichotomous outcome model: Albert & Chib Probit (fixed)\n") +# overallnu=-1 + if(ntreeh>1) + stop("method probit requires ntreeh=1") + if(pbdh>0.0) + stop("method probit requires pbd[2]=0.0") +} +#-------------------------------------------------- +if(modeltype==MODEL_MODIFIEDPROBIT) +{ + cat("Model: Dichotomous outcome model: Modified Albert & Chib Probit\n") +} +#-------------------------------------------------- +if(modeltype==MODEL_MERCK_TRUNCATED) +{ + cat("Model: Truncated BART model\n") +} +#-------------------------------------------------- +if(modeltype==MODEL_MIXBART) +{ + cat("Model: Model Mixing with Bayesian Additive Regression Trees\n") +} +#-------------------------------------------------- +stepwperth=stepwpert +if(length(stepwpert)>1) { + stepwperth=stepwpert[2] + stepwpert=stepwpert[1] +} +#-------------------------------------------------- +probchvh=probchv +if(length(probchv)>1) { + probchvh=probchv[2] + probchv=probchv[1] +} +#-------------------------------------------------- +minnumboth=minnumbot +if(length(minnumbot)>1) { + minnumboth=minnumbot[2] + minnumbot=minnumbot[1] +} + +#-------------------------------------------------- +#write out config file +xroot="x" +yroot="y" +sroot="s" +chgvroot="chgv" +froot="f" +fsdroot="fsd" +wproot="wpr" +xiroot="xi" +folder=tempdir(check=TRUE) +if(!dir.exists(folder)) dir.create(folder) +tmpsubfolder=tempfile(tmpdir="") +tmpsubfolder=substr(tmpsubfolder,5,nchar(tmpsubfolder)) +tmpsubfolder=paste("openbt",tmpsubfolder,sep="") +folder=paste(folder,"/",tmpsubfolder,sep="") +if(!dir.exists(folder)) dir.create(folder) +fout=file(paste(folder,"/config",sep=""),"w") +writeLines(c(paste(modeltype),xroot,yroot,fmean.out,paste(m),paste(mh),paste(nd),paste(burn), + paste(nadapt),paste(adaptevery),paste(tau),paste(beta0),paste(overalllambda), + paste(overallnu),paste(base),paste(power),paste(baseh),paste(powerh), + paste(tc),paste(sroot),paste(chgvroot),paste(froot),paste(fsdroot),paste(nsprior),paste(wproot),paste(wtsprior), + paste(pbd),paste(pb),paste(pbdh),paste(pbh),paste(stepwpert),paste(stepwperth), + paste(probchv),paste(probchvh),paste(minnumbot),paste(minnumboth), + paste(printevery),paste(xiroot),paste(modelname),paste(summarystats)),fout) +close(fout) +# folder=paste(".",modelname,"/",sep="") +# system(paste("rm -rf ",folder,sep="")) +# system(paste("mkdir ",folder,sep="")) +# system(paste("cp config ",folder,"config",sep="")) + + +#-------------------------------------------------- +#write out data subsets +nslv=tc-1 +ylist=split(y.train,(seq(n)-1) %/% (n/nslv)) +for(i in 1:nslv) write(ylist[[i]],file=paste(folder,"/",yroot,i,sep="")) +xlist=split(as.data.frame(x.train),(seq(n)-1) %/% (n/nslv)) +for(i in 1:nslv) write(t(xlist[[i]]),file=paste(folder,"/",xroot,i,sep="")) +slist=split(sigmav,(seq(n)-1) %/% (n/nslv)) +for(i in 1:nslv) write(slist[[i]],file=paste(folder,"/",sroot,i,sep="")) +chv[is.na(chv)]=0 # if a var as 0 levels it will have a cor of NA so we'll just set those to 0. +write(chv,file=paste(folder,"/",chgvroot,sep="")) +for(i in 1:p) write(xi[[i]],file=paste(folder,"/",xiroot,i,sep="")) +rm(chv) + +#Write the function output if using model mixing +if(modeltype == MODEL_MIXBART){ + flist=split(as.data.frame(f.train),(seq(n)-1) %/% (n/nslv)) + for(i in 1:nslv) write(t(flist[[i]]),file=paste(folder,"/",froot,i,sep="")) + + if(nsprior){ + # -- delete these two lines + #fdmlist=split(as.data.frame(f.discrep.mean),(seq(n)-1) %/% (n/nslv)) + #for(i in 1:nslv) write(t(fdmlist[[i]]),file=paste(folder,"/",fdmroot,i,sep="")) + + fdslist=split(as.data.frame(f.sd.train),(seq(n)-1) %/% (n/nslv)) + for(i in 1:nslv) write(t(fdslist[[i]]),file=paste(folder,"/",fsdroot,i,sep="")) + } + + if(wtsprior){ + write(wts.prior.info,file=paste(folder,"/",wproot,sep="")) + } +} + +if(modeltype==MODEL_MERCK_TRUNCATED) +{ + tlist=split(truncateds,(seq(n)-1) %/% (n/nslv)) + for(i in 1:nslv) { + truncs=which(tlist[[i]]==TRUE)-1 #-1 for correct indexing in c/c++ + ftrun=file(paste(folder,"/","truncs",i,sep=""),"w") + write(truncs,ftrun) + close(ftrun) + } +} +#-------------------------------------------------- +#run program +cmdopt=100 #default to serial/OpenMP +runlocal=FALSE +cmd="openbtcli --conf" +if(Sys.which("openbtcli")[[1]]=="") # not installed in a global location, so assume current directory + runlocal=TRUE + +if(runlocal && Sys.which("./openbtcli")[[1]]=="") # not installed globally, not installed locally. + stop("OpenBT library installation is missing. \n") + +if(runlocal) cmd="./openbtcli --conf" + +cmdopt=system(cmd) + +if(cmdopt==101) # MPI +{ + cmd=paste("mpirun -np ",tc," openbtcli ",folder,sep="") +} + +if(cmdopt==100) # serial/OpenMP +{ + if(runlocal) + cmd=paste("./openbtcli ",folder,sep="") + else + cmd=paste("openbtcli ",folder,sep="") +} +# +#cat(cmd) +system(cmd) +#system(paste("rm -f ",folder,"/config",sep="")) +#system(paste("mv ",folder,"fit ",folder,modelname,".fit",sep="")) + +res=list() + +# Read in the influence metrics. +res$influence=NULL +for(i in 1:nslv) + res$influence=rbind(res$influence,as.matrix(read.table(paste(folder,"/",modelname,".influence",i,sep="")))) + +colnames(res$influence)=c("MeanCookD", "MaxCookD", "KLDiv", "CPOinv") +res$modeltype=modeltype +res$model=model +res$xroot=xroot; res$yroot=yroot;res$m=m; res$mh=mh; res$nd=nd; res$burn=burn +res$nadapt=nadapt; res$adaptevery=adaptevery; res$tau=tau;res$beta0=beta0;res$overalllambda=overalllambda +res$overallnu=overallnu; res$k=k; res$base=base; res$power=power; res$baseh=baseh; res$powerh=powerh +res$tc=tc; res$sroot=sroot; res$chgvroot=chgvroot;res$froot=froot;res$fsdroot=fsdroot; +res$nsprior = nsprior; res$pbd=pbd; res$pb=pb +res$pbdh=pbdh; res$pbh=pbh; res$stepwpert=stepwpert; res$stepwperth=stepwperth +res$probchv=probchv; res$probchvh=probchvh; res$minnumbot=minnumbot; res$minnumboth=minnumboth +res$printevery=printevery; res$xiroot=xiroot; res$minx=minx; res$maxx=maxx; +res$summarystats=summarystats; res$modelname=modelname +class(xi)="OpenBT_cutinfo" +res$xicuts=xi +res$fmean=fmean +res$folder=folder +class(res)="OpenBT_posterior" + +return(res) +} + + + + +#-------------------------------------------------- +#Get Predictions +#-------------------------------------------------- + + +predict.openbt = function( +object=NULL, +x.test=NULL, +f.test=matrix(1,nrow = 1, ncol = 2), +# f.discrep.mean = NULL, # delete +# f.sd.test = NULL, # delete +tc=2, +fmean=fit$fmean, +q.lower=0.025, +q.upper=0.975, ... +) +{ + +fit=object + +# model type definitions +MODEL_BT=1 +MODEL_BINOMIAL=2 +MODEL_POISSON=3 +MODEL_BART=4 +MODEL_HBART=5 +MODEL_PROBIT=6 +MODEL_MODIFIEDPROBIT=7 +MODEL_MERCK_TRUNCATED=8 +MODEL_MIXBART=9 + +#-------------------------------------------------- +# params +if(is.null(fit)) stop("No fitted model specified!\n") +if(is.null(x.test)) stop("No prediction points specified!\n") + +nslv=tc +x.test=as.matrix(x.test) +p=ncol(x.test) +n=nrow(x.test) +k=2 #Default number of models for model mixing +xproot="xp" +fproot="fp" +#fpdmroot="fpdm" +#fpdsroot="fpds" + +if(fit$modeltype==MODEL_MIXBART){ + if(is.null(f.test)){stop("No function output specified for model mixing!\n")} + k=ncol(f.test) #Number of models + #if(fit$nsprior){stop("No function discrepancy mean and/or standard deviation was provided, but model was trained with functional discrepancy.") } +} + +if(fit$modeltype==MODEL_BART || fit$modeltype==MODEL_HBART || fit$modeltype==MODEL_MERCK_TRUNCATED) +{ + fmean.out=paste(fmean) +} +if(fit$modeltype==MODEL_PROBIT || fit$modeltype==MODEL_MODIFIEDPROBIT) +{ + fmean.out=paste(qnorm(fmean)) +} +if(fit$modeltype==MODEL_MIXBART){ + fmean.out=paste(0.0) +} + +#-------------------------------------------------- +#write out config file +fout=file(paste(fit$folder,"/config.pred",sep=""),"w") +writeLines(c(fit$modelname,fit$modeltype,fit$xiroot,xproot,fproot, + paste(fit$nd),paste(fit$m), + paste(fit$mh),paste(p),paste(k),paste(tc), + fmean.out), fout) +close(fout) + +#-------------------------------------------------- +#write out data subsets +#folder=paste(".",fit$modelname,"/",sep="") +xlist=split(as.data.frame(x.test),(seq(n)-1) %/% (n/nslv)) +for(i in 1:nslv) write(t(xlist[[i]]),file=paste(fit$folder,"/",xproot,i-1,sep="")) +for(i in 1:p) write(fit$xicuts[[i]],file=paste(fit$folder,"/",fit$xiroot,i,sep="")) + +if(fit$modeltype==MODEL_MIXBART){ + #for(i in 1:k) write(f.test[,i],file=paste(fit$folder,"/",fproot,i,sep="")) + flist=split(as.data.frame(f.test),(seq(n)-1) %/% (n/nslv)) + for(i in 1:nslv) write(t(flist[[i]]),file=paste(fit$folder,"/",fproot,i-1,sep="")) + + if(fit$nsprior){ + #fdmlist=split(as.data.frame(f.discrep.mean),(seq(n)-1) %/% (n/nslv)) + #for(i in 1:nslv) write(t(fdmlist[[i]]),file=paste(fit$folder,"/",fpdmroot,i-1,sep="")) + + #fdslist=split(as.data.frame(f.sd.train),(seq(n)-1) %/% (n/nslv)) + #for(i in 1:nslv) write(t(fdslist[[i]]),file=paste(fit$folder,"/",fpdsroot,i-1,sep="")) + } +} + +#-------------------------------------------------- +#run prediction program +cmdopt=100 #default to serial/OpenMP +runlocal=FALSE +cmd="openbtcli --conf" +if(Sys.which("openbtcli")[[1]]=="") # not installed in a global location, so assume current directory + runlocal=TRUE + +if(runlocal) cmd="./openbtcli --conf" + +cmdopt=system(cmd) + +if(cmdopt==101) # MPI +{ + cmd=paste("mpirun -np ",tc," openbtpred ",fit$folder,sep="") +} + +if(cmdopt==100) # serial/OpenMP +{ + if(runlocal) + cmd=paste("./openbtpred ",fit$folder,sep="") + else + cmd=paste("openbtpred ",fit$folder,sep="") +} + +#cmd=paste("mpirun -np ",tc," openbtpred",sep="") +#cat(cmd) +system(cmd) +system(paste("rm -f ",fit$folder,"/config.pred",sep="")) + + +#-------------------------------------------------- +#format and return +res=list() + +# Old, original code for reading in the posterior predictive draws. +# res$mdraws=read.table(paste(fit$folder,"/",fit$modelname,".mdraws",0,sep="")) +# res$sdraws=read.table(paste(fit$folder,"/",fit$modelname,".sdraws",0,sep="")) +# for(i in 2:nslv) +# { +# res$mdraws=cbind(res$mdraws,read.table(paste(fit$folder,"/",fit$modelname,".mdraws",i-1,sep=""))) +# res$sdraws=cbind(res$sdraws,read.table(paste(fit$folder,"/",fit$modelname,".sdraws",i-1,sep=""))) +# } +# res$mdraws=as.matrix(res$mdraws) +# res$sdraws=as.matrix(res$sdraws) + +# Faster using data.table's fread than the built-in read.table. +# However, it does strangely introduce some small rounding error on the order of 8.9e-16. +fnames=list.files(fit$folder,pattern=paste(fit$modelname,".mdraws*",sep=""),full.names=TRUE) +res$mdraws=as.matrix(do.call(cbind,sapply(fnames,data.table::fread))) +fnames=list.files(fit$folder,pattern=paste(fit$modelname,".sdraws*",sep=""),full.names=TRUE) +res$sdraws=as.matrix(do.call(cbind,sapply(fnames,data.table::fread))) + +system(paste("rm -f ",fit$folder,"/",fit$modelname,".mdraws*",sep="")) +system(paste("rm -f ",fit$folder,"/",fit$modelname,".sdraws*",sep="")) + +#Get Results for models that are not model mixing +if(fit$modeltype!=MODEL_MIXBART){ + res$mmean=apply(res$mdraws,2,mean) + res$smean=apply(res$sdraws,2,mean) + res$msd=apply(res$mdraws,2,sd) + res$ssd=apply(res$sdraws,2,sd) + res$m.5=apply(res$mdraws,2,quantile,0.5) + res$m.lower=apply(res$mdraws,2,quantile,q.lower) + res$m.upper=apply(res$mdraws,2,quantile,q.upper) + res$s.5=apply(res$sdraws,2,quantile,0.5) + res$s.lower=apply(res$sdraws,2,quantile,q.lower) + res$s.upper=apply(res$sdraws,2,quantile,q.upper) + res$pdraws=NULL + res$pmean=NULL + res$psd=NULL + res$p.5=NULL + res$p.lower=NULL + res$p.upper=NULL +} + +#Get probabilities for the probit models +if(fit$modeltype==MODEL_PROBIT || fit$modeltype==MODEL_MODIFIEDPROBIT) +{ + res$pdraws=read.table(paste(fit$folder,"/",fit$modelname,".pdraws",0,sep="")) + for(i in 2:nslv) + { + res$pdraws=cbind(res$pdraws,read.table(paste(fit$folder,"/",fit$modelname,".pdraws",i-1,sep=""))) + } + + system(paste("rm -f ",fit$folder,"/",fit$modelname,".pdraws*",sep="")) + + res$pdraws=as.matrix(res$pdraws) + res$pmean=apply(res$pdraws,2,mean) + res$psd=apply(res$pdraws,2,sd) + res$p.5=apply(res$pdraws,2,quantile,0.5) + res$p.lower=apply(res$pdraws,2,quantile,q.lower) + res$p.upper=apply(res$pdraws,2,quantile,q.upper) +} + +#Get model mixing results -- using only a constant variance -- change later to match with HBART +if(fit$modeltype==MODEL_MIXBART){ + res$mmean=apply(res$mdraws,2,mean) + res$msd=apply(res$mdraws,2,sd) + res$m.5=apply(res$mdraws,2,quantile,0.5) + res$m.lower=apply(res$mdraws,2,quantile,q.lower) + res$m.upper=apply(res$mdraws,2,quantile,q.upper) + res$smean=apply(res$sdraws,2,mean) + res$ssd=apply(res$sdraws,2,sd) + res$s.5=apply(res$sdraws,2,median) + res$s.lower=apply(res$sdraws,2,quantile,q.lower) + res$s.upper=apply(res$sdraws,2,quantile,q.lower) + res$pdraws=NULL + res$pmean=NULL + res$psd=NULL + res$p.5=NULL + res$p.lower=NULL + res$p.upper=NULL +} + +res$q.lower=q.lower +res$q.upper=q.upper +res$x.test=x.test +res$modeltype=fit$modeltype + +class(res)="OpenBT_predict" + +return(res) +} + + +#-------------------------------------------------- +#Get Variable activity +#-------------------------------------------------- + +vartivity.openbt = function( +fit=NULL, +q.lower=0.025, +q.upper=0.975 +) +{ + +#-------------------------------------------------- +# params +if(is.null(fit)) stop("No fitted model specified!\n") +p=length(fit$xicuts) +m=fit$m +mh=fit$mh +nd=fit$nd +modelname=fit$modelname + + +#-------------------------------------------------- +#write out config file +fout=file(paste(fit$folder,"/config.vartivity",sep=""),"w") +writeLines(c(fit$modelname, + paste(nd),paste(m), + paste(mh),paste(p)) ,fout) +close(fout) + + +#-------------------------------------------------- +#run vartivity program -- it's not actually parallel so no call to mpirun. +runlocal=FALSE +if(Sys.which("openbtcli")[[1]]=="") # not installed in a global locaiton, so assume current directory + runlocal=TRUE + +if(runlocal) + cmd=paste("./openbtvartivity ",fit$folder,sep="") +else + cmd=paste("openbtvartivity ",fit$folder,sep="") + +#cmd=paste("./openbtvartivity",sep="") +system(cmd) +system(paste("rm -f ",fit$folder,"/config.vartivity",sep="")) + + +#-------------------------------------------------- +#read in result +res=list() +res$vdraws=read.table(paste(fit$folder,"/",fit$modelname,".vdraws",sep="")) +res$vdrawsh=read.table(paste(fit$folder,"/",fit$modelname,".vdrawsh",sep="")) +res$vdraws=as.matrix(res$vdraws) +res$vdrawsh=as.matrix(res$vdrawsh) + +# normalize counts +colnorm=apply(res$vdraws,1,sum) +ix=which(colnorm>0) +res$vdraws[ix,]=res$vdraws[ix,]/colnorm[ix] +colnorm=apply(res$vdrawsh,1,sum) +ix=which(colnorm>0) +res$vdrawsh[ix,]=res$vdrawsh[ix,]/colnorm[ix] + +res$mvdraws=apply(res$vdraws,2,mean) +res$mvdrawsh=apply(res$vdrawsh,2,mean) +res$vdraws.sd=apply(res$vdraws,2,sd) +res$vdrawsh.sd=apply(res$vdrawsh,2,sd) +res$vdraws.5=apply(res$vdraws,2,quantile,0.5) +res$vdrawsh.5=apply(res$vdrawsh,2,quantile,0.5) +res$vdraws.lower=apply(res$vdraws,2,quantile,q.lower) +res$vdraws.upper=apply(res$vdraws,2,quantile,q.upper) +res$vdrawsh.lower=apply(res$vdrawsh,2,quantile,q.lower) +res$vdrawsh.upper=apply(res$vdrawsh,2,quantile,q.upper) +res$q.lower=q.lower +res$q.upper=q.upper +res$modeltype=fit$modeltype + +class(res)="OpenBT_vartivity" + +return(res) +} + +#-------------------------------------------------- +#Sobol function +#-------------------------------------------------- + +sobol.openbt = function( +fit=NULL, +q.lower=0.025, +q.upper=0.975, +tc=2 +) +{ + +#-------------------------------------------------- +# params +if(is.null(fit)) stop("No fitted model specified!\n") +p=length(fit$xicuts) +m=fit$m +mh=fit$mh +nd=fit$nd +modelname=fit$modelname + + +#-------------------------------------------------- +#write out config file +fout=file(paste(fit$folder,"/config.sobol",sep=""),"w") +writeLines(c(fit$modelname,fit$xiroot, + paste(nd),paste(m), + paste(mh),paste(p),paste(fit$minx), + paste(fit$maxx),paste(tc)) ,fout) +close(fout) + + +#-------------------------------------------------- +#run Sobol program +cmdopt=100 #default to serial/OpenMP +runlocal=FALSE +cmd="openbtcli --conf" +if(Sys.which("openbtcli")[[1]]=="") # not installed in a global locaiton, so assume current directory + runlocal=TRUE + +if(runlocal) cmd="./openbtcli --conf" + +cmdopt=system(cmd) + +if(cmdopt==101) # MPI +{ + cmd=paste("mpirun -np ",tc," openbtsobol ",fit$folder,sep="") +} + +if(cmdopt==100) # serial/OpenMP +{ + if(runlocal) + cmd=paste("./openbtsobol ",fit$folder,sep="") + else + cmd=paste("openbtsobol ",fit$folder,sep="") +} + +system(cmd) +system(paste("rm -f ",fit$folder,"/config.sobol",sep="")) + + +#-------------------------------------------------- +#read in result +res=list() +draws=read.table(paste(fit$folder,"/",fit$modelname,".sobol",0,sep="")) +for(i in 2:tc) + draws=rbind(draws,read.table(paste(fit$folder,"/",fit$modelname,".sobol",i-1,sep=""))) +draws=as.matrix(draws) + +drawsh=read.table(paste(fit$folder,"/",fit$modelname,".shapley",0,sep="")) +for(i in 2:tc) + drawsh=rbind(drawsh,read.table(paste(fit$folder,"/",fit$modelname,".shapley",i-1,sep=""))) +drawsh=as.matrix(drawsh) + +labs=gsub("\\s+",",",apply(combn(1:p,2),2,function(zz) Reduce(paste,zz))) +res$vidraws=draws[,1:p] +res$vijdraws=draws[,(p+1):(p+p*(p-1)/2)] +res$tvidraws=draws[,(ncol(draws)-p):(ncol(draws)-1)] +res$vdraws=draws[,ncol(draws)] +res$sidraws=res$vidraws/res$vdraws +res$sijdraws=res$vijdraws/res$vdraws +res$tsidraws=res$tvidraws/res$vdraws +res$vidraws=as.matrix(res$vidraws) +colnames(res$vidraws)=paste("V",1:p,sep="") +res$vijdraws=as.matrix(res$vijdraws) +colnames(res$vijdraws)=paste("V",labs,sep="") +res$tvidraws=as.matrix(res$tvidraws) +colnames(res$tvidraws)=paste("TV",1:p,sep="") +res$vdraws=as.matrix(res$vdraws) +colnames(res$vdraws)="V" +res$sidraws=as.matrix(res$sidraws) +colnames(res$sidraws)=paste("S",1:p,sep="") +res$sijdraws=as.matrix(res$sijdraws) +colnames(res$sijdraws)=paste("S",labs,sep="") +res$tsidraws=as.matrix(res$tsidraws) +colnames(res$tsidraws)=paste("TS",1:p,sep="") +rm(draws) +res$shdraws=as.matrix(drawsh) +res$shdraws=res$shdraws/as.vector(res$vdraws) +colnames(res$shdraws)=paste("Sh",1:p,sep="") +rm(drawsh) + +# summaries +res$msi=apply(res$sidraws,2,mean) +res$msi.sd=apply(res$sidraws,2,sd) +res$si.5=apply(res$sidraws,2,quantile,0.5) +res$si.lower=apply(res$sidraws,2,quantile,q.lower) +res$si.upper=apply(res$sidraws,2,quantile,q.upper) +names(res$msi)=paste("S",1:p,sep="") +names(res$msi.sd)=paste("S",1:p,sep="") +names(res$si.5)=paste("S",1:p,sep="") +names(res$si.lower)=paste("S",1:p,sep="") +names(res$si.upper)=paste("S",1:p,sep="") + +res$msij=apply(res$sijdraws,2,mean) +res$sij.sd=apply(res$sijdraws,2,sd) +res$sij.5=apply(res$sijdraws,2,quantile,0.5) +res$sij.lower=apply(res$sijdraws,2,quantile,q.lower) +res$sij.upper=apply(res$sijdraws,2,quantile,q.upper) +names(res$msij)=paste("S",labs,sep="") +names(res$sij.sd)=paste("S",labs,sep="") +names(res$sij.5)=paste("S",labs,sep="") +names(res$sij.lower)=paste("S",labs,sep="") +names(res$sij.upper)=paste("S",labs,sep="") + +res$mtsi=apply(res$tsidraws,2,mean) +res$tsi.sd=apply(res$tsidraws,2,sd) +res$tsi.5=apply(res$tsidraws,2,quantile,0.5) +res$tsi.lower=apply(res$tsidraws,2,quantile,q.lower) +res$tsi.upper=apply(res$tsidraws,2,quantile,q.upper) +names(res$mtsi)=paste("TS",1:p,sep="") +names(res$tsi.sd)=paste("TS",1:p,sep="") +names(res$tsi.5)=paste("TS",1:p,sep="") +names(res$tsi.lower)=paste("TS",1:p,sep="") +names(res$tsi.upper)=paste("TS",1:p,sep="") + +res$mshi=apply(res$shdraws,2,mean) +res$mshi.sd=apply(res$shdraws,2,sd) +# res$shi.5=apply(res$shdraws,2,quantile,0.5) +# res$shi.lower=apply(res$shdraws,2,quantile,q.lower) +# res$shi.upper=apply(res$shdraws,2,quantile,q.upper) +names(res$mshi)=paste("Sh",1:p,sep="") +names(res$mshi.sd)=paste("Sh",1:p,sep="") +# names(res$shi.5)=paste("Sh",1:p,sep="") +# names(res$shi.lower)=paste("Sh",1:p,sep="") +# names(res$shi.upper)=paste("Sh",1:p,sep="") + + +res$q.lower=q.lower +res$q.upper=q.upper +res$modeltype=fit$modeltype + +class(res)="OpenBT_sobol" + +return(res) +} + +#-------------------------------------------------- +#Pareto function +#-------------------------------------------------- +# Pareto Front Multiobjective Optimization using 2 fitted BART models +mopareto.openbt = function( +fit1=NULL, +fit2=NULL, +fit3=NULL, +q.lower=0.025, +q.upper=0.975, +tc=2 +) +{ + +#-------------------------------------------------- +# params +if(is.null(fit1) || is.null(fit2)) stop("No fitted models specified!\n") +#if(fit1$xicuts != fit2$xicuts) stop("Models not compatible\n") +if(fit1$nd != fit2$nd) stop("Models have different number of posterior samples\n") +p=length(fit1$xicuts) +d=2+(!is.null(fit3)) +print(paste0("d=",d)) +m1=fit1$m +m2=fit2$m +mh1=fit1$mh +mh2=fit2$mh +m3=0 +mh3=0 +if(!is.null(fit3)) { + if(fit3$nd != fit2$nd) stop("Models have different number of posterior samples\n") + m3=fit3$m + mh3=fit3$mh +} +# if(is.null(fit3)) { +# fit3=list() +# fit3$modelname="null" +# fit3$folder="null" +# fit3$fmean=0.0 +# } + +nd=fit1$nd +modelname=fit1$modelname + + +#-------------------------------------------------- +#write out config file +fout=file(paste(fit1$folder,"/config.mopareto",sep=""),"w") +if(!is.null(fit3)) { +writeLines(c(fit1$modelname,fit2$modelname,fit3$modelname,fit1$xiroot, + fit2$folder,fit3$folder, + paste(nd),paste(m1), + paste(mh1),paste(m2),paste(mh2), + paste(m3),paste(mh3), + paste(p),paste(fit1$minx), + paste(fit1$maxx),paste(fit1$fmean),paste(fit2$fmean), + paste(fit3$fmean),paste(tc)) ,fout) +} +if(is.null(fit3)) { +writeLines(c(fit1$modelname,fit2$modelname,paste("null"),fit1$xiroot, + fit2$folder,paste("null"), + paste(nd),paste(m1), + paste(mh1),paste(m2),paste(mh2), + paste(m3),paste(mh3), + paste(p),paste(fit1$minx), + paste(fit1$maxx),paste(fit1$fmean),paste(fit2$fmean), + paste(0.0),paste(tc)) ,fout) +} +close(fout) + + +#-------------------------------------------------- +#run Pareto Front program +cmdopt=100 #default to serial/OpenMP +runlocal=FALSE +cmd="openbtcli --conf" +if(Sys.which("openbtcli")[[1]]=="") # not installed in a global locaiton, so assume current directory + runlocal=TRUE + +if(runlocal) cmd="./openbtcli --conf" + +cmdopt=system(cmd) + +if(cmdopt==101) # MPI +{ + cmd=paste("mpirun -np ",tc," openbtmopareto ",fit1$folder,sep="") +} + +if(cmdopt==100) # serial/OpenMP +{ + if(runlocal) + cmd=paste("./openbtmopareto ",fit1$folder,sep="") + else + cmd=paste("openbtmopareto ",fit1$folder,sep="") +} + +system(cmd) +system(paste("rm -f ",fit1$folder,"/config.mopareto",sep="")) + + +#-------------------------------------------------- +#read in result +res=list() +ii=1 +u=0 # to modify temp indexing if fit3 exists +for(i in 1:tc) +{ + con=file(paste(fit1$folder,"/",fit1$modelname,".mopareto",i-1,sep="")) + open(con) + s=readLines(con,1) + while(length(s)>0) { + temp=as.numeric(unlist(strsplit(s," "))) + k=as.integer(temp[1]) + theta=matrix(0,ncol=k,nrow=d) + theta[1,]=temp[2:(2+k-1)] + theta[2,]=temp[(2+k):(2+2*k-1)] + if(!is.null(fit3)) { + theta[3,]=temp[(2+2*k):(2+3*k-1)] + u=k + } + a=matrix(0,nrow=p,ncol=k) + for(i in 1:p) a[i,]=temp[(2+(2+i-1)*k):(2+(2+i)*k-1)+u] + b=matrix(0,nrow=p,ncol=k) + for(i in 1:p) b[i,]=temp[(2+(2+p+i-1)*k):(2+(2+p+i)*k-1)+u] + entry=list() + entry[["theta"]]=theta + entry[["a"]]=a + entry[["b"]]=b + res[[ii]]=entry + ii=ii+1 + s=readLines(con,1) + } + close(con) +} + + +class(res)="OpenBT_mopareto" + +return(res) +} + + + +# Reweight predictions using output of influence.openbt(). +# For method IS_GLOBAL: infl (influence obect) and pred (prediction object to be reweighted) +# Also, idx may be specified to apply weights only to a subset of observations. +# For method IS_LOCAL_*: pass infl (influence object), xpred (training x's from fit) and fit (fitted OpenBT object) +repredict.openbt<-function(pred=NULL,infl=NULL,fit=NULL,tc=2) +{ + if(is.null(pred)) stop("Model prediction object required.\n") + if(is.null(infl)) stop("Model influence object required.\n") + if(infl$method=="is_global" && class(pred)!="OpenBT_predict") stop("Model prediction object not recognized.\n") + if(class(infl)!="OpenBT_influence") stop("Model influence object not recognized.\n") + if(class(pred)!="OpenBT_predict") stop("Model pred object not recognized.\n") + if(infl$method!="is_global" && infl$method!="is_local_int" && infl$method!="is_local_uint" && infl$method!="is_local_union" && infl$method!="is_local_l1" && infl$method!="is_local_l2") stop("Cannot reweight predictions with influence method ",infl$method,".\n") + + dropid=infl$infl.obs + ninfl=length(dropid) + res=list() + cat("Applying weights\n") + + if(infl$method=="is_global") + { + res$mdraws=pred$mdraws + res$wmat=apply(infl$w,1,prod) + #re-weight and return the predictions + # for(k in 1:ninfl) + # for(i in 1:fit$nd) { + # # res$mdraws[i,]=res$mdraws[i,]*infl$w[i,k] + # res$mdraws[i,]=res$mdraws[i,]*res$wmat[i] + # } + + # wbar=apply(infl$w,2,mean)#mean(infl$w) + wbar=mean(res$wmat)#mean(infl$w) + # res$mdraws=res$mdraws/wbar + # res$wmat=infl$w + res$wbar=wbar + res$mmean=rep(0,nrow(pred$x.test)) + res$msd=rep(0,nrow(pred$x.test)) + res$m.5=rep(0,nrow(pred$x.test)) + res$m.lower=rep(0,nrow(pred$x.test)) + res$m.upper=rep(0,nrow(pred$x.test)) + for(i in 1:nrow(pred$x.test)) { + res$mmean[i]=wtd.mean(res$mdraws[,i],res$wmat,normwt=TRUE) + res$msd[i]=sqrt(wtd.var(res$mdraws[,i],res$wmat,normwt=TRUE)) + res$m.5[i]=wtd.quantile(res$mdraws[,i],res$wmat,probs=c(0.5),normwt=TRUE) + res$m.lower[i]=wtd.quantile(res$mdraws[,i],res$wmat,probs=c(pred$q.lower),normwt=TRUE) + res$m.upper[i]=wtd.quantile(res$mdraws[,i],res$wmat,probs=c(pred$q.upper),normwt=TRUE) + + } + } + + if(infl$method=="is_local_int" || infl$method=="is_local_uint" || infl$method=="is_local_union" || infl$method=="is_local_l1") + { + + if(is.null(fit)) stop("Method is_local_* requires fitted OpenBT_posterior in object fit\n") + if(length(fit$xicuts)!=ncol(pred$x.test)) stop("xpred's are not same dimension as trained model\n") + + in_rect<-function(xin,a,b,p) + { + for(i in 1:length(xin)) { + if(xin[i]b[i]) { + return(FALSE) + } + } + return(TRUE) + } + + # get predictions + p=length(fit$xicuts) + res$mdraws=pred$mdraws + + # now for each posterior draw, identify all predictions in the hyperrectangle of influence + # parallel::mcaffinity(1:tc) + # cl=makeCluster(tc) + # registerDoParallel(cl) + # mcoptions = list(preschedule=TRUE,cores=tc) + # wmat=foreach(j=1:nrow(pred$x.test),.combine=cbind,.options.multicore=mcoptions) %dopar% { + # tempwvec=matrix(1,nrow=fit$nd,ncol=1) + # for(i in 1:fit$nd) { + # for(k in 1:ninfl) { + # if(in_rect(pred$x.test[j,],infl$hyperrects[[k]]$a[i,],infl$hyperrects[[k]]$b[i,],p)) { + # tempwvec[i,1]=tempwvec[i,1]*infl$w[i,k,drop=FALSE] + # } + # } + # } + # tempwvec + # } + # wmat=as.matrix(wmat) # our final weight matrix, should be fit$nd rows x nrow(pred$x.test) columns + # stopCluster(cl) + + # Original serial version of the above parallel R code: + wmat=matrix(1,nrow=fit$nd,ncol=nrow(pred$x.test)) # our final weight matrix + + for(i in 1:fit$nd) { + for(j in 1:nrow(pred$x.test)) { + for(k in 1:ninfl) { + if(in_rect(pred$x.test[j,],infl$hyperrects[[k]]$a[i,],infl$hyperrects[[k]]$b[i,],p)) { + # if(wmat[i,j]==1) + # wmat[i,j]=infl$w[i,k,drop=FALSE] + # else + wmat[i,j]=wmat[i,j]*infl$w[i,k,drop=FALSE] + } + } + } + } + + + wbar=apply(wmat,2,mean) + # res$mdraws=res$mdraws*wmat + # res$mdraws=t(t(res$mdraws)/wbar) + res$wmat=wmat + res$wbar=wbar + res$mmean=rep(0,nrow(pred$x.test)) + res$msd=rep(0,nrow(pred$x.test)) + res$m.5=rep(0,nrow(pred$x.test)) + res$m.lower=rep(0,nrow(pred$x.test)) + res$m.upper=rep(0,nrow(pred$x.test)) + for(i in 1:nrow(pred$x.test)) { + res$mmean[i]=wtd.mean(res$mdraws[,i],res$wmat[,i],normwt=TRUE) + res$msd[i]=sqrt(wtd.var(res$mdraws[,i],res$wmat[,i],normwt=TRUE)) + res$m.5[i]=wtd.quantile(res$mdraws[,i],res$wmat[,i],probs=c(0.5),normwt=TRUE) + res$m.lower[i]=wtd.quantile(res$mdraws[,i],res$wmat[,i],probs=c(pred$q.lower),normwt=TRUE) + res$m.upper[i]=wtd.quantile(res$mdraws[,i],res$wmat[,i],probs=c(pred$q.upper),normwt=TRUE) + + } + + } + + if(infl$method=="is_local_l2") + { + if(is.null(fit)) stop("Method is_local_* requires fitted OpenBT_posterior in object fit\n") + + res$mdraws=pred$mdraws + wmat=matrix(1/fit$nd,nrow=fit$nd,ncol=nrow(pred$x.test)) + + for(j in 1:ninfl) { + dvec=rep(0,nrow(pred$x.test)) + for(k in 1:nrow(pred$x.test)) + dvec[k]=dist(rbind(pred$x.test[k,],infl$x.infl[dropid[[j]],])) + idx=which(dvec<0.2) + for(i in 1:fit$nd) + for(k in idx) + if(wmat[i,k]==1/fit$nd) wmat[i,k]=infl$w[i,j,drop=FALSE] + else wmat[i,k]=wmat[i,k]*infl$w[i,j,drop=FALSE] + } + + wbar=apply(wmat,2,mean) + # res$mdraws=res$mdraws*wmat + # res$mdraws=t(t(res$mdraws)/wbar) + res$wmat=wmat + res$wbar=wbar + res$mmean=rep(0,nrow(pred$x.test)) + res$msd=rep(0,nrow(pred$x.test)) + res$m.5=rep(0,nrow(pred$x.test)) + res$m.lower=rep(0,nrow(pred$x.test)) + res$m.upper=rep(0,nrow(pred$x.test)) + for(i in 1:nrow(pred$x.test)) { + res$mmean[i]=wtd.mean(res$mdraws[,i],res$wmat[,i],normwt=TRUE) + res$msd[i]=sqrt(wtd.var(res$mdraws[,i],res$wmat[,i],normwt=TRUE)) + res$m.5[i]=wtd.quantile(res$mdraws[,i],res$wmat[,i],probs=c(0.5),normwt=TRUE) + res$m.lower[i]=wtd.quantile(res$mdraws[,i],res$wmat[,i],probs=c(pred$q.lower),normwt=TRUE) + res$m.upper[i]=wtd.quantile(res$mdraws[,i],res$wmat[,i],probs=c(pred$q.upper),normwt=TRUE) + + } + } + + + + res$sdraws=pred$sdraws + # res$mmean=apply(res$mdraws,2,mean) + res$smean=pred$smean + # res$msd=apply(res$mdraws,2,sd) + res$ssd=pred$ssd + # res$m.5=apply(res$mdraws,2,quantile,0.5) + # res$m.lower=apply(res$mdraws,2,quantile,pred$q.lower) + # res$m.upper=apply(res$mdraws,2,quantile,pred$q.upper) + res$s.5=pred$s.5 + res$s.lower=pred$s.lower + res$s.upper=pred$s.upper + res$pdraws=pred$pdraws + res$pmean=pred$pmean + res$psd=pred$psd + res$p.5=pred$p.5 + res$p.lower=pred$p.lower + res$p.upper=pred$p.upper + res$q.lower=pred$q.lower + res$q.upper=pred$q.upper + res$modeltype=pred$modeltype + + class(res)="OpenBT_predict" + return(res) +} + + +# Calculate various metrics of influence for a regression tree model. +influence.openbt<-function(x.infl,y.infl,fit=NULL,infl.obs=NULL,dvec=NULL,tc=2,method="is_local_int") +{ + if(is.null(fit)) stop("Model fit object required.\n") + if(class(fit)!="OpenBT_posterior") stop("Model fit object not recognized.\n") + if(is.null(infl.obs)) stop("No hold-out drop id specified for reweighting (infl.obs=NULL).\n") + if(min(infl.obs)<1) stop("Invalid drop id infl.obs=",infl.obs,".\n") + if(max(infl.obs)>nrow(x.infl)) stop("Invalid drop id infl.obs=",infl.obs,".\n") + if(method=="is_local_l1" && is.null(dvec)) stop("Invalid dvec for method is_local_l1.\n") + + nd=fit$nd + # np=nrow(x.infl) + ninfl=length(infl.obs) + w=matrix(0,nrow=nd,ncol=ninfl) + wsum=rep(1,ninfl) + + if(ninfl==0){ + stop("Argument infl.obs has no influential observations\n") + } + + + pp=predict.openbt(object=fit,x.test=x.infl,tc=tc) + + if(method=="is_global") { + # w is an nd by np matrix where the j_th column contains the weights for each + # nd posterior samples if (x_j,y_j) were removed from the dataset. + for(kk in 1:ninfl) + for(i in 1:nd) w[i,kk]=(2*pi)^(1/2)*pp$sdraws[i,infl.obs[kk]]*exp(1/(2*pp$sdraws[i,infl.obs[kk]]^2)*(y.infl[infl.obs[kk]]-pp$mdraws[i,infl.obs[kk]])^2) +# w=apply(w,1,prod) # iid BART likelihood => we just combine the different terms productwise. +# wsum=sum(w) + } + # end IS_GLOBAL + + if(method=="is_local_int" || method=="is_local_union" || method=="is_local_uint" || method=="is_local_l1") + { + p=length(fit$xicuts) + m=fit$m + mh=fit$mh + nd=fit$nd + modelname=fit$modelname + hrects.int=list() + hrects.uint=list() + hrects.union=list() + hrects.l1=list() +# w=matrix(0,nrow=nd,ncol=ninfl) +# wsum=rep(1,ninfl) + + for(kk in 1:ninfl) { + xd=x.infl[infl.obs[kk],] + if(!is.vector(xd)) stop("xd is not a px1 vector in x.infl.\n") + + #-------------------------------------------------- + #write out config file + fout=file(paste(fit$folder,"/config.influence",sep=""),"w") + writeLines(c(fit$modelname,fit$xiroot, + paste(nd),paste(m), + paste(mh),paste(p),paste(xd),paste(fit$minx), + paste(fit$maxx),paste(tc)) ,fout) + close(fout) + + + #-------------------------------------------------- + #run influential hyperrectangle program + cmdopt=100 #default to serial/OpenMP + runlocal=FALSE + cmd="openbtcli --conf" + if(Sys.which("openbtcli")[[1]]=="") # not installed in a global locaiton, so assume current directory + runlocal=TRUE + + if(runlocal) cmd="./openbtcli --conf" + + cmdopt=system(cmd) + + if(cmdopt==101) # MPI + { + cmd=paste("mpirun -np ",tc," openbtinfl ",fit$folder,sep="") + } + + if(cmdopt==100) # serial/OpenMP + { + if(runlocal) + cmd=paste("./openbtinfl ",fit$folder,sep="") + else + cmd=paste("openbtinfl ",fit$folder,sep="") + } + + system(cmd) + system(paste("rm -f ",fit$folder,"/config.influence",sep="")) + + #-------------------------------------------------- + #read in result + ii=1 + a=matrix(0,nrow=nd,ncol=p) + b=matrix(0,nrow=nd,ncol=p) + au=matrix(0,nrow=nd,ncol=p) + bu=matrix(0,nrow=nd,ncol=p) + for(i in 1:tc) + { + con=file(paste(fit$folder,"/",fit$modelname,".influence",i-1,sep="")) + open(con) + s=readLines(con,1) + while(length(s)>0) { + temp=as.numeric(unlist(strsplit(s," "))) + a[ii,]=temp[1:p] + b[ii,]=temp[(p+1):(2*p)] + au[ii,]=temp[(2*p+1):(3*p)] + bu[ii,]=temp[(3*p+1):(4*p)] + ii=ii+1 + s=readLines(con,1) + } + close(con) + } + + system(paste("rm -f ",fit$folder,"/",fit$modelname,".influence*",sep="")) + + hrects.int[[kk]]=list() + hrects.uint[[kk]]=list() + hrects.union[[kk]]=list() + hrects.l1[[kk]]=list() + hrects.int[[kk]][["a"]]=a + hrects.int[[kk]][["b"]]=b + hrects.uint[[kk]][["a"]]=matrix(apply(a,2,min),nrow=nd,ncol=p,byrow=TRUE) + hrects.uint[[kk]][["b"]]=matrix(apply(b,2,max),nrow=nd,ncol=p,byrow=TRUE) + hrects.union[[kk]][["a"]]=au + hrects.union[[kk]][["b"]]=bu + hrects.l1[[kk]][["a"]]=matrix(xd-dvec,nrow=nd,ncol=p,byrow=TRUE) + hrects.l1[[kk]][["b"]]=matrix(xd+dvec,nrow=nd,ncol=p,byrow=TRUE) + + # w is an nd by 1 matrix where containing the weights for each + # nd posterior samples if (x[infl.obs,],y[infl.obs]) were removed from the dataset, + # where the weight would be applied to any predictions in the corresponding hyperrect + # for this draw. Otherwise, the weight is just 1, giving usual posterior average -- this + # is handled in the reweight() function. + for(i in 1:nd) w[i,kk]=(2*pi)^(1/2)*pp$sdraws[i,infl.obs[kk]]*exp(1/(2*pp$sdraws[i,infl.obs[kk]]^2)*(y.infl[infl.obs[kk]]-pp$mdraws[i,infl.obs[kk]])^2) +# wsum[kk]=sum(w[,kk]) +# w[,kk]=w[,kk]/sum(w[,kk]) + + # Perhaps a simple indicator of which observation has a lot of influence: + # infl.max=apply(w,2,max) + # infl.mean=apply(w,2,mean) + # infl.ids=infl.obs + } + # end for kk in 1:length(infl.obs) loop + } + #end is_local_* + for(i in 1:ninfl) wsum[i]=sum(w[,i]) + + res=list() + res$nd=nd + # res$np=np + res$x.infl=x.infl + res$y.infl=y.infl + res$w=w + res$wsum=wsum + # res$infl.max=infl.max + # res$infl.mean=infl.mean + res$infl.obs=infl.obs + res$hyperrects=NULL + if(method=="is_local_int") { + res$hyperrects=hrects.int + } + if(method=="is_local_uint") { + res$hyperrects=hrects.uint + } + if(method=="is_local_union") { + res$hyperrects=hrects.union + } + if(method=="is_local_l1") { + res$hyperrects=hrects.l1 + } + res$method=method + + class(res)="OpenBT_influence" + + return(res) +} + + +summary.OpenBT_influence<-function(infl) +{ + cat("No summary method for object.\n") +} + +print.OpenBT_influence<-function(infl) +{ + cat("OpenBT Influence\n") + cat("metric: ",infl$method,"\n") + cat(infl$nd, " posterior samples.\n") + cat(infl$np, " observations.\n") + + if(infl$method=="is_global") { + idx=sort(infl$infl,decreasing=TRUE,index.return=TRUE)$ix + cat("\nTop 5 maximum weights: ", infl$infl[idx][1:5],"\n") + cat("\nTop 5 influential observations by weights: \n") + for(i in 1:5) { + cat("Input ",idx[i]," max.influence=",infl$infl[idx[i]], + " y=",infl$y.infl[idx[i]], " x=",infl$x.infl[idx[i],][1],"\n") + } + + tab=sort(table(infl$infl.ids),decreasing=TRUE) + cat("\nTop 5 influential observations by frequency: \n") + print(tab[1:5]) + } +} + +plot.OpenBT_influence<-function(infl) +{ + tab.ids=table(infl$infl.ids) + + if(infl$method=="is_global") { + par(mfrow=c(1,2)) + plot(1:infl$np,infl$infl,xlab="Observation Index",ylab="max.influence",type="h",lwd=2) + title(paste("metric: ",infl$method,sep="")) + plot(tab.ids,xlab="Observation Index",ylab="Frequency") + title(paste("metric: ",infl$method,sep="")) + } + + if(infl$method=="is_local_int" || infl$method=="is_local_union") + { + # plot hyperrectangles + plot(0,0,xlim=c(0,1),ylim=c(0,1),type="n",xlab="x1",ylab="x2") + for(i in 1:nrow(infl$hyperrects$a)) rect(infl$hyperrects$a[i,1],infl$hyperrects$a[i,2],infl$hyperrects$b[i,1],infl$hyperrects$b[i,2],col=gray(0.5,alpha=0.1),border=NA) + } +} + + + +# Computer Model Calibration a BART emulator and BART discrepancy. +calibrate.openbt<-function(xc,yc,xf,yf,xdims=NULL,caldims=NULL, + calprior="normal",calpriora=NULL,calpriorb=NULL, + ntreec=NULL,ntreef=NULL,ndpost=1000, + nskip=100,kc=NULL,kf=NULL,power=2,base=0.95,tc=2, + sigmac=rep(1,length(yc)),sigmaf=rep(1,length(yf)), + fmeanc=0.0,fmeanf=0.0,overallsdc=NULL, + overallnuc=NULL,overallsdf=NULL,overallnuf=NULL, + chv=cor(xc,method="spearman"),pbd=0.7,pb=0.5, + stepwpert=0.1,probchv=0.1,minnumbot=5,printevery=100, + numcut=100,xicuts=NULL,nadapt=1000,adaptevery=100, + model="bart",modelname="model",summarystats=FALSE) +{ + if(is.null(xdims)) stop("xdims must be specified.\n") + if(is.null(caldims)) stop("caldims must be specified.\n") + numx=length(xdims) + numcal=length(caldims) + if(ncol(xc)!=numx+numcal) stop("xc not compatible with xdims+caldims.\n") + if(ncol(xf)!=numx) stop("xf not compatible with xdims.\n") + if(is.null(calprior)) stop("Invalid calibration parameter prior type specified.\n") + if(is.null(calpriora)) stop("Invalid calibration parameter prior: calpriora.\n") + if(is.null(calpriorb)) stop("Invalid calibration parameter prior: calpriorb.\n") + + if(length(calprior)==1) calprior=rep(calprior,numcal) + if(length(calpriora)==1) calpriora=rep(calpriora,numcal) + if(length(calpriorb)==1) calpriorb=rep(calpriorb,numcal) + + if(length(calprior)!=numcal || length(calpriora)!=numcal || length(calpriorb)!=numcal) + stop("Invalid calibration parameter prior: does not match number of calibration parameters.\n") + + calpriortype=rep(NULL,numcal) + for(i in 1:numcal) { + if(calprior[i]=="normal") calpriortype[i]=1 + if(calprior[i]=="uniform") calpriortype[i]=0 + if(is.null(calpriortype[i])) stop("Invalid calibration parameter prior type specified.\n") + if(calpriortype[i]==1 && calpriorb[i]<=0) stop("Invalid variance specified in calibration parameter prior.\n") + } + + ntreehc=NULL + ntreehf=NULL + #-------------------------------------------------- + # model type definitions + modeltype=rep(0,2) # undefined + MODEL_BT=1 + MODEL_BINOMIAL=2 + MODEL_POISSON=3 + MODEL_BART=4 + MODEL_HBART=5 + MODEL_PROBIT=6 + MODEL_MODIFIEDPROBIT=7 + MODEL_MERCK_TRUNCATED=8 + + + if(length(model)==1) model=rep(model,2) + if( (model[1]!="bart" && model[1]!="hbart") || (model[2]!="bart" && model[2]!="hbart") ) + { + cat("Model type not specified.\n") + cat("Available options are:\n") + cat("model='bart'\n") + cat("model='hbart'\n") + + stop("missing model type.\n") + } + if(model[1]=="bart") + { + modeltype[1]=MODEL_BART + if(is.null(ntreec)) ntreec=200 + if(is.null(ntreehc)) ntreehc=1 + if(is.null(kc)) kc=2 + if(is.null(overallsdc)) overallsdc=sd(yc) + if(is.null(overallnuc)) overallnuc=10 + pbdc=c(pbd[1],0.0) + } + if(model[1]=="hbart") + { + modeltype=MODEL_HBART + if(is.null(ntreec)) ntreec=200 + if(is.null(ntreehc)) ntreehc=40 + if(is.null(kc)) kc=5 + if(is.null(overallsdc)) overallsdc=sd(yc) + if(is.null(overallnuc)) overallnuc=10 + pbdc=pbd + } + if(model[2]=="bart") + { + modeltype[2]=MODEL_BART + if(is.null(ntreef)) ntreef=200 + if(is.null(ntreehf)) ntreehf=1 + if(is.null(kf)) kf=2 + if(is.null(overallsdf)) overallsdf=sd(yf) + if(is.null(overallnuf)) overallnuf=10 + pbdf=c(pbd[1],0.0) + } + if(model[2]=="hbart") + { + modeltype[2]=MODEL_HBART + if(is.null(ntreef)) ntreef=200 + if(is.null(ntreehf)) ntreehf=40 + if(is.null(kf)) kf=5 + if(is.null(overallsdf)) overallsdf=sd(yf) + if(is.null(overallnuf)) overallnuf=10 + pbdf=pbd + } + + + #-------------------------------------------------- + nd = ndpost + burn = nskip + mc = ntreec + mhc = ntreehc + mf = ntreef + mhf = ntreehf + #-------------------------------------------------- + # simulator data + nc = length(yc) + p = ncol(xc) + pdelta = ncol(xf) + xpred = matrix(NA,nrow=nf,ncol=p) + for(i in 1:numcal) { + if(calpriortype[i]) + xpred[,caldims[i]]=calpriora[i] + else + xpred[,caldims[i]]=calpriora[i]+(calpriorb[i]-calpriora[i])/2.0 + } + for(i in 1:pdelta) { + xpred[,xdims[i]]=xf[,i] + } +# xpred = t(xpred) +# xc = t(xc) + nf = length(yf) +# xf = t(xf) + # if(modeltype==MODEL_BART || modeltype==MODEL_HBART) + # { + yc=yc-fmeanc + yf=yf-fmeanf + fmeanc.out=paste(fmeanc) + fmeanf.out=paste(fmeanf) + # } + # if(modeltype==MODEL_PROBIT || modeltype==MODEL_MODIFIEDPROBIT) + # { + # fmean.out=paste(qnorm(fmean)) + # uniqy=sort(unique(y.train)) + # if(length(uniqy)>2) stop("Invalid y.train: Probit requires dichotomous response coded 0/1") + # if(uniqy[1]!=0 || uniqy[2]!=1) stop("Invalid y.train: Probit requires dichotomous response coded 0/1") + # } + #-------------------------------------------------- + #cutpoints + if(!is.null(xicuts)) # use xicuts + { + xic=xicuts + } + else # default to equal numcut per dimension + { + xic=vector("list",p) + minxc=floor(apply(xc,2,min)) + maxxc=ceiling(apply(xc,2,max)) + for(i in 1:p) + { + xinc=(maxxc[i]-minxc[i])/(numcut+1) + xic[[i]]=(1:numcut)*xinc+minxc[i] + } + } + xif=vector("list",numx) + for(i in 1:numx) + xif[[i]]=xic[[xdims[i]]] + + #-------------------------------------------------- + if(modeltype[1]==MODEL_BART || modeltype[1]==MODEL_HBART) + { + rgc = range(yc) + } + if(modeltype[2]==MODEL_BART || modeltype[2]==MODEL_HBART) + { + rgf = range(yf) + } + # if(modeltype==MODEL_PROBIT || modeltype==MODEL_MODIFIEDPROBIT) + # { + # rgy = c(-2,2) + # } + tauc = (rgc[2]-rgc[1])/(2*sqrt(mc)*kc) + tauf = (rgf[2]-rgf[1])/(2*sqrt(mf)*kf) + + #-------------------------------------------------- + overalllambdac = overallsdc^2 + overalllambdaf = overallsdf^2 + #-------------------------------------------------- + powerh=power + baseh=base + if(length(power)>1) { + powerh=power[2] + power=power[1] + } + if(length(base)>1) { + baseh=base[2] + base=base[1] + } + #-------------------------------------------------- + pbdch=pbdc + pbdfh=pbdf + pbh=pb + if(length(pbdc)>1) { + pbdch=pbdc[2] + pbdc=pbdc[1] + } + if(length(pbdf)>1) { + pbdfh=pbdf[2] + pbdf=pbdf[1] + } + if(length(pb)>1) { + pbh=pb[2] + pb=pb[1] + } + cat("Model: Bayesian Regression Tree based Computer Model Calibration\n") + #-------------------------------------------------- + if(modeltype[1]==MODEL_BART) + { + cat("\tEmulator: BART\n") + } + #-------------------------------------------------- + if(modeltype[1]==MODEL_HBART) + { + cat("\tEmulator: Heteroscedastic BART\n") + } + #-------------------------------------------------- + if(modeltype[2]==MODEL_BART) + { + cat("\tDiscrepancy: BART\n") + } + #-------------------------------------------------- + if(modeltype[2]==MODEL_HBART) + { + cat("\tDiscrepancy: Heteroscedastic BART\n") + } + + #-------------------------------------------------- + stepwperth=stepwpert + if(length(stepwpert)>1) { + stepwperth=stepwpert[2] + stepwpert=stepwpert[1] + } + #-------------------------------------------------- + probchvh=probchv + if(length(probchv)>1) { + probchvh=probchv[2] + probchv=probchv[1] + } + #-------------------------------------------------- + minnumboth=minnumbot + if(length(minnumbot)>1) { + minnumboth=minnumbot[2] + minnumbot=minnumbot[1] + } + + #-------------------------------------------------- + #write out config file + xcroot="xc" + ycroot="yc" + scroot="sc" + chgvcroot="chgvc" + xicroot="xic" + xfroot="xf" + yfroot="yf" + sfroot="sf" + xproot="xpred" + chgvfroot="chgvf" + xifroot="xif" + folder=tempdir(check=TRUE) + if(!dir.exists(folder)) dir.create(folder) + tmpsubfolder=tempfile(tmpdir="") + tmpsubfolder=substr(tmpsubfolder,5,nchar(tmpsubfolder)) + tmpsubfolder=paste("openbt",tmpsubfolder,sep="") + folder=paste(folder,"/",tmpsubfolder,sep="") + if(!dir.exists(folder)) dir.create(folder) + fout=file(paste(folder,"/config.calibrate",sep=""),"w") + writeLines(c(paste(modeltype),xcroot,ycroot,xfroot,yfroot,xproot,fmeanc.out,fmeanf.out, + paste(mc),paste(mhc),paste(mf),paste(mhf),paste(nd),paste(burn), + paste(nadapt),paste(adaptevery),paste(tauc),paste(overalllambdac), + paste(overallnuc),paste(tauf),paste(overalllambdaf),paste(overallnuf), + paste(base),paste(power),paste(baseh),paste(powerh), + paste(tc),paste(scroot),paste(sfroot),paste(chgvcroot),paste(chgvfroot), + paste(pb),paste(pbdc),paste(pbdch), + paste(pbdf),paste(pbdfh),paste(stepwpert),paste(stepwperth), + paste(probchv),paste(probchvh),paste(minnumbot),paste(minnumboth), + paste(printevery),paste(xicroot),paste(xifroot),paste(modelname), + paste(as.integer(summarystats)),paste(numcal),paste(caldims-1),paste(calpriortype), + paste(calpriora),paste(calpriorb)),fout) + close(fout) + + + #-------------------------------------------------- + #write out data subsets + nslv=tc-1 + # yc data + yclist=split(yc,(seq(nc)-1) %/% (nc/nslv)) + for(i in 1:nslv) write(yclist[[i]],file=paste(folder,"/",ycroot,i,sep="")) + xclist=split(as.data.frame(xc),(seq(nc)-1) %/% (nc/nslv)) + for(i in 1:nslv) write(t(xclist[[i]]),file=paste(folder,"/",xcroot,i,sep="")) +#stop("done") + + sclist=split(sigmac,(seq(nc)-1) %/% (nc/nslv)) + for(i in 1:nslv) write(sclist[[i]],file=paste(folder,"/",scroot,i,sep="")) + chv[is.na(chv)]=0 # if a var has 0 levels it will have a cor of NA so we'll just set those to 0. + write(chv,file=paste(folder,"/",chgvcroot,sep="")) + for(i in 1:p) write(xic[[i]],file=paste(folder,"/",xicroot,i,sep="")) + # yf data + yflist=split(yf,(seq(nf)-1) %/% (nf/nslv)) + for(i in 1:nslv) write(yflist[[i]],file=paste(folder,"/",yfroot,i,sep="")) + xflist=split(as.data.frame(xf),(seq(nf)-1) %/% (nf/nslv)) + for(i in 1:nslv) write(t(xflist[[i]]),file=paste(folder,"/",xfroot,i,sep="")) + sflist=split(sigmaf,(seq(nf)-1) %/% (nf/nslv)) + for(i in 1:nslv) write(sflist[[i]],file=paste(folder,"/",sfroot,i,sep="")) + write(chv[xdims,xdims],file=paste(folder,"/",chgvfroot,sep="")) + for(i in 1:pdelta) write(xif[[i]],file=paste(folder,"/",xifroot,i,sep="")) + rm(chv) + # xpred + xplist=split(as.data.frame(xpred),(seq(nf)-1) %/% (nf/nslv)) + for(i in 1:nslv) write(t(xplist[[i]]),file=paste(folder,"/",xproot,i,sep="")) + + #-------------------------------------------------- + #run program + cmdopt=100 #default to serial/OpenMP + runlocal=FALSE + cmd="openbtcalibrate --conf" + if(Sys.which("openbtcalibrate")[[1]]=="") # not installed in a global locaiton, so assume current directory + runlocal=TRUE + + if(runlocal) cmd="./openbtcalibrate --conf" + + cmdopt=system(cmd) + + if(cmdopt==101) # MPI + { + cmd=paste("mpirun -np ",tc," openbtcalibrate ",folder,sep="") + } + + if(cmdopt==100) # serial/OpenMP + { + if(runlocal) + cmd=paste("./openbtcalibrate ",folder,sep="") + else + cmd=paste("openbtcalibrate ",folder,sep="") + } + # + system(cmd) + system(paste("rm -f ",folder,"/config.calibrate",sep="")) + + + #-------------------------------------------------- + # load in the theta posterior draws + thetas=scan(paste(folder,"/model.eta.theta.fit",sep=""),quiet=TRUE) + thetas=matrix(thetas,ncol=numcal,byrow=T) + + + res=list() + res$modeltype=modeltype + res$model=model + res$xcroot=xcroot; res$ycroot=ycroot; + #res$xfroot=xfroot; res$yfroot=yfroot; + res$xproot=xproot; + res$mc=mc; res$mhc=mhc; + #res$mf=mf; res$mhf=mhf; + res$p=p; + #res$pdelta=pdelta; + res$nd=nd; res$burn=burn; res$nadapt=nadapt; res$adaptevery=adaptevery; res$tauc=tauc; res$overalllambdac=overalllambdac + res$overallnuc=overallnuc; + #res$tauf=tauf; res$overalllambdaf=overalllambdaf; res$overallnuf=overallnuf; + res$base=base; res$power=power; res$baseh=baseh; res$powerh=powerh + res$k=numx+numcal; + res$tc=tc; res$scroot=scroot; + #res$sfroot=sfroot; + res$chgvcroot=chgvcroot; + #res$chgvfroot=chgvfroot; + res$pb=pb; res$pbdc=pbdc; res$pbdch=pbdch; + #res$pbdf=pbdf; res$pbdfh=pbdfh; + res$stepwpert=stepwpert; res$stepwperth=stepwperth; + res$probchv=probchv; res$probchvh=probchvh; res$minnumbot=minnumbot; res$minnumboth=minnumboth + res$printevery=printevery; res$xicroot=xicroot; + #res$xifroot=xifroot; + res$minxc=minxc; res$maxxc=maxxc; + res$xdims=xdims; res$numcal=numcal; res$caldims=caldims; res$calpriortype=calpriortype; res$calpriora=calpriora; + res$calpriorb=calpriorb; res$summarystats=summarystats; res$modelname=paste(modelname,".eta",sep="") + class(xic)="OpenBT_cutinfo" + class(xif)="OpenBT_cutinfo" + res$xiccuts=xic +# res$xifcuts=xif + res$fmeanc=fmeanc.out; +# res$fmeanf=fmeanf.out; + res$theta=thetas + res$folder=folder + + res$delta$modeltype=modeltype[2] + res$delta$model=model + res$delta$xroot=xfroot + res$delta$yroot=yfroot + res$delta$xproot=xproot + res$delta$k=numx + res$delta$m=mf + res$delta$mh=mhf + res$delta$p=pdelta + res$delta$nd=nd + res$delta$burn=burn + res$delta$nadapt=nadapt + res$delta$adaptevery=adaptevery + res$delta$tau=tauf + res$delta$overalllambda=overalllambdaf + res$delta$overallnu=overallnuf + res$delta$base=base + res$delta$power=power + res$delta$baseh=baseh + res$delta$powerh=powerh + res$delta$chgvroot=chgvfroot + res$delta$pb=pb + res$delta$pbd=pbdf + res$delta$pbdh=pbdfh + res$delta$stepwpert=stepwpert + res$delta$stepwperth=stepwperth + res$delta$probchv=probchv + res$delta$probchvh=probchvh + res$delta$minnumbot=minnumbot + res$delta$minnumboth=minnumboth + res$delta$printevery=printevery + res$delta$xiroot=xifroot + res$delta$summarystats=summarystats + res$delta$modelname=paste(modelname,".delta",sep="") + res$delta$xicuts=xif + res$delta$fmean=fmeanf.out + res$delta$folder=folder + class(res$delta)="OpenBT_posterior" + + + class(res)="OpenBT_calibrate_posterior" + + return(res) +} + + +emulate.openbt = function( +fit=NULL, +x.test=NULL, +tc=2, +fmeanc=fit$fmeanc, +fmeanf=fit$delta$fmean, +q.lower=0.025, +q.upper=0.975 +) +{ + + # model type definitions + MODEL_BT=1 + MODEL_BINOMIAL=2 + MODEL_POISSON=3 + MODEL_BART=4 + MODEL_HBART=5 + MODEL_PROBIT=6 + MODEL_MODIFIEDPROBIT=7 + MODEL_MERCK_TRUNCATED=8 + + #-------------------------------------------------- + # params + if(is.null(fit)) stop("No fitted model specified!\n") + if(is.null(x.test)) stop("No emulation points specified!\n") + if(class(fit)!="OpenBT_calibrate_posterior") stop("Invalid object passed to emulate.openbt()\n") + + xdims=fit$xdims + caldims=fit$caldims + nslv=tc + x.test=as.matrix(x.test) + pdelta=ncol(x.test) + if(pdelta!=fit$delta$p) stop("x.test has incorrect dimensions.\n") + n=nrow(x.test) + p=fit$p + x.eta.test=matrix(0,nrow=n,ncol=p) + x.eta.test[,xdims]=x.test + + # Get discrepancy posterior first + cat("Drawing from discrepancy posterior...\n") + fit.delta=predict.openbt(fit$delta,x.test=x.test,tc=tc,fmean=fmeanf) + + # Get eta posterior next + cat("Drawing from emulator posterior...\n") + xeroot="xe" + xdroot="xd" + + if( (fit$modeltype[1]==MODEL_BART || fit$modeltype[1]==MODEL_HBART) && (fit$modeltype[2]==MODEL_BART || fit$modeltype[2]==MODEL_HBART) ) + { + fmeanc.out=paste(fmeanc) + fmeanf.out=paste(fmeanf) + } + else stop("Invalid model passed to emulate.openbt()\n") + # if(fit$modeltype==MODEL_PROBIT || fit$modeltype==MODEL_MODIFIEDPROBIT) + # { + # fmean.out=paste(qnorm(fmean)) + # } + + + #-------------------------------------------------- + #write out config file + fout=file(paste(fit$folder,"/config.emulate",sep=""),"w") + writeLines(c(paste(fit$modelname),paste(fit$modeltype),fit$xicroot, + xeroot,paste(fit$nd),paste(fit$mc), + paste(fit$mhc),paste(p),paste(pdelta), + paste(fit$numcal),paste(caldims-1),paste(tc), + fmeanc.out), fout) + close(fout) + + + + #-------------------------------------------------- + #write out data subsets + #folder=paste(".",fit$modelname,"/",sep="") + xelist=split(as.data.frame(x.eta.test),(seq(n)-1) %/% (n/nslv)) + for(i in 1:nslv) write(t(xelist[[i]]),file=paste(fit$folder,"/",xeroot,i-1,sep="")) + xdlist=split(as.data.frame(x.test),(seq(n)-1) %/% (n/nslv)) + for(i in 1:nslv) write(t(xdlist[[i]]),file=paste(fit$folder,"/",xdroot,i-1,sep="")) + for(i in 1:p) write(fit$xiccuts[[i]],file=paste(fit$folder,"/",fit$xicroot,i,sep="")) + for(i in 1:pdelta) write(fit$xifcuts[[i]],file=paste(fit$folder,"/",fit$xifroot,i,sep="")) + + + #-------------------------------------------------- + #run prediction program + cmdopt=100 #default to serial/OpenMP + runlocal=FALSE + cmd="openbtcli --conf" + if(Sys.which("openbtcli")[[1]]=="") # not installed in a global location, so assume current directory + runlocal=TRUE + + if(runlocal) cmd="./openbtcli --conf" + + cmdopt=system(cmd) + + if(cmdopt==101) # MPI + { + cmd=paste("mpirun -np ",tc," openbtemulate ",fit$folder,sep="") + } + + if(cmdopt==100) # serial/OpenMP + { + if(runlocal) + cmd=paste("./openbtemulate ",fit$folder,sep="") + else + cmd=paste("openbtemulate ",fit$folder,sep="") + } + + #cmd=paste("mpirun -np ",tc," openbtpred",sep="") + #cat(cmd) + system(cmd) + system(paste("rm -f ",fit$folder,"/config.emulate",sep="")) + + +#-------------------------------------------------- +#format and return +res=list() + + +# Read in eta posterior +fnames=list.files(fit$folder,pattern=paste(fit$modelname,".mdraws*",sep=""),full.names=TRUE) +res$mdraws=as.matrix(do.call(cbind,sapply(fnames,data.table::fread))) +fnames=list.files(fit$folder,pattern=paste(fit$modelname,".sdraws*",sep=""),full.names=TRUE) +res$sdraws=as.matrix(do.call(cbind,sapply(fnames,data.table::fread))) + +system(paste("rm -f ",fit$folder,"/",fit$modelname,".mdraws*",sep="")) +system(paste("rm -f ",fit$folder,"/",fit$modelname,".sdraws*",sep="")) + +res$mmean=apply(res$mdraws,2,mean) +res$smean=apply(res$sdraws,2,mean) +res$msd=apply(res$mdraws,2,sd) +res$ssd=apply(res$sdraws,2,sd) +res$m.5=apply(res$mdraws,2,quantile,0.5) +res$m.lower=apply(res$mdraws,2,quantile,q.lower) +res$m.upper=apply(res$mdraws,2,quantile,q.upper) +res$s.5=apply(res$sdraws,2,quantile,0.5) +res$s.lower=apply(res$sdraws,2,quantile,q.lower) +res$s.upper=apply(res$sdraws,2,quantile,q.upper) +res$pdraws=NULL +res$pmean=NULL +res$psd=NULL +res$p.5=NULL +res$p.lower=NULL +res$p.upper=NULL +res$zdraws=res$mdraws+fit.delta$mdraws +res$zmean=apply(res$zdraws,2,mean) +res$zsd=apply(res$zdraws,2,sd) +res$z.5=apply(res$zdraws,2,quantile,0.5) +res$z.lower=apply(res$zdraws,2,quantile,q.lower) +res$z.upper=apply(res$zdraws,2,quantile,q.upper) + +if(fit$modeltype==MODEL_PROBIT || fit$modeltype==MODEL_MODIFIEDPROBIT) +{ + res$pdraws=read.table(paste(fit$folder,"/",fit$modelname,".pdraws",0,sep="")) + for(i in 2:nslv) + { + res$pdraws=cbind(res$pdraws,read.table(paste(fit$folder,"/",fit$modelname,".pdraws",i-1,sep=""))) + } + + system(paste("rm -f ",fit$folder,"/",fit$modelname,".pdraws*",sep="")) + + res$pdraws=as.matrix(res$pdraws) + res$pmean=apply(res$pdraws,2,mean) + res$psd=apply(res$pdraws,2,sd) + res$p.5=apply(res$pdraws,2,quantile,0.5) + res$p.lower=apply(res$pdraws,2,quantile,q.lower) + res$p.upper=apply(res$pdraws,2,quantile,q.upper) +} + +res$q.lower=q.lower +res$q.upper=q.upper +res$modeltype=fit$modeltype + +res$delta=fit.delta + +class(res)="OpenBT_emulate" + +return(res) +} + + + +# Scan the trees in the posterior to extract tree properties +# Returns the mean trees as a list of lists in object mt +# and the variance trees as a list of lists in object st. +# The format is mt[[i]][[j]] is the jth posterior tree from the ith posterior +# sum-of-trees (ensemble) sample. +# The tree is encoded in 4 vectors - the node ids, the node variables, +# the node cutpoints and the node thetas. +openbt.scanpost<-function(post) +{ + fp=file(paste(post$folder,"/",post$modelname,".fit",sep=""),open="r") + if(scan(fp,what=integer(),nmax=1,quiet=TRUE) != post$nd) stop("Error scanning posterior\n") + if(scan(fp,what=integer(),nmax=1,quiet=TRUE) != post$m) stop("Error scanning posterior\n") + if(scan(fp,what=integer(),nmax=1,quiet=TRUE) != post$mh) stop("Error scanning posterior\n") + if(scan(fp,what=integer(),nmax=1,quiet=TRUE) != post$nd*post$m) stop("Error scanning posterior\n") + + # scan mean trees + numnodes=scan(fp,what=integer(),nmax=post$nd*post$m,quiet=TRUE) + lenvec=scan(fp,what=integer(),nmax=1,quiet=TRUE) + ids=scan(fp,what=integer(),nmax=lenvec,quiet=TRUE) + lenvec=scan(fp,what=integer(),nmax=1,quiet=TRUE) + vars=scan(fp,what=integer(),nmax=lenvec,quiet=TRUE) + lenvec=scan(fp,what=integer(),nmax=1,quiet=TRUE) + cuts=scan(fp,what=integer(),nmax=lenvec,quiet=TRUE) + lenvec=scan(fp,what=integer(),nmax=1,quiet=TRUE) + thetas=scan(fp,what=double(),nmax=lenvec,quiet=TRUE) + + # scan var trees + if(scan(fp,what=integer(),nmax=1,quiet=TRUE) != post$nd*post$mh) stop("Error scanning posterior\n") + snumnodes=scan(fp,what=integer(),nmax=post$nd*post$mh,quiet=TRUE) + lenvec=scan(fp,what=integer(),nmax=1,quiet=TRUE) + sids=scan(fp,what=integer(),nmax=lenvec,quiet=TRUE) + lenvec=scan(fp,what=integer(),nmax=1,quiet=TRUE) + svars=scan(fp,what=integer(),nmax=lenvec,quiet=TRUE) + lenvec=scan(fp,what=integer(),nmax=1,quiet=TRUE) + scuts=scan(fp,what=integer(),nmax=lenvec,quiet=TRUE) + lenvec=scan(fp,what=integer(),nmax=1,quiet=TRUE) + sthetas=scan(fp,what=double(),nmax=lenvec,quiet=TRUE) + + close(fp) + + # Now rearrange things into lists of lists so its easier to manipulate + mt=list() + ndx=2 + cs.numnodes=c(0,cumsum(numnodes)) + for(i in 1:post$nd) { + ens=list() + for(j in 1:post$m) + { + tree=list(id=ids[(cs.numnodes[ndx-1]+1):cs.numnodes[ndx]], + var=vars[(cs.numnodes[ndx-1]+1):cs.numnodes[ndx]], + cut=cuts[(cs.numnodes[ndx-1]+1):cs.numnodes[ndx]], + theta=thetas[(cs.numnodes[ndx-1]+1):cs.numnodes[ndx]]) + ens[[j]]=tree + ndx=ndx+1 + } + mt[[i]]=ens + } + + + st=list() + ndx=2 + cs.numnodes=c(0,cumsum(snumnodes)) + for(i in 1:post$nd) { + ens=list() + for(j in 1:post$mh) + { + tree=list(id=sids[(cs.numnodes[ndx-1]+1):cs.numnodes[ndx]], + var=svars[(cs.numnodes[ndx-1]+1):cs.numnodes[ndx]], + cut=scuts[(cs.numnodes[ndx-1]+1):cs.numnodes[ndx]], + theta=sthetas[(cs.numnodes[ndx-1]+1):cs.numnodes[ndx]]) + ens[[j]]=tree + ndx=ndx+1 + } + st[[i]]=ens + } + + return(list(mt=mt,st=st)) +} + + + +# Save a posterior tree fit from the tmp working directory +# into a local zip file given by [file].zip +# If not file option specified, uses [model name].zip as the file. +openbt.save<-function(post,fname=NULL) +{ + if(class(post)!="OpenBT_posterior") stop("Invalid object.\n") + + if(is.null(fname)) fname=post$modelname + if(substr(fname,nchar(fname)-3,nchar(fname))!=".obt") fname=paste(fname,".obt",sep="") + + save(post,file=paste(post$folder,"/post.RData",sep="")) + + zipr(fname,paste(post$folder,"/",list.files(post$folder),sep="")) + cat("Saved posterior to ",fname,"\n") +} + + +# Load a posterior tree fit from a zip file into a tmp working directory. +openbt.load<-function(fname) +{ + if(substr(fname,nchar(fname)-3,nchar(fname))!=".obt") fname=paste(fname,".obt",sep="") + + folder=tempdir(check=TRUE) + if(!dir.exists(folder)) dir.create(folder) + tmpsubfolder=tempfile(tmpdir="") + tmpsubfolder=substr(tmpsubfolder,5,nchar(tmpsubfolder)) + tmpsubfolder=paste("openbt",tmpsubfolder,sep="") + folder=paste(folder,"/",tmpsubfolder,sep="") + if(!dir.exists(folder)) dir.create(folder) + + unzip(fname,exdir=folder) + post=loadRData(paste(folder,"/post.RData",sep="")) + post$folder=folder + + return(post) +} + + +# This is so lame. Seriously. +loadRData <- function(fname) +{ + load(fname) + get(ls()[ls() != "fname"]) +} + + +print.OpenBT_posterior<-function(post) +{ + MODEL_BT=1 + MODEL_BINOMIAL=2 + MODEL_POISSON=3 + MODEL_BART=4 + MODEL_HBART=5 + MODEL_PROBIT=6 + MODEL_MODIFIEDPROBIT=7 + MODEL_MERCK_TRUNCATED=8 + + cat("OpenBT Posterior\n") + cat("Model type: ") + if(post$modeltype==MODEL_BART) + { + cat("Bayesian Additive Regression Trees model (BART)\n") + cat("k=",post$k,"\n") + cat("tau=",post$tau,"\n") + cat("lambda=",post$overalllambda,"\n") + cat("nu=",post$overallnu,"\n") + } + if(post$modeltype==MODEL_HBART) + { + cat("Heteroscedastic Bayesian Additive Regression Trees model (HBART)\n") + } + if(post$modeltype==MODEL_PROBIT) + { + cat("Dichotomous outcome model: Albert & Chib Probit (fixed)\n") + } + if(post$modeltype==MODEL_MODIFIEDPROBIT) + { + cat("Dichotomous outcome model: Modified Albert & Chib Probit\n") + } + if(post$modeltype==MODEL_MERCK_TRUNCATED) + { + cat("Truncated BART model\n") + } + + cat("ntree=", post$m, " \n",sep="") + cat("ntreeh=",post$mh," \n",sep="") + cat(post$nd," posterior draws.\n") + summary(post$xicuts) +} + +summary.OpenBT_posterior<-function(post) +{ + cat("No summary method for object.\n") +} + +print.OpenBT_predict<-function(pred) +{ + MODEL_BT=1 + MODEL_BINOMIAL=2 + MODEL_POISSON=3 + MODEL_BART=4 + MODEL_HBART=5 + MODEL_PROBIT=6 + MODEL_MODIFIEDPROBIT=7 + MODEL_MERCK_TRUNCATED=8 + + cat("OpenBT Prediction\n") + cat(ncol(pred$mdraws), " prediction locations.\n") + cat(nrow(pred$mdraws), " realizations.\n") + if(pred$modeltype==MODEL_PROBIT || pred$modeltype==MODEL_MODIFIEDPROBIT) + { + cat("Probability quantiles: ",pred$q.lower,",",pred$q.upper,"\n") + } + cat("Mean quantiles: ",pred$q.lower,",",pred$q.upper,"\n") + cat("Variance quantiles: ",pred$q.lower,",",pred$q.upper,"\n\n") +} + +summary.OpenBT_predict<-function(pred) +{ + cat("No summary method for object.\n") +} + +print.OpenBT_sobol<-function(sobol) +{ + cat("OpenBT Sobol Indices\n") + cat(ncol(sobol$vidraws), " variables.\n") + cat(nrow(sobol$vidraws), " realizations.\n") +} + +summary.OpenBT_sobol<-function(sobol) +{ + cat("Summary of Posterior Sobol Sensitivity Indices\n") + + cat("Expected Sobol Indices (Mean)\n") + print(sobol$msi) + print(sobol$mtsi) + print(sobol$msij) + + cat("\nStd. Dev. of Sobol Indices (Mean)\n") + print(sobol$msi.sd) + print(sobol$tsi.sd) + print(sobol$sij.sd) +} + +plot.OpenBT_sobol<-function(sobol) +{ + par(mfrow=c(3,1)) + boxplot(sobol$sidraws,ylab="Sobol Sensitivity",main="First Order Sobol Indices",xlab="Variables") + boxplot(sobol$sijdraws,ylab="Sobol Sensitivity",main="Two-way Sobol Indices",xlab="Variables") + boxplot(sobol$tsidraws,ylab="Sobol Sensitivity",main="Total Sobol Indices",xlab="Variables") +} + +print.OpenBT_vartivity<-function(vartivity) +{ + cat("OpenBT Variable Activity\n") + cat(ncol(vartivity$vdraws), " variables.\n") + cat(nrow(vartivity$vdraws), " realizations.\n") +} + +summary.OpenBT_vartivity<-function(vartivity) +{ + cat("Summary of Posterior Variable Activity\n") + + p=length(vartivity$mvdraws) + if(p<11) + { + cat("Expected Variable Activity (Mean)\n") + mean.vartivity=vartivity$mvdraws + ix=sort(mean.vartivity,index.return=TRUE,decreasing=TRUE)$ix + mean.vartivity=round(mean.vartivity[ix],2) + names(mean.vartivity)=ix + print(mean.vartivity) + cat("Expected Variable Activity (Variance)\n") + sd.vartivity=vartivity$mvdrawsh + ix=sort(sd.vartivity,index.return=TRUE,decreasing=TRUE)$ix + sd.vartivity=round(sd.vartivity[ix],2) + names(sd.vartivity)=ix + print(sd.vartivity) + } + else + { + cat("Expected Variable Activity (Mean)\n") + mean.vartivity=vartivity$mvdraws + ix=sort(mean.vartivity,index.return=TRUE,decreasing=TRUE)$ix + mean.vartivity=round(mean.vartivity[ix],2) + rest=sum(mean.vartivity[11:p]) + mean.vartivity=mean.vartivity[1:11] + mean.vartivity[11]=rest + names(mean.vartivity)=c(ix[1:10],"...") + print(mean.vartivity) + cat("Expected Variable Activity (Variance)\n") + sd.vartivity=vartivity$mvdrawsh + ix=sort(sd.vartivity,index.return=TRUE,decreasing=TRUE)$ix + sd.vartivity=round(sd.vartivity[ix],2) + rest=sum(sd.vartivity[11:p]) + sd.vartivity=sd.vartivity[1:11] + sd.vartivity[11]=rest + names(sd.vartivity)=c(ix[1:10],"...") + print(sd.vartivity) + } +} + +plot.OpenBT_vartivity<-function(vartivity) +{ + par(mfrow=c(1,2)) + yrange=c(0,max(max(vartivity$vdraws),max(vartivity$vdrawsh))) + boxplot(vartivity$vdraws,ylab="% node splits",main="Mean Trees",xlab="Variables",ylim=yrange) + boxplot(vartivity$vdrawsh,ylab="% node splits",main="Variance Trees",xlab="Variables",ylim=yrange) +} + +summary.OpenBT_cutinfo<-function(xi) +{ + p=length(xi) + cat("Number of variables: ",p,"\n") + cat("Number of cutpoints per variable\n") + for(i in 1:p) + { + cat("Variable ",i,": ",length(xi[[i]])," cutpoints\n") + } +} + +print.OpenBT_cutinfo<-function(xi) +{ + summary.OpenBT_cutinfo(xi) +} + +# Takes the n x p design matrix and a scalar or vector of number of cutpoints +# per variable, returns a BARTcutinfo object with variables/cutpoints initalized. +makecuts.openbt<-function(x,numcuts) +{ + p=ncol(x) + if(length(numcuts)==1) + { + numcuts=rep(numcuts,p) + } + else if(ncol(x) != length(numcuts)) + { + cat("Number of variables does not equal length of numcuts vector!\n") + return(0) + } + + xi=vector("list",p) + minx=apply(x,2,min) + maxx=apply(x,2,max) + for(i in 1:p) + { + xinc=(maxx[i]-minx[i])/(numcuts[i]+1) + xi[[i]]=(1:numcuts[i])*xinc+minx[i] + } + + class(xi)="OpenBT_cutinfo" + return(xi) +} + +# Takes an existing OpenBT_cutinfo object xi and the particular variable to update, id, +# and the vector of cutpoints to manually assign to that variable, updates and returns +# the modified OpenBT_cutinfo object. +setvarcuts.openbt<-function(xi,id,cutvec) +{ + p=length(xi) + if(id>p || id<1) + { + cat("Invalid variable specified\n") + return(0) + } + xi[[id]]=cutvec + return(xi) +} + + +#-------------------------------------------------- +#Get model mixing weights +#-------------------------------------------------- +mixingwts.openbt = function( + fit=NULL, + x.test=NULL, + tc=2, + numwts=NULL, + q.lower=0.025, + q.upper=0.975 +) +{ + # model type definitions + MODEL_BT=1 + MODEL_BINOMIAL=2 + MODEL_POISSON=3 + MODEL_BART=4 + MODEL_HBART=5 + MODEL_PROBIT=6 + MODEL_MODIFIEDPROBIT=7 + MODEL_MERCK_TRUNCATED=8 + MODEL_MIXBART=9 + + model_types = c("mixbart", "mix_emulate") + #-------------------------------------------------- + # params + if(is.null(fit)) stop("No fitted model specified!\n") + if(is.null(x.test)) stop("No prediction points specified!\n") + if(is.null(numwts)){stop("Missing number of model weights parameter numwts.\n")} + if(!(fit$model %in% model_types)){stop("Wrong model type! This function is for mixbart.\n")} + #if(fit$modeltype!=MODEL_MIXBART){stop("Wrong model type! This function is for mixbart.\n")} + + nslv=tc + x.test=as.matrix(x.test) + p=ncol(x.test) + n=nrow(x.test) + xwroot="xw" + fitroot=".fit" + if(fit$model == "mixbart"){m = fit$m; mh = fit$mh}else{m = fit$mix_model_args$ntree; mh = fit$mix_model_args$ntreeh} + if(fit$model != "mixbart"){fitroot = ".fitmix"} + #-------------------------------------------------- + #write out config file + fout=file(paste(fit$folder,"/config.mxwts",sep=""),"w") + writeLines(c(fit$modelname,fit$modeltype,fit$xiroot,xwroot, + fitroot,paste(fit$nd),paste(m), + paste(mh),paste(p),paste(numwts),paste(tc)), fout) + close(fout) + + #-------------------------------------------------- + #write out data subsets + #folder=paste(".",fit$modelname,"/",sep="") + xlist=split(as.data.frame(x.test),(seq(n)-1) %/% (n/nslv)) + for(i in 1:nslv) write(t(xlist[[i]]),file=paste(fit$folder,"/",xwroot,i-1,sep="")) + for(i in 1:p) write(fit$xicuts[[i]],file=paste(fit$folder,"/",fit$xiroot,i,sep="")) + + #-------------------------------------------------- + #run prediction program + cmdopt=100 #default to serial/OpenMP + runlocal=FALSE + cmd="openbtcli --conf" + if(Sys.which("openbtcli")[[1]]=="") # not installed in a global location, so assume current directory + runlocal=TRUE + + if(runlocal) cmd="./openbtcli --conf" + + cmdopt=system(cmd) + + if(cmdopt==101) # MPI + { + cmd=paste("mpirun -np ",tc," openbtmixingwts ",fit$folder,sep="") + } + + if(cmdopt==100) # serial/OpenMP + { + if(runlocal) + cmd=paste("./openbtmixingwts ",fit$folder,sep="") + else + cmd=paste("openbtmixingwts ",fit$folder,sep="") + } + + system(cmd) + system(paste("rm -f ",fit$folder,"/config.mxwts",sep="")) + + + #-------------------------------------------------- + #format and return + res=list() + wt_list = list() + mean_matrix = sd_matrix = ub_matrix = lb_matrix = med_matrix = matrix(0, nrow = n, ncol = numwts) + + #Get the file names for the model weights + #--file name for model weight j using processor p is ".wjdrawsp" + for(j in 1:numwts){ + #Get the files for weight j + tagname = paste0(".w", j,"draws*") + fnames=list.files(fit$folder,pattern=paste(fit$modelname,tagname,sep=""),full.names=TRUE) + + #Bind the posteriors for weight j across all x points -- npost X n data + wt_list[[j]] = do.call(cbind,sapply(fnames,data.table::fread)) + + #Now populate the summary stat matrices -- n X k matrices + mean_matrix[,j] = apply(wt_list[[j]], 2, mean) + sd_matrix[,j] = apply(wt_list[[j]], 2, sd) + med_matrix[,j] = apply(wt_list[[j]], 2, median) + lb_matrix[,j] = apply(wt_list[[j]], 2, quantile,q.lower) + ub_matrix[,j] = apply(wt_list[[j]], 2, quantile,q.upper) + } + + #Save the list of posterior draws -- each list element is an npost X n dataframe + res$wdraws = wt_list + + #Get model mixing results + res$wmean=mean_matrix + res$wsd=sd_matrix + res$w.5=med_matrix + res$w.lower=lb_matrix + res$w.upper=ub_matrix + + res$q.lower=q.lower + res$q.upper=q.upper + res$modeltype=fit$modeltype + + class(res)="OpenBT_mixingwts" + return(res) +} + + +#---------------------------------------------- +# Model mixing and emulation interface +#---------------------------------------------- +openbt.mix_emulate = function( +mix_model_data = list(y_train = NULL, x_train = NULL), +emu_model_data = list('model1'=list(z_train = NULL, x_train = NULL),'model2'=list(z_train = NULL, x_train = NULL)), +mix_model_args = list('ntree'=10,'ntreeh'=1,'k'=2,'overallnu'=10,'overallsd'=NA,'power'=2.0,'base'=0.95,'powerh'=NA,'baseh'=NA), +emu_model_args = matrix(c(100,1,2,10,NA,2.0,0.95,NA,NA), nrow = 2, ncol = 9,byrow = TRUE, + dimnames = list(paste0('model',1:2),c('ntree','ntreeh','k','overallnu','overallsd','power','base','powerh','baseh'))), +discrep_model_args = list('k' = 10, 'beta0' = NA), +ndpost=1000, nskip=100,nadapt=1000,adaptevery=100, +pbd=.7, +pb=.5, +stepwpert=.1, +probchv=.1, +minnumbot=5, +tc=2, +printevery=100, +numcut=100, +xicuts=NULL, +sigmav_list=NULL, +chv_list = NULL, +summarystats = FALSE, +model=NULL, +modelname="model" +#sigmav=rep(1,length(y.train)), +#chv = cor(x.train,method="spearman"), +) +{ +#-------------------------------------------------- +# model type definitions and check default arguments +modeltype=0 +MODEL_MIX_EMULATE=1 #Full problem +MODEL_MIX_ORTHOGONAL=2 # Full problem with orthogonal discrepancy + +modeltype_list = c('mix_emulate','mix_orthogonal') + +if(is.null(model)){ + stop(cat("Enter a model type. Valid types include:\n", paste(modeltype_list,collapse = ', '))) +} + +if(!(model %in% modeltype_list)){ + stop(cat("Invalid model type. Valid types include:\n", paste(modeltype_list,collapse = ', '))) +} + +if(model == 'mix_emulate'){ + modeltype = MODEL_MIX_EMULATE + if(is.null(mix_model_args$overallsd)) ntreeh=1 +} + +#-------------------------------------------------- +# Define terms +nd = ndpost +burn = nskip +nummodels = length(emu_model_data) + +#-------------------------------------------------- +# Process the data +# Model mixing data +if(is.null(mix_model_data$x_train) || is.null(mix_model_data$y_train)){ + stop('Invalid model mixing inputs. Either x_train or y_train is NULL.') +} + +# Cast x_train data to matrix +if(!is.matrix(mix_model_data$x_train)){ + mix_model_data$x_train = matrix(is.null(mix_model_data$x_train), ncol = 1) +} + +# Define terms +n = nrow(mix_model_data$x_train) +p = ncol(mix_model_data$x_train) +ymean = mean(mix_model_data$y_train) + +# Emulation data +nc_vec = 0 +pc_vec = 0 +for(l in 1:nummodels){ + + if(is.null(emu_model_data[[l]]$x_train) || is.null(emu_model_data[[l]]$z_train)){ + stop(paste0('Invalid emulation inputs. Either x_train or y_train is NULL in model ', l,'.')) + } + + # Cast the x_train data to matrix + if(!is.matrix(mix_model_data$x_train)){ + emu_model_data[[l]]$x_train = matrix(is.null(emu_model_data[[l]]$x_train), ncol = 1) + } + + # Define terms + nc_vec[l] = nrow(emu_model_data[[l]]$x_train) + pc_vec[l] = ncol(emu_model_data[[l]]$x_train) + +} + +# Get means of the model runs and mean center the data +zmean_vec = 0 +for(l in 1:nummodels){ + zmean_vec[l] = mean(emu_model_data[[l]]$z_train) + emu_model_data[[l]]$z_train = emu_model_data[[l]]$z_train - zmean_vec[l] +} + +#-------------------------------------------------- +# cutpoints +# get joint input matrix +col_list1 = colnames(mix_model_data$x_train) +col_list2 = unique(unlist(lapply(emu_model_data, function(x) colnames(x$x_train)))) + +if(sum(sort(col_list1) != sort(col_list2))>0){ + stop(cat("The input variables in model mixing differ from the input variables in emulation.", + "\nModel Mixing Inputs: \n", col_list1, + "\nAll Emulation Inputs: \n", col_list2)) +} + +# Create a master list of inputs -- includes all from field obs and computer models +x_matrix = mix_model_data$x_train +xc_col_list = list() +for(l in 1:nummodels){ + # get matrix and Fill columns that are not present with NA + x_new = matrix(0, nrow = nc_vec[l], ncol = p, dimnames = list(NULL, col_list1)) + xc_cols = 0 + ind = 1 + for(j in 1:length(col_list1)){ + cname = col_list1[j] + if(cname %in% colnames(emu_model_data[[l]]$x_train)){ + # Add the corresponding data to the x_matrix + x_new[,cname] = emu_model_data[[l]]$x_train[,cname] + # Keep track of which columns are in the inputs for the lth computer model + xc_cols[ind] = j + ind = ind + 1 + }else{ + # If the predictor is not in the input space of lth computer model, then just use NA in the x_matrix + x_new[,cname] = rep(NA, nc_vec[l]) + } + } + xc_col_list = append(xc_col_list, list(xc_cols)) + x_matrix = rbind(x_matrix, x_new) +} +names(xc_col_list) = paste0('model',1:nummodels) + +# Use x_matrix to create cutpoints +if(!is.null(xicuts)){ + xi=xicuts +}else{ + xi=vector("list",p) + minx_temp=apply(t(x_matrix),1,function(x) min(x, na.rm = TRUE)) + maxx_temp=apply(t(x_matrix),1,function(x) max(x, na.rm = TRUE)) + + maxx = round(maxx_temp,1) + ifelse((round(maxx_temp,1)-maxx_temp)>0,0,0.1) + minx = round(minx_temp,1) - ifelse((minx_temp - round(minx_temp,1))>0,0,0.1) + for(i in 1:p){ + xinc=(maxx[i]-minx[i])/(numcut+1) + xi[[i]]=(1:numcut)*xinc+minx[i] + } +} + +#------------------------------------------------ +# Priors +#------------------------------------------------ +# Model Mixing +# -- Terminal node parameters +rgy = range(mix_model_data$y_train) +m = mix_model_args$ntree +k = mix_model_args$k + +tau_disc = (rgy[2]-rgy[1])/(2*sqrt(m)*discrep_model_args$k) +tau_wts = (1)/(2*sqrt(m)*k) +beta_wts = 1/(2*m) +if(is.na(discrep_model_args$beta0)){ + beta_disc = mean(mix_model_data$y_train)/m +}else{ + beta_disc = discrep_model_args$beta0/m +} + +# -- Error variance +if(is.na(mix_model_args$overallsd)){ + mix_model_args$overallsd = sd(mix_model_data$y_train) +} +overalllambda_mix = mix_model_args$overallsd^2 +overallnu_mix = mix_model_args$overallnu + +# -- Tree prior (mean and variance trees) +power_mix = mix_model_args$power +base_mix = mix_model_args$base +powerh_mix = mix_model_args$powerh +baseh_mix = mix_model_args$baseh + +if(is.na(powerh_mix)){ + powerh_mix=power_mix +} +if(is.na(baseh_mix)){ + baseh_mix=base_mix +} + +# Emulation with BART +tau_emu = 0 +base_emu = baseh_emu = 0 +power_emu = powerh_emu = 0 +overalllambda_emu = overallnu_emu = 0 +for(l in 1:nummodels){ + # -- Terminal node parameters + m = emu_model_args[l,'ntree'] + k = emu_model_args[l,'k'] + rgz = range(emu_model_data[[l]]$z_train) + tau_emu[l] = (rgz[2]-rgz[1])/(2*sqrt(m)*k) + + # -- Error variance + if(is.na(emu_model_args[l,'overallsd'])){ + emu_model_args[l,'overallsd'] = sd(emu_model_data[[l]]$z_train) + } + overalllambda_emu[l] = emu_model_args[l,'overallsd']^2 + overallnu_emu[l] = emu_model_args[l,'overallnu'] + + # -- Tree prior (mean and variance trees) + power_emu[l] = emu_model_args[l,'power'] + base_emu[l] = emu_model_args[l,'base'] + powerh_emu[l] = emu_model_args[l,'powerh'] + baseh_emu[l] = emu_model_args[l,'baseh'] + + if(is.na(powerh_emu[l])) { + powerh_emu[l] = power_emu[l] + } + if(is.na(baseh_emu[l])) { + baseh_emu[l]=base_emu[l] + } +} + +#-------------------------------------------------- +# Birth and death probability +pbdh=pbd +pbh=pb +if(length(pbd)>1) { + pbdh=pbd[2] + pbd=pbd[1] +} +if(length(pb)>1) { + pbh=pb[2] + pb=pb[1] +} + + +#-------------------------------------------------- +# Process other arguments +#-------------------------------------------------- +stepwperth=stepwpert +if(length(stepwpert)>1) { + stepwperth=stepwpert[2] + stepwpert=stepwpert[1] +} + +probchvh=probchv +if(length(probchv)>1) { + probchvh=probchv[2] + probchv=probchv[1] +} + +minnumboth=minnumbot +if(length(minnumbot)>1) { + minnumboth=minnumbot[2] + minnumbot=minnumbot[1] +} + +# Set default argument for the sigmavs +if(is.null(sigmav_list)){ + for(l in 0:nummodels){ + sigmav_n = ifelse(l == 0, n, nc_vec[l]+n) + sigmav_list[[l+1]] = rep(1,sigmav_n) + } +}else{ + if(!is.list(sigmav_list)){ + stop("Invalid data type: sigmav_list must be a list object with length K+1.") + } + for(l in 0:nummodels){ + sigmav_n = ifelse(l == 0, n, nc_vec[l]+n) + if(length(sigmav_list[[l+1]]) != sigmav_n){ + stop(cat("Invalid sigmav_list entry at item",l+1,"\n Length of sigmav_list[[l]] is incorrect. Required lengths are listed below: \n", + "\t Mixing (l=1): ", n, " (number of field obs) \n", + "\t Emulation (l>1): ", n + nc_vec[l+1], " (number of field obs + number of model runs))")) + } + + } +} + +# Set the default argument for the chvs +if(is.null(chv_list)){ + for(l in 0:nummodels){ + if(l == 0){ + chv_data = mix_model_data$x_train + }else{ + xmixtemp = mix_model_data$x_train[,unlist(xc_col_list[l])] + if(length(unlist(xc_col_list[l])) == 1){xmixtemp = matrix(xmixtemp,ncol = 1)} + chv_data = rbind(emu_model_data[[l]]$x_train, xmixtemp) + } + chv_list[[l+1]] = cor(chv_data,method="spearman") + } +}else{ + if(!is.list(chv_list)){ + stop("Invalid data type: chv_list must be a list object with length K+1.") + } + for(l in 0:nummodels){ + chv_p = ifelse(l == 0, p, pc_vec[l]) + if(chv_list[[l+1]] != chv_p){ + stop(paste0("Invalid chv_list entry at item k = ",l+1,". Number of rows & columns in chv_list[[k]] must match", + " the number of columns in the design matrix for emulator k-1. If k = 1, then the number of rows & columns in", + " chv_list[[1]] must match the number of columns in the design matrix for the field observations.")) + } + } +} + +#-------------------------------------------------- +# Print statements +#-------------------------------------------------- +if(modeltype==MODEL_MIX_EMULATE){ + cat("Model: KOH Bayesian Additive Model Mixing Trees\n") +} +if(modeltype==MODEL_MIX_ORTHOGONAL){ + cat("Model: Orthogonal Bayesian Additive Model Mixing Trees\n") +} + +#-------------------------------------------------- +#write out config file +#-------------------------------------------------- +# Field data roots +yroots="yf" +xroots=c("xf",paste0("xc",1:nummodels)) +zroots=paste0("zc",1:nummodels) +sroots=c("sf",paste0("sc",1:nummodels)) +chgvroots=c("chgvf",paste0("chgvc",1:nummodels)) +idroots="id" + +# cut points root +xiroot="xi" + +# Design column numbers for computer models +xc_design_cols = 0 +h = 1 +# Format -- number of cols for model 1, col numbers for model 1,...,number of cols for model K, col numbers for model K +for(l in 1:nummodels){ + xc_design_cols[h] = paste(pc_vec[l]) + xc_design_cols[(h+1):(h+pc_vec[l])] = paste(xc_col_list[[l]]) + h = h+pc_vec[l] + 1 +} + +# Flatten the model arguments +m_vec = paste(c(mix_model_args$ntree, emu_model_args[,'ntree'])) +mh_vec = paste(c(mix_model_args$ntreeh, emu_model_args[,'ntreeh'])) +tau_vec = paste(c(tau_disc, tau_wts, tau_emu)) #Two mixing parameters, K emu parameters +beta_vec = paste(c(beta_disc, beta_wts)) #Only mixing parameters +base_vec = paste(c(base_mix, base_emu)) +baseh_vec = paste(c(baseh_mix, baseh_emu)) +power_vec = paste(c(power_mix, power_emu)) +powerh_vec = paste(c(powerh_mix, powerh_emu)) +lambda_vec = paste(c(overalllambda_mix, overalllambda_emu)) +nu_vec = paste(c(overallnu_mix, overallnu_emu)) +means_vec = c(ymean,zmean_vec) + +# Bind all of the K+1 dimensional vectors together into one matrix +# Then flatten by row -- this makes things easier when reading in the data in c++ +# This way, we read in all data for mixing, then read in the data for each emulator one at a time +info_matrix = data.frame(xroots, yzroots = c(yroots,zroots),sroots,chgvroots,means = paste(means_vec), + m_vec, mh_vec, base_vec, baseh_vec, power_vec, powerh_vec, lambda_vec, nu_vec) + +info_vec = unlist(as.vector(t(info_matrix))) + +# Creating temp directory and config file +folder=tempdir(check=TRUE) +if(!dir.exists(folder)) dir.create(folder) +tmpsubfolder=tempfile(tmpdir="") +tmpsubfolder=substr(tmpsubfolder,5,nchar(tmpsubfolder)) +tmpsubfolder=paste("openbt",tmpsubfolder,sep="") +folder=paste(folder,"/",tmpsubfolder,sep="") +if(!dir.exists(folder)) dir.create(folder) +fout=file(paste(folder,"/config",sep=""),"w") + +# Write lines to the config file fout +# writeLines(c(paste(modeltype),paste(nummodels), +# xroots,yroot,zroots,sroots,chgvroots,paste(zmean_vec),xc_design_cols, +# paste(nd),paste(burn),paste(nadapt),paste(adaptevery), +# m_vec,mh_vec,tau_vec,beta_vec,lambda_vec,nu_vec,base_vec,power_vec,baseh_vec,powerh_vec, +# paste(pbd),paste(pb),paste(pbdh),paste(pbh),paste(stepwpert),paste(stepwperth), +# paste(probchv),paste(probchvh),paste(minnumbot),paste(minnumboth), +# paste(printevery),paste(xiroot),paste(tc),paste(modelname),paste(summarystats)),fout) + +writeLines(c(paste(modeltype),paste(nummodels),info_vec,xc_design_cols, + paste(nd),paste(burn),paste(nadapt),paste(adaptevery), + tau_vec,beta_vec, + paste(pbd),paste(pb),paste(pbdh),paste(pbh),paste(stepwpert),paste(stepwperth), + paste(probchv),paste(probchvh),paste(minnumbot),paste(minnumboth), + paste(printevery),paste(xiroot),paste(tc),paste(modelname),paste(summarystats)),fout) +close(fout) + +#-------------------------------------------------- +#write out data subsets +#-------------------------------------------------- +nslv=tc-1 +# Model Mixing data +ylist=split(mix_model_data$y_train,(seq(n)-1) %/% (n/nslv)) +for(i in 1:nslv) write(ylist[[i]],file=paste(folder,"/",yroots,i,sep="")) + +xlist=split(as.data.frame(mix_model_data$x_train),(seq(n)-1) %/% (n/nslv)) +for(i in 1:nslv) write(t(xlist[[i]]),file=paste(folder,"/",xroots[1],i,sep="")) + +# Emulation Data +#idlist = vector(mode='list', length = nslv) +for(l in 1:nummodels){ + zlist=split(emu_model_data[[l]]$z_train,(seq(nc_vec[l])-1) %/% (nc_vec[l]/nslv)) + for(i in 1:nslv) write(zlist[[i]],file=paste(folder,"/",zroots[l],i,sep="")) + + xlist=split(as.data.frame(emu_model_data[[l]]$x_train),(seq(nc_vec[l])-1) %/% (nc_vec[l]/nslv)) + for(i in 1:nslv) write(t(xlist[[i]]),file=paste(folder,"/",xroots[l+1],i,sep="")) +} + +# Write the ids per mpi +#for(i in 1:nslv) write(t(idlist[[i]]),file=paste(folder,"/",idroots,i,sep="")) + +# Variance and rotation data +for(l in 0:nummodels){ + ntemp = ifelse(l==0, n, nc_vec[l]+n) + slist=split(sigmav_list[[l+1]],(seq(ntemp)-1) %/% (ntemp/nslv)) + for(i in 1:nslv) write(slist[[i]],file=paste(folder,"/",sroots[l+1],i,sep="")) + + chv_list[[l+1]][is.na(chv_list[[l+1]])]=0 # if a var as 0 levels it will have a cor of NA so we'll just set those to 0. + write(chv_list[[l+1]],file=paste(folder,"/",chgvroots[l+1],sep="")) +} + +# Cutpoints +for(i in 1:p) write(xi[[i]],file=paste(folder,"/",xiroot,i,sep="")) +rm(chv_list) + +#-------------------------------------------------- +#run program +#-------------------------------------------------- +cmdopt=100 #default to serial/OpenMP +runlocal=FALSE +cmd="openbtmixing --conf" +if(Sys.which("openbtmixing")[[1]]=="") # not installed in a global location, so assume current directory + runlocal=TRUE + +if(runlocal) cmd="./openbtmixing --conf" + +cmdopt=system(cmd) + +if(cmdopt==101) # MPI +{ + cmd=paste("mpirun -np ",tc," openbtmixing ",folder,sep="") +} + +if(cmdopt==100) # serial/OpenMP +{ + if(runlocal) + cmd=paste("./openbtmixing ",folder,sep="") + else + cmd=paste("openbtmixing ",folder,sep="") +} + +system(cmd) +#system(paste("rm -f ",folder,"/config",sep="")) + +# Collect results +res=list() +res$modeltype=modeltype; res$model=model +res$xroot=xroots; res$yroot=yroots; res$zroots=zroots;res$sroot=sroots; res$chgvroot=chgvroots; res$xc_col_list=xc_col_list +res$means = means_vec; res$nummodels = nummodels; +res$nd=nd; res$burn=burn; res$nadapt=nadapt; res$adaptevery=adaptevery; +res$mix_model_args = mix_model_args; res$emu_model_args = emu_model_args; +res$pbd=pbd; res$pb=pb; res$pbdh=pbdh; res$pbh=pbh; res$stepwpert=stepwpert; res$stepwperth=stepwperth +res$probchv=probchv; res$probchvh=probchvh; res$minnumbot=minnumbot; res$minnumboth=minnumboth +res$printevery=printevery; res$xiroot=xiroot; res$minx=minx; res$maxx=maxx; +res$summarystats=summarystats; res$tc=tc; res$modelname=modelname +class(xi)="OpenBT_cutinfo" +res$xicuts=xi +res$folder=folder +class(res)="OpenBT_posterior" + +#res$k=k_vec; res$m=m_vec; res$mh=mh_vec;res$tau=tau_vec; res$beta_vec=beta_vec; res$overalllambda=lambda_vec; res$overallnu=nu_vec; +#res$base=base_vec; res$power=power_vec; res$baseh=baseh_vec; res$powerh=powerh_vec + +return(res) + +} + +# Predict function for mixing and emulation +openbt.predict_mix_emulate = function(fit=NULL, x_test=NULL,tc=2,q.lower=0.025,q.upper=0.975){ + # model type definitions + MODEL_MIX_EMULATE=1 #Full problem + MODEL_MIX_ORTHOGONAL=2 # Full problem with orthogonal discrepancy + + #-------------------------------------------------- + # Define objects + if(is.null(fit)) stop("No fitted model specified!\n") + if(is.null(x_test)) stop("No prediction points specified for !\n") + nslv=tc + means_vec=fit$means + p=ncol(x_test) + n=nrow(x_test) + nummodels = fit$nummodels + xproot = 'xp' + + x_test=as.matrix(x_test) + x_col_list = fit$xc_col_list + + #-------------------------------------------------- + # Flatten any vector arguments that need to be passed to pred + # Design column numbers for computer models -- flattens the x_col_list + xc_design_cols = 0 + h = 1 + # Format -- number of cols for model 1, col numbers for model 1,...,number of cols for model K, col numbers for model K + for(l in 1:nummodels){ + pc = length(x_col_list[[l]]) + xc_design_cols[h] = paste(pc) + xc_design_cols[(h+1):(h+pc)] = paste(fit$xc_col_list[[l]]) + h = h + pc + 1 + } + + # Flatten the m and mh vectors + mvec = fit$mix_model_args$ntree + mvec[2:(nummodels+1)] = fit$emu_model_args[,'ntree'] + mhvec = fit$mix_model_args$ntreeh + mhvec[2:(nummodels+1)] = fit$emu_model_args[,'ntreeh'] + + # Create an info matrix and vector of arguments that are required for the individual models + info_matrix = data.frame(means = paste(means_vec),mvec, mhvec) + info_vec = unlist(as.vector(t(info_matrix))) + + #-------------------------------------------------- + #write out config file + fout=file(paste(fit$folder,"/config.pred",sep=""),"w") + writeLines(c(fit$modelname,fit$modeltype,fit$xiroot,xproot, + paste(fit$nd),paste(nummodels),paste(p),paste(tc), + info_vec,xc_design_cols), fout) + close(fout) + + #-------------------------------------------------- + #write out data subsets + #folder=paste(".",fit$modelname,"/",sep="") + xlist=split(as.data.frame(x_test),(seq(n)-1) %/% (n/nslv)) + for(i in 1:nslv) write(t(xlist[[i]]),file=paste(fit$folder,"/",xproot,i-1,sep="")) + for(i in 1:p) write(fit$xicuts[[i]],file=paste(fit$folder,"/",fit$xiroot,i,sep="")) + + #-------------------------------------------------- + #run prediction program + cmdopt=100 #default to serial/OpenMP + runlocal=FALSE + cmd="openbtcli --conf" + if(Sys.which("openbtcli")[[1]]=="") # not installed in a global location, so assume current directory + runlocal=TRUE + + if(runlocal) cmd="./openbtcli --conf" + + cmdopt=system(cmd) + + if(cmdopt==101) # MPI + { + cmd=paste("mpirun -np ",tc," openbtmixingpred ",fit$folder,sep="") + } + + if(cmdopt==100) # serial/OpenMP + { + if(runlocal) + cmd=paste("./openbtmixingpred ",fit$folder,sep="") + else + cmd=paste("openbtmixingpred ",fit$folder,sep="") + } + + #cmd=paste("mpirun -np ",tc," openbtpred",sep="") + #cat(cmd) + system(cmd) + system(paste("rm -f ",fit$folder,"/config.pred",sep="")) + + #-------------------------------------------------- + #format and return + res_model=vector(mode = 'list', length = nummodels+2) + names(res_model) = c("mixmodel", paste0("emulator",1:nummodels),"Information") + res = vector(mode = 'list', length = 2) + names(res) = c('mdraws', 'sdraws') + + fnames=list.files(fit$folder,pattern=paste(fit$modelname,".mdraws*",sep=""),full.names=TRUE) + res$mdraws=do.call(cbind,sapply(fnames,data.table::fread)) + fnames=list.files(fit$folder,pattern=paste(fit$modelname,".sdraws*",sep=""),full.names=TRUE) + res$sdraws=do.call(cbind,sapply(fnames,data.table::fread)) + + #Separate results into mixing vs emulation + for(i in 0:nummodels){ + # Initialize the list elements + res_model[[i+1]] = vector(mode = 'list', length = 12) + names(res_model[[i+1]]) = paste0(rep(c("m","s"),6), + rep(c('draws','mean','sd','.5','.lower','.upper'),each = 2)) + ind = seq(i*fit$nd+1, (i+1)*fit$nd, 1) + + # Summarize the data for the selected model + res_model[[i+1]]$mdraws = res$mdraws[ind,] + res_model[[i+1]]$sdraws = res$sdraws[ind,] + res_model[[i+1]]$mmean=apply(res_model[[i+1]]$mdraws,2,mean) + res_model[[i+1]]$smean=apply(res_model[[i+1]]$sdraws,2,mean) + res_model[[i+1]]$msd=apply(res_model[[i+1]]$mdraws,2,sd) + res_model[[i+1]]$ssd=apply(res_model[[i+1]]$sdraws,2,sd) + res_model[[i+1]]$m.5=apply(res_model[[i+1]]$mdraws,2,quantile,0.5) + res_model[[i+1]]$s.5=apply(res_model[[i+1]]$sdraws,2,quantile,0.5) + res_model[[i+1]]$m.lower=apply(res_model[[i+1]]$mdraws,2,quantile,q.lower) + res_model[[i+1]]$s.lower=apply(res_model[[i+1]]$sdraws,2,quantile,q.lower) + res_model[[i+1]]$m.upper=apply(res_model[[i+1]]$mdraws,2,quantile,q.upper) + res_model[[i+1]]$s.upper=apply(res_model[[i+1]]$sdraws,2,quantile,q.upper) + } + + res_model[[nummodels+2]] = list(q.lower=q.lower,q.upper=q.upper,modeltype=fit$modeltype) + rm(res) + class(res_model)="OpenBT_predict" + + return(res_model) +} diff --git a/Ropenbt/R/pareto_helpers.R b/Ropenbt/R/pareto_helpers.R new file mode 100644 index 0000000..165e4f8 --- /dev/null +++ b/Ropenbt/R/pareto_helpers.R @@ -0,0 +1,117 @@ +## pareto_helpers.R: R functions for manipulating output of OpenBT's mopareto() model. + + +# For cpf, find y_j at each y_i cutpoint for all i=1:d, where j = (not i). +# This really only works for d=2. +# +# Args: cpf: (d)x(??) matrix CPF realization. +# +# Returns: (dq)-tuple of y_j values, where j = (not i). +GetIntersectVals = function(cpf, my.cuts, d=2) +{ + + GetIntersectValsFori = function(i) { + # Returns: q-tuple of y_j values, where j = (not i). + yi.vals = my.cuts[, i] + not.i = ifelse(i == 1, 2, 1) + sapply(yi.vals, function(yi) min(cpf[not.i, cpf[i, ] <= yi], na.rm = TRUE)) + } + + unlist(lapply(1:d, GetIntersectValsFori)) +} + + + +# Compute Modified Band data depth based UQ region +calcMBD = function(fit, d=2) +{ + if(class(fit)!="OpenBT_mopareto") stop("fit object not recognized, should be type OpenBT_mopareto.\n") + + cp.theta = lapply(fit, `[[`,"theta") + nd = length(fit) + + ## 1. Create cuts + n.cuts = 2^6 # also called q + cp.theta.cbind = Reduce(cbind, cp.theta) + my.cuts = sapply(1:d, function(i) seq(min(cp.theta.cbind[i, ]), + max(cp.theta.cbind[i, ]), length.out = n.cuts)) # (q x d) matrix + + ## 2. Create (dq)x(nd) matrix M + mat.M = sapply(cp.theta, GetIntersectVals, my.cuts) + + ## 3. Create (2q)x(nd) matrix R + mat.R.max = t(apply(mat.M, 1, rank, ties.method = "max")) + mat.R.min = t(apply(mat.M, 1, rank, ties.method = "min")) + + ## 4. Create (unnormalized) T matrix + # If l (r) = # strictly to left (right) and m is multiplicity, + # then the correct unnormalized depth is (l+m)(r+m)-m^2. + # No ties implies mat.R.max equals mat.R.min. + mat.T = mat.R.max * (nd - mat.R.min + 1) - (mat.R.max - mat.R.min + 1)^2 + mat.T = mat.T / choose(nd, 2) + + ## 5. Compute modified depth of each Pareto front. + depth.mod = colMeans(mat.T, na.rm=TRUE) + + ## 6. Compute UQ region + uq.cols = (rank(-depth.mod) <= (0.5 * nd)) + pf.uq.mbd = Reduce(cbind, lapply(which(uq.cols), function(i) cp.theta[[i]])) + + # 7. Compute PS + cp.uq.a.cbind = Reduce(cbind, lapply(which(uq.cols), function(i) fit[[i]]$a)) # Use fit here + cp.uq.b.cbind = Reduce(cbind, lapply(which(uq.cols), function(i) fit[[i]]$b)) # Use fit here + + return(list(depth.mbd=depth.mod,pf.uq.mbd=pf.uq.mbd,cp.uq.a=cp.uq.a.cbind,cp.uq.b=cp.uq.b.cbind)) +} + + + +# Compute Vorobev Random Sets based PF UQ region +calcRS = function(fit, d=2) +{ + if(class(fit)!="OpenBT_mopareto") stop("fit object not recognized, should be type OpenBT_mopareto.\n") + + cp.theta = lapply(fit, `[[`,"theta") + nd = length(fit) + + # 2. if R is unknown then find the extremal values for the + # objectives over the RNP sets realizations + cp.theta.cbind = Reduce(cbind, cp.theta) + + ref.max = apply(cp.theta.cbind, 1, max) + ref.min = apply(cp.theta.cbind, 1, min) + ref.vol = Reduce(`*`, ref.max - ref.min) + + # 4. Determine the average volume of the attained sets + FindVol = function(pt.set) { + # Find hypervolume of pt.set wrt ref.pt. + emoa::hypervolume_indicator(points = as.matrix(pt.set, nrow=d), + o = as.matrix(ref.max, nrow=d)) * -1 + } + vols.cpfs = sapply(cp.theta, FindVol) + avg.vol = mean(vols.cpfs) + + # 5. Find the value of the Vorob'ev threshold β∗ by dichotomy: + WhichCPFsDomPt = function(z) { + DoesCPFDomPt = function(cpt, z) { + dom.by.dim = lapply(1:d, function(i) cpt[i, ] <= z[i]) + Reduce(`|`, Reduce(`&`, dom.by.dim)) + } + sapply(cp.theta, DoesCPFDomPt, z=z) + } + + pf.att = apply(cp.theta.cbind, 2, WhichCPFsDomPt) # nd by n.pts matrix + pf.att = colMeans(pf.att) # n.pts-tuple + + # This region is defined by its nondominated point set. + uq.cols = (0.25 <= pf.att) & (pf.att <= 0.75) + pf.uq.rs = cp.theta.cbind[, uq.cols] + + # Compute UQ region for PS and its accuracy ############################## + cp.a.cbind = Reduce(cbind, lapply(fit, `[[`, "a")) # Use fit here + cp.b.cbind = Reduce(cbind, lapply(fit, `[[`, "b")) # Use fit here + cp.uq.a.cbind = cp.a.cbind[, uq.cols] + cp.uq.b.cbind = cp.b.cbind[, uq.cols] + + return(list(pf.uq=pf.uq.rs,cp.uq.a=cp.uq.a.cbind,cp.uq.b=cp.uq.b.cbind)) +} diff --git a/Ropenbt/man/calcMBD.Rd b/Ropenbt/man/calcMBD.Rd new file mode 100644 index 0000000..72db04b --- /dev/null +++ b/Ropenbt/man/calcMBD.Rd @@ -0,0 +1,32 @@ +\name{calcMBD} +\alias{calcMBD} +%\docType{package} +\title{Construct UQ regions for Pareto Front/Set using Modified Band Depth approach.} +\description{ + The function \code{calcMBD()} takes \code{OpenBT_mopareto} object and calculates UQ regions for the estimated Pareto Front and Pareto Set using the modified band depth approach. +} +\usage{ +calcMBD(fit,d=2) +} +\arguments{ + \item{fit}{An \code{OpenBT_mopareto} object.} + \item{d}{The number of responses being considered in the Pareto front/set. Currently only d=2 responses is supported.} +} +\details{ + \code{calcMBD()} calculates the modified band depth UQ region for the Pareto front and set calculated using two OpenBT models. +} +\references{ + Horiguchi, Akira, Santner, Thomas J., Sun, Ying, and Pratola, Matthew T. (2022) + Using BART to perform Pareto optimization and quantify its uncertainties. + \emph{Technometrics}, vol.64, 564--574. +} +\author{ +Matthew T. Pratola [aut, cre, cph] +Maintainer: Matthew T. Pratola +} +\seealso{ +\code{\link{mopareto.openbt}} +} +\examples{ +# For example usage, see readthedocs. +} \ No newline at end of file diff --git a/Ropenbt/man/calcRS.Rd b/Ropenbt/man/calcRS.Rd new file mode 100644 index 0000000..97b3df9 --- /dev/null +++ b/Ropenbt/man/calcRS.Rd @@ -0,0 +1,32 @@ +\name{calcRS} +\alias{calcRS} +%\docType{package} +\title{Construct UQ regions for Pareto Front/Set using random sets approach.} +\description{ + The function \code{calcRS()} takes \code{OpenBT_mopareto} object and calculates UQ regions for the estimated Pareto Front and Pareto Set using the random sets approach. +} +\usage{ +calcRS(fit,d=2) +} +\arguments{ + \item{fit}{An \code{OpenBT_mopareto} object.} + \item{d}{The number of responses being considered in the Pareto front/set. Currently only d=2 responses is supported.} +} +\details{ + \code{calcRS()} calculates the random sets based UQ region for the Pareto front and set calculated using two OpenBT models. +} +\references{ + Horiguchi, Akira, Santner, Thomas J., Sun, Ying, and Pratola, Matthew T. (2022) + Using BART to perform Pareto optimization and quantify its uncertainties. + \emph{Technometrics}, vol.64, 564--574. +} +\author{ +Matthew T. Pratola [aut, cre, cph] +Maintainer: Matthew T. Pratola +} +\seealso{ +\code{\link{mopareto.openbt}} +} +\examples{ +# For example usage, see readthedocs. +} \ No newline at end of file diff --git a/Ropenbt/man/load.openbt.Rd b/Ropenbt/man/load.openbt.Rd new file mode 100644 index 0000000..9f95762 --- /dev/null +++ b/Ropenbt/man/load.openbt.Rd @@ -0,0 +1,37 @@ +\name{openbt.load} +\alias{openbt.load} +%\docType{package} +\title{Load a previously saved OpenBT model.} +\description{ + The function \code{openbt.load()} loads an OpenBT model previously saved using \code{openbt.save()}. +} +\usage{ +\method{openbt}{load}( +fname) +} +\arguments{ + \item{fname}{Filename of the OpenBT model to load from disk storage.} +} +\details{ + \code{openbt.load()} loads an OpenBT model that was previously saved using \code{openbt.save()}. + + Returns an object of type \code{OpenBT_posterior}. +} +\references{ + Chipman, Hugh A., George, Edward I., and McCulloch, Robert E. (2010) + BART: Bayesian additive regression trees. + \emph{The Annals of Applied Statistics}, \bold{4}, 266--298. +} +\author{ +Matthew T. Pratola [aut, cre, cph] +Maintainer: Matthew T. Pratola +} +\seealso{ +\code{\link{openbt}} +} +\examples{ +\dontrun{ + # Load fitted model to a new object. + fit2=openbt.load("test") +} +} \ No newline at end of file diff --git a/Ropenbt/man/makecuts.Rd b/Ropenbt/man/makecuts.Rd new file mode 100644 index 0000000..697caa4 --- /dev/null +++ b/Ropenbt/man/makecuts.Rd @@ -0,0 +1,34 @@ +\name{makecuts.openbt} +\alias{makecuts.openbt} +%\docType{package} +\title{Make a variable cuts object to be used in OpenBT regression models.} +\description{ + The function \code{makecuts.openbt()} takes an n by p design matrix and a scalar or vector number of cutpoints per variable, and returns an object of type \code{OpenBT_cutinfo} with the cutpoint information. +} +\usage{ +\method{makecuts}{openbt}(x,numcuts) +} +\arguments{ + \item{x}{An n by p design matrix.} + \item{numcuts}{Can be a scalar (same number of cutpoints for each variable) or vector describing the number of cutpoints per variable.} +} +\details{ + \code{makecuts.openbt()} creates a cutpoint object for an OpenBT model. + + Returns an object of type \code{OpenBT_cutinfo} which is a length p list, each having a vector describing the cutpoint locations. +} +\references{ + Chipman, Hugh A., George, Edward I., and McCulloch, Robert E. (2010) + BART: Bayesian additive regression trees. + \emph{The Annals of Applied Statistics}, \bold{4}, 266--298. +} +\author{ +Matthew T. Pratola [aut, cre, cph] +Maintainer: Matthew T. Pratola +} +\seealso{ +\code{\link{openbt}} +} +\examples{ +# For example usage, see readthedocs. +} \ No newline at end of file diff --git a/Ropenbt/man/mopareto.openbt.Rd b/Ropenbt/man/mopareto.openbt.Rd new file mode 100644 index 0000000..4a03274 --- /dev/null +++ b/Ropenbt/man/mopareto.openbt.Rd @@ -0,0 +1,68 @@ +\name{mopareto.openbt} +\alias{mopareto.openbt} +%\docType{package} +\title{Pareto Front Multiobjective Optimization for OpenBT regression models.} +\description{ + The function \code{mopareto.openbt()} calculates the Pareto front and set for 2 fitted Bayesian tree models. Supports continous response BART or single tree models. +} +\usage{ +\method{mopareto}{openbt}( +fit1, +fit2, +fit3, +q.lower=0.025, +q.upper=0.975, +tc=2) +} +\arguments{ + \item{fit1}{First object of type \code{OpenBT_posterior} from a previous call to \code{openbt()}} + \item{fit2}{Second object of type \code{OpenBT_posterior} from a previous call to \code{openbt()}} + \item{fit3}{Optionally, a third object of type \code{OpenBT_posterior} from a previous call to \code{openbt()}} + \item{q.lower}{Lower quantile to return.} + \item{q.upper}{Upper quantile to return.} + \item{tc}{Number of cores to use for parallel computing.} +} +\details{ + \code{mopareto.openbt()} calculates the Pareto front and set for a pair (optionally, a 3-tuple) of Bayesian regression tree models that have been fit by \code{openbt()}. + + Returns an object of type \code{OpenBT_mopareto} which is a length ndpost list with each list entry consisting of the following matrices: +} +\value{ + \item{theta}{Posterior realization of the predicted values along the Pareto front. Stored in a 2-row matrix whose number of columns corresponds to the number of hyperrectangles on the Pareto front.} + \item{a}{Posterior realization of the lower-left corner of a hyperrectangle on the Pareto front. Stored in a p-row matrix whose number of columns corresponds to the number of hyperrectangles on the Pareto front.} + \item{b}{Posterior realization of the upper-right corner of a hyperrectangle on the Pareto front. Stored in a p-row matrix whose number of columns corresponds to the number of hyperrectangles on the Pareto front.} +} +\references{ + Horiguchi, Akira, Santner, Thomas J., Sun, Ying, and Pratola, Matthew T. (2022) + Using BART to perform Pareto optimization and quantify its uncertainties. + \emph{Technometrics}, vol.64, 564--574. +} +\author{ +Matthew T. Pratola [aut, cre, cph] +Maintainer: Matthew T. Pratola +} +\seealso{ +\code{\link{openbt}} +\code{\link{calcMBD}} +\code{\link{calcRS}} +} +\examples{ +\dontrun{ + # Example of MO optimization + fit.a=openbt(x,y,pbd=c(0.7,0.0),ntreeh=1,numcut=20,tc=6,model="bart",modelname="branin-a") + fit.b=openbt(x,y,pbd=c(0.7,0.0),ntreeh=1,numcut=20,tc=6,model="bart",modelname="branin-b") + + pf=mopareto.openbt(fit.a,fit.b,tc=6) # quite slow, use many cores to speed it up + + # plot front + par(mfrow=c(1,2)) + plot(0,0,xlim=c(-2,2),ylim=c(-2,2),type="n",xlab="y1",ylab="y2") + for(vhr in pf) points(t(vhr$theta),pch=20) + + # plot set + plot(0,0,xlim=c(0,1),ylim=c(0,1),type="n",xlab="x1",ylab="x2") + for(vhr in pf) rect(vhr$a[1, ], vhr$a[2, ], vhr$b[1, ], vhr$b[2, ], col=gray(0.5,alpha=0.1), border=NA) + + # More examples found in pareto_example.R +} +} \ No newline at end of file diff --git a/Ropenbt/man/openbt.Rd b/Ropenbt/man/openbt.Rd new file mode 100644 index 0000000..0fc366f --- /dev/null +++ b/Ropenbt/man/openbt.Rd @@ -0,0 +1,105 @@ +\name{openbt} +\alias{openbt} +%\docType{package} +\title{Interface to Bayesian Regression Tree models in OpenBT.} +\description{ + The function \code{openbt()} is the main function for fitting Bayesian Regression Tree models that are provided in the OpenBT project. +} +\usage{ +openbt(x.train, y.train, ntree=NULL, ntreeh=NULL, ndpost=1000, + nskip=100, k=NULL, power=2.0, base=.95, tc=2, sigmav=rep(1,length(y.train)), + fmean=mean(y.train), overallsd = NULL, overallnu=NULL, + chv = cor(x.train,method="spearman"), pbd=.7, pb=.5, stepwpert=.1, probchv=.1, + minnumbot=5, printevery=100, numcut=100, xicuts=NULL, nadapt=1000, adaptevery=100, + summarystats=FALSE,truncateds=NULL,model=NULL,modelname="model") +} +\arguments{ + \item{x.train}{nxp matrix of predictor variables for the training data.} + \item{y.train}{nx1 vector of the observed response for the training data.} + \item{ntree}{Number of trees used for the mean model.} + \item{ntreeh}{Number of trees used for the variance model.} + \item{ndpost}{Number of iterations to run the MCMC algorithm after burn-in.} + \item{nskip}{Number of MCMC iterations treated as burn-in and discarded.} + \item{k}{Prior hyperparameter for the mean model.} + \item{power}{Power parameter in the tree depth penalizing prior.} + \item{base}{Base parameter in the tree depth penalizing prior.} + \item{tc}{Number of OpenMP threads to use.} + \item{sigmav}{Initialization of square-root of variance parameter.} + \item{fmean}{Overall mean of the data for pre-centering the data before running the model.} + \item{overallsd}{A rudimentary estimate of the process standard deviation. Used in calibrating the variance prior.} + \item{overallnu}{Shape parameter for the variance prior.} + \item{chv}{Predictor correlation matrix used as a pre-conditioner for MCMC change-of-variable proposals.} + \item{pbd}{Probability of performing a birth/death proposal, otherwise perform a rotate proopsal.} + \item{pb}{Probability of performing a birth proposal given that we choose to perform a birth/death proposal.} + \item{stepwpert}{Initial width of proposal distribution for peturbing cutpoints.} + \item{probchv}{Probability of performing a change-of-variable proposal. Otherwise, only do a perturb proposal}. + \item{minnumbot}{Minimum number of observations required in bottom (terminal) nodes.} + \item{printevery}{Outputs MCMC algorithm status every printevery iterations.} + \item{numcut}{Number of cutpoints to use for each predictor variable.} + \item{xicuts}{More detailed construction of cutpoints can be specified using makecuts() and passed as an argument here.} + \item{nadapt}{Number of MCMC iterations allowed for adaptive MCMC. These are also discarded.} + \item{adaptevery}{Adapt MCMC proposal distributions every adaptevery iterations until the algorithm has run for nadapt iterations.} + \item{summarystats}{Return detailed summary statistics about the fitting procedure.} + \item{truncateds}{For using the truncated Gaussian model contributed by Dai Feng.} + \item{model}{Specifies the model type. Options include 'bt','binomial','poisson','bart','hbart','probit','modifiedprobit' and 'merck_truncated'.} + \item{modelname}{A user-specified string that names their model, such as myExampleModel.} +} +\details{ + \code{openbt()} is the main model fitting function for continuous and dichotomous response data. + The most common models to fit are a homoscedastic single-tree model, a homoscedastic BART model and a heteroscedastic BART model. + + For a BART model, set \code{pbd=c(0.7,0.0)} and \code{ntreeh=1}. This forces a scalar (homoscedastic) variance term. + + For a single-tree model, set \code{pbd=(0.7,0.0)}, \code{ntreeh=1} and \code{ntree=1}. This forces the mean component to be modeled using only one tree. + + Examples are available on the OpenBT wiki page. +} +\value{ + \item{res}{Fitted model object of S3 class OpenBT_posterior.} +} +\references{ + Chipman, Hugh A., George, Edward I., and McCulloch, Robert E. (2010) + BART: Bayesian additive regression trees. + \emph{The Annals of Applied Statistics}, \bold{4}, 266--298. +} +\author{ +Matthew T. Pratola [aut, cre, cph] +Maintainer: Matthew T. Pratola +} +\seealso{ +\code{\link{predict.openbt}} +} +\examples{ +\dontrun{ + remotes::install_bitbucket("mpratola/openbt/Ropenbt") + library(Ropenbt) + + # Test Branin function, rescaled + braninsc <- function(xx) + { + x1 <- xx[1] + x2 <- xx[2] + + x1bar <- 15*x1 - 5 + x2bar <- 15 * x2 + + term1 <- x2bar - 5.1*x1bar^2/(4*pi^2) + 5*x1bar/pi - 6 + term2 <- (10 - 10/(8*pi)) * cos(x1bar) + + y <- (term1^2 + term2 - 44.81) / 51.95 + return(y) + } + + # Simulate branin data for testing + set.seed(99) + n=500 + p=2 + x = matrix(runif(n*p),ncol=p) + y=rep(0,n) + for(i in 1:n) y[i] = braninsc(x[i,]) + + # Fit BART model + fit=openbt(x,y,tc=4,model="bart",modelname="branin") + summary(fit) +} +} \ No newline at end of file diff --git a/Ropenbt/man/predict.openbt.Rd b/Ropenbt/man/predict.openbt.Rd new file mode 100644 index 0000000..3a521ae --- /dev/null +++ b/Ropenbt/man/predict.openbt.Rd @@ -0,0 +1,79 @@ +\name{predict.openbt} +\alias{predict.openbt} +%\docType{package} +\title{Drawing Posterior Predictive Realizations for OpenBT regression models.} +\description{ + The function \code{predict.openbt()} is the main function for drawing posterior predictive realizations at new inputs using a fitted model stored in an \code{OpenBT_posterior} object returned from \code{openbt()}. +} +\usage{ +\method{predict}{openbt}( +object, +x.test=NULL, +tc=2, +fmean=fit$fmean, +q.lower=0.025, +q.upper=0.975, ...) +} +\arguments{ + \item{object}{Object of type \code{OpenBT_posterior} from a previous call to \code{openbt()}} + \item{x.test}{New input settings in the form of an \code{npred x p} matrix at which to construct predictions. Defaults to the training inputs.} + \item{tc}{Number of CPU cores to use for parallel computing.} + \item{fmean}{Mean-centering vector for the training data. Defaults to the value used by \code{openbt()} when fitting the model. Usually should be left to the default.} + \item{q.lower}{Lower quantile to return.} + \item{q.upper}{Upper quantile to return.} + \item{...}{Additional arguments for S3 compatibility (currently unused).} +} +\details{ + \code{predict.openbt()} is the main function for calculating posterior predictions and uncertainties once a model has been fit by \code{openbt()}. + + Returns an object of type \code{OpenBT_predict} with the following entries. +} +\value{ + \item{mdraws}{Posterior realizations of the mean function, \eqn{f({\bf x})}{f(x)} stored in an \code{ndpost x npred} matrix, + where ndpost is the number of kept MCMC draws in the \code{openbt} run.} + \item{sdraws}{Posterior realizations of the standard deviation function, \eqn{s({\bf x})}{s(x)} stored in an \code{ndpost x npred} matrix, + where ndpost is the number of kept MCMC draws in the \code{openbt} run.} + \item{mmean}{Posterior predictive mean of \eqn{f({\bf x})}{f(x)}.} + \item{smean}{Posterior predictive mean of the standard deviation, \eqn{s({\bf x})}{s(x)}.} + \item{msd}{Posterior standard deviation of the mean, \eqn{f({\bf x})}{f(x)}.} + \item{ssd}{Posterior standard deviation of the standard devation, \eqn{s({\bf x})}{s(x)}.} + \item{m.5}{Posterior median of the mean function realizations, \eqn{f({\bf x})}{f(x)}.} + \item{m.lower}{Posterior \code{q.lower} quantile of the mean function realizations.} + \item{m.upper}{Posterior \code{q.upper} quantile of the mean function realizations.} + \item{s.5}{Posterior median of the standard deviation function realizations, \eqn{s({\bf x})}{s(x)}.} + \item{s.lower}{Posterior \code{q.lower} quantile of the standard deviation function realizations.} + \item{s.upper}{Posterior \code{q.upper} quantile of the standard deviation function realizations.} + \item{q.lower}{Lower quantile used in constructing the above.} + \item{q.upper}{Upper quantile used in constructing the above.} + \item{pdraws}{Posterior realizations of the probabilities from a probit model.} + \item{pmean}{Posterior predictive mean of the probabilities from a probit model.} + \item{psd}{Posterior standard deviation of the probit model fitted probabilities.} + \item{p.5}{Posterior median of the probit model fitted probabilities.} + \item{p.lower}{Posterior \code{q.lower} quantile of the probit model fitted probabilities.} + \item{p.upper}{Posterior \code{q.upper} quantile of the probit model fitted probabilities.} + \item{q.lower}{The lower quantile used.} + \item{q.upper}{The upper quantile used.} + \item{modeltype}{The type of model fitted.} +} +\references{ + Chipman, Hugh A., George, Edward I., and McCulloch, Robert E. (2010) + BART: Bayesian additive regression trees. + \emph{The Annals of Applied Statistics}, \bold{4}, 266--298. +} +\author{ +Matthew T. Pratola [aut, cre, cph] +Maintainer: Matthew T. Pratola +} +\seealso{ +\code{\link{openbt}} +} +\examples{ +\dontrun{ + # Calculate in-sample predictions + fitp=predict.openbt(fit,x,tc=4) + + # Make a simple plot + plot(y,fitp$mmean,xlab="observed",ylab="fitted") + abline(0,1) +} +} \ No newline at end of file diff --git a/Ropenbt/man/save.openbt.Rd b/Ropenbt/man/save.openbt.Rd new file mode 100644 index 0000000..6ab1a22 --- /dev/null +++ b/Ropenbt/man/save.openbt.Rd @@ -0,0 +1,36 @@ +\name{openbt.save} +\alias{openbt.save} +%\docType{package} +\title{Save a fitted OpenBT model to disk.} +\description{ + The function \code{openbt.save()} will save an OpenBT model to disk. +} +\usage{ +\method{openbt}{save}( +post,fname=NULL) +} +\arguments{ + \item{post}{An object of type \code{OpenBT_posterior} to save to disk storage.} + \item{fname}{The filename to store the saved result.} +} +\details{ + \code{openbt.save()} will save an OpenBT model that was previously fitted using \code{openbt()}. The data is stored in compressed format and the extension \code{.obt} is automatically appended to the filename \code{fname}. +} +\references{ + Chipman, Hugh A., George, Edward I., and McCulloch, Robert E. (2010) + BART: Bayesian additive regression trees. + \emph{The Annals of Applied Statistics}, \bold{4}, 266--298. +} +\author{ +Matthew T. Pratola [aut, cre, cph] +Maintainer: Matthew T. Pratola +} +\seealso{ +\code{\link{openbt}} +} +\examples{ +\dontrun{ + # Save fitted model as test.obt in the working directory + openbt.save(fit,"test") +} +} \ No newline at end of file diff --git a/Ropenbt/man/setvarcuts.Rd b/Ropenbt/man/setvarcuts.Rd new file mode 100644 index 0000000..4079388 --- /dev/null +++ b/Ropenbt/man/setvarcuts.Rd @@ -0,0 +1,38 @@ +\name{setvarcuts.openbt} +\alias{setvarcuts.openbt} +%\docType{package} +\title{Modify an existing \code{OpenBT_cutinfo} cutpoint object.} +\description{ + The function \code{servarcuts.openbt()} can modify an exisiting cutpoint object. +} +\usage{ +\method{setvarcuts}{openbt}( +xi, +id, +cutvec) +} +\arguments{ + \item{xi}{Object of type \code{OpenBT_cutinfo} created from a previous call to \code{makecuts()}} + \item{id}{The variable to modify.} + \item{cutvec}{The new cutpoint vector for variable \code{id}.} +} +\details{ + \code{setvarcuts.openbt()} allows one to modify an existing cutpoint object by replacing the cutpoints for variable \code{id} with a new cutpoint vector given by \code{cutvec}. + + Returns an object of type \code{OpenBT_cutinfo}. +} +\references{ + Chipman, Hugh A., George, Edward I., and McCulloch, Robert E. (2010) + BART: Bayesian additive regression trees. + \emph{The Annals of Applied Statistics}, \bold{4}, 266--298. +} +\author{ +Matthew T. Pratola [aut, cre, cph] +Maintainer: Matthew T. Pratola +} +\seealso{ +\code{\link{openbt}} +} +\examples{ +# For example usage, see readthedocs. +} \ No newline at end of file diff --git a/Ropenbt/man/sobol.openbt.Rd b/Ropenbt/man/sobol.openbt.Rd new file mode 100644 index 0000000..be9a86c --- /dev/null +++ b/Ropenbt/man/sobol.openbt.Rd @@ -0,0 +1,75 @@ +\name{sobol.openbt} +\alias{sobol.openbt} +%\docType{package} +\title{Sobol Sensitivity Indices for OpenBT regression models.} +\description{ + The function \code{sobol.openbt()} calculates the Sobol' sensitivity metric for Bayesian tree models. Supports continous response BART or single tree models. +} +\usage{ +\method{sobol}{openbt}( +fit=NULL, +q.lower=0.025, +q.upper=0.975, +tc=2) +} +\arguments{ + \item{fit}{Object of type \code{OpenBT_posterior} from a previous call to \code{openbt()} with model type 'bart' or 'hbart'.} + \item{q.lower}{Lower quantile to return.} + \item{q.upper}{Upper quantile to return.} + \item{tc}{Number of cores to use in parallel computing.} +} +\details{ + \code{sobol.openbt()} calculates the Sobol' sensitivity indices for a continuous response Bayesian regression tree model that has been fit by \code{openbt()}. These are exact (closed-form) calculations, and are much faster and more accurate than Monte-Carlo approximation methods. + + Returns an object of type \code{OpenBT_sobol} with the following entries. +} +\value{ + \item{vidraws}{Posterior realizations of the 1-way Sobol' variability, stored in an \code{ndpost x p} matrix, where ndpost is the number of kept MCMC draws in the \code{openbt} run.} + \item{vijdraws}{Posterior realizations of the 2-way Sobol' variability, stored in an \code{ndpost x p*(p-1)/2} matrix, where ndpost is the number of kept MCMC draws in the \code{openbt} run.} + \item{tvidraws}{Posterior realizations of the total single effect Sobol' variability, stored in a matrix with ndpost rows, where ndpost is the number of kept MCMC draws in the \code{openbt} run.} + \item{vdraws}{Posterior realizations of total variability, stored in a matrix with ndpost rows.} + \item{sidraws}{Posterior realizations of the 1-way Sobol' sensitivities, stored in an \code{ndpost x p} matrix, where ndpost is the number of kept MCMC draws in the \code{openbt} run.} + \item{sijdraws}{Posterior realizations of the 2-way Sobol' sensitivities, stored in an \code{ndpost x p*(p-1)/2} matrix, where ndpost is the number of kept MCMC draws in the \code{openbt} run.} + \item{tsidraws}{Posterior realizations of the total single effect Sobol' sensitivities, stored in a matrix with ndpost rows, where ndpost is the number of kept MCMC draws in the \code{openbt} run.} + \item{msi}{Posterior mean of the 1-way Sobol' sensitivities.} + \item{msi.sd}{Posterior s.d. of the 1-way Sobol' sensitivities.} + \item{si.5}{Posterior median of the 1-way Sobol' sensitivities.} + \item{si.lower}{Posterior q.lower quantile of the 1-way Sobol' sensitivities.} + \item{si.upper}{Posterior q.upper quantile of the 1-way Sobol' sensitivities.} + \item{msij}{Posterior mean of the 2-way Sobol' sensitivities.} + \item{sij.sd}{Posterior s.d. of the 2-way Sobol' sensitivities.} + \item{sij.5}{Posterior median of the 2-way Sobol' sensitivities.} + \item{sij.lower}{Posterior q.lower quantile of the 2-way Sobol' sensitivities.} + \item{sij.upper}{Posterior q.upper quantile of the 2-way Sobol' sensitivities.} + \item{mtsi}{Posterior mean of the total effect Sobol' sensitivities.} + \item{tsi.sd}{Posterior s.d. of the total effect Sobol' sensitivities.} + \item{tsi.5}{Posterior median of the total effect Sobol' sensitivities.} + \item{tsi.lower}{Posterior q.lower quantile of the total effect Sobol' sensitivities.} + \item{tsi.upper}{Posterior q.upper quantile of the total effect Sobol' sensitivities.} + \item{q.lower}{The lower quantile used.} + \item{q.upper}{The upper quantile used.} + \item{modeltype}{The type of model fitted by \code{openbt()}.} +} +\references{ + Horiguchi, Akira, Pratola, Matthew T. and Santner, Thomas J. (2020) + Assessing Variable Activity for Bayesian Regression Trees. + \emph{Reliability Engineering and System Safety}, \bold{207}, 107391. + + Horiguchi, Akira and Pratola, Matthew T. (2025) Estimating Shapley Effects in Big-Data Emulation and Regression Settings using Bayesian Additive Regression Trees. \emph{arXiv preprint}, \bold{arXiv:2304.03809}, 1--32. +} +\author{ +Matthew T. Pratola [aut, cre, cph] +Maintainer: Matthew T. Pratola +} +\seealso{ +\code{\link{openbt}} +} +\examples{ +\dontrun{ + # Calculate Sobol indices + fits=sobol.openbt(fit2) + fits$msi + fits$mtsi + fits$msij +} +} \ No newline at end of file diff --git a/Ropenbt/man/vartivity.openbt.Rd b/Ropenbt/man/vartivity.openbt.Rd new file mode 100644 index 0000000..34fb5ae --- /dev/null +++ b/Ropenbt/man/vartivity.openbt.Rd @@ -0,0 +1,65 @@ +\name{vartivity.openbt} +\alias{vartivity.openbt} +%\docType{package} +\title{Classical variable activity metric for OpenBT regression models.} +\description{ + The function \code{vartivity.openbt()} calculates the classical variable activity metric for Bayesian tree models that is based on counting the number of internal nodes that split on each variable in a dataset. +} +\usage{ +\method{vartivity}{openbt}( +fit=NULL, +q.lower=0.025, +q.upper=0.975) +} +\arguments{ + \item{fit}{Object of type \code{OpenBT_posterior} from a previous call to \code{openbt()}} + \item{q.lower}{Lower quantile to return.} + \item{q.upper}{Upper quantile to return.} +} +\details{ + \code{vartivity.openbt()} calculates the classical variable activity metric for a Bayesian regression tree model that has been fit by \code{openbt()}. + + Returns an object of type \code{OpenBT_vartivity} with the following entries. +} +\value{ + \item{vdraws}{Posterior realizations of the split proportions for each variable in the mean model, stored in an \code{ndpost x npred} matrix, where ndpost is the number of kept MCMC draws in the \code{openbt} run.} + \item{vdrawsh}{Posterior realizations of the split proportions for each variable in the variance model, stored in an \code{ndpost x npred} matrix, where ndpost is the number of kept MCMC draws in the \code{openbt} run. When a constant variance model is used, this will be zero.} + \item{mvdraws}{Posterior mean of the split proportions for the mean model.} + \item{mvdrawsh}{Posterior mean of the split proportions for the variance model.} + \item{vdraws.sd}{Posterior standard deviation of the split proportions for the mean model.} + \item{vdrawsh.sd}{Posterior standard deviation of the split proportions for the variance model.} + \item{vdraws.5}{Posterior median of the split proportions for the mean model.} + \item{vdrawsh.5}{Posterior median of the split proportions for the mean model.} + \item{vdraws.lower}{Posterior \code{q.lower} quantile of the split proportions for the mean model.} + \item{vdraws.upper}{Posterior \code{q.upper} quantile of the split proportions for the mean model.} + \item{vdrawsh.lower}{Posterior \code{q.lower} quantile of the split proportions for the variance model.} + \item{vdrawsh.upper}{Posterior \code{q.upper} quantile of the split proportions for the variance model.} + \item{q.lower}{The lower quantile used.} + \item{q.upper}{The upper quantile used.} + \item{modeltype}{The type of model fitted by \code{openbt()}.} +} +\references{ + Chipman, Hugh A., George, Edward I., and McCulloch, Robert E. (2010) + BART: Bayesian additive regression trees. + \emph{The Annals of Applied Statistics}, \bold{4}, 266--298. + + Horiguchi, Akira, Pratola, Matthew T. and Santner, Thomas J. (2020) + Assessing Variable Activity for Bayesian Regression Trees. + \emph{Reliability Engineering and System Safety}, \bold{207}, 107391. +} +\author{ +Matthew T. Pratola [aut, cre, cph] +Maintainer: Matthew T. Pratola +} +\seealso{ +\code{\link{openbt}} +} +\examples{ +\dontrun{ + # Calculate variable activity information + fitv=vartivity.openbt(fit2) + + # Plot variable activity + plot(fitv) +} +} \ No newline at end of file diff --git a/docs/contributing.rst b/docs/contributing.rst index 8292b49..4055b6f 100644 --- a/docs/contributing.rst +++ b/docs/contributing.rst @@ -3,4 +3,6 @@ Contributing General Information ------------------- .. todo:: - Sarthak to write this + The Open Bayesian Trees (OpenBT) project evolved from an initial C + MPI codebase written from scrach in 2010 through two subsequent ground-up rewrites culminating in the current C++ codebase with R and Python wrappers. Matt Pratola (mpratola@iu.edu) has served as the primary author and architect for OpenBT. John Yannotty (jcyannotty@gmail.com) made significant contributions to the project including BART-based model mixing, the Random Path BART (RPBART) model and RPBART-based model mixing, integrating the Eigen library, moving the build system from Autotools to Meson, and writing Python wrapper code. Jared O'Neal (joneal@anl.gov) led the move from Autotools to Meson and built out the various Github and Python integrations as part of folding OpenBT into BAND. Akira Horiguchi (ahoriguchi@ucdavis.edu) made contributions with Sobol sensitivity indices and Shapley indices as well as Pareto front-based multiobjective optimization. Clark van Lieshout (clarkvan33@gmail.com) contributed Python wrapper code. Robert McCulloch (robert.mcculloch@asu.edu) made contributions in the early days with the random number generators and log gamma approximation. + + The current repo, now housed within the wider BAND project, becomes the long-term home for OpenBT. diff --git a/docs/index.rst b/docs/index.rst index 1633abf..ce823fe 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -3,11 +3,13 @@ .. _Open MPI: https://www.open-mpi.org .. _MPICH: https://www.mpich.org .. _framework: https://bandframework.github.io +.. _Issue 35: https://github.com/bandframework/OpenBT/issues/35 +.. _OpenBT repository: https://bitbucket.org/mpratola/openbt/src/master +.. _OpenBTMixing repository: https://github.com/jcyannotty/OpenBT .. todo:: - Write high-level description, explain development history, and motivate - breakdown of documents presented here. License, copyright, responsibilities - of user, etc. + Matt to write high-level description and motivate breakdown of documents + presented here. License, copyright, responsibilities of user, etc. The heart of |openbt| is a set of C++ tools that can be used directly |via| the command line or indirectly through the ``openbt`` Python package, which wraps @@ -22,12 +24,18 @@ it can be built with MPI installed on a laptop using the system's package manager or with MPI installations on leadership class platforms and clusters that were installed by experts and optimized for their specific platform. +This repository is being established by merging the contents of the original +`OpenBT repository`_ with the `OpenBTMixing repository`_, which was based off of +the former. It, therefore, will supercede those two repositories, which will be +frozen. + +This repository and its contents are being established and developed as part of +|band| framework_. + .. note:: While an R wrapper does exist for the original |openbt| and |openbtmixing| repositories, that functionality has not yet been included in this new, - combined repository (Issue #XYZ). - -This package is being developed as part of |band| framework_. + combined repository (`Issue 35`_). .. toctree:: :numbered: diff --git a/docs/release_procedure.rst b/docs/release_procedure.rst index 8adaee2..db1a149 100644 --- a/docs/release_procedure.rst +++ b/docs/release_procedure.rst @@ -128,8 +128,8 @@ Otherwise, * One side effect of the use of ``setuptools-scm`` is that by default it includes in the distribution all files located within ``openbt_pypkg``. However, many files and folders in that space do **not** need to be - distributed (|eg| Flake8 configuration files and OpenBT command line tools - installed during development in ``src/openbt/bin``). This is what + distributed (|eg| Flake8 configuration files and |openbt| command line + tools installed during development in ``src/openbt/bin``). This is what "minimal" means above. Files and folders that should not be included in the distribution are specified in ``MANIFEST.in``. * Review the metadata to ensure correct and complete. This should include diff --git a/includes/ambrt.h b/includes/ambrt.h index 205bbbc..231007b 100644 --- a/includes/ambrt.h +++ b/includes/ambrt.h @@ -1,26 +1,4 @@ // ambrt.h: Additive mean BART model class definition. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #ifndef GUARD_ambrt_h #define GUARD_ambrt_h @@ -77,13 +55,24 @@ class ambrt : public mbrt void collapseensemble(); // function for calculating Sobol-based variable activity indices - void sobol(std::vector& Si, std::vector&Sij, std::vector& TSi, double& V, std::vector& minx, std::vector& maxx, size_t p); + void sobol(std::vector& Si, std::vector&Sij, std::vector& TSi, std::vector& Shi, double& V, std::vector& minx, std::vector& maxx, size_t p, rn& gen); // function for converting an ensemble to vector hyperrectangle format, needed for Pareto Front multiobjective optimization (see mopareto.cpp) void ens2rects(std::vector >& asol, std::vector >& bsol, std::vector& thetasol, std::vector& minx, std::vector& maxx, size_t p); + // function to extract hyperrectangle from bottom nodes associated with influential observation y_d(x_d) + void inflrect(double* xd, std::vector& asol, std::vector& bsol, + std::vector& minx, std::vector& maxx, size_t p); + // function to extract the 'union' hyperrectangle from bottom nodes associated with influential observation y_d(x_d) + void influnionrect(double* xd, std::vector& asol, std::vector&bsol, + std::vector& minx, std::vector& maxx, size_t p); + + void cookdinfl(std::vector& cdinflmean, std::vector& cdinflmax, double* sigma); //Cook's distance + void kldivinfl(std::vector& klinfl, double* sigma); //KL-divergence based influence metric + void cpoinfl(std::vector& cpoinfl, double* sigma); //CPO^-1 based influence metric + //-------------------- //data //-------------------------------------------------- diff --git a/includes/amxbrt.h b/includes/amxbrt.h index d877fe0..dffe9fc 100644 --- a/includes/amxbrt.h +++ b/includes/amxbrt.h @@ -1,3 +1,5 @@ +// amxbrt.h: BART-based model mixing model class definition. + #ifndef GUARD_amxbrt_h #define GUARD_amxbrt_h diff --git a/includes/brt.h b/includes/brt.h index 456362d..b52d127 100644 --- a/includes/brt.h +++ b/includes/brt.h @@ -1,26 +1,4 @@ // brt.h: Base BT model class definition. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #ifndef GUARD_brt_h #define GUARD_brt_h @@ -177,6 +155,9 @@ class brt { // void loadtree(size_t nn, int* id, int* v, int* c, double* theta); //load tree from vector input format void loadtree(size_t iter, size_t m, std::vector& nn, std::vector >& id, std::vector >& v, std::vector >& c, std::vector >& theta); + void cookdinfl(std::vector& cdinfl); //Cook's distance + void kldivinfl(std::vector& klinfl); //KL-divergence based influence metric + void cpoinfl(std::vector& cpoinfl); //CPO^-1 based influence metric //-------------------- //data tree t; diff --git a/includes/brtfuns.h b/includes/brtfuns.h index cbfe39b..1e75a84 100644 --- a/includes/brtfuns.h +++ b/includes/brtfuns.h @@ -1,26 +1,4 @@ // brtfuns.h: Base BT model class help functions header file. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #ifndef GUARD_brtfuns_h #define GUARD_brtfuns_h @@ -30,6 +8,8 @@ #include "tree.h" #include "treefuns.h" #include "brt.h" +#include // std::set_union, std::set_intersection, std::sort +#include using std::cout; using std::endl; @@ -112,7 +92,7 @@ void splitall(tree::tree_p t, tree::npv& tlefts, tree::npv& trights); //-------------------------------------------------- // Functions to support calculation of Sobol indices for BART -// Based on Hiroguchi, Pratola and Santner (2020). +// Based on Horiguchi, Pratola and Santner (2020). //-------------------------------------------------- double probxnoti_termk(size_t i, size_t k, std::vector >& a, std::vector >& b, std::vector& minx, std::vector& maxx); double probxi_termk(size_t i, size_t k, std::vector >& a, std::vector >& b, std::vector& minx, std::vector& maxx); @@ -122,9 +102,23 @@ double probxi_termkl(size_t i, size_t k, size_t l, std::vector >& a, std::vector >& b, std::vector& minx, std::vector& maxx); double probxall_termkl(size_t k, size_t l, std::vector >& a, std::vector >& b, std::vector& minx, std::vector& maxx); +//-------------------------------------------------- +// Additional functions to support calculation of Shapley indices for BART +// Based on Horiguchi and Pratola (2022). +//-------------------------------------------------- +double probxnotP_termk_sub(size_t k, std::vector >& a, + std::vector >& b, std::vector& minx, std::vector& maxx, + std::vector &dims); +double probxP_termk_sub(size_t k, std::vector >& a, + std::vector >& b, std::vector& minx, std::vector& maxx, + std::vector &dims); +double probxP_termkl_sub(size_t k, size_t l, std::vector >& a, + std::vector >& b, std::vector& minx, std::vector& maxx, + std::vector &dims); + //-------------------------------------------------- // This function only used for determining Pareto front/set. -// Based on Hiroguchi, Santner, Sun and Pratola (2020). +// Based on Horiguchi, Santner, Sun and Pratola (2020). //-------------------------------------------------- double probxall_termkl_rect(size_t k, size_t l, std::vector >& a0, std::vector >& b0, std::vector >& a1, @@ -132,6 +126,25 @@ double probxall_termkl_rect(size_t k, size_t l, std::vector std::vector find_pareto_front(size_t start, size_t end, std::list > theta); bool not_dominated(size_t index, std::vector R, std::list > theta); +void combine_ensembles(size_t p, xinfo& xi, std::vector >& ximaps, std::vector& minx, std::vector& maxx, + std::vector >& a1, std::vector >& b1, std::list >& ltheta1, + std::vector >& a2, std::vector >& b2, std::vector& theta2, double fmean2, + std::vector >& asol, std::vector >& bsol, std::list >& thetasol, + bool to_clear_old_ens); + +//-------------------------------------------------- +// This function only used determining the hyperrectangle that should +// be reweighted due to influential observation. +// Based on Pratola, George and McCulloch (2020). +//-------------------------------------------------- +double probxall_term_rect(std::vector& a0, + std::vector& b0, std::vector& a1, + std::vector& b1, std::vector& minx, std::vector& maxx, std::vector& aout, std::vector& bout); + +void unionxall_term_rect(std::vector& a0, + std::vector& b0, std::vector& a1, + std::vector& b1, std::vector& minx, std::vector& maxx, std::vector& aout, std::vector& bout); + //-------------------------------------------------- // Functions to Model Mixing with BART and/or Vector Parameters diff --git a/includes/crn.h b/includes/crn.h index 5d0feac..79edc14 100644 --- a/includes/crn.h +++ b/includes/crn.h @@ -1,26 +1,4 @@ // crn.h: Random number generator class definition. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #ifndef CRN_H #define CRN_H diff --git a/includes/dinfo.h b/includes/dinfo.h index a70d85f..d9302f5 100644 --- a/includes/dinfo.h +++ b/includes/dinfo.h @@ -1,26 +1,4 @@ // dinfo.h: Data abstraction helper class definition. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #ifndef GUARD_dinfo_h #define GUARD_dinfo_h diff --git a/includes/mbrt.h b/includes/mbrt.h index 918302c..15353e1 100644 --- a/includes/mbrt.h +++ b/includes/mbrt.h @@ -1,26 +1,4 @@ // mbrt.h: Mean tree BT model class definition. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #ifndef GUARD_mbrt_h #define GUARD_mbrt_h @@ -93,7 +71,11 @@ class mbrt : public brt virtual std::vector& newsinfovec(size_t dim) { std::vector* si = new std::vector; si->resize(dim); for(size_t i=0;ipush_back(new msinfo); return *si; } virtual void local_mpi_reduce_allsuff(std::vector& siv); virtual void local_mpi_sr_suffs(sinfo& sil, sinfo& sir); + virtual void local_setr(diterator& diter); void pr(); + void cookdinfl(std::vector& cdinfl, double* sigma); //Cook's distance + void kldivinfl(std::vector& klinfl, double* sigma); //KL-divergence based influence metric + void cpoinfl(std::vector& cpoinfl, double* sigma); //CPO^-1 based influence metric //-------------------- //data diff --git a/includes/mxbrt.h b/includes/mxbrt.h index 82464ac..989f220 100644 --- a/includes/mxbrt.h +++ b/includes/mxbrt.h @@ -1,3 +1,5 @@ +// mxbrt.h: Mean tree model mixing model class definition. + #ifndef GUARD_mxbrt_h #define GUARD_mxbrt_h diff --git a/includes/psbrt.h b/includes/psbrt.h index 9455460..97e137d 100644 --- a/includes/psbrt.h +++ b/includes/psbrt.h @@ -1,26 +1,4 @@ // psbrt.h: Product variance BT model class definition. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #ifndef GUARD_psbrt_h #define GUARD_psbrt_h diff --git a/includes/rn.h b/includes/rn.h index 39d62a8..45c1a1c 100644 --- a/includes/rn.h +++ b/includes/rn.h @@ -1,26 +1,4 @@ // rn.h: Random number generator virtual class definition. -// Copyright (C) 2012-2019 Matthew T. Pratola -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #ifndef GUARD_rn #define GUARD_rn diff --git a/includes/sbrt.h b/includes/sbrt.h index 39c578c..6ddede9 100644 --- a/includes/sbrt.h +++ b/includes/sbrt.h @@ -1,26 +1,4 @@ // sbrt.h: Variance tree BT model class definition. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #ifndef GUARD_sbrt_h #define GUARD_sbrt_h diff --git a/includes/singlebinomial.h b/includes/singlebinomial.h index 6477b2f..a83faaa 100644 --- a/includes/singlebinomial.h +++ b/includes/singlebinomial.h @@ -1,24 +1,4 @@ // singlebinomial.h: Binomial tree model class definition. -// Copyright (C) 2012-2019 Matthew T. Pratola -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com - #ifndef GUARD_singlebinomial_h #define GUARD_singlebinomial_h diff --git a/includes/singlepoisson.h b/includes/singlepoisson.h index a843251..57dc0bc 100644 --- a/includes/singlepoisson.h +++ b/includes/singlepoisson.h @@ -1,24 +1,4 @@ // singlepoisson.h: Poisson tree model class definition. -// Copyright (C) 2012-2019 Matthew T. Pratola -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com - #ifndef GUARD_singlepoisson_h #define GUARD_singlepoisson_h diff --git a/includes/tnorm.h b/includes/tnorm.h index 897f7d8..9402147 100644 --- a/includes/tnorm.h +++ b/includes/tnorm.h @@ -1,24 +1,4 @@ // tnorm.h: Truncated Normal helper functions for probit model. -// Copyright (C) 2012-2019 Matthew T. Pratola -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com - // Draw from the truncated Normal distribution with left,right truncation points diff --git a/includes/tree.h b/includes/tree.h index 8a74d0d..b310fdc 100644 --- a/includes/tree.h +++ b/includes/tree.h @@ -1,26 +1,4 @@ // tree.h: BT tree class definition. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #ifndef GUARD_tree_h #define GUARD_tree_h diff --git a/includes/treefuns.h b/includes/treefuns.h index cd31f09..00135dc 100644 --- a/includes/treefuns.h +++ b/includes/treefuns.h @@ -1,26 +1,4 @@ // treefuns.h: BT tree class helper functions header. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #ifndef GUARD_treefuns_h #define GUARD_treefuns_h diff --git a/openbt_pypkg/MANIFEST.in b/openbt_pypkg/MANIFEST.in index 9f8c6e7..59da8e5 100644 --- a/openbt_pypkg/MANIFEST.in +++ b/openbt_pypkg/MANIFEST.in @@ -1,4 +1,4 @@ -include LICENSE VERSION +include LICENSE exclude .flake8 .coveragerc exclude tox.ini prune src/openbt/bin diff --git a/src/ambrt.cpp b/src/ambrt.cpp index 2e64898..6b519e6 100644 --- a/src/ambrt.cpp +++ b/src/ambrt.cpp @@ -1,26 +1,4 @@ // ambrt.cpp: Additive mean BART model class methods. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #include "ambrt.h" #include "brtfuns.h" @@ -189,7 +167,7 @@ void ambrt::local_loadtree(size_t iter, int beg, int end, std::vector& nn, void ambrt::pr() { std::cout << "***** ambrt object:\n"; - cout << "Number of trees in product representation:" << endl; + cout << "Number of trees in sum representation:" << endl; cout << " m: m=" << m << endl; cout << "Conditioning info on each individual tree:" << endl; cout << " mean: tau=" << ci.tau << endl; @@ -226,8 +204,9 @@ void ambrt::collapseensemble() //Draw is a p-dim vector to store the S_i's for all i=1,..,p variables. //Based on Hiroguchi, Pratola and Santner (2020). void ambrt::sobol(std::vector& Si, std::vector& Sij, - std::vector& TSi, double& V, std::vector& minx, - std::vector& maxx, size_t p) + std::vector& TSi, std::vector& Shi, + double& V, std::vector& minx, + std::vector& maxx, size_t p, rn& gen) { double term1,term2,term3; tree::npv bots; @@ -240,8 +219,22 @@ void ambrt::sobol(std::vector& Si, std::vector& Sij, size_t B=bots.size(); std::vector pxnoti(B); + std::vector pxnotP_sub(B); + std::vector pxnotPi_sub(B); + std::vector subdims(p,false); + std::vector tempdims(p,false); std::vector > a(p,std::vector(B)); std::vector > b(p,std::vector(B)); + double Psize=0.0; + + for(size_t i=0;i& Si, std::vector& Sij, { Si[i]=0.0; TSi[i]=0.0; - for(size_t k=0;k& Si, std::vector& Sij, for(size_t k=0;kgettheta()*pxnoti[k]; term2=bots[l]->gettheta()*pxnoti[l]; term3=probxi_termkl(i,k,l,a,b,minx,maxx)-probxi_termk(i,k,a,b,minx,maxx)*probxi_termk(i,l,a,b,minx,maxx); Si[i]+=term1*term2*term3; + //Shapley: + term1=bots[k]->gettheta()*pxnotPi_sub[k]; + term2=bots[l]->gettheta()*pxnotPi_sub[l]; + tempdims=subdims; + tempdims[i]=true; + term3=probxP_termkl_sub(k,l,a,b,minx,maxx,tempdims)-probxP_termk_sub(k,a,b,minx,maxx,tempdims)*probxP_termk_sub(l,a,b,minx,maxx,tempdims); + Shi[i]+=term1*term2*term3; + + tempdims[i]=false; + term1=bots[k]->gettheta()*pxnotP_sub[k]; + term2=bots[l]->gettheta()*pxnotP_sub[l]; + term3=probxP_termkl_sub(k,l,a,b,minx,maxx,tempdims)-probxP_termk_sub(k,a,b,minx,maxx,tempdims)*probxP_termk_sub(l,a,b,minx,maxx,tempdims); + Shi[i]-=term1*term2*term3; + + //Sobol Total effect: term1=bots[k]->gettheta()*probxi_termk(i,k,a,b,minx,maxx); term2=bots[l]->gettheta()*probxi_termk(i,l,a,b,minx,maxx); term3=probxnoti_termkl(i,k,l,a,b,minx,maxx)-pxnoti[k]*pxnoti[l]; @@ -529,3 +544,245 @@ void ambrt::ens2rects(std::vector >& asol, std::vector& asol, std::vector& bsol, + std::vector& minx, std::vector& maxx, size_t p) +{ + tree::tree_p bot; + std::vector a0(p),b0(p),anext(p),bnext(p); + std::vector aout(p); + std::vector bout(p); + + // Start with the first tree in the ensemble. + bot=mb[0].t.bn(xd,*xi); + + // Convert bot from first tree into hyperrectangle in + // the a0,b0 vector. + for(size_t i=0;i::min(); U=std::numeric_limits::max(); + bot->rgi(i,&L,&U); + + // Now we have the interval endpoints, put corresponding values in a,b matrices. + if(L!=std::numeric_limits::min()) a0[i]=(*xi)[i][L]; + else a0[i]=minx[i]; + if(U!=std::numeric_limits::max()) b0[i]=(*xi)[i][U]; + else b0[i]=maxx[i]; + } + + // the solution so far is just a0,b0 + asol=a0; bsol=b0; + a0.clear(); b0.clear(); + + // Now the main loop -- we will loop over trees 1..m, each time extracting the + // hyperrectangle of terminal node in tree j corresponding to x_d to the same + // a,b vector format and then taking the pairwise + // intersection of the current solution in a0,b0 with the next tree to add. + for(size_t j=1;j::min(); U=std::numeric_limits::max(); + bot->rgi(i,&L,&U); + + // Now we have the interval endpoints, put corresponding values in a,b matrices. + if(L!=std::numeric_limits::min()) anext[i]=(*xi)[i][L]; + else anext[i]=minx[i]; + if(U!=std::numeric_limits::max()) bnext[i]=(*xi)[i][U]; + else bnext[i]=maxx[i]; + } + + // now intersect the solution we have so far with this j'th tree and + // update the resulting stored solution + a0=asol; b0=bsol; + asol.clear(); bsol.clear(); + + // if the rectangles defined by a0[k],b0[k] and anext[l],bnext[l] intersect +// if(probxall_term_rect(k,l,a0,b0,anext,bnext,minx,maxx,aout,bout)>0.0) { + if(probxall_term_rect(a0,b0,anext,bnext,minx,maxx,aout,bout)>0.0) + { + asol=aout; + bsol=bout; + } else + { + cout << "should never happen!" << endl; + } + + // reset anext,bnext + anext.clear(); bnext.clear(); + anext.resize(p); bnext.resize(p); + // reset a0,b0 + a0.clear(); b0.clear(); + } // end of the main loop + +} + + +//-------------------------------------------------- +// Get ensemble rectangle corresponding to terminal nodes associated +// with observation y_d(x_d), the observation the be held-out in the reweighting +// scheme. Also calculates the corresponding weight. +// This is used in the importance sampling reweighting scheme to handle +// influential observations +// Based on Pratola, George and McCulloch (2020). +// This function extracts the union rectangle. +void ambrt::influnionrect(double* xd, std::vector& asol, std::vector&bsol, + std::vector& minx, std::vector& maxx, size_t p) +{ + tree::tree_p bot; + std::vector a0(p),b0(p),anext(p),bnext(p); + std::vector aout(p); + std::vector bout(p); + + // Start with the first tree in the ensemble. + bot=mb[0].t.bn(xd,*xi); + + // Convert bot from first tree into hyperrectangle in + // the a0,b0 vector. + for(size_t i=0;i::min(); U=std::numeric_limits::max(); + bot->rgi(i,&L,&U); + + // Now we have the interval endpoints, put corresponding values in a,b matrices. + if(L!=std::numeric_limits::min()) a0[i]=(*xi)[i][L]; + else a0[i]=minx[i]; + if(U!=std::numeric_limits::max()) b0[i]=(*xi)[i][U]; + else b0[i]=maxx[i]; + } + + // the solution so far is just a0,b0 + asol=a0; bsol=b0; + a0.clear(); b0.clear(); + + // Now the main loop -- we will loop over trees 1..m, each time extracting the + // hyperrectangle of terminal node in tree j corresponding to x_d to the same + // a,b vector format and then taking the pairwise + // union of the current solution in a0,b0 with the next tree to add. + for(size_t j=1;j::min(); U=std::numeric_limits::max(); + bot->rgi(i,&L,&U); + + // Now we have the interval endpoints, put corresponding values in a,b matrices. + if(L!=std::numeric_limits::min()) anext[i]=(*xi)[i][L]; + else anext[i]=minx[i]; + if(U!=std::numeric_limits::max()) bnext[i]=(*xi)[i][U]; + else bnext[i]=maxx[i]; + } + + // now intersect the solution we have so far with this j'th tree and + // update the resulting stored solution + a0=asol; b0=bsol; + asol.clear(); bsol.clear(); + + // Take union of the rectangles defined by a0[k],b0[k] and anext[l],bnext[l]. + // Note that both of these rectangles must include xd by definition, and therefore + // are not disjoint so the union is straightforward to calculate. + unionxall_term_rect(a0,b0,anext,bnext,minx,maxx,aout,bout); + asol=aout; + bsol=bout; + + // reset anext,bnext + anext.clear(); bnext.clear(); + anext.resize(p); bnext.resize(p); + // reset a0,b0 + a0.clear(); b0.clear(); + } // end of the main loop + +} + + + +//-------------------------------------------------- +// Influence metrics. See Pratola, George and McCulloch (2021) + +// Cook's distance +void ambrt::cookdinfl(std::vector& cdinflmean, std::vector& cdinflmax, double* sigma) +{ + std::fill(cdinflmean.begin(),cdinflmean.end(),0.0); + std::fill(cdinflmax.begin(),cdinflmax.end(),0.0); + + std::vector temp(di->n); + for(size_t j=0;jn;i++) { + cdinflmean[i] += temp[i]/((double) m); + if(temp[i]>cdinflmax[i]) cdinflmax[i]=temp[i]; + } + } +} + +//KL-divergence based influence metric +void ambrt::kldivinfl(std::vector& klinfl, double* sigma) +{ + std::fill(klinfl.begin(),klinfl.end(),0.0); + + std::vector temp(di->n); + for(size_t j=0;jn;i++) { + if(temp[i]==std::numeric_limits::infinity()) { + klinfl[i]=temp[i]; + break; + } + else { + double stdres = resid[i]/sigma[i]; + double stdres2 = stdres*stdres; + klinfl[i] = -0.5*std::log(2*3.14159)-std::log(sigma[i])-0.5*stdres2; + } + } + } + +} + +//CPO^-1 based influence metric +void ambrt::cpoinfl(std::vector& cpoinfl, double* sigma) +{ + std::fill(cpoinfl.begin(),cpoinfl.end(),0.0); + + std::vector temp(di->n); + for(size_t j=0;jn;i++) { + if(temp[i]==std::numeric_limits::infinity()) { + cpoinfl[i]=temp[i]; + break; + } + else { + double stdres = resid[i]/sigma[i]; + double stdres2 = stdres*stdres; + cpoinfl[i]=std::sqrt(2*3.14159)*sigma[i]*std::exp(stdres2/2.0); + } + } + } + +} diff --git a/src/amxbrt.cpp b/src/amxbrt.cpp index d9669db..9292920 100644 --- a/src/amxbrt.cpp +++ b/src/amxbrt.cpp @@ -1,3 +1,5 @@ +// amxbrt.cpp: BART-based model mixing model class methods. + #include "amxbrt.h" #include "brtfuns.h" #include diff --git a/src/brt.cpp b/src/brt.cpp index acbc2f7..8b2cfc7 100644 --- a/src/brt.cpp +++ b/src/brt.cpp @@ -1,26 +1,4 @@ // brt.cpp: Base BT model class methods. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #include "brt.h" #include "brtfuns.h" @@ -1044,6 +1022,97 @@ void brt::local_predict(diterator& diter) diter.sety(bn->gettheta()); } } + +//-------------------------------------------------- +// Influence metrics. See Pratola, George and McCulloch (2021) + +// Cook's distance +void brt::cookdinfl(std::vector& cdinfl) +{ + tree::tree_p tbn; //which bottom node is current x in? + std::map bn_to_idx; //bottom node to index map + tree::npv bnv; + bnv.clear(); + t.getbots(bnv); + size_t B=bnv.size(); + std::vector nodenums(B,0); + + // initialize map + for(size_t i=0;in;i++) { + xx = di->x + i*di->p; + tbn = t.bn(xx,*xi); + nodenums[bn_to_idx[tbn]]++; + } + + // Sum up the ni's across all MPI threads + for(size_t i=0;in;i++) { + xx = di->x + i*di->p; + tbn = t.bn(xx,*xi); + ni = nodenums[bn_to_idx[tbn]]; + cdinfl[i]=((double)ni)/((double)B*((double)ni-1.0)*((double)ni-1.0)); + } +} + +//KL-divergence based influence metric +void brt::kldivinfl(std::vector& klinfl) +{ + tree::tree_p tbn; //which bottom node is current x in? + std::map bn_to_idx; //bottom node to index map + tree::npv bnv; + bnv.clear(); + t.getbots(bnv); + size_t B=bnv.size(); + std::vector nodenums(B,0); + + // initialize map + for(size_t i=0;in;i++) { + xx = di->x + i*di->p; + tbn = t.bn(xx,*xi); + nodenums[bn_to_idx[tbn]]++; + } + + // Sum up the ni's across all MPI threads + for(size_t i=0;in;i++) { + xx = di->x + i*di->p; + tbn = t.bn(xx,*xi); + ni = nodenums[bn_to_idx[tbn]]; + if(ni==(unsigned int)mi.minperbot) //ni-1 falls below threshold + klinfl[i]=std::numeric_limits::infinity(); + else + klinfl[i]=0.0; + } +} + + +//CPO-based divergence based influence metric +void brt::cpoinfl(std::vector& cpoinfl) +{ + // for cpoinfl in brt class, all we need is to extract the infinities, which is done in kldivinfl already. + kldivinfl(cpoinfl); +} + //-------------------------------------------------- //save/load tree to/from vector format //Note: for single tree models the parallelization just directs @@ -2487,4 +2556,4 @@ bool brt::rot(tree::tree_p tnew, tree& x, rn& gen) return false; // we never actually get here. } -*/ \ No newline at end of file +*/ diff --git a/src/brtfuns.cpp b/src/brtfuns.cpp index 06f2847..1f76292 100644 --- a/src/brtfuns.cpp +++ b/src/brtfuns.cpp @@ -1,26 +1,4 @@ // brtfuns.cpp: Base BT model class helper functios. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #include "brtfuns.h" @@ -551,6 +529,57 @@ void collapsetree(tree& st, tree::tree_p t, tree::tree_p tprime) } +//-------------------------------------------------- +//collapse tree for vector parameter theta +void collapsetree_vec(tree& st, tree::tree_p t, tree::tree_p tprime) +{ + tree::npv tlefts, trights, tbots; + tree::tree_cp tempt; + + vxd thetavec=t->getthetavec(); + + //simple case, tprime is terminal, t is (always) terminal + if(!tprime->l) { + t->setthetavec(tprime->getthetavec()+thetavec); + } + else if(!t->p) //simple case 2: t is a terminal root node + { + st.tonull(); + st=(*tprime); //copy + st.getbots(tbots);// all terminal nodes below t. + for(size_t j=0;jsetthetavec(tbots[j]->getthetavec()+thetavec); + } + else { //general case, t is (always) terminal, tprime is not. + tempt=tprime; + tree::tree_p tpar=t->p; + if(t->isleft()) { + t->p=0; + delete t; + tpar->l=new tree(*tempt); + tpar->l->p=tpar; + tpar->l->getpathtorootlr(tlefts,trights); + //collapse redundancies in tprime + splitall(tpar->l,tlefts,trights); + tpar->l->getbots(tbots);// all terminal nodes below t. + } + else { //isright + t->p=0; + delete t; + tpar->r=new tree(*tempt); + tpar->r->p=tpar; + tpar->r->getpathtorootlr(tlefts,trights); + //collapse redundancies in tprime + splitall(tpar->r,tlefts,trights); + tpar->r->getbots(tbots);// all terminal nodes below t. + } + + for(size_t j=0;jsetthetavec(tbots[j]->getthetavec()+thetavec); + } +} + + //-------------------------------------------------- //split tree along a sequence of variable, cutpoint pairs //retains only the part of the tree that remains. @@ -968,7 +997,7 @@ bool mergecount(tree::tree_p tl, tree::tree_p tr, size_t v, size_t c, int* nways //-------------------------------------------------- // Functions to support calculation of Sobol indices for BART -// Based on Hiroguchi, Pratola and Santner (2020). +// Based on Horiguchi, Pratola and Santner (2020). //-------------------------------------------------- double probxnoti_termk(size_t i, size_t k, std::vector >& a, std::vector >& b, std::vector& minx, std::vector& maxx) @@ -1065,9 +1094,95 @@ double probxall_termkl(size_t k, size_t l, std::vector >& a, return prob; } +//-------------------------------------------------- +// Additional functions to support calculation of Shapley indices for BART +// Based on Horiguchi and Pratola (2022). +//-------------------------------------------------- +double probxnotP_termk_sub(size_t k, std::vector >& a, + std::vector >& b, std::vector& minx, std::vector& maxx, + std::vector &dims) +{ +/* double prob=1.0; + size_t p=minx.size(); + + for(size_t j=0;j >& a, + std::vector >& b, std::vector& minx, std::vector& maxx, + std::vector &dims) +{ +/* double prob=1.0; + size_t p=minx.size(); + + for(size_t j=0;j >& a, + std::vector >& b, std::vector& minx, std::vector& maxx, + std::vector &dims) +{ + double aa,bb; +/* double prob=1.0; + size_t p=minx.size(); + + for(size_t j=0;j return prob; } +//-------------------------------------------------- +// Similar to probxall_termkl_rect() function except +// single hyperrectangles are passed in rather than a vector of them +// so no need for k,l indices. +// Currently only used in the ambrt::inflrect() method for reweighting +// predictions for influential observations. +// See Pratola, George and McCulloch (2020). +//-------------------------------------------------- +double probxall_term_rect(std::vector& a0, + std::vector& b0, std::vector& a1, + std::vector& b1, std::vector& minx, std::vector& maxx, std::vector& aout, std::vector& bout) +{ + double prob=1.0; + size_t p=minx.size(); + + for(size_t j=0;j& a0, + std::vector& b0, std::vector& a1, + std::vector& b1, std::vector& minx, std::vector& maxx, std::vector& aout, std::vector& bout) +{ + size_t p=minx.size(); + + for(size_t j=0;j find_pareto_front(size_t start, size_t end, std::list > theta) @@ -1105,9 +1270,9 @@ std::vector find_pareto_front(size_t start, size_t end, std::list find_pareto_front(size_t start, size_t end, std::list R, std::list > theta) { + size_t d = theta.front().size() - 1; // this assumes theta still has the index record as the last element of each vector. // note the -1's because we keep track of 1..sizeof(V) but the vectors are indexed by 0..sizeof(V)-1. - for(size_t i=0;i >::iterator itR = std::next(theta.begin(),R[i]-1); - std::list >::iterator it = std::next(theta.begin(),index-1); + std::list >::iterator it = std::next(theta.begin(),index-1); + // for(size_t i=0;i >::iterator itR = std::next(theta.begin(),i-1); // if R_ij <= v_j for all j then R_i dominates v, return false // if(theta[R[i]-1][0]<=theta[index-1][0] && theta[R[i]-1][1]<=theta[index-1][1]) - if((*itR).at(0) <= (*it).at(0) && (*itR).at(1) <= (*it).at(1)) - return false; + bool is_dominated = true; + for (size_t j=0;jgetthetavec(); +// combines a dd-dim output ensemble with a 1-dim output ensemble to make a (dd+1)-dim output ensemble +// assume a2/b2/theta2 is a 1-dim output ens +// but a1/b1/ltheta1 is a dd-dim output ens (where dd>=1) +// function's output is asol/bsol/thetasol +void combine_ensembles(size_t p, xinfo& xi, std::vector >& ximaps, std::vector& minx, std::vector& maxx, + std::vector >& a1, std::vector >& b1, std::list >& ltheta1, + std::vector >& a2, std::vector >& b2, std::vector& theta2, double fmean2, + std::vector >& asol, std::vector >& bsol, std::list >& thetasol, + bool to_clear_old_ens) +{ - //simple case, tprime is terminal, t is (always) terminal - if(!tprime->l) { - t->setthetavec(tprime->getthetavec()+thetavec); + std::vector aout(p),bout(p); + + // 1. Assign rectangle indices in ens2 to the 3-d jagged tensor ie2 + // Time: O(theta2.size() * p * numcut) + std::vector > > ie2(p); // (p,ncutsi+1,varying) jagged tensor -- to contain indices of ens2 + for (size_t w=0;wp) //simple case 2: t is a terminal root node - { - st.tonull(); - st=(*tprime); //copy - st.getbots(tbots);// all terminal nodes below t. - for(size_t j=0;jsetthetavec(tbots[j]->getthetavec()+thetavec); - } - else { //general case, t is (always) terminal, tprime is not. - tempt=tprime; - tree::tree_p tpar=t->p; - if(t->isleft()) { - t->p=0; - delete t; - tpar->l=new tree(*tempt); - tpar->l->p=tpar; - tpar->l->getpathtorootlr(tlefts,trights); - //collapse redundancies in tprime - splitall(tpar->l,tlefts,trights); - tpar->l->getbots(tbots);// all terminal nodes below t. + } + + // 2. Get rectangle intersections between ens1 and ens2 + // Time: O(theta1.size() * p * numcut), which is faster than O(theta1.size() * theta2.size()) + std::list >::iterator it1 = ltheta1.begin(); + for (size_t j=0;j valids; // to contain all rectangles k in ambm2 that intersect with rectangle j from ambm1 + size_t v_size = 0; + for (size_t w=0;w validsw(nvalidw); + std::vector::iterator it; + nvalidw = 0; + for (size_t c=lc;c<=rc;c++) { + it=std::set_union(validsw.begin(), validsw.begin()+nvalidw, ie2[w][c].begin(), ie2[w][c].end(), validsw.begin()); + nvalidw = it-validsw.begin(); } - else { //isright - t->p=0; - delete t; - tpar->r=new tree(*tempt); - tpar->r->p=tpar; - tpar->r->getpathtorootlr(tlefts,trights); - //collapse redundancies in tprime - splitall(tpar->r,tlefts,trights); - tpar->r->getbots(tbots);// all terminal nodes below t. + if (w==0) { + valids.reserve(nvalidw); + valids.assign(validsw.begin(), validsw.begin()+nvalidw); + v_size = nvalidw; + } + else { // intersect valids and validsw; store in valids + it=std::set_intersection(valids.begin(), valids.begin()+v_size, validsw.begin(), validsw.begin()+nvalidw, valids.begin()); + v_size = it-valids.begin(); } + } - for(size_t j=0;jsetthetavec(tbots[j]->getthetavec()+thetavec); + for (size_t kind=0;kind0.0) { // which should almost always be true, I believe + asol.push_back(aout); + bsol.push_back(bout); + //The asol.size() is to record the index in the unsorted list so we can back + //out the correct VHR for the theta's on the PF later on(line 497-498). + //It would be cleaner if we had a list of vector, size_t tuple but this works as well. + std::vector th = *it1; // ltheta1[j]; + if (th.size() > 1) th.pop_back(); // this is to remove the index record keeping in ltheta1 + th.push_back(theta2[k]+fmean2); + th.push_back((double)(asol.size())); + thetasol.push_back(th); + } } + it1++; + } + + if (to_clear_old_ens) { // clear the hyperrectangles for this realization + a1.clear(); + b1.clear(); + ltheta1.clear(); + a2.clear(); + b2.clear(); + theta2.clear(); + } } //-------------------------------------------------- @@ -1245,4 +1450,4 @@ void array_to_vector(Eigen::VectorXd &V, double *b){ for(size_t i = 0; i. -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com - #include "brt.h" #include "brtfuns.h" diff --git a/src/cli.cpp b/src/cli.cpp index 932daa3..1b19f65 100644 --- a/src/cli.cpp +++ b/src/cli.cpp @@ -1,24 +1,4 @@ // cli.cpp: Implement command-line model interface to OpenBT. -// Copyright (C) 2012-2019 Matthew T. Pratola -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com - #include #include @@ -723,6 +703,14 @@ if(modeltype!=MODEL_MIXBART){ std::vector > stheta(nd*mh, std::vector(1)); brtMethodWrapper fambm(&brt::f,ambm); brtMethodWrapper fpsbm(&brt::f,psbm); + std::vector Cookd_mean(di.n,0.0); + std::vector Cookd_mean_temp(di.n,0.0); + std::vector Cookd_max(di.n,0.0); + std::vector Cookd_max_temp(di.n,0.0); + std::vector KLinfl(di.n,0.0); + std::vector KLinfl_temp(di.n,0.0); + std::vector CPOinfl(di.n,0.0); + std::vector CPOinfl_temp(di.n,0.0); #ifdef _OPENMPI double tstart=0.0,tend=0.0; @@ -837,6 +825,30 @@ if(modeltype!=MODEL_MIXBART){ } } + //Calculate Cook's distance and KL divergence influence metric + ambm.cookdinfl(Cookd_mean_temp,Cookd_max_temp,disig.y); + ambm.kldivinfl(KLinfl_temp,disig.y); + ambm.cpoinfl(CPOinfl_temp,disig.y); + for(size_t j=0;j::infinity()) { + KLinfl[j]=std::numeric_limits::infinity(); + } + else if(KLinfl[j] != std::numeric_limits::infinity()) { + KLinfl[j] += KLinfl_temp[j]/((double) nd); + } + + if(CPOinfl_temp[j] == std::numeric_limits::infinity()) { + CPOinfl[j]=std::numeric_limits::infinity(); + } + else if(CPOinfl[j] != std::numeric_limits::infinity()) { + CPOinfl[j] += CPOinfl_temp[j]/((double) nd); + } + } + + //save tree to vec format if(mpirank==0) { ambm.savetree(i,m,onn,oid,ovar,oc,otheta); @@ -850,6 +862,17 @@ if(modeltype!=MODEL_MIXBART){ } #endif + // Finalize KLinfl and write out to files. + if(mpirank>0) { + for(size_t j=0;j. -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #include "crn.h" diff --git a/src/mbrt.cpp b/src/mbrt.cpp index 1cdca3c..7f1df2f 100644 --- a/src/mbrt.cpp +++ b/src/mbrt.cpp @@ -1,26 +1,4 @@ // mbrt.cpp: Mean tree BT model class methods. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #include "mbrt.h" //#include "brtfuns.h" @@ -291,6 +269,54 @@ void mbrt::local_mpi_reduce_allsuff(std::vector& siv) #endif } +void mbrt::local_setr(diterator& diter) +{ + tree::tree_p bn; + + for(;ditery[*diter] - bn->gettheta(); + } +} + +//-------------------------------------------------- +// Influence metrics. See Pratola, George and McCulloch (2021) + +// Cook's distance +void mbrt::cookdinfl(std::vector& cdinfl, double* sigma) +{ + brt::cookdinfl(cdinfl); + for(size_t i=0;in;i++) { + double stdres = resid[i]/sigma[i]; + double stdres2 = stdres*stdres; + cdinfl[i] *= stdres2; + } +} + +//KL-divergence based influence metric +void mbrt::kldivinfl(std::vector& klinfl, double* sigma) +{ + brt::kldivinfl(klinfl); + for(size_t i=0;in;i++) + if(klinfl[i]!=std::numeric_limits::infinity()) { + double stdres = resid[i]/sigma[i]; + double stdres2 = stdres*stdres; + klinfl[i]=-0.5*std::log(2*3.14159)-std::log(sigma[i])-0.5*stdres2; + } +} + +//CPO^-1 based influence metric +void mbrt::cpoinfl(std::vector& cpoinfl, double* sigma) +{ + brt::cpoinfl(cpoinfl); + for(size_t i=0;in;i++) + if(cpoinfl[i]!=std::numeric_limits::infinity()) { + double stdres = resid[i]/sigma[i]; + double stdres2 = stdres*stdres; + cpoinfl[i]=std::sqrt(2*3.14159)*sigma[i]*std::exp(stdres2/2.0); + } +} + //-------------------------------------------------- //pr for brt void mbrt::pr() diff --git a/src/mixandemulate.cpp b/src/mixandemulate.cpp index a75da86..858d780 100644 --- a/src/mixandemulate.cpp +++ b/src/mixandemulate.cpp @@ -1,3 +1,5 @@ +// mixandemulate.cpp: Implement command-line model interface to model mixing. + #include #include #include diff --git a/src/mixandemulatepred.cpp b/src/mixandemulatepred.cpp index 6f3a7e7..38410f5 100644 --- a/src/mixandemulatepred.cpp +++ b/src/mixandemulatepred.cpp @@ -1,3 +1,5 @@ +// mixandemulatepred.cpp: Implement command-line model interface to model mixing prediction. + #include #include #include diff --git a/src/mixingwts.cpp b/src/mixingwts.cpp index f2a75cd..6f05df7 100644 --- a/src/mixingwts.cpp +++ b/src/mixingwts.cpp @@ -1,24 +1,4 @@ - // pred.cpp: Implement model prediction interface for OpenBT. - // Copyright (C) 2012-2018 Matthew T. Pratola - // - // This file is part of OpenBT. - // - // OpenBT is free software: you can redistribute it and/or modify - // it under the terms of the GNU Affero General Public License as published by - // the Free Software Foundation, either version 3 of the License, or - // (at your option) any later version. - // - // OpenBT is distributed in the hope that it will be useful, - // but WITHOUT ANY WARRANTY; without even the implied warranty of - // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - // GNU Affero General Public License for more details. - // - // You should have received a copy of the GNU Affero General Public License - // along with this program. If not, see . - // - // Author contact information - // Matthew T. Pratola: mpratola@gmail.com - +// mixingwts.cpp: Implement command-line model interface to model mixing weights. #include #include diff --git a/src/mopareto.cpp b/src/mopareto.cpp index 9e6ae6d..2e5434c 100644 --- a/src/mopareto.cpp +++ b/src/mopareto.cpp @@ -1,30 +1,11 @@ // mopareto.cpp: Implement Pareto-front multiobjective optimization using OpenBT. -// Copyright (C) 2020 Matthew T. Pratola, Akira Horiguchi -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Akira Horiguchi: horiguchi.6@osu.edu - #include #include #include #include +#include // std::set_union, std::set_intersection, std::sort +#include #include #include @@ -49,6 +30,7 @@ int main(int argc, char* argv[]) { std::string folder(""); std::string folder2(""); + std::string folder3(""); if(argc>1) { @@ -64,16 +46,20 @@ int main(int argc, char* argv[]) //model name, number of saved draws and number of trees std::string modelname; std::string modelname2; + std::string modelname3; std::string xicore; //model name and xi conf >> modelname; conf >> modelname2; + conf >> modelname3; conf >> xicore; //location of the second fitted model conf >> folder2; folder2=folder2+"/"; + conf >> folder3; + folder3=folder3+"/"; //number of saved draws and number of trees size_t nd; @@ -81,12 +67,16 @@ int main(int argc, char* argv[]) size_t mh1; size_t m2; size_t mh2; + size_t m3; + size_t mh3; conf >> nd; conf >> m1; conf >> mh1; conf >> m2; conf >> mh2; + conf >> m3; + conf >> mh3; //number of predictors size_t p; @@ -101,15 +91,21 @@ int main(int argc, char* argv[]) conf >> maxx[i]; //global means of each response - double fmean1, fmean2; + double fmean1, fmean2, fmean3; conf >> fmean1; conf >> fmean2; + conf >> fmean3; //thread count int tc; conf >> tc; conf.close(); + //simple flag to tell us if we are doing 2-output or 3-output Pareto front + bool threeresponse=false; + if(m3>0) threeresponse=true; + size_t d=2; // number of responses + if (threeresponse) d=3; //MPI initialization int mpirank=0; @@ -187,13 +183,16 @@ int main(int argc, char* argv[]) ambm1.setxi(&xi); //set the cutpoints for this model object ambrt ambm2(m2); ambm2.setxi(&xi); //set the cutpoints for this model object + ambrt ambm3(m3); + if(threeresponse) ambm3.setxi(&xi); //setup psbrt objects psbrt psbm1(mh1); psbm1.setxi(&xi); //set the cutpoints for this model object psbrt psbm2(mh1); psbm2.setxi(&xi); //set the cutpoints for this model object - + psbrt psbm3(mh3); + if(threeresponse) psbm3.setxi(&xi); //load first fitted model from file @@ -343,6 +342,81 @@ int main(int argc, char* argv[]) + //load third fitted model from file + if(threeresponse) { +#ifndef SILENT + if(mpirank==0) cout << "Loading saved posterior tree draws" << endl; +#endif + imf.open(folder3 + modelname3 + ".fit"); + imf >> ind; + imf >> im; + imf >> imh; +#ifdef _OPENMPI + if(nd!=ind) { cout << "Error loading posterior trees" << endl; MPI_Finalize(); return 0; } + if(m3!=im) { cout << "Error loading posterior trees" << endl; MPI_Finalize(); return 0; } + if(mh3!=imh) { cout << "Error loading posterior trees" << endl; MPI_Finalize(); return 0; } +#else + if(nd!=ind) { cout << "Error loading posterior trees" << endl; return 0; } + if(m3!=im) { cout << "Error loading posterior trees" << endl; return 0; } + if(mh3!=imh) { cout << "Error loading posterior trees" << endl; return 0; } +#endif + } + + temp=0; + if(threeresponse) imf >> temp; + std::vector e3_ots(temp); + if(threeresponse) for(size_t i=0;i> e3_ots.at(i); + + temp=0; + if(threeresponse) imf >> temp; + std::vector e3_oid(temp); + if(threeresponse) for(size_t i=0;i> e3_oid.at(i); + + temp=0; + if(threeresponse) imf >> temp; + std::vector e3_ovar(temp); + if(threeresponse) for(size_t i=0;i> e3_ovar.at(i); + + temp=0; + if(threeresponse) imf >> temp; + std::vector e3_oc(temp); + if(threeresponse) for(size_t i=0;i> e3_oc.at(i); + + temp=0; + if(threeresponse) imf >> temp; + std::vector e3_otheta(temp); + if(threeresponse) for(size_t i=0;i> std::scientific >> e3_otheta.at(i); + + temp=0; + if(threeresponse) imf >> temp; + std::vector e3_sts(temp); + if(threeresponse) for(size_t i=0;i> e3_sts.at(i); + + temp=0; + if(threeresponse) imf >> temp; + std::vector e3_sid(temp); + if(threeresponse) for(size_t i=0;i> e3_sid.at(i); + + temp=0; + if(threeresponse) imf >> temp; + std::vector e3_svar(temp); + if(threeresponse) for(size_t i=0;i> e3_svar.at(i); + + temp=0; + if(threeresponse) imf >> temp; + std::vector e3_sc(temp); + if(threeresponse) for(size_t i=0;i> e3_sc.at(i); + + temp=0; + if(threeresponse) imf >> temp; + std::vector e3_stheta(temp); + if(threeresponse) for(size_t i=0;i> std::scientific >> e3_stheta.at(i); + + if(threeresponse) imf.close(); + + + + // Calculate range of posterior samples to do Pareto front/set on for MPI. int startnd=0,endnd=nd-1; size_t snd,end,rnd=nd; @@ -359,7 +433,8 @@ int main(int argc, char* argv[]) //objects where we'll store the realizations std::vector > asol; std::vector > bsol; - std::vector thetasol; + // std::vector thetasol; + std::list > thetasol; // Temporary vectors used for loading one model realization at a time. std::vector onn1(m1,1); @@ -384,13 +459,26 @@ int main(int argc, char* argv[]) std::vector > sc2(mh2, std::vector(1)); std::vector > stheta2(mh2, std::vector(1)); + std::vector onn3(m3,1); + std::vector > oid3(m3, std::vector(1)); + std::vector > ov3(m3, std::vector(1)); + std::vector > oc3(m3, std::vector(1)); + std::vector > otheta3(m3, std::vector(1)); + std::vector snn3(mh3,1); + std::vector > sid3(mh3, std::vector(1)); + std::vector > sv3(mh3, std::vector(1)); + std::vector > sc3(mh3, std::vector(1)); + std::vector > stheta3(mh3, std::vector(1)); + // Draw realizations of the posterior predictive. size_t curdx1=0; size_t cumdx1=0; size_t curdx2=0; size_t cumdx2=0; - std::vector > a1,a2,b1,b2; - std::vector theta1,theta2; + size_t curdx3=0; + size_t cumdx3=0; + std::vector > a1,a2,a3,b1,b2,b3; + std::vector theta1,theta2,theta3; #ifdef _OPENMPI double tstart=0.0,tend=0.0; if(mpirank==0) tstart=MPI_Wtime(); @@ -405,6 +493,19 @@ int main(int argc, char* argv[]) std::vector > > bset(nd, std::vector >(0, std::vector(0))); std::vector > > front(nd, std::vector >(0, std::vector(0))); + // 0. Create hashmap to get xi indices more easily. + // Should also work even if xi=xicuts. + std::vector > ximaps; + ximaps.reserve(p); + for (size_t w=0;w ximap; + for (size_t c=0;c > asol,bsol; - std::vector aout(p),bout(p); - std::list > thetasol; + // std::vector aout(p),bout(p); if(i>=snd && i0.0) { - asol.push_back(aout); - bsol.push_back(bout); - //The asol.size() is to record the index in the unsorted list so we can back - //out the correct VHR for the theta's on the PF later on(line 497-498). - //It would be cleaner if we had a list of vector, size_t tuple but this works as well. - std::vector th{theta1[j]+fmean1,theta2[k]+fmean2,(double)(asol.size())}; - thetasol.push_back(th); - } - } - - // clear the hyperrectangles for this realization - a1.clear(); - b1.clear(); - theta1.clear(); - a2.clear(); - b2.clear(); - theta2.clear(); + std::list > ltheta1; + for (double t1: theta1) ltheta1.push_back({t1+fmean1}); + combine_ensembles(p, xi, ximaps, minx, maxx, a1, b1, ltheta1, a2, b2, theta2, fmean2, asol, bsol, thetasol, true); + + if(threeresponse) { + // Possible to avoid copying these three containers? + a1 = asol; + b1 = bsol; + ltheta1 = thetasol; + asol.clear(); + bsol.clear(); + thetasol.clear(); + + ambm3.ens2rects(a3,b3,theta3,minx,maxx,p); + combine_ensembles(p, xi, ximaps, minx, maxx, a1, b1, ltheta1, a3, b3, theta3, fmean3, asol, bsol, thetasol, true); + } // then we can get the front,set, and then clear these vectors thetasol.sort(); @@ -496,12 +609,11 @@ int main(int argc, char* argv[]) for(size_t j=0;j //header files from OpenBT diff --git a/src/pred.cpp b/src/pred.cpp index 6fd87ad..0469acb 100644 --- a/src/pred.cpp +++ b/src/pred.cpp @@ -1,24 +1,4 @@ // pred.cpp: Implement model prediction interface for OpenBT. -// Copyright (C) 2012-2018 Matthew T. Pratola -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com - #include #include diff --git a/src/psbrt.cpp b/src/psbrt.cpp index 5056aba..e7f7581 100644 --- a/src/psbrt.cpp +++ b/src/psbrt.cpp @@ -1,26 +1,4 @@ // psbrt.cpp: Product variance BT model class methods. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #include "psbrt.h" //#include "brtfuns.h" diff --git a/src/sbrt.cpp b/src/sbrt.cpp index c75ddc6..f4018be 100644 --- a/src/sbrt.cpp +++ b/src/sbrt.cpp @@ -1,26 +1,4 @@ // sbrt.cpp: Variance tree BT model class methods. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #include "sbrt.h" //#include "brtfuns.h" diff --git a/src/singlebinomial.cpp b/src/singlebinomial.cpp index 3dd3411..64c6322 100644 --- a/src/singlebinomial.cpp +++ b/src/singlebinomial.cpp @@ -1,24 +1,4 @@ // singlebinomial.cpp: Binomial tree model methods. -// Copyright (C) 2012-2019 Matthew T. Pratola -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com - #include "singlebinomial.h" //#include "brtfuns.h" diff --git a/src/singlepoisson.cpp b/src/singlepoisson.cpp index 25d52b7..d85dcad 100644 --- a/src/singlepoisson.cpp +++ b/src/singlepoisson.cpp @@ -1,24 +1,4 @@ // singlepoisson.cpp: Poisson tree model methods. -// Copyright (C) 2012-2019 Matthew T. Pratola -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com - #include "singlepoisson.h" //#include "brtfuns.h" diff --git a/src/sobol.cpp b/src/sobol.cpp index 7b56a15..d1382fb 100644 --- a/src/sobol.cpp +++ b/src/sobol.cpp @@ -1,25 +1,4 @@ // sobol.cpp: Implement Sobol-based variable activity metrics for OpenBT. -// Copyright (C) 2020 Matthew T. Pratola, Akira Horiguchi -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Akira Horiguchi: horiguchi.6@osu.edu - #include #include @@ -29,6 +8,7 @@ #include #include #include +#include #include "crn.h" #include "tree.h" @@ -56,6 +36,13 @@ int main(int argc, char* argv[]) folder=folder+"/"; } + //----------------------------------------------------------- + //random number generation + crn gen; + gen.set_seed(static_cast(std::chrono::high_resolution_clock::now() + .time_since_epoch() + .count())); + //-------------------------------------------------- //process args std::ifstream conf(folder+"config.sobol"); @@ -262,6 +249,7 @@ int main(int argc, char* argv[]) //objects where we'll store the realizations std::vector > Sidraws(rnd,std::vector(p)); + std::vector > Shidraws(rnd,std::vector(p)); std::vector > Sijdraws(rnd,std::vector(p*(p-1)/2)); std::vector > TSidraws(rnd,std::vector(p)); std::vector V(rnd); @@ -312,7 +300,7 @@ int main(int argc, char* argv[]) // Calculate Sobol Indices if(i>=snd && i. -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com - //# include //# include diff --git a/src/tree.cpp b/src/tree.cpp index 996478e..deb088e 100644 --- a/src/tree.cpp +++ b/src/tree.cpp @@ -1,26 +1,4 @@ // tree.cpp: BT tree class methods. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #include #include diff --git a/src/treefuns.cpp b/src/treefuns.cpp index 65fba8f..c65c86d 100644 --- a/src/treefuns.cpp +++ b/src/treefuns.cpp @@ -1,26 +1,4 @@ // treefuns.cpp: BT tree class helper functions. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #include "treefuns.h" diff --git a/src/vartivity.cpp b/src/vartivity.cpp index 4c57071..c7eaf7e 100644 --- a/src/vartivity.cpp +++ b/src/vartivity.cpp @@ -1,24 +1,4 @@ // vartivity.cpp: Implement variable activity interface for OpenBT. -// Copyright (C) 2012-2018 Matthew T. Pratola. -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com - #include #include