From aea8f2b4a8a379ace6fdb7e1f570f7e97a5608c6 Mon Sep 17 00:00:00 2001 From: Jared O'Neal Date: Thu, 25 Jun 2026 16:44:50 -0500 Subject: [PATCH 1/9] (Issue #28) Hollow out README and leave TODOs for Matt. --- README.md | 188 +++++++++++++++--------------------------------------- 1 file changed, 52 insertions(+), 136 deletions(-) diff --git a/README.md b/README.md index 189ccfb..d362799 100644 --- a/README.md +++ b/README.md @@ -1,142 +1,58 @@ -# Open Bayesian Trees Project -This repository includes new developments with Bayesian Additive Regression Trees and extends the original OpenBT repository created by Matt Pratola (https://bitbucket.org/mpratola/openbt/src/master/). -Such extensions include Bayesian Model Mixing and Bayesian Calibration. -All of the Bayesian Tree code is written in C++. User interfaces constructed in R and Python allow one to easily run the software. -The BART Model Mixing software has been implemented in the [Taweret](https://github.com/bandframework/Taweret/tree/main) Python package in conjunction with the [BAND](https://bandframework.github.io/) collaboration. - - -# Installation -The heart of OpenBT is a set of C++ tools that can be used directly via -the command line or indirectly through the Python and R packages, which wrap -them. Typically these tools are built with an implementation of the Message -Passing Interface (MPI) such as [Open MPI](https://www.open-mpi.org) or -[MPICH](https://www.mpich.org) to enable distributed parallelization of -computations. In particular, the Python wrapper package is always built with -MPI support. - -The software and its distribution scheme have been developed to allow users to -use OpenBT with the MPI installation of their choice. For instance, it -can be built with MPI installations on leadership class platforms and clusters -that were installed by experts and optimized for their specific platform. As a -result, however, the software is not distributed as prebuilt binaries or wheels, -but rather must be built for each case with the compiler suite and matching MPI -implementation provided by the user. - -## Requirements -Before building and installing the Python package, users -must provide -* a compiler suite that includes a C++ compiler that supports the C++14 - standard, -* an MPI installation that is compatible with the compiler suite, and -* optionally the [Eigen software package](https://gitlab.com/libeigen/eigen). - -Note that if installing MPI using a package manager, related developer library -packages such as ``libopenmpi-dev`` or ``libmpich-dev`` might need to be -installed in addition to the base MPI packages such as ``openmpi-bin`` or -``mpich``. - -To build and install just the bare C++ tools, users must provide in addition to -the above -* the [Meson build system](https://mesonbuild.com) and its prerequisites such as - Python 3 and [ninja](https://ninja-build.org). - -While both Meson and ninja are used internally to build the Python package, they -are installed automatically just for building the package. - -The Meson build system is setup to automatically detect the compiler suite and -MPI installation to use. If Eigen already exists in the system and Meson can -find it, then Meson will use it for the build. Otherwise, Meson will -automatically obtain a copy of Eigen for internal use. - -We presently test OpenBT with both Open MPI and MPICH. In addition, we -have successfully tested with the Intel MPI implementation and have used the -Python package with MPI implementations installed -* via package managers such as Ubuntu's Advanced Packaging Tool (`apt`) and - [homebrew](https://brew.sh) on macOS; -* by experts on clusters and that are available as modules; and -* with `conda` from the prebuilt conda forge - [openmpi](https://anaconda.org/conda-forge/openmpi) package. - -## Meson installation -The Meson build system documentation suggests installing Meson via package -manager when possible. Please refer to that documentation for detailed and -up-to-date installation information. - -If Meson cannot be installed by package manager or the manager's version is too -old, the following is contrary to Meson suggestions but has been used -successfully to install Meson with Python into a dedicated virtual environment -as well as to install `meson` in the `PATH` for use without needing to activate -that virtual environment. -``` -$ /path/to/target/python -m venv ~/local/venv/meson -$ . ~/local/venv/meson/bin/activate -$ which python -$ python -m pip install --upgrade pip -$ python -m pip install meson -$ python -m pip list -$ ln -s ~/local/venv/meson/bin/meson ~/local/bin - -$ deactivate -$ which meson -$ meson --version -``` -Note that this `meson` virtual environment is for installing **just** the Meson -build system. Attempts to install `openbt` into this virtual environment -will likely fail with an error that the `mesonbuild` module cannot be found. - -## Python package -The OpenBT Python package is **not** currently distributed on PyPI since the -[PyPI OpenBT package](https://pypi.org/project/openbt/) already exists. This -package will eventually be transferred to this project so that distribution of -modern versions of this package will be enabled by PyPI under the name `openbt`. - - +# OpenBT -Instead the package should be built and installed from a clone of this repository with -``` -$ cd /path/to/OpenBT/openbt_pypkg -$ python -m pip install . -``` -Developers can install in developer/editable mode with verbose logging of the -build process and installation with -``` -$ cd /path/to/OpenBT/openbt_pypkg -$ python -m pip install -v -e . -``` -In this latter case, the command line tools are built automatically and -installed at `/path/to/OpenBT/openbt_pypkg/src/openbt/bin`. The -Python package is hardcoded to use those tools so that the existence of another -set of tools in the system and in `PATH` should not cause issues. +TODO: Copy info from landing page of sphinx docs here. -OpenBT package installations can be minimally tested with -``` -$ python ->>> import openbt ->>> openbt.__version__ -'' ->>> openbt.test() -``` +### Repository +TODO: Add general badges + +### Python +TODO: Add Python-specific badges + +## License & Copyright -## C++ library & command line tool interface -Developers and C++ users can directly build and install the command line tools, -an OpenBT library, and library tests with `tools/build_openbt_clt.sh`. -Note that these do **not** need to be built in order to use the Python package. +TODO: Copy license & copyright info from landing page of sphinx docs here -## R package -**TODO**: Needs to be written based on current state of affairs. +## Support -# Examples +To -The examples from the article "Model Mixing Using Bayesian Additive Regression Trees" are reproduced in the jupyter notebook BART_BMM_Technometrics.ipynb. This notebook can be run locally or in a virtual environment such as google colab. +* report potential problems with OpenBT or any of the packages derived from it, +* propose a change, or +* request a new feature, + +please check if a related [Issue](https://github.com/bandframework/OpenBT/issues) +already exists before creating a new issue. For all other communication, please +send an email to the OpenBT development team + + * TODO: Matt's contact info + * TODO: John's contact info + +## Documentation + +[User and Developer Guides](https://openbt.readthedocs.io) are hosted on +ReadTheDocs. Please refer to those documents for information regarding +examples. + +## Installation & Testing + +Refer to the getting started sections in the User Guide related to the tool or +package that you intend to use. + +## Contributing to IBCDFO + +Contributions are welcome in a variety of forms; see +[Contributing](https://openbt.readthedocs.io/en/latest/contributing.html) in the +Developer Guide. + +## Cite OpenBT + +``` +@techreport{openbt2026, + author = {Matt Pratola and John Yannotty}, + title = {{OpenBT 1.2.0} Users Manual}, + institution = {TBD}, + number = {Version 1.2.0}, + year = {2026}, + url = {https://openbt.readthedocs.io/} +} +``` From 668e5eef99b235cfaeabde57e7255e43eeb57cd7 Mon Sep 17 00:00:00 2001 From: Jared O'Neal Date: Thu, 25 Jun 2026 16:48:45 -0500 Subject: [PATCH 2/9] (Issue #28) Improve sphinx doc landing page. Leave TODOs. --- docs/index.rst | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/docs/index.rst b/docs/index.rst index 1633abf..ce823fe 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -3,11 +3,13 @@ .. _Open MPI: https://www.open-mpi.org .. _MPICH: https://www.mpich.org .. _framework: https://bandframework.github.io +.. _Issue 35: https://github.com/bandframework/OpenBT/issues/35 +.. _OpenBT repository: https://bitbucket.org/mpratola/openbt/src/master +.. _OpenBTMixing repository: https://github.com/jcyannotty/OpenBT .. todo:: - Write high-level description, explain development history, and motivate - breakdown of documents presented here. License, copyright, responsibilities - of user, etc. + Matt to write high-level description and motivate breakdown of documents + presented here. License, copyright, responsibilities of user, etc. The heart of |openbt| is a set of C++ tools that can be used directly |via| the command line or indirectly through the ``openbt`` Python package, which wraps @@ -22,12 +24,18 @@ it can be built with MPI installed on a laptop using the system's package manager or with MPI installations on leadership class platforms and clusters that were installed by experts and optimized for their specific platform. +This repository is being established by merging the contents of the original +`OpenBT repository`_ with the `OpenBTMixing repository`_, which was based off of +the former. It, therefore, will supercede those two repositories, which will be +frozen. + +This repository and its contents are being established and developed as part of +|band| framework_. + .. note:: While an R wrapper does exist for the original |openbt| and |openbtmixing| repositories, that functionality has not yet been included in this new, - combined repository (Issue #XYZ). - -This package is being developed as part of |band| framework_. + combined repository (`Issue 35`_). .. toctree:: :numbered: From 8ccdb7c6b9cfcd948177577a306df06a8125b719 Mon Sep 17 00:00:00 2001 From: Jared O'Neal Date: Thu, 25 Jun 2026 16:50:10 -0500 Subject: [PATCH 3/9] (Issue #28) Sneak in tweaks not included in the last merge. These were discovered during the final stages of reviewing the branch's PR. They were simple enough that they were put off until the next branch. --- docs/release_procedure.rst | 4 ++-- openbt_pypkg/MANIFEST.in | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/release_procedure.rst b/docs/release_procedure.rst index 8adaee2..db1a149 100644 --- a/docs/release_procedure.rst +++ b/docs/release_procedure.rst @@ -128,8 +128,8 @@ Otherwise, * One side effect of the use of ``setuptools-scm`` is that by default it includes in the distribution all files located within ``openbt_pypkg``. However, many files and folders in that space do **not** need to be - distributed (|eg| Flake8 configuration files and OpenBT command line tools - installed during development in ``src/openbt/bin``). This is what + distributed (|eg| Flake8 configuration files and |openbt| command line + tools installed during development in ``src/openbt/bin``). This is what "minimal" means above. Files and folders that should not be included in the distribution are specified in ``MANIFEST.in``. * Review the metadata to ensure correct and complete. This should include diff --git a/openbt_pypkg/MANIFEST.in b/openbt_pypkg/MANIFEST.in index 9f8c6e7..59da8e5 100644 --- a/openbt_pypkg/MANIFEST.in +++ b/openbt_pypkg/MANIFEST.in @@ -1,4 +1,4 @@ -include LICENSE VERSION +include LICENSE exclude .flake8 .coveragerc exclude tox.ini prune src/openbt/bin From 05292895f69c8daf13133d22be78996279a23abb Mon Sep 17 00:00:00 2001 From: Matt Pratola Date: Mon, 29 Jun 2026 10:25:46 -0400 Subject: [PATCH 4/9] Added information to contributing.rst. Also removed old and incorrect license information from top of cpp files. --- docs/contributing.rst | 4 +++- src/ambrt.cpp | 22 ---------------------- src/amxbrt.cpp | 2 ++ src/brt.cpp | 22 ---------------------- src/brtfuns.cpp | 22 ---------------------- src/brtmoves.cpp | 20 -------------------- src/cli.cpp | 20 -------------------- src/crn.cpp | 22 ---------------------- src/mbrt.cpp | 22 ---------------------- src/mixandemulate.cpp | 2 ++ src/mixandemulatepred.cpp | 2 ++ src/mixingwts.cpp | 22 +--------------------- src/mopareto.cpp | 21 --------------------- src/mxbrt.cpp | 2 ++ src/pred.cpp | 20 -------------------- src/psbrt.cpp | 22 ---------------------- src/sbrt.cpp | 22 ---------------------- src/singlebinomial.cpp | 20 -------------------- src/singlepoisson.cpp | 20 -------------------- src/sobol.cpp | 21 --------------------- src/tnorm.cpp | 20 -------------------- src/tree.cpp | 22 ---------------------- src/treefuns.cpp | 22 ---------------------- src/vartivity.cpp | 20 -------------------- 24 files changed, 12 insertions(+), 402 deletions(-) diff --git a/docs/contributing.rst b/docs/contributing.rst index 8292b49..e62821c 100644 --- a/docs/contributing.rst +++ b/docs/contributing.rst @@ -3,4 +3,6 @@ Contributing General Information ------------------- .. todo:: - Sarthak to write this + The Open Bayesian Trees (OpenBT) project evolved from an initial C + MPI codebase written from scrach in 2010 through two subsequent ground-up rewrites culminating in the current C++ codebase with R and Python wrappers. Matthew T. Pratola (mpratola@iu.edu) has served as the primary author and architect for OpenBT. Akira Horiguchi (ahoriguchi@ucdavis.edu) made contributions with Sobol sensitivity indices and Shapley indices as well as Pareto front-based multiobjective optimization. John Yannotty (jcyannotty@gmail.com) contributed BART-based model mixing, the Random Path BART (RPBART) model and RPBART-based model mixing as well as Python wrapper code. Robert McCulloch (robert.mcculloch@asu.edu) made contributions in the early days with the random number generators and log gamma approximation. Clark van Lieshout (clarkvan33@gmail.com) contributed Python wrapper code. + + The current repo, now housed within the wider BAND project, becomes the long-term home for OpenBT. diff --git a/src/ambrt.cpp b/src/ambrt.cpp index 2e64898..a541a7e 100644 --- a/src/ambrt.cpp +++ b/src/ambrt.cpp @@ -1,26 +1,4 @@ // ambrt.cpp: Additive mean BART model class methods. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #include "ambrt.h" #include "brtfuns.h" diff --git a/src/amxbrt.cpp b/src/amxbrt.cpp index d9669db..9292920 100644 --- a/src/amxbrt.cpp +++ b/src/amxbrt.cpp @@ -1,3 +1,5 @@ +// amxbrt.cpp: BART-based model mixing model class methods. + #include "amxbrt.h" #include "brtfuns.h" #include diff --git a/src/brt.cpp b/src/brt.cpp index acbc2f7..e25fa15 100644 --- a/src/brt.cpp +++ b/src/brt.cpp @@ -1,26 +1,4 @@ // brt.cpp: Base BT model class methods. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #include "brt.h" #include "brtfuns.h" diff --git a/src/brtfuns.cpp b/src/brtfuns.cpp index 06f2847..aa7b732 100644 --- a/src/brtfuns.cpp +++ b/src/brtfuns.cpp @@ -1,26 +1,4 @@ // brtfuns.cpp: Base BT model class helper functios. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #include "brtfuns.h" diff --git a/src/brtmoves.cpp b/src/brtmoves.cpp index ec7d1f5..8e0e379 100644 --- a/src/brtmoves.cpp +++ b/src/brtmoves.cpp @@ -1,24 +1,4 @@ // brtmoves.cpp: Base BT model class advanced MH methods. -// Copyright (C) 2013-2019 Matthew T. Pratola -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com - #include "brt.h" #include "brtfuns.h" diff --git a/src/cli.cpp b/src/cli.cpp index 932daa3..9353940 100644 --- a/src/cli.cpp +++ b/src/cli.cpp @@ -1,24 +1,4 @@ // cli.cpp: Implement command-line model interface to OpenBT. -// Copyright (C) 2012-2019 Matthew T. Pratola -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com - #include #include diff --git a/src/crn.cpp b/src/crn.cpp index ffe9b56..2222ace 100644 --- a/src/crn.cpp +++ b/src/crn.cpp @@ -1,26 +1,4 @@ // crn.h: Random number generator class methods. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #include "crn.h" diff --git a/src/mbrt.cpp b/src/mbrt.cpp index 1cdca3c..e3e8c04 100644 --- a/src/mbrt.cpp +++ b/src/mbrt.cpp @@ -1,26 +1,4 @@ // mbrt.cpp: Mean tree BT model class methods. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #include "mbrt.h" //#include "brtfuns.h" diff --git a/src/mixandemulate.cpp b/src/mixandemulate.cpp index a75da86..858d780 100644 --- a/src/mixandemulate.cpp +++ b/src/mixandemulate.cpp @@ -1,3 +1,5 @@ +// mixandemulate.cpp: Implement command-line model interface to model mixing. + #include #include #include diff --git a/src/mixandemulatepred.cpp b/src/mixandemulatepred.cpp index 6f3a7e7..38410f5 100644 --- a/src/mixandemulatepred.cpp +++ b/src/mixandemulatepred.cpp @@ -1,3 +1,5 @@ +// mixandemulatepred.cpp: Implement command-line model interface to model mixing prediction. + #include #include #include diff --git a/src/mixingwts.cpp b/src/mixingwts.cpp index f2a75cd..6f05df7 100644 --- a/src/mixingwts.cpp +++ b/src/mixingwts.cpp @@ -1,24 +1,4 @@ - // pred.cpp: Implement model prediction interface for OpenBT. - // Copyright (C) 2012-2018 Matthew T. Pratola - // - // This file is part of OpenBT. - // - // OpenBT is free software: you can redistribute it and/or modify - // it under the terms of the GNU Affero General Public License as published by - // the Free Software Foundation, either version 3 of the License, or - // (at your option) any later version. - // - // OpenBT is distributed in the hope that it will be useful, - // but WITHOUT ANY WARRANTY; without even the implied warranty of - // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - // GNU Affero General Public License for more details. - // - // You should have received a copy of the GNU Affero General Public License - // along with this program. If not, see . - // - // Author contact information - // Matthew T. Pratola: mpratola@gmail.com - +// mixingwts.cpp: Implement command-line model interface to model mixing weights. #include #include diff --git a/src/mopareto.cpp b/src/mopareto.cpp index 9e6ae6d..0959f76 100644 --- a/src/mopareto.cpp +++ b/src/mopareto.cpp @@ -1,25 +1,4 @@ // mopareto.cpp: Implement Pareto-front multiobjective optimization using OpenBT. -// Copyright (C) 2020 Matthew T. Pratola, Akira Horiguchi -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Akira Horiguchi: horiguchi.6@osu.edu - #include #include diff --git a/src/mxbrt.cpp b/src/mxbrt.cpp index 0b8b44f..afc3d76 100644 --- a/src/mxbrt.cpp +++ b/src/mxbrt.cpp @@ -1,3 +1,5 @@ +// mxbrt.cpp: Base single tree model mixing model class methods. + #include //header files from OpenBT diff --git a/src/pred.cpp b/src/pred.cpp index 6fd87ad..0469acb 100644 --- a/src/pred.cpp +++ b/src/pred.cpp @@ -1,24 +1,4 @@ // pred.cpp: Implement model prediction interface for OpenBT. -// Copyright (C) 2012-2018 Matthew T. Pratola -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com - #include #include diff --git a/src/psbrt.cpp b/src/psbrt.cpp index 5056aba..e7f7581 100644 --- a/src/psbrt.cpp +++ b/src/psbrt.cpp @@ -1,26 +1,4 @@ // psbrt.cpp: Product variance BT model class methods. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #include "psbrt.h" //#include "brtfuns.h" diff --git a/src/sbrt.cpp b/src/sbrt.cpp index c75ddc6..f4018be 100644 --- a/src/sbrt.cpp +++ b/src/sbrt.cpp @@ -1,26 +1,4 @@ // sbrt.cpp: Variance tree BT model class methods. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #include "sbrt.h" //#include "brtfuns.h" diff --git a/src/singlebinomial.cpp b/src/singlebinomial.cpp index 3dd3411..64c6322 100644 --- a/src/singlebinomial.cpp +++ b/src/singlebinomial.cpp @@ -1,24 +1,4 @@ // singlebinomial.cpp: Binomial tree model methods. -// Copyright (C) 2012-2019 Matthew T. Pratola -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com - #include "singlebinomial.h" //#include "brtfuns.h" diff --git a/src/singlepoisson.cpp b/src/singlepoisson.cpp index 25d52b7..d85dcad 100644 --- a/src/singlepoisson.cpp +++ b/src/singlepoisson.cpp @@ -1,24 +1,4 @@ // singlepoisson.cpp: Poisson tree model methods. -// Copyright (C) 2012-2019 Matthew T. Pratola -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com - #include "singlepoisson.h" //#include "brtfuns.h" diff --git a/src/sobol.cpp b/src/sobol.cpp index 7b56a15..b674ed1 100644 --- a/src/sobol.cpp +++ b/src/sobol.cpp @@ -1,25 +1,4 @@ // sobol.cpp: Implement Sobol-based variable activity metrics for OpenBT. -// Copyright (C) 2020 Matthew T. Pratola, Akira Horiguchi -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Akira Horiguchi: horiguchi.6@osu.edu - #include #include diff --git a/src/tnorm.cpp b/src/tnorm.cpp index 675b316..d3d93c0 100644 --- a/src/tnorm.cpp +++ b/src/tnorm.cpp @@ -1,24 +1,4 @@ // tnorm.cpp: Truncated Normal helper functions for probit model. -// Copyright (C) 2012-2019 Matthew T. Pratola -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com - //# include //# include diff --git a/src/tree.cpp b/src/tree.cpp index 996478e..deb088e 100644 --- a/src/tree.cpp +++ b/src/tree.cpp @@ -1,26 +1,4 @@ // tree.cpp: BT tree class methods. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #include #include diff --git a/src/treefuns.cpp b/src/treefuns.cpp index 65fba8f..c65c86d 100644 --- a/src/treefuns.cpp +++ b/src/treefuns.cpp @@ -1,26 +1,4 @@ // treefuns.cpp: BT tree class helper functions. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #include "treefuns.h" diff --git a/src/vartivity.cpp b/src/vartivity.cpp index 4c57071..c7eaf7e 100644 --- a/src/vartivity.cpp +++ b/src/vartivity.cpp @@ -1,24 +1,4 @@ // vartivity.cpp: Implement variable activity interface for OpenBT. -// Copyright (C) 2012-2018 Matthew T. Pratola. -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com - #include #include From 7cbbd74532c95255020cb6d66b4418e9637186df Mon Sep 17 00:00:00 2001 From: Matt Pratola Date: Mon, 29 Jun 2026 10:30:12 -0400 Subject: [PATCH 5/9] Also removed incorrect/outdated license information from C++ header files in the includes/ directory. --- includes/ambrt.h | 22 ---------------------- includes/amxbrt.h | 2 ++ includes/brt.h | 22 ---------------------- includes/brtfuns.h | 22 ---------------------- includes/crn.h | 22 ---------------------- includes/dinfo.h | 22 ---------------------- includes/mbrt.h | 22 ---------------------- includes/mxbrt.h | 2 ++ includes/psbrt.h | 22 ---------------------- includes/rn.h | 22 ---------------------- includes/sbrt.h | 22 ---------------------- includes/singlebinomial.h | 20 -------------------- includes/singlepoisson.h | 20 -------------------- includes/tnorm.h | 20 -------------------- includes/tree.h | 22 ---------------------- includes/treefuns.h | 22 ---------------------- 16 files changed, 4 insertions(+), 302 deletions(-) diff --git a/includes/ambrt.h b/includes/ambrt.h index 205bbbc..6931ef6 100644 --- a/includes/ambrt.h +++ b/includes/ambrt.h @@ -1,26 +1,4 @@ // ambrt.h: Additive mean BART model class definition. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #ifndef GUARD_ambrt_h #define GUARD_ambrt_h diff --git a/includes/amxbrt.h b/includes/amxbrt.h index d877fe0..dffe9fc 100644 --- a/includes/amxbrt.h +++ b/includes/amxbrt.h @@ -1,3 +1,5 @@ +// amxbrt.h: BART-based model mixing model class definition. + #ifndef GUARD_amxbrt_h #define GUARD_amxbrt_h diff --git a/includes/brt.h b/includes/brt.h index 456362d..1372069 100644 --- a/includes/brt.h +++ b/includes/brt.h @@ -1,26 +1,4 @@ // brt.h: Base BT model class definition. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #ifndef GUARD_brt_h #define GUARD_brt_h diff --git a/includes/brtfuns.h b/includes/brtfuns.h index cbfe39b..7ab7706 100644 --- a/includes/brtfuns.h +++ b/includes/brtfuns.h @@ -1,26 +1,4 @@ // brtfuns.h: Base BT model class help functions header file. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #ifndef GUARD_brtfuns_h #define GUARD_brtfuns_h diff --git a/includes/crn.h b/includes/crn.h index 5d0feac..79edc14 100644 --- a/includes/crn.h +++ b/includes/crn.h @@ -1,26 +1,4 @@ // crn.h: Random number generator class definition. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #ifndef CRN_H #define CRN_H diff --git a/includes/dinfo.h b/includes/dinfo.h index a70d85f..d9302f5 100644 --- a/includes/dinfo.h +++ b/includes/dinfo.h @@ -1,26 +1,4 @@ // dinfo.h: Data abstraction helper class definition. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #ifndef GUARD_dinfo_h #define GUARD_dinfo_h diff --git a/includes/mbrt.h b/includes/mbrt.h index 918302c..3c7649e 100644 --- a/includes/mbrt.h +++ b/includes/mbrt.h @@ -1,26 +1,4 @@ // mbrt.h: Mean tree BT model class definition. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #ifndef GUARD_mbrt_h #define GUARD_mbrt_h diff --git a/includes/mxbrt.h b/includes/mxbrt.h index 82464ac..989f220 100644 --- a/includes/mxbrt.h +++ b/includes/mxbrt.h @@ -1,3 +1,5 @@ +// mxbrt.h: Mean tree model mixing model class definition. + #ifndef GUARD_mxbrt_h #define GUARD_mxbrt_h diff --git a/includes/psbrt.h b/includes/psbrt.h index 9455460..97e137d 100644 --- a/includes/psbrt.h +++ b/includes/psbrt.h @@ -1,26 +1,4 @@ // psbrt.h: Product variance BT model class definition. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #ifndef GUARD_psbrt_h #define GUARD_psbrt_h diff --git a/includes/rn.h b/includes/rn.h index 39d62a8..45c1a1c 100644 --- a/includes/rn.h +++ b/includes/rn.h @@ -1,26 +1,4 @@ // rn.h: Random number generator virtual class definition. -// Copyright (C) 2012-2019 Matthew T. Pratola -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #ifndef GUARD_rn #define GUARD_rn diff --git a/includes/sbrt.h b/includes/sbrt.h index 39c578c..6ddede9 100644 --- a/includes/sbrt.h +++ b/includes/sbrt.h @@ -1,26 +1,4 @@ // sbrt.h: Variance tree BT model class definition. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #ifndef GUARD_sbrt_h #define GUARD_sbrt_h diff --git a/includes/singlebinomial.h b/includes/singlebinomial.h index 6477b2f..a83faaa 100644 --- a/includes/singlebinomial.h +++ b/includes/singlebinomial.h @@ -1,24 +1,4 @@ // singlebinomial.h: Binomial tree model class definition. -// Copyright (C) 2012-2019 Matthew T. Pratola -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com - #ifndef GUARD_singlebinomial_h #define GUARD_singlebinomial_h diff --git a/includes/singlepoisson.h b/includes/singlepoisson.h index a843251..57dc0bc 100644 --- a/includes/singlepoisson.h +++ b/includes/singlepoisson.h @@ -1,24 +1,4 @@ // singlepoisson.h: Poisson tree model class definition. -// Copyright (C) 2012-2019 Matthew T. Pratola -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com - #ifndef GUARD_singlepoisson_h #define GUARD_singlepoisson_h diff --git a/includes/tnorm.h b/includes/tnorm.h index 897f7d8..9402147 100644 --- a/includes/tnorm.h +++ b/includes/tnorm.h @@ -1,24 +1,4 @@ // tnorm.h: Truncated Normal helper functions for probit model. -// Copyright (C) 2012-2019 Matthew T. Pratola -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com - // Draw from the truncated Normal distribution with left,right truncation points diff --git a/includes/tree.h b/includes/tree.h index 8a74d0d..b310fdc 100644 --- a/includes/tree.h +++ b/includes/tree.h @@ -1,26 +1,4 @@ // tree.h: BT tree class definition. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #ifndef GUARD_tree_h #define GUARD_tree_h diff --git a/includes/treefuns.h b/includes/treefuns.h index cd31f09..00135dc 100644 --- a/includes/treefuns.h +++ b/includes/treefuns.h @@ -1,26 +1,4 @@ // treefuns.h: BT tree class helper functions header. -// Copyright (C) 2012-2016 Matthew T. Pratola, Robert E. McCulloch and Hugh A. Chipman -// -// This file is part of OpenBT. -// -// OpenBT is free software: you can redistribute it and/or modify -// it under the terms of the GNU Affero General Public License as published by -// the Free Software Foundation, either version 3 of the License, or -// (at your option) any later version. -// -// OpenBT is distributed in the hope that it will be useful, -// but WITHOUT ANY WARRANTY; without even the implied warranty of -// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the -// GNU Affero General Public License for more details. -// -// You should have received a copy of the GNU Affero General Public License -// along with this program. If not, see . -// -// Author contact information -// Matthew T. Pratola: mpratola@gmail.com -// Robert E. McCulloch: robert.e.mculloch@gmail.com -// Hugh A. Chipman: hughchipman@gmail.com - #ifndef GUARD_treefuns_h #define GUARD_treefuns_h From adcd13d78f5c9999d194a6b3b34145a3e0fa234e Mon Sep 17 00:00:00 2001 From: Matt Pratola Date: Mon, 29 Jun 2026 23:46:24 -0400 Subject: [PATCH 6/9] Slight reorg to contributing.rst to get historical order right. Also added Jared for his contributions. --- docs/contributing.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/contributing.rst b/docs/contributing.rst index e62821c..4055b6f 100644 --- a/docs/contributing.rst +++ b/docs/contributing.rst @@ -3,6 +3,6 @@ Contributing General Information ------------------- .. todo:: - The Open Bayesian Trees (OpenBT) project evolved from an initial C + MPI codebase written from scrach in 2010 through two subsequent ground-up rewrites culminating in the current C++ codebase with R and Python wrappers. Matthew T. Pratola (mpratola@iu.edu) has served as the primary author and architect for OpenBT. Akira Horiguchi (ahoriguchi@ucdavis.edu) made contributions with Sobol sensitivity indices and Shapley indices as well as Pareto front-based multiobjective optimization. John Yannotty (jcyannotty@gmail.com) contributed BART-based model mixing, the Random Path BART (RPBART) model and RPBART-based model mixing as well as Python wrapper code. Robert McCulloch (robert.mcculloch@asu.edu) made contributions in the early days with the random number generators and log gamma approximation. Clark van Lieshout (clarkvan33@gmail.com) contributed Python wrapper code. + The Open Bayesian Trees (OpenBT) project evolved from an initial C + MPI codebase written from scrach in 2010 through two subsequent ground-up rewrites culminating in the current C++ codebase with R and Python wrappers. Matt Pratola (mpratola@iu.edu) has served as the primary author and architect for OpenBT. John Yannotty (jcyannotty@gmail.com) made significant contributions to the project including BART-based model mixing, the Random Path BART (RPBART) model and RPBART-based model mixing, integrating the Eigen library, moving the build system from Autotools to Meson, and writing Python wrapper code. Jared O'Neal (joneal@anl.gov) led the move from Autotools to Meson and built out the various Github and Python integrations as part of folding OpenBT into BAND. Akira Horiguchi (ahoriguchi@ucdavis.edu) made contributions with Sobol sensitivity indices and Shapley indices as well as Pareto front-based multiobjective optimization. Clark van Lieshout (clarkvan33@gmail.com) contributed Python wrapper code. Robert McCulloch (robert.mcculloch@asu.edu) made contributions in the early days with the random number generators and log gamma approximation. The current repo, now housed within the wider BAND project, becomes the long-term home for OpenBT. From 4faacdb682848f97956a83b543ce85211c6481e3 Mon Sep 17 00:00:00 2001 From: Matt Pratola Date: Tue, 30 Jun 2026 17:20:31 -0400 Subject: [PATCH 7/9] Initial merge of .h and .cpp files from bitbucket repo into band repo. --- includes/ambrt.h | 13 +- includes/brt.h | 3 + includes/brtfuns.h | 39 +++++- includes/mbrt.h | 4 + src/ambrt.cpp | 289 ++++++++++++++++++++++++++++++++++++++++++++- src/brt.cpp | 93 ++++++++++++++- src/brtfuns.cpp | 282 ++++++++++++++++++++++++++++++++++--------- src/cli.cpp | 43 +++++++ src/mbrt.cpp | 48 ++++++++ src/mopareto.cpp | 214 +++++++++++++++++++++++++++------ src/sobol.cpp | 24 +++- 11 files changed, 950 insertions(+), 102 deletions(-) diff --git a/includes/ambrt.h b/includes/ambrt.h index 6931ef6..231007b 100644 --- a/includes/ambrt.h +++ b/includes/ambrt.h @@ -55,13 +55,24 @@ class ambrt : public mbrt void collapseensemble(); // function for calculating Sobol-based variable activity indices - void sobol(std::vector& Si, std::vector&Sij, std::vector& TSi, double& V, std::vector& minx, std::vector& maxx, size_t p); + void sobol(std::vector& Si, std::vector&Sij, std::vector& TSi, std::vector& Shi, double& V, std::vector& minx, std::vector& maxx, size_t p, rn& gen); // function for converting an ensemble to vector hyperrectangle format, needed for Pareto Front multiobjective optimization (see mopareto.cpp) void ens2rects(std::vector >& asol, std::vector >& bsol, std::vector& thetasol, std::vector& minx, std::vector& maxx, size_t p); + // function to extract hyperrectangle from bottom nodes associated with influential observation y_d(x_d) + void inflrect(double* xd, std::vector& asol, std::vector& bsol, + std::vector& minx, std::vector& maxx, size_t p); + // function to extract the 'union' hyperrectangle from bottom nodes associated with influential observation y_d(x_d) + void influnionrect(double* xd, std::vector& asol, std::vector&bsol, + std::vector& minx, std::vector& maxx, size_t p); + + void cookdinfl(std::vector& cdinflmean, std::vector& cdinflmax, double* sigma); //Cook's distance + void kldivinfl(std::vector& klinfl, double* sigma); //KL-divergence based influence metric + void cpoinfl(std::vector& cpoinfl, double* sigma); //CPO^-1 based influence metric + //-------------------- //data //-------------------------------------------------- diff --git a/includes/brt.h b/includes/brt.h index 1372069..b52d127 100644 --- a/includes/brt.h +++ b/includes/brt.h @@ -155,6 +155,9 @@ class brt { // void loadtree(size_t nn, int* id, int* v, int* c, double* theta); //load tree from vector input format void loadtree(size_t iter, size_t m, std::vector& nn, std::vector >& id, std::vector >& v, std::vector >& c, std::vector >& theta); + void cookdinfl(std::vector& cdinfl); //Cook's distance + void kldivinfl(std::vector& klinfl); //KL-divergence based influence metric + void cpoinfl(std::vector& cpoinfl); //CPO^-1 based influence metric //-------------------- //data tree t; diff --git a/includes/brtfuns.h b/includes/brtfuns.h index 7ab7706..1e75a84 100644 --- a/includes/brtfuns.h +++ b/includes/brtfuns.h @@ -8,6 +8,8 @@ #include "tree.h" #include "treefuns.h" #include "brt.h" +#include // std::set_union, std::set_intersection, std::sort +#include using std::cout; using std::endl; @@ -90,7 +92,7 @@ void splitall(tree::tree_p t, tree::npv& tlefts, tree::npv& trights); //-------------------------------------------------- // Functions to support calculation of Sobol indices for BART -// Based on Hiroguchi, Pratola and Santner (2020). +// Based on Horiguchi, Pratola and Santner (2020). //-------------------------------------------------- double probxnoti_termk(size_t i, size_t k, std::vector >& a, std::vector >& b, std::vector& minx, std::vector& maxx); double probxi_termk(size_t i, size_t k, std::vector >& a, std::vector >& b, std::vector& minx, std::vector& maxx); @@ -100,9 +102,23 @@ double probxi_termkl(size_t i, size_t k, size_t l, std::vector >& a, std::vector >& b, std::vector& minx, std::vector& maxx); double probxall_termkl(size_t k, size_t l, std::vector >& a, std::vector >& b, std::vector& minx, std::vector& maxx); +//-------------------------------------------------- +// Additional functions to support calculation of Shapley indices for BART +// Based on Horiguchi and Pratola (2022). +//-------------------------------------------------- +double probxnotP_termk_sub(size_t k, std::vector >& a, + std::vector >& b, std::vector& minx, std::vector& maxx, + std::vector &dims); +double probxP_termk_sub(size_t k, std::vector >& a, + std::vector >& b, std::vector& minx, std::vector& maxx, + std::vector &dims); +double probxP_termkl_sub(size_t k, size_t l, std::vector >& a, + std::vector >& b, std::vector& minx, std::vector& maxx, + std::vector &dims); + //-------------------------------------------------- // This function only used for determining Pareto front/set. -// Based on Hiroguchi, Santner, Sun and Pratola (2020). +// Based on Horiguchi, Santner, Sun and Pratola (2020). //-------------------------------------------------- double probxall_termkl_rect(size_t k, size_t l, std::vector >& a0, std::vector >& b0, std::vector >& a1, @@ -110,6 +126,25 @@ double probxall_termkl_rect(size_t k, size_t l, std::vector std::vector find_pareto_front(size_t start, size_t end, std::list > theta); bool not_dominated(size_t index, std::vector R, std::list > theta); +void combine_ensembles(size_t p, xinfo& xi, std::vector >& ximaps, std::vector& minx, std::vector& maxx, + std::vector >& a1, std::vector >& b1, std::list >& ltheta1, + std::vector >& a2, std::vector >& b2, std::vector& theta2, double fmean2, + std::vector >& asol, std::vector >& bsol, std::list >& thetasol, + bool to_clear_old_ens); + +//-------------------------------------------------- +// This function only used determining the hyperrectangle that should +// be reweighted due to influential observation. +// Based on Pratola, George and McCulloch (2020). +//-------------------------------------------------- +double probxall_term_rect(std::vector& a0, + std::vector& b0, std::vector& a1, + std::vector& b1, std::vector& minx, std::vector& maxx, std::vector& aout, std::vector& bout); + +void unionxall_term_rect(std::vector& a0, + std::vector& b0, std::vector& a1, + std::vector& b1, std::vector& minx, std::vector& maxx, std::vector& aout, std::vector& bout); + //-------------------------------------------------- // Functions to Model Mixing with BART and/or Vector Parameters diff --git a/includes/mbrt.h b/includes/mbrt.h index 3c7649e..15353e1 100644 --- a/includes/mbrt.h +++ b/includes/mbrt.h @@ -71,7 +71,11 @@ class mbrt : public brt virtual std::vector& newsinfovec(size_t dim) { std::vector* si = new std::vector; si->resize(dim); for(size_t i=0;ipush_back(new msinfo); return *si; } virtual void local_mpi_reduce_allsuff(std::vector& siv); virtual void local_mpi_sr_suffs(sinfo& sil, sinfo& sir); + virtual void local_setr(diterator& diter); void pr(); + void cookdinfl(std::vector& cdinfl, double* sigma); //Cook's distance + void kldivinfl(std::vector& klinfl, double* sigma); //KL-divergence based influence metric + void cpoinfl(std::vector& cpoinfl, double* sigma); //CPO^-1 based influence metric //-------------------- //data diff --git a/src/ambrt.cpp b/src/ambrt.cpp index a541a7e..6b519e6 100644 --- a/src/ambrt.cpp +++ b/src/ambrt.cpp @@ -167,7 +167,7 @@ void ambrt::local_loadtree(size_t iter, int beg, int end, std::vector& nn, void ambrt::pr() { std::cout << "***** ambrt object:\n"; - cout << "Number of trees in product representation:" << endl; + cout << "Number of trees in sum representation:" << endl; cout << " m: m=" << m << endl; cout << "Conditioning info on each individual tree:" << endl; cout << " mean: tau=" << ci.tau << endl; @@ -204,8 +204,9 @@ void ambrt::collapseensemble() //Draw is a p-dim vector to store the S_i's for all i=1,..,p variables. //Based on Hiroguchi, Pratola and Santner (2020). void ambrt::sobol(std::vector& Si, std::vector& Sij, - std::vector& TSi, double& V, std::vector& minx, - std::vector& maxx, size_t p) + std::vector& TSi, std::vector& Shi, + double& V, std::vector& minx, + std::vector& maxx, size_t p, rn& gen) { double term1,term2,term3; tree::npv bots; @@ -218,8 +219,22 @@ void ambrt::sobol(std::vector& Si, std::vector& Sij, size_t B=bots.size(); std::vector pxnoti(B); + std::vector pxnotP_sub(B); + std::vector pxnotPi_sub(B); + std::vector subdims(p,false); + std::vector tempdims(p,false); std::vector > a(p,std::vector(B)); std::vector > b(p,std::vector(B)); + double Psize=0.0; + + for(size_t i=0;i& Si, std::vector& Sij, { Si[i]=0.0; TSi[i]=0.0; - for(size_t k=0;k& Si, std::vector& Sij, for(size_t k=0;kgettheta()*pxnoti[k]; term2=bots[l]->gettheta()*pxnoti[l]; term3=probxi_termkl(i,k,l,a,b,minx,maxx)-probxi_termk(i,k,a,b,minx,maxx)*probxi_termk(i,l,a,b,minx,maxx); Si[i]+=term1*term2*term3; + //Shapley: + term1=bots[k]->gettheta()*pxnotPi_sub[k]; + term2=bots[l]->gettheta()*pxnotPi_sub[l]; + tempdims=subdims; + tempdims[i]=true; + term3=probxP_termkl_sub(k,l,a,b,minx,maxx,tempdims)-probxP_termk_sub(k,a,b,minx,maxx,tempdims)*probxP_termk_sub(l,a,b,minx,maxx,tempdims); + Shi[i]+=term1*term2*term3; + + tempdims[i]=false; + term1=bots[k]->gettheta()*pxnotP_sub[k]; + term2=bots[l]->gettheta()*pxnotP_sub[l]; + term3=probxP_termkl_sub(k,l,a,b,minx,maxx,tempdims)-probxP_termk_sub(k,a,b,minx,maxx,tempdims)*probxP_termk_sub(l,a,b,minx,maxx,tempdims); + Shi[i]-=term1*term2*term3; + + //Sobol Total effect: term1=bots[k]->gettheta()*probxi_termk(i,k,a,b,minx,maxx); term2=bots[l]->gettheta()*probxi_termk(i,l,a,b,minx,maxx); term3=probxnoti_termkl(i,k,l,a,b,minx,maxx)-pxnoti[k]*pxnoti[l]; @@ -507,3 +544,245 @@ void ambrt::ens2rects(std::vector >& asol, std::vector& asol, std::vector& bsol, + std::vector& minx, std::vector& maxx, size_t p) +{ + tree::tree_p bot; + std::vector a0(p),b0(p),anext(p),bnext(p); + std::vector aout(p); + std::vector bout(p); + + // Start with the first tree in the ensemble. + bot=mb[0].t.bn(xd,*xi); + + // Convert bot from first tree into hyperrectangle in + // the a0,b0 vector. + for(size_t i=0;i::min(); U=std::numeric_limits::max(); + bot->rgi(i,&L,&U); + + // Now we have the interval endpoints, put corresponding values in a,b matrices. + if(L!=std::numeric_limits::min()) a0[i]=(*xi)[i][L]; + else a0[i]=minx[i]; + if(U!=std::numeric_limits::max()) b0[i]=(*xi)[i][U]; + else b0[i]=maxx[i]; + } + + // the solution so far is just a0,b0 + asol=a0; bsol=b0; + a0.clear(); b0.clear(); + + // Now the main loop -- we will loop over trees 1..m, each time extracting the + // hyperrectangle of terminal node in tree j corresponding to x_d to the same + // a,b vector format and then taking the pairwise + // intersection of the current solution in a0,b0 with the next tree to add. + for(size_t j=1;j::min(); U=std::numeric_limits::max(); + bot->rgi(i,&L,&U); + + // Now we have the interval endpoints, put corresponding values in a,b matrices. + if(L!=std::numeric_limits::min()) anext[i]=(*xi)[i][L]; + else anext[i]=minx[i]; + if(U!=std::numeric_limits::max()) bnext[i]=(*xi)[i][U]; + else bnext[i]=maxx[i]; + } + + // now intersect the solution we have so far with this j'th tree and + // update the resulting stored solution + a0=asol; b0=bsol; + asol.clear(); bsol.clear(); + + // if the rectangles defined by a0[k],b0[k] and anext[l],bnext[l] intersect +// if(probxall_term_rect(k,l,a0,b0,anext,bnext,minx,maxx,aout,bout)>0.0) { + if(probxall_term_rect(a0,b0,anext,bnext,minx,maxx,aout,bout)>0.0) + { + asol=aout; + bsol=bout; + } else + { + cout << "should never happen!" << endl; + } + + // reset anext,bnext + anext.clear(); bnext.clear(); + anext.resize(p); bnext.resize(p); + // reset a0,b0 + a0.clear(); b0.clear(); + } // end of the main loop + +} + + +//-------------------------------------------------- +// Get ensemble rectangle corresponding to terminal nodes associated +// with observation y_d(x_d), the observation the be held-out in the reweighting +// scheme. Also calculates the corresponding weight. +// This is used in the importance sampling reweighting scheme to handle +// influential observations +// Based on Pratola, George and McCulloch (2020). +// This function extracts the union rectangle. +void ambrt::influnionrect(double* xd, std::vector& asol, std::vector&bsol, + std::vector& minx, std::vector& maxx, size_t p) +{ + tree::tree_p bot; + std::vector a0(p),b0(p),anext(p),bnext(p); + std::vector aout(p); + std::vector bout(p); + + // Start with the first tree in the ensemble. + bot=mb[0].t.bn(xd,*xi); + + // Convert bot from first tree into hyperrectangle in + // the a0,b0 vector. + for(size_t i=0;i::min(); U=std::numeric_limits::max(); + bot->rgi(i,&L,&U); + + // Now we have the interval endpoints, put corresponding values in a,b matrices. + if(L!=std::numeric_limits::min()) a0[i]=(*xi)[i][L]; + else a0[i]=minx[i]; + if(U!=std::numeric_limits::max()) b0[i]=(*xi)[i][U]; + else b0[i]=maxx[i]; + } + + // the solution so far is just a0,b0 + asol=a0; bsol=b0; + a0.clear(); b0.clear(); + + // Now the main loop -- we will loop over trees 1..m, each time extracting the + // hyperrectangle of terminal node in tree j corresponding to x_d to the same + // a,b vector format and then taking the pairwise + // union of the current solution in a0,b0 with the next tree to add. + for(size_t j=1;j::min(); U=std::numeric_limits::max(); + bot->rgi(i,&L,&U); + + // Now we have the interval endpoints, put corresponding values in a,b matrices. + if(L!=std::numeric_limits::min()) anext[i]=(*xi)[i][L]; + else anext[i]=minx[i]; + if(U!=std::numeric_limits::max()) bnext[i]=(*xi)[i][U]; + else bnext[i]=maxx[i]; + } + + // now intersect the solution we have so far with this j'th tree and + // update the resulting stored solution + a0=asol; b0=bsol; + asol.clear(); bsol.clear(); + + // Take union of the rectangles defined by a0[k],b0[k] and anext[l],bnext[l]. + // Note that both of these rectangles must include xd by definition, and therefore + // are not disjoint so the union is straightforward to calculate. + unionxall_term_rect(a0,b0,anext,bnext,minx,maxx,aout,bout); + asol=aout; + bsol=bout; + + // reset anext,bnext + anext.clear(); bnext.clear(); + anext.resize(p); bnext.resize(p); + // reset a0,b0 + a0.clear(); b0.clear(); + } // end of the main loop + +} + + + +//-------------------------------------------------- +// Influence metrics. See Pratola, George and McCulloch (2021) + +// Cook's distance +void ambrt::cookdinfl(std::vector& cdinflmean, std::vector& cdinflmax, double* sigma) +{ + std::fill(cdinflmean.begin(),cdinflmean.end(),0.0); + std::fill(cdinflmax.begin(),cdinflmax.end(),0.0); + + std::vector temp(di->n); + for(size_t j=0;jn;i++) { + cdinflmean[i] += temp[i]/((double) m); + if(temp[i]>cdinflmax[i]) cdinflmax[i]=temp[i]; + } + } +} + +//KL-divergence based influence metric +void ambrt::kldivinfl(std::vector& klinfl, double* sigma) +{ + std::fill(klinfl.begin(),klinfl.end(),0.0); + + std::vector temp(di->n); + for(size_t j=0;jn;i++) { + if(temp[i]==std::numeric_limits::infinity()) { + klinfl[i]=temp[i]; + break; + } + else { + double stdres = resid[i]/sigma[i]; + double stdres2 = stdres*stdres; + klinfl[i] = -0.5*std::log(2*3.14159)-std::log(sigma[i])-0.5*stdres2; + } + } + } + +} + +//CPO^-1 based influence metric +void ambrt::cpoinfl(std::vector& cpoinfl, double* sigma) +{ + std::fill(cpoinfl.begin(),cpoinfl.end(),0.0); + + std::vector temp(di->n); + for(size_t j=0;jn;i++) { + if(temp[i]==std::numeric_limits::infinity()) { + cpoinfl[i]=temp[i]; + break; + } + else { + double stdres = resid[i]/sigma[i]; + double stdres2 = stdres*stdres; + cpoinfl[i]=std::sqrt(2*3.14159)*sigma[i]*std::exp(stdres2/2.0); + } + } + } + +} diff --git a/src/brt.cpp b/src/brt.cpp index e25fa15..8b2cfc7 100644 --- a/src/brt.cpp +++ b/src/brt.cpp @@ -1022,6 +1022,97 @@ void brt::local_predict(diterator& diter) diter.sety(bn->gettheta()); } } + +//-------------------------------------------------- +// Influence metrics. See Pratola, George and McCulloch (2021) + +// Cook's distance +void brt::cookdinfl(std::vector& cdinfl) +{ + tree::tree_p tbn; //which bottom node is current x in? + std::map bn_to_idx; //bottom node to index map + tree::npv bnv; + bnv.clear(); + t.getbots(bnv); + size_t B=bnv.size(); + std::vector nodenums(B,0); + + // initialize map + for(size_t i=0;in;i++) { + xx = di->x + i*di->p; + tbn = t.bn(xx,*xi); + nodenums[bn_to_idx[tbn]]++; + } + + // Sum up the ni's across all MPI threads + for(size_t i=0;in;i++) { + xx = di->x + i*di->p; + tbn = t.bn(xx,*xi); + ni = nodenums[bn_to_idx[tbn]]; + cdinfl[i]=((double)ni)/((double)B*((double)ni-1.0)*((double)ni-1.0)); + } +} + +//KL-divergence based influence metric +void brt::kldivinfl(std::vector& klinfl) +{ + tree::tree_p tbn; //which bottom node is current x in? + std::map bn_to_idx; //bottom node to index map + tree::npv bnv; + bnv.clear(); + t.getbots(bnv); + size_t B=bnv.size(); + std::vector nodenums(B,0); + + // initialize map + for(size_t i=0;in;i++) { + xx = di->x + i*di->p; + tbn = t.bn(xx,*xi); + nodenums[bn_to_idx[tbn]]++; + } + + // Sum up the ni's across all MPI threads + for(size_t i=0;in;i++) { + xx = di->x + i*di->p; + tbn = t.bn(xx,*xi); + ni = nodenums[bn_to_idx[tbn]]; + if(ni==(unsigned int)mi.minperbot) //ni-1 falls below threshold + klinfl[i]=std::numeric_limits::infinity(); + else + klinfl[i]=0.0; + } +} + + +//CPO-based divergence based influence metric +void brt::cpoinfl(std::vector& cpoinfl) +{ + // for cpoinfl in brt class, all we need is to extract the infinities, which is done in kldivinfl already. + kldivinfl(cpoinfl); +} + //-------------------------------------------------- //save/load tree to/from vector format //Note: for single tree models the parallelization just directs @@ -2465,4 +2556,4 @@ bool brt::rot(tree::tree_p tnew, tree& x, rn& gen) return false; // we never actually get here. } -*/ \ No newline at end of file +*/ diff --git a/src/brtfuns.cpp b/src/brtfuns.cpp index aa7b732..07d7147 100644 --- a/src/brtfuns.cpp +++ b/src/brtfuns.cpp @@ -946,7 +946,7 @@ bool mergecount(tree::tree_p tl, tree::tree_p tr, size_t v, size_t c, int* nways //-------------------------------------------------- // Functions to support calculation of Sobol indices for BART -// Based on Hiroguchi, Pratola and Santner (2020). +// Based on Horiguchi, Pratola and Santner (2020). //-------------------------------------------------- double probxnoti_termk(size_t i, size_t k, std::vector >& a, std::vector >& b, std::vector& minx, std::vector& maxx) @@ -1043,9 +1043,95 @@ double probxall_termkl(size_t k, size_t l, std::vector >& a, return prob; } +//-------------------------------------------------- +// Additional functions to support calculation of Shapley indices for BART +// Based on Horiguchi and Pratola (2022). +//-------------------------------------------------- +double probxnotP_termk_sub(size_t k, std::vector >& a, + std::vector >& b, std::vector& minx, std::vector& maxx, + std::vector &dims) +{ +/* double prob=1.0; + size_t p=minx.size(); + + for(size_t j=0;j >& a, + std::vector >& b, std::vector& minx, std::vector& maxx, + std::vector &dims) +{ +/* double prob=1.0; + size_t p=minx.size(); + + for(size_t j=0;j >& a, + std::vector >& b, std::vector& minx, std::vector& maxx, + std::vector &dims) +{ + double aa,bb; +/* double prob=1.0; + size_t p=minx.size(); + + for(size_t j=0;j return prob; } +//-------------------------------------------------- +// Similar to probxall_termkl_rect() function except +// single hyperrectangles are passed in rather than a vector of them +// so no need for k,l indices. +// Currently only used in the ambrt::inflrect() method for reweighting +// predictions for influential observations. +// See Pratola, George and McCulloch (2020). +//-------------------------------------------------- +double probxall_term_rect(std::vector& a0, + std::vector& b0, std::vector& a1, + std::vector& b1, std::vector& minx, std::vector& maxx, std::vector& aout, std::vector& bout) +{ + double prob=1.0; + size_t p=minx.size(); + + for(size_t j=0;j& a0, + std::vector& b0, std::vector& a1, + std::vector& b1, std::vector& minx, std::vector& maxx, std::vector& aout, std::vector& bout) +{ + size_t p=minx.size(); + + for(size_t j=0;j find_pareto_front(size_t start, size_t end, std::list > theta) @@ -1083,9 +1219,9 @@ std::vector find_pareto_front(size_t start, size_t end, std::list find_pareto_front(size_t start, size_t end, std::list R, std::list > theta) { + size_t d = theta.front().size() - 1; // this assumes theta still has the index record as the last element of each vector. // note the -1's because we keep track of 1..sizeof(V) but the vectors are indexed by 0..sizeof(V)-1. - for(size_t i=0;i >::iterator itR = std::next(theta.begin(),R[i]-1); - std::list >::iterator it = std::next(theta.begin(),index-1); + std::list >::iterator it = std::next(theta.begin(),index-1); + // for(size_t i=0;i >::iterator itR = std::next(theta.begin(),i-1); // if R_ij <= v_j for all j then R_i dominates v, return false // if(theta[R[i]-1][0]<=theta[index-1][0] && theta[R[i]-1][1]<=theta[index-1][1]) - if((*itR).at(0) <= (*it).at(0) && (*itR).at(1) <= (*it).at(1)) - return false; + bool is_dominated = true; + for (size_t j=0;jgetthetavec(); +// combines a dd-dim output ensemble with a 1-dim output ensemble to make a (dd+1)-dim output ensemble +// assume a2/b2/theta2 is a 1-dim output ens +// but a1/b1/ltheta1 is a dd-dim output ens (where dd>=1) +// function's output is asol/bsol/thetasol +void combine_ensembles(size_t p, xinfo& xi, std::vector >& ximaps, std::vector& minx, std::vector& maxx, + std::vector >& a1, std::vector >& b1, std::list >& ltheta1, + std::vector >& a2, std::vector >& b2, std::vector& theta2, double fmean2, + std::vector >& asol, std::vector >& bsol, std::list >& thetasol, + bool to_clear_old_ens) +{ - //simple case, tprime is terminal, t is (always) terminal - if(!tprime->l) { - t->setthetavec(tprime->getthetavec()+thetavec); - } - else if(!t->p) //simple case 2: t is a terminal root node - { - st.tonull(); - st=(*tprime); //copy - st.getbots(tbots);// all terminal nodes below t. - for(size_t j=0;jsetthetavec(tbots[j]->getthetavec()+thetavec); + std::vector aout(p),bout(p); + + // 1. Assign rectangle indices in ens2 to the 3-d jagged tensor ie2 + // Time: O(theta2.size() * p * numcut) + std::vector > > ie2(p); // (p,ncutsi+1,varying) jagged tensor -- to contain indices of ens2 + for (size_t w=0;wp; - if(t->isleft()) { - t->p=0; - delete t; - tpar->l=new tree(*tempt); - tpar->l->p=tpar; - tpar->l->getpathtorootlr(tlefts,trights); - //collapse redundancies in tprime - splitall(tpar->l,tlefts,trights); - tpar->l->getbots(tbots);// all terminal nodes below t. + } + + // 2. Get rectangle intersections between ens1 and ens2 + // Time: O(theta1.size() * p * numcut), which is faster than O(theta1.size() * theta2.size()) + std::list >::iterator it1 = ltheta1.begin(); + for (size_t j=0;j valids; // to contain all rectangles k in ambm2 that intersect with rectangle j from ambm1 + size_t v_size = 0; + for (size_t w=0;w validsw(nvalidw); + std::vector::iterator it; + nvalidw = 0; + for (size_t c=lc;c<=rc;c++) { + it=std::set_union(validsw.begin(), validsw.begin()+nvalidw, ie2[w][c].begin(), ie2[w][c].end(), validsw.begin()); + nvalidw = it-validsw.begin(); } - else { //isright - t->p=0; - delete t; - tpar->r=new tree(*tempt); - tpar->r->p=tpar; - tpar->r->getpathtorootlr(tlefts,trights); - //collapse redundancies in tprime - splitall(tpar->r,tlefts,trights); - tpar->r->getbots(tbots);// all terminal nodes below t. + if (w==0) { + valids.reserve(nvalidw); + valids.assign(validsw.begin(), validsw.begin()+nvalidw); + v_size = nvalidw; + } + else { // intersect valids and validsw; store in valids + it=std::set_intersection(valids.begin(), valids.begin()+v_size, validsw.begin(), validsw.begin()+nvalidw, valids.begin()); + v_size = it-valids.begin(); } + } - for(size_t j=0;jsetthetavec(tbots[j]->getthetavec()+thetavec); + for (size_t kind=0;kind0.0) { // which should almost always be true, I believe + asol.push_back(aout); + bsol.push_back(bout); + //The asol.size() is to record the index in the unsorted list so we can back + //out the correct VHR for the theta's on the PF later on(line 497-498). + //It would be cleaner if we had a list of vector, size_t tuple but this works as well. + std::vector th = *it1; // ltheta1[j]; + if (th.size() > 1) th.pop_back(); // this is to remove the index record keeping in ltheta1 + th.push_back(theta2[k]+fmean2); + th.push_back((double)(asol.size())); + thetasol.push_back(th); + } } + it1++; + } + + if (to_clear_old_ens) { // clear the hyperrectangles for this realization + a1.clear(); + b1.clear(); + ltheta1.clear(); + a2.clear(); + b2.clear(); + theta2.clear(); + } } //-------------------------------------------------- @@ -1223,4 +1399,4 @@ void array_to_vector(Eigen::VectorXd &V, double *b){ for(size_t i = 0; i > stheta(nd*mh, std::vector(1)); brtMethodWrapper fambm(&brt::f,ambm); brtMethodWrapper fpsbm(&brt::f,psbm); + std::vector Cookd_mean(di.n,0.0); + std::vector Cookd_mean_temp(di.n,0.0); + std::vector Cookd_max(di.n,0.0); + std::vector Cookd_max_temp(di.n,0.0); + std::vector KLinfl(di.n,0.0); + std::vector KLinfl_temp(di.n,0.0); + std::vector CPOinfl(di.n,0.0); + std::vector CPOinfl_temp(di.n,0.0); #ifdef _OPENMPI double tstart=0.0,tend=0.0; @@ -817,6 +825,30 @@ if(modeltype!=MODEL_MIXBART){ } } + //Calculate Cook's distance and KL divergence influence metric + ambm.cookdinfl(Cookd_mean_temp,Cookd_max_temp,disig.y); + ambm.kldivinfl(KLinfl_temp,disig.y); + ambm.cpoinfl(CPOinfl_temp,disig.y); + for(size_t j=0;j::infinity()) { + KLinfl[j]=std::numeric_limits::infinity(); + } + else if(KLinfl[j] != std::numeric_limits::infinity()) { + KLinfl[j] += KLinfl_temp[j]/((double) nd); + } + + if(CPOinfl_temp[j] == std::numeric_limits::infinity()) { + CPOinfl[j]=std::numeric_limits::infinity(); + } + else if(CPOinfl[j] != std::numeric_limits::infinity()) { + CPOinfl[j] += CPOinfl_temp[j]/((double) nd); + } + } + + //save tree to vec format if(mpirank==0) { ambm.savetree(i,m,onn,oid,ovar,oc,otheta); @@ -830,6 +862,17 @@ if(modeltype!=MODEL_MIXBART){ } #endif + // Finalize KLinfl and write out to files. + if(mpirank>0) { + for(size_t j=0;j& siv) #endif } +void mbrt::local_setr(diterator& diter) +{ + tree::tree_p bn; + + for(;ditery[*diter] - bn->gettheta(); + } +} + +//-------------------------------------------------- +// Influence metrics. See Pratola, George and McCulloch (2021) + +// Cook's distance +void mbrt::cookdinfl(std::vector& cdinfl, double* sigma) +{ + brt::cookdinfl(cdinfl); + for(size_t i=0;in;i++) { + double stdres = resid[i]/sigma[i]; + double stdres2 = stdres*stdres; + cdinfl[i] *= stdres2; + } +} + +//KL-divergence based influence metric +void mbrt::kldivinfl(std::vector& klinfl, double* sigma) +{ + brt::kldivinfl(klinfl); + for(size_t i=0;in;i++) + if(klinfl[i]!=std::numeric_limits::infinity()) { + double stdres = resid[i]/sigma[i]; + double stdres2 = stdres*stdres; + klinfl[i]=-0.5*std::log(2*3.14159)-std::log(sigma[i])-0.5*stdres2; + } +} + +//CPO^-1 based influence metric +void mbrt::cpoinfl(std::vector& cpoinfl, double* sigma) +{ + brt::cpoinfl(cpoinfl); + for(size_t i=0;in;i++) + if(cpoinfl[i]!=std::numeric_limits::infinity()) { + double stdres = resid[i]/sigma[i]; + double stdres2 = stdres*stdres; + cpoinfl[i]=std::sqrt(2*3.14159)*sigma[i]*std::exp(stdres2/2.0); + } +} + //-------------------------------------------------- //pr for brt void mbrt::pr() diff --git a/src/mopareto.cpp b/src/mopareto.cpp index 0959f76..2e5434c 100644 --- a/src/mopareto.cpp +++ b/src/mopareto.cpp @@ -4,6 +4,8 @@ #include #include #include +#include // std::set_union, std::set_intersection, std::sort +#include #include #include @@ -28,6 +30,7 @@ int main(int argc, char* argv[]) { std::string folder(""); std::string folder2(""); + std::string folder3(""); if(argc>1) { @@ -43,16 +46,20 @@ int main(int argc, char* argv[]) //model name, number of saved draws and number of trees std::string modelname; std::string modelname2; + std::string modelname3; std::string xicore; //model name and xi conf >> modelname; conf >> modelname2; + conf >> modelname3; conf >> xicore; //location of the second fitted model conf >> folder2; folder2=folder2+"/"; + conf >> folder3; + folder3=folder3+"/"; //number of saved draws and number of trees size_t nd; @@ -60,12 +67,16 @@ int main(int argc, char* argv[]) size_t mh1; size_t m2; size_t mh2; + size_t m3; + size_t mh3; conf >> nd; conf >> m1; conf >> mh1; conf >> m2; conf >> mh2; + conf >> m3; + conf >> mh3; //number of predictors size_t p; @@ -80,15 +91,21 @@ int main(int argc, char* argv[]) conf >> maxx[i]; //global means of each response - double fmean1, fmean2; + double fmean1, fmean2, fmean3; conf >> fmean1; conf >> fmean2; + conf >> fmean3; //thread count int tc; conf >> tc; conf.close(); + //simple flag to tell us if we are doing 2-output or 3-output Pareto front + bool threeresponse=false; + if(m3>0) threeresponse=true; + size_t d=2; // number of responses + if (threeresponse) d=3; //MPI initialization int mpirank=0; @@ -166,13 +183,16 @@ int main(int argc, char* argv[]) ambm1.setxi(&xi); //set the cutpoints for this model object ambrt ambm2(m2); ambm2.setxi(&xi); //set the cutpoints for this model object + ambrt ambm3(m3); + if(threeresponse) ambm3.setxi(&xi); //setup psbrt objects psbrt psbm1(mh1); psbm1.setxi(&xi); //set the cutpoints for this model object psbrt psbm2(mh1); psbm2.setxi(&xi); //set the cutpoints for this model object - + psbrt psbm3(mh3); + if(threeresponse) psbm3.setxi(&xi); //load first fitted model from file @@ -322,6 +342,81 @@ int main(int argc, char* argv[]) + //load third fitted model from file + if(threeresponse) { +#ifndef SILENT + if(mpirank==0) cout << "Loading saved posterior tree draws" << endl; +#endif + imf.open(folder3 + modelname3 + ".fit"); + imf >> ind; + imf >> im; + imf >> imh; +#ifdef _OPENMPI + if(nd!=ind) { cout << "Error loading posterior trees" << endl; MPI_Finalize(); return 0; } + if(m3!=im) { cout << "Error loading posterior trees" << endl; MPI_Finalize(); return 0; } + if(mh3!=imh) { cout << "Error loading posterior trees" << endl; MPI_Finalize(); return 0; } +#else + if(nd!=ind) { cout << "Error loading posterior trees" << endl; return 0; } + if(m3!=im) { cout << "Error loading posterior trees" << endl; return 0; } + if(mh3!=imh) { cout << "Error loading posterior trees" << endl; return 0; } +#endif + } + + temp=0; + if(threeresponse) imf >> temp; + std::vector e3_ots(temp); + if(threeresponse) for(size_t i=0;i> e3_ots.at(i); + + temp=0; + if(threeresponse) imf >> temp; + std::vector e3_oid(temp); + if(threeresponse) for(size_t i=0;i> e3_oid.at(i); + + temp=0; + if(threeresponse) imf >> temp; + std::vector e3_ovar(temp); + if(threeresponse) for(size_t i=0;i> e3_ovar.at(i); + + temp=0; + if(threeresponse) imf >> temp; + std::vector e3_oc(temp); + if(threeresponse) for(size_t i=0;i> e3_oc.at(i); + + temp=0; + if(threeresponse) imf >> temp; + std::vector e3_otheta(temp); + if(threeresponse) for(size_t i=0;i> std::scientific >> e3_otheta.at(i); + + temp=0; + if(threeresponse) imf >> temp; + std::vector e3_sts(temp); + if(threeresponse) for(size_t i=0;i> e3_sts.at(i); + + temp=0; + if(threeresponse) imf >> temp; + std::vector e3_sid(temp); + if(threeresponse) for(size_t i=0;i> e3_sid.at(i); + + temp=0; + if(threeresponse) imf >> temp; + std::vector e3_svar(temp); + if(threeresponse) for(size_t i=0;i> e3_svar.at(i); + + temp=0; + if(threeresponse) imf >> temp; + std::vector e3_sc(temp); + if(threeresponse) for(size_t i=0;i> e3_sc.at(i); + + temp=0; + if(threeresponse) imf >> temp; + std::vector e3_stheta(temp); + if(threeresponse) for(size_t i=0;i> std::scientific >> e3_stheta.at(i); + + if(threeresponse) imf.close(); + + + + // Calculate range of posterior samples to do Pareto front/set on for MPI. int startnd=0,endnd=nd-1; size_t snd,end,rnd=nd; @@ -338,7 +433,8 @@ int main(int argc, char* argv[]) //objects where we'll store the realizations std::vector > asol; std::vector > bsol; - std::vector thetasol; + // std::vector thetasol; + std::list > thetasol; // Temporary vectors used for loading one model realization at a time. std::vector onn1(m1,1); @@ -363,13 +459,26 @@ int main(int argc, char* argv[]) std::vector > sc2(mh2, std::vector(1)); std::vector > stheta2(mh2, std::vector(1)); + std::vector onn3(m3,1); + std::vector > oid3(m3, std::vector(1)); + std::vector > ov3(m3, std::vector(1)); + std::vector > oc3(m3, std::vector(1)); + std::vector > otheta3(m3, std::vector(1)); + std::vector snn3(mh3,1); + std::vector > sid3(mh3, std::vector(1)); + std::vector > sv3(mh3, std::vector(1)); + std::vector > sc3(mh3, std::vector(1)); + std::vector > stheta3(mh3, std::vector(1)); + // Draw realizations of the posterior predictive. size_t curdx1=0; size_t cumdx1=0; size_t curdx2=0; size_t cumdx2=0; - std::vector > a1,a2,b1,b2; - std::vector theta1,theta2; + size_t curdx3=0; + size_t cumdx3=0; + std::vector > a1,a2,a3,b1,b2,b3; + std::vector theta1,theta2,theta3; #ifdef _OPENMPI double tstart=0.0,tend=0.0; if(mpirank==0) tstart=MPI_Wtime(); @@ -384,6 +493,19 @@ int main(int argc, char* argv[]) std::vector > > bset(nd, std::vector >(0, std::vector(0))); std::vector > > front(nd, std::vector >(0, std::vector(0))); + // 0. Create hashmap to get xi indices more easily. + // Should also work even if xi=xicuts. + std::vector > ximaps; + ximaps.reserve(p); + for (size_t w=0;w ximap; + for (size_t c=0;c > asol,bsol; - std::vector aout(p),bout(p); - std::list > thetasol; + // std::vector aout(p),bout(p); if(i>=snd && i0.0) { - asol.push_back(aout); - bsol.push_back(bout); - //The asol.size() is to record the index in the unsorted list so we can back - //out the correct VHR for the theta's on the PF later on(line 497-498). - //It would be cleaner if we had a list of vector, size_t tuple but this works as well. - std::vector th{theta1[j]+fmean1,theta2[k]+fmean2,(double)(asol.size())}; - thetasol.push_back(th); - } - } - - // clear the hyperrectangles for this realization - a1.clear(); - b1.clear(); - theta1.clear(); - a2.clear(); - b2.clear(); - theta2.clear(); + std::list > ltheta1; + for (double t1: theta1) ltheta1.push_back({t1+fmean1}); + combine_ensembles(p, xi, ximaps, minx, maxx, a1, b1, ltheta1, a2, b2, theta2, fmean2, asol, bsol, thetasol, true); + + if(threeresponse) { + // Possible to avoid copying these three containers? + a1 = asol; + b1 = bsol; + ltheta1 = thetasol; + asol.clear(); + bsol.clear(); + thetasol.clear(); + + ambm3.ens2rects(a3,b3,theta3,minx,maxx,p); + combine_ensembles(p, xi, ximaps, minx, maxx, a1, b1, ltheta1, a3, b3, theta3, fmean3, asol, bsol, thetasol, true); + } // then we can get the front,set, and then clear these vectors thetasol.sort(); @@ -475,12 +609,11 @@ int main(int argc, char* argv[]) for(size_t j=0;j #include #include +#include #include "crn.h" #include "tree.h" @@ -35,6 +36,13 @@ int main(int argc, char* argv[]) folder=folder+"/"; } + //----------------------------------------------------------- + //random number generation + crn gen; + gen.set_seed(static_cast(std::chrono::high_resolution_clock::now() + .time_since_epoch() + .count())); + //-------------------------------------------------- //process args std::ifstream conf(folder+"config.sobol"); @@ -241,6 +249,7 @@ int main(int argc, char* argv[]) //objects where we'll store the realizations std::vector > Sidraws(rnd,std::vector(p)); + std::vector > Shidraws(rnd,std::vector(p)); std::vector > Sijdraws(rnd,std::vector(p*(p-1)/2)); std::vector > TSidraws(rnd,std::vector(p)); std::vector V(rnd); @@ -291,7 +300,7 @@ int main(int argc, char* argv[]) // Calculate Sobol Indices if(i>=snd && i Date: Tue, 30 Jun 2026 17:29:20 -0400 Subject: [PATCH 8/9] Fixed dropped collapsetree_vec method from merge. --- src/brtfuns.cpp | 51 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 51 insertions(+) diff --git a/src/brtfuns.cpp b/src/brtfuns.cpp index 07d7147..1f76292 100644 --- a/src/brtfuns.cpp +++ b/src/brtfuns.cpp @@ -529,6 +529,57 @@ void collapsetree(tree& st, tree::tree_p t, tree::tree_p tprime) } +//-------------------------------------------------- +//collapse tree for vector parameter theta +void collapsetree_vec(tree& st, tree::tree_p t, tree::tree_p tprime) +{ + tree::npv tlefts, trights, tbots; + tree::tree_cp tempt; + + vxd thetavec=t->getthetavec(); + + //simple case, tprime is terminal, t is (always) terminal + if(!tprime->l) { + t->setthetavec(tprime->getthetavec()+thetavec); + } + else if(!t->p) //simple case 2: t is a terminal root node + { + st.tonull(); + st=(*tprime); //copy + st.getbots(tbots);// all terminal nodes below t. + for(size_t j=0;jsetthetavec(tbots[j]->getthetavec()+thetavec); + } + else { //general case, t is (always) terminal, tprime is not. + tempt=tprime; + tree::tree_p tpar=t->p; + if(t->isleft()) { + t->p=0; + delete t; + tpar->l=new tree(*tempt); + tpar->l->p=tpar; + tpar->l->getpathtorootlr(tlefts,trights); + //collapse redundancies in tprime + splitall(tpar->l,tlefts,trights); + tpar->l->getbots(tbots);// all terminal nodes below t. + } + else { //isright + t->p=0; + delete t; + tpar->r=new tree(*tempt); + tpar->r->p=tpar; + tpar->r->getpathtorootlr(tlefts,trights); + //collapse redundancies in tprime + splitall(tpar->r,tlefts,trights); + tpar->r->getbots(tbots);// all terminal nodes below t. + } + + for(size_t j=0;jsetthetavec(tbots[j]->getthetavec()+thetavec); + } +} + + //-------------------------------------------------- //split tree along a sequence of variable, cutpoint pairs //retains only the part of the tree that remains. From 390e40019abf2058c81a54d645047e8bc2efb316 Mon Sep 17 00:00:00 2001 From: Matt Pratola Date: Tue, 30 Jun 2026 18:59:24 -0400 Subject: [PATCH 9/9] Added Ropenbt/ directory to support R package that can be installed directly from the github repo. Includes the openbt.R wrapper which is a merged of version from Pratola's repo and Yannotty's version which has some model mixing interface code. --- Ropenbt/DESCRIPTION | 15 + Ropenbt/NAMESPACE | 29 + Ropenbt/R/openbt.R | 3179 +++++++++++++++++++++++++++++++ Ropenbt/R/pareto_helpers.R | 117 ++ Ropenbt/man/calcMBD.Rd | 32 + Ropenbt/man/calcRS.Rd | 32 + Ropenbt/man/load.openbt.Rd | 37 + Ropenbt/man/makecuts.Rd | 34 + Ropenbt/man/mopareto.openbt.Rd | 68 + Ropenbt/man/openbt.Rd | 105 + Ropenbt/man/predict.openbt.Rd | 79 + Ropenbt/man/save.openbt.Rd | 36 + Ropenbt/man/setvarcuts.Rd | 38 + Ropenbt/man/sobol.openbt.Rd | 75 + Ropenbt/man/vartivity.openbt.Rd | 65 + 15 files changed, 3941 insertions(+) create mode 100644 Ropenbt/DESCRIPTION create mode 100644 Ropenbt/NAMESPACE create mode 100644 Ropenbt/R/openbt.R create mode 100644 Ropenbt/R/pareto_helpers.R create mode 100644 Ropenbt/man/calcMBD.Rd create mode 100644 Ropenbt/man/calcRS.Rd create mode 100644 Ropenbt/man/load.openbt.Rd create mode 100644 Ropenbt/man/makecuts.Rd create mode 100644 Ropenbt/man/mopareto.openbt.Rd create mode 100644 Ropenbt/man/openbt.Rd create mode 100644 Ropenbt/man/predict.openbt.Rd create mode 100644 Ropenbt/man/save.openbt.Rd create mode 100644 Ropenbt/man/setvarcuts.Rd create mode 100644 Ropenbt/man/sobol.openbt.Rd create mode 100644 Ropenbt/man/vartivity.openbt.Rd diff --git a/Ropenbt/DESCRIPTION b/Ropenbt/DESCRIPTION new file mode 100644 index 0000000..af7dad7 --- /dev/null +++ b/Ropenbt/DESCRIPTION @@ -0,0 +1,15 @@ +Package: Ropenbt +Type: Package +Title: The Open Bayesian Trees Project +Version: 0.0.0 +Date: 2026-06-23 +Authors@R: c(person('Matthew', 'Pratola', role=c('aut','cre','cph'),email='mpratola@iu.edu')) +URL: http://www.github.com/bandframework/OpenBT +Description: + This R package provides and R interface to the OpenBT library of Bayesian Tree models (http://www.github.com/bandframework/OpenBT). For more information and examples, refer to the README page at http://www.github.com/bandframework/OpenBT/README.md. +License: GPL (>= 2) +Depends: R (>= 2.10) +Imports: data.table, zip, emoa, Hmisc +Suggests: +SystemRequirements: +NeedsCompilation: no diff --git a/Ropenbt/NAMESPACE b/Ropenbt/NAMESPACE new file mode 100644 index 0000000..3a870f8 --- /dev/null +++ b/Ropenbt/NAMESPACE @@ -0,0 +1,29 @@ +export(openbt) +S3method(predict, openbt) +export(vartivity.openbt) +export(sobol.openbt) +export(mopareto.openbt) +export(openbt.save) +export(openbt.load) +export(makecuts.openbt) +export(setvarcuts.openbt) +export(calcMBD) +export(calcRS) +importFrom(stats,cor) +importFrom("graphics", "boxplot", "par") +importFrom("graphics", "abline", "lines", "plot", "points") +importFrom("stats", "quantile", "sd","qqplot") +importFrom("stats", "rnorm", "runif") +importFrom("stats","dist") +importFrom("utils","read.table") +importFrom("stats","qnorm") +importFrom("zip","unzip") +importFrom("zip","zipr") +importFrom("graphics","title") +importFrom("graphics","rect") +importFrom("grDevices","gray") +importFrom("utils","combn") +importFrom("Hmisc","wtd.mean") +importFrom("Hmisc","wtd.var") +importFrom("Hmisc","wtd.quantile") + diff --git a/Ropenbt/R/openbt.R b/Ropenbt/R/openbt.R new file mode 100644 index 0000000..cfe2cb1 --- /dev/null +++ b/Ropenbt/R/openbt.R @@ -0,0 +1,3179 @@ +## openbt.R: R script wrapper functions for OpenBT. + + +# Load/Install required packages +required <- c("zip","data.table","Hmisc") +tbi <- required[!(required %in% installed.packages()[,"Package"])] +if(length(tbi)) { + cat("***Installing OpenBT package dependencies***\n") + install.packages(tbi,repos="https://cloud.r-project.org",quiet=TRUE) +} +library(zip,quietly=TRUE,warn.conflicts=FALSE) +library(data.table,quietly=TRUE,warn.conflicts=FALSE) +library(Hmisc,quietly=TRUE,warn.conflicts=FALSE) #for weighted mean/var/quantile used in repredict.*() +# library(foreach,quietly=TRUE,warn.conflicts=FALSE) #only used in repredict.openbt() method to speedup R-based reweighting calculations +# library(doMC,quietly=TRUE,warn.conflicts=FALSE) #only used in repredict.openbt() method to speedup R-based reweighting calculations + + +openbt = function( +x.train, +y.train, +f.train = matrix(1,nrow = length(y.train), ncol = 2), +ntree=NULL, +ntreeh=NULL, +ndpost=1000, nskip=100, +k=NULL, +power=2.0, base=.95, +tc=2, +sigmav=rep(1,length(y.train)), +f.sd.train = NULL, # rename +wts.prior.info = NULL, # keep/rename +fmean=mean(y.train), +overallsd = NULL, +overallnu= NULL, +chv = cor(x.train,method="spearman"), +pbd=.7, +pb=.5, +stepwpert=.1, +probchv=.1, +minnumbot=5, +printevery=100, +numcut=100, +xicuts=NULL, +nadapt=1000, +adaptevery=100, +summarystats=FALSE, +truncateds=NULL, +model=NULL, +modelname="model" +) +{ +#-------------------------------------------------- +# model type definitions +modeltype=0 # undefined +MODEL_BT=1 +MODEL_BINOMIAL=2 +MODEL_POISSON=3 +MODEL_BART=4 +MODEL_HBART=5 +MODEL_PROBIT=6 +MODEL_MODIFIEDPROBIT=7 +MODEL_MERCK_TRUNCATED=8 +MODEL_MIXBART=9 +if(is.null(model)) +{ + cat("Model type not specified.\n") + cat("Available options are:\n") + cat("model='bt'\n") + cat("model='binomial'\n") + cat("model='poisson'\n") + cat("model='bart'\n") + cat("model='hbart'\n") + cat("model='probit'\n") + cat("model='modifiedprobit'\n") + cat("model='merck_truncated'\n") + cat("model='mixbart'\n") + + stop("missing model type.\n") +} +if(model=="bart") +{ + modeltype=MODEL_BART + if(is.null(ntree)) ntree=200 + if(is.null(ntreeh)) ntreeh=1 + if(is.null(k)) k=2 + if(is.null(overallsd)) overallsd=sd(y.train) + if(is.null(overallnu)) overallnu=10 + pbd=c(pbd,0.0) +} +if(model=="hbart") +{ + modeltype=MODEL_HBART + if(is.null(ntree)) ntree=200 + if(is.null(ntreeh)) ntreeh=40 + if(is.null(k)) k=5 + if(is.null(overallsd)) overallsd=sd(y.train) + if(is.null(overallnu)) overallnu=10 +} + +if(model=="probit") +{ + modeltype=MODEL_PROBIT + if(is.null(ntree)) ntree=200 + if(is.null(ntreeh)) ntreeh=1 + if(is.null(k)) k=1 + if(is.null(overallsd)) overallsd=1 + if(is.null(overallnu)) overallnu=-1 + if(length(pbd)==1) pbd=c(pbd,0.0) +} +if(model=="modified-probit") +{ + modeltype=MODEL_MODIFIEDPROBIT + if(is.null(ntree)) ntree=200 + if(is.null(ntreeh)) ntreeh=40 + if(is.null(k)) k=1 + if(is.null(overallsd)) overallsd=1 + if(is.null(overallnu)) overallnu=-1 +} +if(model=="merck_truncated") +{ + modeltype=MODEL_MERCK_TRUNCATED + if(is.null(ntree)) ntree=200 + if(is.null(ntreeh)) ntreeh=1 + if(is.null(k)) k=2 + if(is.null(overallsd)) overallsd=sd(y.train) + if(is.null(overallnu)) overallnu=10 + if(is.null(truncateds)) { + miny=min(y.train)[1] + truncateds=(y.train==miny) + } +} +if(model=="mixbart") +{ + modeltype=MODEL_MIXBART + if(is.null(ntree)) ntree=20 + if(is.null(ntreeh)) ntreeh=1 + if(is.null(k)) k=1 + if(is.null(overallsd)) overallsd=sd(y.train) + if(is.null(overallnu)) overallnu=10 + pbd=c(pbd,0.0) +} +#-------------------------------------------------- +nd = ndpost +burn = nskip +m = ntree +mh = ntreeh +#-------------------------------------------------- +#data +if(!is.matrix(x.train)){x.train = matrix(x.train, ncol = 1)} +n = length(y.train) +p = ncol(x.train) +#np = nrow(x.test) +x = t(x.train) +#xp = t(x.test) +if(modeltype==MODEL_BART || modeltype==MODEL_HBART || modeltype==MODEL_MERCK_TRUNCATED) +{ + y.train=y.train-fmean + fmean.out=paste(0.0) +} +if(modeltype==MODEL_PROBIT || modeltype==MODEL_MODIFIEDPROBIT) +{ + fmean.out=paste(qnorm(fmean)) + uniqy=sort(unique(y.train)) + if(length(uniqy)>2) stop("Invalid y.train: Probit requires dichotomous response coded 0/1") + if(uniqy[1]!=0 || uniqy[2]!=1) stop("Invalid y.train: Probit requires dichotomous response coded 0/1") +} + +#Set mix discrepancy to FALSE if we use a different model +nsprior = FALSE +wtsprior = FALSE +if(modeltype==MODEL_MIXBART) +{ + #Center the response + fmean.out=paste(0.0) + + #Check to see if any discrepancy data has been passed into the function -- if so, we will use the discrepancy model mixing + if(!is.null(f.sd.train)){ + nsprior = TRUE + } + if(!is.null(wts.prior.info)){ + wtsprior = TRUE + if(ncol(wts.prior.info) != 2 | nrow(wts.prior.info)!=ncol(f.train)) stop("Invalid wts.prior.info: Required dimensions of num.models x 2") + } +} +#-------------------------------------------------- +#cutpoints +if(!is.null(xicuts)) # use xicuts +{ + xi=xicuts +} +else # default to equal numcut per dimension +{ + xi=vector("list",p) + minx=floor(apply(x,1,min)) + maxx=ceiling(apply(x,1,max)) + for(i in 1:p) + { + xinc=(maxx[i]-minx[i])/(numcut+1) + xi[[i]]=(1:numcut)*xinc+minx[i] + } +} + +if(is.null(xicuts) & modeltype==MODEL_MIXBART){ + xi=vector("list",p) + minx_temp=apply(x,1,min) + maxx_temp=apply(x,1,max) + + maxx = round(maxx_temp,1) + ifelse((round(maxx_temp,1)-maxx_temp)>0,0,0.1) + minx = round(minx_temp,1) - ifelse((minx_temp - round(minx_temp,1))>0,0,0.1) + for(i in 1:p) + { + xinc=(maxx[i]-minx[i])/(numcut+1) + xi[[i]]=(1:numcut)*xinc+minx[i] + } +} +#-------------------------------------------------- +if(modeltype==MODEL_BART || modeltype==MODEL_HBART || modeltype==MODEL_MERCK_TRUNCATED || modeltype==MODEL_MIXBART) +{ + rgy = range(y.train) +} +if(modeltype==MODEL_PROBIT || modeltype==MODEL_MODIFIEDPROBIT) +{ + rgy = c(-2,2) +} + +if(modeltype==MODEL_MIXBART){ + tau = (1)/(2*sqrt(m)*k) + beta0 = 1/(2*m) + overallsd = sqrt((overallnu+2)*overallsd^2/overallnu) +}else{ + tau = (rgy[2]-rgy[1])/(2*sqrt(m)*k) + beta0=0 +} + +if(modeltype==MODEL_MIXBART & nsprior){ + #tau = 1/(sqrt(m)*k) + #tau = 1/(2*sqrt(m)*k) + tau = 1/(2*(m)*k) + beta0 = 1/m +} + +#-------------------------------------------------- +overalllambda = overallsd^2 +#-------------------------------------------------- +powerh=power +baseh=base +if(length(power)>1) { + powerh=power[2] + power=power[1] +} +if(length(base)>1) { + baseh=base[2] + base=base[1] +} +#-------------------------------------------------- +pbdh=pbd +pbh=pb +if(length(pbd)>1) { + pbdh=pbdh[2] + pbd=pbd[1] +} +if(length(pb)>1) { + pbh=pb[2] + pb=pb[1] +} +#-------------------------------------------------- +if(modeltype==MODEL_BART) +{ + cat("Model: Bayesian Additive Regression Trees model (BART)\n") +} +#-------------------------------------------------- +if(modeltype==MODEL_HBART) +{ + cat("Model: Heteroscedastic Bayesian Additive Regression Trees model (HBART)\n") +} +#-------------------------------------------------- +if(modeltype==MODEL_PROBIT) +{ + cat("Model: Dichotomous outcome model: Albert & Chib Probit (fixed)\n") +# overallnu=-1 + if(ntreeh>1) + stop("method probit requires ntreeh=1") + if(pbdh>0.0) + stop("method probit requires pbd[2]=0.0") +} +#-------------------------------------------------- +if(modeltype==MODEL_MODIFIEDPROBIT) +{ + cat("Model: Dichotomous outcome model: Modified Albert & Chib Probit\n") +} +#-------------------------------------------------- +if(modeltype==MODEL_MERCK_TRUNCATED) +{ + cat("Model: Truncated BART model\n") +} +#-------------------------------------------------- +if(modeltype==MODEL_MIXBART) +{ + cat("Model: Model Mixing with Bayesian Additive Regression Trees\n") +} +#-------------------------------------------------- +stepwperth=stepwpert +if(length(stepwpert)>1) { + stepwperth=stepwpert[2] + stepwpert=stepwpert[1] +} +#-------------------------------------------------- +probchvh=probchv +if(length(probchv)>1) { + probchvh=probchv[2] + probchv=probchv[1] +} +#-------------------------------------------------- +minnumboth=minnumbot +if(length(minnumbot)>1) { + minnumboth=minnumbot[2] + minnumbot=minnumbot[1] +} + +#-------------------------------------------------- +#write out config file +xroot="x" +yroot="y" +sroot="s" +chgvroot="chgv" +froot="f" +fsdroot="fsd" +wproot="wpr" +xiroot="xi" +folder=tempdir(check=TRUE) +if(!dir.exists(folder)) dir.create(folder) +tmpsubfolder=tempfile(tmpdir="") +tmpsubfolder=substr(tmpsubfolder,5,nchar(tmpsubfolder)) +tmpsubfolder=paste("openbt",tmpsubfolder,sep="") +folder=paste(folder,"/",tmpsubfolder,sep="") +if(!dir.exists(folder)) dir.create(folder) +fout=file(paste(folder,"/config",sep=""),"w") +writeLines(c(paste(modeltype),xroot,yroot,fmean.out,paste(m),paste(mh),paste(nd),paste(burn), + paste(nadapt),paste(adaptevery),paste(tau),paste(beta0),paste(overalllambda), + paste(overallnu),paste(base),paste(power),paste(baseh),paste(powerh), + paste(tc),paste(sroot),paste(chgvroot),paste(froot),paste(fsdroot),paste(nsprior),paste(wproot),paste(wtsprior), + paste(pbd),paste(pb),paste(pbdh),paste(pbh),paste(stepwpert),paste(stepwperth), + paste(probchv),paste(probchvh),paste(minnumbot),paste(minnumboth), + paste(printevery),paste(xiroot),paste(modelname),paste(summarystats)),fout) +close(fout) +# folder=paste(".",modelname,"/",sep="") +# system(paste("rm -rf ",folder,sep="")) +# system(paste("mkdir ",folder,sep="")) +# system(paste("cp config ",folder,"config",sep="")) + + +#-------------------------------------------------- +#write out data subsets +nslv=tc-1 +ylist=split(y.train,(seq(n)-1) %/% (n/nslv)) +for(i in 1:nslv) write(ylist[[i]],file=paste(folder,"/",yroot,i,sep="")) +xlist=split(as.data.frame(x.train),(seq(n)-1) %/% (n/nslv)) +for(i in 1:nslv) write(t(xlist[[i]]),file=paste(folder,"/",xroot,i,sep="")) +slist=split(sigmav,(seq(n)-1) %/% (n/nslv)) +for(i in 1:nslv) write(slist[[i]],file=paste(folder,"/",sroot,i,sep="")) +chv[is.na(chv)]=0 # if a var as 0 levels it will have a cor of NA so we'll just set those to 0. +write(chv,file=paste(folder,"/",chgvroot,sep="")) +for(i in 1:p) write(xi[[i]],file=paste(folder,"/",xiroot,i,sep="")) +rm(chv) + +#Write the function output if using model mixing +if(modeltype == MODEL_MIXBART){ + flist=split(as.data.frame(f.train),(seq(n)-1) %/% (n/nslv)) + for(i in 1:nslv) write(t(flist[[i]]),file=paste(folder,"/",froot,i,sep="")) + + if(nsprior){ + # -- delete these two lines + #fdmlist=split(as.data.frame(f.discrep.mean),(seq(n)-1) %/% (n/nslv)) + #for(i in 1:nslv) write(t(fdmlist[[i]]),file=paste(folder,"/",fdmroot,i,sep="")) + + fdslist=split(as.data.frame(f.sd.train),(seq(n)-1) %/% (n/nslv)) + for(i in 1:nslv) write(t(fdslist[[i]]),file=paste(folder,"/",fsdroot,i,sep="")) + } + + if(wtsprior){ + write(wts.prior.info,file=paste(folder,"/",wproot,sep="")) + } +} + +if(modeltype==MODEL_MERCK_TRUNCATED) +{ + tlist=split(truncateds,(seq(n)-1) %/% (n/nslv)) + for(i in 1:nslv) { + truncs=which(tlist[[i]]==TRUE)-1 #-1 for correct indexing in c/c++ + ftrun=file(paste(folder,"/","truncs",i,sep=""),"w") + write(truncs,ftrun) + close(ftrun) + } +} +#-------------------------------------------------- +#run program +cmdopt=100 #default to serial/OpenMP +runlocal=FALSE +cmd="openbtcli --conf" +if(Sys.which("openbtcli")[[1]]=="") # not installed in a global location, so assume current directory + runlocal=TRUE + +if(runlocal && Sys.which("./openbtcli")[[1]]=="") # not installed globally, not installed locally. + stop("OpenBT library installation is missing. \n") + +if(runlocal) cmd="./openbtcli --conf" + +cmdopt=system(cmd) + +if(cmdopt==101) # MPI +{ + cmd=paste("mpirun -np ",tc," openbtcli ",folder,sep="") +} + +if(cmdopt==100) # serial/OpenMP +{ + if(runlocal) + cmd=paste("./openbtcli ",folder,sep="") + else + cmd=paste("openbtcli ",folder,sep="") +} +# +#cat(cmd) +system(cmd) +#system(paste("rm -f ",folder,"/config",sep="")) +#system(paste("mv ",folder,"fit ",folder,modelname,".fit",sep="")) + +res=list() + +# Read in the influence metrics. +res$influence=NULL +for(i in 1:nslv) + res$influence=rbind(res$influence,as.matrix(read.table(paste(folder,"/",modelname,".influence",i,sep="")))) + +colnames(res$influence)=c("MeanCookD", "MaxCookD", "KLDiv", "CPOinv") +res$modeltype=modeltype +res$model=model +res$xroot=xroot; res$yroot=yroot;res$m=m; res$mh=mh; res$nd=nd; res$burn=burn +res$nadapt=nadapt; res$adaptevery=adaptevery; res$tau=tau;res$beta0=beta0;res$overalllambda=overalllambda +res$overallnu=overallnu; res$k=k; res$base=base; res$power=power; res$baseh=baseh; res$powerh=powerh +res$tc=tc; res$sroot=sroot; res$chgvroot=chgvroot;res$froot=froot;res$fsdroot=fsdroot; +res$nsprior = nsprior; res$pbd=pbd; res$pb=pb +res$pbdh=pbdh; res$pbh=pbh; res$stepwpert=stepwpert; res$stepwperth=stepwperth +res$probchv=probchv; res$probchvh=probchvh; res$minnumbot=minnumbot; res$minnumboth=minnumboth +res$printevery=printevery; res$xiroot=xiroot; res$minx=minx; res$maxx=maxx; +res$summarystats=summarystats; res$modelname=modelname +class(xi)="OpenBT_cutinfo" +res$xicuts=xi +res$fmean=fmean +res$folder=folder +class(res)="OpenBT_posterior" + +return(res) +} + + + + +#-------------------------------------------------- +#Get Predictions +#-------------------------------------------------- + + +predict.openbt = function( +object=NULL, +x.test=NULL, +f.test=matrix(1,nrow = 1, ncol = 2), +# f.discrep.mean = NULL, # delete +# f.sd.test = NULL, # delete +tc=2, +fmean=fit$fmean, +q.lower=0.025, +q.upper=0.975, ... +) +{ + +fit=object + +# model type definitions +MODEL_BT=1 +MODEL_BINOMIAL=2 +MODEL_POISSON=3 +MODEL_BART=4 +MODEL_HBART=5 +MODEL_PROBIT=6 +MODEL_MODIFIEDPROBIT=7 +MODEL_MERCK_TRUNCATED=8 +MODEL_MIXBART=9 + +#-------------------------------------------------- +# params +if(is.null(fit)) stop("No fitted model specified!\n") +if(is.null(x.test)) stop("No prediction points specified!\n") + +nslv=tc +x.test=as.matrix(x.test) +p=ncol(x.test) +n=nrow(x.test) +k=2 #Default number of models for model mixing +xproot="xp" +fproot="fp" +#fpdmroot="fpdm" +#fpdsroot="fpds" + +if(fit$modeltype==MODEL_MIXBART){ + if(is.null(f.test)){stop("No function output specified for model mixing!\n")} + k=ncol(f.test) #Number of models + #if(fit$nsprior){stop("No function discrepancy mean and/or standard deviation was provided, but model was trained with functional discrepancy.") } +} + +if(fit$modeltype==MODEL_BART || fit$modeltype==MODEL_HBART || fit$modeltype==MODEL_MERCK_TRUNCATED) +{ + fmean.out=paste(fmean) +} +if(fit$modeltype==MODEL_PROBIT || fit$modeltype==MODEL_MODIFIEDPROBIT) +{ + fmean.out=paste(qnorm(fmean)) +} +if(fit$modeltype==MODEL_MIXBART){ + fmean.out=paste(0.0) +} + +#-------------------------------------------------- +#write out config file +fout=file(paste(fit$folder,"/config.pred",sep=""),"w") +writeLines(c(fit$modelname,fit$modeltype,fit$xiroot,xproot,fproot, + paste(fit$nd),paste(fit$m), + paste(fit$mh),paste(p),paste(k),paste(tc), + fmean.out), fout) +close(fout) + +#-------------------------------------------------- +#write out data subsets +#folder=paste(".",fit$modelname,"/",sep="") +xlist=split(as.data.frame(x.test),(seq(n)-1) %/% (n/nslv)) +for(i in 1:nslv) write(t(xlist[[i]]),file=paste(fit$folder,"/",xproot,i-1,sep="")) +for(i in 1:p) write(fit$xicuts[[i]],file=paste(fit$folder,"/",fit$xiroot,i,sep="")) + +if(fit$modeltype==MODEL_MIXBART){ + #for(i in 1:k) write(f.test[,i],file=paste(fit$folder,"/",fproot,i,sep="")) + flist=split(as.data.frame(f.test),(seq(n)-1) %/% (n/nslv)) + for(i in 1:nslv) write(t(flist[[i]]),file=paste(fit$folder,"/",fproot,i-1,sep="")) + + if(fit$nsprior){ + #fdmlist=split(as.data.frame(f.discrep.mean),(seq(n)-1) %/% (n/nslv)) + #for(i in 1:nslv) write(t(fdmlist[[i]]),file=paste(fit$folder,"/",fpdmroot,i-1,sep="")) + + #fdslist=split(as.data.frame(f.sd.train),(seq(n)-1) %/% (n/nslv)) + #for(i in 1:nslv) write(t(fdslist[[i]]),file=paste(fit$folder,"/",fpdsroot,i-1,sep="")) + } +} + +#-------------------------------------------------- +#run prediction program +cmdopt=100 #default to serial/OpenMP +runlocal=FALSE +cmd="openbtcli --conf" +if(Sys.which("openbtcli")[[1]]=="") # not installed in a global location, so assume current directory + runlocal=TRUE + +if(runlocal) cmd="./openbtcli --conf" + +cmdopt=system(cmd) + +if(cmdopt==101) # MPI +{ + cmd=paste("mpirun -np ",tc," openbtpred ",fit$folder,sep="") +} + +if(cmdopt==100) # serial/OpenMP +{ + if(runlocal) + cmd=paste("./openbtpred ",fit$folder,sep="") + else + cmd=paste("openbtpred ",fit$folder,sep="") +} + +#cmd=paste("mpirun -np ",tc," openbtpred",sep="") +#cat(cmd) +system(cmd) +system(paste("rm -f ",fit$folder,"/config.pred",sep="")) + + +#-------------------------------------------------- +#format and return +res=list() + +# Old, original code for reading in the posterior predictive draws. +# res$mdraws=read.table(paste(fit$folder,"/",fit$modelname,".mdraws",0,sep="")) +# res$sdraws=read.table(paste(fit$folder,"/",fit$modelname,".sdraws",0,sep="")) +# for(i in 2:nslv) +# { +# res$mdraws=cbind(res$mdraws,read.table(paste(fit$folder,"/",fit$modelname,".mdraws",i-1,sep=""))) +# res$sdraws=cbind(res$sdraws,read.table(paste(fit$folder,"/",fit$modelname,".sdraws",i-1,sep=""))) +# } +# res$mdraws=as.matrix(res$mdraws) +# res$sdraws=as.matrix(res$sdraws) + +# Faster using data.table's fread than the built-in read.table. +# However, it does strangely introduce some small rounding error on the order of 8.9e-16. +fnames=list.files(fit$folder,pattern=paste(fit$modelname,".mdraws*",sep=""),full.names=TRUE) +res$mdraws=as.matrix(do.call(cbind,sapply(fnames,data.table::fread))) +fnames=list.files(fit$folder,pattern=paste(fit$modelname,".sdraws*",sep=""),full.names=TRUE) +res$sdraws=as.matrix(do.call(cbind,sapply(fnames,data.table::fread))) + +system(paste("rm -f ",fit$folder,"/",fit$modelname,".mdraws*",sep="")) +system(paste("rm -f ",fit$folder,"/",fit$modelname,".sdraws*",sep="")) + +#Get Results for models that are not model mixing +if(fit$modeltype!=MODEL_MIXBART){ + res$mmean=apply(res$mdraws,2,mean) + res$smean=apply(res$sdraws,2,mean) + res$msd=apply(res$mdraws,2,sd) + res$ssd=apply(res$sdraws,2,sd) + res$m.5=apply(res$mdraws,2,quantile,0.5) + res$m.lower=apply(res$mdraws,2,quantile,q.lower) + res$m.upper=apply(res$mdraws,2,quantile,q.upper) + res$s.5=apply(res$sdraws,2,quantile,0.5) + res$s.lower=apply(res$sdraws,2,quantile,q.lower) + res$s.upper=apply(res$sdraws,2,quantile,q.upper) + res$pdraws=NULL + res$pmean=NULL + res$psd=NULL + res$p.5=NULL + res$p.lower=NULL + res$p.upper=NULL +} + +#Get probabilities for the probit models +if(fit$modeltype==MODEL_PROBIT || fit$modeltype==MODEL_MODIFIEDPROBIT) +{ + res$pdraws=read.table(paste(fit$folder,"/",fit$modelname,".pdraws",0,sep="")) + for(i in 2:nslv) + { + res$pdraws=cbind(res$pdraws,read.table(paste(fit$folder,"/",fit$modelname,".pdraws",i-1,sep=""))) + } + + system(paste("rm -f ",fit$folder,"/",fit$modelname,".pdraws*",sep="")) + + res$pdraws=as.matrix(res$pdraws) + res$pmean=apply(res$pdraws,2,mean) + res$psd=apply(res$pdraws,2,sd) + res$p.5=apply(res$pdraws,2,quantile,0.5) + res$p.lower=apply(res$pdraws,2,quantile,q.lower) + res$p.upper=apply(res$pdraws,2,quantile,q.upper) +} + +#Get model mixing results -- using only a constant variance -- change later to match with HBART +if(fit$modeltype==MODEL_MIXBART){ + res$mmean=apply(res$mdraws,2,mean) + res$msd=apply(res$mdraws,2,sd) + res$m.5=apply(res$mdraws,2,quantile,0.5) + res$m.lower=apply(res$mdraws,2,quantile,q.lower) + res$m.upper=apply(res$mdraws,2,quantile,q.upper) + res$smean=apply(res$sdraws,2,mean) + res$ssd=apply(res$sdraws,2,sd) + res$s.5=apply(res$sdraws,2,median) + res$s.lower=apply(res$sdraws,2,quantile,q.lower) + res$s.upper=apply(res$sdraws,2,quantile,q.lower) + res$pdraws=NULL + res$pmean=NULL + res$psd=NULL + res$p.5=NULL + res$p.lower=NULL + res$p.upper=NULL +} + +res$q.lower=q.lower +res$q.upper=q.upper +res$x.test=x.test +res$modeltype=fit$modeltype + +class(res)="OpenBT_predict" + +return(res) +} + + +#-------------------------------------------------- +#Get Variable activity +#-------------------------------------------------- + +vartivity.openbt = function( +fit=NULL, +q.lower=0.025, +q.upper=0.975 +) +{ + +#-------------------------------------------------- +# params +if(is.null(fit)) stop("No fitted model specified!\n") +p=length(fit$xicuts) +m=fit$m +mh=fit$mh +nd=fit$nd +modelname=fit$modelname + + +#-------------------------------------------------- +#write out config file +fout=file(paste(fit$folder,"/config.vartivity",sep=""),"w") +writeLines(c(fit$modelname, + paste(nd),paste(m), + paste(mh),paste(p)) ,fout) +close(fout) + + +#-------------------------------------------------- +#run vartivity program -- it's not actually parallel so no call to mpirun. +runlocal=FALSE +if(Sys.which("openbtcli")[[1]]=="") # not installed in a global locaiton, so assume current directory + runlocal=TRUE + +if(runlocal) + cmd=paste("./openbtvartivity ",fit$folder,sep="") +else + cmd=paste("openbtvartivity ",fit$folder,sep="") + +#cmd=paste("./openbtvartivity",sep="") +system(cmd) +system(paste("rm -f ",fit$folder,"/config.vartivity",sep="")) + + +#-------------------------------------------------- +#read in result +res=list() +res$vdraws=read.table(paste(fit$folder,"/",fit$modelname,".vdraws",sep="")) +res$vdrawsh=read.table(paste(fit$folder,"/",fit$modelname,".vdrawsh",sep="")) +res$vdraws=as.matrix(res$vdraws) +res$vdrawsh=as.matrix(res$vdrawsh) + +# normalize counts +colnorm=apply(res$vdraws,1,sum) +ix=which(colnorm>0) +res$vdraws[ix,]=res$vdraws[ix,]/colnorm[ix] +colnorm=apply(res$vdrawsh,1,sum) +ix=which(colnorm>0) +res$vdrawsh[ix,]=res$vdrawsh[ix,]/colnorm[ix] + +res$mvdraws=apply(res$vdraws,2,mean) +res$mvdrawsh=apply(res$vdrawsh,2,mean) +res$vdraws.sd=apply(res$vdraws,2,sd) +res$vdrawsh.sd=apply(res$vdrawsh,2,sd) +res$vdraws.5=apply(res$vdraws,2,quantile,0.5) +res$vdrawsh.5=apply(res$vdrawsh,2,quantile,0.5) +res$vdraws.lower=apply(res$vdraws,2,quantile,q.lower) +res$vdraws.upper=apply(res$vdraws,2,quantile,q.upper) +res$vdrawsh.lower=apply(res$vdrawsh,2,quantile,q.lower) +res$vdrawsh.upper=apply(res$vdrawsh,2,quantile,q.upper) +res$q.lower=q.lower +res$q.upper=q.upper +res$modeltype=fit$modeltype + +class(res)="OpenBT_vartivity" + +return(res) +} + +#-------------------------------------------------- +#Sobol function +#-------------------------------------------------- + +sobol.openbt = function( +fit=NULL, +q.lower=0.025, +q.upper=0.975, +tc=2 +) +{ + +#-------------------------------------------------- +# params +if(is.null(fit)) stop("No fitted model specified!\n") +p=length(fit$xicuts) +m=fit$m +mh=fit$mh +nd=fit$nd +modelname=fit$modelname + + +#-------------------------------------------------- +#write out config file +fout=file(paste(fit$folder,"/config.sobol",sep=""),"w") +writeLines(c(fit$modelname,fit$xiroot, + paste(nd),paste(m), + paste(mh),paste(p),paste(fit$minx), + paste(fit$maxx),paste(tc)) ,fout) +close(fout) + + +#-------------------------------------------------- +#run Sobol program +cmdopt=100 #default to serial/OpenMP +runlocal=FALSE +cmd="openbtcli --conf" +if(Sys.which("openbtcli")[[1]]=="") # not installed in a global locaiton, so assume current directory + runlocal=TRUE + +if(runlocal) cmd="./openbtcli --conf" + +cmdopt=system(cmd) + +if(cmdopt==101) # MPI +{ + cmd=paste("mpirun -np ",tc," openbtsobol ",fit$folder,sep="") +} + +if(cmdopt==100) # serial/OpenMP +{ + if(runlocal) + cmd=paste("./openbtsobol ",fit$folder,sep="") + else + cmd=paste("openbtsobol ",fit$folder,sep="") +} + +system(cmd) +system(paste("rm -f ",fit$folder,"/config.sobol",sep="")) + + +#-------------------------------------------------- +#read in result +res=list() +draws=read.table(paste(fit$folder,"/",fit$modelname,".sobol",0,sep="")) +for(i in 2:tc) + draws=rbind(draws,read.table(paste(fit$folder,"/",fit$modelname,".sobol",i-1,sep=""))) +draws=as.matrix(draws) + +drawsh=read.table(paste(fit$folder,"/",fit$modelname,".shapley",0,sep="")) +for(i in 2:tc) + drawsh=rbind(drawsh,read.table(paste(fit$folder,"/",fit$modelname,".shapley",i-1,sep=""))) +drawsh=as.matrix(drawsh) + +labs=gsub("\\s+",",",apply(combn(1:p,2),2,function(zz) Reduce(paste,zz))) +res$vidraws=draws[,1:p] +res$vijdraws=draws[,(p+1):(p+p*(p-1)/2)] +res$tvidraws=draws[,(ncol(draws)-p):(ncol(draws)-1)] +res$vdraws=draws[,ncol(draws)] +res$sidraws=res$vidraws/res$vdraws +res$sijdraws=res$vijdraws/res$vdraws +res$tsidraws=res$tvidraws/res$vdraws +res$vidraws=as.matrix(res$vidraws) +colnames(res$vidraws)=paste("V",1:p,sep="") +res$vijdraws=as.matrix(res$vijdraws) +colnames(res$vijdraws)=paste("V",labs,sep="") +res$tvidraws=as.matrix(res$tvidraws) +colnames(res$tvidraws)=paste("TV",1:p,sep="") +res$vdraws=as.matrix(res$vdraws) +colnames(res$vdraws)="V" +res$sidraws=as.matrix(res$sidraws) +colnames(res$sidraws)=paste("S",1:p,sep="") +res$sijdraws=as.matrix(res$sijdraws) +colnames(res$sijdraws)=paste("S",labs,sep="") +res$tsidraws=as.matrix(res$tsidraws) +colnames(res$tsidraws)=paste("TS",1:p,sep="") +rm(draws) +res$shdraws=as.matrix(drawsh) +res$shdraws=res$shdraws/as.vector(res$vdraws) +colnames(res$shdraws)=paste("Sh",1:p,sep="") +rm(drawsh) + +# summaries +res$msi=apply(res$sidraws,2,mean) +res$msi.sd=apply(res$sidraws,2,sd) +res$si.5=apply(res$sidraws,2,quantile,0.5) +res$si.lower=apply(res$sidraws,2,quantile,q.lower) +res$si.upper=apply(res$sidraws,2,quantile,q.upper) +names(res$msi)=paste("S",1:p,sep="") +names(res$msi.sd)=paste("S",1:p,sep="") +names(res$si.5)=paste("S",1:p,sep="") +names(res$si.lower)=paste("S",1:p,sep="") +names(res$si.upper)=paste("S",1:p,sep="") + +res$msij=apply(res$sijdraws,2,mean) +res$sij.sd=apply(res$sijdraws,2,sd) +res$sij.5=apply(res$sijdraws,2,quantile,0.5) +res$sij.lower=apply(res$sijdraws,2,quantile,q.lower) +res$sij.upper=apply(res$sijdraws,2,quantile,q.upper) +names(res$msij)=paste("S",labs,sep="") +names(res$sij.sd)=paste("S",labs,sep="") +names(res$sij.5)=paste("S",labs,sep="") +names(res$sij.lower)=paste("S",labs,sep="") +names(res$sij.upper)=paste("S",labs,sep="") + +res$mtsi=apply(res$tsidraws,2,mean) +res$tsi.sd=apply(res$tsidraws,2,sd) +res$tsi.5=apply(res$tsidraws,2,quantile,0.5) +res$tsi.lower=apply(res$tsidraws,2,quantile,q.lower) +res$tsi.upper=apply(res$tsidraws,2,quantile,q.upper) +names(res$mtsi)=paste("TS",1:p,sep="") +names(res$tsi.sd)=paste("TS",1:p,sep="") +names(res$tsi.5)=paste("TS",1:p,sep="") +names(res$tsi.lower)=paste("TS",1:p,sep="") +names(res$tsi.upper)=paste("TS",1:p,sep="") + +res$mshi=apply(res$shdraws,2,mean) +res$mshi.sd=apply(res$shdraws,2,sd) +# res$shi.5=apply(res$shdraws,2,quantile,0.5) +# res$shi.lower=apply(res$shdraws,2,quantile,q.lower) +# res$shi.upper=apply(res$shdraws,2,quantile,q.upper) +names(res$mshi)=paste("Sh",1:p,sep="") +names(res$mshi.sd)=paste("Sh",1:p,sep="") +# names(res$shi.5)=paste("Sh",1:p,sep="") +# names(res$shi.lower)=paste("Sh",1:p,sep="") +# names(res$shi.upper)=paste("Sh",1:p,sep="") + + +res$q.lower=q.lower +res$q.upper=q.upper +res$modeltype=fit$modeltype + +class(res)="OpenBT_sobol" + +return(res) +} + +#-------------------------------------------------- +#Pareto function +#-------------------------------------------------- +# Pareto Front Multiobjective Optimization using 2 fitted BART models +mopareto.openbt = function( +fit1=NULL, +fit2=NULL, +fit3=NULL, +q.lower=0.025, +q.upper=0.975, +tc=2 +) +{ + +#-------------------------------------------------- +# params +if(is.null(fit1) || is.null(fit2)) stop("No fitted models specified!\n") +#if(fit1$xicuts != fit2$xicuts) stop("Models not compatible\n") +if(fit1$nd != fit2$nd) stop("Models have different number of posterior samples\n") +p=length(fit1$xicuts) +d=2+(!is.null(fit3)) +print(paste0("d=",d)) +m1=fit1$m +m2=fit2$m +mh1=fit1$mh +mh2=fit2$mh +m3=0 +mh3=0 +if(!is.null(fit3)) { + if(fit3$nd != fit2$nd) stop("Models have different number of posterior samples\n") + m3=fit3$m + mh3=fit3$mh +} +# if(is.null(fit3)) { +# fit3=list() +# fit3$modelname="null" +# fit3$folder="null" +# fit3$fmean=0.0 +# } + +nd=fit1$nd +modelname=fit1$modelname + + +#-------------------------------------------------- +#write out config file +fout=file(paste(fit1$folder,"/config.mopareto",sep=""),"w") +if(!is.null(fit3)) { +writeLines(c(fit1$modelname,fit2$modelname,fit3$modelname,fit1$xiroot, + fit2$folder,fit3$folder, + paste(nd),paste(m1), + paste(mh1),paste(m2),paste(mh2), + paste(m3),paste(mh3), + paste(p),paste(fit1$minx), + paste(fit1$maxx),paste(fit1$fmean),paste(fit2$fmean), + paste(fit3$fmean),paste(tc)) ,fout) +} +if(is.null(fit3)) { +writeLines(c(fit1$modelname,fit2$modelname,paste("null"),fit1$xiroot, + fit2$folder,paste("null"), + paste(nd),paste(m1), + paste(mh1),paste(m2),paste(mh2), + paste(m3),paste(mh3), + paste(p),paste(fit1$minx), + paste(fit1$maxx),paste(fit1$fmean),paste(fit2$fmean), + paste(0.0),paste(tc)) ,fout) +} +close(fout) + + +#-------------------------------------------------- +#run Pareto Front program +cmdopt=100 #default to serial/OpenMP +runlocal=FALSE +cmd="openbtcli --conf" +if(Sys.which("openbtcli")[[1]]=="") # not installed in a global locaiton, so assume current directory + runlocal=TRUE + +if(runlocal) cmd="./openbtcli --conf" + +cmdopt=system(cmd) + +if(cmdopt==101) # MPI +{ + cmd=paste("mpirun -np ",tc," openbtmopareto ",fit1$folder,sep="") +} + +if(cmdopt==100) # serial/OpenMP +{ + if(runlocal) + cmd=paste("./openbtmopareto ",fit1$folder,sep="") + else + cmd=paste("openbtmopareto ",fit1$folder,sep="") +} + +system(cmd) +system(paste("rm -f ",fit1$folder,"/config.mopareto",sep="")) + + +#-------------------------------------------------- +#read in result +res=list() +ii=1 +u=0 # to modify temp indexing if fit3 exists +for(i in 1:tc) +{ + con=file(paste(fit1$folder,"/",fit1$modelname,".mopareto",i-1,sep="")) + open(con) + s=readLines(con,1) + while(length(s)>0) { + temp=as.numeric(unlist(strsplit(s," "))) + k=as.integer(temp[1]) + theta=matrix(0,ncol=k,nrow=d) + theta[1,]=temp[2:(2+k-1)] + theta[2,]=temp[(2+k):(2+2*k-1)] + if(!is.null(fit3)) { + theta[3,]=temp[(2+2*k):(2+3*k-1)] + u=k + } + a=matrix(0,nrow=p,ncol=k) + for(i in 1:p) a[i,]=temp[(2+(2+i-1)*k):(2+(2+i)*k-1)+u] + b=matrix(0,nrow=p,ncol=k) + for(i in 1:p) b[i,]=temp[(2+(2+p+i-1)*k):(2+(2+p+i)*k-1)+u] + entry=list() + entry[["theta"]]=theta + entry[["a"]]=a + entry[["b"]]=b + res[[ii]]=entry + ii=ii+1 + s=readLines(con,1) + } + close(con) +} + + +class(res)="OpenBT_mopareto" + +return(res) +} + + + +# Reweight predictions using output of influence.openbt(). +# For method IS_GLOBAL: infl (influence obect) and pred (prediction object to be reweighted) +# Also, idx may be specified to apply weights only to a subset of observations. +# For method IS_LOCAL_*: pass infl (influence object), xpred (training x's from fit) and fit (fitted OpenBT object) +repredict.openbt<-function(pred=NULL,infl=NULL,fit=NULL,tc=2) +{ + if(is.null(pred)) stop("Model prediction object required.\n") + if(is.null(infl)) stop("Model influence object required.\n") + if(infl$method=="is_global" && class(pred)!="OpenBT_predict") stop("Model prediction object not recognized.\n") + if(class(infl)!="OpenBT_influence") stop("Model influence object not recognized.\n") + if(class(pred)!="OpenBT_predict") stop("Model pred object not recognized.\n") + if(infl$method!="is_global" && infl$method!="is_local_int" && infl$method!="is_local_uint" && infl$method!="is_local_union" && infl$method!="is_local_l1" && infl$method!="is_local_l2") stop("Cannot reweight predictions with influence method ",infl$method,".\n") + + dropid=infl$infl.obs + ninfl=length(dropid) + res=list() + cat("Applying weights\n") + + if(infl$method=="is_global") + { + res$mdraws=pred$mdraws + res$wmat=apply(infl$w,1,prod) + #re-weight and return the predictions + # for(k in 1:ninfl) + # for(i in 1:fit$nd) { + # # res$mdraws[i,]=res$mdraws[i,]*infl$w[i,k] + # res$mdraws[i,]=res$mdraws[i,]*res$wmat[i] + # } + + # wbar=apply(infl$w,2,mean)#mean(infl$w) + wbar=mean(res$wmat)#mean(infl$w) + # res$mdraws=res$mdraws/wbar + # res$wmat=infl$w + res$wbar=wbar + res$mmean=rep(0,nrow(pred$x.test)) + res$msd=rep(0,nrow(pred$x.test)) + res$m.5=rep(0,nrow(pred$x.test)) + res$m.lower=rep(0,nrow(pred$x.test)) + res$m.upper=rep(0,nrow(pred$x.test)) + for(i in 1:nrow(pred$x.test)) { + res$mmean[i]=wtd.mean(res$mdraws[,i],res$wmat,normwt=TRUE) + res$msd[i]=sqrt(wtd.var(res$mdraws[,i],res$wmat,normwt=TRUE)) + res$m.5[i]=wtd.quantile(res$mdraws[,i],res$wmat,probs=c(0.5),normwt=TRUE) + res$m.lower[i]=wtd.quantile(res$mdraws[,i],res$wmat,probs=c(pred$q.lower),normwt=TRUE) + res$m.upper[i]=wtd.quantile(res$mdraws[,i],res$wmat,probs=c(pred$q.upper),normwt=TRUE) + + } + } + + if(infl$method=="is_local_int" || infl$method=="is_local_uint" || infl$method=="is_local_union" || infl$method=="is_local_l1") + { + + if(is.null(fit)) stop("Method is_local_* requires fitted OpenBT_posterior in object fit\n") + if(length(fit$xicuts)!=ncol(pred$x.test)) stop("xpred's are not same dimension as trained model\n") + + in_rect<-function(xin,a,b,p) + { + for(i in 1:length(xin)) { + if(xin[i]b[i]) { + return(FALSE) + } + } + return(TRUE) + } + + # get predictions + p=length(fit$xicuts) + res$mdraws=pred$mdraws + + # now for each posterior draw, identify all predictions in the hyperrectangle of influence + # parallel::mcaffinity(1:tc) + # cl=makeCluster(tc) + # registerDoParallel(cl) + # mcoptions = list(preschedule=TRUE,cores=tc) + # wmat=foreach(j=1:nrow(pred$x.test),.combine=cbind,.options.multicore=mcoptions) %dopar% { + # tempwvec=matrix(1,nrow=fit$nd,ncol=1) + # for(i in 1:fit$nd) { + # for(k in 1:ninfl) { + # if(in_rect(pred$x.test[j,],infl$hyperrects[[k]]$a[i,],infl$hyperrects[[k]]$b[i,],p)) { + # tempwvec[i,1]=tempwvec[i,1]*infl$w[i,k,drop=FALSE] + # } + # } + # } + # tempwvec + # } + # wmat=as.matrix(wmat) # our final weight matrix, should be fit$nd rows x nrow(pred$x.test) columns + # stopCluster(cl) + + # Original serial version of the above parallel R code: + wmat=matrix(1,nrow=fit$nd,ncol=nrow(pred$x.test)) # our final weight matrix + + for(i in 1:fit$nd) { + for(j in 1:nrow(pred$x.test)) { + for(k in 1:ninfl) { + if(in_rect(pred$x.test[j,],infl$hyperrects[[k]]$a[i,],infl$hyperrects[[k]]$b[i,],p)) { + # if(wmat[i,j]==1) + # wmat[i,j]=infl$w[i,k,drop=FALSE] + # else + wmat[i,j]=wmat[i,j]*infl$w[i,k,drop=FALSE] + } + } + } + } + + + wbar=apply(wmat,2,mean) + # res$mdraws=res$mdraws*wmat + # res$mdraws=t(t(res$mdraws)/wbar) + res$wmat=wmat + res$wbar=wbar + res$mmean=rep(0,nrow(pred$x.test)) + res$msd=rep(0,nrow(pred$x.test)) + res$m.5=rep(0,nrow(pred$x.test)) + res$m.lower=rep(0,nrow(pred$x.test)) + res$m.upper=rep(0,nrow(pred$x.test)) + for(i in 1:nrow(pred$x.test)) { + res$mmean[i]=wtd.mean(res$mdraws[,i],res$wmat[,i],normwt=TRUE) + res$msd[i]=sqrt(wtd.var(res$mdraws[,i],res$wmat[,i],normwt=TRUE)) + res$m.5[i]=wtd.quantile(res$mdraws[,i],res$wmat[,i],probs=c(0.5),normwt=TRUE) + res$m.lower[i]=wtd.quantile(res$mdraws[,i],res$wmat[,i],probs=c(pred$q.lower),normwt=TRUE) + res$m.upper[i]=wtd.quantile(res$mdraws[,i],res$wmat[,i],probs=c(pred$q.upper),normwt=TRUE) + + } + + } + + if(infl$method=="is_local_l2") + { + if(is.null(fit)) stop("Method is_local_* requires fitted OpenBT_posterior in object fit\n") + + res$mdraws=pred$mdraws + wmat=matrix(1/fit$nd,nrow=fit$nd,ncol=nrow(pred$x.test)) + + for(j in 1:ninfl) { + dvec=rep(0,nrow(pred$x.test)) + for(k in 1:nrow(pred$x.test)) + dvec[k]=dist(rbind(pred$x.test[k,],infl$x.infl[dropid[[j]],])) + idx=which(dvec<0.2) + for(i in 1:fit$nd) + for(k in idx) + if(wmat[i,k]==1/fit$nd) wmat[i,k]=infl$w[i,j,drop=FALSE] + else wmat[i,k]=wmat[i,k]*infl$w[i,j,drop=FALSE] + } + + wbar=apply(wmat,2,mean) + # res$mdraws=res$mdraws*wmat + # res$mdraws=t(t(res$mdraws)/wbar) + res$wmat=wmat + res$wbar=wbar + res$mmean=rep(0,nrow(pred$x.test)) + res$msd=rep(0,nrow(pred$x.test)) + res$m.5=rep(0,nrow(pred$x.test)) + res$m.lower=rep(0,nrow(pred$x.test)) + res$m.upper=rep(0,nrow(pred$x.test)) + for(i in 1:nrow(pred$x.test)) { + res$mmean[i]=wtd.mean(res$mdraws[,i],res$wmat[,i],normwt=TRUE) + res$msd[i]=sqrt(wtd.var(res$mdraws[,i],res$wmat[,i],normwt=TRUE)) + res$m.5[i]=wtd.quantile(res$mdraws[,i],res$wmat[,i],probs=c(0.5),normwt=TRUE) + res$m.lower[i]=wtd.quantile(res$mdraws[,i],res$wmat[,i],probs=c(pred$q.lower),normwt=TRUE) + res$m.upper[i]=wtd.quantile(res$mdraws[,i],res$wmat[,i],probs=c(pred$q.upper),normwt=TRUE) + + } + } + + + + res$sdraws=pred$sdraws + # res$mmean=apply(res$mdraws,2,mean) + res$smean=pred$smean + # res$msd=apply(res$mdraws,2,sd) + res$ssd=pred$ssd + # res$m.5=apply(res$mdraws,2,quantile,0.5) + # res$m.lower=apply(res$mdraws,2,quantile,pred$q.lower) + # res$m.upper=apply(res$mdraws,2,quantile,pred$q.upper) + res$s.5=pred$s.5 + res$s.lower=pred$s.lower + res$s.upper=pred$s.upper + res$pdraws=pred$pdraws + res$pmean=pred$pmean + res$psd=pred$psd + res$p.5=pred$p.5 + res$p.lower=pred$p.lower + res$p.upper=pred$p.upper + res$q.lower=pred$q.lower + res$q.upper=pred$q.upper + res$modeltype=pred$modeltype + + class(res)="OpenBT_predict" + return(res) +} + + +# Calculate various metrics of influence for a regression tree model. +influence.openbt<-function(x.infl,y.infl,fit=NULL,infl.obs=NULL,dvec=NULL,tc=2,method="is_local_int") +{ + if(is.null(fit)) stop("Model fit object required.\n") + if(class(fit)!="OpenBT_posterior") stop("Model fit object not recognized.\n") + if(is.null(infl.obs)) stop("No hold-out drop id specified for reweighting (infl.obs=NULL).\n") + if(min(infl.obs)<1) stop("Invalid drop id infl.obs=",infl.obs,".\n") + if(max(infl.obs)>nrow(x.infl)) stop("Invalid drop id infl.obs=",infl.obs,".\n") + if(method=="is_local_l1" && is.null(dvec)) stop("Invalid dvec for method is_local_l1.\n") + + nd=fit$nd + # np=nrow(x.infl) + ninfl=length(infl.obs) + w=matrix(0,nrow=nd,ncol=ninfl) + wsum=rep(1,ninfl) + + if(ninfl==0){ + stop("Argument infl.obs has no influential observations\n") + } + + + pp=predict.openbt(object=fit,x.test=x.infl,tc=tc) + + if(method=="is_global") { + # w is an nd by np matrix where the j_th column contains the weights for each + # nd posterior samples if (x_j,y_j) were removed from the dataset. + for(kk in 1:ninfl) + for(i in 1:nd) w[i,kk]=(2*pi)^(1/2)*pp$sdraws[i,infl.obs[kk]]*exp(1/(2*pp$sdraws[i,infl.obs[kk]]^2)*(y.infl[infl.obs[kk]]-pp$mdraws[i,infl.obs[kk]])^2) +# w=apply(w,1,prod) # iid BART likelihood => we just combine the different terms productwise. +# wsum=sum(w) + } + # end IS_GLOBAL + + if(method=="is_local_int" || method=="is_local_union" || method=="is_local_uint" || method=="is_local_l1") + { + p=length(fit$xicuts) + m=fit$m + mh=fit$mh + nd=fit$nd + modelname=fit$modelname + hrects.int=list() + hrects.uint=list() + hrects.union=list() + hrects.l1=list() +# w=matrix(0,nrow=nd,ncol=ninfl) +# wsum=rep(1,ninfl) + + for(kk in 1:ninfl) { + xd=x.infl[infl.obs[kk],] + if(!is.vector(xd)) stop("xd is not a px1 vector in x.infl.\n") + + #-------------------------------------------------- + #write out config file + fout=file(paste(fit$folder,"/config.influence",sep=""),"w") + writeLines(c(fit$modelname,fit$xiroot, + paste(nd),paste(m), + paste(mh),paste(p),paste(xd),paste(fit$minx), + paste(fit$maxx),paste(tc)) ,fout) + close(fout) + + + #-------------------------------------------------- + #run influential hyperrectangle program + cmdopt=100 #default to serial/OpenMP + runlocal=FALSE + cmd="openbtcli --conf" + if(Sys.which("openbtcli")[[1]]=="") # not installed in a global locaiton, so assume current directory + runlocal=TRUE + + if(runlocal) cmd="./openbtcli --conf" + + cmdopt=system(cmd) + + if(cmdopt==101) # MPI + { + cmd=paste("mpirun -np ",tc," openbtinfl ",fit$folder,sep="") + } + + if(cmdopt==100) # serial/OpenMP + { + if(runlocal) + cmd=paste("./openbtinfl ",fit$folder,sep="") + else + cmd=paste("openbtinfl ",fit$folder,sep="") + } + + system(cmd) + system(paste("rm -f ",fit$folder,"/config.influence",sep="")) + + #-------------------------------------------------- + #read in result + ii=1 + a=matrix(0,nrow=nd,ncol=p) + b=matrix(0,nrow=nd,ncol=p) + au=matrix(0,nrow=nd,ncol=p) + bu=matrix(0,nrow=nd,ncol=p) + for(i in 1:tc) + { + con=file(paste(fit$folder,"/",fit$modelname,".influence",i-1,sep="")) + open(con) + s=readLines(con,1) + while(length(s)>0) { + temp=as.numeric(unlist(strsplit(s," "))) + a[ii,]=temp[1:p] + b[ii,]=temp[(p+1):(2*p)] + au[ii,]=temp[(2*p+1):(3*p)] + bu[ii,]=temp[(3*p+1):(4*p)] + ii=ii+1 + s=readLines(con,1) + } + close(con) + } + + system(paste("rm -f ",fit$folder,"/",fit$modelname,".influence*",sep="")) + + hrects.int[[kk]]=list() + hrects.uint[[kk]]=list() + hrects.union[[kk]]=list() + hrects.l1[[kk]]=list() + hrects.int[[kk]][["a"]]=a + hrects.int[[kk]][["b"]]=b + hrects.uint[[kk]][["a"]]=matrix(apply(a,2,min),nrow=nd,ncol=p,byrow=TRUE) + hrects.uint[[kk]][["b"]]=matrix(apply(b,2,max),nrow=nd,ncol=p,byrow=TRUE) + hrects.union[[kk]][["a"]]=au + hrects.union[[kk]][["b"]]=bu + hrects.l1[[kk]][["a"]]=matrix(xd-dvec,nrow=nd,ncol=p,byrow=TRUE) + hrects.l1[[kk]][["b"]]=matrix(xd+dvec,nrow=nd,ncol=p,byrow=TRUE) + + # w is an nd by 1 matrix where containing the weights for each + # nd posterior samples if (x[infl.obs,],y[infl.obs]) were removed from the dataset, + # where the weight would be applied to any predictions in the corresponding hyperrect + # for this draw. Otherwise, the weight is just 1, giving usual posterior average -- this + # is handled in the reweight() function. + for(i in 1:nd) w[i,kk]=(2*pi)^(1/2)*pp$sdraws[i,infl.obs[kk]]*exp(1/(2*pp$sdraws[i,infl.obs[kk]]^2)*(y.infl[infl.obs[kk]]-pp$mdraws[i,infl.obs[kk]])^2) +# wsum[kk]=sum(w[,kk]) +# w[,kk]=w[,kk]/sum(w[,kk]) + + # Perhaps a simple indicator of which observation has a lot of influence: + # infl.max=apply(w,2,max) + # infl.mean=apply(w,2,mean) + # infl.ids=infl.obs + } + # end for kk in 1:length(infl.obs) loop + } + #end is_local_* + for(i in 1:ninfl) wsum[i]=sum(w[,i]) + + res=list() + res$nd=nd + # res$np=np + res$x.infl=x.infl + res$y.infl=y.infl + res$w=w + res$wsum=wsum + # res$infl.max=infl.max + # res$infl.mean=infl.mean + res$infl.obs=infl.obs + res$hyperrects=NULL + if(method=="is_local_int") { + res$hyperrects=hrects.int + } + if(method=="is_local_uint") { + res$hyperrects=hrects.uint + } + if(method=="is_local_union") { + res$hyperrects=hrects.union + } + if(method=="is_local_l1") { + res$hyperrects=hrects.l1 + } + res$method=method + + class(res)="OpenBT_influence" + + return(res) +} + + +summary.OpenBT_influence<-function(infl) +{ + cat("No summary method for object.\n") +} + +print.OpenBT_influence<-function(infl) +{ + cat("OpenBT Influence\n") + cat("metric: ",infl$method,"\n") + cat(infl$nd, " posterior samples.\n") + cat(infl$np, " observations.\n") + + if(infl$method=="is_global") { + idx=sort(infl$infl,decreasing=TRUE,index.return=TRUE)$ix + cat("\nTop 5 maximum weights: ", infl$infl[idx][1:5],"\n") + cat("\nTop 5 influential observations by weights: \n") + for(i in 1:5) { + cat("Input ",idx[i]," max.influence=",infl$infl[idx[i]], + " y=",infl$y.infl[idx[i]], " x=",infl$x.infl[idx[i],][1],"\n") + } + + tab=sort(table(infl$infl.ids),decreasing=TRUE) + cat("\nTop 5 influential observations by frequency: \n") + print(tab[1:5]) + } +} + +plot.OpenBT_influence<-function(infl) +{ + tab.ids=table(infl$infl.ids) + + if(infl$method=="is_global") { + par(mfrow=c(1,2)) + plot(1:infl$np,infl$infl,xlab="Observation Index",ylab="max.influence",type="h",lwd=2) + title(paste("metric: ",infl$method,sep="")) + plot(tab.ids,xlab="Observation Index",ylab="Frequency") + title(paste("metric: ",infl$method,sep="")) + } + + if(infl$method=="is_local_int" || infl$method=="is_local_union") + { + # plot hyperrectangles + plot(0,0,xlim=c(0,1),ylim=c(0,1),type="n",xlab="x1",ylab="x2") + for(i in 1:nrow(infl$hyperrects$a)) rect(infl$hyperrects$a[i,1],infl$hyperrects$a[i,2],infl$hyperrects$b[i,1],infl$hyperrects$b[i,2],col=gray(0.5,alpha=0.1),border=NA) + } +} + + + +# Computer Model Calibration a BART emulator and BART discrepancy. +calibrate.openbt<-function(xc,yc,xf,yf,xdims=NULL,caldims=NULL, + calprior="normal",calpriora=NULL,calpriorb=NULL, + ntreec=NULL,ntreef=NULL,ndpost=1000, + nskip=100,kc=NULL,kf=NULL,power=2,base=0.95,tc=2, + sigmac=rep(1,length(yc)),sigmaf=rep(1,length(yf)), + fmeanc=0.0,fmeanf=0.0,overallsdc=NULL, + overallnuc=NULL,overallsdf=NULL,overallnuf=NULL, + chv=cor(xc,method="spearman"),pbd=0.7,pb=0.5, + stepwpert=0.1,probchv=0.1,minnumbot=5,printevery=100, + numcut=100,xicuts=NULL,nadapt=1000,adaptevery=100, + model="bart",modelname="model",summarystats=FALSE) +{ + if(is.null(xdims)) stop("xdims must be specified.\n") + if(is.null(caldims)) stop("caldims must be specified.\n") + numx=length(xdims) + numcal=length(caldims) + if(ncol(xc)!=numx+numcal) stop("xc not compatible with xdims+caldims.\n") + if(ncol(xf)!=numx) stop("xf not compatible with xdims.\n") + if(is.null(calprior)) stop("Invalid calibration parameter prior type specified.\n") + if(is.null(calpriora)) stop("Invalid calibration parameter prior: calpriora.\n") + if(is.null(calpriorb)) stop("Invalid calibration parameter prior: calpriorb.\n") + + if(length(calprior)==1) calprior=rep(calprior,numcal) + if(length(calpriora)==1) calpriora=rep(calpriora,numcal) + if(length(calpriorb)==1) calpriorb=rep(calpriorb,numcal) + + if(length(calprior)!=numcal || length(calpriora)!=numcal || length(calpriorb)!=numcal) + stop("Invalid calibration parameter prior: does not match number of calibration parameters.\n") + + calpriortype=rep(NULL,numcal) + for(i in 1:numcal) { + if(calprior[i]=="normal") calpriortype[i]=1 + if(calprior[i]=="uniform") calpriortype[i]=0 + if(is.null(calpriortype[i])) stop("Invalid calibration parameter prior type specified.\n") + if(calpriortype[i]==1 && calpriorb[i]<=0) stop("Invalid variance specified in calibration parameter prior.\n") + } + + ntreehc=NULL + ntreehf=NULL + #-------------------------------------------------- + # model type definitions + modeltype=rep(0,2) # undefined + MODEL_BT=1 + MODEL_BINOMIAL=2 + MODEL_POISSON=3 + MODEL_BART=4 + MODEL_HBART=5 + MODEL_PROBIT=6 + MODEL_MODIFIEDPROBIT=7 + MODEL_MERCK_TRUNCATED=8 + + + if(length(model)==1) model=rep(model,2) + if( (model[1]!="bart" && model[1]!="hbart") || (model[2]!="bart" && model[2]!="hbart") ) + { + cat("Model type not specified.\n") + cat("Available options are:\n") + cat("model='bart'\n") + cat("model='hbart'\n") + + stop("missing model type.\n") + } + if(model[1]=="bart") + { + modeltype[1]=MODEL_BART + if(is.null(ntreec)) ntreec=200 + if(is.null(ntreehc)) ntreehc=1 + if(is.null(kc)) kc=2 + if(is.null(overallsdc)) overallsdc=sd(yc) + if(is.null(overallnuc)) overallnuc=10 + pbdc=c(pbd[1],0.0) + } + if(model[1]=="hbart") + { + modeltype=MODEL_HBART + if(is.null(ntreec)) ntreec=200 + if(is.null(ntreehc)) ntreehc=40 + if(is.null(kc)) kc=5 + if(is.null(overallsdc)) overallsdc=sd(yc) + if(is.null(overallnuc)) overallnuc=10 + pbdc=pbd + } + if(model[2]=="bart") + { + modeltype[2]=MODEL_BART + if(is.null(ntreef)) ntreef=200 + if(is.null(ntreehf)) ntreehf=1 + if(is.null(kf)) kf=2 + if(is.null(overallsdf)) overallsdf=sd(yf) + if(is.null(overallnuf)) overallnuf=10 + pbdf=c(pbd[1],0.0) + } + if(model[2]=="hbart") + { + modeltype[2]=MODEL_HBART + if(is.null(ntreef)) ntreef=200 + if(is.null(ntreehf)) ntreehf=40 + if(is.null(kf)) kf=5 + if(is.null(overallsdf)) overallsdf=sd(yf) + if(is.null(overallnuf)) overallnuf=10 + pbdf=pbd + } + + + #-------------------------------------------------- + nd = ndpost + burn = nskip + mc = ntreec + mhc = ntreehc + mf = ntreef + mhf = ntreehf + #-------------------------------------------------- + # simulator data + nc = length(yc) + p = ncol(xc) + pdelta = ncol(xf) + xpred = matrix(NA,nrow=nf,ncol=p) + for(i in 1:numcal) { + if(calpriortype[i]) + xpred[,caldims[i]]=calpriora[i] + else + xpred[,caldims[i]]=calpriora[i]+(calpriorb[i]-calpriora[i])/2.0 + } + for(i in 1:pdelta) { + xpred[,xdims[i]]=xf[,i] + } +# xpred = t(xpred) +# xc = t(xc) + nf = length(yf) +# xf = t(xf) + # if(modeltype==MODEL_BART || modeltype==MODEL_HBART) + # { + yc=yc-fmeanc + yf=yf-fmeanf + fmeanc.out=paste(fmeanc) + fmeanf.out=paste(fmeanf) + # } + # if(modeltype==MODEL_PROBIT || modeltype==MODEL_MODIFIEDPROBIT) + # { + # fmean.out=paste(qnorm(fmean)) + # uniqy=sort(unique(y.train)) + # if(length(uniqy)>2) stop("Invalid y.train: Probit requires dichotomous response coded 0/1") + # if(uniqy[1]!=0 || uniqy[2]!=1) stop("Invalid y.train: Probit requires dichotomous response coded 0/1") + # } + #-------------------------------------------------- + #cutpoints + if(!is.null(xicuts)) # use xicuts + { + xic=xicuts + } + else # default to equal numcut per dimension + { + xic=vector("list",p) + minxc=floor(apply(xc,2,min)) + maxxc=ceiling(apply(xc,2,max)) + for(i in 1:p) + { + xinc=(maxxc[i]-minxc[i])/(numcut+1) + xic[[i]]=(1:numcut)*xinc+minxc[i] + } + } + xif=vector("list",numx) + for(i in 1:numx) + xif[[i]]=xic[[xdims[i]]] + + #-------------------------------------------------- + if(modeltype[1]==MODEL_BART || modeltype[1]==MODEL_HBART) + { + rgc = range(yc) + } + if(modeltype[2]==MODEL_BART || modeltype[2]==MODEL_HBART) + { + rgf = range(yf) + } + # if(modeltype==MODEL_PROBIT || modeltype==MODEL_MODIFIEDPROBIT) + # { + # rgy = c(-2,2) + # } + tauc = (rgc[2]-rgc[1])/(2*sqrt(mc)*kc) + tauf = (rgf[2]-rgf[1])/(2*sqrt(mf)*kf) + + #-------------------------------------------------- + overalllambdac = overallsdc^2 + overalllambdaf = overallsdf^2 + #-------------------------------------------------- + powerh=power + baseh=base + if(length(power)>1) { + powerh=power[2] + power=power[1] + } + if(length(base)>1) { + baseh=base[2] + base=base[1] + } + #-------------------------------------------------- + pbdch=pbdc + pbdfh=pbdf + pbh=pb + if(length(pbdc)>1) { + pbdch=pbdc[2] + pbdc=pbdc[1] + } + if(length(pbdf)>1) { + pbdfh=pbdf[2] + pbdf=pbdf[1] + } + if(length(pb)>1) { + pbh=pb[2] + pb=pb[1] + } + cat("Model: Bayesian Regression Tree based Computer Model Calibration\n") + #-------------------------------------------------- + if(modeltype[1]==MODEL_BART) + { + cat("\tEmulator: BART\n") + } + #-------------------------------------------------- + if(modeltype[1]==MODEL_HBART) + { + cat("\tEmulator: Heteroscedastic BART\n") + } + #-------------------------------------------------- + if(modeltype[2]==MODEL_BART) + { + cat("\tDiscrepancy: BART\n") + } + #-------------------------------------------------- + if(modeltype[2]==MODEL_HBART) + { + cat("\tDiscrepancy: Heteroscedastic BART\n") + } + + #-------------------------------------------------- + stepwperth=stepwpert + if(length(stepwpert)>1) { + stepwperth=stepwpert[2] + stepwpert=stepwpert[1] + } + #-------------------------------------------------- + probchvh=probchv + if(length(probchv)>1) { + probchvh=probchv[2] + probchv=probchv[1] + } + #-------------------------------------------------- + minnumboth=minnumbot + if(length(minnumbot)>1) { + minnumboth=minnumbot[2] + minnumbot=minnumbot[1] + } + + #-------------------------------------------------- + #write out config file + xcroot="xc" + ycroot="yc" + scroot="sc" + chgvcroot="chgvc" + xicroot="xic" + xfroot="xf" + yfroot="yf" + sfroot="sf" + xproot="xpred" + chgvfroot="chgvf" + xifroot="xif" + folder=tempdir(check=TRUE) + if(!dir.exists(folder)) dir.create(folder) + tmpsubfolder=tempfile(tmpdir="") + tmpsubfolder=substr(tmpsubfolder,5,nchar(tmpsubfolder)) + tmpsubfolder=paste("openbt",tmpsubfolder,sep="") + folder=paste(folder,"/",tmpsubfolder,sep="") + if(!dir.exists(folder)) dir.create(folder) + fout=file(paste(folder,"/config.calibrate",sep=""),"w") + writeLines(c(paste(modeltype),xcroot,ycroot,xfroot,yfroot,xproot,fmeanc.out,fmeanf.out, + paste(mc),paste(mhc),paste(mf),paste(mhf),paste(nd),paste(burn), + paste(nadapt),paste(adaptevery),paste(tauc),paste(overalllambdac), + paste(overallnuc),paste(tauf),paste(overalllambdaf),paste(overallnuf), + paste(base),paste(power),paste(baseh),paste(powerh), + paste(tc),paste(scroot),paste(sfroot),paste(chgvcroot),paste(chgvfroot), + paste(pb),paste(pbdc),paste(pbdch), + paste(pbdf),paste(pbdfh),paste(stepwpert),paste(stepwperth), + paste(probchv),paste(probchvh),paste(minnumbot),paste(minnumboth), + paste(printevery),paste(xicroot),paste(xifroot),paste(modelname), + paste(as.integer(summarystats)),paste(numcal),paste(caldims-1),paste(calpriortype), + paste(calpriora),paste(calpriorb)),fout) + close(fout) + + + #-------------------------------------------------- + #write out data subsets + nslv=tc-1 + # yc data + yclist=split(yc,(seq(nc)-1) %/% (nc/nslv)) + for(i in 1:nslv) write(yclist[[i]],file=paste(folder,"/",ycroot,i,sep="")) + xclist=split(as.data.frame(xc),(seq(nc)-1) %/% (nc/nslv)) + for(i in 1:nslv) write(t(xclist[[i]]),file=paste(folder,"/",xcroot,i,sep="")) +#stop("done") + + sclist=split(sigmac,(seq(nc)-1) %/% (nc/nslv)) + for(i in 1:nslv) write(sclist[[i]],file=paste(folder,"/",scroot,i,sep="")) + chv[is.na(chv)]=0 # if a var has 0 levels it will have a cor of NA so we'll just set those to 0. + write(chv,file=paste(folder,"/",chgvcroot,sep="")) + for(i in 1:p) write(xic[[i]],file=paste(folder,"/",xicroot,i,sep="")) + # yf data + yflist=split(yf,(seq(nf)-1) %/% (nf/nslv)) + for(i in 1:nslv) write(yflist[[i]],file=paste(folder,"/",yfroot,i,sep="")) + xflist=split(as.data.frame(xf),(seq(nf)-1) %/% (nf/nslv)) + for(i in 1:nslv) write(t(xflist[[i]]),file=paste(folder,"/",xfroot,i,sep="")) + sflist=split(sigmaf,(seq(nf)-1) %/% (nf/nslv)) + for(i in 1:nslv) write(sflist[[i]],file=paste(folder,"/",sfroot,i,sep="")) + write(chv[xdims,xdims],file=paste(folder,"/",chgvfroot,sep="")) + for(i in 1:pdelta) write(xif[[i]],file=paste(folder,"/",xifroot,i,sep="")) + rm(chv) + # xpred + xplist=split(as.data.frame(xpred),(seq(nf)-1) %/% (nf/nslv)) + for(i in 1:nslv) write(t(xplist[[i]]),file=paste(folder,"/",xproot,i,sep="")) + + #-------------------------------------------------- + #run program + cmdopt=100 #default to serial/OpenMP + runlocal=FALSE + cmd="openbtcalibrate --conf" + if(Sys.which("openbtcalibrate")[[1]]=="") # not installed in a global locaiton, so assume current directory + runlocal=TRUE + + if(runlocal) cmd="./openbtcalibrate --conf" + + cmdopt=system(cmd) + + if(cmdopt==101) # MPI + { + cmd=paste("mpirun -np ",tc," openbtcalibrate ",folder,sep="") + } + + if(cmdopt==100) # serial/OpenMP + { + if(runlocal) + cmd=paste("./openbtcalibrate ",folder,sep="") + else + cmd=paste("openbtcalibrate ",folder,sep="") + } + # + system(cmd) + system(paste("rm -f ",folder,"/config.calibrate",sep="")) + + + #-------------------------------------------------- + # load in the theta posterior draws + thetas=scan(paste(folder,"/model.eta.theta.fit",sep=""),quiet=TRUE) + thetas=matrix(thetas,ncol=numcal,byrow=T) + + + res=list() + res$modeltype=modeltype + res$model=model + res$xcroot=xcroot; res$ycroot=ycroot; + #res$xfroot=xfroot; res$yfroot=yfroot; + res$xproot=xproot; + res$mc=mc; res$mhc=mhc; + #res$mf=mf; res$mhf=mhf; + res$p=p; + #res$pdelta=pdelta; + res$nd=nd; res$burn=burn; res$nadapt=nadapt; res$adaptevery=adaptevery; res$tauc=tauc; res$overalllambdac=overalllambdac + res$overallnuc=overallnuc; + #res$tauf=tauf; res$overalllambdaf=overalllambdaf; res$overallnuf=overallnuf; + res$base=base; res$power=power; res$baseh=baseh; res$powerh=powerh + res$k=numx+numcal; + res$tc=tc; res$scroot=scroot; + #res$sfroot=sfroot; + res$chgvcroot=chgvcroot; + #res$chgvfroot=chgvfroot; + res$pb=pb; res$pbdc=pbdc; res$pbdch=pbdch; + #res$pbdf=pbdf; res$pbdfh=pbdfh; + res$stepwpert=stepwpert; res$stepwperth=stepwperth; + res$probchv=probchv; res$probchvh=probchvh; res$minnumbot=minnumbot; res$minnumboth=minnumboth + res$printevery=printevery; res$xicroot=xicroot; + #res$xifroot=xifroot; + res$minxc=minxc; res$maxxc=maxxc; + res$xdims=xdims; res$numcal=numcal; res$caldims=caldims; res$calpriortype=calpriortype; res$calpriora=calpriora; + res$calpriorb=calpriorb; res$summarystats=summarystats; res$modelname=paste(modelname,".eta",sep="") + class(xic)="OpenBT_cutinfo" + class(xif)="OpenBT_cutinfo" + res$xiccuts=xic +# res$xifcuts=xif + res$fmeanc=fmeanc.out; +# res$fmeanf=fmeanf.out; + res$theta=thetas + res$folder=folder + + res$delta$modeltype=modeltype[2] + res$delta$model=model + res$delta$xroot=xfroot + res$delta$yroot=yfroot + res$delta$xproot=xproot + res$delta$k=numx + res$delta$m=mf + res$delta$mh=mhf + res$delta$p=pdelta + res$delta$nd=nd + res$delta$burn=burn + res$delta$nadapt=nadapt + res$delta$adaptevery=adaptevery + res$delta$tau=tauf + res$delta$overalllambda=overalllambdaf + res$delta$overallnu=overallnuf + res$delta$base=base + res$delta$power=power + res$delta$baseh=baseh + res$delta$powerh=powerh + res$delta$chgvroot=chgvfroot + res$delta$pb=pb + res$delta$pbd=pbdf + res$delta$pbdh=pbdfh + res$delta$stepwpert=stepwpert + res$delta$stepwperth=stepwperth + res$delta$probchv=probchv + res$delta$probchvh=probchvh + res$delta$minnumbot=minnumbot + res$delta$minnumboth=minnumboth + res$delta$printevery=printevery + res$delta$xiroot=xifroot + res$delta$summarystats=summarystats + res$delta$modelname=paste(modelname,".delta",sep="") + res$delta$xicuts=xif + res$delta$fmean=fmeanf.out + res$delta$folder=folder + class(res$delta)="OpenBT_posterior" + + + class(res)="OpenBT_calibrate_posterior" + + return(res) +} + + +emulate.openbt = function( +fit=NULL, +x.test=NULL, +tc=2, +fmeanc=fit$fmeanc, +fmeanf=fit$delta$fmean, +q.lower=0.025, +q.upper=0.975 +) +{ + + # model type definitions + MODEL_BT=1 + MODEL_BINOMIAL=2 + MODEL_POISSON=3 + MODEL_BART=4 + MODEL_HBART=5 + MODEL_PROBIT=6 + MODEL_MODIFIEDPROBIT=7 + MODEL_MERCK_TRUNCATED=8 + + #-------------------------------------------------- + # params + if(is.null(fit)) stop("No fitted model specified!\n") + if(is.null(x.test)) stop("No emulation points specified!\n") + if(class(fit)!="OpenBT_calibrate_posterior") stop("Invalid object passed to emulate.openbt()\n") + + xdims=fit$xdims + caldims=fit$caldims + nslv=tc + x.test=as.matrix(x.test) + pdelta=ncol(x.test) + if(pdelta!=fit$delta$p) stop("x.test has incorrect dimensions.\n") + n=nrow(x.test) + p=fit$p + x.eta.test=matrix(0,nrow=n,ncol=p) + x.eta.test[,xdims]=x.test + + # Get discrepancy posterior first + cat("Drawing from discrepancy posterior...\n") + fit.delta=predict.openbt(fit$delta,x.test=x.test,tc=tc,fmean=fmeanf) + + # Get eta posterior next + cat("Drawing from emulator posterior...\n") + xeroot="xe" + xdroot="xd" + + if( (fit$modeltype[1]==MODEL_BART || fit$modeltype[1]==MODEL_HBART) && (fit$modeltype[2]==MODEL_BART || fit$modeltype[2]==MODEL_HBART) ) + { + fmeanc.out=paste(fmeanc) + fmeanf.out=paste(fmeanf) + } + else stop("Invalid model passed to emulate.openbt()\n") + # if(fit$modeltype==MODEL_PROBIT || fit$modeltype==MODEL_MODIFIEDPROBIT) + # { + # fmean.out=paste(qnorm(fmean)) + # } + + + #-------------------------------------------------- + #write out config file + fout=file(paste(fit$folder,"/config.emulate",sep=""),"w") + writeLines(c(paste(fit$modelname),paste(fit$modeltype),fit$xicroot, + xeroot,paste(fit$nd),paste(fit$mc), + paste(fit$mhc),paste(p),paste(pdelta), + paste(fit$numcal),paste(caldims-1),paste(tc), + fmeanc.out), fout) + close(fout) + + + + #-------------------------------------------------- + #write out data subsets + #folder=paste(".",fit$modelname,"/",sep="") + xelist=split(as.data.frame(x.eta.test),(seq(n)-1) %/% (n/nslv)) + for(i in 1:nslv) write(t(xelist[[i]]),file=paste(fit$folder,"/",xeroot,i-1,sep="")) + xdlist=split(as.data.frame(x.test),(seq(n)-1) %/% (n/nslv)) + for(i in 1:nslv) write(t(xdlist[[i]]),file=paste(fit$folder,"/",xdroot,i-1,sep="")) + for(i in 1:p) write(fit$xiccuts[[i]],file=paste(fit$folder,"/",fit$xicroot,i,sep="")) + for(i in 1:pdelta) write(fit$xifcuts[[i]],file=paste(fit$folder,"/",fit$xifroot,i,sep="")) + + + #-------------------------------------------------- + #run prediction program + cmdopt=100 #default to serial/OpenMP + runlocal=FALSE + cmd="openbtcli --conf" + if(Sys.which("openbtcli")[[1]]=="") # not installed in a global location, so assume current directory + runlocal=TRUE + + if(runlocal) cmd="./openbtcli --conf" + + cmdopt=system(cmd) + + if(cmdopt==101) # MPI + { + cmd=paste("mpirun -np ",tc," openbtemulate ",fit$folder,sep="") + } + + if(cmdopt==100) # serial/OpenMP + { + if(runlocal) + cmd=paste("./openbtemulate ",fit$folder,sep="") + else + cmd=paste("openbtemulate ",fit$folder,sep="") + } + + #cmd=paste("mpirun -np ",tc," openbtpred",sep="") + #cat(cmd) + system(cmd) + system(paste("rm -f ",fit$folder,"/config.emulate",sep="")) + + +#-------------------------------------------------- +#format and return +res=list() + + +# Read in eta posterior +fnames=list.files(fit$folder,pattern=paste(fit$modelname,".mdraws*",sep=""),full.names=TRUE) +res$mdraws=as.matrix(do.call(cbind,sapply(fnames,data.table::fread))) +fnames=list.files(fit$folder,pattern=paste(fit$modelname,".sdraws*",sep=""),full.names=TRUE) +res$sdraws=as.matrix(do.call(cbind,sapply(fnames,data.table::fread))) + +system(paste("rm -f ",fit$folder,"/",fit$modelname,".mdraws*",sep="")) +system(paste("rm -f ",fit$folder,"/",fit$modelname,".sdraws*",sep="")) + +res$mmean=apply(res$mdraws,2,mean) +res$smean=apply(res$sdraws,2,mean) +res$msd=apply(res$mdraws,2,sd) +res$ssd=apply(res$sdraws,2,sd) +res$m.5=apply(res$mdraws,2,quantile,0.5) +res$m.lower=apply(res$mdraws,2,quantile,q.lower) +res$m.upper=apply(res$mdraws,2,quantile,q.upper) +res$s.5=apply(res$sdraws,2,quantile,0.5) +res$s.lower=apply(res$sdraws,2,quantile,q.lower) +res$s.upper=apply(res$sdraws,2,quantile,q.upper) +res$pdraws=NULL +res$pmean=NULL +res$psd=NULL +res$p.5=NULL +res$p.lower=NULL +res$p.upper=NULL +res$zdraws=res$mdraws+fit.delta$mdraws +res$zmean=apply(res$zdraws,2,mean) +res$zsd=apply(res$zdraws,2,sd) +res$z.5=apply(res$zdraws,2,quantile,0.5) +res$z.lower=apply(res$zdraws,2,quantile,q.lower) +res$z.upper=apply(res$zdraws,2,quantile,q.upper) + +if(fit$modeltype==MODEL_PROBIT || fit$modeltype==MODEL_MODIFIEDPROBIT) +{ + res$pdraws=read.table(paste(fit$folder,"/",fit$modelname,".pdraws",0,sep="")) + for(i in 2:nslv) + { + res$pdraws=cbind(res$pdraws,read.table(paste(fit$folder,"/",fit$modelname,".pdraws",i-1,sep=""))) + } + + system(paste("rm -f ",fit$folder,"/",fit$modelname,".pdraws*",sep="")) + + res$pdraws=as.matrix(res$pdraws) + res$pmean=apply(res$pdraws,2,mean) + res$psd=apply(res$pdraws,2,sd) + res$p.5=apply(res$pdraws,2,quantile,0.5) + res$p.lower=apply(res$pdraws,2,quantile,q.lower) + res$p.upper=apply(res$pdraws,2,quantile,q.upper) +} + +res$q.lower=q.lower +res$q.upper=q.upper +res$modeltype=fit$modeltype + +res$delta=fit.delta + +class(res)="OpenBT_emulate" + +return(res) +} + + + +# Scan the trees in the posterior to extract tree properties +# Returns the mean trees as a list of lists in object mt +# and the variance trees as a list of lists in object st. +# The format is mt[[i]][[j]] is the jth posterior tree from the ith posterior +# sum-of-trees (ensemble) sample. +# The tree is encoded in 4 vectors - the node ids, the node variables, +# the node cutpoints and the node thetas. +openbt.scanpost<-function(post) +{ + fp=file(paste(post$folder,"/",post$modelname,".fit",sep=""),open="r") + if(scan(fp,what=integer(),nmax=1,quiet=TRUE) != post$nd) stop("Error scanning posterior\n") + if(scan(fp,what=integer(),nmax=1,quiet=TRUE) != post$m) stop("Error scanning posterior\n") + if(scan(fp,what=integer(),nmax=1,quiet=TRUE) != post$mh) stop("Error scanning posterior\n") + if(scan(fp,what=integer(),nmax=1,quiet=TRUE) != post$nd*post$m) stop("Error scanning posterior\n") + + # scan mean trees + numnodes=scan(fp,what=integer(),nmax=post$nd*post$m,quiet=TRUE) + lenvec=scan(fp,what=integer(),nmax=1,quiet=TRUE) + ids=scan(fp,what=integer(),nmax=lenvec,quiet=TRUE) + lenvec=scan(fp,what=integer(),nmax=1,quiet=TRUE) + vars=scan(fp,what=integer(),nmax=lenvec,quiet=TRUE) + lenvec=scan(fp,what=integer(),nmax=1,quiet=TRUE) + cuts=scan(fp,what=integer(),nmax=lenvec,quiet=TRUE) + lenvec=scan(fp,what=integer(),nmax=1,quiet=TRUE) + thetas=scan(fp,what=double(),nmax=lenvec,quiet=TRUE) + + # scan var trees + if(scan(fp,what=integer(),nmax=1,quiet=TRUE) != post$nd*post$mh) stop("Error scanning posterior\n") + snumnodes=scan(fp,what=integer(),nmax=post$nd*post$mh,quiet=TRUE) + lenvec=scan(fp,what=integer(),nmax=1,quiet=TRUE) + sids=scan(fp,what=integer(),nmax=lenvec,quiet=TRUE) + lenvec=scan(fp,what=integer(),nmax=1,quiet=TRUE) + svars=scan(fp,what=integer(),nmax=lenvec,quiet=TRUE) + lenvec=scan(fp,what=integer(),nmax=1,quiet=TRUE) + scuts=scan(fp,what=integer(),nmax=lenvec,quiet=TRUE) + lenvec=scan(fp,what=integer(),nmax=1,quiet=TRUE) + sthetas=scan(fp,what=double(),nmax=lenvec,quiet=TRUE) + + close(fp) + + # Now rearrange things into lists of lists so its easier to manipulate + mt=list() + ndx=2 + cs.numnodes=c(0,cumsum(numnodes)) + for(i in 1:post$nd) { + ens=list() + for(j in 1:post$m) + { + tree=list(id=ids[(cs.numnodes[ndx-1]+1):cs.numnodes[ndx]], + var=vars[(cs.numnodes[ndx-1]+1):cs.numnodes[ndx]], + cut=cuts[(cs.numnodes[ndx-1]+1):cs.numnodes[ndx]], + theta=thetas[(cs.numnodes[ndx-1]+1):cs.numnodes[ndx]]) + ens[[j]]=tree + ndx=ndx+1 + } + mt[[i]]=ens + } + + + st=list() + ndx=2 + cs.numnodes=c(0,cumsum(snumnodes)) + for(i in 1:post$nd) { + ens=list() + for(j in 1:post$mh) + { + tree=list(id=sids[(cs.numnodes[ndx-1]+1):cs.numnodes[ndx]], + var=svars[(cs.numnodes[ndx-1]+1):cs.numnodes[ndx]], + cut=scuts[(cs.numnodes[ndx-1]+1):cs.numnodes[ndx]], + theta=sthetas[(cs.numnodes[ndx-1]+1):cs.numnodes[ndx]]) + ens[[j]]=tree + ndx=ndx+1 + } + st[[i]]=ens + } + + return(list(mt=mt,st=st)) +} + + + +# Save a posterior tree fit from the tmp working directory +# into a local zip file given by [file].zip +# If not file option specified, uses [model name].zip as the file. +openbt.save<-function(post,fname=NULL) +{ + if(class(post)!="OpenBT_posterior") stop("Invalid object.\n") + + if(is.null(fname)) fname=post$modelname + if(substr(fname,nchar(fname)-3,nchar(fname))!=".obt") fname=paste(fname,".obt",sep="") + + save(post,file=paste(post$folder,"/post.RData",sep="")) + + zipr(fname,paste(post$folder,"/",list.files(post$folder),sep="")) + cat("Saved posterior to ",fname,"\n") +} + + +# Load a posterior tree fit from a zip file into a tmp working directory. +openbt.load<-function(fname) +{ + if(substr(fname,nchar(fname)-3,nchar(fname))!=".obt") fname=paste(fname,".obt",sep="") + + folder=tempdir(check=TRUE) + if(!dir.exists(folder)) dir.create(folder) + tmpsubfolder=tempfile(tmpdir="") + tmpsubfolder=substr(tmpsubfolder,5,nchar(tmpsubfolder)) + tmpsubfolder=paste("openbt",tmpsubfolder,sep="") + folder=paste(folder,"/",tmpsubfolder,sep="") + if(!dir.exists(folder)) dir.create(folder) + + unzip(fname,exdir=folder) + post=loadRData(paste(folder,"/post.RData",sep="")) + post$folder=folder + + return(post) +} + + +# This is so lame. Seriously. +loadRData <- function(fname) +{ + load(fname) + get(ls()[ls() != "fname"]) +} + + +print.OpenBT_posterior<-function(post) +{ + MODEL_BT=1 + MODEL_BINOMIAL=2 + MODEL_POISSON=3 + MODEL_BART=4 + MODEL_HBART=5 + MODEL_PROBIT=6 + MODEL_MODIFIEDPROBIT=7 + MODEL_MERCK_TRUNCATED=8 + + cat("OpenBT Posterior\n") + cat("Model type: ") + if(post$modeltype==MODEL_BART) + { + cat("Bayesian Additive Regression Trees model (BART)\n") + cat("k=",post$k,"\n") + cat("tau=",post$tau,"\n") + cat("lambda=",post$overalllambda,"\n") + cat("nu=",post$overallnu,"\n") + } + if(post$modeltype==MODEL_HBART) + { + cat("Heteroscedastic Bayesian Additive Regression Trees model (HBART)\n") + } + if(post$modeltype==MODEL_PROBIT) + { + cat("Dichotomous outcome model: Albert & Chib Probit (fixed)\n") + } + if(post$modeltype==MODEL_MODIFIEDPROBIT) + { + cat("Dichotomous outcome model: Modified Albert & Chib Probit\n") + } + if(post$modeltype==MODEL_MERCK_TRUNCATED) + { + cat("Truncated BART model\n") + } + + cat("ntree=", post$m, " \n",sep="") + cat("ntreeh=",post$mh," \n",sep="") + cat(post$nd," posterior draws.\n") + summary(post$xicuts) +} + +summary.OpenBT_posterior<-function(post) +{ + cat("No summary method for object.\n") +} + +print.OpenBT_predict<-function(pred) +{ + MODEL_BT=1 + MODEL_BINOMIAL=2 + MODEL_POISSON=3 + MODEL_BART=4 + MODEL_HBART=5 + MODEL_PROBIT=6 + MODEL_MODIFIEDPROBIT=7 + MODEL_MERCK_TRUNCATED=8 + + cat("OpenBT Prediction\n") + cat(ncol(pred$mdraws), " prediction locations.\n") + cat(nrow(pred$mdraws), " realizations.\n") + if(pred$modeltype==MODEL_PROBIT || pred$modeltype==MODEL_MODIFIEDPROBIT) + { + cat("Probability quantiles: ",pred$q.lower,",",pred$q.upper,"\n") + } + cat("Mean quantiles: ",pred$q.lower,",",pred$q.upper,"\n") + cat("Variance quantiles: ",pred$q.lower,",",pred$q.upper,"\n\n") +} + +summary.OpenBT_predict<-function(pred) +{ + cat("No summary method for object.\n") +} + +print.OpenBT_sobol<-function(sobol) +{ + cat("OpenBT Sobol Indices\n") + cat(ncol(sobol$vidraws), " variables.\n") + cat(nrow(sobol$vidraws), " realizations.\n") +} + +summary.OpenBT_sobol<-function(sobol) +{ + cat("Summary of Posterior Sobol Sensitivity Indices\n") + + cat("Expected Sobol Indices (Mean)\n") + print(sobol$msi) + print(sobol$mtsi) + print(sobol$msij) + + cat("\nStd. Dev. of Sobol Indices (Mean)\n") + print(sobol$msi.sd) + print(sobol$tsi.sd) + print(sobol$sij.sd) +} + +plot.OpenBT_sobol<-function(sobol) +{ + par(mfrow=c(3,1)) + boxplot(sobol$sidraws,ylab="Sobol Sensitivity",main="First Order Sobol Indices",xlab="Variables") + boxplot(sobol$sijdraws,ylab="Sobol Sensitivity",main="Two-way Sobol Indices",xlab="Variables") + boxplot(sobol$tsidraws,ylab="Sobol Sensitivity",main="Total Sobol Indices",xlab="Variables") +} + +print.OpenBT_vartivity<-function(vartivity) +{ + cat("OpenBT Variable Activity\n") + cat(ncol(vartivity$vdraws), " variables.\n") + cat(nrow(vartivity$vdraws), " realizations.\n") +} + +summary.OpenBT_vartivity<-function(vartivity) +{ + cat("Summary of Posterior Variable Activity\n") + + p=length(vartivity$mvdraws) + if(p<11) + { + cat("Expected Variable Activity (Mean)\n") + mean.vartivity=vartivity$mvdraws + ix=sort(mean.vartivity,index.return=TRUE,decreasing=TRUE)$ix + mean.vartivity=round(mean.vartivity[ix],2) + names(mean.vartivity)=ix + print(mean.vartivity) + cat("Expected Variable Activity (Variance)\n") + sd.vartivity=vartivity$mvdrawsh + ix=sort(sd.vartivity,index.return=TRUE,decreasing=TRUE)$ix + sd.vartivity=round(sd.vartivity[ix],2) + names(sd.vartivity)=ix + print(sd.vartivity) + } + else + { + cat("Expected Variable Activity (Mean)\n") + mean.vartivity=vartivity$mvdraws + ix=sort(mean.vartivity,index.return=TRUE,decreasing=TRUE)$ix + mean.vartivity=round(mean.vartivity[ix],2) + rest=sum(mean.vartivity[11:p]) + mean.vartivity=mean.vartivity[1:11] + mean.vartivity[11]=rest + names(mean.vartivity)=c(ix[1:10],"...") + print(mean.vartivity) + cat("Expected Variable Activity (Variance)\n") + sd.vartivity=vartivity$mvdrawsh + ix=sort(sd.vartivity,index.return=TRUE,decreasing=TRUE)$ix + sd.vartivity=round(sd.vartivity[ix],2) + rest=sum(sd.vartivity[11:p]) + sd.vartivity=sd.vartivity[1:11] + sd.vartivity[11]=rest + names(sd.vartivity)=c(ix[1:10],"...") + print(sd.vartivity) + } +} + +plot.OpenBT_vartivity<-function(vartivity) +{ + par(mfrow=c(1,2)) + yrange=c(0,max(max(vartivity$vdraws),max(vartivity$vdrawsh))) + boxplot(vartivity$vdraws,ylab="% node splits",main="Mean Trees",xlab="Variables",ylim=yrange) + boxplot(vartivity$vdrawsh,ylab="% node splits",main="Variance Trees",xlab="Variables",ylim=yrange) +} + +summary.OpenBT_cutinfo<-function(xi) +{ + p=length(xi) + cat("Number of variables: ",p,"\n") + cat("Number of cutpoints per variable\n") + for(i in 1:p) + { + cat("Variable ",i,": ",length(xi[[i]])," cutpoints\n") + } +} + +print.OpenBT_cutinfo<-function(xi) +{ + summary.OpenBT_cutinfo(xi) +} + +# Takes the n x p design matrix and a scalar or vector of number of cutpoints +# per variable, returns a BARTcutinfo object with variables/cutpoints initalized. +makecuts.openbt<-function(x,numcuts) +{ + p=ncol(x) + if(length(numcuts)==1) + { + numcuts=rep(numcuts,p) + } + else if(ncol(x) != length(numcuts)) + { + cat("Number of variables does not equal length of numcuts vector!\n") + return(0) + } + + xi=vector("list",p) + minx=apply(x,2,min) + maxx=apply(x,2,max) + for(i in 1:p) + { + xinc=(maxx[i]-minx[i])/(numcuts[i]+1) + xi[[i]]=(1:numcuts[i])*xinc+minx[i] + } + + class(xi)="OpenBT_cutinfo" + return(xi) +} + +# Takes an existing OpenBT_cutinfo object xi and the particular variable to update, id, +# and the vector of cutpoints to manually assign to that variable, updates and returns +# the modified OpenBT_cutinfo object. +setvarcuts.openbt<-function(xi,id,cutvec) +{ + p=length(xi) + if(id>p || id<1) + { + cat("Invalid variable specified\n") + return(0) + } + xi[[id]]=cutvec + return(xi) +} + + +#-------------------------------------------------- +#Get model mixing weights +#-------------------------------------------------- +mixingwts.openbt = function( + fit=NULL, + x.test=NULL, + tc=2, + numwts=NULL, + q.lower=0.025, + q.upper=0.975 +) +{ + # model type definitions + MODEL_BT=1 + MODEL_BINOMIAL=2 + MODEL_POISSON=3 + MODEL_BART=4 + MODEL_HBART=5 + MODEL_PROBIT=6 + MODEL_MODIFIEDPROBIT=7 + MODEL_MERCK_TRUNCATED=8 + MODEL_MIXBART=9 + + model_types = c("mixbart", "mix_emulate") + #-------------------------------------------------- + # params + if(is.null(fit)) stop("No fitted model specified!\n") + if(is.null(x.test)) stop("No prediction points specified!\n") + if(is.null(numwts)){stop("Missing number of model weights parameter numwts.\n")} + if(!(fit$model %in% model_types)){stop("Wrong model type! This function is for mixbart.\n")} + #if(fit$modeltype!=MODEL_MIXBART){stop("Wrong model type! This function is for mixbart.\n")} + + nslv=tc + x.test=as.matrix(x.test) + p=ncol(x.test) + n=nrow(x.test) + xwroot="xw" + fitroot=".fit" + if(fit$model == "mixbart"){m = fit$m; mh = fit$mh}else{m = fit$mix_model_args$ntree; mh = fit$mix_model_args$ntreeh} + if(fit$model != "mixbart"){fitroot = ".fitmix"} + #-------------------------------------------------- + #write out config file + fout=file(paste(fit$folder,"/config.mxwts",sep=""),"w") + writeLines(c(fit$modelname,fit$modeltype,fit$xiroot,xwroot, + fitroot,paste(fit$nd),paste(m), + paste(mh),paste(p),paste(numwts),paste(tc)), fout) + close(fout) + + #-------------------------------------------------- + #write out data subsets + #folder=paste(".",fit$modelname,"/",sep="") + xlist=split(as.data.frame(x.test),(seq(n)-1) %/% (n/nslv)) + for(i in 1:nslv) write(t(xlist[[i]]),file=paste(fit$folder,"/",xwroot,i-1,sep="")) + for(i in 1:p) write(fit$xicuts[[i]],file=paste(fit$folder,"/",fit$xiroot,i,sep="")) + + #-------------------------------------------------- + #run prediction program + cmdopt=100 #default to serial/OpenMP + runlocal=FALSE + cmd="openbtcli --conf" + if(Sys.which("openbtcli")[[1]]=="") # not installed in a global location, so assume current directory + runlocal=TRUE + + if(runlocal) cmd="./openbtcli --conf" + + cmdopt=system(cmd) + + if(cmdopt==101) # MPI + { + cmd=paste("mpirun -np ",tc," openbtmixingwts ",fit$folder,sep="") + } + + if(cmdopt==100) # serial/OpenMP + { + if(runlocal) + cmd=paste("./openbtmixingwts ",fit$folder,sep="") + else + cmd=paste("openbtmixingwts ",fit$folder,sep="") + } + + system(cmd) + system(paste("rm -f ",fit$folder,"/config.mxwts",sep="")) + + + #-------------------------------------------------- + #format and return + res=list() + wt_list = list() + mean_matrix = sd_matrix = ub_matrix = lb_matrix = med_matrix = matrix(0, nrow = n, ncol = numwts) + + #Get the file names for the model weights + #--file name for model weight j using processor p is ".wjdrawsp" + for(j in 1:numwts){ + #Get the files for weight j + tagname = paste0(".w", j,"draws*") + fnames=list.files(fit$folder,pattern=paste(fit$modelname,tagname,sep=""),full.names=TRUE) + + #Bind the posteriors for weight j across all x points -- npost X n data + wt_list[[j]] = do.call(cbind,sapply(fnames,data.table::fread)) + + #Now populate the summary stat matrices -- n X k matrices + mean_matrix[,j] = apply(wt_list[[j]], 2, mean) + sd_matrix[,j] = apply(wt_list[[j]], 2, sd) + med_matrix[,j] = apply(wt_list[[j]], 2, median) + lb_matrix[,j] = apply(wt_list[[j]], 2, quantile,q.lower) + ub_matrix[,j] = apply(wt_list[[j]], 2, quantile,q.upper) + } + + #Save the list of posterior draws -- each list element is an npost X n dataframe + res$wdraws = wt_list + + #Get model mixing results + res$wmean=mean_matrix + res$wsd=sd_matrix + res$w.5=med_matrix + res$w.lower=lb_matrix + res$w.upper=ub_matrix + + res$q.lower=q.lower + res$q.upper=q.upper + res$modeltype=fit$modeltype + + class(res)="OpenBT_mixingwts" + return(res) +} + + +#---------------------------------------------- +# Model mixing and emulation interface +#---------------------------------------------- +openbt.mix_emulate = function( +mix_model_data = list(y_train = NULL, x_train = NULL), +emu_model_data = list('model1'=list(z_train = NULL, x_train = NULL),'model2'=list(z_train = NULL, x_train = NULL)), +mix_model_args = list('ntree'=10,'ntreeh'=1,'k'=2,'overallnu'=10,'overallsd'=NA,'power'=2.0,'base'=0.95,'powerh'=NA,'baseh'=NA), +emu_model_args = matrix(c(100,1,2,10,NA,2.0,0.95,NA,NA), nrow = 2, ncol = 9,byrow = TRUE, + dimnames = list(paste0('model',1:2),c('ntree','ntreeh','k','overallnu','overallsd','power','base','powerh','baseh'))), +discrep_model_args = list('k' = 10, 'beta0' = NA), +ndpost=1000, nskip=100,nadapt=1000,adaptevery=100, +pbd=.7, +pb=.5, +stepwpert=.1, +probchv=.1, +minnumbot=5, +tc=2, +printevery=100, +numcut=100, +xicuts=NULL, +sigmav_list=NULL, +chv_list = NULL, +summarystats = FALSE, +model=NULL, +modelname="model" +#sigmav=rep(1,length(y.train)), +#chv = cor(x.train,method="spearman"), +) +{ +#-------------------------------------------------- +# model type definitions and check default arguments +modeltype=0 +MODEL_MIX_EMULATE=1 #Full problem +MODEL_MIX_ORTHOGONAL=2 # Full problem with orthogonal discrepancy + +modeltype_list = c('mix_emulate','mix_orthogonal') + +if(is.null(model)){ + stop(cat("Enter a model type. Valid types include:\n", paste(modeltype_list,collapse = ', '))) +} + +if(!(model %in% modeltype_list)){ + stop(cat("Invalid model type. Valid types include:\n", paste(modeltype_list,collapse = ', '))) +} + +if(model == 'mix_emulate'){ + modeltype = MODEL_MIX_EMULATE + if(is.null(mix_model_args$overallsd)) ntreeh=1 +} + +#-------------------------------------------------- +# Define terms +nd = ndpost +burn = nskip +nummodels = length(emu_model_data) + +#-------------------------------------------------- +# Process the data +# Model mixing data +if(is.null(mix_model_data$x_train) || is.null(mix_model_data$y_train)){ + stop('Invalid model mixing inputs. Either x_train or y_train is NULL.') +} + +# Cast x_train data to matrix +if(!is.matrix(mix_model_data$x_train)){ + mix_model_data$x_train = matrix(is.null(mix_model_data$x_train), ncol = 1) +} + +# Define terms +n = nrow(mix_model_data$x_train) +p = ncol(mix_model_data$x_train) +ymean = mean(mix_model_data$y_train) + +# Emulation data +nc_vec = 0 +pc_vec = 0 +for(l in 1:nummodels){ + + if(is.null(emu_model_data[[l]]$x_train) || is.null(emu_model_data[[l]]$z_train)){ + stop(paste0('Invalid emulation inputs. Either x_train or y_train is NULL in model ', l,'.')) + } + + # Cast the x_train data to matrix + if(!is.matrix(mix_model_data$x_train)){ + emu_model_data[[l]]$x_train = matrix(is.null(emu_model_data[[l]]$x_train), ncol = 1) + } + + # Define terms + nc_vec[l] = nrow(emu_model_data[[l]]$x_train) + pc_vec[l] = ncol(emu_model_data[[l]]$x_train) + +} + +# Get means of the model runs and mean center the data +zmean_vec = 0 +for(l in 1:nummodels){ + zmean_vec[l] = mean(emu_model_data[[l]]$z_train) + emu_model_data[[l]]$z_train = emu_model_data[[l]]$z_train - zmean_vec[l] +} + +#-------------------------------------------------- +# cutpoints +# get joint input matrix +col_list1 = colnames(mix_model_data$x_train) +col_list2 = unique(unlist(lapply(emu_model_data, function(x) colnames(x$x_train)))) + +if(sum(sort(col_list1) != sort(col_list2))>0){ + stop(cat("The input variables in model mixing differ from the input variables in emulation.", + "\nModel Mixing Inputs: \n", col_list1, + "\nAll Emulation Inputs: \n", col_list2)) +} + +# Create a master list of inputs -- includes all from field obs and computer models +x_matrix = mix_model_data$x_train +xc_col_list = list() +for(l in 1:nummodels){ + # get matrix and Fill columns that are not present with NA + x_new = matrix(0, nrow = nc_vec[l], ncol = p, dimnames = list(NULL, col_list1)) + xc_cols = 0 + ind = 1 + for(j in 1:length(col_list1)){ + cname = col_list1[j] + if(cname %in% colnames(emu_model_data[[l]]$x_train)){ + # Add the corresponding data to the x_matrix + x_new[,cname] = emu_model_data[[l]]$x_train[,cname] + # Keep track of which columns are in the inputs for the lth computer model + xc_cols[ind] = j + ind = ind + 1 + }else{ + # If the predictor is not in the input space of lth computer model, then just use NA in the x_matrix + x_new[,cname] = rep(NA, nc_vec[l]) + } + } + xc_col_list = append(xc_col_list, list(xc_cols)) + x_matrix = rbind(x_matrix, x_new) +} +names(xc_col_list) = paste0('model',1:nummodels) + +# Use x_matrix to create cutpoints +if(!is.null(xicuts)){ + xi=xicuts +}else{ + xi=vector("list",p) + minx_temp=apply(t(x_matrix),1,function(x) min(x, na.rm = TRUE)) + maxx_temp=apply(t(x_matrix),1,function(x) max(x, na.rm = TRUE)) + + maxx = round(maxx_temp,1) + ifelse((round(maxx_temp,1)-maxx_temp)>0,0,0.1) + minx = round(minx_temp,1) - ifelse((minx_temp - round(minx_temp,1))>0,0,0.1) + for(i in 1:p){ + xinc=(maxx[i]-minx[i])/(numcut+1) + xi[[i]]=(1:numcut)*xinc+minx[i] + } +} + +#------------------------------------------------ +# Priors +#------------------------------------------------ +# Model Mixing +# -- Terminal node parameters +rgy = range(mix_model_data$y_train) +m = mix_model_args$ntree +k = mix_model_args$k + +tau_disc = (rgy[2]-rgy[1])/(2*sqrt(m)*discrep_model_args$k) +tau_wts = (1)/(2*sqrt(m)*k) +beta_wts = 1/(2*m) +if(is.na(discrep_model_args$beta0)){ + beta_disc = mean(mix_model_data$y_train)/m +}else{ + beta_disc = discrep_model_args$beta0/m +} + +# -- Error variance +if(is.na(mix_model_args$overallsd)){ + mix_model_args$overallsd = sd(mix_model_data$y_train) +} +overalllambda_mix = mix_model_args$overallsd^2 +overallnu_mix = mix_model_args$overallnu + +# -- Tree prior (mean and variance trees) +power_mix = mix_model_args$power +base_mix = mix_model_args$base +powerh_mix = mix_model_args$powerh +baseh_mix = mix_model_args$baseh + +if(is.na(powerh_mix)){ + powerh_mix=power_mix +} +if(is.na(baseh_mix)){ + baseh_mix=base_mix +} + +# Emulation with BART +tau_emu = 0 +base_emu = baseh_emu = 0 +power_emu = powerh_emu = 0 +overalllambda_emu = overallnu_emu = 0 +for(l in 1:nummodels){ + # -- Terminal node parameters + m = emu_model_args[l,'ntree'] + k = emu_model_args[l,'k'] + rgz = range(emu_model_data[[l]]$z_train) + tau_emu[l] = (rgz[2]-rgz[1])/(2*sqrt(m)*k) + + # -- Error variance + if(is.na(emu_model_args[l,'overallsd'])){ + emu_model_args[l,'overallsd'] = sd(emu_model_data[[l]]$z_train) + } + overalllambda_emu[l] = emu_model_args[l,'overallsd']^2 + overallnu_emu[l] = emu_model_args[l,'overallnu'] + + # -- Tree prior (mean and variance trees) + power_emu[l] = emu_model_args[l,'power'] + base_emu[l] = emu_model_args[l,'base'] + powerh_emu[l] = emu_model_args[l,'powerh'] + baseh_emu[l] = emu_model_args[l,'baseh'] + + if(is.na(powerh_emu[l])) { + powerh_emu[l] = power_emu[l] + } + if(is.na(baseh_emu[l])) { + baseh_emu[l]=base_emu[l] + } +} + +#-------------------------------------------------- +# Birth and death probability +pbdh=pbd +pbh=pb +if(length(pbd)>1) { + pbdh=pbd[2] + pbd=pbd[1] +} +if(length(pb)>1) { + pbh=pb[2] + pb=pb[1] +} + + +#-------------------------------------------------- +# Process other arguments +#-------------------------------------------------- +stepwperth=stepwpert +if(length(stepwpert)>1) { + stepwperth=stepwpert[2] + stepwpert=stepwpert[1] +} + +probchvh=probchv +if(length(probchv)>1) { + probchvh=probchv[2] + probchv=probchv[1] +} + +minnumboth=minnumbot +if(length(minnumbot)>1) { + minnumboth=minnumbot[2] + minnumbot=minnumbot[1] +} + +# Set default argument for the sigmavs +if(is.null(sigmav_list)){ + for(l in 0:nummodels){ + sigmav_n = ifelse(l == 0, n, nc_vec[l]+n) + sigmav_list[[l+1]] = rep(1,sigmav_n) + } +}else{ + if(!is.list(sigmav_list)){ + stop("Invalid data type: sigmav_list must be a list object with length K+1.") + } + for(l in 0:nummodels){ + sigmav_n = ifelse(l == 0, n, nc_vec[l]+n) + if(length(sigmav_list[[l+1]]) != sigmav_n){ + stop(cat("Invalid sigmav_list entry at item",l+1,"\n Length of sigmav_list[[l]] is incorrect. Required lengths are listed below: \n", + "\t Mixing (l=1): ", n, " (number of field obs) \n", + "\t Emulation (l>1): ", n + nc_vec[l+1], " (number of field obs + number of model runs))")) + } + + } +} + +# Set the default argument for the chvs +if(is.null(chv_list)){ + for(l in 0:nummodels){ + if(l == 0){ + chv_data = mix_model_data$x_train + }else{ + xmixtemp = mix_model_data$x_train[,unlist(xc_col_list[l])] + if(length(unlist(xc_col_list[l])) == 1){xmixtemp = matrix(xmixtemp,ncol = 1)} + chv_data = rbind(emu_model_data[[l]]$x_train, xmixtemp) + } + chv_list[[l+1]] = cor(chv_data,method="spearman") + } +}else{ + if(!is.list(chv_list)){ + stop("Invalid data type: chv_list must be a list object with length K+1.") + } + for(l in 0:nummodels){ + chv_p = ifelse(l == 0, p, pc_vec[l]) + if(chv_list[[l+1]] != chv_p){ + stop(paste0("Invalid chv_list entry at item k = ",l+1,". Number of rows & columns in chv_list[[k]] must match", + " the number of columns in the design matrix for emulator k-1. If k = 1, then the number of rows & columns in", + " chv_list[[1]] must match the number of columns in the design matrix for the field observations.")) + } + } +} + +#-------------------------------------------------- +# Print statements +#-------------------------------------------------- +if(modeltype==MODEL_MIX_EMULATE){ + cat("Model: KOH Bayesian Additive Model Mixing Trees\n") +} +if(modeltype==MODEL_MIX_ORTHOGONAL){ + cat("Model: Orthogonal Bayesian Additive Model Mixing Trees\n") +} + +#-------------------------------------------------- +#write out config file +#-------------------------------------------------- +# Field data roots +yroots="yf" +xroots=c("xf",paste0("xc",1:nummodels)) +zroots=paste0("zc",1:nummodels) +sroots=c("sf",paste0("sc",1:nummodels)) +chgvroots=c("chgvf",paste0("chgvc",1:nummodels)) +idroots="id" + +# cut points root +xiroot="xi" + +# Design column numbers for computer models +xc_design_cols = 0 +h = 1 +# Format -- number of cols for model 1, col numbers for model 1,...,number of cols for model K, col numbers for model K +for(l in 1:nummodels){ + xc_design_cols[h] = paste(pc_vec[l]) + xc_design_cols[(h+1):(h+pc_vec[l])] = paste(xc_col_list[[l]]) + h = h+pc_vec[l] + 1 +} + +# Flatten the model arguments +m_vec = paste(c(mix_model_args$ntree, emu_model_args[,'ntree'])) +mh_vec = paste(c(mix_model_args$ntreeh, emu_model_args[,'ntreeh'])) +tau_vec = paste(c(tau_disc, tau_wts, tau_emu)) #Two mixing parameters, K emu parameters +beta_vec = paste(c(beta_disc, beta_wts)) #Only mixing parameters +base_vec = paste(c(base_mix, base_emu)) +baseh_vec = paste(c(baseh_mix, baseh_emu)) +power_vec = paste(c(power_mix, power_emu)) +powerh_vec = paste(c(powerh_mix, powerh_emu)) +lambda_vec = paste(c(overalllambda_mix, overalllambda_emu)) +nu_vec = paste(c(overallnu_mix, overallnu_emu)) +means_vec = c(ymean,zmean_vec) + +# Bind all of the K+1 dimensional vectors together into one matrix +# Then flatten by row -- this makes things easier when reading in the data in c++ +# This way, we read in all data for mixing, then read in the data for each emulator one at a time +info_matrix = data.frame(xroots, yzroots = c(yroots,zroots),sroots,chgvroots,means = paste(means_vec), + m_vec, mh_vec, base_vec, baseh_vec, power_vec, powerh_vec, lambda_vec, nu_vec) + +info_vec = unlist(as.vector(t(info_matrix))) + +# Creating temp directory and config file +folder=tempdir(check=TRUE) +if(!dir.exists(folder)) dir.create(folder) +tmpsubfolder=tempfile(tmpdir="") +tmpsubfolder=substr(tmpsubfolder,5,nchar(tmpsubfolder)) +tmpsubfolder=paste("openbt",tmpsubfolder,sep="") +folder=paste(folder,"/",tmpsubfolder,sep="") +if(!dir.exists(folder)) dir.create(folder) +fout=file(paste(folder,"/config",sep=""),"w") + +# Write lines to the config file fout +# writeLines(c(paste(modeltype),paste(nummodels), +# xroots,yroot,zroots,sroots,chgvroots,paste(zmean_vec),xc_design_cols, +# paste(nd),paste(burn),paste(nadapt),paste(adaptevery), +# m_vec,mh_vec,tau_vec,beta_vec,lambda_vec,nu_vec,base_vec,power_vec,baseh_vec,powerh_vec, +# paste(pbd),paste(pb),paste(pbdh),paste(pbh),paste(stepwpert),paste(stepwperth), +# paste(probchv),paste(probchvh),paste(minnumbot),paste(minnumboth), +# paste(printevery),paste(xiroot),paste(tc),paste(modelname),paste(summarystats)),fout) + +writeLines(c(paste(modeltype),paste(nummodels),info_vec,xc_design_cols, + paste(nd),paste(burn),paste(nadapt),paste(adaptevery), + tau_vec,beta_vec, + paste(pbd),paste(pb),paste(pbdh),paste(pbh),paste(stepwpert),paste(stepwperth), + paste(probchv),paste(probchvh),paste(minnumbot),paste(minnumboth), + paste(printevery),paste(xiroot),paste(tc),paste(modelname),paste(summarystats)),fout) +close(fout) + +#-------------------------------------------------- +#write out data subsets +#-------------------------------------------------- +nslv=tc-1 +# Model Mixing data +ylist=split(mix_model_data$y_train,(seq(n)-1) %/% (n/nslv)) +for(i in 1:nslv) write(ylist[[i]],file=paste(folder,"/",yroots,i,sep="")) + +xlist=split(as.data.frame(mix_model_data$x_train),(seq(n)-1) %/% (n/nslv)) +for(i in 1:nslv) write(t(xlist[[i]]),file=paste(folder,"/",xroots[1],i,sep="")) + +# Emulation Data +#idlist = vector(mode='list', length = nslv) +for(l in 1:nummodels){ + zlist=split(emu_model_data[[l]]$z_train,(seq(nc_vec[l])-1) %/% (nc_vec[l]/nslv)) + for(i in 1:nslv) write(zlist[[i]],file=paste(folder,"/",zroots[l],i,sep="")) + + xlist=split(as.data.frame(emu_model_data[[l]]$x_train),(seq(nc_vec[l])-1) %/% (nc_vec[l]/nslv)) + for(i in 1:nslv) write(t(xlist[[i]]),file=paste(folder,"/",xroots[l+1],i,sep="")) +} + +# Write the ids per mpi +#for(i in 1:nslv) write(t(idlist[[i]]),file=paste(folder,"/",idroots,i,sep="")) + +# Variance and rotation data +for(l in 0:nummodels){ + ntemp = ifelse(l==0, n, nc_vec[l]+n) + slist=split(sigmav_list[[l+1]],(seq(ntemp)-1) %/% (ntemp/nslv)) + for(i in 1:nslv) write(slist[[i]],file=paste(folder,"/",sroots[l+1],i,sep="")) + + chv_list[[l+1]][is.na(chv_list[[l+1]])]=0 # if a var as 0 levels it will have a cor of NA so we'll just set those to 0. + write(chv_list[[l+1]],file=paste(folder,"/",chgvroots[l+1],sep="")) +} + +# Cutpoints +for(i in 1:p) write(xi[[i]],file=paste(folder,"/",xiroot,i,sep="")) +rm(chv_list) + +#-------------------------------------------------- +#run program +#-------------------------------------------------- +cmdopt=100 #default to serial/OpenMP +runlocal=FALSE +cmd="openbtmixing --conf" +if(Sys.which("openbtmixing")[[1]]=="") # not installed in a global location, so assume current directory + runlocal=TRUE + +if(runlocal) cmd="./openbtmixing --conf" + +cmdopt=system(cmd) + +if(cmdopt==101) # MPI +{ + cmd=paste("mpirun -np ",tc," openbtmixing ",folder,sep="") +} + +if(cmdopt==100) # serial/OpenMP +{ + if(runlocal) + cmd=paste("./openbtmixing ",folder,sep="") + else + cmd=paste("openbtmixing ",folder,sep="") +} + +system(cmd) +#system(paste("rm -f ",folder,"/config",sep="")) + +# Collect results +res=list() +res$modeltype=modeltype; res$model=model +res$xroot=xroots; res$yroot=yroots; res$zroots=zroots;res$sroot=sroots; res$chgvroot=chgvroots; res$xc_col_list=xc_col_list +res$means = means_vec; res$nummodels = nummodels; +res$nd=nd; res$burn=burn; res$nadapt=nadapt; res$adaptevery=adaptevery; +res$mix_model_args = mix_model_args; res$emu_model_args = emu_model_args; +res$pbd=pbd; res$pb=pb; res$pbdh=pbdh; res$pbh=pbh; res$stepwpert=stepwpert; res$stepwperth=stepwperth +res$probchv=probchv; res$probchvh=probchvh; res$minnumbot=minnumbot; res$minnumboth=minnumboth +res$printevery=printevery; res$xiroot=xiroot; res$minx=minx; res$maxx=maxx; +res$summarystats=summarystats; res$tc=tc; res$modelname=modelname +class(xi)="OpenBT_cutinfo" +res$xicuts=xi +res$folder=folder +class(res)="OpenBT_posterior" + +#res$k=k_vec; res$m=m_vec; res$mh=mh_vec;res$tau=tau_vec; res$beta_vec=beta_vec; res$overalllambda=lambda_vec; res$overallnu=nu_vec; +#res$base=base_vec; res$power=power_vec; res$baseh=baseh_vec; res$powerh=powerh_vec + +return(res) + +} + +# Predict function for mixing and emulation +openbt.predict_mix_emulate = function(fit=NULL, x_test=NULL,tc=2,q.lower=0.025,q.upper=0.975){ + # model type definitions + MODEL_MIX_EMULATE=1 #Full problem + MODEL_MIX_ORTHOGONAL=2 # Full problem with orthogonal discrepancy + + #-------------------------------------------------- + # Define objects + if(is.null(fit)) stop("No fitted model specified!\n") + if(is.null(x_test)) stop("No prediction points specified for !\n") + nslv=tc + means_vec=fit$means + p=ncol(x_test) + n=nrow(x_test) + nummodels = fit$nummodels + xproot = 'xp' + + x_test=as.matrix(x_test) + x_col_list = fit$xc_col_list + + #-------------------------------------------------- + # Flatten any vector arguments that need to be passed to pred + # Design column numbers for computer models -- flattens the x_col_list + xc_design_cols = 0 + h = 1 + # Format -- number of cols for model 1, col numbers for model 1,...,number of cols for model K, col numbers for model K + for(l in 1:nummodels){ + pc = length(x_col_list[[l]]) + xc_design_cols[h] = paste(pc) + xc_design_cols[(h+1):(h+pc)] = paste(fit$xc_col_list[[l]]) + h = h + pc + 1 + } + + # Flatten the m and mh vectors + mvec = fit$mix_model_args$ntree + mvec[2:(nummodels+1)] = fit$emu_model_args[,'ntree'] + mhvec = fit$mix_model_args$ntreeh + mhvec[2:(nummodels+1)] = fit$emu_model_args[,'ntreeh'] + + # Create an info matrix and vector of arguments that are required for the individual models + info_matrix = data.frame(means = paste(means_vec),mvec, mhvec) + info_vec = unlist(as.vector(t(info_matrix))) + + #-------------------------------------------------- + #write out config file + fout=file(paste(fit$folder,"/config.pred",sep=""),"w") + writeLines(c(fit$modelname,fit$modeltype,fit$xiroot,xproot, + paste(fit$nd),paste(nummodels),paste(p),paste(tc), + info_vec,xc_design_cols), fout) + close(fout) + + #-------------------------------------------------- + #write out data subsets + #folder=paste(".",fit$modelname,"/",sep="") + xlist=split(as.data.frame(x_test),(seq(n)-1) %/% (n/nslv)) + for(i in 1:nslv) write(t(xlist[[i]]),file=paste(fit$folder,"/",xproot,i-1,sep="")) + for(i in 1:p) write(fit$xicuts[[i]],file=paste(fit$folder,"/",fit$xiroot,i,sep="")) + + #-------------------------------------------------- + #run prediction program + cmdopt=100 #default to serial/OpenMP + runlocal=FALSE + cmd="openbtcli --conf" + if(Sys.which("openbtcli")[[1]]=="") # not installed in a global location, so assume current directory + runlocal=TRUE + + if(runlocal) cmd="./openbtcli --conf" + + cmdopt=system(cmd) + + if(cmdopt==101) # MPI + { + cmd=paste("mpirun -np ",tc," openbtmixingpred ",fit$folder,sep="") + } + + if(cmdopt==100) # serial/OpenMP + { + if(runlocal) + cmd=paste("./openbtmixingpred ",fit$folder,sep="") + else + cmd=paste("openbtmixingpred ",fit$folder,sep="") + } + + #cmd=paste("mpirun -np ",tc," openbtpred",sep="") + #cat(cmd) + system(cmd) + system(paste("rm -f ",fit$folder,"/config.pred",sep="")) + + #-------------------------------------------------- + #format and return + res_model=vector(mode = 'list', length = nummodels+2) + names(res_model) = c("mixmodel", paste0("emulator",1:nummodels),"Information") + res = vector(mode = 'list', length = 2) + names(res) = c('mdraws', 'sdraws') + + fnames=list.files(fit$folder,pattern=paste(fit$modelname,".mdraws*",sep=""),full.names=TRUE) + res$mdraws=do.call(cbind,sapply(fnames,data.table::fread)) + fnames=list.files(fit$folder,pattern=paste(fit$modelname,".sdraws*",sep=""),full.names=TRUE) + res$sdraws=do.call(cbind,sapply(fnames,data.table::fread)) + + #Separate results into mixing vs emulation + for(i in 0:nummodels){ + # Initialize the list elements + res_model[[i+1]] = vector(mode = 'list', length = 12) + names(res_model[[i+1]]) = paste0(rep(c("m","s"),6), + rep(c('draws','mean','sd','.5','.lower','.upper'),each = 2)) + ind = seq(i*fit$nd+1, (i+1)*fit$nd, 1) + + # Summarize the data for the selected model + res_model[[i+1]]$mdraws = res$mdraws[ind,] + res_model[[i+1]]$sdraws = res$sdraws[ind,] + res_model[[i+1]]$mmean=apply(res_model[[i+1]]$mdraws,2,mean) + res_model[[i+1]]$smean=apply(res_model[[i+1]]$sdraws,2,mean) + res_model[[i+1]]$msd=apply(res_model[[i+1]]$mdraws,2,sd) + res_model[[i+1]]$ssd=apply(res_model[[i+1]]$sdraws,2,sd) + res_model[[i+1]]$m.5=apply(res_model[[i+1]]$mdraws,2,quantile,0.5) + res_model[[i+1]]$s.5=apply(res_model[[i+1]]$sdraws,2,quantile,0.5) + res_model[[i+1]]$m.lower=apply(res_model[[i+1]]$mdraws,2,quantile,q.lower) + res_model[[i+1]]$s.lower=apply(res_model[[i+1]]$sdraws,2,quantile,q.lower) + res_model[[i+1]]$m.upper=apply(res_model[[i+1]]$mdraws,2,quantile,q.upper) + res_model[[i+1]]$s.upper=apply(res_model[[i+1]]$sdraws,2,quantile,q.upper) + } + + res_model[[nummodels+2]] = list(q.lower=q.lower,q.upper=q.upper,modeltype=fit$modeltype) + rm(res) + class(res_model)="OpenBT_predict" + + return(res_model) +} diff --git a/Ropenbt/R/pareto_helpers.R b/Ropenbt/R/pareto_helpers.R new file mode 100644 index 0000000..165e4f8 --- /dev/null +++ b/Ropenbt/R/pareto_helpers.R @@ -0,0 +1,117 @@ +## pareto_helpers.R: R functions for manipulating output of OpenBT's mopareto() model. + + +# For cpf, find y_j at each y_i cutpoint for all i=1:d, where j = (not i). +# This really only works for d=2. +# +# Args: cpf: (d)x(??) matrix CPF realization. +# +# Returns: (dq)-tuple of y_j values, where j = (not i). +GetIntersectVals = function(cpf, my.cuts, d=2) +{ + + GetIntersectValsFori = function(i) { + # Returns: q-tuple of y_j values, where j = (not i). + yi.vals = my.cuts[, i] + not.i = ifelse(i == 1, 2, 1) + sapply(yi.vals, function(yi) min(cpf[not.i, cpf[i, ] <= yi], na.rm = TRUE)) + } + + unlist(lapply(1:d, GetIntersectValsFori)) +} + + + +# Compute Modified Band data depth based UQ region +calcMBD = function(fit, d=2) +{ + if(class(fit)!="OpenBT_mopareto") stop("fit object not recognized, should be type OpenBT_mopareto.\n") + + cp.theta = lapply(fit, `[[`,"theta") + nd = length(fit) + + ## 1. Create cuts + n.cuts = 2^6 # also called q + cp.theta.cbind = Reduce(cbind, cp.theta) + my.cuts = sapply(1:d, function(i) seq(min(cp.theta.cbind[i, ]), + max(cp.theta.cbind[i, ]), length.out = n.cuts)) # (q x d) matrix + + ## 2. Create (dq)x(nd) matrix M + mat.M = sapply(cp.theta, GetIntersectVals, my.cuts) + + ## 3. Create (2q)x(nd) matrix R + mat.R.max = t(apply(mat.M, 1, rank, ties.method = "max")) + mat.R.min = t(apply(mat.M, 1, rank, ties.method = "min")) + + ## 4. Create (unnormalized) T matrix + # If l (r) = # strictly to left (right) and m is multiplicity, + # then the correct unnormalized depth is (l+m)(r+m)-m^2. + # No ties implies mat.R.max equals mat.R.min. + mat.T = mat.R.max * (nd - mat.R.min + 1) - (mat.R.max - mat.R.min + 1)^2 + mat.T = mat.T / choose(nd, 2) + + ## 5. Compute modified depth of each Pareto front. + depth.mod = colMeans(mat.T, na.rm=TRUE) + + ## 6. Compute UQ region + uq.cols = (rank(-depth.mod) <= (0.5 * nd)) + pf.uq.mbd = Reduce(cbind, lapply(which(uq.cols), function(i) cp.theta[[i]])) + + # 7. Compute PS + cp.uq.a.cbind = Reduce(cbind, lapply(which(uq.cols), function(i) fit[[i]]$a)) # Use fit here + cp.uq.b.cbind = Reduce(cbind, lapply(which(uq.cols), function(i) fit[[i]]$b)) # Use fit here + + return(list(depth.mbd=depth.mod,pf.uq.mbd=pf.uq.mbd,cp.uq.a=cp.uq.a.cbind,cp.uq.b=cp.uq.b.cbind)) +} + + + +# Compute Vorobev Random Sets based PF UQ region +calcRS = function(fit, d=2) +{ + if(class(fit)!="OpenBT_mopareto") stop("fit object not recognized, should be type OpenBT_mopareto.\n") + + cp.theta = lapply(fit, `[[`,"theta") + nd = length(fit) + + # 2. if R is unknown then find the extremal values for the + # objectives over the RNP sets realizations + cp.theta.cbind = Reduce(cbind, cp.theta) + + ref.max = apply(cp.theta.cbind, 1, max) + ref.min = apply(cp.theta.cbind, 1, min) + ref.vol = Reduce(`*`, ref.max - ref.min) + + # 4. Determine the average volume of the attained sets + FindVol = function(pt.set) { + # Find hypervolume of pt.set wrt ref.pt. + emoa::hypervolume_indicator(points = as.matrix(pt.set, nrow=d), + o = as.matrix(ref.max, nrow=d)) * -1 + } + vols.cpfs = sapply(cp.theta, FindVol) + avg.vol = mean(vols.cpfs) + + # 5. Find the value of the Vorob'ev threshold β∗ by dichotomy: + WhichCPFsDomPt = function(z) { + DoesCPFDomPt = function(cpt, z) { + dom.by.dim = lapply(1:d, function(i) cpt[i, ] <= z[i]) + Reduce(`|`, Reduce(`&`, dom.by.dim)) + } + sapply(cp.theta, DoesCPFDomPt, z=z) + } + + pf.att = apply(cp.theta.cbind, 2, WhichCPFsDomPt) # nd by n.pts matrix + pf.att = colMeans(pf.att) # n.pts-tuple + + # This region is defined by its nondominated point set. + uq.cols = (0.25 <= pf.att) & (pf.att <= 0.75) + pf.uq.rs = cp.theta.cbind[, uq.cols] + + # Compute UQ region for PS and its accuracy ############################## + cp.a.cbind = Reduce(cbind, lapply(fit, `[[`, "a")) # Use fit here + cp.b.cbind = Reduce(cbind, lapply(fit, `[[`, "b")) # Use fit here + cp.uq.a.cbind = cp.a.cbind[, uq.cols] + cp.uq.b.cbind = cp.b.cbind[, uq.cols] + + return(list(pf.uq=pf.uq.rs,cp.uq.a=cp.uq.a.cbind,cp.uq.b=cp.uq.b.cbind)) +} diff --git a/Ropenbt/man/calcMBD.Rd b/Ropenbt/man/calcMBD.Rd new file mode 100644 index 0000000..72db04b --- /dev/null +++ b/Ropenbt/man/calcMBD.Rd @@ -0,0 +1,32 @@ +\name{calcMBD} +\alias{calcMBD} +%\docType{package} +\title{Construct UQ regions for Pareto Front/Set using Modified Band Depth approach.} +\description{ + The function \code{calcMBD()} takes \code{OpenBT_mopareto} object and calculates UQ regions for the estimated Pareto Front and Pareto Set using the modified band depth approach. +} +\usage{ +calcMBD(fit,d=2) +} +\arguments{ + \item{fit}{An \code{OpenBT_mopareto} object.} + \item{d}{The number of responses being considered in the Pareto front/set. Currently only d=2 responses is supported.} +} +\details{ + \code{calcMBD()} calculates the modified band depth UQ region for the Pareto front and set calculated using two OpenBT models. +} +\references{ + Horiguchi, Akira, Santner, Thomas J., Sun, Ying, and Pratola, Matthew T. (2022) + Using BART to perform Pareto optimization and quantify its uncertainties. + \emph{Technometrics}, vol.64, 564--574. +} +\author{ +Matthew T. Pratola [aut, cre, cph] +Maintainer: Matthew T. Pratola +} +\seealso{ +\code{\link{mopareto.openbt}} +} +\examples{ +# For example usage, see readthedocs. +} \ No newline at end of file diff --git a/Ropenbt/man/calcRS.Rd b/Ropenbt/man/calcRS.Rd new file mode 100644 index 0000000..97b3df9 --- /dev/null +++ b/Ropenbt/man/calcRS.Rd @@ -0,0 +1,32 @@ +\name{calcRS} +\alias{calcRS} +%\docType{package} +\title{Construct UQ regions for Pareto Front/Set using random sets approach.} +\description{ + The function \code{calcRS()} takes \code{OpenBT_mopareto} object and calculates UQ regions for the estimated Pareto Front and Pareto Set using the random sets approach. +} +\usage{ +calcRS(fit,d=2) +} +\arguments{ + \item{fit}{An \code{OpenBT_mopareto} object.} + \item{d}{The number of responses being considered in the Pareto front/set. Currently only d=2 responses is supported.} +} +\details{ + \code{calcRS()} calculates the random sets based UQ region for the Pareto front and set calculated using two OpenBT models. +} +\references{ + Horiguchi, Akira, Santner, Thomas J., Sun, Ying, and Pratola, Matthew T. (2022) + Using BART to perform Pareto optimization and quantify its uncertainties. + \emph{Technometrics}, vol.64, 564--574. +} +\author{ +Matthew T. Pratola [aut, cre, cph] +Maintainer: Matthew T. Pratola +} +\seealso{ +\code{\link{mopareto.openbt}} +} +\examples{ +# For example usage, see readthedocs. +} \ No newline at end of file diff --git a/Ropenbt/man/load.openbt.Rd b/Ropenbt/man/load.openbt.Rd new file mode 100644 index 0000000..9f95762 --- /dev/null +++ b/Ropenbt/man/load.openbt.Rd @@ -0,0 +1,37 @@ +\name{openbt.load} +\alias{openbt.load} +%\docType{package} +\title{Load a previously saved OpenBT model.} +\description{ + The function \code{openbt.load()} loads an OpenBT model previously saved using \code{openbt.save()}. +} +\usage{ +\method{openbt}{load}( +fname) +} +\arguments{ + \item{fname}{Filename of the OpenBT model to load from disk storage.} +} +\details{ + \code{openbt.load()} loads an OpenBT model that was previously saved using \code{openbt.save()}. + + Returns an object of type \code{OpenBT_posterior}. +} +\references{ + Chipman, Hugh A., George, Edward I., and McCulloch, Robert E. (2010) + BART: Bayesian additive regression trees. + \emph{The Annals of Applied Statistics}, \bold{4}, 266--298. +} +\author{ +Matthew T. Pratola [aut, cre, cph] +Maintainer: Matthew T. Pratola +} +\seealso{ +\code{\link{openbt}} +} +\examples{ +\dontrun{ + # Load fitted model to a new object. + fit2=openbt.load("test") +} +} \ No newline at end of file diff --git a/Ropenbt/man/makecuts.Rd b/Ropenbt/man/makecuts.Rd new file mode 100644 index 0000000..697caa4 --- /dev/null +++ b/Ropenbt/man/makecuts.Rd @@ -0,0 +1,34 @@ +\name{makecuts.openbt} +\alias{makecuts.openbt} +%\docType{package} +\title{Make a variable cuts object to be used in OpenBT regression models.} +\description{ + The function \code{makecuts.openbt()} takes an n by p design matrix and a scalar or vector number of cutpoints per variable, and returns an object of type \code{OpenBT_cutinfo} with the cutpoint information. +} +\usage{ +\method{makecuts}{openbt}(x,numcuts) +} +\arguments{ + \item{x}{An n by p design matrix.} + \item{numcuts}{Can be a scalar (same number of cutpoints for each variable) or vector describing the number of cutpoints per variable.} +} +\details{ + \code{makecuts.openbt()} creates a cutpoint object for an OpenBT model. + + Returns an object of type \code{OpenBT_cutinfo} which is a length p list, each having a vector describing the cutpoint locations. +} +\references{ + Chipman, Hugh A., George, Edward I., and McCulloch, Robert E. (2010) + BART: Bayesian additive regression trees. + \emph{The Annals of Applied Statistics}, \bold{4}, 266--298. +} +\author{ +Matthew T. Pratola [aut, cre, cph] +Maintainer: Matthew T. Pratola +} +\seealso{ +\code{\link{openbt}} +} +\examples{ +# For example usage, see readthedocs. +} \ No newline at end of file diff --git a/Ropenbt/man/mopareto.openbt.Rd b/Ropenbt/man/mopareto.openbt.Rd new file mode 100644 index 0000000..4a03274 --- /dev/null +++ b/Ropenbt/man/mopareto.openbt.Rd @@ -0,0 +1,68 @@ +\name{mopareto.openbt} +\alias{mopareto.openbt} +%\docType{package} +\title{Pareto Front Multiobjective Optimization for OpenBT regression models.} +\description{ + The function \code{mopareto.openbt()} calculates the Pareto front and set for 2 fitted Bayesian tree models. Supports continous response BART or single tree models. +} +\usage{ +\method{mopareto}{openbt}( +fit1, +fit2, +fit3, +q.lower=0.025, +q.upper=0.975, +tc=2) +} +\arguments{ + \item{fit1}{First object of type \code{OpenBT_posterior} from a previous call to \code{openbt()}} + \item{fit2}{Second object of type \code{OpenBT_posterior} from a previous call to \code{openbt()}} + \item{fit3}{Optionally, a third object of type \code{OpenBT_posterior} from a previous call to \code{openbt()}} + \item{q.lower}{Lower quantile to return.} + \item{q.upper}{Upper quantile to return.} + \item{tc}{Number of cores to use for parallel computing.} +} +\details{ + \code{mopareto.openbt()} calculates the Pareto front and set for a pair (optionally, a 3-tuple) of Bayesian regression tree models that have been fit by \code{openbt()}. + + Returns an object of type \code{OpenBT_mopareto} which is a length ndpost list with each list entry consisting of the following matrices: +} +\value{ + \item{theta}{Posterior realization of the predicted values along the Pareto front. Stored in a 2-row matrix whose number of columns corresponds to the number of hyperrectangles on the Pareto front.} + \item{a}{Posterior realization of the lower-left corner of a hyperrectangle on the Pareto front. Stored in a p-row matrix whose number of columns corresponds to the number of hyperrectangles on the Pareto front.} + \item{b}{Posterior realization of the upper-right corner of a hyperrectangle on the Pareto front. Stored in a p-row matrix whose number of columns corresponds to the number of hyperrectangles on the Pareto front.} +} +\references{ + Horiguchi, Akira, Santner, Thomas J., Sun, Ying, and Pratola, Matthew T. (2022) + Using BART to perform Pareto optimization and quantify its uncertainties. + \emph{Technometrics}, vol.64, 564--574. +} +\author{ +Matthew T. Pratola [aut, cre, cph] +Maintainer: Matthew T. Pratola +} +\seealso{ +\code{\link{openbt}} +\code{\link{calcMBD}} +\code{\link{calcRS}} +} +\examples{ +\dontrun{ + # Example of MO optimization + fit.a=openbt(x,y,pbd=c(0.7,0.0),ntreeh=1,numcut=20,tc=6,model="bart",modelname="branin-a") + fit.b=openbt(x,y,pbd=c(0.7,0.0),ntreeh=1,numcut=20,tc=6,model="bart",modelname="branin-b") + + pf=mopareto.openbt(fit.a,fit.b,tc=6) # quite slow, use many cores to speed it up + + # plot front + par(mfrow=c(1,2)) + plot(0,0,xlim=c(-2,2),ylim=c(-2,2),type="n",xlab="y1",ylab="y2") + for(vhr in pf) points(t(vhr$theta),pch=20) + + # plot set + plot(0,0,xlim=c(0,1),ylim=c(0,1),type="n",xlab="x1",ylab="x2") + for(vhr in pf) rect(vhr$a[1, ], vhr$a[2, ], vhr$b[1, ], vhr$b[2, ], col=gray(0.5,alpha=0.1), border=NA) + + # More examples found in pareto_example.R +} +} \ No newline at end of file diff --git a/Ropenbt/man/openbt.Rd b/Ropenbt/man/openbt.Rd new file mode 100644 index 0000000..0fc366f --- /dev/null +++ b/Ropenbt/man/openbt.Rd @@ -0,0 +1,105 @@ +\name{openbt} +\alias{openbt} +%\docType{package} +\title{Interface to Bayesian Regression Tree models in OpenBT.} +\description{ + The function \code{openbt()} is the main function for fitting Bayesian Regression Tree models that are provided in the OpenBT project. +} +\usage{ +openbt(x.train, y.train, ntree=NULL, ntreeh=NULL, ndpost=1000, + nskip=100, k=NULL, power=2.0, base=.95, tc=2, sigmav=rep(1,length(y.train)), + fmean=mean(y.train), overallsd = NULL, overallnu=NULL, + chv = cor(x.train,method="spearman"), pbd=.7, pb=.5, stepwpert=.1, probchv=.1, + minnumbot=5, printevery=100, numcut=100, xicuts=NULL, nadapt=1000, adaptevery=100, + summarystats=FALSE,truncateds=NULL,model=NULL,modelname="model") +} +\arguments{ + \item{x.train}{nxp matrix of predictor variables for the training data.} + \item{y.train}{nx1 vector of the observed response for the training data.} + \item{ntree}{Number of trees used for the mean model.} + \item{ntreeh}{Number of trees used for the variance model.} + \item{ndpost}{Number of iterations to run the MCMC algorithm after burn-in.} + \item{nskip}{Number of MCMC iterations treated as burn-in and discarded.} + \item{k}{Prior hyperparameter for the mean model.} + \item{power}{Power parameter in the tree depth penalizing prior.} + \item{base}{Base parameter in the tree depth penalizing prior.} + \item{tc}{Number of OpenMP threads to use.} + \item{sigmav}{Initialization of square-root of variance parameter.} + \item{fmean}{Overall mean of the data for pre-centering the data before running the model.} + \item{overallsd}{A rudimentary estimate of the process standard deviation. Used in calibrating the variance prior.} + \item{overallnu}{Shape parameter for the variance prior.} + \item{chv}{Predictor correlation matrix used as a pre-conditioner for MCMC change-of-variable proposals.} + \item{pbd}{Probability of performing a birth/death proposal, otherwise perform a rotate proopsal.} + \item{pb}{Probability of performing a birth proposal given that we choose to perform a birth/death proposal.} + \item{stepwpert}{Initial width of proposal distribution for peturbing cutpoints.} + \item{probchv}{Probability of performing a change-of-variable proposal. Otherwise, only do a perturb proposal}. + \item{minnumbot}{Minimum number of observations required in bottom (terminal) nodes.} + \item{printevery}{Outputs MCMC algorithm status every printevery iterations.} + \item{numcut}{Number of cutpoints to use for each predictor variable.} + \item{xicuts}{More detailed construction of cutpoints can be specified using makecuts() and passed as an argument here.} + \item{nadapt}{Number of MCMC iterations allowed for adaptive MCMC. These are also discarded.} + \item{adaptevery}{Adapt MCMC proposal distributions every adaptevery iterations until the algorithm has run for nadapt iterations.} + \item{summarystats}{Return detailed summary statistics about the fitting procedure.} + \item{truncateds}{For using the truncated Gaussian model contributed by Dai Feng.} + \item{model}{Specifies the model type. Options include 'bt','binomial','poisson','bart','hbart','probit','modifiedprobit' and 'merck_truncated'.} + \item{modelname}{A user-specified string that names their model, such as myExampleModel.} +} +\details{ + \code{openbt()} is the main model fitting function for continuous and dichotomous response data. + The most common models to fit are a homoscedastic single-tree model, a homoscedastic BART model and a heteroscedastic BART model. + + For a BART model, set \code{pbd=c(0.7,0.0)} and \code{ntreeh=1}. This forces a scalar (homoscedastic) variance term. + + For a single-tree model, set \code{pbd=(0.7,0.0)}, \code{ntreeh=1} and \code{ntree=1}. This forces the mean component to be modeled using only one tree. + + Examples are available on the OpenBT wiki page. +} +\value{ + \item{res}{Fitted model object of S3 class OpenBT_posterior.} +} +\references{ + Chipman, Hugh A., George, Edward I., and McCulloch, Robert E. (2010) + BART: Bayesian additive regression trees. + \emph{The Annals of Applied Statistics}, \bold{4}, 266--298. +} +\author{ +Matthew T. Pratola [aut, cre, cph] +Maintainer: Matthew T. Pratola +} +\seealso{ +\code{\link{predict.openbt}} +} +\examples{ +\dontrun{ + remotes::install_bitbucket("mpratola/openbt/Ropenbt") + library(Ropenbt) + + # Test Branin function, rescaled + braninsc <- function(xx) + { + x1 <- xx[1] + x2 <- xx[2] + + x1bar <- 15*x1 - 5 + x2bar <- 15 * x2 + + term1 <- x2bar - 5.1*x1bar^2/(4*pi^2) + 5*x1bar/pi - 6 + term2 <- (10 - 10/(8*pi)) * cos(x1bar) + + y <- (term1^2 + term2 - 44.81) / 51.95 + return(y) + } + + # Simulate branin data for testing + set.seed(99) + n=500 + p=2 + x = matrix(runif(n*p),ncol=p) + y=rep(0,n) + for(i in 1:n) y[i] = braninsc(x[i,]) + + # Fit BART model + fit=openbt(x,y,tc=4,model="bart",modelname="branin") + summary(fit) +} +} \ No newline at end of file diff --git a/Ropenbt/man/predict.openbt.Rd b/Ropenbt/man/predict.openbt.Rd new file mode 100644 index 0000000..3a521ae --- /dev/null +++ b/Ropenbt/man/predict.openbt.Rd @@ -0,0 +1,79 @@ +\name{predict.openbt} +\alias{predict.openbt} +%\docType{package} +\title{Drawing Posterior Predictive Realizations for OpenBT regression models.} +\description{ + The function \code{predict.openbt()} is the main function for drawing posterior predictive realizations at new inputs using a fitted model stored in an \code{OpenBT_posterior} object returned from \code{openbt()}. +} +\usage{ +\method{predict}{openbt}( +object, +x.test=NULL, +tc=2, +fmean=fit$fmean, +q.lower=0.025, +q.upper=0.975, ...) +} +\arguments{ + \item{object}{Object of type \code{OpenBT_posterior} from a previous call to \code{openbt()}} + \item{x.test}{New input settings in the form of an \code{npred x p} matrix at which to construct predictions. Defaults to the training inputs.} + \item{tc}{Number of CPU cores to use for parallel computing.} + \item{fmean}{Mean-centering vector for the training data. Defaults to the value used by \code{openbt()} when fitting the model. Usually should be left to the default.} + \item{q.lower}{Lower quantile to return.} + \item{q.upper}{Upper quantile to return.} + \item{...}{Additional arguments for S3 compatibility (currently unused).} +} +\details{ + \code{predict.openbt()} is the main function for calculating posterior predictions and uncertainties once a model has been fit by \code{openbt()}. + + Returns an object of type \code{OpenBT_predict} with the following entries. +} +\value{ + \item{mdraws}{Posterior realizations of the mean function, \eqn{f({\bf x})}{f(x)} stored in an \code{ndpost x npred} matrix, + where ndpost is the number of kept MCMC draws in the \code{openbt} run.} + \item{sdraws}{Posterior realizations of the standard deviation function, \eqn{s({\bf x})}{s(x)} stored in an \code{ndpost x npred} matrix, + where ndpost is the number of kept MCMC draws in the \code{openbt} run.} + \item{mmean}{Posterior predictive mean of \eqn{f({\bf x})}{f(x)}.} + \item{smean}{Posterior predictive mean of the standard deviation, \eqn{s({\bf x})}{s(x)}.} + \item{msd}{Posterior standard deviation of the mean, \eqn{f({\bf x})}{f(x)}.} + \item{ssd}{Posterior standard deviation of the standard devation, \eqn{s({\bf x})}{s(x)}.} + \item{m.5}{Posterior median of the mean function realizations, \eqn{f({\bf x})}{f(x)}.} + \item{m.lower}{Posterior \code{q.lower} quantile of the mean function realizations.} + \item{m.upper}{Posterior \code{q.upper} quantile of the mean function realizations.} + \item{s.5}{Posterior median of the standard deviation function realizations, \eqn{s({\bf x})}{s(x)}.} + \item{s.lower}{Posterior \code{q.lower} quantile of the standard deviation function realizations.} + \item{s.upper}{Posterior \code{q.upper} quantile of the standard deviation function realizations.} + \item{q.lower}{Lower quantile used in constructing the above.} + \item{q.upper}{Upper quantile used in constructing the above.} + \item{pdraws}{Posterior realizations of the probabilities from a probit model.} + \item{pmean}{Posterior predictive mean of the probabilities from a probit model.} + \item{psd}{Posterior standard deviation of the probit model fitted probabilities.} + \item{p.5}{Posterior median of the probit model fitted probabilities.} + \item{p.lower}{Posterior \code{q.lower} quantile of the probit model fitted probabilities.} + \item{p.upper}{Posterior \code{q.upper} quantile of the probit model fitted probabilities.} + \item{q.lower}{The lower quantile used.} + \item{q.upper}{The upper quantile used.} + \item{modeltype}{The type of model fitted.} +} +\references{ + Chipman, Hugh A., George, Edward I., and McCulloch, Robert E. (2010) + BART: Bayesian additive regression trees. + \emph{The Annals of Applied Statistics}, \bold{4}, 266--298. +} +\author{ +Matthew T. Pratola [aut, cre, cph] +Maintainer: Matthew T. Pratola +} +\seealso{ +\code{\link{openbt}} +} +\examples{ +\dontrun{ + # Calculate in-sample predictions + fitp=predict.openbt(fit,x,tc=4) + + # Make a simple plot + plot(y,fitp$mmean,xlab="observed",ylab="fitted") + abline(0,1) +} +} \ No newline at end of file diff --git a/Ropenbt/man/save.openbt.Rd b/Ropenbt/man/save.openbt.Rd new file mode 100644 index 0000000..6ab1a22 --- /dev/null +++ b/Ropenbt/man/save.openbt.Rd @@ -0,0 +1,36 @@ +\name{openbt.save} +\alias{openbt.save} +%\docType{package} +\title{Save a fitted OpenBT model to disk.} +\description{ + The function \code{openbt.save()} will save an OpenBT model to disk. +} +\usage{ +\method{openbt}{save}( +post,fname=NULL) +} +\arguments{ + \item{post}{An object of type \code{OpenBT_posterior} to save to disk storage.} + \item{fname}{The filename to store the saved result.} +} +\details{ + \code{openbt.save()} will save an OpenBT model that was previously fitted using \code{openbt()}. The data is stored in compressed format and the extension \code{.obt} is automatically appended to the filename \code{fname}. +} +\references{ + Chipman, Hugh A., George, Edward I., and McCulloch, Robert E. (2010) + BART: Bayesian additive regression trees. + \emph{The Annals of Applied Statistics}, \bold{4}, 266--298. +} +\author{ +Matthew T. Pratola [aut, cre, cph] +Maintainer: Matthew T. Pratola +} +\seealso{ +\code{\link{openbt}} +} +\examples{ +\dontrun{ + # Save fitted model as test.obt in the working directory + openbt.save(fit,"test") +} +} \ No newline at end of file diff --git a/Ropenbt/man/setvarcuts.Rd b/Ropenbt/man/setvarcuts.Rd new file mode 100644 index 0000000..4079388 --- /dev/null +++ b/Ropenbt/man/setvarcuts.Rd @@ -0,0 +1,38 @@ +\name{setvarcuts.openbt} +\alias{setvarcuts.openbt} +%\docType{package} +\title{Modify an existing \code{OpenBT_cutinfo} cutpoint object.} +\description{ + The function \code{servarcuts.openbt()} can modify an exisiting cutpoint object. +} +\usage{ +\method{setvarcuts}{openbt}( +xi, +id, +cutvec) +} +\arguments{ + \item{xi}{Object of type \code{OpenBT_cutinfo} created from a previous call to \code{makecuts()}} + \item{id}{The variable to modify.} + \item{cutvec}{The new cutpoint vector for variable \code{id}.} +} +\details{ + \code{setvarcuts.openbt()} allows one to modify an existing cutpoint object by replacing the cutpoints for variable \code{id} with a new cutpoint vector given by \code{cutvec}. + + Returns an object of type \code{OpenBT_cutinfo}. +} +\references{ + Chipman, Hugh A., George, Edward I., and McCulloch, Robert E. (2010) + BART: Bayesian additive regression trees. + \emph{The Annals of Applied Statistics}, \bold{4}, 266--298. +} +\author{ +Matthew T. Pratola [aut, cre, cph] +Maintainer: Matthew T. Pratola +} +\seealso{ +\code{\link{openbt}} +} +\examples{ +# For example usage, see readthedocs. +} \ No newline at end of file diff --git a/Ropenbt/man/sobol.openbt.Rd b/Ropenbt/man/sobol.openbt.Rd new file mode 100644 index 0000000..be9a86c --- /dev/null +++ b/Ropenbt/man/sobol.openbt.Rd @@ -0,0 +1,75 @@ +\name{sobol.openbt} +\alias{sobol.openbt} +%\docType{package} +\title{Sobol Sensitivity Indices for OpenBT regression models.} +\description{ + The function \code{sobol.openbt()} calculates the Sobol' sensitivity metric for Bayesian tree models. Supports continous response BART or single tree models. +} +\usage{ +\method{sobol}{openbt}( +fit=NULL, +q.lower=0.025, +q.upper=0.975, +tc=2) +} +\arguments{ + \item{fit}{Object of type \code{OpenBT_posterior} from a previous call to \code{openbt()} with model type 'bart' or 'hbart'.} + \item{q.lower}{Lower quantile to return.} + \item{q.upper}{Upper quantile to return.} + \item{tc}{Number of cores to use in parallel computing.} +} +\details{ + \code{sobol.openbt()} calculates the Sobol' sensitivity indices for a continuous response Bayesian regression tree model that has been fit by \code{openbt()}. These are exact (closed-form) calculations, and are much faster and more accurate than Monte-Carlo approximation methods. + + Returns an object of type \code{OpenBT_sobol} with the following entries. +} +\value{ + \item{vidraws}{Posterior realizations of the 1-way Sobol' variability, stored in an \code{ndpost x p} matrix, where ndpost is the number of kept MCMC draws in the \code{openbt} run.} + \item{vijdraws}{Posterior realizations of the 2-way Sobol' variability, stored in an \code{ndpost x p*(p-1)/2} matrix, where ndpost is the number of kept MCMC draws in the \code{openbt} run.} + \item{tvidraws}{Posterior realizations of the total single effect Sobol' variability, stored in a matrix with ndpost rows, where ndpost is the number of kept MCMC draws in the \code{openbt} run.} + \item{vdraws}{Posterior realizations of total variability, stored in a matrix with ndpost rows.} + \item{sidraws}{Posterior realizations of the 1-way Sobol' sensitivities, stored in an \code{ndpost x p} matrix, where ndpost is the number of kept MCMC draws in the \code{openbt} run.} + \item{sijdraws}{Posterior realizations of the 2-way Sobol' sensitivities, stored in an \code{ndpost x p*(p-1)/2} matrix, where ndpost is the number of kept MCMC draws in the \code{openbt} run.} + \item{tsidraws}{Posterior realizations of the total single effect Sobol' sensitivities, stored in a matrix with ndpost rows, where ndpost is the number of kept MCMC draws in the \code{openbt} run.} + \item{msi}{Posterior mean of the 1-way Sobol' sensitivities.} + \item{msi.sd}{Posterior s.d. of the 1-way Sobol' sensitivities.} + \item{si.5}{Posterior median of the 1-way Sobol' sensitivities.} + \item{si.lower}{Posterior q.lower quantile of the 1-way Sobol' sensitivities.} + \item{si.upper}{Posterior q.upper quantile of the 1-way Sobol' sensitivities.} + \item{msij}{Posterior mean of the 2-way Sobol' sensitivities.} + \item{sij.sd}{Posterior s.d. of the 2-way Sobol' sensitivities.} + \item{sij.5}{Posterior median of the 2-way Sobol' sensitivities.} + \item{sij.lower}{Posterior q.lower quantile of the 2-way Sobol' sensitivities.} + \item{sij.upper}{Posterior q.upper quantile of the 2-way Sobol' sensitivities.} + \item{mtsi}{Posterior mean of the total effect Sobol' sensitivities.} + \item{tsi.sd}{Posterior s.d. of the total effect Sobol' sensitivities.} + \item{tsi.5}{Posterior median of the total effect Sobol' sensitivities.} + \item{tsi.lower}{Posterior q.lower quantile of the total effect Sobol' sensitivities.} + \item{tsi.upper}{Posterior q.upper quantile of the total effect Sobol' sensitivities.} + \item{q.lower}{The lower quantile used.} + \item{q.upper}{The upper quantile used.} + \item{modeltype}{The type of model fitted by \code{openbt()}.} +} +\references{ + Horiguchi, Akira, Pratola, Matthew T. and Santner, Thomas J. (2020) + Assessing Variable Activity for Bayesian Regression Trees. + \emph{Reliability Engineering and System Safety}, \bold{207}, 107391. + + Horiguchi, Akira and Pratola, Matthew T. (2025) Estimating Shapley Effects in Big-Data Emulation and Regression Settings using Bayesian Additive Regression Trees. \emph{arXiv preprint}, \bold{arXiv:2304.03809}, 1--32. +} +\author{ +Matthew T. Pratola [aut, cre, cph] +Maintainer: Matthew T. Pratola +} +\seealso{ +\code{\link{openbt}} +} +\examples{ +\dontrun{ + # Calculate Sobol indices + fits=sobol.openbt(fit2) + fits$msi + fits$mtsi + fits$msij +} +} \ No newline at end of file diff --git a/Ropenbt/man/vartivity.openbt.Rd b/Ropenbt/man/vartivity.openbt.Rd new file mode 100644 index 0000000..34fb5ae --- /dev/null +++ b/Ropenbt/man/vartivity.openbt.Rd @@ -0,0 +1,65 @@ +\name{vartivity.openbt} +\alias{vartivity.openbt} +%\docType{package} +\title{Classical variable activity metric for OpenBT regression models.} +\description{ + The function \code{vartivity.openbt()} calculates the classical variable activity metric for Bayesian tree models that is based on counting the number of internal nodes that split on each variable in a dataset. +} +\usage{ +\method{vartivity}{openbt}( +fit=NULL, +q.lower=0.025, +q.upper=0.975) +} +\arguments{ + \item{fit}{Object of type \code{OpenBT_posterior} from a previous call to \code{openbt()}} + \item{q.lower}{Lower quantile to return.} + \item{q.upper}{Upper quantile to return.} +} +\details{ + \code{vartivity.openbt()} calculates the classical variable activity metric for a Bayesian regression tree model that has been fit by \code{openbt()}. + + Returns an object of type \code{OpenBT_vartivity} with the following entries. +} +\value{ + \item{vdraws}{Posterior realizations of the split proportions for each variable in the mean model, stored in an \code{ndpost x npred} matrix, where ndpost is the number of kept MCMC draws in the \code{openbt} run.} + \item{vdrawsh}{Posterior realizations of the split proportions for each variable in the variance model, stored in an \code{ndpost x npred} matrix, where ndpost is the number of kept MCMC draws in the \code{openbt} run. When a constant variance model is used, this will be zero.} + \item{mvdraws}{Posterior mean of the split proportions for the mean model.} + \item{mvdrawsh}{Posterior mean of the split proportions for the variance model.} + \item{vdraws.sd}{Posterior standard deviation of the split proportions for the mean model.} + \item{vdrawsh.sd}{Posterior standard deviation of the split proportions for the variance model.} + \item{vdraws.5}{Posterior median of the split proportions for the mean model.} + \item{vdrawsh.5}{Posterior median of the split proportions for the mean model.} + \item{vdraws.lower}{Posterior \code{q.lower} quantile of the split proportions for the mean model.} + \item{vdraws.upper}{Posterior \code{q.upper} quantile of the split proportions for the mean model.} + \item{vdrawsh.lower}{Posterior \code{q.lower} quantile of the split proportions for the variance model.} + \item{vdrawsh.upper}{Posterior \code{q.upper} quantile of the split proportions for the variance model.} + \item{q.lower}{The lower quantile used.} + \item{q.upper}{The upper quantile used.} + \item{modeltype}{The type of model fitted by \code{openbt()}.} +} +\references{ + Chipman, Hugh A., George, Edward I., and McCulloch, Robert E. (2010) + BART: Bayesian additive regression trees. + \emph{The Annals of Applied Statistics}, \bold{4}, 266--298. + + Horiguchi, Akira, Pratola, Matthew T. and Santner, Thomas J. (2020) + Assessing Variable Activity for Bayesian Regression Trees. + \emph{Reliability Engineering and System Safety}, \bold{207}, 107391. +} +\author{ +Matthew T. Pratola [aut, cre, cph] +Maintainer: Matthew T. Pratola +} +\seealso{ +\code{\link{openbt}} +} +\examples{ +\dontrun{ + # Calculate variable activity information + fitv=vartivity.openbt(fit2) + + # Plot variable activity + plot(fitv) +} +} \ No newline at end of file