This repository has been archived and is no longer maintained. The code is provided for historical reference and may contain unpatched or unknown vulnerabilities. It should not be used in production systems.
A BioBERT-based NLP model that performs relation extraction (RE) and named entity recognition (NER) to identify directional drug-drug interactions (DDIs). Model weights are from BioBERT-Large v1.1. The training and validation scripts are adapted from https://github.com/kamalkraj/BERT-NER-TF and implemented with TensorFlow v2.
- python3
- tensorflow (version >= 2.0)
- fastprogress (version >= 0.1.21)
- seqeval (version >= 0.0.5)
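Before training, it can help to confirm that the installed packages meet the minimum versions listed above. The following is a minimal sketch, not part of the repository: the helper names `parse_version`, `meets_minimum`, and `check_requirements` are hypothetical, and the sketch assumes Python 3.8 or later so that `importlib.metadata` is available.

```python
import itertools
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the requirements list in this README.
MIN_VERSIONS = {
    "tensorflow": (2, 0),
    "fastprogress": (0, 1, 21),
    "seqeval": (0, 0, 5),
}


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in text.split("."):
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:
            # Stop at a suffix such as "0rc1"; pre-release tags are ignored.
            break
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare an installed version string against a minimum version tuple."""
    got = parse_version(installed)
    width = max(len(got), len(minimum))
    # Pad with zeros so that "2" compares equal to (2, 0).
    got += (0,) * (width - len(got))
    minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return got >= minimum


def check_requirements(min_versions=MIN_VERSIONS):
    """Map each package name to 'ok', 'too old (<version>)', or 'missing'."""
    report = {}
    for name, minimum in min_versions.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed})"
    return report


if __name__ == "__main__":
    for package, status in check_requirements().items():
        print(f"{package}: {status}")
```

The comparison looks only at the numeric part of each version, so a pre-release such as `2.0.0rc1` is treated the same as `2.0.0`.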
- `BERT-TF-master` contains the main scripts for training and validation
- `DDI_data` contains the training and validation datasets for the RE and NER steps
- Download BioBERT-Large v1.1. Save as a sub-directory, e.g. 'biobert_large'.
- Convert the TensorFlow version 1 model weights to TensorFlow version 2 model weights; follow the procedure in `tf1_convert_tf2.sh`.
- First, run training and validation for the RE step. To do this, run `myrun_re.py` under the `BERT-TF-master` directory. An example bash script, `example_re.sh`, shows the various command line arguments supplied to `myrun_re.py`.
- Second, run training and validation for the NER step. To do this, run `myrun_ner.py` under the `BERT-TF-master` directory, followed by `myner_detokenize.py` under the `BERT-TF-master/biocodes` directory. An example bash script, `example_ner.sh`, shows the various command line arguments supplied to both these scripts.
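The steps above depend on several scripts and directories being in place. The following is a minimal sketch of a pre-flight check, not part of the repository: the helper `missing_paths` is hypothetical, and it assumes the BioBERT weights were saved as a sub-directory of the repository root under the example name `biobert_large`. Pass a different `weights_dir` if the weights are stored elsewhere.

```python
from pathlib import Path

# Scripts and directories named in this README, relative to the repository root.
EXPECTED = [
    "BERT-TF-master/myrun_re.py",
    "BERT-TF-master/myrun_ner.py",
    "BERT-TF-master/biocodes/myner_detokenize.py",
    "DDI_data",
]


def missing_paths(repo_root, weights_dir="biobert_large"):
    """Return the expected paths that do not exist under repo_root.

    weights_dir is an assumption: the README only gives 'biobert_large'
    as an example name for the downloaded BioBERT-Large v1.1 weights.
    """
    root = Path(repo_root)
    candidates = [root / relative for relative in EXPECTED]
    candidates.append(root / weights_dir)
    return [p.relative_to(root).as_posix() for p in candidates if not p.exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing before training can start:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected scripts and directories are present.")
```

Run from the repository root, this prints any missing item; an empty result suggests the RE step can be started, followed by the NER step.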
This software and documentation were developed by the authors in their capacities as Oak Ridge Institute for Science and Education (ORISE) research fellows at the U.S. Food and Drug Administration (FDA).
FDA assumes no responsibility whatsoever for use by other parties of the Software, its source code, documentation or compiled executables, and makes no guarantees, expressed or implied, about its quality, reliability, or any other characteristic. Further, FDA makes no representations that the use of the Software will not infringe any patent or proprietary rights of third parties. The use of this code in no way implies endorsement by the FDA or confers any advantage in regulatory decisions.