ConfDENSE

This repository contains the official implementation of ConfDENSE, proposed and used by Saha et al. The primary goal of ConfDENSE is to identify the molecular conformer that contributes most to the odor profile of a molecule.

A conformer is one of the three-dimensional arrangements a molecule can adopt through rotations around its single bonds. The atoms and chemical composition stay the same; only the spatial arrangement of the atoms differs. Because molecular interactions are inherently three-dimensional, different conformers can contribute differently to a molecule's odor profile. Beyond conformer discovery, we also investigate how effectively the ConfDENSE framework can be used for molecular odor prediction.

ConfDENSE consists of two main components.

The first component is the pretraining of a PointNet-based architecture. For each molecule in the dataset, we generate multiple conformers and sample points from their electron density distributions to construct point clouds (refer to this). Each conformer-specific point cloud is then used to train the PointNet model to predict the odor labels of the parent molecule. In our experiments, we generate 100 conformers per molecule and train the PointNet model on the resulting point clouds. For more details, refer to this.
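To make the PointNet idea concrete, here is a minimal, untrained numpy sketch of a PointNet-style forward pass: a shared per-point MLP, a symmetric max-pool over points, and one sigmoid per odor label. It omits the T-Net alignment modules and assumes each point carries only xyz coordinates; the class name `TinyPointNet`, the layer sizes, and the label count are illustrative assumptions, not the repository's model.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyPointNet:
    """Hypothetical, untrained PointNet-style multi-label classifier (no T-Nets)."""

    def __init__(self, in_dim=3, hidden=64, n_labels=10, seed=0):
        rng = np.random.default_rng(seed)
        # Shared per-point MLP: the same weights are applied to every point.
        self.w1 = rng.normal(0, 0.1, (in_dim, hidden))
        self.w2 = rng.normal(0, 0.1, (hidden, hidden))
        # Head mapping the global feature to one logit per odor label.
        self.w_out = rng.normal(0, 0.1, (hidden, n_labels))

    def forward(self, points):
        # points: (n_points, in_dim), e.g. xyz sampled from an electron density.
        h = relu(points @ self.w1)
        h = relu(h @ self.w2)
        # Max-pool over points gives an order-independent global feature.
        global_feat = h.max(axis=0)
        # Independent sigmoid per label (multi-label, not softmax).
        return sigmoid(global_feat @ self.w_out)

model = TinyPointNet(n_labels=5)
cloud = np.random.default_rng(1).normal(size=(1024, 3))
probs = model.forward(cloud)
# Shuffling the points must not change the prediction.
shuffled = model.forward(cloud[np.random.default_rng(2).permutation(1024)])
```

The max-pool is what makes the output invariant to point order, which matters because a sampled point cloud has no natural ordering.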

Once the PointNet model has been trained, we move to the second component, called the Aggregator. The Aggregator is trained separately using the saved outputs of the PointNet model. For each molecule, it receives the predictions corresponding to its 100 conformers and learns to produce a final odor prediction for the molecule. The train, validation, and test splits used during PointNet training are retained for Aggregator training. For more details, refer to this.
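Because the Aggregator consumes all conformers of a molecule together, a reusable split has to be defined at the molecule level. The sketch below assumes (it is not taken from the repository) that every conformer row is tagged with its parent molecule ID, and assigns splits per molecule so that no molecule's conformers appear in more than one split. The helper name and split fractions are hypothetical.

```python
import numpy as np

def split_by_molecule(mol_ids, train_frac=0.8, valid_frac=0.1, seed=0):
    """Hypothetical helper: assign each conformer row a split based on its parent molecule.

    All conformers of a molecule land in the same split, so no molecule
    leaks between train, validation, and test.
    """
    mol_ids = np.asarray(mol_ids)
    rng = np.random.default_rng(seed)
    unique_mols = np.unique(mol_ids)
    rng.shuffle(unique_mols)
    n_train = int(train_frac * len(unique_mols))
    n_valid = int(valid_frac * len(unique_mols))
    split_of = {}
    for pos, mol in enumerate(unique_mols.tolist()):
        split_of[mol] = "train" if pos < n_train else "valid" if pos < n_train + n_valid else "test"
    return np.array([split_of[m] for m in mol_ids.tolist()])

# Toy example: 10 molecules x 100 conformers each.
mol_ids = np.repeat(np.arange(10), 100)
splits = split_by_molecule(mol_ids)
```

Reusing the same `splits` array for both PointNet and Aggregator training mirrors the retained-split setup described above.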

Finally, to identify the "optimal" conformer of a given molecule, we explore two different approaches. In the second approach, we compute the cosine similarity between each conformer's PointNet prediction and the ground-truth odor profile of the molecule. The conformer whose prediction is most similar to the true odor profile is selected as the optimal conformer. Further details regarding both approaches can be found here.
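The cosine-similarity selection described above can be sketched in a few lines of numpy. This assumes the conformer predictions of one molecule are stacked into an `(n_conformers, n_labels)` array and the ground truth is a multi-hot vector; the function name and toy values are illustrative only.

```python
import numpy as np

def select_optimal_conformer(conf_preds, y_true, eps=1e-12):
    """Return (best_index, similarities) for one molecule (hypothetical helper)."""
    conf_preds = np.asarray(conf_preds, dtype=float)  # (n_conformers, n_labels)
    y_true = np.asarray(y_true, dtype=float)          # (n_labels,) multi-hot
    dots = conf_preds @ y_true
    norms = np.linalg.norm(conf_preds, axis=1) * np.linalg.norm(y_true)
    # eps guards against division by zero for all-zero vectors.
    sims = dots / np.maximum(norms, eps)
    return int(np.argmax(sims)), sims

y_true = [1, 0, 1, 0]
conf_preds = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.1, 0.8, 0.1],
    [0.2, 0.9, 0.1, 0.7],
]
best, sims = select_optimal_conformer(conf_preds, y_true)  # best == 1
```

Cosine similarity compares the shape of the predicted odor profile rather than its overall magnitude, so a conformer is not favored merely for predicting uniformly high probabilities.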

Electron Density Point Cloud Data

We use a standard molecular odor dataset, referred to throughout this repository as the GS-LF dataset. It contains 4,983 molecules, each annotated with one or more odor labels, which makes odor prediction on it a multi-label classification task.
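In a multi-label setting, each molecule's labels are typically stored as a multi-hot vector with one column per odor descriptor. A minimal sketch, using a made-up four-label vocabulary rather than the actual GS-LF descriptor list:

```python
import numpy as np

def multi_hot(label_lists, vocabulary):
    """Encode lists of odor labels as a (n_molecules, n_labels) multi-hot matrix."""
    index = {label: i for i, label in enumerate(vocabulary)}
    y = np.zeros((len(label_lists), len(vocabulary)), dtype=np.float32)
    for row, labels in enumerate(label_lists):
        for label in labels:
            y[row, index[label]] = 1.0
    return y

vocab = ["fruity", "green", "sweet", "woody"]  # illustrative labels only
y = multi_hot([["fruity", "sweet"], ["woody"]], vocab)
```

A row can contain several ones, which is why the models described in this README end in per-label sigmoids rather than a softmax.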

For every molecule, we generate 100 conformers and sample points from the corresponding electron density distribution of each conformer. This process produces a point cloud representation for every conformer of every molecule. Consequently, each molecule is represented by 100 conformer-specific point clouds that capture its three-dimensional electron density structure.
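The exact sampling procedure is not described in this README. As one plausible illustration, assuming the electron density of a conformer is available on a regular 3D grid (for example from a cube file), points can be drawn with probability proportional to the density and jittered within each voxel. The function name, grid layout, and toy Gaussian "density" below are assumptions.

```python
import numpy as np

def sample_density_points(density, origin, spacing, n_points=1024, seed=0):
    """Hypothetical sampler: draw points with probability proportional to grid density."""
    rng = np.random.default_rng(seed)
    flat = np.clip(density.ravel(), 0.0, None)  # densities must be non-negative
    probs = flat / flat.sum()
    idx = rng.choice(flat.size, size=n_points, p=probs)
    ijk = np.stack(np.unravel_index(idx, density.shape), axis=1).astype(float)
    # Jitter uniformly within each voxel so points are not locked to grid nodes.
    ijk += rng.uniform(-0.5, 0.5, size=ijk.shape)
    return np.asarray(origin, dtype=float) + ijk * spacing

# Toy "density": an isotropic Gaussian on a 21^3 grid spanning [-5, 5] with spacing 0.5.
axis = np.linspace(-5.0, 5.0, 21)
X, Y, Z = np.meshgrid(axis, axis, axis, indexing="ij")
toy_density = np.exp(-(X**2 + Y**2 + Z**2))
pts = sample_density_points(toy_density, origin=(-5.0, -5.0, -5.0), spacing=0.5)
```

Regions of high electron density receive proportionally more points, so the resulting point cloud reflects the molecule's 3D electron density shape.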

For illustration, examples of point clouds generated from conformers of different molecules are shown below.

[Figure: conformer point clouds for CCC(O)c1ccccc1, CC(C)(C)c1ccc(O)cc1, and CC(C)(C)c1ccc(O)cc1]

These illustrations were generated from a small subset of the full dataset. Similar visualizations can be produced with the illustrator.ipynb notebook in this repository, which operates on the sample dataset in sample_shard_data. We provide this sample so that users can better understand the storage format and structure of the point cloud data used throughout the ConfDENSE framework.

Note

The complete dataset is substantially larger than the sample data included in this repository. Researchers interested in obtaining the full dataset may contact the corresponding author at pinaki@uk.hert.edu.

PointNet Model Training & Results

Aggregator Training & Analysis

The outputs produced by the PointNet component are stored in the conf_data directory as:

train_predictions.npz
valid_predictions.npz
test_predictions.npz

These files contain the conformer-level predictions that are subsequently used to train the Aggregator model.
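The array names stored inside these .npz files are not listed here. A small helper that inspects any .npz file without assuming its key names can be sketched with `numpy.load`; the demo writes a synthetic file whose array names (`predictions`, `molecule_index`) are hypothetical.

```python
import os
import tempfile
import numpy as np

def describe_npz(path):
    """Return {array_name: (shape, dtype)} for every array stored in an .npz file."""
    # allow_pickle stays at its default (False); object arrays would need it enabled.
    with np.load(path) as data:
        return {name: (data[name].shape, str(data[name].dtype)) for name in data.files}

# Self-contained demo on a synthetic file; in practice pass e.g.
# "conf_data/train_predictions.npz".
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_predictions.npz")
    np.savez(demo_path,
             predictions=np.zeros((300, 5)),
             molecule_index=np.repeat(np.arange(3), 100))
    summary = describe_npz(demo_path)
print(summary)
```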

Aggregator Architecture

The Aggregator architecture consists of three main components:

Index Encoding – analogous to the positional encodings used in standard attention mechanisms, allowing the model to distinguish between different conformers.
Set2Set Pooling – used to aggregate information across the set of conformer predictions.
Multi-Layer Perceptron (MLP) – used to produce the final molecular odor prediction, followed by a sigmoid activation layer for multi-label classification.
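The three components above can be illustrated with an untrained numpy sketch: a sinusoidal index encoding added to each conformer's prediction vector, a Set2Set readout (an LSTM-driven attention loop over the set), and an MLP with a sigmoid output. The additive encoding, the layer sizes, and the class names are assumptions for illustration; the actual ConformerAggregator in utils/AggregatorClasses.py may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def index_encoding(n_conf, dim):
    """Sinusoidal encoding of each conformer's index, as in Transformer positional encodings."""
    pos = np.arange(n_conf)[:, None]
    i = np.arange(dim)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

class Set2Set:
    """Set2Set readout with an LSTM cell; random, untrained weights."""

    def __init__(self, dim, steps=3, seed=0):
        rng = np.random.default_rng(seed)
        self.dim, self.steps = dim, steps
        # LSTM input is q_star (2*dim); gates stacked as [input, forget, cell, output].
        self.W = rng.normal(0, 0.1, (2 * dim, 4 * dim))
        self.U = rng.normal(0, 0.1, (dim, 4 * dim))
        self.b = np.zeros(4 * dim)

    def __call__(self, x):
        # x: (n_items, dim) -> (2 * dim,)
        h, c = np.zeros(self.dim), np.zeros(self.dim)
        q_star = np.zeros(2 * self.dim)
        for _ in range(self.steps):
            z = q_star @ self.W + h @ self.U + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)     # query vector q_t
            e = x @ h                       # attention score per conformer
            a = np.exp(e - e.max())
            a /= a.sum()                    # softmax over conformers
            r = a @ x                       # attention-weighted readout
            q_star = np.concatenate([h, r])
        return q_star

class ToyConformerAggregator:
    """Hypothetical stand-in: index encoding -> Set2Set -> MLP -> sigmoid."""

    def __init__(self, n_labels, hidden=32, steps=3, seed=0):
        rng = np.random.default_rng(seed)
        self.set2set = Set2Set(n_labels, steps=steps, seed=seed)
        self.W1 = rng.normal(0, 0.1, (2 * n_labels, hidden))
        self.W2 = rng.normal(0, 0.1, (hidden, n_labels))

    def __call__(self, conf_preds):
        n_conf, n_labels = conf_preds.shape
        # Assumption: the index encoding is added to each conformer's prediction vector.
        x = conf_preds + index_encoding(n_conf, n_labels)
        pooled = self.set2set(x)
        hidden = np.maximum(pooled @ self.W1, 0.0)
        return sigmoid(hidden @ self.W2)

conf_preds = np.random.default_rng(1).uniform(size=(100, 8))  # 100 conformers x 8 labels (toy)
agg = ToyConformerAggregator(n_labels=8)
mol_probs = agg(conf_preds)
```

Without the index encoding, Set2Set is invariant to conformer order; adding it lets the model tell individual conformers apart, which matches the stated purpose of the Index Encoding component.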

The implementation of the Aggregator, ConformerAggregator, can be found in:

utils/AggregatorClasses.py

Training the Aggregator

Training is performed using the .npz files described above. The complete training and evaluation pipeline is provided in the notebook:

analysis_and_kde.ipynb

By default, the notebook loads the pretrained Aggregator weights used in the paper. Users who wish to retrain the Aggregator from scratch can uncomment the following code in the corresponding block:

# Train the model
# trained_model = train_model(
#     model,
#     train_loader,
#     counts_pos=counts_pos,
#     valid_loader=valid_loader,
#     learning_rate=CONFIG['learning_rate'],
#     num_epochs=CONFIG['num_epochs'],
#     gamma=CONFIG['gamma'],
#     step_size=CONFIG['step_size'],
#     patience=100
# )

# Save trained model
# trained_model.eval()
# torch.save(
#     trained_model.state_dict(),
#     "conformer_model_weights.pth"
# )
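The call above passes `counts_pos`, `gamma`, and `step_size`, but how `train_model` uses them is not documented here. A common interpretation, assumed in this sketch, is that `counts_pos` holds per-label positive counts used to up-weight rare labels in binary cross-entropy (as with the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss`), and that `gamma` and `step_size` define a step learning-rate decay like PyTorch's `StepLR`. The helper names are hypothetical.

```python
import numpy as np

def pos_weight_from_counts(counts_pos, n_samples):
    """Per-label weight = negatives / positives, so rare labels count more."""
    counts_pos = np.asarray(counts_pos, dtype=float)
    return (n_samples - counts_pos) / np.maximum(counts_pos, 1.0)

def weighted_bce(probs, targets, pos_weight, eps=1e-7):
    """Binary cross-entropy with positive-class weighting, averaged over all entries."""
    probs = np.clip(probs, eps, 1 - eps)
    loss = -(pos_weight * targets * np.log(probs) + (1 - targets) * np.log(1 - probs))
    return loss.mean()

def step_lr(base_lr, epoch, step_size, gamma):
    """Learning rate multiplied by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)

pw = pos_weight_from_counts([50, 10, 90], n_samples=100)  # -> [1.0, 9.0, 0.111...]
lr_at_25 = step_lr(1e-3, epoch=25, step_size=10, gamma=0.5)  # -> 2.5e-4
```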

Hyperparameter Configuration

Users interested in modifying the training configuration can edit the CONFIG.yaml file. Each configuration parameter is documented within the file itself.

For hyperparameter search and experimentation, refer to:

hyper_aggregator.py

This script contains the code used for exploring different Aggregator configurations and training settings.
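The search strategy used in hyper_aggregator.py is not described here. As a minimal illustration only, a plain grid search over the CONFIG keys that appear in the training call above could expand candidate configurations like this; the candidate values are made up.

```python
import itertools

def expand_grid(base_config, search_space):
    """Yield one config per combination of values in search_space; other keys keep base values."""
    keys = list(search_space)
    for values in itertools.product(*(search_space[k] for k in keys)):
        config = dict(base_config)  # copy so the base config is never mutated
        config.update(zip(keys, values))
        yield config

base = {"learning_rate": 1e-3, "num_epochs": 200, "gamma": 0.5, "step_size": 50}
space = {"learning_rate": [1e-3, 3e-4], "gamma": [0.5, 0.9], "step_size": [25, 50, 100]}
configs = list(expand_grid(base, space))  # 2 * 2 * 3 = 12 configurations
```

Each resulting config could then be trained and scored on the validation split, keeping the best-performing settings.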

Conformer Analysis

The conformer analysis pipeline is also provided in the analysis_and_kde.ipynb notebook.

The primary function responsible for performing conformer-level analysis is:

analyze_conformer_key(key, model)

Here, key corresponds to the value stored in the Index_ column of the GS-LF dataset located in the conf_data directory. This identifier uniquely specifies a molecule and allows the analysis pipeline to retrieve all associated conformers and predictions.
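Retrieving all conformers of one molecule by its Index_ value amounts to a row filter over the conformer-level arrays. The sketch assumes a flat array holding each conformer row's parent Index_ value alongside a prediction matrix; this layout and the helper name are assumptions, not the notebook's actual data structures.

```python
import numpy as np

def conformers_for_key(key, mol_index, conf_preds):
    """Return (row_indices, predictions) for all conformers of one molecule (hypothetical)."""
    rows = np.flatnonzero(np.asarray(mol_index) == key)
    if rows.size == 0:
        raise KeyError(f"no conformers found for Index_ {key!r}")
    return rows, conf_preds[rows]

# Toy layout: 3 molecules x 4 conformers, 2 labels per prediction.
mol_index = np.repeat([101, 102, 103], 4)
preds = np.arange(12 * 2, dtype=float).reshape(12, 2)
rows, key_preds = conformers_for_key(102, mol_index, preds)
```

The returned predictions can then be scored and ranked per conformer, for example with the cosine similarity against the molecule's ground-truth odor profile.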

The function evaluates the conformer-level predictions produced by the PointNet model and compares them using the similarity-based approaches described earlier in this README. This enables the identification and ranking of conformers that are most representative of a molecule's odor profile.

The resulting analyses and their implications are discussed in detail in the paper.

References

Contributors

Name	Affiliation
Sarabeshwar Balaji	Indian Institute of Science Education and Research Bhopal (IISER Bhopal), India
Mrityunjay Sharma	CSIR-CSIO, Chandigarh, India
Aryan Amit Barsainyan	National Institute of Technology Karnataka Surathkal, Karnataka, India
Pinaki Saha	University of Hertfordshire, UH Biocomputation Group, United Kingdom
Ritesh Kumar	CSIR-CSIO, Chandigarh, India
Volker Steuber	University of Hertfordshire, UH Biocomputation Group, United Kingdom
Michael Schmuker	Helmholtz-Gemeinschaft, Berlin, Germany

Citing This Work

To cite this work, please use this bibtex entry:

@article{saha2026confdense,
  title={ConfDENSE: A conformer aware electron density based machine learning paradigm for navigating the odorant landscape},
  author={Saha, Pinaki and Balaji, Sarabeshwar and Sharma, Mrityunjay and Barsainyan, Aryan Amit and Kumar, Ritesh and Steuber, Volker and Schmuker, Michael},
  year={2026},
  publisher={ChemRxiv}
}
