Plain MD restart support by jthorton · Pull Request #1884 · OpenFreeEnergy/openfe

jthorton · 2026-03-23T12:44:10Z

Fixes #1719, #1720, #1721
This PR splits up the plain MD protocol unit into a setup and simulation unit and adds the ability to restart failed simulations.

Notes

The simulation must reach the production stage to generate a restart file; failures during the equilibration stages will not trigger a restart, and the simulation will start again.
Restarts use the OpenMM state checkpoint and are portable between hardware.
The versions of openfe, gufe and openmm must be the same in order for restarts to be safe.
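
As a rough illustration of the version requirement in the notes above, a restart could compare the package versions recorded by the original run against the ones currently installed. This is only a sketch: the function name `check_restart_versions`, the dictionary layout and the version strings are assumptions made for the example and are not taken from the PR.

```python
def check_restart_versions(recorded: dict[str, str], current: dict[str, str]) -> None:
    """Raise if any recorded package version differs from the current one."""
    mismatched = {
        name: (recorded[name], current.get(name))
        for name in recorded
        if recorded[name] != current.get(name)
    }
    if mismatched:
        details = ", ".join(
            f"{name}: {old} -> {new}"
            for name, (old, new) in sorted(mismatched.items())
        )
        raise ValueError(f"Unsafe restart, package versions changed: {details}")


# Versions stored next to the restart file by the original run (made-up values).
recorded = {"openfe": "1.0.0", "gufe": "1.0.0", "openmm": "8.0.0"}

# Identical versions: the check passes silently.
check_restart_versions(recorded, dict(recorded))
```

Raising early keeps a mismatched environment from silently continuing a trajectory it may not reproduce.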

Checklist

All new code is appropriately documented (user-facing code must have complete docstrings).
Added a news entry, or the changes are not user-facing.
Ran pre-commit: you can run pre-commit locally or comment on this PR with pre-commit.ci autofix.

Manual Tests: these are slow so don't need to be run every commit, only before merging and when relevant changes are made (generally at reviewer-discretion).

GPU integration tests
example notebook testing
packaging tests: run this for any large feature PRs or PRs that add test data.

Developers certificate of origin

I certify that this contribution is covered by the MIT License here and the Developer Certificate of Origin at https://developercertificate.org/.

jthorton · 2026-03-23T12:47:04Z

pre-commit.ci autofix

for more information, see https://pre-commit.ci

IAlibay

Overall this looks good to me. There are a few areas where it might be good to enhance things, especially when it comes to maybe just sticking on the checkpoint reporter the whole way through the simulation, but they wouldn't affect the overall organization of the code.

IAlibay · 2026-03-24T11:08:33Z

        return pdbs


 class PlainMDProtocol(gufe.Protocol):


[scope] It may be good to take the opportunity to move Protocols, results, settings and units to their own files.

I played around with this and ended up with a cyclic import issue, as the PlainMDProtocol is a type hint in the PlainMDSetupUnit. Maybe we can look at this later?

:/ that probably needs fixing - can you open an issue? We usually use gufe.Protocol as the type hint for units so that you can create multiple protocols using the same units.

IAlibay · 2026-03-24T11:18:19Z

+        # g. Save the system and positions to file
+        system_outfile = shared_basepath / "system.xml.bz2"
+        serialization.serialize(stateA_system, system_outfile)
+        # not really need if we save out the pre-minimized file


I somewhat agree; however, PDBs are low precision, so saving the npy is good in my opinion.

removed that comment.
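
To make the precision point concrete, here is a small sketch using only the standard library. It relies on the fact that PDB ATOM records write each coordinate in a fixed 8.3f field, and it assumes, purely for illustration, that the binary array would hold 64-bit floats; the coordinate value is made up.

```python
import struct

x = 12.3456789012  # a made-up coordinate value

# PDB ATOM records keep only three decimal places per coordinate.
pdb_field = f"{x:8.3f}"
from_pdb = float(pdb_field)

# Round-tripping through an 8-byte float (the assumed binary storage) is lossless.
from_binary = struct.unpack("<d", struct.pack("<d", x))[0]

print(repr(pdb_field))           # the fixed-width text field
print(from_pdb == x)             # False: digits beyond 1e-3 were dropped
print(from_binary == x)          # True: the binary copy is exact
```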

IAlibay · 2026-03-24T11:38:21Z

+                traj.save_pdb(shared_basepath / output_settings.minimized_structure)
+            # equilibrate
+            # NVT equilibration
+            if equil_steps_nvt:


[scope] Thinking about this - is there any reason we can't just add the checkpoint reporter at this stage? It's more likely that someone running the PlainMD protocol would be running longer equilibrations / productions, so it might be nice to be able to restart at any point during it.

Yeah, that's a good idea, and we could use the total step count to work out which stage the simulation was at?
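
One way the step-count idea above could work is sketched below: walk through the stages in order and subtract the steps already completed. The function name `remaining_steps`, the stage names and the step numbers are hypothetical and are not the PR's implementation.

```python
def remaining_steps(
    stage_steps: list[tuple[str, int]], steps_done: int
) -> dict[str, int]:
    """Split a global step count into the steps still to run in each stage.

    stage_steps: ordered (name, n_steps) pairs.
    steps_done: total steps completed so far across all stages.
    """
    if steps_done < 0:
        raise ValueError("steps_done must be non-negative")
    remaining = {}
    for name, n_steps in stage_steps:
        completed = min(steps_done, n_steps)  # steps of this stage already run
        remaining[name] = n_steps - completed
        steps_done -= completed
    return remaining


stages = [("equil_nvt", 1000), ("equil_npt", 2000), ("production", 5000)]
print(remaining_steps(stages, 1500))
# {'equil_nvt': 0, 'equil_npt': 1500, 'production': 5000}
```

A stage with zero remaining steps was finished before the interruption, and the first stage with a non-zero count is the one to resume.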

IAlibay · 2026-03-24T11:47:28Z

-                    stateA_topology, stateA_positions, file=f, keepIds=True
-                )
-
        # 10. Get platform


[nit] The comments will need to be updated once the structure is settled.

IAlibay · 2026-03-24T11:49:49Z

+                    if x.getName() == "MonteCarloBarostat":
+                        x.setFrequency(0)
+
+                simulation.context.setVelocitiesToTemperature(to_openmm(temperature))


[scope / nit] it would definitely be good to split this up into method calls or something like that. There's a lot of code in an "else" section.

jthorton · 2026-04-22T08:31:41Z

pre-commit.ci autofix

for more information, see https://pre-commit.ci

jthorton · 2026-04-22T09:56:11Z

pre-commit.ci autofix

for more information, see https://pre-commit.ci

codecov · 2026-04-22T10:26:37Z

Codecov Report

❌ Patch coverage is 96.15385% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.59%. Comparing base (7959cc6) to head (2f5ee38).

Files with missing lines	Patch %	Lines
src/openfe/protocols/openmm_md/plain_md_methods.py	95.50%	8 Missing ⚠️
...fe/tests/protocols/openmm_md/test_plain_md_slow.py	0.00%	2 Missing ⚠️
src/openfe/protocols/openmm_afe/base_afe_units.py	50.00%	1 Missing ⚠️
src/openfe/protocols/openmm_septop/base_units.py	50.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1884      +/-   ##
==========================================
- Coverage   94.78%   90.59%   -4.20%     
==========================================
  Files         210      211       +1     
  Lines       18841    19028     +187     
==========================================
- Hits        17859    17238     -621     
- Misses        982     1790     +808

Flag	Coverage Δ
fast-tests	`90.59% <96.15%> (?)`
slow-tests	`?`


jthorton · 2026-04-22T12:20:51Z

pre-commit.ci autofix

for more information, see https://pre-commit.ci

IAlibay

Just an initial half of the review (only got to the docstring of _run_MD).

IAlibay · 2026-04-23T11:33:00Z

Could you do this as a separate file? There are a few places where we want to keep uncharged inputs, and I'd prefer to untangle that from this PR (mostly so we don't have to think about two things at the same time).

I'll revert this for now; it was just very slow during local testing, and I'll come back to it in a separate PR.

IAlibay · 2026-04-23T11:36:26Z

        # returns a dict of repeat_id: sorted list of ProtocolUnitResult
        return repeats

+    def _validate(


I'm being incredibly nitpicky on this one, so please push back. Any chance you could move this method above _create?

This is purely an aesthetic thing of "that's the order we've been doing for the other Protocols".

No problem, done!

IAlibay · 2026-04-23T11:41:50Z

+    def _validate(
+        self,
+        stateA: ChemicalSystem,
+        stateB: ChemicalSystem,


Can you add a check that stateA is stateB?

Good catch, done and added a test.

IAlibay · 2026-04-23T11:53:09Z

+            # This technically should be NotImplementedError
+            # but gufe.Protocol.validate calls `_validate` wrapped around an
+            # except for NotImplementedError, so we can't raise it here
+            raise NotImplementedError("Can't extend simulations yet")


😅 this doesn't match the comment - we've been using ValueError here because NotImplementedError gets squashed in validate.

Ah, I didn't want to change the error that was raised in the protocol, but I guess we are breaking a lot of stuff here. Updated!
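
The exchange above turns on the base class catching NotImplementedError around `_validate`. The toy classes below are hypothetical stand-ins (not gufe's actual code) that mimic the behaviour described in the diff comment, to show why the same check is silently lost with one exception type and reported with the other.

```python
class BaseProtocol:
    """Toy stand-in whose validate() swallows NotImplementedError."""

    def validate(self, extends=None) -> None:
        try:
            self._validate(extends)
        except NotImplementedError:
            # Interpreted as "no validation implemented", so nothing is reported.
            pass

    def _validate(self, extends) -> None:
        raise NotImplementedError


class SquashedProtocol(BaseProtocol):
    def _validate(self, extends) -> None:
        if extends is not None:
            raise NotImplementedError("Can't extend simulations yet")


class LoudProtocol(BaseProtocol):
    def _validate(self, extends) -> None:
        if extends is not None:
            raise ValueError("Can't extend simulations yet")


# The NotImplementedError never reaches the caller.
SquashedProtocol().validate(extends="previous result")

# The ValueError does.
try:
    LoudProtocol().validate(extends="previous result")
except ValueError as err:
    print(f"rejected: {err}")
```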

IAlibay

One more thing, otherwise it looks good to me!

IAlibay · 2026-04-23T13:37:01Z

+        Worker method to set the temperature, barostat and run dynamics and save final structure output.
+        """
+        # set the velocities to temperature
+        simulation.context.setVelocitiesToTemperature(to_openmm(temperature))


By doing this you are reassigning velocities even if you're in the middle of a restart.

During a restart, would loadState not assign velocities that we want to continue from?

Ah, good catch! I have added another flag to run dynamics to control whether we should reassign the velocities. This introduces another parameter to track the state, similar to production started, but I think this is clear?
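
A minimal sketch of the flag logic discussed here, under the assumption that each stage normally starts with freshly assigned velocities and that only the stage being resumed should keep the loaded ones. `FakeSimulation` and `run_stages` are hypothetical names invented for this example; the PR's own function takes a `reinitialize_velocities` argument, but its internals are not reproduced here.

```python
class FakeSimulation:
    """Hypothetical stand-in that only records what was asked of it."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def set_velocities(self) -> None:
        self.log.append("set_velocities")

    def step(self, n_steps: int) -> None:
        self.log.append(f"step:{n_steps}")


def run_stages(sim: FakeSimulation, remaining: dict[str, int], resumed: bool) -> None:
    # On a resume, the first unfinished stage continues with the loaded velocities.
    reinitialize_velocities = not resumed
    for n_steps in remaining.values():
        if n_steps == 0:
            continue  # this stage finished before the interruption
        if reinitialize_velocities:
            sim.set_velocities()
        sim.step(n_steps)
        # Any stage after the one that just ran starts with fresh velocities.
        reinitialize_velocities = True


sim = FakeSimulation()
run_stages(sim, {"equil_nvt": 0, "equil_npt": 1500, "production": 5000}, resumed=True)
print(sim.log)
# ['step:1500', 'set_velocities', 'step:5000']
```

The flag flips back to True after the first stage that actually runs, which is the extra piece of state mentioned in the reply above.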

jthorton · 2026-04-24T10:35:58Z

pre-commit.ci autofix

for more information, see https://pre-commit.ci

IAlibay

This looks amazing - just one thing and one question and it's good to go!

IAlibay · 2026-04-24T12:08:56Z

Can you add a news entry please?

IAlibay · 2026-04-24T12:12:51Z

+        output_settings: MDOutputSettings,
+        verbose: bool = True,
+        output_path: None | pathlib.Path = None,
+        reinitialize_velocities: bool = True,


This is great, thanks!

IAlibay · 2026-04-24T12:18:10Z

+                )
+
+        # add the checkpoint reporter so we can recover during the equilibration / production phases
+        if output_settings.checkpoint_storage_filename:


I'm pretty sure it doesn't but just in case - could you confirm that on restart the CheckpointReporter doesn't reset the global step count?

IAlibay · 2026-04-24T12:27:08Z

+                output_path=output_path,
+                reinitialize_velocities=reinitialize_velocities,
+            )
+            # if we have run this stage we then need to reinitialize velocities in the next stages


I see what you mean, but frankly I don't think there's a way around this.

github-actions · 2026-04-24T12:29:57Z

No API break detected ✅

jthorton added 2 commits March 23, 2026 12:39

split md protocol to setup and simulate, add restart support

e30f523

fix indent in copied htop code

7e3fc2f

[pre-commit.ci] auto fixes from pre-commit.com hooks

dc28530

for more information, see https://pre-commit.ci

IAlibay self-requested a review March 23, 2026 13:21

IAlibay reviewed Mar 24, 2026

jameseastwood assigned jthorton Mar 24, 2026

jthorton added 7 commits March 26, 2026 10:46

Merge branch 'main' into md_restarts

62e451b

Merge branch 'main' into md_restarts

5e8042b

allow for resume in any stage

397c630

Merge remote-tracking branch 'origin/md_restarts' into md_restarts

e77b1c0

Merge remote-tracking branch 'origin/main' into md_restarts

a9c0f94

# Conflicts: # src/openfe/protocols/openmm_md/plain_md_methods.py

add a single run dynamics function

8ba51cc

Merge branch 'main' into md_restarts

6fe2f1e

pre-commit-ci Bot and others added 2 commits April 22, 2026 08:31

[pre-commit.ci] auto fixes from pre-commit.com hooks

88eb820

for more information, see https://pre-commit.ci

fix tests, fix other methods which use plain md, update api docs

39f6422

[pre-commit.ci] auto fixes from pre-commit.com hooks

645caff

for more information, see https://pre-commit.ci

update restart to only look for checkpoints, split out remaining step logic, update tests

b3d7619

pre-commit-ci Bot and others added 3 commits April 22, 2026 12:21

[pre-commit.ci] auto fixes from pre-commit.com hooks

7017934

for more information, see https://pre-commit.ci

fix tests and mypy

15a3717

revert type change on settings

96906f2

jthorton requested a review from IAlibay April 22, 2026 14:16

IAlibay requested changes Apr 23, 2026

Merge branch 'main' into md_restarts

2f5ee38

IAlibay requested changes Apr 23, 2026

IAlibay reviewed Apr 23, 2026

jthorton and others added 6 commits April 24, 2026 09:45

Update src/openfe/protocols/openmm_md/plain_md_methods.py

9edc553

Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>

Update src/openfe/protocols/openmm_md/plain_md_methods.py

e73ebcb

Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>

Update src/openfe/protocols/openmm_md/plain_md_methods.py

1420013

Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>

Update src/openfe/protocols/openmm_md/plain_md_methods.py

62dee5e

Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>

PR feedback, fix missing solvent comp check

cf2c919

Merge branch 'main' into md_restarts

9205b82

pre-commit-ci Bot and others added 3 commits April 24, 2026 10:36

[pre-commit.ci] auto fixes from pre-commit.com hooks

bdcabdd

for more information, see https://pre-commit.ci

revert charges in test file

3199457

Merge remote-tracking branch 'origin/md_restarts' into md_restarts

62231fb

IAlibay requested changes Apr 24, 2026

Merge branch 'main' into md_restarts

8134b6e

atravitz added this to the 1.11.0 milestone Apr 24, 2026
