Proxy Updates #1
Open
A-Tarraf wants to merge 30 commits into
…isplay input flexibility
… a bug where invalid args were sent to ftio
…o, moved ftio visualization to mean value and changed ftio signal name to proper wave name
… ability in case of node failure
…mq communication and fixed 2 small bugs
Feature ftio
We tested this version and it works great.
… expand
- New --auto-root / --root-url-dir flags: child proxies discover the root URL from a shared filesystem file (root.url) written by the root at startup. Root URL can also be injected via PROXY_ROOT_URL env var.
- Graceful leave: SIGTERM handler sends /leave?from=<url> to the root before exiting, triggering immediate TBON repair without waiting for a missed scrape.
- /leave HTTP endpoint on the root: removes the departing node from the topology and calls the existing self-repair logic to rewire the TBON.
- Fix NaN/null serialization: Gauge min/max/total fields that are NaN were serialized as JSON null, crashing the root when deserializing child scrapes. Fixed with a serialize_with helper that maps NaN/infinite to 0.0; the binary UNIX socket protocol (which does not support deserialize_any) is unaffected.
- Add experiment/run_malleability_test.sh: end-to-end test against a 4-node Docker cluster exercising graceful leave, shrink self-repair, and expand auto-join. Script checks prerequisites and auto-builds the binary if needed.
- Add experiment/README.md with quick-start, cluster setup link, expected output, and a table of what each step tests.
- Update README.md with a Malleability Support section documenting all new flags, endpoints, DMR integration guidance, and a pointer to the experiment.
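For reviewers, here is a minimal sketch of the idea behind the NaN/null fix. It is not the code from this PR: `finite_or_zero`, `GaugeSummary`, and `to_json` are hypothetical names, and the JSON is rendered by hand so the example needs no crates. The actual fix is described as a serde `serialize_with` helper, which would apply the same mapping inside the serializer.

```rust
/// Hypothetical helper: replace NaN and +/- infinity with 0.0 so the value
/// always has a valid JSON number representation.
fn finite_or_zero(value: f64) -> f64 {
    if value.is_finite() {
        value
    } else {
        0.0
    }
}

/// Hypothetical stand-in for the min/max/total fields of a gauge.
struct GaugeSummary {
    min: f64,
    max: f64,
    total: f64,
}

impl GaugeSummary {
    /// Hand-rolled JSON rendering, used only to keep this sketch
    /// dependency-free. Every field passes through `finite_or_zero`,
    /// so the output never contains `null` or a non-numeric token.
    fn to_json(&self) -> String {
        format!(
            "{{\"min\":{:?},\"max\":{:?},\"total\":{:?}}}",
            finite_or_zero(self.min),
            finite_or_zero(self.max),
            finite_or_zero(self.total)
        )
    }
}

fn main() {
    // A gauge that has not observed any sample yet may carry NaN/inf bounds.
    let gauge = GaugeSummary {
        min: f64::NAN,
        max: f64::INFINITY,
        total: 12.5,
    };
    println!("{}", gauge.to_json());
}
```

Mapping to 0.0 loses the distinction between "no data" and "zero", but it keeps the field a plain number, so the receiving side can keep deserializing it as a float without special handling for null.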
Root cause: {{fnall}} in mpi_wrappers.w generates Fortran wrappers for
ALL MPI functions. OpenMPI 5.x added the MPI-4 Session API
(MPI_Session_*), whose Fortran wrappers use MPI_Session_f2c() — but that
returns an int handle while the C functions expect an MPI_Session*
pointer. GCC 15 treats this as an error.
Fixes:
1. exporters/mpi/mpi_wrappers.w — Added 9 problematic MPI-4 Session
functions to the {{fnall}} exclusion list (they handle process-set
management, not data transfer, so excluding them doesn't affect proxy
measurement)
2. install.sh — Changed the mpicc step to use || error_out instead of
checking file existence, so a real compilation failure is caught
and reported
The new version of FTIO creates a prediction server that can be accessed via ZMQ and MessagePack.
Tim Deringer (@Tim-Dieringer) just finished his thesis on some improvements and linkage with FTIO. More precisely, these are his changes:
The changes to FTIO's integration are complementary to https://github.com/tuda-parallel/FTIO/tree/feature/metric_proxy_bindings