Draft: Add automated test runner and visualization script for stress testing #1080
Conversation
ahzero7d1
commented
Mar 20, 2026
- Add extra arguments for running full test configurations
- Separate test configuration and execution script
- Add visualization script
Force-pushed from a6673ed to 75e469e
JulienVig
left a comment
Thanks Ahyoung, the scripts are very handy! I've left a few comments and questions.
- Can you rename `scripts` to something like `python` to clarify the difference with the other TypeScript scripts also in `cli/src`?
- Can you also add a short `README.md` in the `scripts` (`python`) folder with instructions and an example of how to use the two scripts?
import pandas as pd
import matplotlib.pyplot as plt
Could you add a requirements.txt with the libraries contributors should install?
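For reference, a minimal `requirements.txt` covering the two imports above might look like this (the version pins are illustrative, not taken from this PR):

```text
pandas>=2.0
matplotlib>=3.7
```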
def plot_mean_std(df, metric, output_path, title, ylabel):
    summary = df.groupby("step")[metric].agg(["mean", "std"]).reset_index()

    summary["std"] = summary["std"].fillna(0)
Why were there NaN stds?
A NaN std occurs when the function runs with only one client: the sample standard deviation of a single value is undefined. While this is not a typical case, I think it is nice to keep this line to handle edge cases.
Sounds good. Could you add a small comment for this in the code?
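The edge case can be reproduced in a few lines (column names follow the snippet above; the data itself is illustrative):

```python
import pandas as pd

# A single client yields one measurement per step, so the sample
# standard deviation (ddof=1) of each one-row group is undefined, i.e. NaN.
df = pd.DataFrame({"step": [0, 1, 2], "loss": [0.9, 0.7, 0.5]})
summary = df.groupby("step")["loss"].agg(["mean", "std"]).reset_index()
print(summary["std"].isna().all())  # True

# Replacing NaN with 0 keeps downstream plotting (e.g. fill_between) safe
summary["std"] = summary["std"].fillna(0)
print((summary["std"] == 0).all())  # True
```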
def main():
    # path of configuration file for experiments
    config_path = Path(sys.argv[1])
Could you set the default value to the path to basic_tests.json?
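One way to sketch that default (the location of `basic_tests.json` is hypothetical; adjust it to where the file actually lives in the repository):

```python
import sys
from pathlib import Path

# Hypothetical default path to basic_tests.json; adjust to the actual
# location of the configuration file in the repository.
DEFAULT_CONFIG = Path("basic_tests.json")

def parse_config_path(argv: list[str]) -> Path:
    """Use the first CLI argument when given, else fall back to the default."""
    return Path(argv[1]) if len(argv) > 1 else DEFAULT_CONFIG

# In main(), this replaces the bare `Path(sys.argv[1])` lookup:
config_path = parse_config_path(sys.argv)
```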
"experiments": [
    {
        "testID": "mnist_fed_mean_cnn3_p3_d600_e50_r2",
        "task": "mnist_federated",
I'm getting this error:
cli/dist/args.js:46
throw Error(`${unsafeArgs.task} not implemented.`);
^
Error: mnist_federated not implemented
Did you create new default tasks (mnist_federated, titanic_decentralized, etc.)?
Yes, I didn't push them since they would add several extra default tasks.
For now, we should define a new default task based on the scheme and minNbOfParticipants, and adjust the rest of the learning parameters in the experiment-settings JSON file.
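Under that proposal an experiment entry might look like the sketch below; only `testID` and `task` appear in the diff above, while `scheme`, `minNbOfParticipants`, and `epochs` are hypothetical illustrations of parameters that would move into the experiment-settings JSON:

```json
{
  "testID": "mnist_fed_mean_cnn3_p3_d600_e50_r2",
  "task": "mnist_federated",
  "scheme": "federated",
  "minNbOfParticipants": 3,
  "epochs": 50
}
```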
const streamPath = path.join(dir, `client${userIndex}_local_log.ndjson`);

const finalLog: SummaryLogs[] = [];
// create a write stream that saves learning logs during the train
let ndjsonStream: ReturnType<typeof createWriteStream> | null = null;

if (args.save) {
  ndjsonStream = createWriteStream(streamPath, { flags: "w" });
}
Did you choose the "ndjson" name over "jsonl" for a particular reason? I had only ever seen jsonl until now, so unless you particularly prefer ndjson I would change it to jsonl.
There was no special reason for that. I will replace ndjson with jsonl.
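For reference, NDJSON and JSON Lines are the same convention: one JSON object per line. A minimal Python sketch of writing and reading such a log (the file name is illustrative):

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

logs = [{"step": 0, "loss": 0.9}, {"step": 1, "loss": 0.7}]

with TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "client0_local_log.jsonl"
    # Write one JSON object per line: the JSON Lines (.jsonl) convention
    with open(log_path, "w") as f:
        for entry in logs:
            f.write(json.dumps(entry) + "\n")
    # Read back: parse each non-empty line independently
    with open(log_path) as f:
        restored = [json.loads(line) for line in f if line.strip()]

print(restored == logs)  # True
```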
/**
 * Return validation metrics
 *
 * TODO: currently only works for TFJS, gpt models
 */
evaluate(
  _validationDataset?: Dataset<Batched<DataFormat.ModelEncoded[D]>>
): Promise<ValidationMetrics> {
  throw new Error("Evaluation not supported for this model");
}
Suggested change:

/**
 * Return validation metrics
 */
abstract evaluate(
  _validationDataset?: Dataset<Batched<DataFormat.ModelEncoded[D]>>
): Promise<ValidationMetrics>;
I think we can make this method abstract rather than throwing
I didn’t make the method abstract since ONNXModel class does not implement evaluate. Do you think it would be better to make it abstract and throw an error in that class for now?
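As a Python analogy of the trade-off being discussed (class and method names mirror the thread; the bodies are hypothetical): declaring evaluate abstract forces every subclass to take an explicit position, and a backend without evaluation support, like ONNXModel, can still raise until it is implemented.

```python
from abc import ABC, abstractmethod

class Model(ABC):
    @abstractmethod
    def evaluate(self) -> dict:
        """Return validation metrics."""

class TFJSModel(Model):
    def evaluate(self) -> dict:
        # stand-in for a real validation pass
        return {"accuracy": 0.9}

class ONNXModel(Model):
    def evaluate(self) -> dict:
        # the abstract contract still forces an explicit decision here
        raise NotImplementedError("Evaluation not supported for this model")
```

A side effect of the abstract version is that instantiating an incomplete subclass fails at construction time rather than at the first evaluate() call, which surfaces missing implementations earlier.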