Draft: Add automated test runner and visualization script for stress testing #1080
Conversation
ahzero7d1
commented
Mar 20, 2026
- Add extra arguments for running full test configurations
- Separate test configuration and execution script
- Add visualization script
Force-pushed from a6673ed to 75e469e
JulienVig
left a comment
Thanks Ahyoung, the scripts are very handy! I've left a few comments and questions.
- Can you rename `scripts` to something like `python` to clarify the difference with the other TypeScript scripts also in `cli/src`?
- Can you also add a short `README.md` in the `scripts` (`python`) folder with instructions and an example of how to use the two scripts?
import pandas as pd
import matplotlib.pyplot as plt
Could you add a requirements.txt with the libraries contributors should install?
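For reference, a minimal `requirements.txt` covering the two imports above might look like this (the version pins are illustrative, not taken from this PR):

```text
pandas>=2.0
matplotlib>=3.7
```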
def plot_mean_std(df, metric, output_path, title, ylabel):
    summary = df.groupby("step")[metric].agg(["mean", "std"]).reset_index()

    summary["std"] = summary["std"].fillna(0)
Why were there NaN stds?
A NaN std occurs when the function runs with only one client: the sample standard deviation of a single value is undefined. While this is not a typical case, I think it is nice to keep this line to handle edge cases.
Sounds good. Could you add a small comment for this in the code?
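The edge case can be reproduced in a few lines (column names follow the snippet above; the data itself is illustrative):

```python
import pandas as pd

# A single client yields one measurement per step, so the sample
# standard deviation (ddof=1) of each one-row group is undefined, i.e. NaN.
df = pd.DataFrame({"step": [0, 1, 2], "loss": [0.9, 0.7, 0.5]})
summary = df.groupby("step")["loss"].agg(["mean", "std"]).reset_index()
print(summary["std"].isna().all())  # True

# Replacing NaN with 0 keeps downstream plotting (e.g. fill_between) safe
summary["std"] = summary["std"].fillna(0)
print((summary["std"] == 0).all())  # True
```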
def main():
    # path of configuration file for experiments
    config_path = Path(sys.argv[1])
Could you set the default value to the path to basic_tests.json?
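One way to sketch that default (the location of `basic_tests.json` is hypothetical; adjust it to where the file actually lives in the repository):

```python
import sys
from pathlib import Path

# Hypothetical default path to basic_tests.json; adjust to the actual
# location of the configuration file in the repository.
DEFAULT_CONFIG = Path("basic_tests.json")

def parse_config_path(argv: list[str]) -> Path:
    """Use the first CLI argument when given, else fall back to the default."""
    return Path(argv[1]) if len(argv) > 1 else DEFAULT_CONFIG

# In main(), this replaces the bare `Path(sys.argv[1])` lookup:
config_path = parse_config_path(sys.argv)
```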
"experiments": [
    {
        "testID": "mnist_fed_mean_cnn3_p3_d600_e50_r2",
        "task": "mnist_federated",
I'm getting this error:
cli/dist/args.js:46
throw Error(`${unsafeArgs.task} not implemented.`);
^
Error: mnist_federated not implemented
Did you create new default tasks (mnist_federated, titanic_decentralized, etc.)?
Yes, I didn't push them since they would add several extra default tasks.
For now, we should define a new default task based on the scheme and minNbOfParticipants, and adjust the rest of the learning parameters in the experiment-settings JSON file.
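Under that proposal an experiment entry might look like the sketch below; only `testID` and `task` appear in the diff above, while `scheme`, `minNbOfParticipants`, and `epochs` are hypothetical illustrations of parameters that would move into the experiment-settings JSON:

```json
{
  "testID": "mnist_fed_mean_cnn3_p3_d600_e50_r2",
  "task": "mnist_federated",
  "scheme": "federated",
  "minNbOfParticipants": 3,
  "epochs": 50
}
```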
const streamPath = path.join(dir, `client${userIndex}_local_log.ndjson`);

const finalLog: SummaryLogs[] = [];
// create a write stream that saves learning logs during the train
let ndjsonStream: ReturnType<typeof createWriteStream> | null = null;

if (args.save) {
  ndjsonStream = createWriteStream(streamPath, { flags: "w" });
}
Did you choose the "ndjson" name over "jsonl" for a particular reason? I had only ever seen jsonl until now, so unless you particularly prefer ndjson I would change it to jsonl.
There was no special reason for that. I will replace ndjson with jsonl.
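For reference, NDJSON and JSON Lines are the same convention: one JSON object per line. A minimal Python sketch of writing and reading such a log (the file name is illustrative):

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

logs = [{"step": 0, "loss": 0.9}, {"step": 1, "loss": 0.7}]

with TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "client0_local_log.jsonl"
    # Write one JSON object per line: the JSON Lines (.jsonl) convention
    with open(log_path, "w") as f:
        for entry in logs:
            f.write(json.dumps(entry) + "\n")
    # Read back: parse each non-empty line independently
    with open(log_path) as f:
        restored = [json.loads(line) for line in f if line.strip()]

print(restored == logs)  # True
```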
/**
 * Return validation metrics
 *
 * TODO: currently only works for TFJS, gpt models
 */
evaluate(
  _validationDataset?: Dataset<Batched<DataFormat.ModelEncoded[D]>>
): Promise<ValidationMetrics> {
  throw new Error("Evaluation not supported for this model");
}
Suggested change:

/**
 * Return validation metrics
 */
abstract evaluate(
  _validationDataset?: Dataset<Batched<DataFormat.ModelEncoded[D]>>
): Promise<ValidationMetrics>;
I think we can make this method abstract rather than throwing
I didn’t make the method abstract since ONNXModel class does not implement evaluate. Do you think it would be better to make it abstract and throw an error in that class for now?
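As a Python analogy of the trade-off being discussed (class and method names mirror the thread; the bodies are hypothetical): declaring evaluate abstract forces every subclass to take an explicit position, and a backend without evaluation support, like ONNXModel, can still raise until it is implemented.

```python
from abc import ABC, abstractmethod

class Model(ABC):
    @abstractmethod
    def evaluate(self) -> dict:
        """Return validation metrics."""

class TFJSModel(Model):
    def evaluate(self) -> dict:
        # stand-in for a real validation pass
        return {"accuracy": 0.9}

class ONNXModel(Model):
    def evaluate(self) -> dict:
        # the abstract contract still forces an explicit decision here
        raise NotImplementedError("Evaluation not supported for this model")
```

A side effect of the abstract version is that instantiating an incomplete subclass fails at construction time rather than at the first evaluate() call, which surfaces missing implementations earlier.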