Replace Hangfire build job runner with a local implementation#947
Replace Hangfire build job runner with a local implementation#947ddaspit wants to merge 8 commits into
Conversation
043389d to
a490550
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #947 +/- ##
==========================================
- Coverage 66.74% 64.67% -2.08%
==========================================
Files 381 379 -2
Lines 21736 21634 -102
Branches 2773 2762 -11
==========================================
- Hits 14507 13991 -516
- Misses 6191 6619 +428
+ Partials 1038 1024 -14 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
5cd22da to
0c064cc
Compare
|
I ended up only making a few updates:
|
ddaspit
left a comment
There was a problem hiding this comment.
@ddaspit partially reviewed 43 files and all commit messages, and made 1 comment.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on pmachapman).
src/Machine/src/Serval.Machine.Shared/Models/Build.cs line 13 at r3 (raw file):
public enum BuildJobRunnerType { Hangfire,
I think we can safely remove this type. We will need to make sure that there aren't any currently running builds before we deploy.
ddaspit
left a comment
There was a problem hiding this comment.
I don't think I can approve this PR, since I created it.
@ddaspit reviewed 1 file and all commit messages, made 1 comment, and resolved 1 discussion.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on pmachapman).
pmachapman
left a comment
There was a problem hiding this comment.
@pmachapman partially reviewed 43 files and all commit messages, and made 2 comments.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on ddaspit).
docker-compose.yml line 27 at r3 (raw file):
- BuildJob__ClearML__1__DockerImage=${MACHINE_PY_CPU_IMAGE:-ghcr.io/sillsdev/machine.py:1.12.0.cpu_only} - BuildJob__ClearML__2__Queue=${CLEARML_CPU_QUEUE:-lambert_24gb.cpu_only} - BuildJob__ClearML__2__DockerImage=${MACHINE_PY_CPU_IMAGE:-ghcr.io/sillsdev/machine.py:1.12.0.cpu_only}
Do we want to specify the version of machine.py from now on (if so, we should update appsettings.json and docker-compose.atlas.yml), or was this just a workaround?
Code quote:
- BuildJob__ClearML__0__DockerImage=${MACHINE_PY_IMAGE:-ghcr.io/sillsdev/machine.py:1.12.0}
- BuildJob__ClearML__1__Queue=${CLEARML_CPU_QUEUE:-lambert_24gb.cpu_only}
- BuildJob__ClearML__1__DockerImage=${MACHINE_PY_CPU_IMAGE:-ghcr.io/sillsdev/machine.py:1.12.0.cpu_only}
- BuildJob__ClearML__2__Queue=${CLEARML_CPU_QUEUE:-lambert_24gb.cpu_only}
- BuildJob__ClearML__2__DockerImage=${MACHINE_PY_CPU_IMAGE:-ghcr.io/sillsdev/machine.py:1.12.0.cpu_only}src/Machine/src/Serval.Machine.Shared/Services/SmtTransferLocalBuildJobFactory.cs line 59 at r3 (raw file):
} private record PostprocessData(int TrainCount, double Confidence);
NIT: Should this record be public? I notice the all three LocalBuildJobFactories declare this exact record.
Code quote:
private record PostprocessData(int TrainCount, double Confidence);
ddaspit
left a comment
There was a problem hiding this comment.
@ddaspit reviewed 5 files and all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on ddaspit).
Enkidu93
left a comment
There was a problem hiding this comment.
@Enkidu93 partially reviewed 36 files, made 3 comments, and resolved 1 discussion.
Reviewable status: 46 of 49 files reviewed, 1 unresolved discussion (waiting on pmachapman).
docker-compose.yml line 27 at r3 (raw file):
Previously, pmachapman (Peter Chapman) wrote…
Do we want to specify the version of machine.py from now on (if so, we should update appsettings.json and docker-compose.atlas.yml), or was this just a workaround?
I was thinking we could switch to explicitly setting it. I mention my rationale in a top-level comment. I went ahead and updated it in the appsettings as well - good call! Do you think this is a good change? I can't remember why we've left it as latest in the past.
src/Machine/src/Serval.Machine.Shared/Services/SmtTransferLocalBuildJobFactory.cs line 59 at r3 (raw file):
Previously, pmachapman (Peter Chapman) wrote…
NIT: Should this record be public? I notice the all three LocalBuildJobFactories declare this exact record.
Done. I went ahead and made it public. Feel free to push back if there's a reason for having it defined separately, @ddaspit.
src/Echo/src/EchoEngine/TranslationEngineService.cs line 183 at r6 (raw file):
try { //Wait for build to exist in the database before starting the build.
We will want to remove this once we move Echo to Serval.Machine.
ddaspit
left a comment
There was a problem hiding this comment.
@ddaspit reviewed 6 files and all commit messages, and made 1 comment.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on Enkidu93 and pmachapman).
src/Serval/src/Serval.Translation/Services/PlatformService.cs line 236 at r6 (raw file):
public async Task<bool> BuildExistsAsync(string buildId, CancellationToken cancellationToken = default) { Build? build = await _builds.GetAsync(b => b.Id == buildId, cancellationToken);
I think there is an ExistsAsync method on IRepository.
|
Previously, ddaspit (Damien Daspit) wrote…
Done. Yep 🤦♂️ |
ddaspit
left a comment
There was a problem hiding this comment.
@ddaspit reviewed 1 file and all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on pmachapman).
…eded; fix incorrect runner bug; explicitly set machine.py version in docker-compose.yml
a6f17fc to
e8c58e6
Compare
pmachapman
left a comment
There was a problem hiding this comment.
@pmachapman reviewed 11 files and all commit messages, and made 2 comments.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on Enkidu93).
docker-compose.yml line 27 at r3 (raw file):
Do you think this is a good change?
I think so. If a PR requires a special point release build of machine.py, it should probably define it in its YAML and appSettings.json.
This change is