odb: add write_packfile, for_each_unique_abbrev, convert_object_id #2074
MayCXC wants to merge 1 commit into gitgitgadget:master
There is an issue in commit 3d511c2:
785108f to 146c7ed
/submit
Submitted as pull.2074.git.1774530437562.gitgitgadget@gmail.com
6e569fc to 5b3e9a8
/submit
Submitted as pull.2074.v2.git.1774532383055.gitgitgadget@gmail.com
a69b8d3 to e03f501
Add three vtable methods to odb_source that were not part of the
recent ps/odb-sources and ps/object-counting series:

- write_packfile: ingest a pack from a file descriptor. The files
  backend chooses between index-pack (large packs) and
  unpack-objects (small packs below fetch.unpackLimit). Options
  cover thin-pack fixing, promisor marking, fsck, lockfile
  capture, and shallow file passing.

- for_each_unique_abbrev: iterate objects matching a hex prefix
  for disambiguation. Searches loose objects via oidtree, then
  multi-pack indices, then non-MIDX packs.

- convert_object_id: translate between hash algorithms using the
  loose object map. Used during SHA-1 to SHA-256 migration.

Also add ODB_SOURCE_HELPER to the source type enum, preparing for
the helper backend in the next commit.

The write_packfile vtable method replaces the pattern where callers
spawn index-pack/unpack-objects directly. fast-import already uses
odb_write_packfile() and this allows non-files backends to handle
pack ingestion through their own mechanism.

Signed-off-by: Aaron Paterson <apaterson@pm.me>
e03f501 to f28dd43
Patrick Steinhardt wrote on the Git mailing list:
On Thu, Mar 26, 2026 at 01:39:43PM +0000, Aaron Paterson via GitGitGadget wrote:
> From: Aaron Paterson <apaterson@pm.me>
>
> Add three vtable methods to odb_source that were not part of the
> recent ps/odb-sources and ps/object-counting series:
>
> - write_packfile: ingest a pack from a file descriptor. The files
> backend chooses between index-pack (large packs) and
> unpack-objects (small packs below fetch.unpackLimit). Options
> cover thin-pack fixing, promisor marking, fsck, lockfile
> capture, and shallow file passing.
>
> - for_each_unique_abbrev: iterate objects matching a hex prefix
> for disambiguation. Searches loose objects via oidtree, then
> multi-pack indices, then non-MIDX packs.
>
> - convert_object_id: translate between hash algorithms using the
> loose object map. Used during SHA-1 to SHA-256 migration.
This will conflict with ps/odb-generic-object-name-handling, which
already introduces generic callbacks for `for_each_unique_abbrev()`.
There's also ongoing work by Justin to handle writing packfiles via the
ODB transaction interface.
> Also add ODB_SOURCE_HELPER to the source type enum, preparing for
> the helper backend in the next commit.
Huh.
> The write_packfile vtable method replaces the pattern where callers
> spawn index-pack/unpack-objects directly. fast-import already uses
> odb_write_packfile() and this allows non-files backends to handle
> pack ingestion through their own mechanism.
I'm again a bit puzzled, same as with your previous patch series. It
would be nice to collaborate on this topic, but that will require a bit
more coordination than just sending in a patch series as things are
quite in flux here.
Patrick
apaterson@pm.me wrote on the Git mailing list:
Of course, and my apologies, gitgadget is not formatting these messages as clearly as I would like them to be.
Both this series and the last were adapted from my fork that supports [1] with a feature similar to gitremote-helpers. My hope is that the fork can converge with master so that sqlite-git can become redistributable. The local backends vtable was already a step in this direction, so the question is if letting users bring their own local backends, the way they currently can with helpers for remote backends, is in scope for git core.
Either way, it sounds like series 1 will be covered by upstream, so next I would like to contribute support for git-local-* helpers. This allows users to create .git repositories with storage formats other than packs and builtin alternatives like reftables, which seems appropriate as direct sqlite support would probably be out of scope for core. Local helpers are already implemented in [2] but if it makes sense to hold off and rebuild it after e.g. ps/odb-generic-object-name-handling is merged, I am not in such a rush.
[1] https://github.com/mayCXC/sqlite-git
[2] https://github.com/gitgitgadget/git/compare/master...MayCXC:git:ps/series-2-helpers-v3.patch
- Aaron
On Thursday, March 26th, 2026 at 7:58 AM, Patrick Steinhardt <ps@pks.im> wrote:
> On Thu, Mar 26, 2026 at 01:39:43PM +0000, Aaron Paterson via GitGitGadget wrote:
> > From: Aaron Paterson <apaterson@pm.me>
> >
> > Add three vtable methods to odb_source that were not part of the
> > recent ps/odb-sources and ps/object-counting series:
> >
> > - write_packfile: ingest a pack from a file descriptor. The files
> > backend chooses between index-pack (large packs) and
> > unpack-objects (small packs below fetch.unpackLimit). Options
> > cover thin-pack fixing, promisor marking, fsck, lockfile
> > capture, and shallow file passing.
> >
> > - for_each_unique_abbrev: iterate objects matching a hex prefix
> > for disambiguation. Searches loose objects via oidtree, then
> > multi-pack indices, then non-MIDX packs.
> >
> > - convert_object_id: translate between hash algorithms using the
> > loose object map. Used during SHA-1 to SHA-256 migration.
>
> This will conflict with ps/odb-generic-object-name-handling, which
> already introduces generic callbacks for `for_each_unique_abbrev()`.
> There's also ongoing work by Justin to handle writing packfiles via the
> ODB transaction interface.
>
> > Also add ODB_SOURCE_HELPER to the source type enum, preparing for
> > the helper backend in the next commit.
>
> Huh.
>
> > The write_packfile vtable method replaces the pattern where callers
> > spawn index-pack/unpack-objects directly. fast-import already uses
> > odb_write_packfile() and this allows non-files backends to handle
> > pack ingestion through their own mechanism.
>
> I'm again a bit puzzled, same as with your previous patch series. It
> would be nice to collaborate on this topic, but that will require a bit
> more coordination than just sending in a patch series as things are
> quite in flux here.
>
> Patrick
Patrick Steinhardt wrote on the Git mailing list:
On Thu, Mar 26, 2026 at 02:21:07PM +0000, apaterson@pm.me wrote:
> Of course, and my apologies, gitgadget is not formatting these
> messages as clearly as I would like them to be.
>
> Both this series and the last were adapted from my fork that supports
> [1] with a feature similar to gitremote-helpers. My hope is that the
> fork can converge with master so that sqlite-git can become
> redistributable. The local backends vtable was already a step in this
> direction, so the question is if letting users bring their own local
> backends, the way they currently can with helpers for remote backends,
> is in scope for git core.
Thanks for the context! This also matches with our eventual goal, even
though we rather envision that it makes more sense to maybe use a plugin
in the form of a shared object instead of using a helper executable.
> Either way, it sounds like series 1 will be covered by upstream, so
> next I would like to contribute support for git-local-* helpers. This
> allows users to create .git repositories with storage formats other
> than packs and builtin alternatives like reftables, which seems
> appropriate as direct sqlite support would probably be out of scope
> for core. Local helpers are already implemented in [2] but if it makes
> sense to hold off and rebuild it after e.g.
> ps/odb-generic-object-name-handling is merged, I am not in such a
> rush.
I've currently got around 10 more patch series pending that are mostly
ready to be sent out, but that all build on one another. As said, there
is a ton of stuff changing in the area of pluggable object databases,
and I expect it'll probably take two more Git releases until we have
fully carved out the foundation. Once that's done I think it should
become quieter, and at that point it'll become easier to also do
drive-by contributions without requiring too much coordination.
You can have a look at [1], which is our (non-official and
GitLab-specific) epic for the work that we have planned over the next
few months. Maybe it helps you a bit to figure out where we're going.
More concretely, next steps will be:
- I plan to turn in-memory, loose and packed backends into proper ODB
sources.
- I plan to introduce backend-specific consistency checks.
- I plan to introduce backend-specific logic for optimizations.
- I plan to introduce backend-specific logic of generating packfiles.
- Justin is revamping how writes work and plans to refactor existing
callers that do ad-hoc transactions. This will also eventually cover
writing packfiles into the ODB.
If you'd like to get involved earlier I'd propose that we sync off-list
to figure out how to collaborate without stepping on each other's toes
all the time :)
Thanks!
Patrick
This adds three ODB source vtable methods that were not part of the
recent ps/odb-sources and ps/object-counting series, plus caller
routing for object-name.c and fast-import.c.
New vtable methods:

- write_packfile: Ingest a pack from a file descriptor. The files
  backend chooses between index-pack (large packs) and
  unpack-objects (small packs below fetch.unpackLimit). Options
  cover thin-pack fixing, promisor marking, fsck, lockfile capture,
  and shallow file passing. Non-files backends can handle pack
  ingestion through their own mechanism.

- for_each_unique_abbrev: Iterate objects matching a hex prefix for
  disambiguation. The files backend searches loose objects via
  oidtree, then multi-pack indices, then non-MIDX packs.

- convert_object_id: Translate between hash algorithms using the
  loose object map. Used during SHA-1 to SHA-256 migration.
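To illustrate the write_packfile dispatch described above, here is a minimal sketch. It is not the code from this patch: every name in it (sketch_source, sketch_source_vtable, pack_ingest_mode, files_write_packfile, sketch_write_packfile) is hypothetical, the object count is passed in directly instead of being read from the pack header, and the backend only reports which strategy it would pick rather than spawning anything.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; these are not Git's declarations. */
enum pack_ingest_mode {
	PACK_INGEST_UNPACK_OBJECTS, /* explode the pack into loose objects */
	PACK_INGEST_INDEX_PACK      /* keep the pack on disk and index it */
};

struct sketch_source;

struct sketch_source_vtable {
	/* Ingest a pack holding nr_objects objects read from pack_fd. */
	enum pack_ingest_mode (*write_packfile)(struct sketch_source *src,
						int pack_fd,
						unsigned long nr_objects);
};

struct sketch_source {
	const struct sketch_source_vtable *vtable;
	unsigned long unpack_limit; /* stands in for fetch.unpackLimit */
};

/* Files-style backend: small packs are unpacked, large ones indexed. */
static enum pack_ingest_mode files_write_packfile(struct sketch_source *src,
						  int pack_fd,
						  unsigned long nr_objects)
{
	(void)pack_fd; /* a real backend would stream the pack from here */
	if (src->unpack_limit && nr_objects < src->unpack_limit)
		return PACK_INGEST_UNPACK_OBJECTS;
	return PACK_INGEST_INDEX_PACK;
}

static const struct sketch_source_vtable files_vtable = {
	.write_packfile = files_write_packfile,
};

/* Generic entry point: the caller does not know which backend runs. */
static enum pack_ingest_mode sketch_write_packfile(struct sketch_source *src,
						   int pack_fd,
						   unsigned long nr_objects)
{
	assert(src && src->vtable && src->vtable->write_packfile);
	return src->vtable->write_packfile(src, pack_fd, nr_objects);
}
```

A non-files backend would install its own write_packfile in a different vtable and could ignore the limit entirely, which is the point of routing callers through the generic entry point.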
Caller routing:

- object-name.c: The abbreviation and disambiguation paths
  (find_short_object_filename, find_abbrev_len_packed, and
  find_short_packed_object) directly access files-backend internals
  (loose cache, pack store, MIDX). These are converted to dispatch
  through the for_each_unique_abbrev vtable method, so that
  non-files backends participate through proper abstraction rather
  than being skipped.

- fast-import.c: end_packfile() replaces direct pack indexing,
  registration, and odb_source_files_downcast() with a call to
  odb_write_packfile(). gfi_unpack_entry() falls back to
  odb_read_object() when the pack slot is NULL (non-files backends
  ingest packs without registering them on disk).
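The object-name.c routing can be pictured with the following sketch, which walks a chain of sources and lets each one report its own prefix matches. It rests on assumptions: all names (sketch_source, sketch_vtable, array_for_each_unique_abbrev, sketch_for_each_abbrev, sketch_count_matches) are hypothetical, object names are plain hex strings, and deduplication of objects present in more than one source is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; these are not Git's declarations. */
typedef int (*sketch_each_cb)(const char *hex_oid, void *data);

struct sketch_source;

struct sketch_vtable {
	int (*for_each_unique_abbrev)(struct sketch_source *src,
				      const char *hex_prefix,
				      sketch_each_cb cb, void *data);
};

struct sketch_source {
	const struct sketch_vtable *vtable;
	const char **oids;          /* backend-private: hex object names */
	size_t nr;
	struct sketch_source *next; /* next source in the chain */
};

/* Toy backend: linear scan over an array of hex object names. */
static int array_for_each_unique_abbrev(struct sketch_source *src,
					const char *hex_prefix,
					sketch_each_cb cb, void *data)
{
	size_t len = strlen(hex_prefix);

	for (size_t i = 0; i < src->nr; i++) {
		int ret;

		if (strncmp(src->oids[i], hex_prefix, len))
			continue;
		ret = cb(src->oids[i], data);
		if (ret)
			return ret; /* callback asked to stop */
	}
	return 0;
}

static const struct sketch_vtable array_vtable = {
	.for_each_unique_abbrev = array_for_each_unique_abbrev,
};

/* Generic caller: no downcast, every source answers for itself. */
static int sketch_for_each_abbrev(struct sketch_source *sources,
				  const char *hex_prefix,
				  sketch_each_cb cb, void *data)
{
	for (struct sketch_source *src = sources; src; src = src->next) {
		int ret;

		assert(src->vtable && src->vtable->for_each_unique_abbrev);
		ret = src->vtable->for_each_unique_abbrev(src, hex_prefix,
							  cb, data);
		if (ret)
			return ret;
	}
	return 0;
}

static int count_match(const char *hex_oid, void *data)
{
	(void)hex_oid;
	(*(size_t *)data)++;
	return 0;
}

/* Number of objects across all sources whose name starts with the prefix. */
static size_t sketch_count_matches(struct sketch_source *sources,
				   const char *hex_prefix)
{
	size_t n = 0;

	sketch_for_each_abbrev(sources, hex_prefix, count_match, &n);
	return n;
}
```

In this sketch a prefix is ambiguous when sketch_count_matches() returns more than one, and a source with a different storage format takes part simply by supplying its own vtable entry.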
This addresses Patrick's feedback on the previous submission [1]:
the correct fix for downcast sites is proper vtable abstraction, not
skipping non-files backends.
Additional:
repository.h
Motivation: These methods are needed by the local helper backend
series [2], which delegates object and reference storage
to external git-local-* helper processes. sqlite-git [3] is a
working proof of concept that stores objects, refs, and reflogs
in a single SQLite database with full worktree support.
CC: Junio C Hamano <gitster@pobox.com>, Patrick Steinhardt <ps@pks.im>
[1] https://github.com/gitgitgadget/git/pull/2068.patch
[2] https://github.com/gitgitgadget/git/compare/master...MayCXC:git:ps/series-2-helpers-v3.patch
[3] https://github.com/MayCXC/sqlite-git
cc: Patrick Steinhardt <ps@pks.im>