odb: add write_packfile, for_each_unique_abbrev, convert_object_id #2074

Open
MayCXC wants to merge 1 commit into gitgitgadget:master from MayCXC:ps/series-1-vtable-v3

MayCXC commented Mar 26, 2026

This adds three ODB source vtable methods that were not part of the
recent ps/odb-sources and ps/object-counting series, plus caller
routing for object-name.c and fast-import.c.

New vtable methods:

  • write_packfile: Ingest a pack from a file descriptor. The files
    backend chooses between index-pack (large packs) and
    unpack-objects (small packs below fetch.unpackLimit). Options
    cover thin-pack fixing, promisor marking, fsck, lockfile capture,
    and shallow file passing. Non-files backends can handle pack
    ingestion through their own mechanism.

  • for_each_unique_abbrev: Iterate objects matching a hex prefix for
    disambiguation. The files backend searches loose objects via
    oidtree, multi-pack indices, then non-MIDX packs.

  • convert_object_id: Translate between hash algorithms using the
    loose object map. Used during SHA-1 to SHA-256 migration.
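
The shape of such a method can be sketched in a few lines of C. This is a toy model, not the code from the patch: `demo_source`, `demo_source_vtable`, `demo_write_packfile`, and the `unpack_limit` field are hypothetical names, and the files-like backend only reports which strategy it would pick instead of spawning index-pack or unpack-objects. The rule "fewer objects than the limit means unpack" is an assumption based on the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strategies a backend may pick when ingesting a pack. */
enum demo_ingest {
	DEMO_INGEST_UNPACK_OBJECTS, /* explode a small pack into loose objects */
	DEMO_INGEST_INDEX_PACK,     /* keep a large pack and index it */
	DEMO_INGEST_CUSTOM          /* backend-specific mechanism */
};

struct demo_source;

/* Hypothetical vtable: each backend supplies its own write_packfile. */
struct demo_source_vtable {
	int (*write_packfile)(struct demo_source *src, int pack_fd,
			      size_t nr_objects, enum demo_ingest *how);
};

struct demo_source {
	const struct demo_source_vtable *vt;
	size_t unpack_limit; /* stands in for fetch.unpackLimit */
};

/* Files-like backend: small packs are unpacked, large ones are indexed. */
static int demo_files_write_packfile(struct demo_source *src, int pack_fd,
				     size_t nr_objects, enum demo_ingest *how)
{
	(void)pack_fd; /* a real backend would read the pack from this fd */
	*how = nr_objects < src->unpack_limit ? DEMO_INGEST_UNPACK_OBJECTS
					      : DEMO_INGEST_INDEX_PACK;
	return 0;
}

/* Non-files backend: always stores objects through its own mechanism. */
static int demo_custom_write_packfile(struct demo_source *src, int pack_fd,
				      size_t nr_objects, enum demo_ingest *how)
{
	(void)src;
	(void)pack_fd;
	(void)nr_objects;
	*how = DEMO_INGEST_CUSTOM;
	return 0;
}

static const struct demo_source_vtable demo_files_vtable = {
	.write_packfile = demo_files_write_packfile,
};

static const struct demo_source_vtable demo_custom_vtable = {
	.write_packfile = demo_custom_write_packfile,
};

/* Callers go through this wrapper and never downcast the source. */
static int demo_write_packfile(struct demo_source *src, int pack_fd,
			       size_t nr_objects, enum demo_ingest *how)
{
	assert(how != NULL);
	if (!src || !src->vt || !src->vt->write_packfile)
		return -1; /* backend does not support pack ingestion */
	return src->vt->write_packfile(src, pack_fd, nr_objects, how);
}
```

A caller would build a `struct demo_source` with either vtable and call `demo_write_packfile()`; the point of the sketch is that the decision between strategies lives behind the function pointer rather than in the caller.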

Caller routing:

  • object-name.c: The abbreviation and disambiguation paths
    (find_short_object_filename, find_abbrev_len_packed, and
    find_short_packed_object) directly access files-backend internals
    (loose cache, pack store, MIDX). These are converted to dispatch
    through the for_each_unique_abbrev vtable method, so that
    non-files backends participate through proper abstraction rather
    than being skipped.

  • fast-import.c: end_packfile() replaces direct pack indexing,
    registration, and the odb_source_files_downcast() call with a call
    to odb_write_packfile(). gfi_unpack_entry() falls back to
    odb_read_object() when the pack slot is NULL (non-files backends
    ingest packs without registering them on disk).
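
As a rough illustration of the object-name.c routing, the sketch below walks a chain of sources and asks each one for objects matching a hex prefix through a function pointer. All names (`demo_abbrev_source`, `demo_for_each_unique_abbrev`, `demo_count_matches`) are hypothetical, the toy backend is a linear scan over an in-memory list of hex strings, and the sketch assumes no object is stored in more than one source, so it does no deduplication.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Callback invoked for every object whose hex name starts with the prefix.
 * A non-zero return value stops the iteration.
 */
typedef int (*demo_abbrev_cb)(const char *hex, void *data);

struct demo_abbrev_source {
	/* Hypothetical vtable slot; every backend implements its own lookup. */
	int (*for_each_unique_abbrev)(struct demo_abbrev_source *src,
				      const char *prefix,
				      demo_abbrev_cb cb, void *data);
	const char **hex;                /* toy storage: object names as hex */
	size_t nr;
	struct demo_abbrev_source *next; /* next source in the chain */
};

/* Toy backend: linear scan over an in-memory list. */
static int demo_list_for_each(struct demo_abbrev_source *src,
			      const char *prefix,
			      demo_abbrev_cb cb, void *data)
{
	size_t len = strlen(prefix);

	for (size_t i = 0; i < src->nr; i++) {
		if (!strncmp(src->hex[i], prefix, len)) {
			int ret = cb(src->hex[i], data);
			if (ret)
				return ret;
		}
	}
	return 0;
}

/* Generic caller: asks every source, regardless of its backend type. */
static int demo_for_each_unique_abbrev(struct demo_abbrev_source *sources,
				       const char *prefix,
				       demo_abbrev_cb cb, void *data)
{
	for (struct demo_abbrev_source *s = sources; s; s = s->next) {
		int ret;

		assert(s->for_each_unique_abbrev);
		ret = s->for_each_unique_abbrev(s, prefix, cb, data);
		if (ret)
			return ret;
	}
	return 0;
}

static int demo_count_cb(const char *hex, void *data)
{
	(void)hex;
	(*(size_t *)data)++;
	return 0;
}

/* A prefix is unambiguous when exactly one object matches overall. */
static size_t demo_count_matches(struct demo_abbrev_source *sources,
				 const char *prefix)
{
	size_t n = 0;

	demo_for_each_unique_abbrev(sources, prefix, demo_count_cb, &n);
	return n;
}
```

Because the generic loop only touches the function pointer, a backend that keeps its objects somewhere other than loose files or packs takes part in disambiguation the same way the built-in one does.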

This addresses Patrick's feedback on the previous submission [1]:
the correct fix for downcast sites is proper vtable abstraction, not
skipping non-files backends.

Additional:

  • ODB_SOURCE_HELPER added to the source type enum
  • odb/source-type.h extracted to avoid circular includes with
    repository.h
  • OBJECT_INFO_KEPT_ONLY flag for backends that track kept status
  • self_contained_out output field on odb_write_packfile_options

Motivation: These methods are needed by the local helper backend
series [2], which delegates object and reference storage
to external git-local-* helper processes. sqlite-git [3] is a
working proof of concept that stores objects, refs, and reflogs
in a single SQLite database with full worktree support.

CC: Junio C Hamano gitster@pobox.com, Patrick Steinhardt ps@pks.im

[1] https://github.com/gitgitgadget/git/pull/2068.patch
[2] https://github.com/gitgitgadget/git/compare/master...MayCXC:git:ps/series-2-helpers-v3.patch
[3] https://github.com/MayCXC/sqlite-git

cc: Patrick Steinhardt ps@pks.im

gitgitgadget bot commented Mar 26, 2026

There is an issue in commit 3d511c2:
odb: add write_packfile, for_each_unique_abbrev, convert_object_id

  • Commit not signed off

MayCXC force-pushed the ps/series-1-vtable-v3 branch 2 times, most recently from 785108f to 146c7ed on March 26, 2026 13:00

MayCXC commented Mar 26, 2026

/submit

gitgitgadget bot commented Mar 26, 2026

Submitted as pull.2074.git.1774530437562.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-2074/MayCXC/ps/series-1-vtable-v3-v1

To fetch this version to local tag pr-2074/MayCXC/ps/series-1-vtable-v3-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-2074/MayCXC/ps/series-1-vtable-v3-v1

MayCXC force-pushed the ps/series-1-vtable-v3 branch 2 times, most recently from 6e569fc to 5b3e9a8 on March 26, 2026 13:32

MayCXC commented Mar 26, 2026

/submit

gitgitgadget bot commented Mar 26, 2026

Submitted as pull.2074.v2.git.1774532383055.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-2074/MayCXC/ps/series-1-vtable-v3-v2

To fetch this version to local tag pr-2074/MayCXC/ps/series-1-vtable-v3-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-2074/MayCXC/ps/series-1-vtable-v3-v2

MayCXC force-pushed the ps/series-1-vtable-v3 branch 8 times, most recently from a69b8d3 to e03f501 on March 26, 2026 23:59

Add three vtable methods to odb_source that were not part of the
recent ps/odb-sources and ps/object-counting series:

 - write_packfile: ingest a pack from a file descriptor. The files
   backend chooses between index-pack (large packs) and
   unpack-objects (small packs below fetch.unpackLimit). Options
   cover thin-pack fixing, promisor marking, fsck, lockfile
   capture, and shallow file passing.

 - for_each_unique_abbrev: iterate objects matching a hex prefix
   for disambiguation. Searches loose objects via oidtree, then
   multi-pack indices, then non-MIDX packs.

 - convert_object_id: translate between hash algorithms using the
   loose object map. Used during SHA-1 to SHA-256 migration.

Also add ODB_SOURCE_HELPER to the source type enum, preparing for
the helper backend in the next commit.

The write_packfile vtable method replaces the pattern where callers
spawn index-pack/unpack-objects directly. fast-import already uses
odb_write_packfile() and this allows non-files backends to handle
pack ingestion through their own mechanism.

Signed-off-by: Aaron Paterson <apaterson@pm.me>

MayCXC force-pushed the ps/series-1-vtable-v3 branch from e03f501 to f28dd43 on March 27, 2026 00:39

gitgitgadget bot commented Apr 7, 2026

Patrick Steinhardt wrote on the Git mailing list (how to reply to this email):

On Thu, Mar 26, 2026 at 01:39:43PM +0000, Aaron Paterson via GitGitGadget wrote:
> From: Aaron Paterson <apaterson@pm.me>
> 
> Add three vtable methods to odb_source that were not part of the
> recent ps/odb-sources and ps/object-counting series:
> 
>  - write_packfile: ingest a pack from a file descriptor. The files
>    backend chooses between index-pack (large packs) and
>    unpack-objects (small packs below fetch.unpackLimit). Options
>    cover thin-pack fixing, promisor marking, fsck, lockfile
>    capture, and shallow file passing.
> 
>  - for_each_unique_abbrev: iterate objects matching a hex prefix
>    for disambiguation. Searches loose objects via oidtree, then
>    multi-pack indices, then non-MIDX packs.
> 
>  - convert_object_id: translate between hash algorithms using the
>    loose object map. Used during SHA-1 to SHA-256 migration.

This will conflict with ps/odb-generic-object-name-handling, which
already introduces generic callbacks for `for_each_unique_abbrev()`.
There's also ongoing work by Justin to handle writing packfiles via the
ODB transaction interface.

> Also add ODB_SOURCE_HELPER to the source type enum, preparing for
> the helper backend in the next commit.

Huh.

> The write_packfile vtable method replaces the pattern where callers
> spawn index-pack/unpack-objects directly. fast-import already uses
> odb_write_packfile() and this allows non-files backends to handle
> pack ingestion through their own mechanism.

I'm again a bit puzzled, same as with your previous patch series. It
would be nice to collaborate on this topic, but that will require a bit
more coordination than just sending in a patch series as things are
quite in flux here.

Patrick

gitgitgadget bot commented Apr 7, 2026

User Patrick Steinhardt <ps@pks.im> has been added to the cc: list.

gitgitgadget bot commented Apr 7, 2026

apaterson@pm.me wrote on the Git mailing list (how to reply to this email):

Of course, and my apologies, gitgadget is not formatting these messages as clearly as I would like them to be.

Both this series and the last were adapted from my fork that supports [1] with a feature similar to gitremote-helpers. My hope is that the fork can converge with master so that sqlite-git can become redistributable. The local backends vtable was already a step in this direction, so the question is if letting users bring their own local backends, the way they currently can with helpers for remote backends, is in scope for git core.

Either way, it sounds like series 1 will be covered by upstream, so next I would like to contribute support for git-local-* helpers. This allows users to create .git repositories with storage formats other than packs and builtin alternatives like reftables, which seems appropriate as direct sqlite support would probably be out of scope for core. Local helpers are already implemented in [2] but if it makes sense to hold off and rebuild it after e.g. ps/odb-generic-object-name-handling is merged, I am not in such a rush.

[1] https://github.com/mayCXC/sqlite-git
[2] https://github.com/gitgitgadget/git/compare/master...MayCXC:git:ps/series-2-helpers-v3.patch

- Aaron

On Thursday, March 26th, 2026 at 7:58 AM, Patrick Steinhardt <ps@pks.im> wrote:

> On Thu, Mar 26, 2026 at 01:39:43PM +0000, Aaron Paterson via GitGitGadget wrote:
> > From: Aaron Paterson <apaterson@pm.me>
> >
> > Add three vtable methods to odb_source that were not part of the
> > recent ps/odb-sources and ps/object-counting series:
> >
> >  - write_packfile: ingest a pack from a file descriptor. The files
> >    backend chooses between index-pack (large packs) and
> >    unpack-objects (small packs below fetch.unpackLimit). Options
> >    cover thin-pack fixing, promisor marking, fsck, lockfile
> >    capture, and shallow file passing.
> >
> >  - for_each_unique_abbrev: iterate objects matching a hex prefix
> >    for disambiguation. Searches loose objects via oidtree, then
> >    multi-pack indices, then non-MIDX packs.
> >
> >  - convert_object_id: translate between hash algorithms using the
> >    loose object map. Used during SHA-1 to SHA-256 migration.
> 
> This will conflict with ps/odb-generic-object-name-handling, which
> already introduces generic callbacks for `for_each_unique_abbrev()`.
> There's also ongoing work by Justin to handle writing packfiles via the
> ODB transaction interface.
> 
> > Also add ODB_SOURCE_HELPER to the source type enum, preparing for
> > the helper backend in the next commit.
> 
> Huh.
> 
> > The write_packfile vtable method replaces the pattern where callers
> > spawn index-pack/unpack-objects directly. fast-import already uses
> > odb_write_packfile() and this allows non-files backends to handle
> > pack ingestion through their own mechanism.
> 
> I'm again a bit puzzled, same as with your previous patch series. It
> would be nice to collaborate on this topic, but that will require a bit
> more coordination than just sending in a patch series as things are
> quite in flux here.
> 
> Patrick
>

gitgitgadget bot commented Apr 7, 2026

Patrick Steinhardt wrote on the Git mailing list (how to reply to this email):

On Thu, Mar 26, 2026 at 02:21:07PM +0000, apaterson@pm.me wrote:
> Of course, and my apologies, gitgadget is not formatting these
> messages as clearly as I would like them to be.
> 
> Both this series and the last were adapted from my fork that supports
> [1] with a feature similar to gitremote-helpers. My hope is that the
> fork can converge with master so that sqlite-git can become
> redistributable. The local backends vtable was already a step in this
> direction, so the question is if letting users bring their own local
> backends, the way they currently can with helpers for remote backends,
> is in scope for git core.

Thanks for the context! This also matches with our eventual goal, even
though we rather envision that it makes more sense to maybe use a plugin
in the form of a shared object instead of using a helper executable.

> Either way, it sounds like series 1 will be covered by upstream, so
> next I would like to contribute support for git-local-* helpers. This
> allows users to create .git repositories with storage formats other
> than packs and builtin alternatives like reftables, which seems
> appropriate as direct sqlite support would probably be out of scope
> for core. Local helpers are already implemented in [2] but if it makes
> sense to hold off and rebuild it after e.g.
> ps/odb-generic-object-name-handling is merged, I am not in such a
> rush.

I've currently got around 10 more patch series pending that are mostly
ready to be sent out, but that all build on one another. As said, there
is a ton of stuff changing in the area of pluggable object databases,
and I expect it'll probably take two more Git releases until we have
fully carved out the foundation. Once that's done I think it should
become quieter, and at that point it'll become easier to also do
drive-by contributions without requiring too much coordination.

You can have a look at [1], which is our (non-official and
GitLab-specific) epic for the work that we have planned over the next
few months. Maybe it helps you a bit to figure out where we're going.

More concretely, next steps will be:

  - I plan to turn in-memory, loose and packed backends into proper ODB
    sources.

  - I plan to introduce backend-specific consistency checks.

  - I plan to introduce backend-specific logic for optimizations.

  - I plan to introduce backend-specific logic for generating packfiles.

  - Justin is revamping how writes work and plans to refactor existing
    callers that do ad-hoc transactions. This will also eventually cover
    writing packfiles into the ODB.

If you'd like to get involved earlier I'd propose that we sync off-list
to figure out how to collaborate without stepping on each other's toes
all the time :)

Thanks!

Patrick
