More work supporting objects larger than 4GB on Windows by dscho · Pull Request #2137 · gitgitgadget/git

dscho · 2026-06-04T07:25:22Z

This patch series tries to address the problems pointed out by the expensive tests that now run in CI: t5608 and t7508 verify various aspects about objects larger than 4GB, which Git does not currently handle correctly when run on a platform where size_t is 64-bit and unsigned long is 32-bit.

Changes vs v1:

Rebased onto master, which merged ps/odb-source-loose (with which these patches previously conflicted rather badly).
Removed superfluous size_t s variables (thanks, Patrick!).

Cc: Signed-off-by: Kristofer Karlsson krka@spotify.com
cc: Patrick Steinhardt ps@pks.im

dscho · 2026-06-04T10:49:59Z

/submit

gitgitgadget · 2026-06-04T10:52:00Z

Submitted as pull.2137.git.1780570272.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-2137/dscho/objects-larger-than-4gb-on-windows-pt2-v1

To fetch this version to local tag pr-2137/dscho/objects-larger-than-4gb-on-windows-pt2-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-2137/dscho/objects-larger-than-4gb-on-windows-pt2-v1

gitgitgadget · 2026-06-08T16:16:25Z

@@ -3232,7 +3232,7 @@ static int apply_binary_fragment(struct apply_state *state,
 				 struct patch *patch)


Patrick Steinhardt wrote on the Git mailing list (how to reply to this email):

On Thu, Jun 04, 2026 at 10:51:07AM +0000, Johannes Schindelin via GitGitGadget wrote: > From: Johannes Schindelin <johannes.schindelin@gmx.de> > > `patch_delta()` takes the source and delta sizes by value and writes > back the reconstructed target size through an `unsigned long *`. That > datatype cannot represent a value that exceeds 4 GiB on systems where > `unsigned long` is 32-bit (notably 64-bit Windows builds), though, even > though the delta encoding itself, the on-disk layout, and the in-memory > buffers happily carry such sizes. A `size_t` companion to > `get_delta_hdr_size()`, `get_delta_hdr_size_sz()`, was introduced in > 17fa077596 (delta, packfile: use size_t for delta header sizes, > 2026-05-08) precisely so that `patch_delta()` could be widened without > changing the on-the-wire decoding helper's signature. > > Widen `patch_delta()`'s three size parameters to `size_t` and switch > its internal use of `get_delta_hdr_size()` to the `_sz` variant. > Then propagate the wider type through the callers. Does `get_delta_hdr_size()` have any remaining callers after this patch series? I currently only spot two such callers, and you convert both of them in this patch. And can we reasonably add a test case that exercises this change? > diff --git a/packfile.c b/packfile.c > index 89366abfe3..e202f48837 100644 > --- a/packfile.c > +++ b/packfile.c > @@ -1964,10 +1964,8 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset, > (uintmax_t)curpos, p->pack_name); > data = NULL; > } else { > - unsigned long sz; > data = patch_delta(base, base_size, delta_data, > - delta_size, &sz); > - size = sz; > + delta_size, &size); Nice that we get rid of this awkward construct. Patrick

Johannes Schindelin wrote on the Git mailing list (how to reply to this email):

Hi Patrick, On Mon, 15 Jun 2026, Patrick Steinhardt wrote: > On Thu, Jun 04, 2026 at 10:51:07AM +0000, Johannes Schindelin via GitGitGadget wrote: > > From: Johannes Schindelin <johannes.schindelin@gmx.de> > > > > `patch_delta()` takes the source and delta sizes by value and writes > > back the reconstructed target size through an `unsigned long *`. That > > datatype cannot represent a value that exceeds 4 GiB on systems where > > `unsigned long` is 32-bit (notably 64-bit Windows builds), though, even > > though the delta encoding itself, the on-disk layout, and the in-memory > > buffers happily carry such sizes. A `size_t` companion to > > `get_delta_hdr_size()`, `get_delta_hdr_size_sz()`, was introduced in > > 17fa077596 (delta, packfile: use size_t for delta header sizes, > > 2026-05-08) precisely so that `patch_delta()` could be widened without > > changing the on-the-wire decoding helper's signature. > > > > Widen `patch_delta()`'s three size parameters to `size_t` and switch > > its internal use of `get_delta_hdr_size()` to the `_sz` variant. > > Then propagate the wider type through the callers. > > Does `get_delta_hdr_size()` have any remaining callers after this patch > series? I currently only spot two such callers, and you convert both of > them in this patch. As you noticed later on in the review: No, there are no such callers left, and the `_sz` variant gets renamed, concluding the incremental migration of that function from `unsigned long` to `size_t`. > And can we reasonably add a test case that exercises this change? Not reasonably, no. This would require constructing another artificial _large_ object, this time with an unpacked Git object with a size >=4GB that needs to be transmogrified into a different object. Better leave the verification of this patch to static analysis (GCC or Clang have become quite good at spotting things like this; Coverity would be, too, if it ever comes back up from its "upgrades to the Scan servers", https://web.archive.org/web/20260516152422/https://scan.coverity.com/ seems to be the start date of this update). > > > diff --git a/packfile.c b/packfile.c > > index 89366abfe3..e202f48837 100644 > > --- a/packfile.c > > +++ b/packfile.c > > @@ -1964,10 +1964,8 @@ void *unpack_entry(struct repository *r, struct packed_git *p, off_t obj_offset, > > (uintmax_t)curpos, p->pack_name); > > data = NULL; > > } else { > > - unsigned long sz; > > data = patch_delta(base, base_size, delta_data, > > - delta_size, &sz); > > - size = sz; > > + delta_size, &size); > > Nice that we get rid of this awkward construct. Awkward, but necessary to allow for an incremental, reviewable conversion ;-) Ciao, Johannes

gitgitgadget · 2026-06-08T16:16:28Z

User Patrick Steinhardt <ps@pks.im> has been added to the cc: list.

gitgitgadget · 2026-06-08T16:16:45Z

@@ -453,7 +453,7 @@ static int check_pack_inflate(struct packed_git *p,
 		struct pack_window **w_curs,


Patrick Steinhardt wrote on the Git mailing list (how to reply to this email):

On Thu, Jun 04, 2026 at 10:51:08AM +0000, Johannes Schindelin via GitGitGadget wrote: > From: Johannes Schindelin <johannes.schindelin@gmx.de> > > `write_reuse_object()` learned to track its packed-object size as > `size_t` in 606c192380 (odb, packfile: use size_t for streaming > object sizes, 2026-05-08), but the comparison sink it feeds, > `check_pack_inflate()`, still takes the expected decompressed size > as `unsigned long`. The call site bridges the mismatch with > `cast_size_t_to_ulong()`, which on Windows turns a >4 GiB object > into an immediate die(). > > That function only uses `expect` once: as the right-hand side of a > `stream.total_out == expect` equality test against zlib's counter. > zlib's own `total_out` counter is `uLong` and is therefore still > 32-bit-bound on Windows. Widening `expect` to `size_t` cannot fix that, > but it is a strict improvement nonetheless: instead of dying outright, > an oversized object now simply makes the equality fail and lets > `write_reuse_object()` fall back to `write_no_reuse_object()`, which > decompresses and re-deflates the content (and which the larger > pack-objects widening series targets separately). Hm. I wonder whether it's possible to reset `stream.total_out` on every iteration and instead have a local `size_t` variable that we use to track the total number of inflated bytes? Patrick

Johannes Schindelin wrote on the Git mailing list (how to reply to this email):

Hi Patrick, On Mon, 15 Jun 2026, Patrick Steinhardt wrote: > On Thu, Jun 04, 2026 at 10:51:08AM +0000, Johannes Schindelin via GitGitGadget wrote: > > From: Johannes Schindelin <johannes.schindelin@gmx.de> > > > > `write_reuse_object()` learned to track its packed-object size as > > `size_t` in 606c192380 (odb, packfile: use size_t for streaming > > object sizes, 2026-05-08), but the comparison sink it feeds, > > `check_pack_inflate()`, still takes the expected decompressed size > > as `unsigned long`. The call site bridges the mismatch with > > `cast_size_t_to_ulong()`, which on Windows turns a >4 GiB object > > into an immediate die(). > > > > That function only uses `expect` once: as the right-hand side of a > > `stream.total_out == expect` equality test against zlib's counter. > > zlib's own `total_out` counter is `uLong` and is therefore still > > 32-bit-bound on Windows. Widening `expect` to `size_t` cannot fix that, > > but it is a strict improvement nonetheless: instead of dying outright, > > an oversized object now simply makes the equality fail and lets > > `write_reuse_object()` fall back to `write_no_reuse_object()`, which > > decompresses and re-deflates the content (and which the larger > > pack-objects widening series targets separately). > > Hm. I wonder whether it's possible to reset `stream.total_out` on every > iteration and instead have a local `size_t` variable that we use to > track the total number of inflated bytes? Possible? Yes. Appropriate? Unlikely. We would now pretend to have inflated less bytes, _just_ to appease a data type limitation that we already worked around in d05d666977 (git-zlib: handle data streams larger than 4GB, 2026-05-08). Ciao, Johannes

gitgitgadget · 2026-06-08T16:17:07Z

@@ -1239,6 +1239,8 @@ static void *gfi_unpack_entry(
 	unsigned long *sizep)


Patrick Steinhardt wrote on the Git mailing list (how to reply to this email):

On Thu, Jun 04, 2026 at 10:51:09AM +0000, Johannes Schindelin via GitGitGadget wrote: > diff --git a/builtin/fast-import.c b/builtin/fast-import.c > index 82bc6dcc00..3dff898c43 100644 > --- a/builtin/fast-import.c > +++ b/builtin/fast-import.c > @@ -1239,6 +1239,8 @@ static void *gfi_unpack_entry( > unsigned long *sizep) > { > enum object_type type; > + size_t size_st = 0; > + void *data; > struct packed_git *p = all_packs[oe->pack_id]; > if (p == pack_data && p->pack_size < (pack_size + the_hash_algo->rawsz)) { > /* The object is stored in the packfile we are writing to > @@ -1260,7 +1262,10 @@ static void *gfi_unpack_entry( > */ > p->pack_size = pack_size + the_hash_algo->rawsz; > } > - return unpack_entry(the_repository, p, oe->idx.offset, &type, sizep); > + data = unpack_entry(the_repository, p, oe->idx.offset, &type, &size_st); > + if (sizep) > + *sizep = cast_size_t_to_ulong(size_st); > + return data; > } Nit, please feel free to ignore: do we want to add a NEEDSWORK comment here? Patrick

Johannes Schindelin wrote on the Git mailing list (how to reply to this email):

Hi Patrick, On Mon, 15 Jun 2026, Patrick Steinhardt wrote: > On Thu, Jun 04, 2026 at 10:51:09AM +0000, Johannes Schindelin via GitGitGadget wrote: > > diff --git a/builtin/fast-import.c b/builtin/fast-import.c > > index 82bc6dcc00..3dff898c43 100644 > > --- a/builtin/fast-import.c > > +++ b/builtin/fast-import.c > > @@ -1239,6 +1239,8 @@ static void *gfi_unpack_entry( > > unsigned long *sizep) > > { > > enum object_type type; > > + size_t size_st = 0; > > + void *data; > > struct packed_git *p = all_packs[oe->pack_id]; > > if (p == pack_data && p->pack_size < (pack_size + the_hash_algo->rawsz)) { > > /* The object is stored in the packfile we are writing to > > @@ -1260,7 +1262,10 @@ static void *gfi_unpack_entry( > > */ > > p->pack_size = pack_size + the_hash_algo->rawsz; > > } > > - return unpack_entry(the_repository, p, oe->idx.offset, &type, sizep); > > + data = unpack_entry(the_repository, p, oe->idx.offset, &type, &size_st); > > + if (sizep) > > + *sizep = cast_size_t_to_ulong(size_st); > > + return data; > > } > > Nit, please feel free to ignore: do we want to add a NEEDSWORK comment > here? Hehe... My mind translates the `cast_size_t_to_ulong()` function to "NEEDSWORK!" already ;-) Ciao, Johannes

gitgitgadget · 2026-06-08T16:17:32Z

@@ -86,11 +86,8 @@ void *patch_delta(const void *src_buf, unsigned long src_size,
 * This must be called twice on the delta data buffer, first to get the


Patrick Steinhardt wrote on the Git mailing list (how to reply to this email):

On Thu, Jun 04, 2026 at 10:51:11AM +0000, Johannes Schindelin via GitGitGadget wrote: > From: Johannes Schindelin <johannes.schindelin@gmx.de> > > When I started the transition from `unsigned long` to `size_t`, in the > interest of keeping the patches reviewable, I introduced these calls to > prevent data type narrowing from silently failing to handle large object > sizes. I also introduced `*_sz()` variants that would allow most of the > callers to keep using that `unsigned long` that the 90s kindly asked to > be returned. > > After the preceding commits, the only places that called the narrow > wrappers either no longer exist or already use the `_sz` form > internally, so the wrappers just narrow values back through > `cast_size_t_to_ulong()` for no reason. > > Drop them and rename the `_sz` variants back to the natural names. Aha, so you already address my comment I had on one of the preceding patches :) Patrick

gitgitgadget · 2026-06-08T16:18:01Z

@@ -3321,7 +3321,7 @@ static int apply_binary(struct apply_state *state,
 	if (odb_has_object(the_repository->objects, &oid, 0)) {


Patrick Steinhardt wrote on the Git mailing list (how to reply to this email):

On Thu, Jun 04, 2026 at 10:51:12AM +0000, Johannes Schindelin via GitGitGadget wrote: > diff --git a/builtin/cat-file.c b/builtin/cat-file.c > index fa45f774d7..fa6e396ddc 100644 > --- a/builtin/cat-file.c > +++ b/builtin/cat-file.c > @@ -120,7 +120,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name) > struct object_id oid; > enum object_type type; > char *buf; > - unsigned long size; > + size_t size; > struct object_context obj_context = {0}; > struct object_info oi = OBJECT_INFO_INIT; > unsigned flags = OBJECT_INFO_LOOKUP_REPLACE; > @@ -166,7 +166,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name) > if (use_mailmap && (type == OBJ_COMMIT || type == OBJ_TAG)) { > size_t s = size; > buf = replace_idents_using_mailmap(buf, &s); > - size = cast_size_t_to_ulong(s); > + size = s; > } > > printf("%"PRIuMAX"\n", (uintmax_t)size); Can't we drop this local variable completely and instead supply `&size` directly? > @@ -219,7 +225,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name) > if (use_mailmap) { > size_t s = size; > buf = replace_idents_using_mailmap(buf, &s); > - size = cast_size_t_to_ulong(s); > + size = s; > } > > /* otherwise just spit out the data */ > @@ -266,7 +272,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name) > if (use_mailmap) { > size_t s = size; > buf = replace_idents_using_mailmap(buf, &s); > - size = cast_size_t_to_ulong(s); > + size = s; > } > break; > } > @@ -446,7 +455,7 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d > if (use_mailmap) { > size_t s = size; > contents = replace_idents_using_mailmap(contents, &s); > - size = cast_size_t_to_ulong(s); > + size = s; > } > > if (type != data->type) Likewise for these three instances. > @@ -555,7 +564,7 @@ static void batch_object_write(const char *obj_name, > if (!buf) > die(_("unable to read %s"), oid_to_hex(&data->oid)); > buf = replace_idents_using_mailmap(buf, &s); > - data->size = cast_size_t_to_ulong(s); > + data->size = s; > > free(buf); > } And I think this site here can be adapted, as well. > diff --git a/diff.c b/diff.c > index 5a584fa1d5..816b89dc6c 100644 > --- a/diff.c > +++ b/diff.c > @@ -4594,8 +4594,9 @@ int diff_populate_filespec(struct repository *r, > } > } > else { > + size_t size_st = 0; > struct object_info info = { > - .sizep = &s->size > + .sizep = &size_st > }; > > if (!(size_only || check_binary)) > @@ -4617,6 +4618,7 @@ int diff_populate_filespec(struct repository *r, > die("unable to read %s", oid_to_hex(&s->oid)); > > object_read: > + s->size = cast_size_t_to_ulong(size_st); > if (size_only || check_binary) { > if (size_only) > return 0; > @@ -4631,6 +4633,7 @@ object_read: > if (odb_read_object_info_extended(r->objects, &s->oid, &info, > OBJECT_INFO_LOOKUP_REPLACE)) > die("unable to read %s", oid_to_hex(&s->oid)); > + s->size = cast_size_t_to_ulong(size_st); > } > s->should_free = 1; > } The flow in this function is quite weird if you ask me, but that's a preexisting issue. This does look correct to me, even if it's awkward. Patrick

Johannes Schindelin wrote on the Git mailing list (how to reply to this email):

Hi Patrick, On Mon, 15 Jun 2026, Patrick Steinhardt wrote: > On Thu, Jun 04, 2026 at 10:51:12AM +0000, Johannes Schindelin via GitGitGadget wrote: > > diff --git a/builtin/cat-file.c b/builtin/cat-file.c > > index fa45f774d7..fa6e396ddc 100644 > > --- a/builtin/cat-file.c > > +++ b/builtin/cat-file.c > > @@ -120,7 +120,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name) > > struct object_id oid; > > enum object_type type; > > char *buf; > > - unsigned long size; > > + size_t size; > > struct object_context obj_context = {0}; > > struct object_info oi = OBJECT_INFO_INIT; > > unsigned flags = OBJECT_INFO_LOOKUP_REPLACE; > > @@ -166,7 +166,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name) > > if (use_mailmap && (type == OBJ_COMMIT || type == OBJ_TAG)) { > > size_t s = size; > > buf = replace_idents_using_mailmap(buf, &s); > > - size = cast_size_t_to_ulong(s); > > + size = s; > > } > > > > printf("%"PRIuMAX"\n", (uintmax_t)size); > > Can't we drop this local variable completely and instead supply `&size` > directly? Well spotted! > > @@ -219,7 +225,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name) > > if (use_mailmap) { > > size_t s = size; > > buf = replace_idents_using_mailmap(buf, &s); > > - size = cast_size_t_to_ulong(s); > > + size = s; > > } > > > > /* otherwise just spit out the data */ > > @@ -266,7 +272,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name) > > if (use_mailmap) { > > size_t s = size; > > buf = replace_idents_using_mailmap(buf, &s); > > - size = cast_size_t_to_ulong(s); > > + size = s; > > } > > break; > > } > > @@ -446,7 +455,7 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d > > if (use_mailmap) { > > size_t s = size; > > contents = replace_idents_using_mailmap(contents, &s); > > - size = cast_size_t_to_ulong(s); > > + size = s; > > } > > > > if (type != data->type) > > Likewise for these three instances. I totally agree. > > @@ -555,7 +564,7 @@ static void batch_object_write(const char *obj_name, > > if (!buf) > > die(_("unable to read %s"), oid_to_hex(&data->oid)); > > buf = replace_idents_using_mailmap(buf, &s); > > - data->size = cast_size_t_to_ulong(s); > > + data->size = s; > > > > free(buf); > > } > > And I think this site here can be adapted, as well. Indeed! > > diff --git a/diff.c b/diff.c > > index 5a584fa1d5..816b89dc6c 100644 > > --- a/diff.c > > +++ b/diff.c > > @@ -4594,8 +4594,9 @@ int diff_populate_filespec(struct repository *r, > > } > > } > > else { > > + size_t size_st = 0; > > struct object_info info = { > > - .sizep = &s->size > > + .sizep = &size_st > > }; > > > > if (!(size_only || check_binary)) > > @@ -4617,6 +4618,7 @@ int diff_populate_filespec(struct repository *r, > > die("unable to read %s", oid_to_hex(&s->oid)); > > > > object_read: > > + s->size = cast_size_t_to_ulong(size_st); > > if (size_only || check_binary) { > > if (size_only) > > return 0; > > @@ -4631,6 +4633,7 @@ object_read: > > if (odb_read_object_info_extended(r->objects, &s->oid, &info, > > OBJECT_INFO_LOOKUP_REPLACE)) > > die("unable to read %s", oid_to_hex(&s->oid)); > > + s->size = cast_size_t_to_ulong(size_st); > > } > > s->should_free = 1; > > } > > The flow in this function is quite weird if you ask me, but that's a > preexisting issue. This does look correct to me, even if it's awkward. Yes, on all four accounts. Ciao, Johannes

On Windows, `unsigned long` and `long` are 32 bits even on 64-bit builds. The MSVC compatibility header has shimmed `ftruncate()` with #define ftruncate _chsize ever since `compat/msvc-posix.h` was introduced. `_chsize()` takes a 32-bit `long` for the new length, which silently truncates files (and the requested size) to 2 GiB. That is enough to make t7508 test 126 "git add fails gracefully with 4 GiB and 8 GiB files" fail under MSVC: `test-tool truncate` creates a sparse 4 GiB or 8 GiB file via the shimmed `ftruncate()`, and the test never gets off the ground. `_chsize_s()` is the modern replacement, accepts a 64-bit `__int64` length, and is the only sensible target on Windows. The catch is that it does not follow the POSIX `-1` + `errno` convention: it returns `0` on success and an errno value (a small positive integer) on failure. A plain `#define ftruncate _chsize_s` would therefore silently break callers that test the return value as `< 0` or against `-1`, of which there are several: `http.c`, `parallel-checkout.c`, and `t/helper/test-truncate.c` among them. Introduce a `static inline` wrapper that calls `_chsize_s()`, copies its errno return into `errno`, and translates the result to the familiar `-1` / `0` convention, then point `ftruncate` at the wrapper. Place the wrapper after `#include "mingw-posix.h"` so the `off_t` parameter resolves to the already-widened `off64_t` rather than the 32-bit `_off_t` from `compat/vcbuild/include/unistd.h`. MinGW is unaffected: its `ftruncate()` already takes `off_t` and routes through `ftruncate64()` when `_FILE_OFFSET_BITS=64`, which is the default in our build. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

`patch_delta()` takes the source and delta sizes by value and writes back the reconstructed target size through an `unsigned long *`. That datatype cannot represent a value that exceeds 4 GiB on systems where `unsigned long` is 32-bit (notably 64-bit Windows builds), though, even though the delta encoding itself, the on-disk layout, and the in-memory buffers happily carry such sizes. A `size_t` companion to `get_delta_hdr_size()`, `get_delta_hdr_size_sz()`, was introduced in 17fa077 (delta, packfile: use size_t for delta header sizes, 2026-05-08) precisely so that `patch_delta()` could be widened without changing the on-the-wire decoding helper's signature. Widen `patch_delta()`'s three size parameters to `size_t` and switch its internal use of `get_delta_hdr_size()` to the `_sz` variant. Then propagate the wider type through the callers. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

`write_reuse_object()` learned to track its packed-object size as `size_t` in 606c192 (odb, packfile: use size_t for streaming object sizes, 2026-05-08), but the comparison sink it feeds, `check_pack_inflate()`, still takes the expected decompressed size as `unsigned long`. The call site bridges the mismatch with `cast_size_t_to_ulong()`, which on Windows turns a >4 GiB object into an immediate die(). That function only uses `expect` once: as the right-hand side of a `stream.total_out == expect` equality test against zlib's counter. zlib's own `total_out` counter is `uLong` and is therefore still 32-bit-bound on Windows. Widening `expect` to `size_t` cannot fix that, but it is a strict improvement nonetheless: instead of dying outright, an oversized object now simply makes the equality fail and lets `write_reuse_object()` fall back to `write_no_reuse_object()`, which decompresses and re-deflates the content (and which the larger pack-objects widening series targets separately). Drop the `cast_size_t_to_ulong()` shim at the call site now that the receiving parameter speaks the same type as `entry_size`. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

The topic `js/objects-larger-than-4gb-on-windows` widened the streaming, index-pack and unpack-objects paths to `size_t` but deliberately stopped at the in-memory `unpack_entry()` cascade, which still hands back the unpacked size through `unsigned long *`. On Windows that boundary truncates above 4 GiB because that data type is only 32 bits wide on that platform. Widen the code path. Except `packed_object_info_with_index_pos()`: It cannot yet pass `oi->sizep` directly because the field is still `unsigned long *`; bridge it with a `size_t` temporary that narrows back, and let a later commit drop the bridge once the field is wide too. `gfi_unpack_entry()` keeps its narrow signature because fast-import tracks sizes through `unsigned long` everywhere it crosses subsystem boundaries, keeping its signature allows the scope of this commit to be somewhat reasonable, still. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

`pack-objects` stores per-entry object sizes in either the 31-bit `size_` member of the `struct object_entry` or, when the value does not fit, the `pack->delta_size[]` spill array. The accessors (`oe_size`, `oe_delta_size`, `oe_get_size_slow`, `oe_size_*_than`) and the setters (`oe_set_size`, `oe_set_delta_size`) used `unsigned long` for the spill type, which on Windows means the spill silently caps at 4 GiB per entry. That is what made `upload-pack` die with "object too large to read on this platform" when serving the >4 GiB blob in `t5608` tests 5 and 6 when run with `GIT_TEST_CLONE_2GB`. Widen them all to `size_t` (including `pack->delta_size`) and drop the three `cast_size_t_to_ulong()` calls in `check_object()` that guarded `in_pack_size`. The two `SET_SIZE(entry, canonical_size)` calls in the same function stay cast-free as before, since `canonical_size` is still `unsigned long` until a later commit widens `object_info::sizep`. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

When I started the transition from `unsigned long` to `size_t`, in the interest of keeping the patches reviewable, I introduced these calls to prevent data type narrowing from silently failing to handle large object sizes. I also introduced `*_sz()` variants that would allow most of the callers to keep using that `unsigned long` that the 90s kindly asked to be returned. After the preceding commits, the only places that called the narrow wrappers either no longer exist or already use the `_sz` form internally, so the wrappers just narrow values back through `cast_size_t_to_ulong()` for no reason. Drop them and rename the `_sz` variants back to the natural names. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

When `js/objects-larger-than-4gb-on-windows` widened the streaming, index-pack and unpack-objects code paths, in the interest of keeping the patches somewhat reasonably-sized, it left the public ODB API still typed in `unsigned long`. In particular `struct object_info::sizep` and the four wrappers built on top of it (`odb_read_object`, `odb_read_object_peeled`, `odb_read_object_info`, `odb_pretend_object`) still return the unpacked size through `unsigned long *`, so on Windows `cat-file -s` and the `git add` / `git status` paths for a >4 GiB blob silently cap at 4 GiB. Widen the field and the four wrappers. The previous commits already widened the `unpack_entry()` cascade and pack-objects' in-core size accessors, so most of the cascade arrives here with no further work: the temporary shims in `packed_object_info_with_index_pos()` and in `unpack_entry()`'s delta-base recovery path go away, the two `SET_SIZE(entry, cast_size_t_to_ulong(canonical_size))` calls in `check_object()` and the matching one in `drop_reused_delta()` collapse to plain `SET_SIZE`, and `oe_get_size_slow()`'s tail `cast_size_t_to_ulong()` is gone too. What remains narrow are the boundaries this series does not intend to touch: the diff, blame, textconv and fast-import machinery. Even so, this patch is unfortunately quite large. Assisted-by: Opus 4.7 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

dscho · 2026-06-15T11:51:22Z

/submit

gitgitgadget · 2026-06-15T11:53:03Z

Submitted as pull.2137.v2.git.1781524349.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-2137/dscho/objects-larger-than-4gb-on-windows-pt2-v2

To fetch this version to local tag pr-2137/dscho/objects-larger-than-4gb-on-windows-pt2-v2:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-2137/dscho/objects-larger-than-4gb-on-windows-pt2-v2

dscho self-assigned this Jun 4, 2026

dscho force-pushed the objects-larger-than-4gb-on-windows-pt2 branch from 2491da9 to f3aeae9 Compare June 4, 2026 08:04

dscho mentioned this pull request Jun 4, 2026

t5608, t7508: require LONG_IS_64BIT for >4GB tests #2129

Closed

dscho mentioned this pull request Jun 5, 2026

Missing Warning Leads to Silent Data Loss When Committing Large Files (>4GB) git-for-windows/git#6012

Open

1 task

gitgitgadget Bot reviewed Jun 8, 2026

View reviewed changes

dscho added 7 commits June 15, 2026 07:40

dscho force-pushed the objects-larger-than-4gb-on-windows-pt2 branch from f3aeae9 to 37d030d Compare June 15, 2026 09:24

		@@ -3232,7 +3232,7 @@ static int apply_binary_fragment(struct apply_state *state,
		struct patch *patch)

		@@ -453,7 +453,7 @@ static int check_pack_inflate(struct packed_git *p,
		struct pack_window **w_curs,

		@@ -1239,6 +1239,8 @@ static void *gfi_unpack_entry(
		unsigned long *sizep)

		@@ -86,11 +86,8 @@ void patch_delta(const void src_buf, unsigned long src_size,
		* This must be called twice on the delta data buffer, first to get the

		@@ -3321,7 +3321,7 @@ static int apply_binary(struct apply_state *state,
		if (odb_has_object(the_repository->objects, &oid, 0)) {

Conversation

dscho commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dscho commented Jun 4, 2026

Uh oh!

gitgitgadget Bot commented Jun 4, 2026

Uh oh!

gitgitgadget Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

gitgitgadget Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

gitgitgadget Bot commented Jun 8, 2026

Uh oh!

gitgitgadget Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

gitgitgadget Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

gitgitgadget Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

gitgitgadget Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

gitgitgadget Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

gitgitgadget Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

gitgitgadget Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

dscho commented Jun 15, 2026

Uh oh!

gitgitgadget Bot commented Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

dscho commented Jun 4, 2026 •

edited

Loading