Description of problem: git build started failing on tests with zlib-1.2.11-8 on f28: https://koji.fedoraproject.org/koji/taskinfo?taskID=27187823 It can be seen from build.log that the tests fail b/c "git log" prints garbage after the correct log entries. It turns out that the same git binary package works with zlib-1.2.11-7. Version-Release number of selected component (if applicable): zlib-1.2.11-8.fc28.aarch64 How reproducible: by building git, which executes tests in the %check phase, or using the simplified steps below, where commit() { echo "$1" >tracked && git add tracked && git commit -m "$1"; } First step of the reproducer can be done with the old version of zlib to show that the problematic code is executed only in the last step. Steps to Reproduce: 1. mkdir t5538-push-shallow; cd t5538-push-shallow; git init . 2. commit 1 3. git repack 4. dnf upgrade zlib-1.2.11-8 5. git log --format="%s" Actual results: 1 U Expected results: 1 Additional info: The U in the log output is garbage and the git test output show more examples of (mostly binary) garbage. Moreover, I built git without optimizations (-O0); I had suspected a compiler bug. It has the same problem, but the binary garbage printed by "git log" is sometimes different from an optimized build. This strongly suggests there is a bad memory access somewhere. Here is a ltrace output showing what does git call in zlib. $ ltrace -e @libz.so.1 git log --format="%s" libz.so.1->inflateInit2_(0xffffffc7b7a8, 15, 0xaaaab34461a8, 112 <unfinished ...> libz.so.1->malloc(7160) = 0xaaaadf0d6740 libz.so.1->inflateReset2(0xffffffc7b7a8, 15, 7169, 0xaaaadf0d8330 <unfinished ...> libz.so.1->inflateReset(0xffffffc7b7a8, 0, 0xffffbe580108, 0xffffc0cc <unfinished ...> libz.so.1->inflateResetKeep(0xffffffc7b7a8, 0, 0xaaaadf0d6740, 0xffffc0cc) = 0 <... inflateReset resumed> ) = 0 <... inflateReset2 resumed> ) = 0 <... inflateInit2_ resumed> ) = 0 libz.so.1->adler32(0, 0, 0, 0 <unfinished ...> libz.so.1->adler32_z(0, 0, 0, 0) = 1 <... adler32 resumed> ) = 1 libz.so.1->adler32(1, 0xaaaadf0d40a0, 218, 0 <unfinished ...> libz.so.1->adler32_z(1, 0xaaaadf0d40a0, 218, 0) = 0x709d4367 <... adler32 resumed> ) = 0x709d4367 libz.so.1->memset(0xaaaadf0d417a, 'U', 1) = 0xaaaadf0d417a libz.so.1->free(0xaaaadf0d6740) = <void> 1 U --- SIGCHLD (Child exited) --- libz.so.1->__cxa_finalize(0xffffbe59fac0, 0, 0xffffbe572510, 0) = 0xffffbe4f4fa8 +++ exited (status 0) +++
Thanks for tracking this down Pavel. I hit this last night while trying to build a minor git bugfix release, as it breaks the test suite badly. I would imagine this causes breakage in many other packages too, many which may not have test suites that catch the issue before it reaches users.
Jeremy can you take a look at this?
Yes, I will get the patch authors to take a look. Thanks,
Jeremy, can we please also have the links to the upstream pull requests? Or all the links pointing to (attempts for) upstream discussions?
(In reply to Pavel Raiskup from comment #4) > Jeremy, can we please also have the links to the upstream pull requests? > Or all the links pointing to (attempts for) upstream discussions? They're all AFAIK at the upstream maintainers githhub (along with the 100 odd others): https://github.com/madler/zlib/pulls
Joining the discussion, I will start looking into it ASAP.
I thought this sounded like a zlib bug, but I think I've tracked it down to what appears to be a bug in git, which i've fixed and now all the unit tests have passed. I'm going to get someone else to take a quick peek and doublecheck the patch.
Interesting. Thanks for poking at this Jeremy and adenilson!
So, I think I fully understand what is happening now. "Bug" is a bit strong, more a misinterpretation of whats allowed. So git, in unpack_compressed_entry() is allocating a buffer of 'size' using xmallocz_gently() which actually allocates a slightly larger buffer and null terminates it. The code in unpack_compressed_entry() is utilizing this feature to go ahead and pass size+1 to zlib as the size of the buffer. To zlib it considers that it owns a buffer of size+1 and does a correct decompression (sets consumed/remaining sizes correctlt) but poisons the remainder of the buffer. The zlib code then gets the return it expects and continues on its merry way not considering that the null may have been poisoned. Initially a quck hack to set the buffer passed to zlib as 'size' cleans up all the unit test failures, despite the fact that zlib is triggering an error case (which is ignored due to a successful return) due to the extra byte being consumed (AKA git's overconsumption check fails if the overconsumption is exactly 1 byte).
Posted a patch here: https://public-inbox.org/git/20180525231713.23047-1-lintonrjeremy@gmail.com/T/#u I'm changing the component.
Comment #9 above should read "The git code.."
Thanks for the quick diagnosis and patch Jeremy! I have a scratch build running now: https://koji.fedoraproject.org/koji/taskinfo?taskID=27194025 I'll check back on that shortly and follow up on your patch on the git list.
Pavel, can you give test the scratch build on aarch64 and confirm it fixes things? The test suite passes so I'm confident it will, but it's always good to have extra confirmation. Plus this has been broken on f28 aarch64 for about a month. Another day or two won't hurt at this point. I pushed this to the master branch of my fork: https://src.fedoraproject.org/fork/tmz/rpms/git
Please, consider waiting for upstream git fix till we break the upstream-first rule again.
Yeah, I don't intend to push anything until upstream has a consensus. I'm following that discussion closely. I built those scratch packages just for testing to help confirm that it resolved the issue. I don't have access to any aarch64 hardware, so the only testing I'm able to do is running the test suite in koji. That's why I asked about testing the case where you noticed the issue. :)
In unfortunate timing, a git released an update today which fixes 2 security issues (CVE-2018-11233 & CVE-2018-11235)¹. This forces our hand a bit with regard to fixing the aarch64 builds. We could disable the tests on aarch64 to allow a successful git build (which we know to be broken). That's certainly not ideal. Instead, I've applied Jeremy's packfile patch to aarch64 only. Upstream has some questions about the patch, which can hopefully be resolved soon and the patch (or some version of it) can be accepted. (I wish I had seen the broken git builds in koschei when they showed up weeks ago. I don't know if those failed builds didn't generate any notification or if I've disabled the notifications because they too frequently were due to transient issues.) ¹ I've alerted the Red Hat security team so they can create tracking bugs for the issues
git-2.17.1-2.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-75f7624a9f
Still there's possibility that our zlib is broken; could we fix the optimization patches so the input/output API isn't changed at all?
I hope that the conversation on the git list continues and a consensus is reached on the proper changes and in which components, of course. Since there is a relatively high risk vulnerability requiring a git build (#1583862), I didn't feel that we had much choice but to apply the git patch on aarch64. Seeing that git was fairly well broken on aarch64 already, it's unlikely that it's going to be worse with the patch. :) So far as I can tell, upstream is (rightly) concerned that the patch undoes a previous change to the deflate handling in packfile.c and wants to be sure that the fix isn't going to cause any regressions. The tests suite still passes, including the test which was added at that time (in 39eea7bdd9 ("Fix incorrect error check while reading deflated pack data", 2009-10-21)). That makes me think that even if the patch isn't the ideal fix, it's not going to cause problems. I only applied the current git patches on aarch64 and I don't have the update set to close this bug when the update goes stable.
I'm a bit split about the "correct" fix. The zlib changes are as I understand it intentionally poisioning the buffer, but I don't really think the git code is correct either (despite the attempt at defensive coding, it needs to protect that null and its not doing it). I would have a tendency to fix both of them...
git-2.17.1-2.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-75f7624a9f
git-2.17.1-2.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.
It may well be that changes are needed in both git and zlib. I don't think anyone upstream at git is opposed to that idea, they just want to ensure that the changes don't lose a fix for a previous issue. Without the zlib patches upstream, the only people affected by this are us, and only on aarch64. That makes it harder to get much traction or interest and leaves us in the less-than-ideal position of carrying non-upstream patches in both git and zlib. It would be great if the folks who best understand these changes could work with git and zlib upstream to get any necessary patches accepted.
> It would be great if the folks who best understand these changes could work > with git and zlib upstream to get any necessary patches accepted. They are, jlinton is the one that debugged and sent the git patch, the upstream zlib maintainer is non responsive, these same patches are in chromeOS/chromium and other places too.
Alright. I asked because upstream git had some questions and I haven't seen any responses there.
yes, last questions in the thread were in the message from Junio C Hamano on the 28th and there was no reply.
Hi Pavel, do you have a link to the upstream thread you mention? I'll check that we're following this up with upstream.
I Dave, the thread is at https://public-inbox.org/git/20180525231713.23047-1-lintonrjeremy@gmail.com/T/#u (from comment 10). :) Thanks for looking into this further!
Yes, I'm going to probably just post the null terminator version RSN.. AFAIK, I posted a response last week to that affect although its not showing up in that public-inbox thread for whatever reason.
Thanks Jeremy. FWIW, the message never arrived in my mailbox either, so I imagine it didn't reach the git list (or was dropped by the list server itself).
git-2.17.1-3.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-972fa3841b
I replaced the initial patch with what's now queued upstream (https://github.com/git/git/commit/b611396e97), and applied it to all architectures. I set the update to close this ticket when it reaches stable. If there's any reason to keep it open, let me know and I'll adjust the update settings. Thanks to Jeremy and everyone else that helped get this fixed.
git-2.17.1-3.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-972fa3841b
git-2.17.1-3.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.