Bug 2177088
| Summary: | intermittent core dump in hex2bin (librpm.so.8) | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Colin Walters <walters> | |
| Component: | rpm | Assignee: | Packaging Maintenance Team <packaging-team-maint> | |
| Status: | ASSIGNED --- | QA Contact: | swm-qe | |
| Severity: | unspecified | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 8.8 | CC: | ffesti, jwboyer, mbanas, mcurlej, pmatilai | |
| Target Milestone: | rc | Keywords: | Triaged | |
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | No Doc Update | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 2177225 (view as bug list) | Environment: | ||
| Last Closed: | Type: | Bug | ||
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Deadline: | 2023-06-12 | |||
|
Description
Colin Walters
2023-03-09 23:06:46 UTC
OK I got a build with debuginfo. The stack goes pretty deep into librpm: Starts here https://github.com/coreos/rpm-ostree/blob/4bdcefabaaf3e51729150f03bfd0ad6732334af1/src/libpriv/rpmostree-core.cxx#L2741 Here's the new backtrace with a bit more local variables: ``` (gdb) bt #0 0x00007ffad342c0bc in rnibble (c=0 '\000') at ../include/rpm/rpmstring.h:99 #1 0x00007ffad342c0bc in hex2bin (h=h@entry=0x7ffaaf0709e0, tag=tag@entry=5090, num=num@entry=14, len=226) at rpmfi.c:1515 #2 0x00007ffad342fdd7 in rpmfilesPopulate (flags=229374, h=0x7ffaaf0709e0, fi=0x7ffaad702d90) at rpmfi.c:1590 #3 0x00007ffad342fdd7 in rpmfilesNew (pool=pool@entry=0x0, h=h@entry=0x7ffaaf0709e0, tagN=tagN@entry=1117, flags=flags@entry=229374) at rpmfi.c:1652 #4 0x00007ffad34307bd in rpmfiNewPool (pool=pool@entry=0x0, h=h@entry=0x7ffaaf0709e0, tagN=tagN@entry=1117, flags=flags@entry=229374) at rpmfi.c:1760 #5 0x00007ffad34307eb in rpmfiNew (ts=ts@entry=0x7ffaac357a00, h=h@entry=0x7ffaaf0709e0, tagN=tagN@entry=1117, flags=flags@entry=229374) at rpmfi.c:1767 #6 0x0000559c271d391a in handle_file_dispositions (error=0x7ffac27286e0, cancellable=0x7ffaaee35010 [_IS_CANCELLABLE (cancellable)], out_files_skip_delete=<synthetic pointer>, out_files_skip_add=<synthetic pointer>, ts=0x7ffaac357a00, tmprootfs_dfd=557, self=0x7ffaac44e190) at src/libpriv/rpmostree-core.cxx:2738 #7 0x0000559c271d391a in rpmostree_context_assemble(RpmOstreeContext*, GCancellable*, GError**) (self=<optimized out>, cancellable=cancellable@entry=0x7ffaaee35010 [_IS_CANCELLABLE (cancellable)], error=error@entry=0x7ffac27286e0) at src/libpriv/rpmostree-core.cxx:4184 #8 0x0000559c27202878 in perform_local_assembly (error=<optimized out>, cancellable=<optimized out>, self=<optimized out>) at src/daemon/rpmostree-sysroot-upgrader.cxx:1025 #9 0x0000559c27202878 in rpmostree_sysroot_upgrader_deploy(RpmOstreeSysrootUpgrader*, OstreeDeployment**, GCancellable*, GError**) (self=self@entry=0x7ffab000c4d0, out_deployment=out_deployment@entry=0x7ffac27285d0, cancellable=cancellable@entry=0x7ffaaee35010 [_IS_CANCELLABLE (cancellable)], error=error@entry=0x7ffac27286e0) at src/daemon/rpmostree-sysroot-upgrader.cxx:1335 #10 0x0000559c271c072c in deploy_transaction_execute(RpmostreedTransaction*, GCancellable*, GError**) (transaction=0x7ffac40cde00, cancellable=<optimized out>, error=<optimized out>) at src/daemon/rpmostreed-transaction-types.cxx:1596 #11 0x0000559c271b8453 in transaction_execute_thread(GTask*, gpointer, gpointer, GCancellable*) (task=0x7ffac40e3920 [RUE], source_object=<optimized out>, task_data=<optimized out>, cancellable=0x7ffaaee35010 [_IS_CANCELLABLE (cancellable)]) at src/daemon/rpmostreed-transaction.cxx:338 #12 0x00007ffad3c69357 in g_task_thread_cleanup () at gtask.c:1318 #13 0x00007ffad3c69357 in g_task_thread_pool_thread (thread_data=0x7ffac40e3920, pool_data=<optimized out>) at gtask.c:1336 #14 0x00007ffad36df9d3 in g_thread_pool_thread_proxy (data=<optimized out>) at gthreadpool.c:307 #15 0x00007ffad36defca in g_thread_proxy (data=0x7ffa9c1a2a30) at gthread.c:784 #16 0x00007ffad0cb71cf in start_thread () at /lib64/libpthread.so.0 #17 0x00007ffad0923dd3 in __libc_start_main () at /lib64/libc.so.6 #18 0x0000000000000000 in () (gdb) ``` ``` (gdb) list 94 * @return binary nibble 95 */ 96 RPM_GNUC_CONST 97 static inline unsigned char rnibble(char c) 98 { 99 if (c >= '0' && c <= '9') 100 return (c - '0'); 101 if (c >= 'a' && c <= 'f') 102 return (c - 'a') + 10; 103 if (c >= 'A' && c <= 'F') (gdb) p c $4 = 0 '\000' ``` OK, so we seem to have walked off the end into a NUL character? ``` (gdb) up #1 hex2bin (h=h@entry=0x7ffaaf0709e0, tag=tag@entry=5090, num=num@entry=14, len=226) at rpmfi.c:1515 1515 *t = (rnibble(s[0]) << 4) | rnibble(s[1]); (gdb) p s $5 = 0x7ffaafffefff "" ``` Right, s[1] seems invalid here? ``` (gdb) up #2 0x00007ffad342fdd7 in rpmfilesPopulate (flags=229374, h=0x7ffaaf0709e0, fi=0x7ffaad702d90) at rpmfi.c:1590 1590 fi->signatures = hex2bin(h, RPMTAG_FILESIGNATURES, (gdb) list 1585 1586 fi->signatures = NULL; 1587 /* grab hex signatures from header and store in binary format */ 1588 if (!(flags & RPMFI_NOFILESIGNATURES)) { 1589 fi->signaturelength = headerGetNumber(h, RPMTAG_FILESIGNATURELENGTH); 1590 fi->signatures = hex2bin(h, RPMTAG_FILESIGNATURES, 1591 totalfc, fi->signaturelength); 1592 } 1593 1594 /* XXX TR_REMOVED doesn;t need fmtimes, frdevs, finodes */ (gdb) ``` This relates to the file signatures. ``` (gdb) up #3 rpmfilesNew (pool=pool@entry=0x0, h=h@entry=0x7ffaaf0709e0, tagN=tagN@entry=1117, flags=flags@entry=229374) at rpmfi.c:1652 1652 if (rpmfilesPopulate(fi, h, flags)) (gdb) up #4 0x00007ffad34307bd in rpmfiNewPool (pool=pool@entry=0x0, h=h@entry=0x7ffaaf0709e0, tagN=tagN@entry=1117, flags=flags@entry=229374) at rpmfi.c:1760 1760 rpmfiles files = rpmfilesNew(pool, h, tagN, flags); (gdb) up #5 0x00007ffad34307eb in rpmfiNew (ts=ts@entry=0x7ffaac357a00, h=h@entry=0x7ffaaf0709e0, tagN=tagN@entry=1117, flags=flags@entry=229374) at rpmfi.c:1767 1767 return rpmfiNewPool(NULL, h, tagN, flags); (gdb) up ``` Didn't see anything obviously interesting in these frames. ``` #6 0x0000559c271d391a in handle_file_dispositions (error=0x7ffac27286e0, cancellable=0x7ffaaee35010 [_IS_CANCELLABLE (cancellable)], out_files_skip_delete=<synthetic pointer>, out_files_skip_add=<synthetic pointer>, ts=0x7ffaac357a00, tmprootfs_dfd=557, self=0x7ffaac44e190) at src/libpriv/rpmostree-core.cxx:2738 2738 g_auto (rpmfi) fi = rpmfiNew (ts, h, RPMTAG_BASENAMES, flags); (gdb) ``` And now we're at the faulting line. Versions here are $ rpm -q rpm rpm-ostree rpm-4.14.3-24.el8_6.x86_64 rpm-ostree-2022.10.112.g3d0ac35b-3.el8.x86_64 $ And to be clear here, in this case it's rhel8 rpm-ostree installing rhel9 RPMs. I wonder if that's in play here... https://github.com/coreos/rpm-ostree/pull/3610 may be related too in that rpm-ostree now tries to parse the IMA bits, but IIRC there were issues with that in the rhel8 era? Panu, any of this look familiar? Currently failing to reproduce this with -fsanitize=address enabled or glibc MALLOC_CHECK_=1 unfortunately... Going to look at pulling out either valgrind or https://rr-project.org/ next. Oh yup, that'd be bug 2018937. It got fixed on RHEL 9 but not 8 because IMA signatures don't exist there yet, but of course segfaulting on any package is not an option. We'll need to fix this in RHEL 8 rpm too then... > We'll need to fix this in RHEL 8 rpm too then... Yeah seems this way. OK so with my OCP hat on...I think this is *probably* not strictly speaking a blocker. Because while it causes the OS updater (rpm-ostree) to segfault...that's not fatal because it's all transactional (hooray), and actually the upgrade process gets retried and succeeds. It would however be *embarrassing* and I'm sure some customers would find this too. We have a CI test that verifies nothing segfaulted on the node, and that started intermittently failing...but the failure rate is low enough that it still counts as a pass. If you're curious, click on https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/aggregated-gcp-ovn-rt-upgrade-4.14-minor-release-openshift-release-analysis-aggregator/1634080465902112768 and scroll down a bit and you'll see > Passed Tests > Passed: suite=[Symptom Detection], Node process segfaulted > Passed: Passed 8 times, failed 2 times. The historical pass rate is 96%. The required number of passes is 7. > Failure - periodic-ci-openshift-release-master-ci-4.14-upgrade-from-stable-4.13-e2e-gcp-ovn-rt-upgrade/1634080460835393536 > Failure - periodic-ci-openshift-release-master-ci-4.14-upgrade-from-stable-4.13-e2e-gcp-ovn-rt-upgrade/1634080463372947456 If we can get a fix queued for 8.6.z relatively soon, that's better. Then we can tag that right away into rhaos-4.12-rhel8. A tricky thing here is that we actually have two different places which are invoking rpmfiNew, and while for this caller we could ignore them, there's actually a distinct code path that *is* trying to (optionally) honor the IMA signatures. In the PR linked above https://github.com/coreos/rpm-ostree/pull/3610 it's supported for rpm-ostree to properly propagate RPM IMA signatures into the resulting ostree commit. (More on ostree and IMA: https://ostreedev.github.io/ostree/ima/ ) But...actually out of conservatism we didn't enable that by default, so I think I can conditionally pass the RPMFI_NOFILESIGNATURES in that case, and unconditionally in the first. Yeah, this seems viable to do in rpm-ostree; working on it. Moving that over to https://bugzilla.redhat.com/show_bug.cgi?id=2177225 Wouldn't this affect in-place traditional dnf RHEL8 -> RHEL9 upgrades too? Or hm, does LEAP always start an upgrade from the initramfs or so? This is a peculiar thing: from a few random samples of RHEL 9 packages I tried, none had IMA signatures that RHEL-8 would recognize at all, and so this bug doesn't trigger. It's not obvious to me from the logs which packages and from where exactly are being installed when this occurs. Colin? Note that the existence of the flaw is not in doubt, just trying to find an easy reproducer package. Alternatives would be setting up an IMA signing environment on RHEL-8 to try and create a reproducer, or backport support for the RHEL-9 style IMA signatures. Which would seem a weird thing to do just for a reproducers sake... |