Bug 1837809
| Field | Value |
|---|---|
| Summary | "FIPS module installed state definition is modified" changes cause systemctl segfaults during buildroot population |
| Product | [Fedora] Fedora |
| Component | nosync |
| Version | 31 |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | unspecified |
| Hardware | All |
| OS | Linux |
| Type | Bug |
| Reporter | Adam Williamson <awilliam> |
| Assignee | Mikolaj Izdebski <mizdebsk> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | crypto-team, fedoraproject, fweimer, java-sig-commits, jorton, kevin, mboddu, mizdebsk, pbrobinson, praiskup, rjones, tmraz, zbyszek |
| Last Closed | 2020-11-03 23:16:31 UTC |
| Bug Blocks | 910269 |

Description
Adam Williamson
2020-05-20 03:31:19 UTC
The problem is the change should not change anything in the buildroot. I do not see how it could ever cause a segfault like this. Yes, the openssl will try to open and read the /proc/sys/crypto/fips_enabled, but if it is not present or it does not contain 1, it should just behave like it always did.

Florian, could any of secure_getenv(), open(), read() be problematic to be called from shared library constructor?

No, everything in glibc should be initialized by this point, especially since libcrypto links against libc. I looked at the problematic OpenSSL version, and nothing in the FIPS initialization sequence stands out, either. I would have to look at a coredump to debug this further, sorry.

I'm trying now to reproduce the segfault and I'll try to somehow make it to produce a usable coredump.

Hmm, it is not reproducible in mock :(

What to do now?

(In reply to Tomas Mraz from comment #5)
> Hmm, it is not reproducible in mock :(
>
> What to do now?

Perhaps file a releng ticket and hope that they can scrape a coredump off the builder? https://pagure.io/releng/issues

I agree it's strange - I'd seen this change before I did my Koji research and it wasn't very high on my list of suspects as it seemed pretty innocuous. And I can't reproduce it in mock either :( But it really *does* seem to be the culprit - it's not just nbdkit, I've checked some other builds that have happened since I reverted the change and the bug seems to have gone from those too. I guess there must be some wrinkle involving the builder environment here somehow, and yeah, we may need to try and get the coredump out from the builders. CCing nirik.

Created attachment 1690305: core.udevadm
Created attachment 1690306: core.systemd-tmpfile
Created attachment 1690318: coredump-systemd-random
Created attachment 1690319: All coredumps

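The pattern discussed in the first comments, a shared-library constructor that opens and reads /proc/sys/crypto/fips_enabled, can be sketched roughly as follows. This is a minimal illustration and not OpenSSL's actual code; the names `fips_flag_from_path`, `fips_enabled` and `library_init` are hypothetical. The point is that such a constructor calls `open()` before `main()` runs, so any library that interposes `open()` may be entered before its own constructor has run.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 only if the file exists and its first
 * byte is '1'; a missing or unreadable file counts as "not enabled". */
static int fips_flag_from_path(const char *path)
{
    char c = '0';
    int fd = open(path, O_RDONLY);   /* may land in an interposed open() */
    if (fd < 0)
        return 0;
    ssize_t n = read(fd, &c, 1);
    close(fd);
    return n == 1 && c == '1';
}

/* Hypothetical module state, filled in at load time. */
static int fips_enabled;

/* Runs when the library (or program) is loaded, before main(). */
__attribute__((constructor))
static void library_init(void)
{
    fips_enabled = fips_flag_from_path("/proc/sys/crypto/fips_enabled");
}
```

On a system without the proc entry the helper simply returns 0, which matches the "behave like it always did" expectation stated above.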
@adamwill gave me the latest Koji task (https://koji.fedoraproject.org/koji/buildinfo?buildID=1508895) and I grabbed the coredumps for the x86_64 task (https://koji.fedoraproject.org/koji/taskinfo?taskID=44714165). I have attached the coredumps for that x86_64 task.

I'm guessing this is probably related:

    (gdb) print __environ[3]
    $10 = 0x7ffd61cdced5 "LD_PRELOAD=/var/tmp/tmp.mock.fe1m8f16/$LIB/nosync.so"

Where can we get a copy of that file?

I assume it's from the nosync package because that implementation is clearly buggy: it assumes that its ELF constructor has run if open is called, which is not a valid assumption for an interposing function: the interposition relationship is not taken into account for ELF constructor ordering.

OK, so it is the open() call in the constructor that triggers this. I suppose we then need to fix nosync, because there is basically no way around this requirement to call open() on /proc/sys/crypto/fips_enabled in the constructor.

(In reply to Tomas Mraz from comment #15)
> OK, so it is the open() call in constructor that triggers this. I suppose we
> need then to fix the nosync because there is basically no way around this
> requirement to call open() on the /proc/sys/crypto/fips_enabled in the
> constructor.

Yes, it needs to be fixed in the nosync package: https://github.com/kjn/nosync/pull/4

Thanks guys! I can do a nosync build with your PR backported and then we can test restoring the openssl change, if you like?

(In reply to Adam Williamson from comment #17)
> Thanks guys! I can do a nosync build with your PR backported and then we can
> test restoring the openssl change, if you like?

Sure, I'd also appreciate a review of the patch itself (although it seems to work as expected in cursory tests).

Florian, thank you very much for the investigation and nosync patch.

Florian: I don't think I'm qualified to review the patch :) Tomas would be a better choice I guess.

The patch looks good, I've provided a review on the GitHub PR.

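The constructor-ordering problem discussed above can be avoided by resolving the real function lazily on first use instead of in the interposing library's own constructor. The following is a minimal sketch of that general technique, assuming glibc and GCC or Clang; it is not the patch from the linked PR, and the helper name `real_open` and the choice to strip `O_SYNC` are assumptions made for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for RTLD_NEXT; must precede every #include */
#endif
#include <dlfcn.h>
#include <errno.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stddef.h>

typedef int (*open_fn)(const char *, int, ...);

/* Hypothetical helper: look up the next definition of open() the first
 * time it is needed. Nothing here depends on a constructor having run,
 * so it is safe even when another library's constructor calls open(). */
static open_fn real_open(void)
{
    static open_fn cached;
    open_fn fn = __atomic_load_n(&cached, __ATOMIC_ACQUIRE);
    if (fn == NULL) {
        fn = (open_fn)dlsym(RTLD_NEXT, "open");
        __atomic_store_n(&cached, fn, __ATOMIC_RELEASE);
    }
    return fn;
}

/* Interposed open(): drops the synchronous-write flags and forwards. */
int open(const char *path, int flags, ...)
{
    int mode = 0;
    int needs_mode = (flags & O_CREAT) != 0;
#ifdef O_TMPFILE
    needs_mode = needs_mode || (flags & O_TMPFILE) == O_TMPFILE;
#endif
    if (needs_mode) {          /* the mode argument is only passed here */
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, int);
        va_end(ap);
    }

    open_fn fn = real_open();
    if (fn == NULL) {          /* no further definition of open() found */
        errno = ENOSYS;
        return -1;
    }
    return fn(path, flags & ~O_SYNC, mode);
}
```

Built as a shared object and loaded through LD_PRELOAD, a wrapper like this works regardless of which constructor runs first, because the lookup happens inside the call itself. On glibc releases older than 2.34 the link step also needs `-ldl`.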
Hum, looking at this a bit harder I think the nosync that gets used is actually from the mock *host* environment, i.e. whatever the builders are running in this case. So I think we'd need to send a nosync update for whatever release the builders are running and get it pushed stable (or at least installed on the builders)... BTW, this is probably why we couldn't reproduce in mock - the nosync stuff is just skipped over if you don't have nosync installed on the host. It may well reproduce if you install nosync on the host (and make sure the build uses an affected openssl somehow).

FEDORA-2020-eb7b7b9aa8 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2020-eb7b7b9aa8

FEDORA-2020-329ce47baf has been submitted as an update to Fedora 32. https://bodhi.fedoraproject.org/updates/FEDORA-2020-329ce47baf

OK, so as you can see I've rebuilt nosync for 31, 32 (and Rawhide). Kevin, Mohan, can we update the builders to the new build now, or should we wait for it to go stable?

We can do it before then.

Mohan: can you do this? I'd say just tag the f31 build into f31-infra-candidate, let it get signed and land in f31-infra-stg and then move to 'f31-infra' and 'dnf --refresh -y update nosync' on builders.

(In reply to Florian Weimer from comment #13)
> I'm guessing this is probably related:
>
> (gdb) print __environ[3]
> $10 = 0x7ffd61cdced5 "LD_PRELOAD=/var/tmp/tmp.mock.fe1m8f16/$LIB/nosync.so"

A loop-mounted nbdkit could offer a better solution.
Purely by coincidence (not knowing about nosync or its use in Fedora Koji) I wrote a special nbdkit plugin to handle Koji Fedora/RISC-V builds, which has the same flush-dropping behaviour:

https://rwmj.wordpress.com/2020/03/21/new-nbdkit-remote-tmpfs-tmpdisk-plugin/
http://libguestfs.org/nbdkit-tmpdisk-plugin.1.html
https://github.com/libguestfs/nbdkit/blob/0632acc76bfeb7d70d3eefa42fc842ce6b7be4f8/plugins/tmpdisk/tmpdisk.c#L182

FEDORA-2020-eb7b7b9aa8 has been pushed to the Fedora 31 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-eb7b7b9aa8` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-eb7b7b9aa8 See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

FEDORA-2020-329ce47baf has been pushed to the Fedora 32 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2020-329ce47baf` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-329ce47baf See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Could we have an update for EPEL7+ please? (mock is still supported on el7+)

The nosync package from EPEL-7 (nosync-1.0-2.el7) should not have this bug, as it was introduced in the 1.1 upstream version. Which version of the nosync package do you use?

> it was introduced in 1.1 upstream version.

Good to know, thanks. I didn't check this.

> What is the nosync package version which you use?

The default one. I now see there's no nosync package for el8 yet. So scratch my previous request.

(In reply to Kevin Fenzi from comment #27)
> We can do it before then.
>
> Mohan: can you do this? I'd say just tag the f31 build into
> f31-infra-candidate, let it get signed and land in f31-infra-stg and then
> move to 'f31-infra' and 'dnf --refresh -y update nosync' on builders.

This has been a roller coaster ride, because nosync is multilib, which is not supported in the infra repos. Anyway, all the builders are updated to nosync-1.1-8.fc31.

Awesome, thanks! I'll unrevert the openssl change, then try an nbdkit build again and see if it works.

Fix looks good. I did an openssl -5 build with the change reapplied and fired an nbdkit scratch build against that; the root.log from the x86_64 build doesn't show the bug: https://kojipkgs.fedoraproject.org//work/tasks/6821/44786821/root.log I guess we can close this when the updates go stable.

Thank you, Adam, for all the initial detective work and the verification of the fix.

FEDORA-2020-eb7b7b9aa8 has been pushed to the Fedora 31 stable repository. If the problem still persists, please make note of it in this bug report.

FEDORA-2020-329ce47baf has been pushed to the Fedora 32 stable repository. If the problem still persists, please make note of it in this bug report.

This message is a reminder that Fedora 31 is nearing its end of life. Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 31 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.