Bug 1156198
Summary: | Problematic dependency chain (glibc->basesystem->fedora-release->fedora-release-workstation->NetworkManager-config-fedora-connectivity->NetworkManager) in image creation causes broken 21 Beta TC4 32-bit Workstation | | |
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Mike Ruckman <mruckman> |
Component: | fedora-release | Assignee: | Dennis Gilmore <dennis> |
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | | |
Version: | 21 | CC: | admiller, awilliam, dennis, jdisnard, johannbg, jsynacek, kdudka, kzak, lnykryn, mruckman, msekleta, ooprala, ovasik, pasik, pbrady, p, redhat.bugs, robatino, satellitgo, s, systemd-maint, twaugh, vpavlin, zbyszek |
Target Milestone: | --- | | |
Target Release: | --- | | |
Hardware: | All | | |
OS: | Linux | | |
Whiteboard: | AcceptedBlocker | | |
Fixed In Version: | fedora-release-21-1 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2014-10-31 02:43:35 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 1043124 | | |
Attachments: |
|
Proposed as a Blocker for 21-beta by Fedora user roshi using the blocker tracking app because:

This is a pretty clear violation of the Beta requirements: All release-blocking images must boot in their supported configurations.

Created attachment 950057
systemd.log_level=debug output

Attached more verbose logs.

There are some errors during package installation when composing 32-bit WS live images that do not occur when composing 64-bit WS lives. Compare TC4 logs:

32-bit: https://kojipkgs.fedoraproject.org//work/tasks/4390/7894390/root.log
64-bit: https://kojipkgs.fedoraproject.org//work/tasks/4392/7894392/root.log

Some errors occur in both, but these occur in 32-bit only:

Installing: device-mapper ################### [ 179/1386]/var/tmp/rpm-tmp.Q8ApXX: line 1: groupadd: command not found
DEBUG util.py:283: /var/tmp/rpm-tmp.Q8ApXX: line 4: useradd: command not found
Installing: libssh2 ################### [ 181/1386]/var/tmp/rpm-tmp.CRBM0R: line 1: groupadd: command not found
DEBUG util.py:283: /var/tmp/rpm-tmp.CRBM0R: line 3: useradd: command not found
DEBUG util.py:283: warning: user tss does not exist - using root
DEBUG util.py:283: warning: group tss does not exist - using root
DEBUG util.py:283: warning: user tss does not exist - using root
DEBUG util.py:283: warning: group tss does not exist - using root
Installing: libpwquality ################### [ 191/1386]warning: group dbus does not exist - using root
Installing: dbus ################### [ 192/1386]warning: group polkitd does not exist - using root
DEBUG util.py:283: warning: group polkitd does not exist - using root
Installing: polkit-pkla-compat ################### [ 193/1386]/var/tmp/rpm-tmp.WxBAJp: line 1: groupadd: command not found
DEBUG util.py:283: /var/tmp/rpm-tmp.WxBAJp: line 2: useradd: command not found
DEBUG util.py:283: warning: user polkitd does not exist - using root
DEBUG util.py:283: warning: user polkitd does not exist - using root

OK, yeah. This is looking likely.
I built a test 32-bit live with the systemd debug shell enabled. From that I can run the journal and see that indeed dbus fails to start, complaining about users and groups:

dbus-daemon[802]: Unknown username "polkitd" in message bus configuration file
dbus-daemon[802]: Unknown username "polkitd" in message bus configuration file
dbus-daemon[802]: Failed to start message bus: Could not get UID and GID for username "dbus"

So I think it's becoming clear where to go with this, let me finish my toast and I'll trace it further.

It seems we may be dealing with one or more circular dep loops here. Here's one I've spotted:

coreutils requires openssl-libs
which requires crypto-policies
which requires coreutils

Note: shadow-utils requires coreutils, so loops involving coreutils (and coreutils' deps in general) are interesting.

Between f20 and f21, coreutils started building against openssl. That adds rather a lot of deps (more than just the loop I noted above) and may well be the source of the problem here. I'm running a scratch coreutils build which doesn't use openssl right now, and will test a live compose with it.

The bits of coreutils which actually wind up linked against openssl are md5sum, sha1sum, sha224sum, sha256sum, sha384sum, sha512sum, and (for some reason) sort.

OK, looks like I nailed it. I did a build of coreutils which does not build against openssl. Then I built a 32-bit live image with it. With that coreutils, there are almost no errors during package install (just one from polkit which looks like an issue where a subpackage installs before the main package has created the polkitd user), and the image boots successfully.

Basically, coreutils' linking against openssl causes a problem:

* Lots of things Requires(pre): shadow-utils to create users/groups, including dbus, which *other* things require and needs to get installed quite early
* shadow-utils requires coreutils
* coreutils requires openssl-libs, which itself has quite a big dep chain, including some of the things from Asterisk #1

When yum hits this kind of no-win situation where A needs B but B needs C which needs D which needs A, or whatever, it winds up getting resolved basically arbitrarily. As of right now it seems that for 32-bit Workstation live images it gets resolved in favour of dbus, so dbus installs before shadow-utils and can't create its user. (dbus redirects its user and group creation commands to /dev/null, which is why we don't see the errors.) But I don't think we could rely on it being that way forever, and we don't know how it gets resolved in creation of each of the other live images, and other images...basically, as long as we have this mess, we could have serious borkage any time we're creating something that involves deploying a typical package set from scratch.

I think dropping openssl support from coreutils is probably OK as a short-term solution. What that does is make coreutils use its built-in hashing code (for the *sum utilities) instead of openssl - it doesn't lead to a loss of functionality, just code duplication. Still, in the long term it's good for things not to be re-inventing stuff that should be shared, so I can see why we would want the openssl support in coreutils. To have it, though, we need to break the dep problem somehow. I can't immediately see a super simple way to do that. For Beta I'd suggest we just go with the no-openssl coreutils.

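For anyone who wants to poke at loops like the one described above, here is a minimal Python sketch of a depth-first search that reports one cycle reachable from a given package. The `REQUIRES` table is a hand-written, hypothetical subset of the requirements named in this bug (it is not queried from a real repository), and `find_cycle` is an illustrative helper, not part of yum or rpm.

```python
# Hypothetical, hand-written subset of the requirements discussed in this bug.
# A real check would need to read the requirements from the repository metadata.
REQUIRES = {
    "dbus": ["shadow-utils"],          # Requires(pre), to create its user/group
    "shadow-utils": ["coreutils"],
    "coreutils": ["openssl-libs"],
    "openssl-libs": ["crypto-policies"],
    "crypto-policies": ["coreutils"],
}


def find_cycle(graph, start):
    """Return one dependency cycle reachable from start, or None if there is none."""
    path = []        # packages on the current DFS branch, in visit order
    on_path = set()  # same packages, for fast membership checks
    done = set()     # packages fully explored without finding a cycle

    def visit(pkg):
        if pkg in on_path:
            # We walked back into the current branch: slice out the loop.
            return path[path.index(pkg):] + [pkg]
        if pkg in done:
            return None
        path.append(pkg)
        on_path.add(pkg)
        for dep in graph.get(pkg, []):
            cycle = visit(dep)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(pkg)
        done.add(pkg)
        return None

    return visit(start)


if __name__ == "__main__":
    cycle = find_cycle(REQUIRES, "dbus")
    print(" -> ".join(cycle) if cycle else "no cycle found")
```

Starting from dbus, the search walks through shadow-utils into coreutils and finds the coreutils/openssl-libs/crypto-policies loop. Once a package that needs shadow-utils in its scriptlets depends on a loop like this, no install order can satisfy every edge, which is why the result ends up depending on how the resolver happens to break the loop.
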
I am +1 blocker on this, the 32-bit live should boot obviously, and this could certainly be causing the same problem in other images, or other problems we haven't noticed yet (even though the 64-bit image *boots*, it still has errors during early package install that really shouldn't happen).

+1 blocker, Adam's research seems sane. This is something that could and likely will affect system installs using anaconda also, resulting in non-booting installs as well as non-booting lives.

I think adamw's plan to build coreutils without openssl support is solid. We just can't afford to have large dependency cycles in low level packages such as coreutils. +1 blocker.

+1 blocker

another loop: coreutils -> openssl-libs -> krb5-libs -> coreutils

Marking AcceptedBlocker.

Aha. Thanks to the glory of rpmdep.pl, I found the real smoking gun. openssl-libs requires libcom_err. libcom_err requires glibc. glibc requires basesystem, which requires setup, which requires fedora-release, which requires fedora-release-(product) which for Workstation is fulfilled by fedora-release-workstation, which requires NetworkManager-config-connectivity-fedora, which requires NetworkManager, which pulls in a whole bunch of stuff. That actually may explain why this occurs on the Workstation live specifically. There are other smaller loops like the ones I noticed above, but this is the one which has the really big consequences.

try this:

rpmdep -dot openssl-libs.dot openssl-libs
dot -Tps openssl-libs.dot -o openssl-libs.ps
gimp openssl-libs.ps

coreutils-8.22-20.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/coreutils-8.22-20.fc21

Wow, there's a *real* object lesson in 'what could possibly go wrong' here.
Compare:

Beta TC3 Workstation 32-bit: https://kojipkgs.fedoraproject.org//work/tasks/8252/7818252/root.log
Beta TC4 Workstation 32-bit: https://kojipkgs.fedoraproject.org//work/tasks/4390/7894390/root.log

Notice how vastly different the order of package installation is. In TC3, a whole pile of libraries gets installed before we get anywhere near dbus, openssl-libs, or fedora-release-workstation - which is installed [ 970/1383]. glibc is still early - it's [ 92/1383] - but it doesn't cause yum to have to try and resolve a massive problem by dragging in that whole messy NetworkManager dep chain. NetworkManager-glib is [ 797/1383] in TC3. In TC4 it's [ 201/1386], right in the middle of all the error messages.

In other words - the change in fedora-release 21-0.16 to have fedora-release require "system-release-product", which wasn't even *mentioned in the package changelog*, both caused the whole Generic mess - https://bugzilla.redhat.com/show_bug.cgi?id=1154235 - and caused this bug by massively affecting package ordering during installation.

I'm gonna file this one away to point at next time someone's telling me how their trivial change can't possibly break anything...;)

NetworkManager-0.9.10.0-9.git20140704.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/NetworkManager-0.9.10.0-9.git20140704.fc21

So my immediate fix for this is to drop NetworkManager-config-connectivity-fedora's dependency on NetworkManager. It doesn't need that dep, all it contains is a configuration file. I've tested that this produces a working image whose compose log looks a lot more like TC3's than TC4's.

However, re-assigning the bug to fedora-release, as kalev mentioned there was some kind of plan to have fedora-release-workstation depend on a bunch of stuff that 'defines' the Workstation product. That is not going to be viable so long as we have this dep chain from glibc up to fedora-release-workstation; that would have to be broken somewhere.

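The TC3/TC4 comparison above can be done mechanically rather than by eye. The following Python sketch pulls the "[ N/total]" position of each "Installing:" line out of two logs and lists the packages that moved the most. The two sample strings are reconstructed from the positions quoted in this bug, not copied from the real root.log files, and the names `install_positions` and `compare` are hypothetical helpers invented for this illustration.

```python
import re

# Matches lines such as "Installing: dbus ################### [ 192/1386]".
# The pattern is not anchored, so a "DEBUG util.py:283:" prefix or scriptlet
# output glued to the end of the line does not prevent a match.
INSTALL_RE = re.compile(r"Installing:\s+(\S+)\s+#+\s+\[\s*(\d+)/(\d+)\]")


def install_positions(log_text):
    """Map each package name to its position in the install transaction."""
    return {m.group(1): int(m.group(2)) for m in INSTALL_RE.finditer(log_text)}


def compare(old_log, new_log):
    """Return (package, old position, new position), largest move first."""
    old = install_positions(old_log)
    new = install_positions(new_log)
    shared = old.keys() & new.keys()
    rows = [(pkg, old[pkg], new[pkg]) for pkg in shared]
    return sorted(rows, key=lambda r: abs(r[1] - r[2]), reverse=True)


# Sample lines rebuilt from the positions mentioned in this bug (assumption:
# the real logs use the same "Installing:" line format shown in the excerpt).
TC3_SAMPLE = """\
Installing: glibc ################### [  92/1383]
Installing: NetworkManager-glib ################### [ 797/1383]
Installing: fedora-release-workstation ################### [ 970/1383]
"""
TC4_SAMPLE = """\
Installing: dbus ################### [ 192/1386]warning: group polkitd does not exist - using root
Installing: NetworkManager-glib ################### [ 201/1386]
"""

if __name__ == "__main__":
    for pkg, before, after in compare(TC3_SAMPLE, TC4_SAMPLE):
        print(f"{pkg}: {before} -> {after}")
```

To use it on real composes, the two root.log files would be downloaded and their contents passed to `compare` in place of the sample strings. Only packages present in both logs are reported.
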
I suspect glibc's Requires(pre): basesystem is not actually necessary, but I certainly don't want to tinker with that right before a Beta release. It's been in there forever, I think back to like 2003 at least - it's sufficiently old I can't manage to track its addition from the git commit history (back in 2007 it was changed from Prereq: basesystem libgcc to Requires(pre): basesystem libgcc, but I'm still trying to track down when the Prereq: form was introduced).

Correction to c#18 - the 'system-release-product' change was mentioned in the package changelog, but missed from the f21 branch git log, and from the update description.

So I just did some spelunking. glibc's dependency on basesystem dates back to somewhere between Red Hat Linux 6.2 and Red Hat Linux 7.0. The changelog of RHL 7.0's glibc package, however, leaves something to be desired:

%changelog
* %{date} Jakub Jelinek <jakub>
- build from CVS archive

so I'm not sure I can manage to find the actual explanation of why glibc does it. But I think it may be reasonable to consider the possibility that maybe it doesn't need to any more, after...erm...14 years.

AFAIK, the basesystem package is there only to handle the right installation order -> basesystem, filesystem, setup, ... glibc ... so that the filesystem layout and basic users/groups are on the system for other dependent packages. There is no other reason for that package. I'm not sure if it is still required or not; we may try to drop it completely from Rawhide to see if the dependency order hack is still required or not.

FYI the openssl use by coreutils is for speed. Upstream vendor architecture specific dev is concentrated in the openssl project, thus we get 50% speedups etc. on common architectures.

NetworkManager-0.9.10.0-10.git20140704.fc21 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/NetworkManager-0.9.10.0-10.git20140704.fc21

The NM -9 build in Beta RC1 is confirmed to fix this - the 32-bit WS live boots, as do all other tested images.

NetworkManager-0.9.10.0-10.git20140704.fc21 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report.

fedora-release-21-1 has been submitted as an update for Fedora 21.
https://admin.fedoraproject.org/updates/fedora-release-21-1

fedora-release-21-1, fedora-repos-21-1 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report.

Hi, I just had the "group dbus does not exist - using root" error when creating a livecd on an x86_64 RHEL 6.x clone. I found it was due to the host system using nscd to cache credentials. Running service nscd stop before the build has helped :-) Maybe that'll help someone else. Cheers, Rich
|
Created attachment 950050
console output during boot

Description of problem:
i386 Live images fail to start up. The boot process never makes it past plymouth, and logs show multiple services not being able to start. After a time of attempting to start services, systemd shuts the machine down.

Version-Release number of selected component (if applicable):
Fedora-Live-Workstation-i686-21_Beta-TC4.iso

How reproducible:
Always

Steps to Reproduce:
1. Launch Live Image
2. Wait.
3.

Actual results:
System shuts down after timeout.

Expected results:
System should boot to the DE.

Additional info:
This has been confirmed multiple times. x86_64 doesn't seem to be affected.