Bug 1127103
Summary: | Workstation image compose sometimes fails due to filesystem consistency issues (caused by sssd library being held open) | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Adam Williamson <awilliam> |
Component: | distribution | Assignee: | Václav Pavlín <vpavlin> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 21 | CC: | bcl, dennis, elad, jskladan, kalevlember, kparal, mattdm, moez.roy, mruckman, notting, pbrobinson, robatino, satellitgo, sgallagh |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | AcceptedBlocker | ||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2014-08-27 19:10:58 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1043119, 1127280 |
Description
Adam Williamson
2014-08-06 07:30:08 UTC
Some history on this - bug 501334 was very similar to this. Note that the patch attached to that bug won't solve the problem: it doesn't load from lib64. And when I added lib64 I ended up with other errors:

OSError: /usr/lib/libnss_myhostname.so.2: wrong ELF class: ELFCLASS32

So I'm reluctant to add this kind of hack to livecd-creator.

Discussed in 2014-08-06 Blocker Review Meeting. Accepted as a blocker as this is a conditional violation of all Alpha criteria related to live images.

Might make sense to kill the lazy umounting code to make sure the composes fail hard when the umounting fails.

Discussed at the 2014-08-13 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2014-08-13/ The releng team is working hard on this bug. No need for action from our side for now.

OK, I've been debugging this issue today (along with dgilmore, pjones and dgilmore).

So what's happening is that inside the compose chroot, libnss_sss.so.2 isn't loaded when it starts, but when RPM starts creating group names in %post, libnss_sss.so.2 gets loaded to confirm that the ID isn't in use. (There was a recent patch to shadow-utils to have it check all ID sources before generating a new ID.) But at the end of the compose, libnss_sss.so.2 isn't unloaded and it's causing errors unmounting the filesystem, breaking the compose. Whereas libnss_files.so.2 was actually already loaded in the outer chroot, so it's not in the way. (The linker just points at the existing memory location.)

So the workaround we can use here is to have the compose process pass LD_PRELOAD=/usr/lib[64]/libnss_sss.so.2 as part of the environment to livecd-creator. I've tested this and it completes the compose successfully.

I'm told that this process is going to be largely rewritten in Fedora 22, so having this hack in for one release seems pretty sensible.

(In reply to Stephen Gallagher from comment #5)
> OK, I've been debugging this issue today (along with dgilmore, pjones and
> dgilmore).

That should have read "dgilmore, pjones and codonell"...

> I'm told that this process is going to be largely rewritten in Fedora 22, so
> having this hack in for one release seems pretty sensible.
Just to clarify, the expected change in F22 is a move from livecd-creator to livemedia-creator. Unlike livecd-creator, lm-c uses anaconda to make its images, which means the rpm transaction will run in its own subprocess. This process will exit(2) upon completion, freeing up the reference to the image filesystem.
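The reason a separate subprocess helps can be illustrated with a small sketch. This is not how anaconda or livemedia-creator is implemented; the helper names (`loaded_nss_modules`, `run_in_child`) and the group name are made up for illustration, and the check against /proc/self/maps assumes Linux. The idea is only that NSS modules pulled in by a lookup in a child process never get mapped into the parent, so the parent holds no reference to them once the child exits.

```python
import subprocess
import sys


def loaded_nss_modules(maps_path="/proc/self/maps"):
    """Return the libnss_* shared objects mapped into this process (Linux only)."""
    modules = set()
    try:
        with open(maps_path) as maps:
            for line in maps:
                fields = line.split()
                # File-backed mappings carry the path as the last field.
                if len(fields) >= 6 and "libnss_" in fields[-1]:
                    modules.add(fields[-1])
    except OSError:
        # No /proc available: report nothing rather than fail.
        pass
    return modules


# Code run in the child: a group lookup makes glibc consult its NSS sources,
# loading whatever NSS modules are configured, inside the child only.
CHILD_CODE = """
import grp
try:
    grp.getgrnam("no-such-group-for-demo")  # hypothetical group name
except KeyError:
    pass
"""


def run_in_child(code):
    """Run Python code in a fresh interpreter and return its exit status."""
    return subprocess.run([sys.executable, "-c", code]).returncode


before = loaded_nss_modules()
status = run_in_child(CHILD_CODE)
after = loaded_nss_modules()
print("child exit status:", status)
print("parent NSS mappings unchanged:", before == after)
```

Anything the child mapped is released by the kernel when it exits, which is the property the move to livemedia-creator is expected to provide.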
Two issues with the LD_PRELOAD option. One is that sssd is not installed into the compose root, so there is nothing to load; we can work around that by adding sssd to the comps group. The second is that we have no way to execute arbitrary commands, so livecd-creator will need patching to do the preloading.

We can't execute arbitrary commands, but this is an environment variable. Can't we just set up the environment before launching livecd-creator?

We cannot run or set anything; koji creates a chroot and executes commands in it. The only way to deal with it is to add sssd to the comps group so that the libraries are in the chroot and to patch livecd-creator to preload them.

We don't need to patch livecd-creator to preload them. We just need to make two edits to the mock config:

1) Add sssd-client to the default package set:

config_opts['chroot_setup_cmd'] = 'install @buildsys-build sssd-client'

2) Set the LD_PRELOAD in the environment for the entire chroot:

config_opts['files']['etc/profile.d/compose-preload.sh'] = """
export LD_PRELOAD=/usr/lib/libnss_sss.so.2
"""

Voila. Problem solved, no code edits.

Discussed in 2014-08-20 Freeze Exception Review Meeting [1].

(06:31:09 PM) sgallagh: The easier-but-less-complete workaround was used last night to complete a compose of Workstation
(06:31:28 PM) sgallagh: The more complete and less-hacky patch will be done for a future compose (tonight?)
(06:32:00 PM) sgallagh: In any case, this bug will be closed imminently

[1] http://meetbot.fedoraproject.org/fedora-blocker-review/2013-08-20/

We have patched livecd-creator and appliance-creator to preload libnss_sss.so.2, as it's the only way to actually do it.
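For reference, the LD_PRELOAD idea from this thread, including the lib vs. lib64 problem that produced the "wrong ELF class" error earlier, can be sketched roughly as follows. This is not the patch that went into livecd-creator or appliance-creator; the helper names (`elf_class`, `pick_preload`, `preload_env`) and the candidate list are assumptions for illustration. It reads the EI_CLASS byte of each candidate's ELF header and preloads only a library whose class matches the running interpreter.

```python
import os
import struct

ELF_MAGIC = b"\x7fELF"

# Candidate locations, following the /usr/lib[64] paths mentioned above.
CANDIDATES = ["/usr/lib64/libnss_sss.so.2", "/usr/lib/libnss_sss.so.2"]


def elf_class(path):
    """Return 32 or 64 for an ELF file, or None if it is not a readable ELF file."""
    try:
        with open(path, "rb") as f:
            header = f.read(5)
    except OSError:
        return None
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    # Byte 4 is EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64.
    return {1: 32, 2: 64}.get(header[4])


def native_bits():
    """Pointer width of the running interpreter, in bits."""
    return struct.calcsize("P") * 8


def pick_preload(candidates, bits=None):
    """Return the first candidate whose ELF class matches, or None."""
    bits = native_bits() if bits is None else bits
    for path in candidates:
        if elf_class(path) == bits:
            return path
    return None


def preload_env(library, base_env=None):
    """Copy the environment and put the library at the front of LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD")
    env["LD_PRELOAD"] = f"{library}:{existing}" if existing else library
    return env


library = pick_preload(CANDIDATES)
env = preload_env(library) if library else dict(os.environ)
print("LD_PRELOAD would be:", env.get("LD_PRELOAD"))
# The resulting env would then be passed to whatever launches the compose tool,
# e.g. subprocess.run(compose_command, env=env), where compose_command is a
# placeholder for the real invocation.
```

If no matching library is found (for example because sssd-client is not installed in the chroot), the sketch leaves the environment untouched rather than preloading a library of the wrong class.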