We've known about this issue for a while but haven't had a bug to track it, so let's have one. Currently, compose of Workstation-type images (both live and ARM, apparently) sometimes fails (and, we think, sometimes succeeds but hits problems such as SELinux mislabels) due to filesystem consistency errors. This happens because the filesystem cannot be cleanly unmounted after package installation, which in turn seems to be caused by something, probably a package %post or %postinst script, holding an sssd library open. At present we haven't identified precisely what's doing that, but dgilmore has come up with a workaround which involves installing sssd on the image build host and preloading the libraries. Proposed as an Alpha blocker: this is a conditional violation of all the criteria relating to live images, in the case where live image compose fails (as obviously images that can't be built fail all those requirements).
Some history on this: bug 501334 was very similar. Note that the patch attached to that bug won't solve the problem, because it doesn't load from lib64. And when I added lib64 I ended up with other errors:

OSError: /usr/lib/libnss_myhostname.so.2: wrong ELF class: ELFCLASS32

So I'm reluctant to add this kind of hack to livecd-creator.
Discussed in 2014-08-06 Blocker Review Meeting. Accepted as a blocker as this is a conditional violation of all Alpha criteria related to live images.
Might make sense to kill the lazy umounting code to make sure the composes fail hard when the umounting fails.
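As a rough illustration of "fail hard", here is a minimal Python sketch of an unmount helper with no lazy fallback (no `umount -l`). The name `strict_umount` and its `run` parameter are hypothetical; this is not the existing livecd-creator code, just the shape such a change might take.

```python
import subprocess


def strict_umount(mountpoint, run=subprocess.call):
    """Unmount mountpoint and raise if that fails; no lazy retry.

    `run` is injectable so the helper can be exercised without root.
    (Hypothetical helper, not part of livecd-creator.)
    """
    rc = run(["umount", mountpoint])
    if rc != 0:
        # Surface the failure so the compose aborts instead of
        # continuing with a filesystem that is still busy.
        raise RuntimeError(
            "umount %s failed with exit status %d" % (mountpoint, rc)
        )
```

With something like this, a busy filesystem would stop the compose at the unmount step rather than showing up later as consistency errors.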
Discussed at the 2014-08-13 blocker review meeting: http://meetbot.fedoraproject.org/fedora-blocker-review/2014-08-13/ The releng team is working hard on this bug. No action is needed from our side for now.
OK, I've been debugging this issue today (along with dgilmore, pjones and dgilmore). What's happening is that inside the compose chroot, libnss_sss.so.2 isn't loaded when it starts, but when RPM starts creating group names in %post, libnss_sss.so.2 gets loaded to confirm that the ID isn't in use. (There was a recent patch to shadow-utils to have it check all ID sources before generating a new ID.) At the end of the compose, libnss_sss.so.2 isn't unloaded, and it's causing errors unmounting the filesystem, which breaks the compose. By contrast, libnss_files.so.2 was already loaded outside the chroot, so it's not in the way. (The linker just points at the existing memory location.) So the workaround we can use here is to have the compose process pass LD_PRELOAD=/usr/lib[64]/libnss_sss.so.2 as part of the environment to livecd-creator. I've tested this and it completes the compose successfully. I'm told that this process is going to be largely rewritten in Fedora 22, so having this hack in for one release seems pretty sensible.
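For illustration, a minimal Python sketch of that workaround, assuming the compose is started from a wrapper script whose environment we control. The helper names `find_nss_sss` and `preload_env` are hypothetical and not part of livecd-creator; the directory list simply spells out the /usr/lib[64] shorthand above.

```python
import os

# Candidate directories, 64-bit first (the "/usr/lib[64]" shorthand).
LIB_DIRS = ("/usr/lib64", "/usr/lib")
SONAME = "libnss_sss.so.2"


def find_nss_sss(lib_dirs=LIB_DIRS, soname=SONAME):
    """Return the first existing path to the NSS module, or None."""
    for d in lib_dirs:
        path = os.path.join(d, soname)
        if os.path.exists(path):
            return path
    return None


def preload_env(lib_path, base_env=None):
    """Return a copy of the environment with lib_path in LD_PRELOAD.

    Any existing LD_PRELOAD entries are kept after the new one.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    if existing:
        env["LD_PRELOAD"] = lib_path + ":" + existing
    else:
        env["LD_PRELOAD"] = lib_path
    return env
```

The resulting dictionary would be passed as `env=` to `subprocess.call` when launching livecd-creator, provided `find_nss_sss()` returned a path, which requires sssd's client library to be installed on the host doing the compose.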
(In reply to Stephen Gallagher from comment #5) > OK, I've been debugging this issue today (along with dgilmore, pjones and > dgilmore). > That should have read "dgilmore, pjones and codonell"...
> I'm told that this process is going to be largely rewritten in Fedora 22, so > having this hack in for one release seems pretty sensible. Just to clarify, the expected change in F22 is a move from livecd-creator to livemedia-creator. Unlike livecd-creator, lm-c uses anaconda to make its images, which means the rpm transaction will run in its own subprocess. This process will exit(2) upon completion, freeing up the reference to the image filesystem.
Two issues with the LD_PRELOAD option. One is that sssd is not installed into the compose root, so there is nothing to load; we can work around that by adding sssd to the comps group. The second is that we have no way to execute arbitrary commands, so livecd-creator will need patching to do the preloading.
We can't execute arbitrary commands, but this is an environment variable. Can't we just set up the environment before launching livecd-creator?
We cannot run or set anything; koji creates a chroot and executes commands in it. The only way to deal with it is to add sssd to the comps group so that the libraries are in the chroot, and to patch livecd-creator to preload them.
We don't need to patch livecd creator to preload them. We just need to make two edits to the mock config:

1) Add sssd-client to the default package set:

    config_opts['chroot_setup_cmd'] = 'install @buildsys-build sssd-client'

2) Set the LD_PRELOAD in the environment for the entire chroot:

    config_opts['files']['etc/profile.d/compose-preload.sh'] = """
    export LD_PRELOAD=/usr/lib/libnss_sss.so.2
    """

Voila. Problem solved, no code edits.
Discussed in 2014-08-20 Freeze Exception Review Meeting [1].

(06:31:09 PM) sgallagh: The easier-but-less-complete workaround was used last night to complete a compose of Workstation
(06:31:28 PM) sgallagh: The more complete and less-hacky patch will be done for a future compose (tonight?)
(06:32:00 PM) sgallagh: In any case, this bug will be closed imminently

[1] http://meetbot.fedoraproject.org/fedora-blocker-review/2013-08-20/
We have patched livecd-creator and appliance-creator to preload libnss_sss.so.2, as it's the only way to actually do it.
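The actual patch is not shown in this bug, so the following is only a sketch of what in-process preloading could look like in Python, assuming it is done with ctypes before the image filesystem is mounted. The function name `preload_nss_sss` and its `loader` parameter are hypothetical. It tries lib64 before lib and skips any candidate that fails to load, which is one way to avoid tripping over the "wrong ELF class" OSError mentioned earlier in this bug.

```python
import ctypes

# Candidate directories, 64-bit first.
LIB_DIRS = ("/usr/lib64", "/usr/lib")
SONAME = "libnss_sss.so.2"


def preload_nss_sss(lib_dirs=LIB_DIRS, soname=SONAME, loader=ctypes.CDLL):
    """Try to load the NSS module into the current process.

    Returns the path that loaded, or None if no candidate could be
    loaded. (Hypothetical helper; not the actual livecd-creator patch.)
    """
    for d in lib_dirs:
        path = d + "/" + soname
        try:
            loader(path)
        except OSError:
            # Missing file, or a library built for the other word size
            # (the "wrong ELF class" case): try the next directory.
            continue
        return path
    return None
```

The idea, following the analysis in the debugging comment above, is that a library already loaded from the host is reused when NSS later asks for it inside the chroot, so no file on the image filesystem stays open at unmount time.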