Bug 1127103 - Workstation image compose sometimes fails due to filesystem consistency issues (caused by sssd library being held open)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: distribution
Version: 21
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Václav Pavlín
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: AcceptedBlocker
Depends On:
Blocks: F21AlphaBlocker 1127280
Reported: 2014-08-06 07:30 UTC by Adam Williamson
Modified: 2014-08-27 19:10 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-08-27 19:10:58 UTC
Type: Bug
Embargoed:



Description Adam Williamson 2014-08-06 07:30:08 UTC
We've known about this issue for a while, but haven't had a bug to track it, so let's have one.

Currently, compose of Workstation-type images (both live and ARM, apparently) sometimes fails (and we think sometimes succeeds but hits problems, like SELinux mislabels) due to filesystem consistency errors. This happens because the filesystem cannot be cleanly unmounted after package installation, which in turn seems to be caused by something - probably a package %post or %postinst script - holding an sssd library open.

At present we haven't identified precisely what's doing that, but dgilmore has come up with a workaround which involves installing sssd on the image build host and preloading the libraries.

Proposed as an Alpha blocker: this is a conditional violation of all the criteria relating to live images, in the case where live image compose fails (as obviously images that can't be built fail all those requirements).

Comment 1 Brian Lane 2014-08-06 15:30:58 UTC
Some history on this - bug 501334 was very similar to this. Note that the patch attached to that bug won't solve the problem, because it doesn't load from lib64. And when I added lib64 I ended up with other errors:

OSError: /usr/lib/libnss_myhostname.so.2: wrong ELF class: ELFCLASS32

So I'm reluctant to add this kind of hack to livecd-creator.
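
The "wrong ELF class" error above is what the loader reports when a process tries to open a library built for a different word size. A minimal sketch of checking a file's ELF class before trying to load it, assuming a Python preload hack like the one discussed here; `elf_class` and `matches_interpreter` are hypothetical names and not part of livecd-creator:

```python
import struct

ELFCLASS32 = 1
ELFCLASS64 = 2


def elf_class(path):
    """Return 32 or 64 for an ELF file, or None if it is not ELF."""
    with open(path, "rb") as f:
        ident = f.read(5)
    # An ELF file starts with b"\x7fELF"; byte 4 (EI_CLASS) is 1 for
    # 32-bit objects and 2 for 64-bit objects.
    if len(ident) < 5 or ident[:4] != b"\x7fELF":
        return None
    return {ELFCLASS32: 32, ELFCLASS64: 64}.get(ident[4])


def matches_interpreter(path):
    """True if the library has the same word size as this Python process."""
    return elf_class(path) == struct.calcsize("P") * 8
```

With a check like this, a 32-bit /usr/lib copy could be skipped on a 64-bit host instead of raising OSError.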

Comment 2 Mike Ruckman 2014-08-06 17:49:54 UTC
Discussed in 2014-08-06 Blocker Review Meeting. Accepted as a blocker as this is a conditional violation of all Alpha criteria related to live images.

Comment 3 Kalev Lember 2014-08-13 15:00:09 UTC
Might make sense to kill the lazy umounting code to make sure the composes fail hard when the umounting fails.
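
A rough sketch of the fail-hard behaviour suggested here, assuming the unmount is done by calling the `umount` command; `strict_umount`, `UmountError` and the injectable `run` parameter are hypothetical names, not livecd-creator's actual code:

```python
import subprocess


class UmountError(RuntimeError):
    pass


def strict_umount(mountpoint, run=subprocess.call):
    """Unmount mountpoint, raising instead of falling back to `umount -l`."""
    rc = run(["umount", mountpoint])
    if rc != 0:
        # A lazy unmount (`umount -l`) would hide the busy filesystem and
        # let the compose continue with a possibly inconsistent image.
        raise UmountError(
            "umount %s failed with exit code %d" % (mountpoint, rc)
        )
```

The `run` parameter only exists so the function can be exercised without root privileges.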

Comment 4 Kamil Páral 2014-08-13 17:42:07 UTC
Discussed at the 2014-08-13 blocker review meeting:
http://meetbot.fedoraproject.org/fedora-blocker-review/2014-08-13/
The releng team is working hard on this bug. No action is needed from our side for now.

Comment 5 Stephen Gallagher 2014-08-19 20:56:52 UTC
OK, I've been debugging this issue today (along with dgilmore, pjones and dgilmore).

So what's happening is that inside the compose chroot, libnss_sss.so.2 isn't loaded when it starts, but when RPM starts creating group names in %post, libnss_sss.so.2 gets loaded to confirm that the ID isn't in use. (There was a recent patch to shadow-utils to have it check all ID sources before generating a new ID).
But at the end of the compose, libnss_sss.so.2 isn't unloaded and it's causing errors unmounting the filesystem, breaking the compose.
Whereas libnss_files.so.2 was actually already loaded in the outer chroot, so it's not in the way. (The linker just points at the existing memory location.)
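
One way to see which processes still have a file from the compose root mapped is to scan `/proc/<pid>/maps`. A minimal sketch, assuming a Linux host and that the mapped paths appear with the compose root as a prefix; `mapped_paths` and `holders` are hypothetical helper names:

```python
import os


def mapped_paths(maps_text, root):
    """Return the set of mapped file paths in maps_text that live under root."""
    prefix = root.rstrip("/") + "/"
    found = set()
    for line in maps_text.splitlines():
        # Line format: address perms offset dev inode pathname
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].startswith(prefix):
            found.add(fields[5])
    return found


def holders(root, proc="/proc"):
    """Map pid -> files under root that the process has mapped."""
    result = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "maps")) as f:
                paths = mapped_paths(f.read(), root)
        except OSError:
            continue  # process exited, or we lack permission to read it
        if paths:
            result[int(entry)] = paths
    return result
```

Calling `holders()` with the compose root just before the unmount step would list any process, including the compose tool itself, that still maps a library from the image.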

So the workaround we can use here is to have the compose process pass
LD_PRELOAD=/usr/lib[64]/libnss_sss.so.2
as part of the environment to livecd-creator. I've tested this and it completes the compose successfully.
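
The `[64]` in the path above stands for the architecture-dependent library directory. A minimal sketch of building that environment from Python, assuming the library lives in one of the two directories named in this comment and that the 64-bit directory should win when both exist; `preload_env` and `run_with_preload` are hypothetical helpers, and the livecd-creator arguments are left to the caller:

```python
import os
import subprocess

CANDIDATE_DIRS = ("/usr/lib64", "/usr/lib")


def preload_env(libname="libnss_sss.so.2", dirs=CANDIDATE_DIRS, base=None):
    """Return a copy of the environment with LD_PRELOAD set to libname."""
    env = dict(os.environ if base is None else base)
    for d in dirs:
        path = os.path.join(d, libname)
        if os.path.exists(path):
            # Keep anything already preloaded; LD_PRELOAD accepts a
            # colon- or space-separated list.
            existing = env.get("LD_PRELOAD")
            env["LD_PRELOAD"] = path + (":" + existing if existing else "")
            return env
    raise FileNotFoundError("%s not found in %s" % (libname, ", ".join(dirs)))


def run_with_preload(argv):
    """Run argv (for example a livecd-creator command line) with the preload."""
    return subprocess.call(argv, env=preload_env())
```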

I'm told that this process is going to be largely rewritten in Fedora 22, so having this hack in for one release seems pretty sensible.

Comment 6 Stephen Gallagher 2014-08-19 20:57:43 UTC
(In reply to Stephen Gallagher from comment #5)
> OK, I've been debugging this issue today (along with dgilmore, pjones and
> dgilmore).
> 

That should have read "dgilmore, pjones and codonell"...

Comment 7 Peter Jones 2014-08-19 21:00:11 UTC
> I'm told that this process is going to be largely rewritten in Fedora 22, so
> having this hack in for one release seems pretty sensible.

Just to clarify, the expected change in F22 is a move from livecd-creator to livemedia-creator.  Unlike livecd-creator, lm-c uses anaconda to make its images, which means the rpm transaction will run in its own subprocess.  This process will exit(2) upon completion, freeing up the reference to the image filesystem.
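
The point about process lifetime can be illustrated with a small sketch: anything a child process opens or maps is released when the child exits, so the parent never holds a reference to it. This is an illustration in Python only, not how anaconda or livemedia-creator is implemented; `run_isolated` is a hypothetical name:

```python
import subprocess
import sys


def run_isolated(code):
    """Run a Python snippet in a child interpreter and return its exit status.

    Files opened and libraries loaded by the snippet belong to the child,
    so they are released when the child exits.
    """
    return subprocess.call([sys.executable, "-c", code])
```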

Comment 8 Dennis Gilmore 2014-08-19 22:33:50 UTC
Two issues with the LD_PRELOAD option. One is that sssd is not installed into the compose root, so there is nothing to load; we can work around that by adding sssd to the comps group. The second is that we have no way to execute arbitrary commands, so livecd-creator will need patching to do the preloading.

Comment 9 Stephen Gallagher 2014-08-20 02:13:13 UTC
We can't execute arbitrary commands, but this is an environment variable. Can't we just set up the environment before launching livecd-creator?

Comment 10 Dennis Gilmore 2014-08-20 02:46:35 UTC
We cannot run or set anything; koji creates a chroot and executes commands in it. The only way to deal with it is to add sssd to the comps group so that the libraries are in the chroot, and to patch livecd-creator to preload them.

Comment 11 Stephen Gallagher 2014-08-20 14:02:02 UTC
We don't need to patch livecd-creator to preload them. We just need to make two edits to the mock config:

1) Add sssd-client to the default package set:
config_opts['chroot_setup_cmd'] = 'install @buildsys-build sssd-client'

2) Set the LD_PRELOAD in the environment for the entire chroot
config_opts['files']['etc/profile.d/compose-preload.sh'] = """
export LD_PRELOAD=/usr/lib/libnss_sss.so.2
"""


Voila. Problem solved, no code edits.

Comment 12 Josef Skladanka 2014-08-20 16:36:40 UTC
Discussed in 2014-08-20 Freeze Exception Review Meeting [1].

(06:31:09 PM) sgallagh: The easier-but-less-complete workaround was used last night to complete a compose of Workstation
(06:31:28 PM) sgallagh: The more complete and less-hacky patch will be done for a future compose (tonight?)
(06:32:00 PM) sgallagh: In any case, this bug will be closed imminently

[1] http://meetbot.fedoraproject.org/fedora-blocker-review/2013-08-20/

Comment 13 Dennis Gilmore 2014-08-22 20:38:23 UTC
We have patched livecd-creator and appliance-creator to preload libnss_sss.so.2, as it's the only way to actually do it.
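
A sketch of what in-process preloading could look like in a Python tool, assuming it is done with `ctypes` before the tool enters the install root. This is not the actual livecd-creator patch, and `preload_library` is a hypothetical name:

```python
import ctypes


def preload_library(candidates):
    """Load the first library in candidates that this process can open.

    Returns the path that was loaded, or None if none could be loaded.
    """
    for path in candidates:
        try:
            # Loading the host's copy up front is meant to let a later
            # lookup reuse it (as described in comment 5) rather than map
            # a file from the image filesystem.
            ctypes.CDLL(path)
        except OSError:
            # Missing file or wrong ELF class (e.g. a 32-bit library in a
            # 64-bit process); try the next candidate.
            continue
        return path
    return None
```

Trying `/usr/lib64/libnss_sss.so.2` and then `/usr/lib/libnss_sss.so.2` in that order would cover both library directories mentioned in this bug while tolerating the ELF class mismatch from comment 1.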

