Red Hat Bugzilla – Bug 481796
nscd interaction with chroot is poor
Last modified: 2010-03-06 14:48:12 EST
Created attachment 330116 [details]
The kickstart file (minus root password)
Description of problem:
livecd-creator isn't able to unmount install_root after installing the packages. lsof shows that install_root/lib/libnss_files-2.9.so is being used by livecd-creator. This happens about halfway through running the transaction somewhere around the point PackageKit is getting installed. If PackageKit isn't included in the transaction, we don't hit this bug.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. livecd-creator -f Fedora -c minimal-10.ks --cache=/usr/src/livecd-f10
Actual results:
livecd-creator doesn't unmount, ext3 filesystem is corrupted, the sun implodes into a small speck in the sky
Expected results:
livecd-creator unmounts install_root, finishes creating livecd, sun shines brightly and birds sing pretty songs
So this has been sporadically reported, but I can't ever reproduce it.
What's your nss switch configuration outside of the chroot? Are you using nscd at all?
So yeah, disabled nscd (which was on 'cause of the glibc not statting /etc/resolv.conf after NM reconnects), built a live image, and it worked perfectly.
This problem was appearing on two separate computers, one with ldap enabled, one not, so nsswitch would have been different. But nscd was on for both of them, so that is probably the culprit.
Thanks for the hint on where to look. Should livecd-creator check whether nscd is running, disable it while generating the livecd, and then enable it again afterwards? Or is there a better way to fix this?
Well, we could do that. But it feels a bit much.
Jakub -- is there any way to not have nscd holding open files in a chroot when installing packages there?
Jonathan -- do you happen to have a log handy from this? If so, are there messages about a group or user not existing? (I do have it reproducing consistently at the moment and it looks like that's where things are going badly. But I'm trying to make sure I'm not chasing down something unrelated :)
I'm afraid I don't have a log handy, but, yes, I did have multiple errors about being unable to create a user or group. If I recall correctly, there was one as dbus was being installed, another as hal was being installed, and a number of others.
Hope that helps.
Okay, the root cause here is a bug in glibc/nscd --
What happens is that nscd is running on the host machine. livecd-creator starts and is using nscd (it can access the socket, all is well). We create a filesystem image, mount it and then use librpm to install into the chroot. A package then creates a user in its %pre and since there's not an inotify watch in the chroot, we have to wait for the timeout and nscd doesn't recognize the user and so
a) We end up with files with the wrong ownership. This is very bad and could lead to images with significant security concerns.
b) This ends up with the process having an open file in the chroot which leads to the inability to unmount. This is bad, but probably not as bad in the scheme of things.
For now, I'm hacking around by stopping nscd in livecd-creator, but we should get the real problem fixed too
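A minimal sketch of what such a workaround could look like, not the actual livecd-creator change. It assumes a SysV-style `service nscd ...` init script whose `status` action exits 0 when the daemon is running. The `nscd_stopped` name and the `run` parameter are hypothetical and exist only for illustration:

```python
import contextlib
import subprocess


@contextlib.contextmanager
def nscd_stopped(run=subprocess.call):
    """Stop the host nscd for the duration of the block, then restore it.

    `run` takes an argv list and returns the exit status. It defaults to
    subprocess.call and is injectable so the logic can be exercised
    without touching a real service.
    """
    # Assumption: "service nscd status" exits 0 only when nscd is running.
    was_running = run(["service", "nscd", "status"]) == 0
    if was_running:
        run(["service", "nscd", "stop"])
    try:
        yield was_running
    finally:
        # Only restart nscd if we were the ones who stopped it.
        if was_running:
            run(["service", "nscd", "start"])
```

The image build would then sit inside `with nscd_stopped(): ...`. This only hides the problem for the duration of the run; it does not change how lookups are resolved.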
(In reply to comment #6)
> starts and is using nscd (it can access the socket, all is well). We create a
> filesystem image, mount it and then use librpm to install into the chroot. A
> package then creates a user in its %pre and since there's not an inotify watch
> in the chroot
What does this chroot contain? Does it have the original /var/* mounted so that processes running in the chroot see the nscd socket? If yes, then it is entirely the fault of this setup. There is no way that nscd can or should magically recognize this. It's a completely broken setup. If the /etc/passwd file a process sees is different from the one nscd sees, nothing good can possibly happen.
If /var/* is shared, then mount a special, empty /var/run/nscd over it to prevent sharing nscd.
The only things bind mounted are /var/cache/yum, /sys, /dev/pts and /proc. There's no sharing of the real /var. The chroot contains an entirely new system image and so by its very nature *has* to be a different /etc/passwd than outside the chroot (the package set outside of the chroot could be minimal, inside could include things like postgresql/mysql/httpd/anything else).
(In reply to comment #8)
> The chroot contains an entirely new
> system image and so by its very nature *has* to be a different /etc/passwd
> than outside the chroot
Then explain how nscd even gets into the picture. The only way to get access is through the socket. If that isn't there nothing in the chroot can use nscd. Of course the process calling chroot will continue to use a possibly existing connection (shared memory regions, to be exact). But this is something you certainly handle.
The process which is laying down files (livecd-creator, etc) is running from outside of the chroot. It can't run from inside of it because the chroot is empty when we start things going. So the livecd-creator process ends up using the "host" nscd which, yes, knows nothing of the /etc/passwd inside of the chroot.
Unfortunately, there's no per-process way that I know of to say "livecd-creator -- don't use nscd". I mean, I guess there's always the option of shipping a wrapper that stops and starts nscd around livecd-creator runs, but that seems like an awfully big hammer :-)
(Note: I'm still not sure why there's the manifestation that we can't unmount things, but that's at least less of my concern at this point. The fact that files are getting put down in images with the wrong ownership is a bigger problem for me)
(In reply to comment #10)
> The process which is laying down files (livecd-creator, etc) is running from
> outside of the chroot. It can't run from inside of it because the chroot is
> empty when we start things going. So the livecd-creator process ends up using
> the "host" nscd which, yes, knows nothing of the /etc/passwd inside of the
Which already means you absolutely have to use a separate process to perform all the actions in the chroot that require access to any of the databases under nscd control. Yes, turning off nscd for one process would work but there isn't and won't be such a selector. It would invite yet more chaos.
> (Note: I'm still not sure why there's the manifestation that we can't unmount
That's beyond me as well. nscd cannot have anything to do with it if it doesn't run in the chroot.
I'll move the bug back to livecd-tools.
(In reply to comment #11)
> (In reply to comment #10)
> > The process which is laying down files (livecd-creator, etc) is running from
> > outside of the chroot. It can't run from inside of it because the chroot is
> > empty when we start things going. So the livecd-creator process ends up using
> > the "host" nscd which, yes, knows nothing of the /etc/passwd inside of the
> > chroot.
> Which already means you absolutely have to use a separate process to perform
> all the actions in the chroot that require access to any of the databases under
> nscd control. Yes, turning off nscd for one process would work but there isn't
> and won't be such a selector. It would invite yet more chaos.
librpm doesn't fork off new processes when it's laying down files within the chroot, and doing so isn't likely to win it any favors with people who already complain about the speed of installs. This isn't just livecd-creator: I can reproduce the exact same thing with nscd running and then doing 'rpm --root=/path/to/my/chroot -ivh mysql-server*rpm' (or similar for some other package that creates a user that doesn't exist outside of my chroot).
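To see which accounts are at risk in a given chroot, something like the following sketch could list the users that exist only inside it. It compares the two /etc/passwd files directly, so it assumes file-based accounts and will not see host users that come from LDAP or NIS. The function names are hypothetical:

```python
import os


def passwd_names(path):
    """Return the set of account names listed in a passwd-format file."""
    names = set()
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comments; the name is the first field.
            if line and not line.startswith("#"):
                names.add(line.split(":", 1)[0])
    return names


def users_only_in_chroot(chroot, host_passwd="/etc/passwd"):
    """Sorted names present in <chroot>/etc/passwd but not on the host."""
    inside = passwd_names(os.path.join(chroot, "etc/passwd"))
    outside = passwd_names(host_passwd)
    return sorted(inside - outside)
```

After the 'rpm --root=...' run above, `users_only_in_chroot("/path/to/my/chroot")` would be expected to include the user created by the package's %pre; any file owned by such a user is a candidate for the wrong-ownership problem.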
(In reply to comment #12)
> I can reproduce the exact same thing with nscd running and then doing 'rpm
> --root=/path/to/my/chroot -ivh mysql-server*rpm' (or similar for some other
> package that creates a user that doesn't exist outside of my chroot.
And? Then set up all this chroot nonsense correctly. There is nothing wrong in nscd.
(In reply to comment #13)
> (In reply to comment #12)
> > I can reproduce the exact same thing with nscd running and then doing 'rpm
> > --root=/path/to/my/chroot -ivh mysql-server*rpm' (or similar for some other
> > package that creates a user that doesn't exist outside of my chroot.
> And? Then set up all this chroot nonsense correctly. There is nothing wrong
> in nscd.
So how do you define 'correctly'? There's no reason at all every user that's ever going to be created or used inside of a chroot should need to be defined outside of it. This all works with zero problems as long as the user isn't using nscd.
I can confirm that this issue also exists in RHEL 5.2.
nscd in RHEL 5.1 doesn't seem to have this.
If you want lookups in the chroot to use the chroot's nsswitch.conf and passwd/group/..., then doing the lookups through library functions from a process started outside of the chroot, which did at least one lookup outside the chroot, is very wrong anyway, even when not using nscd. nsswitch.conf is read just once per process, and if e.g. nsswitch.conf outside of the chroot doesn't even use files, or uses ldap/nis/nis+/... first before falling back to files, while nsswitch.conf inside of the chroot is just files, then it will misbehave without nscd in the picture as well.
What you really want is to do the lookups in a process started in the chroot.
You don't need to spawn it all the time for every file or rpm being installed;
it is enough to spawn it just once and talk to it over a pipe, etc.
It could even use inotify on the /etc/ files and do some caching for you when the files haven't changed.
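A rough sketch of the idea, under stated assumptions. It is a simpler variant than the long-lived helper described above: it runs one short-lived `chroot ROOT getent passwd NAME` per lookup and caches positive answers in the caller. It assumes the `chroot` command is available on the host, that `getent` is already installed inside the image, and that the caller has the privileges to chroot. `chroot_uid` and its `run` parameter are hypothetical names:

```python
import subprocess

# Positive answers only: a user missing now may be created by a later %pre.
_uid_cache = {}


def chroot_uid(name, root, run=subprocess.run):
    """Resolve `name` to a uid using the passwd database inside `root`.

    The lookup runs in a fresh process that starts inside the chroot, so
    it reads the chroot's nsswitch.conf and cannot reach the host nscd
    socket (as long as the host /var is not shared into the chroot).
    """
    key = (root, name)
    if key in _uid_cache:
        return _uid_cache[key]
    result = run(
        ["chroot", root, "getent", "passwd", name],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    # getent exits non-zero and prints nothing when the key is not found.
    if result.returncode != 0 or not result.stdout.strip():
        return None
    # passwd format: name:password:uid:gid:gecos:home:shell
    uid = int(result.stdout.split(":")[2])
    _uid_cache[key] = uid
    return uid
```

Spawning a process per lookup is slower than the spawn-once helper, which is why the cache is there; an inotify-driven cache as suggested above would be the more thorough design.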
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '10'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 10's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 10 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
nscd behaves as it is expected to. No change.