Description of problem: On my rawhide system I saw problems which must be reproducible bugs. The first issue might be timing/race-related, but the following errors shows some error cases which isn't handled as good as they could/should. [root@localhost ~]# livecd-creator -c /usr/share/spin-kickstarts/fedora-livecd-desktop-en_US.ks ... Installing: pulseaudio-module-bluetooth ##################### [962/962] Removing password for user root. passwd: Success umount: /var/tmp/imgcreate-eNHsQm/install_root: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) ioctl: LOOP_CLR_FD: Device or resource busy umount: /var/tmp/imgcreate-eNHsQm/install_root: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) e2fsck 1.41.4 (27-Jan-2009) fedora-livecd--i: recovering journal Clearing orphaned inode 131081 (uid=0, gid=0, mode=0100644, size=1318912) Clearing orphaned inode 131082 (uid=0, gid=0, mode=0100644, size=491520) Clearing orphaned inode 131080 (uid=0, gid=0, mode=0100644, size=188416) Clearing orphaned inode 131079 (uid=0, gid=0, mode=0100644, size=24576) Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Unattached inode 139338 Connect to /lost+found? yes WARNING: PROGRAMMING BUG IN E2FSCK! OR SOME BONEHEAD (YOU) IS CHECKING A MOUNTED (LIVE) FILESYSTEM. inode_link_info[139338] is 2, inode.i_links_count is 1. They should be the same! Inode 139338 ref count is 1, should be 1. Fix? yes Unattached inode 172035 Connect to /lost+found? yes WARNING: PROGRAMMING BUG IN E2FSCK! OR SOME BONEHEAD (YOU) IS CHECKING A MOUNTED (LIVE) FILESYSTEM. inode_link_info[172035] is 2, inode.i_links_count is 1. They should be the same! Inode 172035 ref count is 1, should be 1. Fix? yes Unattached inode 172036 Connect to /lost+found? yes WARNING: PROGRAMMING BUG IN E2FSCK! OR SOME BONEHEAD (YOU) IS CHECKING A MOUNTED (LIVE) FILESYSTEM. inode_link_info[172036] is 2, inode.i_links_count is 1. They should be the same! Inode 172036 ref count is 1, should be 1. Fix? yes Pass 5: Checking group summary information fedora-livecd--i: ***** FILE SYSTEM WAS MODIFIED ***** fedora-livecd--i: 87898/262144 files (0.4% non-contiguous), 595266/1048576 blocks e2fsck 1.41.4 (27-Jan-2009) fedora-livecd--i: recovering journal Pass 1: Checking inodes, blocks, and sizes Inode 49129 is in use, but has dtime set. Fix? yes Inode 49130 is in use, but has dtime set. Fix? yes Inode 134176 has illegal block(s). Clear? yes Illegal block #12 (2766177956) in inode 134176. CLEARED. Illegal block #13 (2376458416) in inode 134176. CLEARED. Illegal block #14 (3768689888) in inode 134176. CLEARED. ... fedora-livecd--i: ***** FILE SYSTEM WAS MODIFIED ***** fedora-livecd--i: 87895/147456 files (0.5% non-contiguous), 586314/587556 blocks /usr/lib/python2.6/site-packages/imgcreate/errors.py:45: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6 return unicode(self.message) Error creating Live CD : fsck returned an error! umount: /var/tmp/imgcreate-eNHsQm/install_root: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) Traceback (most recent call last): File "/usr/bin/livecd-creator", line 140, in <module> sys.exit(main()) File "/usr/bin/livecd-creator", line 135, in main creator.cleanup() File "/usr/lib/python2.6/site-packages/imgcreate/creator.py", line 575, in cleanup shutil.rmtree(self.__builddir, ignore_errors = True) File "/usr/lib/python2.6/shutil.py", line 216, in rmtree rmtree(fullname, ignore_errors, onerror) File "/usr/lib/python2.6/shutil.py", line 216, in rmtree rmtree(fullname, ignore_errors, onerror) File "/usr/lib/python2.6/shutil.py", line 216, in rmtree rmtree(fullname, ignore_errors, onerror) ... File "/usr/lib/python2.6/shutil.py", line 216, in rmtree rmtree(fullname, ignore_errors, onerror) File "/usr/lib/python2.6/shutil.py", line 197, in rmtree if os.path.islink(path): File "/usr/lib/python2.6/posixpath.py", line 135, in islink return stat.S_ISLNK(st.st_mode) File "/usr/lib/python2.6/stat.py", line 56, in S_ISLNK return S_IFMT(mode) == S_IFLNK RuntimeError: maximum recursion depth exceeded umount: /var/tmp/imgcreate-eNHsQm/install_root: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) Exception RuntimeError: RuntimeError('maximum recursion depth exceeded',) in <bound method x86LiveImageCreator.__del__ of <imgcreate.live.x86LiveImageCreator object at 0x875906c>> ignored umount: /var/tmp/imgcreate-eNHsQm/install_root: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) Exception RuntimeError: RuntimeError('maximum recursion depth exceeded',) in <bound method x86LiveImageCreator.__del__ of <imgcreate.live.x86LiveImageCreator object at 0x875906c>> ignored ... ad infitum Version-Release number of selected component (if applicable): livecd-tools-024-1.fc11.i586 kernel-PAE-2.6.29.3-140.fc11.i686 system is uptodate with rawhide, updates and updates-testing and some packages from koji
What version of spin-kickstarts is that with? I'm doing builds regularly without problems. Usually this creeps in due to a buggy package which starts something in its %post that it shouldn't (or something in the %post of the kickstart doing so)
spin-kickstarts-0.11.3-1.fc11.noarch The unchecked (suspicious-looking) umount in /usr/lib/python2.6/site-packages/imgcreate/fs.py succeeds fine, but the checked umount of /var/tmp/imgcreate-Lb1BpV/install_root fails because of COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME livecd-cr 11520 root mem REG 7,3 55540 4787 /var/tmp/imgcreate-Lb1BpV/install_root/lib/libnss_files-2.10.1.so livecd-cr 11520 root 12r REG 7,3 28205056 176 /var/tmp/imgcreate-Lb1BpV/install_root/var/lib/rpm/Packages livecd-cr 11520 root 25r REG 7,3 45056 177 /var/tmp/imgcreate-Lb1BpV/install_root/var/lib/rpm/Name
Aha, you're running with nscd enabled. If you disable it, then that will help. There's a glibc (nscd) bug filed about it, but I'm not finding it at the moment. But there's not anything that we can do about it in livecd-creator :(
Bug 481796 - nscd interaction with chroot is poor A warning from livecd-creator could be nice ...
Created attachment 453192 [details] Patch to circumvent libnss_ problem (In reply to comment #3) > Aha, you're running with nscd enabled. If you disable it, then that will help. > There's a glibc (nscd) bug filed about it, but I'm not finding it at the > moment. But there's not anything that we can do about it in livecd-creator :( Just got bitten by the same bug when rebuilding an old FC11 live-cd, attached is my very hackish fix to livecd-creator.
Bug still exists in livecd-tools-14.0-1.fc14.i686 (Fedora-14): # lsof | grep nss | grep live livecd-cr 7862 root mem REG 8,5 54380 1185085 /lib/libnss_files-2.13.so livecd-cr 7862 root mem REG 7,5 49660 5663 /var/tmp/imgcreate-v7Py2N/install_root/lib/libnss_nis-2.13.so livecd-cr 7862 root mem REG 8,5 1275848 827011 /usr/lib/libnss3.so livecd-cr 7862 root mem REG 8,5 107528 807053 /usr/lib/libnssutil3.so Fix from comment 5 still works OK, but hunk 2 has a 4 line offset.
Note: nscd is not running, this is a pure nss problem...
I have never seen any problems when nscd was disabled. Can you reproduce the problem with nscd stopped and a plain /usr/share/spin-kickstarts/fedora-live-base.ks ?
I don't know, but I feel that it's beside the point: the reason I want to build LiveCD's, is that I want to add things to the fedora-live-base, and yes; with a .ks file that includes fedora-live-base.ks and runs a lot of commands in %post, I have a 30-50% failure rate when nis is enabled and nscd is not even installed (see comment #6 above), what happens is libnss_nis gets opened from the created image instead of the host root filesystem. Preloading all libnss modules (see attachment) has always worked so far, and the following text from dlopen manpage indicates that it has a good chance to continue to work: If the same library is loaded again with dlopen(), the same file handle is returned. The dl library maintains reference counts for library handles, so a dynamic library is not deallocated until dlclose() has been called on it as many times as dlopen() has succeeded on it. The _init() routine, if present, is only called once. But a subsequent call with RTLD_NOW may force symbol resolution for a library earlier loaded with RTLD_LAZY. /Anders
The point is that we need to find out what your problem is, how to reproduce it, and if it is the same as reported here. So you know for sure that it is caused by NIS? The problem goes away if you disable NIS? In that case I think it is a different problem, but the _real_ solution would be the same as here: Don't do chroot in long-lived processes. livecd-creator should be rewritten to use separate processes for chroot operations.
Created attachment 479046 [details] Small livecd spec that trigger the problem Problem occurs after openssh-server is installed, guess that it is the 'getent' call in the preinstall script that triggers the loading libnss_nis. Original reporter had problem with libnss_files, so I would say that there is a fair chance that it is the same problem (which is why my fix preloads all libnss_* files, not only the one I have problems with). It is not a problem with nis per se, but with libnss that picks up the libnss_nis library from the wrong place. Installing: cronie ##################### [182/194] livecd-cr 21831 root mem REG 8,5 54380 1185085 /lib/libnss_files-2.13.so livecd-cr 21831 root mem REG 8,5 33992 1177996 /lib/libnss_dns-2.13.so livecd-cr 21831 root mem REG 8,5 1275848 827011 /usr/lib/libnss3.so livecd-cr 21831 root mem REG 8,5 107528 807053 /usr/lib/libnssutil3.so Installing: openssh ##################### [183/194] livecd-cr 21831 root mem REG 8,5 54380 1185085 /lib/libnss_files-2.13.so livecd-cr 21831 root mem REG 8,5 33992 1177996 /lib/libnss_dns-2.13.so livecd-cr 21831 root mem REG 8,5 1275848 827011 /usr/lib/libnss3.so livecd-cr 21831 root mem REG 8,5 107528 807053 /usr/lib/libnssutil3.so Installing: openssh-server ##################### [184/194] livecd-cr 21831 root mem REG 8,5 54380 1185085 /lib/libnss_files-2.13.so livecd-cr 21831 root mem REG 8,5 33992 1177996 /lib/libnss_dns-2.13.so livecd-cr 21831 root mem REG 7,4 49660 4232 /var/tmp/imgcreate-njKDq5/install_root/lib/libnss_nis-2.13.so livecd-cr 21831 root mem REG 8,5 1275848 827011 /usr/lib/libnss3.so livecd-cr 21831 root mem REG 8,5 107528 807053 /usr/lib/libnssutil3.so Installing: sendmail ##################### [185/194] livecd-cr 21831 root mem REG 8,5 54380 1185085 /lib/libnss_files-2.13.so livecd-cr 21831 root mem REG 8,5 33992 1177996 /lib/libnss_dns-2.13.so livecd-cr 21831 root mem REG 7,4 49660 4232 /var/tmp/imgcreate-njKDq5/install_root/lib/libnss_nis-2.13.so livecd-cr 21831 root mem REG 8,5 1275848 827011 /usr/lib/libnss3.so livecd-cr 21831 root mem REG 8,5 107528 807053 /usr/lib/libnssutil3.so
(In reply to comment #10) > The point is that we need to find out what your problem is, how to reproduce > it, and if it is the same as reported here. BTW, if you still has the problem when nscd is running, could you try my fix?