Bug 501334 - livecd-creator unmounts busy device and many errors follows
Summary: livecd-creator unmounts busy device and many errors follows
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: livecd-tools
Version: rawhide
Hardware: All
OS: Linux
low
medium
Target Milestone: ---
Assignee: Jeremy Katz
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2009-05-18 15:47 UTC by Mads Kiilerich
Modified: 2011-02-16 08:42 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-05-19 18:12:12 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
Patch to circumvent libnss_ problem (896 bytes, patch)
2010-10-13 11:17 UTC, Anders Blomdell
no flags Details | Diff
Small livecd spec that trigger the problem (10.85 KB, text/plain)
2011-02-16 08:37 UTC, Anders Blomdell
no flags Details

Description Mads Kiilerich 2009-05-18 15:47:23 UTC
Description of problem:

On my rawhide system I saw problems which must be reproducible bugs. The first issue might be timing/race-related, but the following errors shows some error cases which isn't handled as good as they could/should.

[root@localhost ~]# livecd-creator -c /usr/share/spin-kickstarts/fedora-livecd-desktop-en_US.ks

...

  Installing: pulseaudio-module-bluetooth  ##################### [962/962] 

Removing password for user root.
passwd: Success
umount: /var/tmp/imgcreate-eNHsQm/install_root: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
ioctl: LOOP_CLR_FD: Device or resource busy
umount: /var/tmp/imgcreate-eNHsQm/install_root: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
e2fsck 1.41.4 (27-Jan-2009)
fedora-livecd--i: recovering journal
Clearing orphaned inode 131081 (uid=0, gid=0, mode=0100644, size=1318912)
Clearing orphaned inode 131082 (uid=0, gid=0, mode=0100644, size=491520)
Clearing orphaned inode 131080 (uid=0, gid=0, mode=0100644, size=188416)
Clearing orphaned inode 131079 (uid=0, gid=0, mode=0100644, size=24576)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Unattached inode 139338
Connect to /lost+found? yes

WARNING: PROGRAMMING BUG IN E2FSCK!
	OR SOME BONEHEAD (YOU) IS CHECKING A MOUNTED (LIVE) FILESYSTEM.
inode_link_info[139338] is 2, inode.i_links_count is 1.  They should be the same!
Inode 139338 ref count is 1, should be 1.  Fix? yes

Unattached inode 172035
Connect to /lost+found? yes

WARNING: PROGRAMMING BUG IN E2FSCK!
	OR SOME BONEHEAD (YOU) IS CHECKING A MOUNTED (LIVE) FILESYSTEM.
inode_link_info[172035] is 2, inode.i_links_count is 1.  They should be the same!
Inode 172035 ref count is 1, should be 1.  Fix? yes

Unattached inode 172036
Connect to /lost+found? yes

WARNING: PROGRAMMING BUG IN E2FSCK!
	OR SOME BONEHEAD (YOU) IS CHECKING A MOUNTED (LIVE) FILESYSTEM.
inode_link_info[172036] is 2, inode.i_links_count is 1.  They should be the same!
Inode 172036 ref count is 1, should be 1.  Fix? yes

Pass 5: Checking group summary information

fedora-livecd--i: ***** FILE SYSTEM WAS MODIFIED *****
fedora-livecd--i: 87898/262144 files (0.4% non-contiguous), 595266/1048576 blocks
e2fsck 1.41.4 (27-Jan-2009)
fedora-livecd--i: recovering journal
Pass 1: Checking inodes, blocks, and sizes
Inode 49129 is in use, but has dtime set.  Fix? yes

Inode 49130 is in use, but has dtime set.  Fix? yes

Inode 134176 has illegal block(s).  Clear? yes

Illegal block #12 (2766177956) in inode 134176.  CLEARED.
Illegal block #13 (2376458416) in inode 134176.  CLEARED.
Illegal block #14 (3768689888) in inode 134176.  CLEARED.
...
fedora-livecd--i: ***** FILE SYSTEM WAS MODIFIED *****
fedora-livecd--i: 87895/147456 files (0.5% non-contiguous), 586314/587556 blocks
/usr/lib/python2.6/site-packages/imgcreate/errors.py:45: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  return unicode(self.message)
Error creating Live CD : fsck returned an error!
umount: /var/tmp/imgcreate-eNHsQm/install_root: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
Traceback (most recent call last):
  File "/usr/bin/livecd-creator", line 140, in <module>
    sys.exit(main())
  File "/usr/bin/livecd-creator", line 135, in main
    creator.cleanup()
  File "/usr/lib/python2.6/site-packages/imgcreate/creator.py", line 575, in cleanup
    shutil.rmtree(self.__builddir, ignore_errors = True)
  File "/usr/lib/python2.6/shutil.py", line 216, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.6/shutil.py", line 216, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.6/shutil.py", line 216, in rmtree
    rmtree(fullname, ignore_errors, onerror)
...
  File "/usr/lib/python2.6/shutil.py", line 216, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "/usr/lib/python2.6/shutil.py", line 197, in rmtree
    if os.path.islink(path):
  File "/usr/lib/python2.6/posixpath.py", line 135, in islink
    return stat.S_ISLNK(st.st_mode)
  File "/usr/lib/python2.6/stat.py", line 56, in S_ISLNK
    return S_IFMT(mode) == S_IFLNK
RuntimeError: maximum recursion depth exceeded
umount: /var/tmp/imgcreate-eNHsQm/install_root: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
Exception RuntimeError: RuntimeError('maximum recursion depth exceeded',) in <bound method x86LiveImageCreator.__del__ of <imgcreate.live.x86LiveImageCreator object at 0x875906c>> ignored
umount: /var/tmp/imgcreate-eNHsQm/install_root: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
Exception RuntimeError: RuntimeError('maximum recursion depth exceeded',) in <bound method x86LiveImageCreator.__del__ of <imgcreate.live.x86LiveImageCreator object at 0x875906c>> ignored

... ad infitum


Version-Release number of selected component (if applicable):

livecd-tools-024-1.fc11.i586
kernel-PAE-2.6.29.3-140.fc11.i686
system is uptodate with rawhide, updates and updates-testing and some packages from koji

Comment 1 Jeremy Katz 2009-05-18 20:26:35 UTC
What version of spin-kickstarts is that with?  I'm doing builds regularly without problems.  Usually this creeps in due to a buggy package which starts something in its %post that it shouldn't (or something in the %post of the kickstart doing so)

Comment 2 Mads Kiilerich 2009-05-19 13:13:56 UTC
spin-kickstarts-0.11.3-1.fc11.noarch

The unchecked (suspicious-looking) umount in /usr/lib/python2.6/site-packages/imgcreate/fs.py succeeds fine, but the checked umount of /var/tmp/imgcreate-Lb1BpV/install_root fails because of 

COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
livecd-cr 11520 root  mem    REG    7,3    55540 4787 /var/tmp/imgcreate-Lb1BpV/install_root/lib/libnss_files-2.10.1.so
livecd-cr 11520 root   12r   REG    7,3 28205056  176 /var/tmp/imgcreate-Lb1BpV/install_root/var/lib/rpm/Packages
livecd-cr 11520 root   25r   REG    7,3    45056  177 /var/tmp/imgcreate-Lb1BpV/install_root/var/lib/rpm/Name

Comment 3 Jeremy Katz 2009-05-19 18:12:12 UTC
Aha, you're running with nscd enabled.  If you disable it, then that will help.  There's a glibc (nscd) bug filed about it, but I'm not finding it at the moment.  But there's not anything that we can do about it in livecd-creator :(

Comment 4 Mads Kiilerich 2009-12-04 23:35:32 UTC
Bug 481796 - nscd interaction with chroot is poor

A warning from livecd-creator could be nice ...

Comment 5 Anders Blomdell 2010-10-13 11:17:52 UTC
Created attachment 453192 [details]
Patch to circumvent libnss_ problem

(In reply to comment #3)
> Aha, you're running with nscd enabled.  If you disable it, then that will help.
>  There's a glibc (nscd) bug filed about it, but I'm not finding it at the
> moment.  But there's not anything that we can do about it in livecd-creator :(

Just got bitten by the same bug when rebuilding an old FC11 live-cd, attached is my very hackish fix to livecd-creator.

Comment 6 Anders Blomdell 2011-02-15 10:08:32 UTC
Bug still exists in livecd-tools-14.0-1.fc14.i686 (Fedora-14):

# lsof | grep nss | grep live
livecd-cr  7862      root  mem       REG        8,5    54380    1185085 /lib/libnss_files-2.13.so
livecd-cr  7862      root  mem       REG        7,5    49660       5663 /var/tmp/imgcreate-v7Py2N/install_root/lib/libnss_nis-2.13.so
livecd-cr  7862      root  mem       REG        8,5  1275848     827011 /usr/lib/libnss3.so
livecd-cr  7862      root  mem       REG        8,5   107528     807053 /usr/lib/libnssutil3.so

Fix from comment 5 still works OK, but hunk 2 has a 4 line offset.

Comment 7 Anders Blomdell 2011-02-15 16:22:29 UTC
Note: nscd is not running, this is a pure nss problem...

Comment 8 Mads Kiilerich 2011-02-15 16:50:08 UTC
I have never seen any problems when nscd was disabled.

Can you reproduce the problem with nscd stopped and a plain /usr/share/spin-kickstarts/fedora-live-base.ks ?

Comment 9 Anders Blomdell 2011-02-15 17:09:26 UTC
I don't know, but I feel that it's beside the point: the reason I want to build LiveCD's, is that I want to add things to the fedora-live-base, and yes; with a .ks file that includes fedora-live-base.ks and runs a lot of commands in %post, I have a 30-50% failure rate when nis is enabled and nscd is not even installed (see comment #6 above), what happens is libnss_nis gets opened from the created image instead of the host root filesystem. 

Preloading all libnss modules (see attachment) has always worked so far, and the following text from dlopen manpage indicates that it has a good chance to continue to work:

       If the same library is loaded again with dlopen(), the same file handle
       is returned.  The dl library maintains  reference  counts  for  library
       handles,  so  a  dynamic library is not deallocated until dlclose() has
       been called on it as many times as dlopen() has succeeded on  it.   The
       _init()  routine,  if  present,  is only called once.  But a subsequent
       call with RTLD_NOW may force symbol resolution for  a  library  earlier
       loaded with RTLD_LAZY.

/Anders

Comment 10 Mads Kiilerich 2011-02-15 17:52:58 UTC
The point is that we need to find out what your problem is, how to reproduce it, and if it is the same as reported here.

So you know for sure that it is caused by NIS? The problem goes away if you disable NIS?

In that case I think it is a different problem, but the _real_ solution would be the same as here: Don't do chroot in long-lived processes. livecd-creator should be rewritten to use separate processes for chroot operations.

Comment 11 Anders Blomdell 2011-02-16 08:37:40 UTC
Created attachment 479046 [details]
Small livecd spec that trigger the problem

Problem occurs after openssh-server is installed, guess that it is the 'getent' call in the preinstall script that triggers the loading libnss_nis. Original reporter had problem with libnss_files, so I would say that there is a fair chance that it is the same problem (which is why my fix preloads all libnss_* files, not only the one I have problems with). It is not a problem with nis per se, but with libnss that picks up the libnss_nis library from the wrong place.

  Installing: cronie                       ##################### [182/194]
livecd-cr 21831 root mem REG 8,5    54380    1185085 /lib/libnss_files-2.13.so
livecd-cr 21831 root mem REG 8,5    33992    1177996 /lib/libnss_dns-2.13.so
livecd-cr 21831 root mem REG 8,5  1275848     827011 /usr/lib/libnss3.so
livecd-cr 21831 root mem REG 8,5   107528     807053 /usr/lib/libnssutil3.so
  Installing: openssh                      ##################### [183/194]
livecd-cr 21831 root mem REG 8,5    54380    1185085 /lib/libnss_files-2.13.so
livecd-cr 21831 root mem REG 8,5    33992    1177996 /lib/libnss_dns-2.13.so
livecd-cr 21831 root mem REG 8,5  1275848     827011 /usr/lib/libnss3.so
livecd-cr 21831 root mem REG 8,5   107528     807053 /usr/lib/libnssutil3.so
  Installing: openssh-server               ##################### [184/194]
livecd-cr 21831 root mem REG 8,5    54380    1185085 /lib/libnss_files-2.13.so
livecd-cr 21831 root mem REG 8,5    33992    1177996 /lib/libnss_dns-2.13.so
livecd-cr 21831 root mem REG 7,4    49660       4232 /var/tmp/imgcreate-njKDq5/install_root/lib/libnss_nis-2.13.so
livecd-cr 21831 root mem REG 8,5  1275848     827011 /usr/lib/libnss3.so
livecd-cr 21831 root mem REG 8,5   107528     807053 /usr/lib/libnssutil3.so
  Installing: sendmail                     ##################### [185/194]
livecd-cr 21831 root mem REG 8,5    54380    1185085 /lib/libnss_files-2.13.so
livecd-cr 21831 root mem REG 8,5    33992    1177996 /lib/libnss_dns-2.13.so
livecd-cr 21831 root mem REG 7,4    49660       4232 /var/tmp/imgcreate-njKDq5/install_root/lib/libnss_nis-2.13.so
livecd-cr 21831 root mem REG 8,5  1275848     827011 /usr/lib/libnss3.so
livecd-cr 21831 root mem REG 8,5   107528     807053 /usr/lib/libnssutil3.so

Comment 12 Anders Blomdell 2011-02-16 08:42:08 UTC
(In reply to comment #10)
> The point is that we need to find out what your problem is, how to reproduce
> it, and if it is the same as reported here.
BTW, if you still has the problem when nscd is running, could you try my fix?


Note You need to log in before you can comment on or make changes to this bug.