Bug 542167 - livecd-creator fails to unmount
Summary: livecd-creator fails to unmount
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: livecd-tools
Version: 12
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: David Huff
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-11-28 16:33 UTC by James Heather
Modified: 2010-09-11 17:35 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 509427
Environment:
Last Closed: 2010-09-11 17:35:14 UTC
Type: ---
Embargoed:



Description James Heather 2009-11-28 16:33:21 UTC
+++ This bug was initially created as a clone of Bug #509427 +++

[This was originally filed against the kernel, but it is a livecd-creator bug]

I am trying to create rawhide live cds, without much luck.

After some patching, livecd-creator installs the packages, but when it tries to unmount its install_root loop mount, umount fails, claiming it is busy, and things go south from there.

Surprisingly, the loop mount stays around, and umount keeps claiming it is busy, even after I did all of the following:

- switch to runlevel 1
- kill all userspace except for my shell
- verify that no kernel thread has any open fds

Finally, when I cd into the install root and try to ls something, I am greeted with EXT3 error messages about directory indexes being out of bounds.

This has happened with both ext3 and ext4 on current rawhide.

--- Additional comment from mclasen on 2009-07-06 10:32:39 EDT ---

Hmm, for whatever reason, in today's rawhide I don't have the unmount problem.

Instead, I have resize2fs spinning in a loop again :-(

--- Additional comment from fedora-triage-list on 2009-11-16 05:36:57 EDT ---


This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

--- Additional comment from j.heather.uk on 2009-11-28 07:07:39 EDT ---

I am hitting this too. I have a kickstart file that I build as 32-bit and 64-bit isos; I have noticed that the 32-bit image fails more often than not, but I think the 64-bit image always succeeds.

The kickstart file is available if you need it. But in any case, here is the tail end of the output:

  Installing: xorg-x11-drv-savage          ################### [1603/1604] 
  Installing: xorg-x11-drivers             ################### [1604/1604] 

cp: cannot stat `/var/tmp/imgcreate-FZOXxE/install_root/usr/share/doc/HTML/readme-live-image/en_US/readme-live-image-en_US.txt': No such file or directory
umount: /var/tmp/imgcreate-FZOXxE/install_root: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
/usr/lib/python2.6/site-packages/imgcreate/errors.py:45: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  return unicode(self.message)
Error creating Live CD : Unable to unmount filesystem at /var/tmp/imgcreate-FZOXxE/install_root
umount: /var/tmp/imgcreate-FZOXxE/install_root: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
/usr/lib/python2.6/site-packages/imgcreate/errors.py:40: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  return str(self.message)
Traceback (most recent call last):
  File "/usr/bin/livecd-creator", line 140, in <module>
    sys.exit(main())
  File "/usr/bin/livecd-creator", line 135, in main
    creator.cleanup()
  File "/usr/lib/python2.6/site-packages/imgcreate/creator.py", line 578, in cleanup
    self.unmount()
  File "/usr/lib/python2.6/site-packages/imgcreate/creator.py", line 556, in unmount
    self._unmount_instroot()
  File "/usr/lib/python2.6/site-packages/imgcreate/live.py", line 191, in _unmount_instroot
    LoopImageCreator._unmount_instroot(self)
  File "/usr/lib/python2.6/site-packages/imgcreate/creator.py", line 943, in _unmount_instroot
    self.__instloop.cleanup()
  File "/usr/lib/python2.6/site-packages/imgcreate/fs.py", line 346, in cleanup
    Mount.cleanup(self)
  File "/usr/lib/python2.6/site-packages/imgcreate/fs.py", line 325, in cleanup
    self.unmount()
  File "/usr/lib/python2.6/site-packages/imgcreate/fs.py", line 356, in unmount
    raise MountError("Unable to unmount filesystem at %s" % self.mountdir)
imgcreate.errors.MountError: Unable to unmount filesystem at /var/tmp/imgcreate-FZOXxE/install_root
umount: /var/tmp/imgcreate-FZOXxE/install_root: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
Exception imgcreate.errors.MountError: MountError('Unable to unmount filesystem at /var/tmp/imgcreate-FZOXxE/install_root',) in <bound method x86LiveImageCreator.__del__ of <imgcreate.live.x86LiveImageCreator object at 0x2251390>> ignored

--- Additional comment from j.heather.uk on 2009-11-28 07:45:42 EDT ---

I have a workaround, though not a very happy one. I don't know whether it works consistently, but it has just worked this one time.

At the end of my kickstart file I now have:

%post
#hack to try to stop umount probs
sync
sleep 30s
%end

It may be that either the sync or the sleep would be sufficient.

Also, in retrospect it would have been better to use %post --nochroot, so that these commands are not executed inside the filesystem we want to unmount straight afterwards.

--- Additional comment from j.heather.uk on 2009-11-28 11:28:57 EDT ---

OK, I think I have worked out what is going on.

Some background processes are being started, possibly by the rpm installation, that are running within the chrooted environment. In my case, it seems to have been akmods. Because these are running in the background, there is no guarantee that they will have finished by the time we want to umount; i.e., we have a race condition. The 30 secs of sleep in my workaround above usually sorts this out, but it's obviously not the ideal way.

A better workaround is to put the following in a separate kickstart file, and include it right at the end of the kickstart file you're building:

[james@melissa f12]$ cat sleepy-hack.ks 
%post --nochroot
#hack to try to stop umount probs
while (/usr/sbin/lsof /dev/loop* | grep -v "$0" | grep "$INSTALL_ROOT")
do
	sleep 5s
done
%end
[james@melissa f12]$ 

It checks every five seconds whether any relevant files are still open, and doesn't exit until the only thing left is the shell script that runs the above code.
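The `lsof` polling loop above could also be written in Python by walking `/proc` directly, which is roughly what `lsof` and `fuser` do on Linux. The sketch below is hypothetical: `processes_using` and `wait_until_unused` are illustrative names, not part of livecd-tools, and it assumes a Linux `/proc` and Python 3. It only inspects each process's working directory, root directory and open file descriptors, so it may miss memory-mapped files that `lsof` would list. Checking `/proc/<pid>/root` matters here because a daemon such as akmods left running inside the chroot has its root directory inside the install root.

```python
import os
import time


def processes_using(root):
    """Return the PIDs whose cwd, root directory, or any open fd lies under `root`.

    Rough, Linux-only approximation of what `lsof` reports for a mount;
    memory-mapped files (/proc/<pid>/maps) are not checked.
    """
    root = os.path.realpath(root)
    prefix = root.rstrip("/") + "/"

    def is_under(target):
        return target == root or target.startswith(prefix)

    pids = set()
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        base = os.path.join("/proc", entry)
        # cwd and root catch chrooted processes even when no files are open.
        links = [os.path.join(base, "cwd"), os.path.join(base, "root")]
        fd_dir = os.path.join(base, "fd")
        try:
            links += [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            pass  # process exited, or we lack permission to inspect it
        for link in links:
            try:
                if is_under(os.readlink(link)):
                    pids.add(int(entry))
                    break
            except OSError:
                continue  # fd closed or process gone since we listed it
    return pids


def wait_until_unused(root, interval=5.0, timeout=300.0):
    """Poll every `interval` seconds until no other process uses `root`.

    Returns True once the tree is free, or False if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        busy = processes_using(root) - {os.getpid()}  # ignore ourselves
        if not busy:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Unlike the kickstart loop, this version gives up after `timeout` seconds, so a process that never exits cannot hang the build indefinitely.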

The proper solution is for livecd-creator to do this checking for itself before doing the umount. It doesn't even need to check for open files like above; it just needs to check whether the umount worked, and pause and go back for another try if not.

Comment 1 James Heather 2010-02-12 09:17:32 UTC
Is there any movement on this? This seems like quite an important bug (and probably a five-minute fix).

James

Comment 2 James Heather 2010-02-12 12:18:31 UTC
This is from /usr/lib/python2.6/site-packages/imgcreate/creator.py:

    def unmount(self):
        """Unmounts the target filesystem.

        The ImageCreator class detaches the system from the install root, but
        other subclasses may also detach the loopback mounted filesystem image
        from the install root.

        """
        try:
            os.unlink(self._instroot + "/etc/mtab")
        except OSError:
            pass

        self.__destroy_selinuxfs()

        self._undo_bindmounts()

        self._unmount_instroot()

It seems that the first operation (unlinking /etc/mtab) is attempted and any OSError it raises is ignored. The final unmount, however, is called without any error handling, which is where the fatal exception happens.

All that is needed is for any unmount that fails to be retried after a short sleep.

I'd do this myself, but I don't speak Python. I hope I have given enough information for this to be a very simple fix, though.

James
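The retry suggested in this comment can be sketched as follows, assuming the external `umount` command is run via `subprocess` (the real helper in fs.py may invoke it differently). `unmount_with_retry` and its parameters are hypothetical; if it returns False, the existing `raise MountError(...)` in fs.py would still apply. The `run` argument exists only so the retry logic can be exercised without root privileges.

```python
import subprocess
import time


def unmount_with_retry(mountdir, attempts=5, delay=2.0, run=subprocess.call):
    """Run `umount mountdir`, retrying up to `attempts` times, `delay` s apart.

    Returns True as soon as umount exits with status 0, or False if every try fails.
    `run` takes an argv list and returns an exit status (subprocess.call by default).
    """
    for attempt in range(1, attempts + 1):
        if run(["umount", mountdir]) == 0:
            return True
        if attempt < attempts:
            time.sleep(delay)  # give background processes (e.g. akmods) time to exit
    return False
```

With the defaults this waits about eight seconds in total (four sleeps of two seconds) before giving up. Those values are guesses; the 30-second sleep in the earlier workaround suggests a longer total wait may sometimes be needed.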

Comment 3 Alan Pevec 2010-04-23 10:55:07 UTC
> Some background processes are being started, possibly by the rpm installation,
> that are running within the chrooted environment.

That's a broken RPM then, %post must not start anything.

Another thing to check is to make sure nscd is not running; see bugs 501334 and 481796.

Comment 4 James Heather 2010-04-23 11:20:18 UTC
(In reply to comment #3)
> > Some background processes are being started, possibly by the rpm installation,
> > that are running within the chrooted environment.
> 
> That's a broken RPM then, %post must not start anything.

Thanks.

As far as I can see, something like akmods (from rpmfusion), and presumably dkms, has little choice in the matter. When a new kernel gets installed, they need to trigger automatic building and installing of kernel module rpms. This can't be done synchronously within the %post, because the rpm database is locked to allow the kernel to be installed.

So the only option is to kick off a background process that builds the new rpms and then installs them when the database becomes free again.

This is very useful.

James

Comment 5 Fedora Admin XMLRPC Client 2010-05-07 15:41:31 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 6 Bruno Wolff III 2010-09-11 17:35:14 UTC
This has been mostly resolved. The umounts were being blocked because they weren't being done in the correct order. To fix this we use lazy umounts if needed and log the issue (well, we will do this in the next F14 release; 033 just does lazy umounts). There is still an issue with Ctrl-C leaving things mounted, but that's a bit outside the scope of this report. In F12, 033 is currently still in testing, waiting for karma to move to stable.
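The fallback described in this comment, a lazy unmount plus a log entry, can be sketched as below, assuming the external `umount` command, whose `-l` option detaches the filesystem immediately and lets the kernel finish cleaning it up once it is no longer busy. `unmount_or_lazy` is a hypothetical name; this is not the livecd-tools 033 code, and it does not model the unmount-ordering fix the comment also mentions.

```python
import logging
import subprocess

log = logging.getLogger("unmount-sketch")


def unmount_or_lazy(mountdir, run=subprocess.call):
    """Unmount `mountdir`, falling back to a lazy `umount -l` if it is busy.

    Returns "normal" or "lazy" to say which kind of unmount succeeded, and
    raises RuntimeError if even the lazy unmount fails.
    """
    if run(["umount", mountdir]) == 0:
        return "normal"
    # A plain umount failed (typically "device is busy"): record it so the
    # leftover process can be tracked down, then detach lazily.
    log.warning("umount of %s failed; retrying with lazy umount", mountdir)
    if run(["umount", "-l", mountdir]) == 0:
        return "lazy"
    raise RuntimeError("Unable to unmount filesystem at %s" % mountdir)
```

A lazy unmount hides the busy condition rather than curing it: whatever process was holding the filesystem keeps running, which is why logging the failure is worthwhile.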