+++ This bug was initially created as a clone of Bug #509427 +++

[This was originally filed against the kernel, but it is a livecd-creator bug]

I am trying to create rawhide live cds, without much luck. After some patching, livecd-creator installs the packages, but when it tries to unmount its install_root loop mount, umount fails, claiming it is busy, and things go south from there.

Surprisingly, the loop mount stays around, and umount keeps claiming it is busy, even after I did all of the following:

- switch to runlevel 1
- kill all userspace except for my shell
- verify that no kernel thread has any open fds

Finally, when I cd into the install root and try to ls something, I am greeted with EXT3 error messages about directory indexes being out of bounds.

This has happened with both ext3 and ext4, current rawhide.

--- Additional comment from mclasen on 2009-07-06 10:32:39 EDT ---

Hmm, for whatever reason, in today's rawhide I don't have the unmount problem. Instead, I have resize2fs spinning in a loop again :-(

--- Additional comment from fedora-triage-list on 2009-11-16 05:36:57 EDT ---

This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle. Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

--- Additional comment from j.heather.uk on 2009-11-28 07:07:39 EDT ---

I am hitting this too. I have a kickstart file that I build as 32-bit and 64-bit isos; I have noticed that the 32-bit image fails more often than not, but I think the 64-bit image always succeeds.

The kickstart file is available if you need it.
But in any case, here is the tail end of the output:

  Installing: xorg-x11-drv-savage ################### [1603/1604]
  Installing: xorg-x11-drivers ################### [1604/1604]
cp: cannot stat `/var/tmp/imgcreate-FZOXxE/install_root/usr/share/doc/HTML/readme-live-image/en_US/readme-live-image-en_US.txt': No such file or directory
umount: /var/tmp/imgcreate-FZOXxE/install_root: device is busy.
        (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1))
/usr/lib/python2.6/site-packages/imgcreate/errors.py:45: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  return unicode(self.message)
Error creating Live CD : Unable to unmount filesystem at /var/tmp/imgcreate-FZOXxE/install_root
umount: /var/tmp/imgcreate-FZOXxE/install_root: device is busy.
        (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1))
/usr/lib/python2.6/site-packages/imgcreate/errors.py:40: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  return str(self.message)
Traceback (most recent call last):
  File "/usr/bin/livecd-creator", line 140, in <module>
    sys.exit(main())
  File "/usr/bin/livecd-creator", line 135, in main
    creator.cleanup()
  File "/usr/lib/python2.6/site-packages/imgcreate/creator.py", line 578, in cleanup
    self.unmount()
  File "/usr/lib/python2.6/site-packages/imgcreate/creator.py", line 556, in unmount
    self._unmount_instroot()
  File "/usr/lib/python2.6/site-packages/imgcreate/live.py", line 191, in _unmount_instroot
    LoopImageCreator._unmount_instroot(self)
  File "/usr/lib/python2.6/site-packages/imgcreate/creator.py", line 943, in _unmount_instroot
    self.__instloop.cleanup()
  File "/usr/lib/python2.6/site-packages/imgcreate/fs.py", line 346, in cleanup
    Mount.cleanup(self)
  File "/usr/lib/python2.6/site-packages/imgcreate/fs.py", line 325, in cleanup
    self.unmount()
  File "/usr/lib/python2.6/site-packages/imgcreate/fs.py", line 356, in unmount
    raise MountError("Unable to unmount filesystem at %s" % self.mountdir)
imgcreate.errors.MountError: Unable to unmount filesystem at /var/tmp/imgcreate-FZOXxE/install_root
umount: /var/tmp/imgcreate-FZOXxE/install_root: device is busy.
        (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1))
Exception imgcreate.errors.MountError: MountError('Unable to unmount filesystem at /var/tmp/imgcreate-FZOXxE/install_root',) in <bound method x86LiveImageCreator.__del__ of <imgcreate.live.x86LiveImageCreator object at 0x2251390>> ignored

--- Additional comment from j.heather.uk on 2009-11-28 07:45:42 EDT ---

I have a not very happy workaround. I don't know if it works consistently, but it's just worked this one time. At the end of my kickstart file I now have:

%post
#hack to try to stop umount probs
sync
sleep 30s
%end

It may be that either the sync or the sleep would be sufficient. Also, in retrospect it would have been better to --nochroot it so that these commands aren't being executed inside the filesystem we want to be able to umount straight afterwards.

--- Additional comment from j.heather.uk on 2009-11-28 11:28:57 EDT ---

OK, I think I have worked out what is going on.

Some background processes are being started, possibly by the rpm installation, that are running within the chrooted environment. In my case, it seems to have been akmods. Because these are running in the background, there is no guarantee that they will have finished by the time we want to umount; i.e., we have a race condition.

The 30 secs of sleep in my workaround above usually sorts this out, but it's obviously not the ideal way.
A better workaround is to put the following in a separate kickstart file, and include it right at the end of the kickstart file you're building:

[james@melissa f12]$ cat sleepy-hack.ks
%post --nochroot
#hack to try to stop umount probs
while (/usr/sbin/lsof /dev/loop* | grep -v "$0" | grep "$INSTALL_ROOT")
do
    sleep 5s
done
%end
[james@melissa f12]$

It checks every five seconds whether any relevant files are still open, and doesn't exit until the only thing left is the shell script that runs the above code.

The proper solution is for livecd-creator to do this checking for itself before doing the umount. It doesn't even need to check for open files like above; it just needs to check whether the umount worked, and pause and go back for another try if not.
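
For illustration only, the polling idea from the kickstart hack can also be sketched in Python without lsof. This is not code from livecd-creator: the function name pids_chrooted_under and its proc_dir parameter are made up for the example. It only inspects each process's root directory through /proc/<pid>/root, so it would notice processes still chrooted into the install root, but not every open file that lsof would report.

```python
import os


def pids_chrooted_under(install_root, proc_dir="/proc"):
    """Return the PIDs whose root directory is install_root or below it.

    Hypothetical helper for illustration; proc_dir is a parameter only so
    the function can be exercised against a fake /proc tree.
    """
    root = os.path.realpath(install_root)
    found = []
    for entry in os.listdir(proc_dir):
        # Only numeric entries in /proc correspond to processes.
        if not entry.isdigit():
            continue
        try:
            target = os.readlink(os.path.join(proc_dir, entry, "root"))
        except OSError:
            # The process exited, or we may not inspect it.
            continue
        if target == root or target.startswith(root + os.sep):
            found.append(int(entry))
    return sorted(found)
```

A caller could loop on this with a short sleep until the list comes back empty, much as the kickstart hack loops on lsof. It would need to run as root to see the root directory of other users' processes.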
Is there any movement on this? This seems like quite an important bug (and probably a five-minute fix).

James
This is from /usr/lib/python2.6/site-packages/imgcreate/creator.py:

    def unmount(self):
        """Unmounts the target filesystem.

        The ImageCreator class detaches the system from the install root, but
        other subclasses may also detach the loopback mounted filesystem image
        from the install root.

        """
        try:
            os.unlink(self._instroot + "/etc/mtab")
        except OSError:
            pass

        self.__destroy_selinuxfs()

        self._undo_bindmounts()

        self._unmount_instroot()

It seems that only the first step (removing /etc/mtab) is wrapped in a try block that ignores errors. The final unmount is performed without any error checking, which is where the fatal exception happens.

All that is needed is for any unmount that fails to be retried after a short sleep.

I'd do this myself, but I don't speak Python. I hope I have given enough information for this to be a very simple fix, though.

James
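
A minimal sketch of the retry suggested here, assuming the unmount is done by running the umount command and checking its exit status. The name unmount_with_retry, the attempt count and the delay are all hypothetical; none of this is taken from imgcreate, and the run and sleep parameters exist only so the loop can be tried out without touching a real mount.

```python
import subprocess
import time


def unmount_with_retry(mountdir, attempts=5, delay=2.0,
                       run=subprocess.call, sleep=time.sleep):
    """Try to unmount mountdir, pausing between failed attempts.

    Returns True once umount exits with status 0, or False if every
    attempt failed. Illustrative only; not livecd-creator code.
    """
    for attempt in range(attempts):
        if run(["umount", mountdir]) == 0:
            return True
        # Give background processes time to finish before trying again,
        # but don't sleep after the final failed attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A caller along the lines of the quoted unmount() method would raise its MountError only when this returns False, rather than on the first busy umount.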
> Some background processes are being started, possibly by the rpm installation,
> that are running within the chrooted environment.

That's a broken RPM then, %post must not start anything.

Another thing to check is to make sure nscd is not running, see bugs 501334 and 481796.
(In reply to comment #3)
> > Some background processes are being started, possibly by the rpm installation,
> > that are running within the chrooted environment.
>
> That's a broken RPM then, %post must not start anything.

Thanks.

As far as I can see, something like akmods (from rpmfusion), and presumably dkms, has little choice in the matter. When a new kernel gets installed, they need to trigger automatic building and installing of kernel module rpms. This can't be done synchronously within the %post, because the rpm database is locked to allow the kernel to be installed. So the only option is to kick off a background process that builds the new rpms and then installs them when the database becomes free again. This is very useful.

James
This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.
This has been mostly resolved. The umounts were being blocked because they weren't being done in the correct order. To fix this we use lazy umounts if needed and log the issue (well, the logging will come in the next F14 release; 033 just does lazy umounts). There is still an issue with Ctrl-C leaving stuff mounted, but that's a bit outside of this report.

Currently in F12, 033 is still in testing, waiting for karma to move to stable.
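
As a rough sketch of the "lazy umount if needed, and log it" behaviour described above: try a normal umount first, and fall back to umount -l (lazy unmount) with a warning if that fails. This is not the actual change shipped in 033; the function name unmount_or_lazy, the logger name and the return values are invented for illustration.

```python
import logging
import subprocess

log = logging.getLogger("umount-sketch")  # hypothetical logger name


def unmount_or_lazy(mountdir, run=subprocess.call):
    """Unmount mountdir, falling back to a lazy unmount if it is busy.

    Returns "clean" or "lazy" depending on which attempt succeeded, and
    raises RuntimeError if both fail. Illustrative only.
    """
    if run(["umount", mountdir]) == 0:
        return "clean"
    # A lazy unmount detaches the mount point now and finishes the
    # cleanup once the filesystem is no longer busy.
    log.warning("umount of %s failed; falling back to lazy umount", mountdir)
    if run(["umount", "-l", mountdir]) == 0:
        return "lazy"
    raise RuntimeError("Unable to unmount filesystem at %s" % mountdir)
```

The warning matters because a lazy unmount hides whatever was keeping the mount busy, so the log is the only trace that something was still running in the install root.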