Bug 509427
Summary: | livecd-creator fails to unmount | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Matthias Clasen <mclasen> | ||||||
Component: | livecd-tools | Assignee: | David Huff <dhuff> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||
Severity: | medium | Docs Contact: | |||||||
Priority: | low | ||||||||
Version: | 12 | CC: | apevec, bruno, chenrano2002, c.shoemaker, david.r, davidz, dcantrell, dhuff, drfudgeboy, fedora, itamar, Jasper.Hartline, julian.fedora, katzj, kernel-maint, mads, massi.ergosum, mkhan01, rpandit, sgupta, wtogami | ||||||
Target Milestone: | --- | Keywords: | Triaged | ||||||
Target Release: | --- | ||||||||
Hardware: | All | ||||||||
OS: | Linux | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | livecd-tools-034-7.fc14 | Doc Type: | Bug Fix | ||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | |||||||||
: | 542167 (view as bug list) | Environment: | |||||||
Last Closed: | 2010-09-16 03:47:38 UTC | Type: | --- | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | 617844 | ||||||||
Bug Blocks: | |||||||||
Attachments: |
|
Description
Matthias Clasen
2009-07-02 18:11:14 UTC
Hmm, for whatever reason, in todays rawhide I don't have the unmount problem. Instead, I have resize2fs spinning in a loop again :-( This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle. Changing version to '12'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping I am hitting this too. I have a kickstart file that I build as 32-bit and 64-bit isos; I have noticed that the 32-bit image fails more often than not, but I think the 64-bit image always succeeds. The kickstart file is available if you need it. But in any case, here is the tail end of the output: Installing: xorg-x11-drv-savage ################### [1603/1604] Installing: xorg-x11-drivers ################### [1604/1604] cp: cannot stat `/var/tmp/imgcreate-FZOXxE/install_root/usr/share/doc/HTML/readme-live-image/en_US/readme-live-image-en_US.txt': No such file or directory umount: /var/tmp/imgcreate-FZOXxE/install_root: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) /usr/lib/python2.6/site-packages/imgcreate/errors.py:45: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6 return unicode(self.message) Error creating Live CD : Unable to unmount filesystem at /var/tmp/imgcreate-FZOXxE/install_root umount: /var/tmp/imgcreate-FZOXxE/install_root: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) /usr/lib/python2.6/site-packages/imgcreate/errors.py:40: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6 return str(self.message) Traceback (most recent call last): File "/usr/bin/livecd-creator", line 140, in <module> sys.exit(main()) File "/usr/bin/livecd-creator", line 135, in main creator.cleanup() File "/usr/lib/python2.6/site-packages/imgcreate/creator.py", line 578, in cleanup self.unmount() File "/usr/lib/python2.6/site-packages/imgcreate/creator.py", line 556, in unmount self._unmount_instroot() File "/usr/lib/python2.6/site-packages/imgcreate/live.py", line 191, in _unmount_instroot LoopImageCreator._unmount_instroot(self) File "/usr/lib/python2.6/site-packages/imgcreate/creator.py", line 943, in _unmount_instroot self.__instloop.cleanup() File "/usr/lib/python2.6/site-packages/imgcreate/fs.py", line 346, in cleanup Mount.cleanup(self) File "/usr/lib/python2.6/site-packages/imgcreate/fs.py", line 325, in cleanup self.unmount() File "/usr/lib/python2.6/site-packages/imgcreate/fs.py", line 356, in unmount raise MountError("Unable to unmount filesystem at %s" % self.mountdir) imgcreate.errors.MountError: Unable to unmount filesystem at /var/tmp/imgcreate-FZOXxE/install_root umount: /var/tmp/imgcreate-FZOXxE/install_root: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) Exception imgcreate.errors.MountError: MountError('Unable to unmount filesystem at /var/tmp/imgcreate-FZOXxE/install_root',) in <bound method x86LiveImageCreator.__del__ of <imgcreate.live.x86LiveImageCreator object at 0x2251390>> ignored I have a not very happy workaround. I don't know if it works consistently, but it's just worked this one time. At the end of my kickstart file I now have: %post #hack to try to stop umount probs sync sleep 30s %end It may be that either the sync or the sleep would be sufficient. Also, in retrospect it would have been better to --nochroot it so that these commands aren't being executed inside the filesystem we want to be able to umount straight afterwards. OK, I think I have worked out what is going on. Some background processes are being started, possibly by the rpm installation, that are running within the chrooted environment. In my case, it seems to have been akmods. Because these are running in the background, there is no guarantee that they will have finished by the time we want to umount; i.e., we have a race condition. The 30 secs of sleep in my workaround above usually sorts this out, but it's obviously not the ideal way. A better workaround is to put the following in a separate kickstart file, and include it right at the end of the kickstart file you're building: [james@melissa f12]$ cat sleepy-hack.ks %post --nochroot #hack to try to stop umount probs while (/usr/sbin/lsof /dev/loop* | grep -v "$0" | grep "$INSTALL_ROOT") do sleep 5s done %end [james@melissa f12]$ It checks every five seconds whether any relevant files are still open, and doesn't exit until the only thing left is the shell script that runs the above code. The proper solution is for livecd-creator to do this checking for itself before doing the umount. It doesn't even need to check for open files like above; it just needs to check whether the umount worked, and pause and go back for another try if not. I have the same problem with my FC12 (32-bit, ext4 FS). It seems to be related to the yum cache. The workaround from J.Heather (while loop) works for me. Here the log: ... cp: cannot stat `/var/tmp/imgcreate-pvCyyp/install_root/usr/share/doc/HTML/readme-live-image/en_US/readme-live-image-en_US.txt': No such file or directory umount: /var/tmp/imgcreate-pvCyyp/install_root/var/cache/yum: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) /usr/lib/python2.6/site-packages/imgcreate/errors.py:45: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6 return unicode(self.message) Error creating Live CD : Unable to unmount filesystem at /var/tmp/imgcreate-pvCyyp/install_root//var/cache/yum umount: /var/tmp/imgcreate-pvCyyp/install_root/var/cache/yum: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) /usr/lib/python2.6/site-packages/imgcreate/errors.py:40: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6 return str(self.message) Traceback (most recent call last): File "/usr/bin/livecd-creator", line 140, in <module> sys.exit(main()) File "/usr/bin/livecd-creator", line 135, in main creator.cleanup() File "/usr/lib/python2.6/site-packages/imgcreate/creator.py", line 578, in cleanup self.unmount() File "/usr/lib/python2.6/site-packages/imgcreate/creator.py", line 554, in unmount self._undo_bindmounts() File "/usr/lib/python2.6/site-packages/imgcreate/creator.py", line 301, in _undo_bindmounts b.unmount() File "/usr/lib/python2.6/site-packages/imgcreate/fs.py", line 113, in unmount raise MountError("Unable to unmount filesystem at %s" % self.dest) imgcreate.errors.MountError: Unable to unmount filesystem at /var/tmp/imgcreate-pvCyyp/install_root//var/cache/yum umount: /var/tmp/imgcreate-pvCyyp/install_root/var/cache/yum: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) Exception imgcreate.errors.MountError: MountError('Unable to unmount filesystem at /var/tmp/imgcreate-pvCyyp/install_root//var/cache/yum',) in <bound method x86LiveImageCreator.__del__ of <imgcreate.live.x86LiveImageCreator object at 0x92dd2ac>> ignored Following workaround listed above by J. Heather, my build was getting stuck on libnss_files. Per this similar bug solution to turn off nscd worked for me. https://bugzilla.redhat.com/show_bug.cgi?id=501334 What's the best way to recover from this kind of error? (In reply to comment #8) > What's the best way to recover from this kind of error? The most reliable solution I've found so far is to simply reboot and try again. Ok, folks, I think I should close this as NOTABUG, and you can all go off and either file bugs against packages that are doing the wrong thing, or against livecd-creator to ... wait longer? Not sure. (or I could reassign this one to livecd-creator...) But it sounds like things do not unmount because they are kept busy, which is the proper behavior for the -kernel- in any case... Thoughts? -Eric Hehe, I failed to notice this was filed against 'kernel'. It's not a kernel bug. IMO, it's a livecd-creator bug. Awesome! Make it so. And now I only have 131 open bugs or so ;) (In reply to comment #11) > Hehe, I failed to notice this was filed against 'kernel'. It's not a kernel > bug. IMO, it's a livecd-creator bug. I agree. But I think livecd-tools should be more fitting than LiveCD (which is for bugs related to the officially composed desktop spin I guess). This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component. the failure to unmount is due to: 1) The rpm database being left open on exit from an exception. 2) /proc /sys and /dev bind mounts not being unmounted first, when an exception occurs. Exception will occur in the repo setup if it can't find a repository in the kickstart file, causing the program to exist leaving at least the bind mounts left.. and if you actually do install a package or two and decide to ctrl-c to exit, it will leave at least the RPM database open from Yum/LiveCD creator. Maybe ideally calling cleanup() with the correct iterations through the bind mounts will clear up the bind mounts being left open on an exception. Just catch the exception, go to cleanup() who has to unmount /proc /sys /dev from install_root Secondly also do ts.closeDB() when an exception is caught, since livecd-creator uses Yum functions YumBase for example and librpm, the RPM database stays open for future transactions until explicitly closed. This is covered in rpm-python bindings documents. ts = TransactionSet() ts.closeDB() -- Fedora Bugzappers volunteer triage team https://fedoraproject.org/wiki/BugZappers -- Fedora Bugzappers volunteer triage team https://fedoraproject.org/wiki/BugZappers I am facing this issue on F13 as well. I also met this issue in F12, livecd-creator failed and leaves the umount loop device. see the output: [ray@localhost spin-kickstarts]$ sudo livecd-creator -c autoqa.ks -t /tmp/ mke2fs 1.41.9 (22-Aug-2009) Filesystem label=livecd-autoqa-i6 OS type: Linux Block size=4096 (log=2) Fragment size=4096 (log=2) 196608 inodes, 786432 blocks 7864 blocks (1.00%) reserved for the super user First data block=0 Maximum filesystem blocks=805306368 24 block groups 32768 blocks per group, 32768 fragments per group 8192 inodes per group Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912 Writing inode tables: done Creating journal (16384 blocks): done Writing superblocks and filesystem accounting information: done This filesystem will be automatically checked every 37 mounts or 180 days, whichever comes first. Use tune2fs -c or -i to override. tune2fs 1.41.9 (22-Aug-2009) Setting maximal mount count to -1 Setting interval between checks to 0 seconds /usr/lib/python2.6/site-packages/imgcreate/errors.py:45: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6 return unicode(self.message) Error creating Live CD : Unable to download from repo : Cannot retrieve repository metadata (repomd.xml) for repository: autoqa. Please verify its path and try again umount: /tmp/imgcreate-rx6cbZ/install_root/var/cache/yum: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) Traceback (most recent call last): File "/usr/bin/livecd-creator", line 140, in <module> sys.exit(main()) File "/usr/bin/livecd-creator", line 135, in main creator.cleanup() File "/usr/lib/python2.6/site-packages/imgcreate/creator.py", line 578, in cleanup self.unmount() File "/usr/lib/python2.6/site-packages/imgcreate/creator.py", line 554, in unmount self._undo_bindmounts() File "/usr/lib/python2.6/site-packages/imgcreate/creator.py", line 301, in _undo_bindmounts b.unmount() File "/usr/lib/python2.6/site-packages/imgcreate/fs.py", line 113, in unmount raise MountError("Unable to unmount filesystem at %s" % self.dest) imgcreate.errors.MountError/usr/lib/python2.6/site-packages/imgcreate/errors.py:40: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6 return str(self.message) : Unable to unmount filesystem at /tmp/imgcreate-rx6cbZ/install_root//var/cache/yum umount: /tmp/imgcreate-rx6cbZ/install_root/var/cache/yum: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) Exception imgcreate.errors.MountError: MountError('Unable to unmount filesystem at /tmp/imgcreate-rx6cbZ/install_root//var/cache/yum',) in <bound method x86LiveImageCreator.__del__ of <imgcreate.live.x86LiveImageCreator object at 0x93ae52c>> ignored [ray@localhost ~]$ sudo df -h | grep loop /dev/loop0 3.0G 69M 2.9G 3% /tmp/imgcreate-rx6cbZ/install_root [ray@localhost ~]$ sudo umount /tmp/imgcreate-rx6cbZ/install_root umount: /tmp/imgcreate-rx6cbZ/install_root: device is busy. (In some cases useful info about processes that use the device is found by lsof(8) or fuser(1)) *** Bug 600320 has been marked as a duplicate of this bug. *** (In reply to comment #18) > [ray@localhost spin-kickstarts]$ sudo livecd-creator -c autoqa.ks -t /tmp/ Please attach autoqa.ks. Thanks. (In reply to comment #20) > (In reply to comment #18) > > [ray@localhost spin-kickstarts]$ sudo livecd-creator -c autoqa.ks -t /tmp/ > > Please attach autoqa.ks. Thanks. fedora-live-base.ks can be found in spin-kickstarts SIG. autoqa.ks is listed here: [ray@localhost qemu]$ cat autoqa.ks # Maintained by the Fedora Desktop SIG: # http://fedoraproject.org/wiki/SIGs/Desktop # fedora-desktop-list %include fedora-live-base.ks repo --name=autoqa --baseurl=http://jlaska.fedorapeople.org/autoqa/repos/$releasever %packages autoqa autotest-server httpd mysql-server %end %post cat >> /etc/rc.d/init.d/livesys << EOF # Enable services chkconfig httpd on chkconfig mysqld on EOF cat >> /etc/rc.d/init.d/livesys-late << EOF # Change mysql root user passwd mysqladmin -u root password root # Setup databases mysql -u root -e "create database autotest_web; \ grant all privileges on autotest_web.* TO 'autotest'@'localhost' identified by ''; \ grant SELECT on autotest_web.* TO 'nobody'@'%'; \ grant SELECT on autotest_web.* TO 'nobody'@'localhost';" mysql -u root -e "create database tko; \ grant all privileges on tko.* TO 'autotest'@'localhost' identified by ''; \ grant SELECT on tko.* TO 'nobody'@'%'; \ grant SELECT on tko.* TO 'nobody'@'localhost'; \ flush privileges;" # Update autotest config for CFG in /usr/share/autotest/global_config.ini /usr/share/autotest/shadow_config.ini ; do # Clear password sed -i $CFG -e "s|NEWPASSWORD||" # Set hostname sed -i $CFG -e "s|^hostname:.*|hostname: localhost|" done # Pre-populate database python /usr/share/autotest/database/migrate.py --database=AUTOTEST_WEB sync python /usr/share/autotest/database/migrate.py --database=TKO sync # Run Django sync-db python /usr/share/autotest/frontend/manage.py syncdb --noinput # Enable super-user mode for default user mysql -u root -e "insert into autotest_web.auth_user(username, password, is_staff, is_active, is_superuser, last_login, date_joined) VALUES ('debug_user', 'apache authentication', 1, 1, 1, now(), now());" EOF %end I was asked to place some comments here after debugging on #fedora. The bug appeared when a bad repo was given to the livecd-creator; I think this is probably a specific instance of this more general problem. After the run the following is listed in lsof: python2.6 11438 root 4w REG 8,2 0 4235293 /var/tmp/imgcreate-RVjUsY/install_root/var/cache/yum/local/repomdDfMxwZtmp.xml (deleted) (or similar). We had already added a new exception to the first try in yum/yumRepo.py 's _getFileRepoXML(): except: misc.unlink_f(tfname) raise So the fact the file is deleted looks likely due to this new code (otherwise it would have remained), and whatever exception it is catching from _getFile() is happening while the file handle is open for writing. We didn't get much further down that that, but it looks like URLGrabber is failing in some instances to correctly close down open file handles before it raises exceptions, causing them to be left open (and files to remain on the fs) and preventing the mount being removable. I will be proposing the attached patch below to be committed to git as soon as possible. The proposed patch uses a lazy unmount on the filesystems which works to allow livecd-creator not completely faceplant when for example a file handle is left open until we can further track down the issue within Yum or python-urlgrabber. Without the proposed patch attached committed to git we eventially run out of loop devices which forces a user to unmount directories and release loop devices by hand, which is unneccessary. Bruno, would you be so inclined to commit this so that we can work around this issue? This patch leaves no mounted directories behind nor any bound loop devices and may be a better solution for now than taking the man hours to track down the issue where Yum or python-urlgrabber does not finish cleanly when you have an issue with the kickstart file which should not produce results like we are seeing here. Thank you. Created attachment 434094 [details]
Use -l flag in fs.py umount function for umount
Comment above. Use lazy unmount for directories in fs.py umount function to alleviate a bug which causes livecd-creator to faceplant when the repository is not accessible for any reason in the kickstart file.
Also will fix the original bug filer's claim of unable to unmount due to open filehandles when an exception causes a domino effect in livecd-creator. livecd-creator should always clean up nicely after it's self, and this patch will fix this issue.
I'll at least review it. I'll try to get to it this weekend, though I am working on a another patch review and getting more info on the blocker bugs so I might run out of time. In reply to Comment 25 Bug 617844 is probably the root cause of the failures to unmount. If you take a look at that it should shed some light that if anything bad occurs in that class and is not caught, it simply leaks file descriptors and doesn't clean them up before moving on, which is why the filesystems cannot be unmounted. Until the grabber.py urlgrabber issue is fixed, we will be seeing this issue (unless we use the lazy unmount until then) A modified version of the lazy umount patch has been applied. A new build of livecd-creator hasn't been done yet, but should be over the weekend. Bug 617844 depnds on this one. It will be fixed and updated in the next release at which time we can revert our lazy unmount patch I believe and this issue could then be close.d I would actually suggest that you keep the lazy umount patch, but print out warnings about file handles still open. Any kind of bug which leaves an open file handle in any of the stack would result in the iso being unmountable, which isn't terribly desirable, but similarly lazy umount effectively papers over bugs which should be fixed elsewhere. Keeping the behaviour but warning of bugs would be a good compromise. In reply to Comment 29 Well this is true. The only one found which resulted directly from urlgrabber was the one mentioned. You do have a point though.. with using the lazy umount anyways. Like the comments above which mention sleeping to work around daemons which have been started in the fake root filesystem. None are normally started but when it starts dealing with user created custom kickstart files, you never really know what is going to be going on during a build process. I guess we could leave it or try a normal umount, if it fails try lazy unmounting.. which shouldn't have iumplications on the bad side, but it is up to the kernel when they are unmounted. I think the best (feasible and long term most simple and stable) solution would be to run the "dangerous" stuff in separate processes. When the process terminates we are quite sure that all its resources has been released. IIUC cgroups can be used to make sure that all subprocesses are terminated too. Everything that chroots is probably dangerous, but there might be leaks not covered by that. It might be possible to use python-pp, but I'm not sure that would address 481796#c11 by default. (I assume that pp defaults to fork without exec by default and thus might leak stuff into the chroot. IIRC it do "fork with exec" (CreateProcess) on windows, and that mode might work better for our purpose.) One reason these "fails to unmount" errors are so annoying is that they are so hard to debug. It could be convenient to have a mode where it doesn't use lazy unmount but prints a warning and pauses if umount fails - or perhaps enters pdb. That will make it possible to investigate exactly what is preventing the umount. Perhaps it also could do something to help debug instead of just sleeping before (re)trying. I'd like to see the lazy umounts stay. This helps cleanup if there are problems and shouldn't cause a problem if there aren't any. If someone wants to expand the patch later to try normal mounts first and print a warning that a lazy umount was needed, I'm fine with that. That should probably be an RFE bug if you don't want it to be forgotten. I'll have to agree with Comment 32 on this. livecd-tools-033-1 has been built for rawhide and should fix this. Created attachment 435756 [details]
Unmount and if we have to, lazy unmount.
This patch is important for scripted or cron'ed livecd-creator processes where when something goes wrong, and cannot unmount the filesystem due to open files or some other reason, the loop devices don't get eaten up.
Unless we really need to, I'd like to defer applying this until the alpha release is out. We have enough issues without introducing more changes right now. After that, I'll get it applied. This patch is now committed upstream. livecd-tools-034-1.fc14 has been submitted as an update for Fedora 14. https://admin.fedoraproject.org/updates/livecd-tools-034-1.fc14 livecd-tools-034-2.fc14 has been pushed to the Fedora 14 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update livecd-tools'. You can provide feedback for this update here: https://admin.fedoraproject.org/updates/livecd-tools-034-2.fc14 livecd-tools-034-7.fc14 has been submitted as an update for Fedora 14. https://admin.fedoraproject.org/updates/livecd-tools-034-7.fc14 livecd-tools-034-7.fc14 has been pushed to the Fedora 14 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update livecd-tools'. You can provide feedback for this update here: https://admin.fedoraproject.org/updates/livecd-tools-034-7.fc14 livecd-tools-034-7.fc14 has been pushed to the Fedora 14 stable repository. If problems still persist, please make note of it in this bug report. |