Bug 506644

Summary: A device mapper ioctl fails resulting in livecd-creator terminating without building an iso
Product: [Fedora] Fedora Reporter: Bruno Wolff III <bruno>
Component: livecd-toolsAssignee: Jeremy Katz <katzj>
Status: CLOSED RAWHIDE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: 11CC: davidz, katzj, mads
Target Milestone: ---Keywords: Reopened
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-08-17 13:47:58 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
script output from the run none

Description Bruno Wolff III 2009-06-18 04:06:34 UTC
Created attachment 348374 [details]
script output from the run

Description of problem:
After building the root file system and running a few fsck's livecd-creator dies after a device mapper ioctl command fails.

Version-Release number of selected component (if applicable):
livecd-tools-024-1.fc11.x86_64

How reproducible:
I haven't had a chance to make it happen again yet, but the problem was reported on one of the mailing lists so someone else is seeing it.

Steps to Reproduce:
1.
2.
3.
  
Actual results:
e2fsck 1.41.4 (27-Jan-2009)^M
Pass 1: Checking inodes, blocks, and sizes^M
Pass 2: Checking directory structure^M
Pass 3: Checking directory connectivity^M
Pass 4: Checking reference counts^M
Pass 5: Checking group summary information^M
Fedora-11-Live-g: 238934/655360 files (1.2% non-contiguous), 2119917/2621440 blocks^M
device-mapper: remove ioctl failed: Device or resource busy^M
Command failed^M
/usr/lib/python2.6/site-packages/imgcreate/errors.py:45: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6^M
  return unicode(self.message)^M
Error creating Live CD : Could not remove snapshot device^M


Expected results:


Additional info:

From /var/log/messages:
Jun 17 17:28:37 cerberus kernel: device-mapper: ioctl: unable to remove open device imgcreate-14955-61209

Comment 1 Bruno Wolff III 2009-06-18 13:02:52 UTC
It is not happening all of the time. A test I ran last not on an i386 arch ran to normal completion. I'll be retesting on the machine I saw the problem which is an x86_64 arch machine.

Comment 2 Bruno Wolff III 2009-06-18 15:09:19 UTC
It's definitely intermittant. I retested on the machine I originally saw the problem and things ran normally to completion.

Comment 3 Jeremy Katz 2009-06-18 15:26:59 UTC
What's the host running?  I saw this once and then realized I was running and older gvfs2 stack on the box and that the desktop was mounting the snapshot (from which point, no, it couldn't be removed)

Comment 4 Bruno Wolff III 2009-06-18 20:17:23 UTC
I am not sure what the context is.
It's a Fedora 11 machine with X86_64 and some i586 libs.
I have F11, updates, updates-testing, rpm fusion, updates, updates-testing and livna repos active.
I am seeing something else odd with boots that might be device mapper related.
I am going to try some more livecd-creator runs to see if it keeps happening or starts happening on the other machine.

Comment 5 Bruno Wolff III 2009-06-19 13:28:38 UTC
Retesting on the x86_64 machine got a normal run. So maybe the system was in an odd state when I had the problem occur. I doubt there is much to be done unless I can get it to happen again.

Comment 6 Jeremy Katz 2009-06-24 15:25:33 UTC
I've been doing a fair bit of builds and haven't hit it on my x86_64 F11 box (since I haven't updated to rawhide since I'm not in the office for a few weeks and don't want to be without my build box)

If you see it again, feel free to reopen but this might just be transient device-mapper bogons :/

Comment 7 Bruno Wolff III 2009-06-24 19:21:58 UTC
That's OK. I was going to do the same thing; I just hadn't gotten to it yet.
I haven't seen the failure since (including yesterday when a spin build worked fine), so I am thinking my system probably had gotten into a weird state. And there have been updates since then so even if something had been broken then, it might not be any more.

Comment 8 Mads Kiilerich 2009-08-13 11:16:20 UTC
I just saw it with livecd-tools-024-1.fc11.i586:

Losetup add /dev/loop0 mapping to /var/tmp/imgcreate-O5x5lH/tmp-I2Cboh/ext3fs.img
Extending sparse file /var/tmp/imgcreate-O5x5lH/iso-2hBDyT/LiveOS/osmin to 67108864
Losetup add /dev/loop1 mapping to /var/tmp/imgcreate-O5x5lH/iso-2hBDyT/LiveOS/osmin
device-mapper: remove ioctl failed: Device or resource busy
Command failed
/usr/lib/python2.6/site-packages/imgcreate/errors.py:45: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
  return unicode(self.message)
Error creating Live CD : Could not remove snapshot device

and with kernel message
device-mapper: ioctl: unable to remove open device imgcreate-3660-14231

I think I have worked around this issue before by adding 1 s sleeps some places.

I think this issue is real even though it can't be reproduced reliably, so I will reopen it.

I have DeviceKit-disks-004-4.fc11.i586, so I assume that it isn't bug 488756

Comment 9 Jeremy Katz 2009-08-14 17:43:22 UTC
Where are you adding sleeps that seems to help?

Comment 10 Mads Kiilerich 2009-08-14 18:55:03 UTC
I think that something like this has helped before, but right now it works for me, so i can not confirm that it makes a difference. But I'm sure I'll notice when it happens ...

--- /usr/lib/python2.6/site-packages/imgcreate/fs.py.org	2009-08-13 13:59:23.000000000 +0200
+++ /usr/lib/python2.6/site-packages/imgcreate/fs.py	2009-08-14 17:08:04.000000000 +0200
@@ -511,7 +511,13 @@
         if not self.__created:
             return
 
-        rc = subprocess.call(["/sbin/dmsetup", "remove", self.__name])
+        for i in range(10):
+            rc = subprocess.call(["/sbin/dmsetup", "remove", self.__name])
+            if rc == 0:
+                break
+            logging.debug('%s: Error removing snapshot device %s' % (i, self.__name))
+            import time ; time.sleep(0.1)
+            import os; os.system('xeyes')
         if not ignore_errors and rc != 0:
             raise SnapshotError("Could not remove snapshot device")

But perhaps a fuser -m would be interesting here too?

It seemed to me like ignore_errors was set in too many cases.

Comment 11 Mads Kiilerich 2009-08-14 21:20:11 UTC
Got it!

...
Losetup add /dev/loop0 mapping to /var/tmp/imgcreate-xOyoxY/tmp-EHEKod/ext3fs.img
Extending sparse file /var/tmp/imgcreate-xOyoxY/iso-XAmGwe/LiveOS/osmin to 67108864
Losetup add /dev/loop1 mapping to /var/tmp/imgcreate-xOyoxY/iso-XAmGwe/LiveOS/osmin
device-mapper: remove ioctl failed: Device or resource busy
Command failed
0: Error removing snapshot device imgcreate-3546-63608
^Z
[1]+  Stopped

[root@localhost lars]# fuser -m /dev/mapper/imgcreate-3546-63608
[root@localhost lars]# killall xeyes
[root@localhost lars]# fg
Losetup remove /dev/loop1
Losetup remove /dev/loop0
Truncating sparse file /var/tmp/imgcreate-xOyoxY/iso-XAmGwe/LiveOS/osmin to 8192
Parallel mksquashfs: Using 2 processors
...

So: There is some kind of race, but the race ended before I ran fuser...

Comment 12 Jeremy Katz 2009-08-17 13:47:58 UTC
Added a sleep.  Kludgey, but if it helps, then I'll just be happy about it.