| Summary: | [xfs/xfstests 073] loop devices not destroyed on failed mount | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Eryu Guan <eguan> | ||||||
| Component: | kernel | Assignee: | Carlos Maiolino <cmaiolin> | ||||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Filesystem QE <fs-qe> | ||||||
| Severity: | low | Docs Contact: | |||||||
| Priority: | unspecified | ||||||||
| Version: | 6.1 | CC: | branto, cmaiolin, dchinner, eguan, esandeen, kzak, rwheeler | ||||||
| Target Milestone: | rc | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2015-01-07 12:11:47 UTC | Type: | --- | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Attachments: |
Description
Eryu Guan
2011-04-19 04:24:22 UTC
Created attachment 493071 [details]
test log of 073
Created attachment 493072 [details]
073.out
After the big xfs patchset (kernel-2.6.32-191.el6) I could hit this on intel host (and the test failed) as well.
The biggest problem I can see here is that after the test the test device (/dev/loop0) cannot be unmounted even though it is not in use by any userspace process:
[root@dell-pe2850-01 xfstests]# umount /dev/loop0
umount: /mnt/testarea/test: device is busy.
(In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))
[root@dell-pe2850-01 xfstests]# lsof /mnt/testarea/test/
[root@dell-pe2850-01 xfstests]# fuser /mnt/testarea/test
[root@dell-pe2850-01 xfstests]#
Since the issue became more serious I've raised the Severity of this bug as well.
Boris, do you consider this a regression then?

(I'm not sure, but sometimes another file setup for loopback holds a mount but doesn't show up in fuser ... does losetup -a show any other loop device ON that unmountable loopback fs?)

> _check_xfs_filesystem: filesystem on /dev/sda5 has dirty log (see 073.full)

From the log file:

> xfs_logprint: /dev/sda5 contains a mounted and writable filesystem

The unmount of the loop device the test uses failed, and hence the filesystem check failed. Why did the loop device unmount fail? Is this just another case of loop device automatic deletion brokenness unrelated to XFS? I guess Eric's questions about why the loop device is unmountable need to be answered first before we go poking at XFS.

Cheers,
Dave.

I've tested it a little more and it is not a regression (it is just a little harder to hit on the intel machine but I could hit it in -186 kernel as well). The unmount of the loop device was blocked by another loop device, I should have thought about it.

I've further looked into it and found out that it is a case of loop device automatic deletion brokenness. The problem seems to occur when duplicate UUID fs is mounted with -o loop (the mount will fail but loop device will remain assigned (and apparently busy)). Then the test will successfully try to mount with -o nouuid and unmount (and free) the second loop device. The first loop device will remain assigned until the test finishes. This will result in failed attempt to unmount TEST_DIR and fsck fs.

I could get the test to pass (on amd UP host) when I added the following two lines after the duplicate UUID failed mount (if I didn't add the losetup -a line, I got 'Device or resource busy' for losetup -d; loop0 is there for simplification (it is the first empty loop device when the test started)):

/var/lib/xfstests/073:
...
93: losetup -a > /dev/null 2>&1
94: losetup -d /dev/loop0 2>&1
...

Based on the above, I'm lowering the priority back to its original value.
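One way to confirm the leak described above is to snapshot the attached loop devices before and after the failing mount and compare the two lists. The following is a minimal sketch, not part of xfstests: `leaked_loops` is a hypothetical helper, and it assumes the `losetup -a` output format in which each line starts with the device name followed by a colon.

```shell
# Hypothetical helper (not part of xfstests): print loop devices that are
# present in the second `losetup -a` snapshot but absent from the first.
#   $1 = snapshot taken before the mount attempt
#   $2 = snapshot taken after the mount attempt
leaked_loops() {
    # Device names are the text before the first ':' on each line.
    before_names=$(printf '%s\n' "$1" | cut -d: -f1)
    printf '%s\n' "$2" | cut -d: -f1 | while IFS= read -r dev; do
        [ -n "$dev" ] || continue
        # Print the device only if it was not attached in the first snapshot.
        printf '%s\n' "$before_names" | grep -qxF -- "$dev" || printf '%s\n' "$dev"
    done
}
```

For example, with `before=$(losetup -a)` taken ahead of the duplicate-UUID mount and `after=$(losetup -a)` taken once it has failed, `leaked_loops "$before" "$after"` would print the device that still needs an explicit `losetup -d`.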
Cool, I was right ;) We should probably get a tight testcase for the loop problem if one isn't already known, and try to get that addressed...

(In reply to comment #8)
> Cool, I was right ;) We should probably get a tight testcase for the loop
> problem if one isn't already known, and try to get that addressed...

Is it likely that this is a case of the mount binary not issuing the correct loop device destroy ioctl in the mount failure path? i.e. if mount gets fixed, then everything will just work properly again?

Since RHEL 6.2 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.

Ok, this is a problem w/ loop device deletion, not xfs corruption:
_check_xfs_filesystem: filesystem on /dev/sda5 is inconsistent
*** xfs_check output ***
xfs_check: /dev/sda5 contains a mounted and writable filesystem
fatal error -- couldn't initialize XFS library
*** end xfs_check output
_check_xfs_filesystem: filesystem on /dev/sda5 is inconsistent
*** xfs_repair -n output ***
xfs_repair: /dev/sda5 contains a mounted and writable filesystem
fatal error -- couldn't initialize XFS library
*** end xfs_repair output
*** mount output ***
and:
> The problem seems to occur when duplicate UUID fs is mounted with -o loop (the mount will fail but loop device will remain assigned (and apparently busy)). Then the test will successfully try to mount with -o nouuid and unmount (and free) the second loop device. The first loop device will remain assigned until the test finishes. This will result in failed attempt to unmount TEST_DIR and fsck fs.
Does this bug persist? (It's pretty old, sorry)
Yes, it's still reproducible on kernel-2.6.32-424.el6 when testing on a loop device; a real block device has no issue.

*** xfs_check output ***
xfs_check: /dev/loop0 contains a mounted and writable filesystem
fatal error -- couldn't initialize XFS library
*** end xfs_check output
_check_xfs_filesystem: filesystem on /dev/loop0 is inconsistent
*** xfs_repair -n output ***
xfs_repair: /dev/loop0 contains a mounted and writable filesystem
fatal error -- couldn't initialize XFS library
*** end xfs_repair output

Thanks Eryu. Carlos, you can probably just narrow this down to the simple mount failure and move the bug to util-linux[-ng].

Sure, will handle that.

Hm, I forgot that this was in there too:

# HACK WARNING:
#
# We're done with the nested loop mount, now we have to clean
# up the pieces that mount is incapable of doing.
losetup -d $loop2 > /dev/null 2>&1
...

Hi Eryu,

is this bug reproducible only on specific systems/architectures? Which ones?

I noticed you said to run it on an UP system, what does UP mean?

I couldn't hit it on the systems I tried.

Cheers,
--Carlos

(In reply to Carlos Maiolino from comment #16)
> Hi Eryu,
>
> is this bug reproducible only on specific systems/architectures? Which ones?

I don't think so, I've seen it on different hardware, nothing special AFAIK.

>
> I noticed you said to run it on an UP system, what does UP mean?

It's an old bug, the contents in comment 0 might be out of date. By UP I meant Uniprocessor. But it can apparently be reproduced on Multiprocessor hosts too.

Carlos, did you set up scratch/test devs on loop devices?

No need to worry about that, anymore. We discussed the reproduction of this bz yesterday and I've lent him a machine where I could reproduce the issue fairly reliably.

btw: The loop devices are not necessary for reproduction, it is reproducible even if lvm devices are set for test & scratch dev (at least on some hosts).
I don't think this is a problem with util-linux. While analysing the problem, I ran umount under strace and noticed that the umount command was properly sending the kernel an ioctl requesting deletion of the device:
open("/dev/loop1", O_RDONLY) = 3
ioctl(3, 0x4c01, 0) = 0
close(3) = 0
So, looks like it's better to keep as a kernel bz until we figure out why the loop device is not being released.
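For readers decoding the strace output above: in the kernel's <linux/loop.h>, request 0x4c01 is LOOP_CLR_FD, the ioctl that detaches the backing file from a loop device. The lookup below is only an illustrative sketch; `loop_ioctl_name` is a hypothetical helper covering a handful of the loop ioctls, not an existing tool.

```shell
# Hypothetical helper: translate a loop ioctl request number, as printed by
# strace, into its symbolic name from <linux/loop.h>. Only a few requests
# are listed; anything else is reported as unknown.
loop_ioctl_name() {
    # Normalise hex digits to lower case so 0x4C01 and 0x4c01 both match.
    req=$(printf '%s' "$1" | tr 'ABCDEF' 'abcdef')
    case "$req" in
        0x4c00) echo LOOP_SET_FD ;;        # attach a backing file
        0x4c01) echo LOOP_CLR_FD ;;        # detach the backing file
        0x4c02) echo LOOP_SET_STATUS ;;
        0x4c03) echo LOOP_GET_STATUS ;;
        0x4c04) echo LOOP_SET_STATUS64 ;;
        0x4c05) echo LOOP_GET_STATUS64 ;;
        *)      echo unknown; return 1 ;;
    esac
}
```

Running `loop_ioctl_name 0x4c01` prints `LOOP_CLR_FD`, which matches the reading that umount did ask the kernel to release /dev/loop1.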
A lot of work on this problem happened upstream, but mostly on the user-space side, so I'll ask kzak whether he has worked on this kind of problem on the kernel side.
After I rebooted the system where we were reproducing the bug, it is no longer reproducible, but I don't know why yet. As Eric also said, he couldn't reproduce it manually. So why it is reproducible (and not reliably) with xfstests looks to me like some kind of race, but I have no clue where.
Upstream kernel now has a sysfs flag named autoclear (for loop devices), which removes the loop device when it's detached. I'm going to take a look at this flag to understand what it does internally.
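On kernels that expose it, the flag can be read from /sys/block/loopN/loop/autoclear. A minimal sketch for checking it follows; `loop_autoclear` is a hypothetical helper, its second argument (an alternative sysfs root) exists only to make it testable, and the attribute may be absent on older kernels or when no file is attached to the device.

```shell
# Hypothetical helper: report whether the autoclear flag is set for a loop
# device. Returns 0 if set, 1 if clear, 2 if the attribute cannot be read.
#   $1 = device, given as "/dev/loop0" or "loop0"
#   $2 = sysfs root (optional, defaults to /sys; overridable for testing)
loop_autoclear() {
    dev=${1##*/}
    root=${2:-/sys}
    attr="$root/block/$dev/loop/autoclear"
    # Attribute missing: typically an unbound device or an older kernel.
    [ -r "$attr" ] || return 2
    [ "$(cat "$attr")" = "1" ]
}
```

A caller could use it as `loop_autoclear /dev/loop0 && echo "loop0 will be released on last close"`.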
The problem is related to the mtab file.

The common mount entry for a loop device in the mtab file is something like:

/mnt/diskimage.img /mnt/mpoint xfs rw,loop=/dev/loop0 0 0

while in /proc/mounts it is:

/dev/loop0 /mnt/mpoint xfs rw,seclabel,relatime,attr2,delaylog,noquota 0 0

If, for some reason, the mtab is rewritten and the loop=<device> argument is removed, then when the device is unmounted the loop device will not be detached, even when the -d option is used.

Why this does not reproduce reliably while running xfstests 073 I don't know yet; I'm not sure whether mtab was modified or something else happened.

I cannot reproduce this bug on kernel-2.6.32-505.el6, I ran xfs/073 for 10 iterations. (But sometimes I can hit bug 1133304, so using -507 kernel and later is better)

I had a chat with kzak regarding this, and if I'm right in my previous comment, this should not happen on rhel7. rhel7 uses the loopdev autoclear flag for loop, mtab is not used, new loopdev code is used, and the differences between el7 and el6 are big.

Can you guys test it and see if this is reproducible on rhel7? Also, please let me know if you guys can still reproduce it on newer el6 versions.

Thanks

Hi Carlos,

I confirmed that xfs/073 passed on RHEL7 without any issue, and latest rhel6 build could pass the test too. I'm closing this bug now.

Thanks,
Eryu
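The mtab dependency described earlier in this thread can be checked mechanically: according to that analysis, an mtab line without a loop= option leaves umount with no record of which loop device to detach. The sketch below is illustrative only; `mtab_loop_device` is a hypothetical helper that prints the loop= value from a single mtab-style line, or nothing if the option is missing.

```shell
# Hypothetical helper: print the loop device recorded in the options field
# of one mtab-style line, or print nothing if there is no loop= option.
#   $1 = a single line in mtab / /proc/mounts format
mtab_loop_device() {
    # The fourth whitespace-separated field holds the mount options.
    opts=$(printf '%s\n' "$1" | awk '{print $4}')
    # Split the options on commas and keep only the value of loop=.
    printf '%s\n' "$opts" | tr ',' '\n' | sed -n 's/^loop=//p'
}
```

Applied to the mtab entry quoted in the thread it prints `/dev/loop0`; applied to the /proc/mounts entry it prints nothing, which is the situation in which the loop device stays attached after umount.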