Bug 697733 - [xfs/xfstests 073] loop devices not destroyed on failed mount
Summary: [xfs/xfstests 073] loop devices not destroyed on failed mount
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: rc
Assignee: Carlos Maiolino
QA Contact: Filesystem QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-04-19 04:24 UTC by Eryu Guan
Modified: 2015-01-07 12:11 UTC (History)
7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-01-07 12:11:47 UTC
Target Upstream Version:


Attachments (Terms of Use)
test log of 073 (304.37 KB, text/plain), 2011-04-19 04:25 UTC, Eryu Guan
073.out (2.35 KB, text/plain), 2011-04-19 04:25 UTC, Eryu Guan

Description Eryu Guan 2011-04-19 04:24:22 UTC
Description of problem:
xfstests 073 fails on a UP (uniprocessor) host.

Running test 073
#! /bin/bash
# FS QA Test No. 073
#
# Test xfs_copy
#
#-----------------------------------------------------------------------
# Copyright (c) 2000-2003,2008 Silicon Graphics, Inc.  All Rights Reserved.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License as
FSTYP         -- xfs (non-debug)
PLATFORM      -- Linux/x86_64 hp-p6100z-01 2.6.32-122.el6.x86_64
MKFS_OPTIONS  -- -f -bsize=4096 /dev/sda6
MOUNT_OPTIONS -- -o context=system_u:object_r:nfs_t:s0 /dev/sda6 /mnt/testarea/scratch

073 24s ... 25s
_check_xfs_filesystem: filesystem on /dev/sda5 has dirty log (see 073.full)
_check_xfs_filesystem: filesystem on /dev/sda5 is inconsistent (c) (see 073.full)
_check_xfs_filesystem: filesystem on /dev/sda5 is inconsistent (r) (see 073.full)
Ran: 073
Passed all 1 tests

Version-Release number of selected component (if applicable):
2.6.32-122.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Run xfstests 073 on a UP host
  
Actual results:
Test fails

Expected results:
Test passes

Additional info:

Comment 1 Eryu Guan 2011-04-19 04:25:13 UTC
Created attachment 493071 [details]
test log of 073

Comment 2 Eryu Guan 2011-04-19 04:25:41 UTC
Created attachment 493072 [details]
073.out

Comment 4 Boris Ranto 2011-08-24 12:30:46 UTC
After the big xfs patchset (kernel-2.6.32-191.el6) I could hit this on an Intel host as well (and the test failed).
The biggest problem I can see here is that after the test, the test device (/dev/loop0) cannot be unmounted even though it is not in use by any userspace process:

[root@dell-pe2850-01 xfstests]# umount /dev/loop0 
umount: /mnt/testarea/test: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))
[root@dell-pe2850-01 xfstests]# lsof /mnt/testarea/test/
[root@dell-pe2850-01 xfstests]# fuser /mnt/testarea/test
[root@dell-pe2850-01 xfstests]# 


Since the issue became more serious I've raised the Severity of this bug as well.

Comment 5 Eric Sandeen 2011-08-24 15:25:29 UTC
Boris, do you consider this a regression then?

(I'm not sure, but sometimes another file set up for loopback holds a mount but doesn't show up in fuser ... does losetup -a show any other loop device ON that unmountable loopback fs?)
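Eric's check can be sketched as a small shell helper. This is illustrative only: the function name and mount point are hypothetical, and it assumes the old util-linux `losetup -a` output format of the era, `/dev/loopN: [dev]:inode (/backing/file)`.

```shell
#!/bin/sh
# List loop devices whose backing file lives under a given mount point,
# i.e. nested loop devices that could pin the filesystem and make it
# unmountable. Reads "losetup -a"-style lines from stdin.
find_nested_loops() {
    mnt=$1
    awk -v m="$mnt" -F'[()]' '
        index($2, m) == 1 {        # backing file is under the mount point
            sub(/:.*/, "", $1)     # strip everything after "/dev/loopN"
            print $1
        }'
}

# Example with canned input; on a live system you would pipe in
# the real "losetup -a" output instead.
printf '%s\n' \
    '/dev/loop0: [0805]:131 (/var/lib/xfstests/test.img)' \
    '/dev/loop1: [0700]:12 (/mnt/testarea/test/nested.img)' |
    find_nested_loops /mnt/testarea/test
# -> /dev/loop1
```

Any device it prints is a candidate for the `losetup -d` cleanup discussed later in this bug.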

Comment 6 Dave Chinner 2011-08-25 02:04:30 UTC
> _check_xfs_filesystem: filesystem on /dev/sda5 has dirty log (see 073.full)

From the log file:

> xfs_logprint: /dev/sda5 contains a mounted and writable filesystem-

The unmount of the loop device the test uses failed, and hence the filesystem check failed.

Why did the loop device unmount fail? Is this just another case of loop device automatic deletion brokenness unrelated to XFS? I guess Eric's questions about why the loop device is unmountable need to be answered first before we go poking at XFS.

Cheers,

Dave.

Comment 7 Boris Ranto 2011-08-25 11:16:25 UTC
I've tested it a little more and it is not a regression (it is just a little harder to hit on the Intel machine, but I could hit it on the -186 kernel as well).

The unmount of the loop device was blocked by another loop device, I should have thought about it.

I've further looked into it and found out that it is a case of loop device automatic deletion brokenness.

The problem seems to occur when a filesystem with a duplicate UUID is mounted with -o loop: the mount fails, but the loop device remains assigned (and apparently busy). The test then successfully mounts with -o nouuid and unmounts (freeing) the second loop device. The first loop device remains assigned until the test finishes, which results in a failed attempt to unmount TEST_DIR and fsck the filesystem.

I could get the test to pass (on the AMD UP host) when I added the following two lines after the failed duplicate-UUID mount. If I didn't add the losetup -a line, I got 'Device or resource busy' from losetup -d; loop0 is used here for simplicity (it was the first empty loop device when the test started):

/var/lib/xfstests/073:
...
93:                losetup -a > /dev/null 2>&1
94:                losetup -d /dev/loop0 2>&1
...


Based on the above, I'm lowering the priority back to its original value.

Comment 8 Eric Sandeen 2011-08-25 18:11:31 UTC
Cool, I was right ;)  We should probably get a tight testcase for the loop problem if one isn't already known, and try to get that addressed...
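A tight testcase along those lines might look like the sketch below, based on Boris's description: two XFS images sharing a UUID, where the second `mount -o loop` fails but leaves its loop device assigned. It needs root and mkfs.xfs, so the commands are printed rather than executed unless REPRO_RUN=1 is set; all paths are hypothetical.

```shell
#!/bin/sh
# Hedged reproducer sketch for the loop-device leak on a failed mount.
# Dry-run by default: set REPRO_RUN=1 to actually execute the commands.
run() {
    if [ "${REPRO_RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

run dd if=/dev/zero of=/tmp/uuid1.img bs=1M count=64
run mkfs.xfs -f /tmp/uuid1.img
run cp /tmp/uuid1.img /tmp/uuid2.img    # identical filesystem UUID on purpose
run mkdir -p /mnt/a /mnt/b
run mount -o loop /tmp/uuid1.img /mnt/a
run mount -o loop /tmp/uuid2.img /mnt/b # should fail: duplicate UUID
run losetup -a                          # bug: a stale loop device is left assigned
```

When run for real, the expectation per this bug is that the second mount fails yet `losetup -a` still shows a loop device attached to /tmp/uuid2.img.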

Comment 9 Dave Chinner 2011-09-19 06:15:09 UTC
(In reply to comment #8)
> Cool, I was right ;)  We should probably get a tight testcase for the loop
> problem if one isn't already known, and try to get that addressed...

Is it likely that this is a case of the mount binary not issuing the correct loop device destroy ioctl in the mount failure path? I.e. if mount gets fixed, then everything will just work properly again?

Comment 10 RHEL Program Management 2011-10-07 15:31:36 UTC
Since RHEL 6.2 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 11 Eric Sandeen 2013-10-22 19:23:10 UTC
Ok, this is a problem w/ loop device deletion, not xfs corruption:

_check_xfs_filesystem: filesystem on /dev/sda5 is inconsistent
*** xfs_check output ***
xfs_check: /dev/sda5 contains a mounted and writable filesystem

fatal error -- couldn't initialize XFS library
*** end xfs_check output
_check_xfs_filesystem: filesystem on /dev/sda5 is inconsistent
*** xfs_repair -n output ***
xfs_repair: /dev/sda5 contains a mounted and writable filesystem

fatal error -- couldn't initialize XFS library
*** end xfs_repair output
*** mount output ***

and:

> The problem seems to occur when duplicate UUID fs is mounted with -o loop (the mount will fail but loop device will remain assigned (and apparently busy)). Then the test will successfully try to mount with -o nouuid and unmount (and free) the second loop device. The first loop device will remain assigned until the test finishes. This will result in failed attemp to unmount TEST_DIR and fsck fs.

Does this bug persist?  (It's pretty old, sorry)

Comment 12 Eryu Guan 2013-10-23 03:41:19 UTC
Yes, it's still reproducible on kernel-2.6.32-424.el6 when testing on a loop device; a real block device shows no issue.

*** xfs_check output ***
xfs_check: /dev/loop0 contains a mounted and writable filesystem

fatal error -- couldn't initialize XFS library
*** end xfs_check output
_check_xfs_filesystem: filesystem on /dev/loop0 is inconsistent
*** xfs_repair -n output ***
xfs_repair: /dev/loop0 contains a mounted and writable filesystem

fatal error -- couldn't initialize XFS library
*** end xfs_repair output

Comment 13 Eric Sandeen 2013-10-23 13:37:34 UTC
Thanks Eryu.

Carlos, you can probably just narrow this down to the simple mount failure and move the bug to util-linux[-ng].

Comment 14 Carlos Maiolino 2013-10-23 15:33:22 UTC
Sure, will handle that.

Comment 15 Eric Sandeen 2013-10-23 18:29:23 UTC
Hm, I forgot that this was in there too:

# HACK WARNING:
#
# We're done with the nested loop mount, now we have to clean
# up the pieces that mount is incapable of doing.
losetup -d $loop2 > /dev/null 2>&1

...

Comment 16 Carlos Maiolino 2013-10-24 18:21:52 UTC
Hi Eryu,

is this bug reproducible only on specific systems/architectures? Which ones?

I noticed you said to run it on a UP system; what does UP mean?

I couldn't hit it on the systems I tried.

Cheers,
--Carlos

Comment 17 Eryu Guan 2013-10-25 02:37:24 UTC
(In reply to Carlos Maiolino from comment #16)
> Hi Eryu,
> 
> is this bug reproducible only on specific systems/architectures? Which ones?

I don't think so; I've seen it on different hardware, nothing special AFAIK.

> 
> I noticed you said to run it on a UP system; what does UP mean?

It's an old bug; the contents of comment 0 might be out of date. By UP I meant uniprocessor, but it can apparently be reproduced on multiprocessor hosts too.

Comment 18 Eric Sandeen 2013-10-25 04:30:52 UTC
Carlos, did you set up scratch/test devs on loop devices?

Comment 19 Boris Ranto 2013-10-25 09:32:29 UTC
No need to worry about that anymore. We discussed reproducing this bz yesterday and I've lent him a machine where I could reproduce the issue fairly reliably.

btw: The loop devices are not necessary for reproduction; it is reproducible even if LVM devices are set for the test & scratch dev (at least on some hosts).

Comment 20 Carlos Maiolino 2013-10-28 14:11:31 UTC
I don't think this is a problem with util-linux. While analysing the problem, I ran umount under strace and noticed that umount was properly sending the kernel an ioctl (0x4c01, i.e. LOOP_CLR_FD) requesting that the device be detached:

open("/dev/loop1", O_RDONLY)            = 3
ioctl(3, 0x4c01, 0)                     = 0
close(3)                                = 0

So it looks like it's better to keep this as a kernel bz until we figure out why the loop device is not being released.

A lot of work on this problem happened upstream, but mostly on the user-space side, so I'll ask kzak whether he has worked on the kernel side of this kind of problem.

After I rebooted the system where we were reproducing the bug, it is no longer reproducible, but I don't know why yet. As Eric also said, he couldn't reproduce it manually. So why is this reproducible (though not reliably) with xfstests? It looks to me like some kind of race, but I have no clue where.

Upstream kernels now have a sysfs flag named autoclear (for loop devices), which removes the loop device when it's detached; I'm going to take a look at this flag to understand what it does internally.

Comment 21 Carlos Maiolino 2013-10-28 17:23:40 UTC
The problem is related to the mtab file.

The typical mtab entry for a loop device mount is something like:

/mnt/diskimage.img /mnt/mpoint xfs rw,loop=/dev/loop0 0 0

while /proc/mounts has:

/dev/loop0 /mnt/mpoint xfs rw,seclabel,relatime,attr2,delaylog,noquota 0 0

If for some reason mtab is re-written and the loop=<device> argument is removed, then when the filesystem is unmounted the loop device will not be detached, even when the -d option is used.

Why the failure is not reliable when running xfstests 073 I don't know yet; I'm not sure whether mtab was modified or something else happened.
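The loop= hint that umount's cleanup depends on can be pulled out of an mtab line as sketched below (illustrative only; the function name is hypothetical). When the option is missing, as in the failure mode described above, nothing is printed and there is nothing for umount to detach.

```shell
#!/bin/sh
# Extract the loop device recorded in an mtab entry's option list.
# umount relies on this hint; a rewritten mtab that drops "loop=..."
# leaves the kernel loop device attached.
mtab_loopdev() {
    sed -n 's/.*[ ,]loop=\(\/dev\/loop[0-9][0-9]*\).*/\1/p'
}

echo '/mnt/diskimage.img /mnt/mpoint xfs rw,loop=/dev/loop0 0 0' | mtab_loopdev
# -> /dev/loop0
echo '/dev/loop0 /mnt/mpoint xfs rw,relatime 0 0' | mtab_loopdev
# -> (nothing: no loop= option, so nothing to detach)
```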

Comment 22 Eryu Guan 2015-01-06 17:04:08 UTC
I cannot reproduce this bug on kernel-2.6.32-505.el6; I ran xfs/073 for 10 iterations. (But sometimes I can hit bug 1133304, so using the -507 kernel or later is better.)

Comment 23 Carlos Maiolino 2015-01-06 17:51:38 UTC
I had a chat with kzak regarding this, and if my previous comment is right, this should not happen on rhel7.

rhel7 uses the loopdev autoclear flag and mtab is not used; the loopdev code is new, and the differences between el7 and el6 are big.

Can you guys test and see whether this is reproducible on rhel7? Also, please let me know if you can still reproduce it on newer el6 versions.

Thanks

Comment 24 Eryu Guan 2015-01-07 12:11:47 UTC
Hi Carlos,

I confirmed that xfs/073 passes on RHEL7 without any issue, and the latest rhel6 build passes the test too. I'm closing this bug now.

Thanks,
Eryu

