Bug 828545

Summary: can't recreate filesystem
Product: Red Hat Enterprise Linux 7 Reporter: Matus Kocka <mkocka>
Component: e2fsprogs Assignee: Eric Sandeen <esandeen>
Status: CLOSED CURRENTRELEASE QA Contact: BaseOS QE - Apps <qe-baseos-apps>
Severity: high Docs Contact:
Priority: high    
Version: 7.0 CC: aokuliar, kkolakow
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2012-06-20 13:34:59 UTC Type: Bug
Attachments:
Description Flags
strace mkfs -V -t ext2 -F /dev/sdc2 none

Description Matus Kocka 2012-06-04 21:24:58 UTC
How reproducible:
always

Steps to Reproduce:
1. umount /dev/sdc1
2. mkfs.ext3 /dev/sdc1
  
Actual results:
/dev/sdc1 is apparently in use by the system; will not make a filesystem here!

Expected results:
new filesystem

Additional info:
e2fsprogs-1.42-4.el7.x86_64
3.3.0-0.13.el7.x86_64

mkfs.btrfs works

Comment 1 Eric Sandeen 2012-06-04 21:37:11 UTC
strace the mkfs please?

It seems most likely that the device _is_ in use, and mkfs.btrfs isn't checking for that.

When you say "umount /dev/sdc1" was it really previously mounted?

Is it part of an LVM physical volume or anything like that?

It seems unlikely that this is an e2fsprogs bug, but let's get to the bottom of it.

Comment 2 Matus Kocka 2012-06-04 22:23:27 UTC
Attached.

Yes, it was mounted before.

$ vgdisplay 
No volume groups found

Comment 3 Matus Kocka 2012-06-04 22:24:11 UTC
Created attachment 589286 [details]
strace mkfs -V -t ext2 -F /dev/sdc2

Comment 6 Eric Sandeen 2012-06-04 22:53:53 UTC
The fact that it's still mountable as btrfs makes me think that this might be a btrfs bug...

When unmounted, I can't unload the btrfs module, either.

I notice this:

[ 1131.481762] btrfs: disk space caching is enabled
[ 1131.490906] btrfs bad fsid on block 20971520
[ 1131.495622] btrfs bad fsid on block 20971520
[ 1131.500439] btrfs bad fsid on block 20971520
[ 1131.505136] btrfs bad fsid on block 20971520
[ 1131.509953] btrfs bad fsid on block 20971520
[ 1131.514652] btrfs bad fsid on block 20971520
[ 1131.519328] btrfs bad fsid on block 20971520
[ 1131.523636] btrfs: failed to read chunk root on sdc2
[ 1131.530084] btrfs: open_ctree failed

and I wonder if maybe there is an error path that doesn't drop a reference.... Josef?

Comment 7 Matus Kocka 2012-06-04 23:03:01 UTC
It is btrfs now, but it was created as ext4 by kickstart. After that I tried to create
ext2, ext3, ext4 and xfs (all of these fail) and btrfs, which ended with success, but it can still be mounted again.
This sequence comes from the postmark file-system performance test.

Comment 8 Eric Sandeen 2012-06-04 23:06:21 UTC
mkfs.btrfs succeeds because it doesn't try to open exclusively; it probably should:

$ strace mkfs.btrfs /dev/sdc2 2>&1 | grep open | grep sdc2
open("/dev/sdc2", O_RDONLY)             = 3
open("/dev/sdc2", O_RDWR)               = 3
open("/dev/sdc2", O_RDWR|O_CREAT, 0600) = 5
open("/dev/sdc2", O_RDWR)               = 6

where is the script that drove the earlier tests?

Josef maybe this isn't your bug after all ;)
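
The exclusive-open check discussed in this comment can be sketched as follows. On Linux, opening a block device with O_EXCL (without O_CREAT) fails with EBUSY when the device is in use by the system, for example when it is mounted; that is presumably the kind of check behind the "apparently in use by the system" message from mke2fs. The helper name `is_busy` is made up for illustration, and this is a minimal Python sketch under those assumptions, not the e2fsprogs code.

```python
import errno
import os


def is_busy(path):
    """Return True if an exclusive open of `path` fails with EBUSY.

    Assumption: `path` is a Linux block device such as /dev/sdc1. For
    other file types O_EXCL without O_CREAT has no defined meaning.
    """
    try:
        fd = os.open(path, os.O_RDONLY | os.O_EXCL)
    except OSError as err:
        if err.errno == errno.EBUSY:
            return True
        raise  # missing device, permission denied, etc.
    os.close(fd)
    return False
```

Calling `is_busy("/dev/sdc1")` as root right after the umount would show whether the kernel still considers the device held, independent of what `mount` prints.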

Comment 9 Matus Kocka 2012-06-04 23:15:34 UTC
The test itself:
/mnt/tests/performance/postmark_devel_with_library/certification/runtest.sh

Also using library:
/mnt/tests/performance/common_functions/lib/common-performance-functions.sh

Comment 10 Eric Sandeen 2012-06-04 23:18:11 UTC
Did that test run leave logs somewhere?

Comment 11 Eric Sandeen 2012-06-04 23:28:48 UTC
ok I see:

ext2     not run due to mkfs/mount issue
ext3     not run due to mkfs/mount issue
ext4     not run due to mkfs/mount issue
xfs      not run due to mkfs/mount issue

Comment 12 Eric Sandeen 2012-06-04 23:35:24 UTC
Hm I wonder if you can reproduce this on another machine too?

It looks like /dev/sdc2 was mounted early in the boot process.

Does the script properly unmount it before running the mkfs tests?

I wonder what happens if mkfs.btrfs is pointed at a mounted ext4 filesystem...
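
One way a test script could verify the unmount before calling mkfs is to look the device up in the kernel's mount table. The sketch below is in Python and assumes the usual /proc/mounts layout (device, mount point, filesystem type, options, two numeric fields); the function names are hypothetical and it does not decode escaped characters such as `\040` in mount points.

```python
def mount_points(device, mounts_text):
    """Return the mount points listed for `device` in /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Each line is: device mountpoint fstype options dump pass
        if len(fields) >= 2 and fields[0] == device:
            points.append(fields[1])
    return points


def read_mount_points(device, path="/proc/mounts"):
    """Convenience wrapper that reads the live mount table."""
    with open(path) as f:
        return mount_points(device, f.read())
```

An empty list from `read_mount_points("/dev/sdc1")` only says the device is not listed under that name; it does not rule out another holder such as device-mapper.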

Comment 14 Adam Okuliar 2012-06-08 14:40:53 UTC
We create a custom filesystem layout during installation according to these kickstart instructions:

part /boot --fstype ext2 --size=200 --asprimary --label=BOOT --ondisk=sda
part /mnt/tests --fstype=ext4 --size=40960 --asprimary --label=MNT --ondisk=sda
part / --fstype=ext4 --size=1 --grow --asprimary --label=ROOT  --ondisk=sda

part /RHTSspareLUN1 --fstype=ext4 --size=20480 --asprimary --label=sdc_20GB --ondisk=sdc
part /RHTSspareLUN2 --fstype=ext4 --size=1 --grow --asprimary --label=sdc_rest --ondisk=sdc
part /RHTSspareLUN3 --fstype=ext4 --size=1 --grow --asprimary --label=sdb --ondisk=sdb
part /RHTSspareLUN4 --fstype=ext4 --size=1 --grow --asprimary --label=sdb --ondisk=sdd

After installation, the sdc1 partition is properly formatted and mounted.
mount | grep sdc
/dev/sdc2 on /RHTSspareLUN2 type ext4 (rw,relatime,seclabel,user_xattr,barrier=1,data=ordered)
/dev/sdc1 on /RHTSspareLUN1 type ext4 (rw,relatime,seclabel,user_xattr,barrier=1,data=ordered)


Problems start when we try to unmount and reformat sdc1:
$ umount /dev/sdc1
$ mount | grep sdc
/dev/sdc2 on /RHTSspareLUN2 type ext4 (rw,relatime,seclabel,user_xattr,barrier=1,data=ordered)

...sdc1 seems to be unmounted, so let's try to reformat the partition as ext3

$ mkfs.ext3 /dev/sdc1
mke2fs 1.42 (29-Nov-2011)
/dev/sdc1 is apparently in use by the system; will not make a filesystem here!

...mount claims that sdc1 is unmounted, but we can't make a filesystem on it

Issue affects these machines:
dell-per210-01.lab.eng.brq.redhat.com 
ibm-x3650m3-01.lab.eng.brq.redhat.com

We have no problems on:
hp-dl360g6-02.rhts.eng.brq.redhat.com 

All affected machines use the deadline I/O scheduler by default. The unaffected machine uses cfq.
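
For comparing machines, the active scheduler can be read from sysfs, where the file /sys/block/<disk>/queue/scheduler lists the available schedulers with the active one in square brackets. The following Python sketch assumes that layout; the function names are made up for illustration.

```python
def active_scheduler(text):
    """Return the bracketed (active) scheduler from a sysfs scheduler line.

    Returns None if no entry is bracketed.
    """
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def read_active_scheduler(disk, sysfs_root="/sys/block"):
    """Read the active scheduler for a whole disk such as 'sdc'."""
    with open(f"{sysfs_root}/{disk}/queue/scheduler") as f:
        return active_scheduler(f.read())
```

Note that the scheduler is a property of the whole disk (sdc), not of a partition (sdc1).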

Comment 15 Eric Sandeen 2012-06-08 14:51:46 UTC
Very strange.  I wouldn't expect the scheduler to matter.  I'll have to try to recreate here I guess.  Or - any idea when this started?  Must be a regression?

Comment 16 Adam Okuliar 2012-06-08 14:58:36 UTC
Eric, we can loan you a machine for investigation, or we can do the bisection ourselves. Which is better for you?

Comment 17 Eric Sandeen 2012-06-08 15:18:39 UTC
I would be happy to have you do some bisection :)  (or maybe pursue the scheduler theory by trying cfq on the dell & ibm boxes?  I'd be surprised, but who knows)

Alternatively maybe a crashdump would be something to look at, perhaps we can figure out what still has hold of the device... (or, get it into that state again and I could try to poke around with crash on the live box).

Comment 18 Kamil Kolakowski 2012-06-11 08:06:44 UTC
Hi Eric,

We tried changing the I/O scheduler on the dell and ibm boxes, but nothing changed. mkfs still won't recreate the filesystem.

My guess is that deadline is used by default on those "enterprise" boxes, and the box that has cfq works.

I think it could be something that is installed or set up when a box is detected as "enterprise".

Just suggestions.

We will do the bisection.

Thanks
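
The bisection mentioned above could be organized roughly like the Python sketch below. It assumes an ordered list of builds in which the first is known good and the last is known bad, and a caller-supplied `is_bad` check (hypothetical here; in practice it would mean installing the build, running umount and mkfs.ext3 on the partition, and looking for the "apparently in use" message).

```python
def first_bad(builds, is_bad):
    """Return the first build for which is_bad() is true.

    Assumes builds are in chronological order, builds[0] is good,
    builds[-1] is bad, and the bug does not come and go in between.
    """
    lo, hi = 0, len(builds) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid
        else:
            lo = mid
    return builds[hi]
```

Each step halves the remaining range, so even a long list of builds needs only a handful of installs.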

Comment 19 Eric Sandeen 2012-06-11 19:46:26 UTC
Is multipath in the setup perhaps?

I suppose at this point loaning me a machine might be a decent way to go.

-Eric
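
To check the multipath question, one option is to look at the `holders` directory that sysfs exposes for each block device; entries there (for example dm-0) are devices stacked on top of it, such as device-mapper or multipath maps. This is a minimal Python sketch assuming the /sys/class/block/<name>/holders layout; the function name is made up for illustration.

```python
import os


def holders(devname, sysfs_root="/sys/class/block"):
    """Return names of block devices stacked on top of `devname` (e.g. dm-0)."""
    path = os.path.join(sysfs_root, devname, "holders")
    try:
        return sorted(os.listdir(path))
    except FileNotFoundError:
        return []  # no such device, or sysfs not mounted
```

A non-empty result from `holders("sdc1")` or `holders("sdc")` would explain a busy device that no longer shows up in `mount`.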

Comment 20 Matus Kocka 2012-06-11 21:05:19 UTC
ibm-x3650m3-01.lab.eng.brq.redhat.com is now loaned to you 


Matus

Comment 21 Kamil Kolakowski 2012-06-13 10:45:22 UTC
Hi Eric,

I'm now testing this bz on dell-per210-01.lab.eng.brq.redhat.com with RHEL-7.0-20120612.n.1. It looks like this problem is fixed. I can create a filesystem without any problems.

If you sign off the ibm box, I will retest it on that machine as well.

Thanks

K

Comment 22 Kamil Kolakowski 2012-06-20 13:34:59 UTC
On ibm-x3650m3-01.lab.eng.brq.redhat.com and RHEL-7.0-20120612.n.1 it was retested. Fixed. Closing.