Bug 828545 - can't recreate filesystem
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: e2fsprogs
Version: 7.0
Hardware: x86_64 Linux
Priority: high, Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Eric Sandeen
QA Contact: BaseOS QE - Apps
Reported: 2012-06-04 17:24 EDT by Matus Kocka
Modified: 2012-06-20 09:34 EDT (History)

Doc Type: Bug Fix
Last Closed: 2012-06-20 09:34:59 EDT
Type: Bug

Attachments
strace mkfs -V -t ext2 -F /dev/sdc2 (13.67 KB, text/plain)
2012-06-04 18:24 EDT, Matus Kocka

Description Matus Kocka 2012-06-04 17:24:58 EDT
How reproducible:
always

Steps to Reproduce:
1. umount /dev/sdc1
2. mkfs.ext3 /dev/sdc1
Actual results:
/dev/sdc1 is apparently in use by the system; will not make a filesystem here!

Expected results:
new filesystem

Additional info:
e2fsprogs-1.42-4.el7.x86_64
3.3.0-0.13.el7.x86_64

mkfs.btrfs works
Comment 1 Eric Sandeen 2012-06-04 17:37:11 EDT
strace the mkfs please?

It seems most likely that the device _is_ in use, and mkfs.btrfs isn't checking for that.

When you say "umount /dev/sdc1" was it really previously mounted?

Is it part of an LVM physical volume or anything like that?

It seems unlikely that this is an e2fsprogs bug, but let's get to the bottom of it.
Comment 2 Matus Kocka 2012-06-04 18:23:27 EDT
Attached.

Yes, it was mounted before.

$ vgdisplay 
No volume groups found
Comment 3 Matus Kocka 2012-06-04 18:24:11 EDT
Created attachment 589286
strace mkfs -V -t ext2 -F /dev/sdc2
Comment 6 Eric Sandeen 2012-06-04 18:53:53 EDT
The fact that it's still mountable as btrfs makes me think that this might be a btrfs bug...

When unmounted, I can't unload the btrfs module, either.

I notice this:

[ 1131.481762] btrfs: disk space caching is enabled
[ 1131.490906] btrfs bad fsid on block 20971520
[ 1131.495622] btrfs bad fsid on block 20971520
[ 1131.500439] btrfs bad fsid on block 20971520
[ 1131.505136] btrfs bad fsid on block 20971520
[ 1131.509953] btrfs bad fsid on block 20971520
[ 1131.514652] btrfs bad fsid on block 20971520
[ 1131.519328] btrfs bad fsid on block 20971520
[ 1131.523636] btrfs: failed to read chunk root on sdc2
[ 1131.530084] btrfs: open_ctree failed

and I wonder if maybe there is an error path that doesn't drop a reference.... Josef?
Comment 7 Matus Kocka 2012-06-04 19:03:01 EDT
It is btrfs now, but it was originally created as ext4 by kickstart. After that I tried to create ext2, ext3, ext4 and xfs (these fail) and then btrfs, which ends with success, but the device can still be mounted again.
This sequence comes from the postmark file-system performance test.
Comment 8 Eric Sandeen 2012-06-04 19:06:21 EDT
mkfs.btrfs succeeds because it doesn't try to open exclusively; it probably should:

$ strace mkfs.btrfs /dev/sdc2 2>&1 | grep open | grep sdc2
open("/dev/sdc2", O_RDONLY)             = 3
open("/dev/sdc2", O_RDWR)               = 3
open("/dev/sdc2", O_RDWR|O_CREAT, 0600) = 5
open("/dev/sdc2", O_RDWR)               = 6

where is the script that drove the earlier tests?

Josef maybe this isn't your bug after all ;)
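
For context on the difference above: on Linux 2.6 and later, opening a block device with O_EXCL (without O_CREAT) fails with EBUSY while the kernel holds the device exclusively, for example for a mounted filesystem. A plain O_RDWR open, as in the mkfs.btrfs strace, does not perform that check. The following is a minimal Python sketch of such a probe, not the actual mke2fs code; the helper name `device_is_busy` and its `opener` parameter are hypothetical and exist only for illustration.

```python
import errno
import os


def device_is_busy(path, opener=os.open):
    """Return True if an exclusive open of `path` fails with EBUSY.

    `opener` is a hypothetical hook (defaults to os.open) so the logic
    can be exercised without a real block device.
    """
    try:
        # O_EXCL without O_CREAT is only meaningful for block devices on Linux.
        fd = opener(path, os.O_RDONLY | os.O_EXCL)
    except OSError as exc:
        if exc.errno == errno.EBUSY:
            return True
        raise  # other errors (ENOENT, EACCES, ...) do not mean "busy"
    os.close(fd)
    return False
```

Run as root, `device_is_busy("/dev/sdc1")` would be expected to return True for a device in the state described in this bug, assuming the kernel still holds it exclusively.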
Comment 9 Matus Kocka 2012-06-04 19:15:34 EDT
The test itself:
/mnt/tests/performance/postmark_devel_with_library/certification/runtest.sh

Also using library:
/mnt/tests/performance/common_functions/lib/common-performance-functions.sh
Comment 10 Eric Sandeen 2012-06-04 19:18:11 EDT
Did that test run leave logs somewhere?
Comment 11 Eric Sandeen 2012-06-04 19:28:48 EDT
ok I see:

ext2     not run due to mkfs/mount issue
ext3     not run due to mkfs/mount issue
ext4     not run due to mkfs/mount issue
xfs      not run due to mkfs/mount issue
Comment 12 Eric Sandeen 2012-06-04 19:35:24 EDT
Hm I wonder if you can reproduce this on another machine too?

It looks like /dev/sdc2 was mounted early in the boot process.

Does the script properly unmount it before running the mkfs tests?

I wonder what happens if mkfs.btrfs is pointed at a mounted ext4 filesystem...
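
One way a test script could confirm that the device is really gone from the mount table before calling mkfs is to look it up in /proc/mounts. This is only a sketch under stated assumptions: the function names `mounted_at` and `read_mounts` are hypothetical, the device is assumed to be listed under the same path that is passed in (not a by-label or by-uuid alias), and mount points containing escaped spaces are not decoded.

```python
def mounted_at(device, mounts_text):
    """Return the mount points where `device` appears in /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Line format: device mountpoint fstype options dump pass
        if len(fields) >= 2 and fields[0] == device:
            points.append(fields[1])
    return points


def read_mounts(path="/proc/mounts"):
    """Read the kernel's current mount table as text."""
    with open(path) as fh:
        return fh.read()
```

A script could then refuse to run mkfs when `mounted_at("/dev/sdc2", read_mounts())` is non-empty. An empty result only shows that the mount table no longer lists the device; it does not prove that nothing else holds it.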
Comment 14 Adam Okuliar 2012-06-08 10:40:53 EDT
We create a custom filesystem layout during installation according to these kickstart instructions:

part /boot --fstype ext2 --size=200 --asprimary --label=BOOT --ondisk=sda
part /mnt/tests --fstype=ext4 --size=40960 --asprimary --label=MNT --ondisk=sda
part / --fstype=ext4 --size=1 --grow --asprimary --label=ROOT  --ondisk=sda

part /RHTSspareLUN1 --fstype=ext4 --size=20480 --asprimary --label=sdc_20GB --ondisk=sdc
part /RHTSspareLUN2 --fstype=ext4 --size=1 --grow --asprimary --label=sdc_rest --ondisk=sdc
part /RHTSspareLUN3 --fstype=ext4 --size=1 --grow --asprimary --label=sdb --ondisk=sdb
part /RHTSspareLUN4 --fstype=ext4 --size=1 --grow --asprimary --label=sdb --ondisk=sdd

After installation, the sdc1 partition is properly formatted and mounted:
$ mount | grep sdc
/dev/sdc2 on /RHTSspareLUN2 type ext4 (rw,relatime,seclabel,user_xattr,barrier=1,data=ordered)
/dev/sdc1 on /RHTSspareLUN1 type ext4 (rw,relatime,seclabel,user_xattr,barrier=1,data=ordered)


The problems start when we try to unmount and reformat sdc1:
$ umount /dev/sdc1
$ mount | grep sdc
/dev/sdc2 on /RHTSspareLUN2 type ext4 (rw,relatime,seclabel,user_xattr,barrier=1,data=ordered)

...sdc1 seems to be unmounted, so let's try to reformat the partition as ext3

$ mkfs.ext3 /dev/sdc1
mke2fs 1.42 (29-Nov-2011)
/dev/sdc1 is apparently in use by the system; will not make a filesystem here!

...mount claims that sdc1 is unmounted, but we can't make a filesystem on it

Issue affects these machines:
dell-per210-01.lab.eng.brq.redhat.com 
ibm-x3650m3-01.lab.eng.brq.redhat.com

We have no problems on:
hp-dl360g6-02.rhts.eng.brq.redhat.com 

All affected machines use the deadline I/O scheduler by default. The unaffected machine uses cfq.
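
For anyone comparing machines, the active scheduler of a disk can be read from sysfs, where the file /sys/block/<disk>/queue/scheduler lists the available schedulers with the active one in square brackets (for example `noop [deadline] cfq`). A small sketch follows; the function names and the `sysfs_root` parameter are hypothetical and exist only to keep the example self-contained.

```python
import re


def active_scheduler(text):
    """Return the bracketed scheduler name from text like 'noop [deadline] cfq'."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def scheduler_for(disk, sysfs_root="/sys"):
    """Read the active I/O scheduler for a whole disk such as 'sdc'."""
    with open(f"{sysfs_root}/block/{disk}/queue/scheduler") as fh:
        return active_scheduler(fh.read())
```

The scheduler is a property of the whole disk, so the argument would be `sdc` rather than `sdc1`.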
Comment 15 Eric Sandeen 2012-06-08 10:51:46 EDT
Very strange.  I wouldn't expect the scheduler to matter.  I'll have to try to recreate here I guess.  Or - any idea when this started?  Must be a regression?
Comment 16 Adam Okuliar 2012-06-08 10:58:36 EDT
Eric, we can loan you a machine for investigation, or we can do the bisection ourselves. Which is better for you?
Comment 17 Eric Sandeen 2012-06-08 11:18:39 EDT
I would be happy to have you do some bisection :)  (or maybe pursue the scheduler theory by trying cfq on the dell & ibm boxes?  I'd be surprised, but who knows)

Alternatively maybe a crashdump would be something to look at, perhaps we can figure out what still has hold of the device... (or, get it into that state again and I could try to poke around with crash on the live box).
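
A cheaper first check than a crashdump, sketched here under assumptions: sysfs exposes a `holders` directory for each block device, listing other block devices stacked on top of it (device-mapper or md devices, for instance). It does not show mounts or processes that have the device open, so an empty list is not conclusive. The function name `holders` and the `sysfs_root` parameter are hypothetical.

```python
import os


def holders(partition, sysfs_root="/sys"):
    """List block devices stacked on `partition` (e.g. 'sdc1') according to sysfs."""
    path = os.path.join(sysfs_root, "class", "block", partition, "holders")
    try:
        return sorted(os.listdir(path))
    except FileNotFoundError:
        # No such device, or sysfs is not mounted at sysfs_root.
        return []
```

If `holders("sdc1")` returned something like a `dm-*` entry, that would point at a device-mapper target claiming the partition.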
Comment 18 Kamil Kolakowski 2012-06-11 04:06:44 EDT
Hi Eric,

We tried changing the I/O scheduler on the Dell and IBM boxes, but nothing changed. It still won't recreate the filesystem.

My guess is that deadline is used by default on those "enterprise" boxes, and the box that uses cfq works.

I think it could be something that is installed or set up when a box is detected as "enterprise".

Just suggestions.

We will do the bisection.

Thanks
Comment 19 Eric Sandeen 2012-06-11 15:46:26 EDT
Is multipath in the setup perhaps?

I suppose at this point loaning me a machine might be a decent way to go.

-Eric
Comment 20 Matus Kocka 2012-06-11 17:05:19 EDT
ibm-x3650m3-01.lab.eng.brq.redhat.com is now loaned to you.


Matus
Comment 21 Kamil Kolakowski 2012-06-13 06:45:22 EDT
Hi Eric,

I'm now testing this bz on dell-per210-01.lab.eng.brq.redhat.com with RHEL-7.0-20120612.n.1. It looks like this problem is fixed. I can create a filesystem without any problems.

If you sign off the IBM box, I will retest on that machine as well.

Thanks

K
Comment 22 Kamil Kolakowski 2012-06-20 09:34:59 EDT
Retested on ibm-x3650m3-01.lab.eng.brq.redhat.com with RHEL-7.0-20120612.n.1. Fixed. Closing.
