Bug 828545
| Summary: | can't recreate filesystem | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Matus Kocka <mkocka> |
| Component: | e2fsprogs | Assignee: | Eric Sandeen <esandeen> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | BaseOS QE - Apps <qe-baseos-apps> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.0 | CC: | aokuliar, kkolakow |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2012-06-20 13:34:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Description
Matus Kocka
2012-06-04 21:24:58 UTC
strace the mkfs please? It seems most likely that the device _is_ in use, and mkfs.btrfs isn't checking for that. When you say "umount /dev/sdc1", was it really previously mounted? Is it part of an LVM physical volume or anything like that? It seems unlikely that this is an e2fsprogs bug, but let's get to the bottom of it.

Attached. Yes, it was mounted before.

$ vgdisplay
No volume groups found

Created attachment 589286 [details]
strace mkfs -V -t ext2 -F /dev/sdc2
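
For context on how a device can look unmounted and still be refused: on Linux, opening a block device with O_EXCL (without O_CREAT) fails with EBUSY when the kernel considers the device in use, and a mkfs tool can use that as its in-use check. The sketch below only illustrates that check. It is not code from e2fsprogs or btrfs-progs, and the function name `device_in_use` is hypothetical.

```python
import errno
import os


def device_in_use(path):
    """Return True if an exclusive open of `path` fails with EBUSY.

    Assumption: Linux semantics, where O_EXCL without O_CREAT is meaningful
    only for block devices. Any other error (missing path, no permission)
    is re-raised rather than interpreted.
    """
    try:
        fd = os.open(path, os.O_RDONLY | os.O_EXCL)
    except OSError as exc:
        if exc.errno == errno.EBUSY:
            return True
        raise
    os.close(fd)
    return False
```

Calling `device_in_use("/dev/sdc1")` would normally need root. A True result only says that something in the kernel still holds the device; it does not say what.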
The fact that it's still mountable as btrfs makes me think that this might be a btrfs bug...

When unmounted, I can't unload the btrfs module, either.

I notice this:

[ 1131.481762] btrfs: disk space caching is enabled
[ 1131.490906] btrfs bad fsid on block 20971520
[ 1131.495622] btrfs bad fsid on block 20971520
[ 1131.500439] btrfs bad fsid on block 20971520
[ 1131.505136] btrfs bad fsid on block 20971520
[ 1131.509953] btrfs bad fsid on block 20971520
[ 1131.514652] btrfs bad fsid on block 20971520
[ 1131.519328] btrfs bad fsid on block 20971520
[ 1131.523636] btrfs: failed to read chunk root on sdc2
[ 1131.530084] btrfs: open_ctree failed

and I wonder if maybe there is an error path that doesn't drop a reference.... Josef?

It is btrfs now, but it was created as ext4 from kickstart. After that I tried to create ext2, ext3, ext4 and xfs (all fail) and btrfs, which ends with success, but it can still be mounted again. This is because of the postmark file-system performance test.

(In reply to comment #6)
> The fact that it's still mountable as btrfs makes me think that this might
> be a btrfs bug...
>
> When unmounted, I can't unload the btrfs module, either.
>
> I notice this:
>
> [ 1131.481762] btrfs: disk space caching is enabled
> [ 1131.490906] btrfs bad fsid on block 20971520
> [ 1131.495622] btrfs bad fsid on block 20971520
> [ 1131.500439] btrfs bad fsid on block 20971520
> [ 1131.505136] btrfs bad fsid on block 20971520
> [ 1131.509953] btrfs bad fsid on block 20971520
> [ 1131.514652] btrfs bad fsid on block 20971520
> [ 1131.519328] btrfs bad fsid on block 20971520
> [ 1131.523636] btrfs: failed to read chunk root on sdc2
> [ 1131.530084] btrfs: open_ctree failed
>
> and I wonder if maybe there is an error path that doesn't drop a
> reference.... Josef?

mkfs.btrfs succeeds because it doesn't try to open exclusively; it probably should:

$ strace mkfs.btrfs /dev/sdc2 2>&1 | grep open | grep sdc2
open("/dev/sdc2", O_RDONLY) = 3
open("/dev/sdc2", O_RDWR) = 3
open("/dev/sdc2", O_RDWR|O_CREAT, 0600) = 5
open("/dev/sdc2", O_RDWR) = 6

Where is the script that drove the earlier tests?

Josef, maybe this isn't your bug after all ;)

The test itself:
/mnt/tests/performance/postmark_devel_with_library/certification/runtest.sh

Also using library:
/mnt/tests/performance/common_functions/lib/common-performance-functions.sh

Did that test run leave logs somewhere?

OK, I see:

ext2 not run due to mkfs/mount issue
ext3 not run due to mkfs/mount issue
ext4 not run due to mkfs/mount issue
xfs not run due to mkfs/mount issue

Hm, I wonder if you can reproduce this on another machine too?

It looks like /dev/sdc2 was mounted early in the boot process. Does the script properly unmount it before running the mkfs tests? I wonder what happens if mkfs.btrfs is pointed at a mounted ext4 filesystem...

We are creating a custom filesystem layout during installation according to these kickstart instructions:

part /boot --fstype ext2 --size=200 --asprimary --label=BOOT --ondisk=sda
part /mnt/tests --fstype=ext4 --size=40960 --asprimary --label=MNT --ondisk=sda
part / --fstype=ext4 --size=1 --grow --asprimary --label=ROOT --ondisk=sda
part /RHTSspareLUN1 --fstype=ext4 --size=20480 --asprimary --label=sdc_20GB --ondisk=sdc
part /RHTSspareLUN2 --fstype=ext4 --size=1 --grow --asprimary --label=sdc_rest --ondisk=sdc
part /RHTSspareLUN3 --fstype=ext4 --size=1 --grow --asprimary --label=sdb --ondisk=sdb
part /RHTSspareLUN4 --fstype=ext4 --size=1 --grow --asprimary --label=sdb --ondisk=sdd

After installation, sdc1 is properly formatted and mounted.

$ mount | grep sdc
/dev/sdc2 on /RHTSspareLUN2 type ext4 (rw,relatime,seclabel,user_xattr,barrier=1,data=ordered)
/dev/sdc1 on /RHTSspareLUN1 type ext4 (rw,relatime,seclabel,user_xattr,barrier=1,data=ordered)

Problems start when we try to unmount and reformat sdc1:

$ umount /dev/sdc1
$ mount | grep sdc
/dev/sdc2 on /RHTSspareLUN2 type ext4 (rw,relatime,seclabel,user_xattr,barrier=1,data=ordered)

...sdc1 seems to be unmounted, so let's try to reformat the partition as ext3:

$ mkfs.ext3 /dev/sdc1
mke2fs 1.42 (29-Nov-2011)
/dev/sdc1 is apparently in use by the system; will not make a filesystem here!

...mount claims that sdc1 is unmounted, but we can't make a filesystem on it.

The issue affects these machines:
dell-per210-01.lab.eng.brq.redhat.com
ibm-x3650m3-01.lab.eng.brq.redhat.com

We have no problems on:
hp-dl360g6-02.rhts.eng.brq.redhat.com

All affected machines use the deadline I/O scheduler by default. The unaffected machine uses cfq.

Very strange. I wouldn't expect the scheduler to matter. I'll have to try to recreate here, I guess. Or - any idea when this started? Must be a regression?

Eric, we can loan you a machine for investigation, or we can do the bisection ourselves. Which is better for you?

I would be happy to have you do some bisection :) (or maybe pursue the scheduler theory by trying cfq on the dell & ibm boxes? I'd be surprised, but who knows)

Alternatively, maybe a crashdump would be something to look at; perhaps we can figure out what still has hold of the device... (or, get it into that state again and I could try to poke around with crash on the live box).

Hi Eric,
We tried changing the I/O scheduler on the dell and ibm boxes, but nothing changed: the partition still can't be recreated. My guess is that deadline is used by default on those "enterprise" boxes, and the box which has cfq works. I think it could be something that is installed or set up when a box is detected as "enterprise". Just suggestions. We will do the bisection.
Thanks

Is multipath in the setup perhaps?
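
The questions above (is it still mounted, is LVM or multipath sitting on top of it) can be checked together. The following is a rough sketch, assuming a Linux system where `/proc/mounts` lists active mounts and `/sys/class/block/<name>/holders` lists devices stacked on a block device (for example a device-mapper node). The helper names `mounted_at` and `holders` are hypothetical and not part of any tool mentioned in this report.

```python
import os


def mounted_at(device, mounts_text):
    """Return the mount points listed for `device` in /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Format: device mountpoint fstype options dump pass
        if len(fields) >= 2 and fields[0] == device:
            points.append(fields[1])
    return points


def holders(device, sysfs_root="/sys/class/block"):
    """List holders of a block device, e.g. ['dm-0'] for LVM or multipath.

    Returns an empty list when the sysfs directory does not exist.
    """
    path = os.path.join(sysfs_root, os.path.basename(device), "holders")
    try:
        return sorted(os.listdir(path))
    except FileNotFoundError:
        return []
```

A possible use on an affected box would be `mounted_at("/dev/sdc1", open("/proc/mounts").read())` followed by `holders("/dev/sdc1")`. If both come back empty while mkfs still reports the device as in use, the remaining reference is held somewhere these two views do not show.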

I suppose at this point loaning me a machine might be a decent way to go.
-Eric

ibm-x3650m3-01.lab.eng.brq.redhat.com is now loaned to you.
Matus

Hi Eric,
I'm now testing this bz on dell-per210-01.lab.eng.brq.redhat.com with RHEL-7.0-20120612.n.1. It looks like this problem is fixed. I can create a filesystem without any problems. If you sign off the ibm box, I will retest it on that machine as well.
Thanks
K

Retested on ibm-x3650m3-01.lab.eng.brq.redhat.com with RHEL-7.0-20120612.n.1. Fixed. Closing.