Bug 663563

| Field | Value |
|---|---|
| Summary | [ext4/xfstests] 011 caused filesystem corruption after running many times in a loop |
| Product | Red Hat Enterprise Linux 5 |
| Component | kernel |
| Version | 5.6 |
| Hardware | All |
| OS | Linux |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Target Milestone | beta |
| Doc Type | Bug Fix |
| Last Closed | 2011-07-21 10:10:54 UTC |
| Reporter | Igor Zhang <yugzhang> |
| Assignee | Lukáš Czerner <lczerner> |
| QA Contact | Petr Beňas <pbenas> |
| CC | branto, eguan, esandeen, kzhang, lczerner, pbenas, pstehlik, qcai, rwheeler |

Description
Igor Zhang
2010-12-16 08:09:30 UTC
After many more runs of the same test on x86_64, I also hit this problem.

Created attachment 473938: fix patch
Hello,
This issue was fixed upstream, then broken, then fixed again, so it is now _fixed_ upstream.
Quoting Eric Sandeen:
This bug was introduced in:
393418676a7602e1d7d3f6e560159c65c8cbd50e ext4: Fix the race between read_inode_bitmap() and ext4_new_inode()
I fixed it in:
7ce9d5d1f3c8736511daa413c64985a05b2feee3 ext4: fix ext4_free_inode() vs. ext4_claim_inode() race
it got broken again in:
955ce5f5be67dfe0d1d096b543af33fe8a1ce3dd ext4: Convert ext4_lock_group to use sb_bgl_lock
and ultimately fixed again in:
d17413c08cd2b1dd2bf2cfdbb0f7b736b2b2b15c ext4: clean up inode bitmaps manipulation in ext4_free_inode
This patch should fix the issue in RHEL5.6. It has been tested on 2.6.18-239.el5 i386 with expected result (no corruption during the test).
Igor, could you please try it on other architectures?
Thanks!
-Lukas
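The commits quoted above concern one invariant: a block group's inode bitmap and the free-inode counter in its group descriptor must change together, even when ext4_free_inode() and ext4_claim_inode() run concurrently. The following is a hypothetical user-space Python model of that invariant, not RHEL or upstream kernel code. InodeGroup, claim, free, counted_free and stress are invented names, and a threading.Lock stands in for the per-group lock. It assumes the fix works by doing the bitmap update and the counter update inside a single critical section; that is an assumption based on the commit titles, not something stated in this report.

```python
import threading
from typing import List


class InodeGroup:
    """Hypothetical model of one block group's inode bookkeeping (not ext4 code)."""

    def __init__(self, inodes_per_group: int) -> None:
        self.bitmap: List[bool] = [False] * inodes_per_group  # True = inode in use
        self.free_count = inodes_per_group  # models the group descriptor's free-inode counter
        self.lock = threading.Lock()  # assumption: stands in for the per-group lock

    def claim(self, index: int) -> bool:
        """Set the bit and decrement the counter in one critical section."""
        with self.lock:
            if self.bitmap[index]:
                return False  # already in use: leave the counter alone
            self.bitmap[index] = True
            self.free_count -= 1
            return True

    def free(self, index: int) -> bool:
        """Clear the bit and increment the counter in the same critical section."""
        with self.lock:
            if not self.bitmap[index]:
                return False  # bit already clear: never bump the counter twice
            self.bitmap[index] = False
            self.free_count += 1
            return True

    def counted_free(self) -> int:
        """Free inodes as a bitmap scan (like fsck Pass 5) would count them."""
        with self.lock:
            return self.bitmap.count(False)


def stress(group: InodeGroup, workers: int = 8, rounds: int = 2000) -> None:
    """Hammer claim/free from several threads on overlapping inode indexes."""
    size = len(group.bitmap)

    def worker(seed: int) -> None:
        for i in range(rounds):
            index = (seed * 31 + i) % size
            if group.claim(index):
                group.free(index)  # only the thread that claimed an index frees it

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    g = InodeGroup(64)
    stress(g)
    # Counter and bitmap agree because both change under one lock.
    print(g.free_count, g.counted_free())  # prints: 64 64
```

The design point the sketch illustrates is that the counter changes only when the bit actually flips, and both happen while the lock is held, so a scan of the bitmap always agrees with free_count.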
On x86_64 with kernel 2.6.18-239.el5, this problem still exists:

    # uname -a
    Linux intel-s3e36-01.rhts.eng.nay.redhat.com 2.6.18-239.el5 #1 SMP Tue Jan 4 13:13:58 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

    FSTYP         -- ext4
    PLATFORM      -- Linux/x86_64 intel-s3e36-01 2.6.18-239.el5
    MKFS_OPTIONS  -- /dev/loop1
    MOUNT_OPTIONS -- -o acl,user_xattr -o context=system_u:object_r:nfs_t:s0 /dev/loop1 /mnt/testarea/scratch

    011 1s ... 1s
    _check_generic_filesystem: filesystem on /dev/loop0 is inconsistent (see 011.full)
    Ran: 011
    Passed all 1 tests

    _check_generic filesystem: filesystem on /dev/loop0 is inconsistent
    *** fsck.ext4 output ***
    fsck 1.39 (29-May-2006)
    e4fsck 1.41.12 (17-May-2010)
    Pass 1: Checking inodes, blocks, and sizes
    Pass 2: Checking directory structure
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Pass 5: Checking group summary information
    Inode bitmap differences: -134175
    Fix? no
    Free inodes count wrong for group #16 (8192, counted=8191).
    Fix? no
    Free inodes count wrong (327669, counted=327668).
    Fix? no
    /dev/loop0: ********** WARNING: Filesystem still has errors **********
    /dev/loop0: 11/327680 files (0.0% non-contiguous), 55902/1310720 blocks
    *** end fsck.ext4 output
    *** mount output ***
    /dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
    proc on /proc type proc (rw)
    sysfs on /sys type sysfs (rw)
    devpts on /dev/pts type devpts (rw,gid=5,mode=620)
    /dev/sda1 on /boot type ext3 (rw)
    tmpfs on /dev/shm type tmpfs (rw)
    none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
    sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
    /dev/loop1 on /mnt/testarea/scratch type ext4 (rw,acl,user_xattr,context="system_u:object_r:nfs_t:s0")
    *** end mount output

(In reply to comment #5)
> On x86_64 with kernel 2.6.18-239.el5, this problem still exists:

Are you sure you applied the patch above? Sorry, but this is not clear from your comment.
Anyway, with that patch I am not able to reproduce the corruption you saw. However, I am seeing a different corruption:

    EXT4-fs error (device sdb): file system corruption: inode #3538947 logical block 2 mapped to 925434301 (size 1)

This is somewhat worrisome, but I guess it belongs in a new BZ. I need to look at it more closely.

-Lukas

(In reply to comment #8)
> Are you sure you have applied the patch above? I am sorry but this is not
> clear from your comment.

Quoting your comment 4: "This patch should fix the issue in RHEL5.6. It has been tested on 2.6.18-239.el5 i386 with expected result..." From that I understood the fix was already in kernel 2.6.18-239.el5, so I tested against that kernel only. According to http://intranet.corp.redhat.com/ic/intranet/RHEL5ChangeLog2#23X.el5, the patch you mentioned has not been included in any kernel since 2.6.18-236.el5. I'll retest this problem when a kernel build containing the fix is released.

The patch mentioned in comment 4 definitely fixes this bug. The other problem I saw is unrelated to this one and is not reproducible outside my environment. Thanks!

-Lukas

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
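The fsck.ext4 output in the x86_64 report (comment 5) points to a one-bit disagreement between an inode bitmap and its group descriptor. e2fsck Pass 5 ("Checking group summary information") compares the free counts recorded in the group descriptors with what the bitmaps imply, and a leading "-" in "Inode bitmap differences" marks a bit that is set on disk but should be clear. The following is a hypothetical Python model of that comparison, not e2fsck's implementation. locate_inode and pass5_group_check are invented names, and 8192 inodes per group is an assumption consistent with "group #16 (8192, ...)" and the 327680 total inodes in the report.

```python
from typing import List, Optional, Tuple


def locate_inode(ino: int, inodes_per_group: int) -> Tuple[int, int]:
    """Map a 1-based ext4 inode number to (block group, bit index in that group's inode bitmap)."""
    return (ino - 1) // inodes_per_group, (ino - 1) % inodes_per_group


def pass5_group_check(group: int, bitmap: List[bool], recorded_free: int) -> Optional[str]:
    """Compare a descriptor's free-inode count with the count derived from its bitmap.

    Models only the comparison; the message format copies the line in the report.
    """
    counted = bitmap.count(False)  # False = bit clear = inode free
    if counted == recorded_free:
        return None
    return f"Free inodes count wrong for group #{group} ({recorded_free}, counted={counted})."


if __name__ == "__main__":
    INODES_PER_GROUP = 8192  # assumed: matches "group #16 (8192, ...)" in the report
    group, index = locate_inode(134175, INODES_PER_GROUP)
    print(group, index)  # prints: 16 3102

    # The descriptor says group 16 is fully free, but one stray bit is still set.
    bitmap = [False] * INODES_PER_GROUP
    bitmap[index] = True
    print(pass5_group_check(group, bitmap, recorded_free=INODES_PER_GROUP))
    # prints: Free inodes count wrong for group #16 (8192, counted=8191).
```

Under that assumption, inode 134175 lands in group 16, which matches the group e2fsck complains about, and the filesystem-wide totals (327669 recorded vs 327668 counted) are off by the same single inode.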
Requesting blocker since this is filesystem corruption.

*** Bug 667762 has been marked as a duplicate of this bug. ***

The fix is available in kernel-2.6.18-245.el5. You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcome.

Reproduced in 2.6.18-244.el5 and verified in 2.6.18-245.el5.

An advisory has been issued which should resolve the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html