Bug 208514
| Summary: | singlenode gfs2 blows up when running fsx | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Dave Jones <davej> |
| Component: | kernel | Assignee: | Ben Marzinski <bmarzins> |
| Status: | CLOSED DUPLICATE | QA Contact: | Brian Brock <bbrock> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 5.0 | CC: | cluster-maint, nobody+wcheng, pfrields, wtogami |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2007-06-29 17:26:44 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 231239 | | |
| Bug Blocks: | 204760 | | |
| Attachments: | | | |
Description
Dave Jones
2006-09-28 23:27:19 UTC
I rebooted into the 2699 kernel which has a slightly newer gfs2.. it got a bit further..

fsx foo
truncating to largest ever: 0x13e76
truncating to largest ever: 0x2e52c
truncating to largest ever: 0x3c2c2
truncating to largest ever: 0x3f15f
truncating to largest ever: 0x3fcb9
truncating to largest ever: 0x3fe96
truncating to largest ever: 0x3ff9d
Bus error (core dumped)

dmesg has..

SELinux: initialized (dev loop0, type gfs2), uses xattr
attempt to access beyond end of device
loop0: rw=1, want=4090720, limit=409600
Buffer I/O error on device loop0, logical block 51133
lost page write due to I/O error on loop0
attempt to access beyond end of device
loop0: rw=1, want=3675240, limit=409600
Buffer I/O error on device loop0, logical block 51044
lost page write due to I/O error on loop0
attempt to access beyond end of device
loop0: rw=1, want=815376, limit=409600
Buffer I/O error on device loop0, logical block 50960
lost page write due to I/O error on loop0

I unmounted, and started over with the dd/mkfs/mount/fsx. This time fsx spewed pages and pages of errors, with another..

attempt to access beyond end of device
loop1: rw=1, want=814672, limit=409600
Buffer I/O error on device loop1, logical block 50916
lost page write due to I/O error on loop1

in dmesg.

I'll attach the fsx log if it's requested, but right now, I've not seen it last more than a few seconds, so this should be easily reproducible.

The fsx failure ended with ..

14718(126 mod 256): READ 0xd2d4 thru 0x15384 (0x80b1 bytes)
14719(127 mod 256): MAPREAD 0x5773 thru 0xe481 (0x8d0f bytes)
14720(128 mod 256): WRITE 0x2d330 thru 0x39df8 (0xcac9 bytes) HOLE
save_buffer write: No space left on device

df showed ..

/dev/loop1 210M 70M 141M 34% /mnt/test

An error in space accounting with sparse files maybe ?

See also bz #205307 which I think I'll close as a dup of this one.
I suspect, though, that the original problem described in 205307 is now fixed, but there is obviously still a problem here and this bug has the most up-to-date information in it.

*** Bug 205307 has been marked as a duplicate of this bug. ***

Ben, I believe that we fixed the bug reported here some time ago, but if you could confirm that 100% that would be good. Also, Dave has suggested there might be a problem relating to accounting for blocks, so if you could run a few tests creating and removing files and check that we get back all the space that was allocated when things are deleted, then we can either mark this as closed or fix the problem as appropriate.

This appears to still be a problem. At least, using the code from git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes.git, I still see the error from Comment #2, and it does not seem to be a problem in the fsx test. I'm looking into what exactly is going on.

I'm still not sure what's going on in the code, but I've got a simple reproducer. If you just keep alternating between truncating a file down to 0 bytes and writing to it, GFS2 eventually reports no space left on the device. However, the file's size is 0. Running df seems to clear things up.

Created attachment 147705 [details]
simple recreater
This program just repeatedly truncates and then writes to a file. It should run
forever, and does on ext3. On gfs2, the write eventually fails with
errno=ENOSPC
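For readers without access to the attachment, the truncate-then-write loop described above might look roughly like the following. This is only a sketch of the idea, not the attached program: the function name `truncate_write_loop`, the chunk size, and the bounded iteration count are all assumptions made here for illustration.

```python
import os


def truncate_write_loop(path, iterations=1000, chunk_size=64 * 1024):
    """Alternately truncate `path` to 0 bytes and write one chunk to it.

    Returns the number of completed iterations. Any failure is raised as
    OSError; an errno of ENOSPC on the write is the symptom in this report.
    (Hypothetical helper, not the code from attachment 147705.)
    """
    payload = b"\xa5" * chunk_size
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        for _ in range(iterations):
            # Drop all of the file's blocks, then immediately ask for new ones.
            os.ftruncate(fd, 0)
            os.lseek(fd, 0, os.SEEK_SET)
            written = 0
            while written < chunk_size:
                written += os.write(fd, payload[written:])
        return iterations
    finally:
        os.close(fd)
```

Pointing `path` at a file on the mounted gfs2 loop device and raising `iterations` should, if the bug is present, eventually surface an `OSError` whose `errno` is `errno.ENOSPC`; on a filesystem without the problem the call simply returns.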
Created attachment 150919 [details]
patch to flush log when there is no space in any resource groups
When you deallocate blocks in gfs2, you are not able to reuse the space until
the associated resource groups are flushed to the on-disk log. This occasionally
caused gfs2 to act as if there were no available space when there actually was.
This patch causes gfs2 to flush the in-core log if it is unable to find any
resource groups with available space.
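The behaviour described above can be illustrated with a toy model. This is not gfs2 code and does not reflect the actual patch; `ToyAllocator` and all of its methods are hypothetical names, and the model assumes only what the description states: freed blocks are unusable until a log flush, and the allocator flushes once before giving up.

```python
import errno
import os


class ToyAllocator:
    """Toy model only: freed blocks stay unusable until flush_log() runs."""

    def __init__(self, total_blocks):
        self.free_blocks = total_blocks
        self.pending_free = 0  # deallocated, but not yet reusable

    def free(self, count):
        # Deallocation is only recorded in the in-core log at this point.
        self.pending_free += count

    def flush_log(self):
        # After the flush, the pending blocks become allocatable again.
        self.free_blocks += self.pending_free
        self.pending_free = 0

    def alloc(self, count, flush_on_enospc=True):
        # Patched behaviour: flush once before reporting "no space".
        if self.free_blocks < count and flush_on_enospc and self.pending_free:
            self.flush_log()
        if self.free_blocks < count:
            raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
        self.free_blocks -= count
```

With `flush_on_enospc=False` the model reproduces the reported symptom (ENOSPC while space is merely pending); with the default it falls back to a flush and the allocation succeeds.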
Patch applied

Did this get posted to rhkernel-list?

This bug was posted to rhkernel-list as part of bz #239777

*** This bug has been marked as a duplicate of 239777 ***