Bug 268521

Summary: RFE: GFS should report block device errors as block device errors, not filesystem errors
Product: Red Hat Enterprise Linux 5
Reporter: Corey Marthaler <cmarthal>
Component: gfs-kmod
Assignee: Robert Peterson <rpeterso>
Status: CLOSED WONTFIX
QA Contact: Cluster QE <mspqa-list>
Severity: low
Docs Contact:
Priority: medium
Version: 5.0
CC: djuran, rwheeler, swhiteho
Target Milestone: ---
Target Release: ---
Hardware: All
OS: All
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-05-11 20:44:25 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions: 
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Corey Marthaler 2007-08-30 19:16:18 UTC
Description of problem:
This appears to be the same as the gfs2 bug 253948. I didn't have the debug or
oops_ok mount option set, so I didn't expect the system to panic due to an I/O
error. Is this a regression? Wasn't this fixed a long time ago?



GFS: fsid=TAFT_CLUSTER:2.3: fatal: assertion "x <= length" failed
GFS: fsid=TAFT_CLUSTER:2.3:   function = blkalloc_internal
GFS: fsid=TAFT_CLUSTER:2.3:   file =
/builddir/build/BUILD/gfs-kmod-0.1.19/_kmod_build_/src/gfs8
GFS: fsid=TAFT_CLUSTER:2.3:   time = 1188489777
GFS: fsid=TAFT_CLUSTER:2.3: about to withdraw from the cluster
GFS: fsid=TAFT_CLUSTER:2.3: telling LM to withdraw
Aug 30 11:02:57 taft-03 kernel: GFS: fsid=TAFT_CLUSTER:2.3: fatal: assertion "x
<= length" failed
Aug 30 11:02:57 taft-03 kernel: GFS: fsid=TAFT_CLUSTER:2.3:   function =
blkalloc_internal
Aug 30 11:02:57 taft-03 kernel: GFS: fsid=TAFT_CLUSTER:2.3:   file =
/builddir/build/BUILD/gfs-8
Aug 30 11:02:57 taft-03 kernel: GFS: fsid=TAFT_CLUSTER:2.3:   time = 1188489777
Aug 30 11:02:57 taft-03 kernel: GFS: fsid=TAFT_CLUSTER:2.3: about to withdraw
from the cluster
Aug 30 11:02:57 taft-03 kernel: GFS: fsid=TAFT_CLUSTER:2.3: telling LM to withdraw
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at fs/locks.c:1991
invalid opcode: 0000 [1] SMP
last sysfs file: /fs/gfs/TAFT_CLUSTER:4/lock_module/recover_done
CPU 0
Modules linked in: gfs(U) autofs4 hidp rfcomm l2cap bluetooth lock_dlm gfs2 dlm
configfs sunrpcd
Pid: 10460, comm: doio Not tainted 2.6.18-40.el5 #1
RIP: 0010:[<ffffffff8002703d>]  [<ffffffff8002703d>] locks_remove_flock+0xe4/0x122
RSP: 0018:ffff8101ceccddb8  EFLAGS: 00010246
RAX: ffff81021487f6b8 RBX: ffff81019b2e1380 RCX: ffff8101ceccddb8
RDX: 0000000000000000 RSI: ffff8101ceccddb8 RDI: ffffffff802fdea0
RBP: ffff81021aa89280 R08: 0000000000000000 R09: 0000000000000000
R10: ffff8101ceccddb8 R11: 00000000000000b0 R12: ffff81019b2e1280
R13: ffff81019b2e1280 R14: ffff81021043e780 R15: ffff8101994849c0
FS:  0000000000000000(0000) GS:ffffffff80396000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000304c295770 CR3: 0000000000201000 CR4: 00000000000006e0
Process doio (pid: 10460, threadinfo ffff8101ceccc000, task ffff8101b2e17040)
Stack:  0000000000000000 0000000000000000 0000000000000000 0000000000000000
 0000000000000000 0000000000000000 00000000000028dc 0000000000000000
 0000000000000000 0000000000000000 ffff81021aa89280 0000000000000202
Call Trace:
 [<ffffffff800122c2>] __fput+0x94/0x198
 [<ffffffff800237b1>] filp_close+0x5c/0x64
 [<ffffffff800384cf>] put_files_struct+0x6c/0xc3
 [<ffffffff80014f6e>] do_exit+0x2d2/0x89d
 [<ffffffff80046e11>] cpuset_exit+0x0/0x6c
 [<ffffffff8005b28d>] tracesys+0xd5/0xe0


Code: 0f 0b 68 62 df 28 80 c2 c7 07 48 89 c3 48 8b 03 48 85 c0 75
RIP  [<ffffffff8002703d>] locks_remove_flock+0xe4/0x122
 RSP <ffff8101ceccddb8>
 <0>Kernel panic - not syncing: Fatal exception


Version-Release number of selected component (if applicable):
2.6.18-40.el5
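
For context, the "fatal: assertion ... failed" / "about to withdraw" messages above
are GFS's assertion-failure path: when an internal consistency check fails, GFS logs
the failed assertion plus the function, file and time, then withdraws from the cluster
rather than surfacing the problem as a plain block-device error (which is what this
RFE asks for). The following is a compilable user-space sketch of that pattern, not
the gfs-kmod source; gfs_assert, withdraw and the hard-coded fsid are illustrative
stand-ins only.

#include <stdio.h>
#include <time.h>

/* Print the withdraw messages seen in the console log above. */
static void withdraw(const char *fsid)
{
        printf("GFS: fsid=%s: about to withdraw from the cluster\n", fsid);
        printf("GFS: fsid=%s: telling LM to withdraw\n", fsid);
}

/* Simplified stand-in for GFS's fatal-assertion macro: log the failed
 * check plus function/file/time, then withdraw from the cluster. */
#define gfs_assert(fsid, cond)                                            \
        do {                                                              \
                if (!(cond)) {                                            \
                        printf("GFS: fsid=%s: fatal: assertion \"%s\" "   \
                               "failed\n", (fsid), #cond);                \
                        printf("GFS: fsid=%s:   function = %s\n",         \
                               (fsid), __func__);                         \
                        printf("GFS: fsid=%s:   file = %s\n",             \
                               (fsid), __FILE__);                         \
                        printf("GFS: fsid=%s:   time = %ld\n",            \
                               (fsid), (long)time(NULL));                 \
                        withdraw(fsid);                                   \
                }                                                         \
        } while (0)

int main(void)
{
        unsigned x = 10, length = 4;   /* contrived values that trip the check */
        gfs_assert("TAFT_CLUSTER:2.3", x <= length);
        return 0;
}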

Comment 2 Robert Peterson 2009-05-11 18:25:02 UTC
Hey Corey, I'm just revisiting this bug record.  I should have spotted
this months ago, but:  In the problem description, the kernel panic
shown was due to a BUG() statement in vfs function locks_remove_flock.
In other words it was vfs, not gfs, that decided to panic the kernel.
The question is: Do you still perceive a problem in how GFS handles
block device errors in RHEL5?  If so, can I get a more recent example?
I'm wondering if this should really be against gfs.
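
For reference, the "Kernel BUG at fs/locks.c:1991" in the trace corresponds to a
BUG() call in locks_remove_flock(): at close time the vfs cleans up any flock locks
and leases still attached to the file, and a leftover lock of any other kind is
treated as a filesystem bug and panics the kernel. Below is a compilable, heavily
simplified sketch of that decision with stand-in names and types; it is not the
actual fs/locks.c source.

#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the kernel's BUG(): report the location and abort,
 * which in the kernel becomes the panic shown in the trace. */
#define BUG()                                                          \
        do {                                                           \
                fprintf(stderr, "Kernel BUG at %s:%d\n",               \
                        __FILE__, __LINE__);                           \
                abort();                                               \
        } while (0)

enum lock_type { FL_FLOCK, FL_LEASE, FL_OTHER };

/* Simplified sketch of the close-time cleanup described in comment 2:
 * flock locks are deleted, leases are broken, and anything else still
 * attached to the file is treated as a filesystem bug. */
static void locks_remove_flock(enum lock_type leftover)
{
        switch (leftover) {
        case FL_FLOCK:
                /* delete the flock lock */
                break;
        case FL_LEASE:
                /* break the lease (F_UNLCK) */
                break;
        default:
                BUG();   /* a lock that should not still be here */
        }
}

int main(void)
{
        /* A leftover lock of an unexpected type hits the BUG() path. */
        locks_remove_flock(FL_OTHER);
        return 0;
}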

Comment 3 Corey Marthaler 2009-05-11 20:30:40 UTC
This bug is almost 2 years old; you can probably just close it.

Comment 4 Robert Peterson 2009-05-11 20:44:25 UTC
Thanks, Corey.  I'll close it as WONTFIX.  We can always change
our minds later if it makes sense.