Bug 137333 - Many Oops when attempting to run I/O with latest build
Status: CLOSED CURRENTRELEASE
Product: Red Hat Cluster Suite
Classification: Red Hat
Component: gfs
Version: 3
Hardware: i686 Linux
Priority: medium
Severity: medium
Assigned To: Ken Preslan
QA Contact: GFS Bugs
Blocks: 137339
Reported: 2004-10-27 12:17 EDT by Corey Marthaler
Modified: 2010-01-11 22:00 EST
Doc Type: Bug Fix
Last Closed: 2004-10-28 12:46:44 EDT
Description Corey Marthaler 2004-10-27 12:17:14 EDT
Description of problem:
I get my cluster up, attempt some I/O load, and immediately see a
different Oops on each of the machines. The I/O being run (as can be
seen in the Oops output) is genesis, accordion, and doio/iogen.

Version-Release number of selected component (if applicable):

GFS <CVS> (built Oct 26 2004 16:12:28) installed
CMAN <CVS> (built Oct 26 2004 16:11:37) installed
DLM <CVS> (built Oct 26 2004 16:11:53) installed
Lock_DLM (built Oct 26 2004 16:12:03) installed
Lock_Nolock <CVS> (built Oct 26 2004 16:12:00) installed

How reproducible:
Always
Comment 1 Corey Marthaler 2004-10-27 12:18:49 EDT
morph-01:

Unable to handle kernel NULL pointer dereference at virtual address
00000004
 printing eip:
f89b0f56
*pde = 00000000
Oops: 0002 [#1]
SMP
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs
lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 microcode
dm_mod uhci_hcd ehci_hcd button battery ac ext3 jbd qla2300 qla2xxx
scsi_transport_fc sd_mod scsi_mod
CPU:    1
EIP:    0060:[<f89b0f56>]    Not tainted VLI
EFLAGS: 00010283   (2.6.9)
EIP is at dlm_async+0x146/0x370 [lock_dlm]
eax: 00000000   ebx: f7f7ae80   ecx: f2d2094c   edx: 00000000
esi: f7f7aedc   edi: 00000001   ebp: f7f7aeb0   esp: f4f23f60
ds: 007b   es: 007b   ss: 0068
Process lock_dlm2 (pid: 4473, threadinfo=f4f22000 task=f4fbd250)
Stack: f7f7aeb4 f4f22000 00000000 000000bd 00000000 f6d3fd40 f2d20900
f4f23fcc
       00000000 f4fbd250 c011f2d0 00000000 00000000 00000000 f4f09d10
f4f55d00
       00000000 f4fbd250 c011f2d0 00100100 00200200 39777800 000f4244
f4fbd3b0
Call Trace:
 [<c011f2d0>] default_wake_function+0x0/0x10
 [<c011f2d0>] default_wake_function+0x0/0x10
 [<f89b0e10>] dlm_async+0x0/0x370 [lock_dlm]
 [<c0135bd4>] kthread+0xa4/0xb0
 [<c0135b30>] kthread+0x0/0xb0
 [<c01042b5>] kernel_thread_helper+0x5/0x10
Code: 44 24 13 00 c6 44 24 0d 00 c6 44 24 0e 00 e8 a2 5b 94 c7 8b 4b
34 3b 0c 24 0f 84 36 01 00 00 8d 41 b4 89 44 24 18 8b 51 04 8b 01 <89>
50 04 89 02 c7 41 04 00 02 20 00 c7 01 00 01 10 00 f0 0f ba


morph-03:
Unable to handle kernel NULL pointer dereference at virtual address
00000004
 printing eip:
f8a340d2
*pde = 00000000
Oops: 0002 [#1]
SMP
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs
lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 microcode
dm_mod uhci_hcd ehci_hcd button battery ac ext3 jbd qla2300 qla2xxx
scsi_transport_fc sd_mod scsi_mod
CPU:    1
EIP:    0060:[<f8a340d2>]    Not tainted VLI
EFLAGS: 00010246   (2.6.9)
EIP is at incore_commit+0x52/0x230 [gfs]
eax: 00000000   ebx: f4ac8ddc   ecx: f2805800   edx: f266ec80
esi: f8a35010   edi: f2a34680   ebp: 00000000   esp: f45e7e00
ds: 007b   es: 007b   ss: 0068
Process growfiles (pid: 4117, threadinfo=f45e6000 task=f48693d0)
Stack: 00001000 00000246 f2a346a4 f2a346a4 f8acc000 f8ae8838 f2a34680
f75a75c0
       f8acc000 f8a34452 f45e7e40 f45e7e44 f45e7e48 f8acc000 f2a346b0
00000002
       00000001 ffffffff ffffffff f2a346a4 f2a34680 f2a346a4 f8acc000
f8a4c3dc
Call Trace:
 [<f8a34452>] gfs_log_commit+0x1a2/0x220 [gfs]
 [<f8a4c3dc>] gfs_trans_end+0x6c/0x100 [gfs]
 [<f8a3a753>] gfs_dinode_out+0x833/0x840 [gfs]
 [<f8a3e61e>] do_do_write_buf+0x16e/0x460 [gfs]
 [<f8a2b442>] gfs_glock_nq_m+0x162/0x190 [gfs]
 [<f8a3ea31>] do_write_buf+0x121/0x190 [gfs]
 [<f8a3da02>] walk_vm+0xc2/0x110 [gfs]
 [<f8a3eb36>] gfs_write+0x96/0xe0 [gfs]
 [<f8a3e910>] do_write_buf+0x0/0x190 [gfs]
 [<c015cad1>] vfs_write+0xd1/0x120
 [<c015cbe7>] sys_write+0x47/0x80
 [<c0105f5d>] sysenter_past_esp+0x52/0x71
Code: 26 00 8d bc 27 00 00 00 00 8b 43 f8 31 c9 8d 53 f8 8b 70 0c 85
f6 0f 85 dd 01 00 00 85 c9 74 2e 39 e9 74 2a 8b 51 04 85 ed 8b 01 <89>
50 04 89 02 c7 41 04 00 02 20 00 c7 01 00 01 10 00 0f 84 aa


morph-04:
Unable to handle kernel paging request at virtual address 04b5b0a7
 printing eip:
f8efa0d5
*pde = 00000000
Oops: 0002 [#1]
SMP
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs
lock_harness lpfc ipv6 parport_pc lp parport autofs4 sunrpc e1000
floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery ac ext3
jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    0
EIP:    0060:[<f8efa0d5>]    Not tainted VLI
EFLAGS: 00010246   (2.6.9)
EIP is at incore_commit+0x55/0x230 [gfs]
eax: f7fe9680   ebx: f473bb3c   ecx: f3808b80   edx: 04b5b0a7
esi: f8efb010   edi: f384e680   ebp: 00000000   esp: f3de5e08
ds: 007b   es: 007b   ss: 0068
Process growfiles (pid: 4238, threadinfo=f3de4000 task=f3de3630)
Stack: 00001000 00000246 f384e6a4 f384e6a4 f8a83000 f8a9f838 f384e680
f750c780
       f8a83000 f8efa452 f3de5e48 f3de5e4c f3de5e50 00000000 f384e6b0
00000003
       00000002 ffffffff ffffffff f384e6a4 f384e680 f384e6a4 f8a83000
f8f123dc
Call Trace:
 [<f8efa452>] gfs_log_commit+0x1a2/0x220 [gfs]
 [<f8f123dc>] gfs_trans_end+0x6c/0x100 [gfs]
 [<f8f06b9a>] gfs_create+0x13a/0x1c0 [gfs]
 [<c016b109>] vfs_create+0xa9/0x130
 [<c016b9e0>] open_namei+0x5f0/0x650
 [<c015bb9d>] filp_open+0x2d/0x60
 [<c015be08>] get_unused_fd+0x78/0xd0
 [<c015bf4c>] sys_open+0x3c/0xa0
 [<c0105f5d>] sysenter_past_esp+0x52/0x71
Code: bc 27 00 00 00 00 8b 43 f8 31 c9 8d 53 f8 8b 70 0c 85 f6 0f 85
dd 01 00 00 85 c9 74 2e 39 e9 74 2a 8b 51 04 85 ed 8b 01 89 50 04 <89>
02 c7 41 04 00 02 20 00 c7 01 00 01 10 00 0f 84 aa 01 00 00


morph-05:

Unable to handle kernel paging request at virtual address 064660ac
 printing eip:
f89f21b3
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs
lock_harness lpfc ipv6 parport_pc lp parport autofs4 sunrpc e1000
microcode dm_mod uhci_hcd ehci_hcd button battery ac ext3 jbd qla2300
qla2xxx scsi_transport_fc<1>Unable to handle kernel paging request at
virtual address 00050005
 printing eip:
f8ed278d
*pde = 00000000
 sd_mod scsi_mod
CPU:    0
EIP:    0060:[<f89f21b3>]    Not tainted VLI
EFLAGS: 00010202   (2.6.9)
EIP is at search_resource+0x53/0x70 [lock_dlm]
eax: 064660a8   ebx: 064660a8   ecx: 06475ea7   edx: 064660ac
esi: 06475ea0   edi: 00000000   ebp: f7277230   esp: f166de54
ds: 007b   es: 007b   ss: 0068
Process doio (pid: 4140, threadinfo=f166c000 task=f28acc70)
Stack: f166def4 00000000 f7277238 00000001 f7277180 f89f2215 fffffff4
f166def4
       00000000 00000007 f166def4 f2730910 f89f3d98 f166dec4 00000007
06475ea0
       00000000 f71b3800 f166df2c 02ec02cf 00000000 01eb54eb 00000000
00000001
Call Trace:
 [<f89f2215>] get_resource+0x45/0x190 [lock_dlm]
 [<f89f3d98>] lm_dlm_plock+0x98/0x2e0 [lock_dlm]
 [<f8ec54ff>] do_plock+0xcf/0x110 [gfs]
 [<c010c4d0>] timer_interrupt+0xb0/0x120
 [<f8ec5540>] gfs_lock+0x0/0x70 [gfs]
 [<f8ec55a0>] gfs_lock+0x60/0x70 [gfs]
 [<c017293b>] fcntl_setlk+0x25b/0x2b0
 [<f8943fa0>] e1000_clean+0xa0/0xc0 [e1000]
 [<c011d467>] recalc_task_prio+0x97/0x190
 [<c011ddbd>] finish_task_switch+0x3d/0x90
 [<c02f5f1f>] schedule+0x2ef/0x620
 [<c016e2cc>] do_fcntl+0xdc/0x170
 [<c016e470>] sys_fcntl64+0x90/0xa0
 [<c0105f5d>] sysenter_past_esp+0x52/0x71
Code: 30 8b 78 04 8d 74 26 00 8b 53 10 8b 43 0c 89 d1 31 f9 31 f0 09
c1 75 0b 8b 14 24 8b 42 08 39 43 14 74 1b 8b 53 04 8d 42 fc 89 c3 <8b>
40 04 0f 18 00 90 39 ea 75 d2 31 c0 5a 5b 5e 5f 5d c3 89 d8
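
Triage note on the traces above: on morph-01 the bytes around the
faulting instruction in the Code: line (8b 51 04 8b 01 89 50 04 89 02
c7 41 04 00 02 20 00 c7 01 00 01 10 00) decode to the inlined
list_del() sequence of 2.6-era kernels, including the 0x00100100 and
0x00200200 list-poison constants. The same stores appear on morph-03
and morph-04 with one unrelated test instruction interleaved. The
faulting store is next->prev = prev with next (eax) equal to 0 on
morph-01 and morph-03, which gives the reported address 00000004, and
prev->next = next with a garbage prev (edx = 04b5b0a7) on morph-04.
Below is a minimal userspace sketch of that sequence. It is not the
kernel source; list_entry_is_linked() and the assert are hypothetical
additions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Poison values stored by list_del(); they show up as the immediates
 * 00 01 10 00 and 00 02 20 00 in the Code: bytes. */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0x00100100)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0x00200200)

void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry right after head. */
void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Hypothetical helper (not in the kernel): returns 1 if entry looks
 * like it is on a list, 0 if its pointers are NULL or poisoned. */
int list_entry_is_linked(const struct list_head *entry)
{
    return entry->next != NULL && entry->prev != NULL &&
           entry->next != LIST_POISON1 && entry->prev != LIST_POISON2;
}

void list_del(struct list_head *entry)
{
    struct list_head *prev = entry->prev;  /* mov 0x4(%ecx),%edx */
    struct list_head *next = entry->next;  /* mov (%ecx),%eax    */

    /* Added for this sketch only; the kernel version has no check,
     * which is why a bad entry ends in an Oops rather than an abort. */
    assert(list_entry_is_linked(entry));

    next->prev = prev;           /* mov %edx,0x4(%eax): morph-01, morph-03 */
    prev->next = next;           /* mov %eax,(%edx):    morph-04           */
    entry->next = LIST_POISON1;  /* movl $0x100100,(%ecx)                  */
    entry->prev = LIST_POISON2;  /* movl $0x200200,0x4(%ecx)               */
}
```

On i686 the prev member sits at offset 4, so a NULL next makes the
first store hit address 00000004. An entry whose next and prev are
both zero (as on morph-01, eax = edx = 0) is consistent with a zeroed
or already-freed structure, though the trace alone does not prove
which.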


Comment 2 David Teigland 2004-10-28 03:17:41 EDT
This should be fixed now.

Changes by:     teigland@sourceware.org 2004-10-28 07:14:46

Modified files:
        gfs-kernel/src/dlm: plock.c

Log message:
        Cached null locks that had been used with plocks were being
        freed too early, before the unlock completion ast.
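
To illustrate the lifetime rule described in the log message, here is
a minimal userspace sketch in which releasing a cached lock is
deferred while an unlock is still in flight, and the free happens in
the completion callback instead. All names (struct cached_lock,
lock_unlock_start, lock_release, lock_unlock_complete, frees_done) are
hypothetical; this is not the actual change made to plock.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cached lock; not the plock.c structure. */
struct cached_lock {
    bool unlock_pending;  /* unlock issued, completion not yet delivered */
    bool release_wanted;  /* owner asked to drop the entry meanwhile     */
};

int frees_done;  /* counts frees so the ordering can be observed */

struct cached_lock *lock_alloc(void)
{
    /* calloc zeroes both flags */
    return calloc(1, sizeof(struct cached_lock));
}

static void lock_free(struct cached_lock *lk)
{
    free(lk);
    frees_done++;
}

/* An asynchronous unlock has been submitted for this lock. */
void lock_unlock_start(struct cached_lock *lk)
{
    lk->unlock_pending = true;
}

/* Drop the cached lock. Returns true if it was freed right away,
 * false if the free was deferred to the completion callback. */
bool lock_release(struct cached_lock *lk)
{
    if (lk->unlock_pending) {
        lk->release_wanted = true;  /* completion will still touch lk */
        return false;
    }
    lock_free(lk);
    return true;
}

/* Completion callback for the unlock (the "ast" in the log message).
 * Returns true if it performed the deferred free. */
bool lock_unlock_complete(struct cached_lock *lk)
{
    assert(lk->unlock_pending);
    lk->unlock_pending = false;
    if (lk->release_wanted) {
        lock_free(lk);
        return true;
    }
    return false;
}
```

Freeing in lock_release() regardless of unlock_pending would leave
the completion callback operating on freed memory, which is the
general kind of failure the log message describes.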
Comment 3 Corey Marthaler 2004-10-28 12:46:44 EDT
Fix verified.
