Description of problem:
I get my cluster up, attempt some I/O load, and immediately see different Oopses on the different machines. The I/O being run (as can be seen in the Oops output) is genesis, accordion, and doio/iogen.

Version-Release number of selected component (if applicable):
GFS <CVS> (built Oct 26 2004 16:12:28) installed
CMAN <CVS> (built Oct 26 2004 16:11:37) installed
DLM <CVS> (built Oct 26 2004 16:11:53) installed
Lock_DLM (built Oct 26 2004 16:12:03) installed
Lock_Nolock <CVS> (built Oct 26 2004 16:12:00) installed

How reproducible:
Always

morph-01:
Unable to handle kernel NULL pointer dereference at virtual address 00000004
 printing eip:
f89b0f56
*pde = 00000000
Oops: 0002 [#1]
SMP
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 microcode dm_mod uhci_hcd ehci_hcd button battery ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    1
EIP:    0060:[<f89b0f56>]    Not tainted VLI
EFLAGS: 00010283   (2.6.9)
EIP is at dlm_async+0x146/0x370 [lock_dlm]
eax: 00000000   ebx: f7f7ae80   ecx: f2d2094c   edx: 00000000
esi: f7f7aedc   edi: 00000001   ebp: f7f7aeb0   esp: f4f23f60
ds: 007b   es: 007b   ss: 0068
Process lock_dlm2 (pid: 4473, threadinfo=f4f22000 task=f4fbd250)
Stack: f7f7aeb4 f4f22000 00000000 000000bd 00000000 f6d3fd40 f2d20900 f4f23fcc
       00000000 f4fbd250 c011f2d0 00000000 00000000 00000000 f4f09d10 f4f55d00
       00000000 f4fbd250 c011f2d0 00100100 00200200 39777800 000f4244 f4fbd3b0
Call Trace:
 [<c011f2d0>] default_wake_function+0x0/0x10
 [<c011f2d0>] default_wake_function+0x0/0x10
 [<f89b0e10>] dlm_async+0x0/0x370 [lock_dlm]
 [<c0135bd4>] kthread+0xa4/0xb0
 [<c0135b30>] kthread+0x0/0xb0
 [<c01042b5>] kernel_thread_helper+0x5/0x10
Code: 44 24 13 00 c6 44 24 0d 00 c6 44 24 0e 00 e8 a2 5b 94 c7 8b 4b 34 3b 0c 24 0f 84 36 01 00 00 8d 41 b4 89 44 24 18 8b 51 04 8b 01 <89> 50 04 89 02 c7 41 04 00 02 20 00 c7 01 00 01 10 00 f0 0f ba

morph-03:
Unable to handle kernel NULL pointer dereference at virtual address 00000004
 printing eip:
f8a340d2
*pde = 00000000
Oops: 0002 [#1]
SMP
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 microcode dm_mod uhci_hcd ehci_hcd button battery ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    1
EIP:    0060:[<f8a340d2>]    Not tainted VLI
EFLAGS: 00010246   (2.6.9)
EIP is at incore_commit+0x52/0x230 [gfs]
eax: 00000000   ebx: f4ac8ddc   ecx: f2805800   edx: f266ec80
esi: f8a35010   edi: f2a34680   ebp: 00000000   esp: f45e7e00
ds: 007b   es: 007b   ss: 0068
Process growfiles (pid: 4117, threadinfo=f45e6000 task=f48693d0)
Stack: 00001000 00000246 f2a346a4 f2a346a4 f8acc000 f8ae8838 f2a34680 f75a75c0
       f8acc000 f8a34452 f45e7e40 f45e7e44 f45e7e48 f8acc000 f2a346b0 00000002
       00000001 ffffffff ffffffff f2a346a4 f2a34680 f2a346a4 f8acc000 f8a4c3dc
Call Trace:
 [<f8a34452>] gfs_log_commit+0x1a2/0x220 [gfs]
 [<f8a4c3dc>] gfs_trans_end+0x6c/0x100 [gfs]
 [<f8a3a753>] gfs_dinode_out+0x833/0x840 [gfs]
 [<f8a3e61e>] do_do_write_buf+0x16e/0x460 [gfs]
 [<f8a2b442>] gfs_glock_nq_m+0x162/0x190 [gfs]
 [<f8a3ea31>] do_write_buf+0x121/0x190 [gfs]
 [<f8a3da02>] walk_vm+0xc2/0x110 [gfs]
 [<f8a3eb36>] gfs_write+0x96/0xe0 [gfs]
 [<f8a3e910>] do_write_buf+0x0/0x190 [gfs]
 [<c015cad1>] vfs_write+0xd1/0x120
 [<c015cbe7>] sys_write+0x47/0x80
 [<c0105f5d>] sysenter_past_esp+0x52/0x71
Code: 26 00 8d bc 27 00 00 00 00 8b 43 f8 31 c9 8d 53 f8 8b 70 0c 85 f6 0f 85 dd 01 00 00 85 c9 74 2e 39 e9 74 2a 8b 51 04 85 ed 8b 01 <89> 50 04 89 02 c7 41 04 00 02 20 00 c7 01 00 01 10 00 0f 84 aa

morph-04:
Unable to handle kernel paging request at virtual address 04b5b0a7
 printing eip:
f8efa0d5
*pde = 00000000
Oops: 0002 [#1]
SMP
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs lock_harness lpfc ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    0
EIP:    0060:[<f8efa0d5>]    Not tainted VLI
EFLAGS: 00010246   (2.6.9)
EIP is at incore_commit+0x55/0x230 [gfs]
eax: f7fe9680   ebx: f473bb3c   ecx: f3808b80   edx: 04b5b0a7
esi: f8efb010   edi: f384e680   ebp: 00000000   esp: f3de5e08
ds: 007b   es: 007b   ss: 0068
Process growfiles (pid: 4238, threadinfo=f3de4000 task=f3de3630)
Stack: 00001000 00000246 f384e6a4 f384e6a4 f8a83000 f8a9f838 f384e680 f750c780
       f8a83000 f8efa452 f3de5e48 f3de5e4c f3de5e50 00000000 f384e6b0 00000003
       00000002 ffffffff ffffffff f384e6a4 f384e680 f384e6a4 f8a83000 f8f123dc
Call Trace:
 [<f8efa452>] gfs_log_commit+0x1a2/0x220 [gfs]
 [<f8f123dc>] gfs_trans_end+0x6c/0x100 [gfs]
 [<f8f06b9a>] gfs_create+0x13a/0x1c0 [gfs]
 [<c016b109>] vfs_create+0xa9/0x130
 [<c016b9e0>] open_namei+0x5f0/0x650
 [<c015bb9d>] filp_open+0x2d/0x60
 [<c015be08>] get_unused_fd+0x78/0xd0
 [<c015bf4c>] sys_open+0x3c/0xa0
 [<c0105f5d>] sysenter_past_esp+0x52/0x71
Code: bc 27 00 00 00 00 8b 43 f8 31 c9 8d 53 f8 8b 70 0c 85 f6 0f 85 dd 01 00 00 85 c9 74 2e 39 e9 74 2a 8b 51 04 85 ed 8b 01 89 50 04 <89> 02 c7 41 04 00 02 20 00 c7 01 00 01 10 00 0f 84 aa 01 00 00

morph-05:
Unable to handle kernel paging request at virtual address 064660ac
 printing eip:
f89f21b3
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: gnbd lock_gulm lock_nolock lock_dlm dlm cman gfs lock_harness lpfc ipv6 parport_pc lp parport autofs4 sunrpc e1000 microcode dm_mod uhci_hcd ehci_hcd button battery ac ext3 jbd qla2300 qla2xxx scsi_transport_fc<1>Unable to handle kernel paging request at virtual address 00050005
 printing eip:
f8ed278d
*pde = 00000000
 sd_mod scsi_mod
CPU:    0
EIP:    0060:[<f89f21b3>]    Not tainted VLI
EFLAGS: 00010202   (2.6.9)
EIP is at search_resource+0x53/0x70 [lock_dlm]
eax: 064660a8   ebx: 064660a8   ecx: 06475ea7   edx: 064660ac
esi: 06475ea0   edi: 00000000   ebp: f7277230   esp: f166de54
ds: 007b   es: 007b   ss: 0068
Process doio (pid: 4140, threadinfo=f166c000 task=f28acc70)
Stack: f166def4 00000000 f7277238 00000001 f7277180 f89f2215 fffffff4 f166def4
       00000000 00000007 f166def4 f2730910 f89f3d98 f166dec4 00000007 06475ea0
       00000000 f71b3800 f166df2c 02ec02cf 00000000 01eb54eb 00000000 00000001
Call Trace:
 [<f89f2215>] get_resource+0x45/0x190 [lock_dlm]
 [<f89f3d98>] lm_dlm_plock+0x98/0x2e0 [lock_dlm]
 [<f8ec54ff>] do_plock+0xcf/0x110 [gfs]
 [<c010c4d0>] timer_interrupt+0xb0/0x120
 [<f8ec5540>] gfs_lock+0x0/0x70 [gfs]
 [<f8ec55a0>] gfs_lock+0x60/0x70 [gfs]
 [<c017293b>] fcntl_setlk+0x25b/0x2b0
 [<f8943fa0>] e1000_clean+0xa0/0xc0 [e1000]
 [<c011d467>] recalc_task_prio+0x97/0x190
 [<c011ddbd>] finish_task_switch+0x3d/0x90
 [<c02f5f1f>] schedule+0x2ef/0x620
 [<c016e2cc>] do_fcntl+0xdc/0x170
 [<c016e470>] sys_fcntl64+0x90/0xa0
 [<c0105f5d>] sysenter_past_esp+0x52/0x71
Code: 30 8b 78 04 8d 74 26 00 8b 53 10 8b 43 0c 89 d1 31 f9 31 f0 09 c1 75 0b 8b 14 24 8b 42 08 39 43 14 74 1b 8b 53 04 8d 42 fc 89 c3 <8b> 40 04 0f 18 00 90 39 ea 75 d2 31 c0 5a 5b 5e 5f 5d c3 89 d8

This should be fixed now.

Changes by: teigland 2004-10-28 07:14:46
Modified files:
    gfs-kernel/src/dlm: plock.c
Log message:
    Cached null locks that had been used with plocks were being freed too early, before the unlock completion ast.
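
For readers unfamiliar with this class of bug, here is a minimal userspace sketch of the general pattern the log message describes: an object whose asynchronous unlock is still in flight must not be freed until the completion callback (the "ast") has run. This is not the code from plock.c. The struct, the field names, and every function below (cached_lock, lock_new, lock_unlock_async, lock_release, lock_unlock_complete) are hypothetical names invented for illustration, and the real fix may well be structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cached lock; NOT the lock_dlm structure. */
struct cached_lock {
    bool unlock_in_flight;   /* async unlock submitted, completion not yet seen */
    bool release_requested;  /* owner is finished with the lock */
    int *freed_counter;      /* lets a caller observe when the free happens */
};

/* Allocate a zeroed lock; freed_counter may be NULL. */
struct cached_lock *lock_new(int *freed_counter)
{
    struct cached_lock *lk = calloc(1, sizeof(*lk));
    if (lk)
        lk->freed_counter = freed_counter;
    return lk;
}

static void lock_free(struct cached_lock *lk)
{
    /* Freeing while an unlock is still pending is the bug being avoided. */
    assert(!lk->unlock_in_flight);
    if (lk->freed_counter)
        (*lk->freed_counter)++;
    free(lk);
}

/* Submit an asynchronous unlock; its completion is delivered later. */
void lock_unlock_async(struct cached_lock *lk)
{
    lk->unlock_in_flight = true;
}

/*
 * Owner drops the lock. If an unlock is pending, only record the request
 * and let the completion callback do the free.
 * Returns true if the lock was freed here, false if the free was deferred.
 */
bool lock_release(struct cached_lock *lk)
{
    if (lk->unlock_in_flight) {
        lk->release_requested = true;
        return false;
    }
    lock_free(lk);
    return true;
}

/*
 * Completion callback for the unlock. Performs any deferred free.
 * Returns true if the lock was freed here.
 */
bool lock_unlock_complete(struct cached_lock *lk)
{
    lk->unlock_in_flight = false;
    if (lk->release_requested) {
        lock_free(lk);
        return true;
    }
    return false;
}
```

In this sketch, calling lock_unlock_async() followed by lock_release() leaves the object allocated, and the free only happens once lock_unlock_complete() runs. Freeing in lock_release() regardless of the pending unlock would let the completion path touch freed memory, which is the kind of corruption that could plausibly surface as list-manipulation Oopses like the ones above.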

Fix verified.