Description of problem:
Had a quorate 3-node cluster, made a filesystem with -p lock_dlm on a device (no lvm involved) and mounted on the first node. It Oopsed. Will reproduce and try to get more information:

Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
e041419d
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: gfs loop lock_dlm dlm cman lock_harness ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU: 0
EIP: 0060:[<e041419d>] Not tainted
EFLAGS: 00010292 (2.6.8.1)
EIP is at queue_complete+0xd/0x100 [lock_dlm]
eax: 00000000 ebx: 00000000 ecx: e0415050 edx: 00000001
esi: daf3d3d8 edi: df48ae00 ebp: 00000005 esp: dad35f1c
ds: 007b es: 007b ss: 0068
Process dlm_astd (pid: 5314, threadinfo=dad34000 task=db14a850)
Stack: 00000000 e02e008f 00000018 00000246 398bcf80 000f4456 db14aa04 00000296
       daa3350c daa3350c daf3d3d8 e02cd448 00000246 398bcf80 000f4456 dad44330
       c13f7ca0 df48ae98 00000000 e0415060 e0415050 dad35fa4 e02ead8c dad34000
Call Trace:
 [<e02e008f>] _release_rsb+0x13f/0x2b0 [dlm]
 [<e02cd448>] process_asts+0x108/0x1e0 [dlm]
 [<e0415060>] lock_bast+0x0/0x5 [lock_dlm]
 [<e0415050>] lock_ast+0x0/0x10 [lock_dlm]
 [<e02cdca0>] dlm_astd+0x0/0x220 [dlm]
 [<e02cde95>] dlm_astd+0x1f5/0x220 [dlm]
 [<c011efb0>] default_wake_function+0x0/0x10
 [<c011efb0>] default_wake_function+0x0/0x10
 [<c0134fa4>] kthread+0xa4/0xb0
 [<c0134f00>] kthread+0x0/0xb0
 [<c01042b5>] kernel_thread_helper+0x5/0x10
Code: 8b 30 8b 40 34 a9 80 00 00 00 75 47 c7 04 24 34 8d 41 e0 8b

Filesystem make command:
[root@link-10 root]# gfs_mkfs -t MILTON:data1 -j 2 -p lock_dlm /dev/sdg15

Version-Release number of selected component (if applicable):
[root@link-11 root]# gfs_mkfs -V
gfs_mkfs DEVEL.1096560308 (built Sep 30 2004 11:06:29)
Copyright (C) Red Hat, Inc. 2004 All rights reserved.

How reproducible:

Steps to Reproduce:
1.
2.
3.
Actual results:

Expected results:

Additional info:
A little more context on this. It looks like this is happening during "Trying to acquire journal lock..." on the last journal. This is very reproducible today. I can't get a filesystem to not do this. Here is another example of a GFS with 3 journals:

dlm: data3: total nodes 1
dlm: data3: rebuild resource directory
dlm: data3: rebuilt 0 resources
dlm: data3: recover event 2 done
dlm: data3: recover event 2 finished
GFS: fsid=MILTON:data3.0: Joined cluster. Now mounting FS...
GFS: fsid=MILTON:data3.0: jid=0: Trying to acquire journal lock...
GFS: fsid=MILTON:data3.0: jid=0: Looking at journal...
GFS: fsid=MILTON:data3.0: jid=0: Done
GFS: fsid=MILTON:data3.0: jid=1: Trying to acquire journal lock...
GFS: fsid=MILTON:data3.0: jid=1: Looking at journal...
GFS: fsid=MILTON:data3.0: jid=1: Done
GFS: fsid=MILTON:data3.0: jid=2: Trying to acquire journal lock...
Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
e043419d
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: lock_dlm gfs lock_harness dlm cman ipv6 parport_pc lp parport autofs4 sunrpc e1000 floppy sg microcode dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU: 0
EIP: 0060:[<e043419d>] Not tainted
EFLAGS: 00010292 (2.6.8.1)
EIP is at queue_complete+0xd/0x100 [lock_dlm]
eax: 00000000 ebx: 00000000 ecx: e0435050 edx: 00000001
esi: d93233d8 edi: df476800 ebp: 00000005 esp: d9635f1c
ds: 007b es: 007b ss: 0068
Process dlm_astd (pid: 4991, threadinfo=d9634000 task=da38d6b0)
Stack: 00000000 e02e008f 00000018 00000246 6f00a500 000f422b da38d864 00000296
       d9e7d50c d9e7d50c d93233d8 e02cd448 00000086 6f00a500 000f422b d8d21730
       c13f7ca0 df476898 00000000 e0435060 e0435050 d9635fa4 e02ead8c d9634000
Call Trace:
 [<e02e008f>] _release_rsb+0x13f/0x2b0 [dlm]
 [<e02cd448>] process_asts+0x108/0x1e0 [dlm]
 [<e0435060>] lock_bast+0x0/0x5 [lock_dlm]
 [<e0435050>] lock_ast+0x0/0x10 [lock_dlm]
 [<e02cdca0>] dlm_astd+0x0/0x220 [dlm]
 [<e02cde95>] dlm_astd+0x1f5/0x220 [dlm]
 [<c011efb0>] default_wake_function+0x0/0x10
 [<c011efb0>] default_wake_function+0x0/0x10
 [<c0134fa4>] kthread+0xa4/0xb0
 [<c0134f00>] kthread+0x0/0xb0
 [<c01042b5>] kernel_thread_helper+0x5/0x10
Code: 8b 30 8b 40 34 a9 80 00 00 00 75 47 c7 04 24 34 8d 43 e0 8b
*** Bug 134530 has been marked as a duplicate of this bug. ***
A couple days ago I checked in a change to the dlm (making NULL a valid ast arg) but missed checking in the corresponding update to lock_dlm -- I caught that today. I think this is what you're getting which means you need to update from cvs.

Changes by: teigland 2004-10-04 05:24:51
Modified files:
    gfs-kernel/src/dlm: lock.c
Log message:
    we must provide the correct astarg to dlm_unlock now that NULL is valid
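
For anyone reading this later, here is a minimal userspace sketch of the kind of mismatch the commit message describes. It rests on an assumption that is not stated in this report: that before the dlm change a NULL astarg passed at unlock time meant "reuse the argument stored with the lock", and that afterwards NULL is delivered to the callback as-is. Every name below (demo_lock, demo_lkb, unlock_astarg_old, unlock_astarg_new, demo_queue_complete) is hypothetical and is not the real dlm or lock_dlm API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical caller-side lock structure (stand-in for what a lock
 * module would pass as its AST argument). */
struct demo_lock {
    int completions;
};

/* Hypothetical manager-side record that remembers the AST argument
 * supplied when the lock was first requested. */
struct demo_lkb {
    void *astarg;
};

/* Assumed old convention: NULL at unlock time means "keep the stored
 * argument". */
void *unlock_astarg_old(const struct demo_lkb *lkb, void *astarg)
{
    return astarg != NULL ? astarg : lkb->astarg;
}

/* Assumed new convention: the value passed at unlock time is used
 * verbatim, NULL included. */
void *unlock_astarg_new(const struct demo_lkb *lkb, void *astarg)
{
    (void)lkb; /* the stored argument is no longer consulted */
    return astarg;
}

/* Completion callback. Kernel code that dereferences the argument
 * directly would oops on NULL; this sketch returns -1 instead so the
 * difference can be observed safely. */
int demo_queue_complete(void *astarg)
{
    struct demo_lock *lp = astarg;

    if (lp == NULL)
        return -1;
    lp->completions++;
    return 0;
}

#ifdef DEMO_MAIN
int main(void)
{
    struct demo_lock lp = { 0 };
    struct demo_lkb lkb = { &lp };

    /* A caller that still passes NULL at unlock time. */
    printf("old convention: %d\n",
           demo_queue_complete(unlock_astarg_old(&lkb, NULL)));
    printf("new convention: %d\n",
           demo_queue_complete(unlock_astarg_new(&lkb, NULL)));

    /* The updated caller passes the real argument explicitly. */
    assert(demo_queue_complete(unlock_astarg_new(&lkb, &lp)) == 0);
    return 0;
}
#endif
```

Building with -DDEMO_MAIN gives a small program: a caller that still passes NULL gets a usable argument under the assumed old convention and a NULL under the assumed new one, which is consistent with eax being 00000000 at queue_complete+0xd in the traces above. Passing the real argument at unlock time, as the log message says, avoids it.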
Verified against: cman_tool DEVEL.1096898839 (built Oct 4 2004 09:08:29) Copyright (C) Red Hat, Inc. 2004 All rights reserved.
Updating version to the right level in the defects. Sorry for the storm.