Description of problem:
While running flock tests on a GFS file system with jdata turned on, I hit the following panic:

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip: f8dd0bcc
*pde = 6cbd4067
Oops: 0000 [#1] SMP
last sysfs file: /devices/pci0000:00/0000:00:02.0/0000:01:1f.0/0000:03:02.1/irq
Modules linked in: gfs(U) lock_dlm gfs2 dlm sctp configfs autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 dm_multipath video sbs backlight i2c_ec button battery asus_acpi ac lp floppy i2c_i801 ide_cd i2c_core cdrom sg pcspkr e7xxx_edac parport_pc e1000 edac_mc parport intel_rng dm_snapshot dm_zero dm_mirror dm_mod qla2xxx scsi_transport_fc ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU:    0
EIP:    0060:[<f8dd0bcc>]    Not tainted VLI
EFLAGS: 00010202   (2.6.18-53.el5 #1)
EIP is at gfs_glock_dq+0x93/0x12e [gfs]
eax: d6019c6c   ebx: d6019c48   ecx: 00000000   edx: 00000000
esi: ddca8158   edi: 00064fab   ebp: f77f4a58   esp: cbe54efc
ds: 007b   es: 007b   ss: 0068
Process xdoio (pid: 11485, ti=cbe54000 task=f7fbd000 task.ti=cbe54000)
Stack: f8e00140 ddca8158 ddca8140 00000001 f51cb9c0 f8dd0e1e ddca8158 f8ddf2cc
       c5a9965c 00000007 f51cb9c0 00000001 c200c0f0 c0465f9d 00000001 00000001
       c200c0e0 b7f3f000 f5287da0 f52872cc c0461bc3 f52ac80c 00000246 f52ac80c
Call Trace:
 [<f8dd0e1e>] gfs_glock_dq_uninit+0x8/0x10 [gfs]
 [<f8ddf2cc>] gfs_flock+0xa1/0x1ec [gfs]
 [<c0465f9d>] free_pages_and_swap_cache+0x6b/0x7f
 [<c0461bc3>] unmap_region+0xe1/0xf0
 [<c0461c08>] remove_vma+0x36/0x3b
 [<c044abd9>] audit_syscall_entry+0x11c/0x14e
 [<f8ddf22b>] gfs_flock+0x0/0x1ec [gfs]
 [<c0482d35>] sys_flock+0x114/0x147
 [<c0404eff>] syscall_call+0x7/0xb
 =======================
Code: de f8 89 f8 e8 1e c7 01 00 58 5a f6 46 15 08 74 06 f0 0f ba 6b 08 04 f6 46 15 04 74 2b 8b 4b 24 31 ed 31 ff eb 05 89 cd 47 89 d1 <8b> 11 0f 18 02 90 8d 43 24 39 c1 75 ee 39 f5 75 0c 4f 75 09 31
EIP: [<f8dd0bcc>] gfs_glock_dq+0x93/0x12e [gfs] SS:ESP 0068:cbe54efc
<0>Kernel panic - not syncing: Fatal exception

Version-Release number of selected component (if applicable):
kernel-2.6.18-53.el5
kmod-gfs-0.1.19-7.el5

How reproducible:
Unknown

Additional info:
I was running d_io at the time; the "rwsbuflarge" and "laio_lock1" tags were running when the panic hit.
I was able to reproduce this on the first try. I'll see how much more I can narrow this down.
I was able to hit a soft lockup while running this on one node out of four:

xiogen -S 5957 -i 10s -m sequential -o -a -t 1b -T 100b -F 400000b:/mnt/brawl/tank-01/laio_lock1 | xdoio -v -K -n 5

BUG: soft lockup detected on CPU#1!
 [<c044d21c>] softlockup_tick+0x96/0xa4
 [<c042ddb0>] update_process_times+0x39/0x5c
 [<c04196fb>] smp_apic_timer_interrupt+0x5b/0x6c
 [<c04059bf>] apic_timer_interrupt+0x1f/0x24
 [<c048007b>] do_select+0x330/0x3cb
 [<f8dcfbce>] gfs_glock_dq+0x95/0x12e [gfs]
 [<f8dcfe1e>] gfs_glock_dq_uninit+0x8/0x10 [gfs]
 [<f8dde2cc>] gfs_flock+0xa1/0x1ec [gfs]
 [<c0465f9d>] free_pages_and_swap_cache+0x6b/0x7f
 [<c0461bc3>] unmap_region+0xe1/0xf0
 [<c0461c08>] remove_vma+0x36/0x3b
 [<c044abd9>] audit_syscall_entry+0x11c/0x14e
 [<f8dde22b>] gfs_flock+0x0/0x1ec [gfs]
 [<c0482d35>] sys_flock+0x114/0x147
 [<c0404eff>] syscall_call+0x7/0xb
Bumping up the severity as this is a panic and causes a soft lockup on a second node.
Adding Dave to the cc-list, might be more in his space due to flock usage.
This looks similar to the flock problems Abhi recently fixed in gfs2, related to a process taking more than one flock on a file, or using multiple file descriptors for flocks on a file.
My best guess at what's going on here is that an item is getting added twice to gl->gl_holders. The soft lockup is happening while looping through the gl_holders list: if an entry is added to a list twice, its next pointers can form an endless loop. Once that entry is removed, the list is broken, and the list head still points at the removed element. A removed element should have poisoned pointers, which is not what the panic shows; it shows NULL pointers instead. Still, once the element has been removed from the list, it's not hard to believe that its pointers got zeroed out. A sketch of the double-add effect follows below.
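Purely to illustrate that theory, here is a minimal userspace sketch using a stripped-down copy of the kernel's list.h idiom (none of this is GFS code; the names are stand-ins): adding the same entry to a list twice leaves the entry pointing at itself, so a traversal never gets back to the list head.

#include <stdio.h>

/* Minimal stand-ins for the kernel's struct list_head and list_add(). */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }

static void list_add(struct list_head *new, struct list_head *head)
{
	struct list_head *next = head->next;

	next->prev = new;
	new->next  = next;
	new->prev  = head;
	head->next = new;
}

int main(void)
{
	struct list_head holders, gh;
	struct list_head *pos;
	int steps = 0;

	INIT_LIST_HEAD(&holders);
	INIT_LIST_HEAD(&gh);

	list_add(&gh, &holders);   /* first, legitimate add */
	list_add(&gh, &holders);   /* second add: gh.next now points at gh itself */

	/* Equivalent of list_for_each(): should stop when we get back to
	 * &holders, but the corrupted entry loops on itself forever
	 * (bounded here so the demo terminates). */
	for (pos = holders.next; pos != &holders && steps < 10; pos = pos->next)
		steps++;

	printf("gave up after %d steps; gh.next == &gh is %d\n",
	       steps, gh.next == &gh);
	return 0;
}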
Corey hit this on his grant cluster today during 5.1.Z testing.

Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
 [<ffffffff885573e0>] :gfs:gfs_glock_dq+0xaf/0x16b
PGD 11776a067 PUD 10ba5c067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:09.0/0000:02:00.0/irq
CPU 2
Modules linked in: lock_nolock gfs(U) lock_dlm gfs2 dlm gnbd(U) configfs autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 dm_multipath video sbs backlight i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport k8_edac edac_mc ide_cd tg3 k8temp hwmon shpchp sg cdrom serio_raw pcspkr dm_snapshot dm_zero dm_mirror dm_mod qla2xxx scsi_transport_fc sata_svw libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 6707, comm: xdoio Not tainted 2.6.18-53.1.6.el5 #1
RIP: 0010:[<ffffffff885573e0>]  [<ffffffff885573e0>] :gfs:gfs_glock_dq+0xaf/0x16b
RSP: 0018:ffff810106381e08  EFLAGS: 00010202
RAX: ffff8101083cd960 RBX: ffff8101083cd928 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000000005c052 RDI: ffff810201daf0a8
RBP: ffff81010d2e7ea8 R08: 00000000000da4a5 R09: 7fffffffffffffff
R10: 000000004794e456 R11: 0000000000000246 R12: ffffc20010440000
R13: ffffffff8858b8a0 R14: 0000000000000000 R15: ffff81010f01c980
FS:  00002aaaaacbbcc0(0000) GS:ffff810103f993c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000109e4c000 CR4: 00000000000006e0
Process xdoio (pid: 6707, threadinfo ffff810106380000, task ffff81011d1f8860)
Stack:  0000000000000001 ffff81010d2e7ea8 ffff81010d2e7e88 ffff81010d2e7ea8
 ffff81010f01c980 ffffffff88557684 ffff81010d2e7e80 ffffffff88566c66
 ffff81011f002b88 ffff81010f01c980 ffff810106381e88 ffffffff800d5fb6
Call Trace:
 [<ffffffff88557684>] :gfs:gfs_glock_dq_uninit+0x9/0x12
 [<ffffffff88566c66>] :gfs:gfs_flock+0xb8/0x23e
 [<ffffffff800d5fb6>] do_readv_writev+0x272/0x295
 [<ffffffff88567254>] :gfs:gfs_read+0x0/0x5
 [<ffffffff80021d0d>] remote_llseek+0x73/0x7b
 [<ffffffff88567d20>] :gfs:gfs_llseek+0x75/0x84
 [<ffffffff800b27ba>] audit_syscall_entry+0x14d/0x180
 [<ffffffff800df31a>] sys_flock+0x11a/0x153
 [<ffffffff8005b28d>] tracesys+0xd5/0xe0
Code: 48 8b 11 0f 18 0a 48 8d 43 38 48 39 c1 75 e9 48 39 ef 75 10
RIP  [<ffffffff885573e0>] :gfs:gfs_glock_dq+0xaf/0x16b
 RSP <ffff810106381e08>
CR2: 0000000000000000
<0>Kernel panic - not syncing: Fatal exception
(In reply to comment #7)
> Corey hit this on his grant cluster today during 5.1.Z testing.

I should note that when Corey hit this, he was not running jdata.
I don't know how this is happening yet, but I think I know what's happening, at least for the soft lockup, and it seems reasonable that the same thing is behind the panic. In gfs_glock_dq() you start traversing the gl_holders list. Unfortunately, while you're traversing the list, the holder structure you're currently on happens to get moved to the gl_waiters3 list. Since you never hit the gl_holders list head, you loop endlessly (see the sketch below). When I looked at my crash dumps from a soft-locked node, I was in the code that checks the gl_holders list, but I was actually walking the gl_waiters3 list of the same glock. I'm going to raise the post_fail_delay on my cluster so that, if a machine panics overnight, it can hopefully write out the whole crash dump before it gets fenced, and I can look at a panic dump tomorrow. I'm hoping that will show me which process is moving the holder from one list to the other.
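To make that failure mode concrete, here is another hedged userspace sketch (again a stripped-down list.h stand-in, not GFS code; the two heads merely play the roles of gl_holders and gl_waiters3): once the entry the traversal is sitting on is moved to another list, the cursor keeps circling the second list and never sees the first list's head again.

#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }

static void list_add(struct list_head *new, struct list_head *head)
{
	struct list_head *next = head->next;

	next->prev = new;
	new->next  = next;
	new->prev  = head;
	head->next = new;
}

static void list_del(struct list_head *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
}

/* list_move(): delete from the current list, add to another. */
static void list_move(struct list_head *entry, struct list_head *head)
{
	list_del(entry);
	list_add(entry, head);
}

int main(void)
{
	struct list_head gl_holders, gl_waiters3, gh;
	struct list_head *pos;
	int steps = 0;

	INIT_LIST_HEAD(&gl_holders);
	INIT_LIST_HEAD(&gl_waiters3);
	INIT_LIST_HEAD(&gh);
	list_add(&gh, &gl_holders);

	/* The traversal of gl_holders starts and lands on gh ... */
	pos = gl_holders.next;          /* pos == &gh */

	/* ... and another context moves gh onto gl_waiters3 underneath it. */
	list_move(&gh, &gl_waiters3);

	/* The loop keeps following next pointers, but the termination test
	 * compares against &gl_holders, which is no longer on the ring being
	 * walked, so it never exits (bounded here so the demo terminates). */
	while (pos != &gl_holders && steps < 10) {
		pos = pos->next;
		steps++;
	}

	printf("gave up after %d steps\n", steps);
	return 0;
}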
Any updates on this one?
Oops. Yeah, the fix for this got committed last week. Looping through the gl_holders list in gfs_glock_dq required a spin lock to be held. However, since all that loop did was check whether the list had exactly one holder, and whether that holder was the one being dequeued, there's no need to loop through the list at all. So I removed the loop entirely and replaced it with a single list check. I ran the new code overnight Thursday and most of Friday without any problems, so I committed it to RHEL5 and head.
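For reference, here is the shape of that simplification as a hedged, standalone sketch rather than the exact committed patch (the real code operates on the GFS holder/glock structures in the kernel; is_only_holder() below is a hypothetical helper): whether an entry is the sole element of a circular list can be decided from its own next/prev pointers, with no traversal.

#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }

static void list_add(struct list_head *new, struct list_head *head)
{
	struct list_head *next = head->next;

	next->prev = new;
	new->next  = next;
	new->prev  = head;
	head->next = new;
}

/* O(1) replacement for "walk gl_holders and see whether gh is the only
 * holder": gh is the sole entry iff both of its links point at the head. */
static int is_only_holder(const struct list_head *gh_list,
			  const struct list_head *gl_holders)
{
	return gh_list->next == gl_holders && gh_list->prev == gl_holders;
}

int main(void)
{
	struct list_head gl_holders, gh, gh2;

	INIT_LIST_HEAD(&gl_holders);
	INIT_LIST_HEAD(&gh);
	INIT_LIST_HEAD(&gh2);

	list_add(&gh, &gl_holders);
	printf("one holder:  is_only_holder = %d\n",
	       is_only_holder(&gh, &gl_holders));

	list_add(&gh2, &gl_holders);
	printf("two holders: is_only_holder = %d\n",
	       is_only_holder(&gh, &gl_holders));
	return 0;
}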
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0348.html