Red Hat Bugzilla – Bug 426291
NULL pointer dereference in gfs_glock_dq during flock tests
Last modified: 2010-01-11 22:28:07 EST
Description of problem:
While running flock tests on a GFS file system with jdata turned on, I hit the
BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
*pde = 6cbd4067
Oops: 0000 [#1]
last sysfs file: /devices/pci0000:00/0000:00:02.0/0000:01:1f.0/0000:03:02.1/irq
Modules linked in: gfs(U) lock_dlm gfs2 dlm sctp configfs autofs4 hidp rfcomm
l2cap bluetooth sunrpc ipv6 dm_multipath video sbs backlight i2c_ec button
battery asus_acpi ac lp floppy i2c_i801 ide_cd i2c_core cdrom sg pcspkr
e7xxx_edac parport_pc e1000 edac_mc parport intel_rng dm_snapshot dm_zero
dm_mirror dm_mod qla2xxx scsi_transport_fc ata_piix libata sd_mod scsi_mod ext3
jbd ehci_hcd ohci_hcd uhci_hcd
EIP: 0060:[<f8dd0bcc>] Not tainted VLI
EFLAGS: 00010202 (2.6.18-53.el5 #1)
EIP is at gfs_glock_dq+0x93/0x12e [gfs]
eax: d6019c6c ebx: d6019c48 ecx: 00000000 edx: 00000000
esi: ddca8158 edi: 00064fab ebp: f77f4a58 esp: cbe54efc
ds: 007b es: 007b ss: 0068
Process xdoio (pid: 11485, ti=cbe54000 task=f7fbd000 task.ti=cbe54000)
Stack: f8e00140 ddca8158 ddca8140 00000001 f51cb9c0 f8dd0e1e ddca8158 f8ddf2cc
c5a9965c 00000007 f51cb9c0 00000001 c200c0f0 c0465f9d 00000001 00000001
c200c0e0 b7f3f000 f5287da0 f52872cc c0461bc3 f52ac80c 00000246 f52ac80c
[<f8dd0e1e>] gfs_glock_dq_uninit+0x8/0x10 [gfs]
[<f8ddf2cc>] gfs_flock+0xa1/0x1ec [gfs]
[<f8ddf22b>] gfs_flock+0x0/0x1ec [gfs]
Code: de f8 89 f8 e8 1e c7 01 00 58 5a f6 46 15 08 74 06 f0 0f ba 6b 08 04 f6 46
15 04 74 2b 8b 4b 24 31 ed 31 ff eb 05 89 cd 47 89 d1 <8b> 11 0f 18 02 90 8d 43
24 39 c1 75 ee 39 f5 75 0c 4f 75 09 31
EIP: [<f8dd0bcc>] gfs_glock_dq+0x93/0x12e [gfs] SS:ESP 0068:cbe54efc
<0>Kernel panic - not syncing: Fatal exception
Version-Release number of selected component (if applicable):
I was running d_io at the time; the tags "rwsbuflarge" and "laio_lock1" were
active when the panic hit.
I was able to reproduce this on the first try. I'll see how much more I can
narrow this down.
I was also able to hit a soft lockup while running this on one node out of four:
xiogen -S 5957 -i 10s -m sequential -o -a -t 1b -T 100b -F
400000b:/mnt/brawl/tank-01/laio_lock1 | xdoio -v -K -n 5
BUG: soft lockup detected on CPU#1!
[<f8dcfbce>] gfs_glock_dq+0x95/0x12e [gfs]
[<f8dcfe1e>] gfs_glock_dq_uninit+0x8/0x10 [gfs]
[<f8dde2cc>] gfs_flock+0xa1/0x1ec [gfs]
[<f8dde22b>] gfs_flock+0x0/0x1ec [gfs]
Bumping up the severity as this is a panic and also causes a soft lockup on a
second node.
Adding Dave to the cc-list; this might be more in his space due to the flock usage.
This looks similar to the flock problems Abhi recently fixed in gfs2,
related to a process taking more than one flock on a file, or using
multiple file descriptors for flocks on a file.
My best guess at what's going on here is that an item is getting added twice to
gl->gl_holders. The soft lockup happens while looping through the gl_holders
list: if an entry is added to the list twice, its next pointers can form an
endless cycle. Once that entry is removed, the list is broken, because it still
points at the removed element. A removed element would normally have poisoned
pointers, which is not what you see with the panic; you see NULL pointers
instead. However, once the element is off the list, it's not hard to believe
that its pointers got zeroed out.
Corey hit this on his grant cluster today during 5.1.Z testing.
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
PGD 11776a067 PUD 10ba5c067 PMD 0
Oops: 0000  SMP
last sysfs file: /devices/pci0000:00/0000:00:09.0/0000:02:00.0/irq
Modules linked in: lock_nolock gfs(U) lock_dlm gfs2 dlm gnbd(U) configfs autofs4
hidp rfcomm l2cap bluetooth sunrpc ipv6 dm_multipath video sbs backlight i2c_ec
i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport
k8_edac edac_mc ide_cd tg3 k8temp hwmon shpchp sg cdrom serio_raw pcspkr
dm_snapshot dm_zero dm_mirror dm_mod qla2xxx scsi_transport_fc sata_svw libata
sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 6707, comm: xdoio Not tainted 2.6.18-53.1.6.el5 #1
RIP: 0010:[<ffffffff885573e0>] [<ffffffff885573e0>] :gfs:gfs_glock_dq+0xaf/0x16b
RSP: 0018:ffff810106381e08 EFLAGS: 00010202
RAX: ffff8101083cd960 RBX: ffff8101083cd928 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000000005c052 RDI: ffff810201daf0a8
RBP: ffff81010d2e7ea8 R08: 00000000000da4a5 R09: 7fffffffffffffff
R10: 000000004794e456 R11: 0000000000000246 R12: ffffc20010440000
R13: ffffffff8858b8a0 R14: 0000000000000000 R15: ffff81010f01c980
FS: 00002aaaaacbbcc0(0000) GS:ffff810103f993c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000109e4c000 CR4: 00000000000006e0
Process xdoio (pid: 6707, threadinfo ffff810106380000, task ffff81011d1f8860)
Stack: 0000000000000001 ffff81010d2e7ea8 ffff81010d2e7e88 ffff81010d2e7ea8
ffff81010f01c980 ffffffff88557684 ffff81010d2e7e80 ffffffff88566c66
ffff81011f002b88 ffff81010f01c980 ffff810106381e88 ffffffff800d5fb6
Code: 48 8b 11 0f 18 0a 48 8d 43 38 48 39 c1 75 e9 48 39 ef 75 10
RIP [<ffffffff885573e0>] :gfs:gfs_glock_dq+0xaf/0x16b
<0>Kernel panic - not syncing: Fatal exception
(In reply to comment #7)
> Corey hit this on his grant cluster today during 5.1.Z testing.
I should note that when Corey hit this, he was not running jdata.
I don't know yet how this is triggered, but I think I know what's happening, at
least for the soft lockup, and it seems reasonable that the same thing causes
the panic. In gfs_glock_dq() you start traversing the gl_holders list.
Unfortunately, while you're traversing the list, the holder structure you're
currently on gets moved to the gl_waiters3 list. Since you then never hit the
gl_holders list head, you loop endlessly.
When I looked at my crash dumps from a soft-locked node, I was in the code that
checks the gl_holders list, but I was actually walking the gl_waiters3 list of
the same glock.
I'm going to up the post_fail_delay on my cluster and hopefully, if a machine
panics overnight, it'll be able to write out the whole crash dump before it gets
fenced, and I'll be able to look at a panic dump tomorrow. I'm hoping that this
will show me what process is moving the holder from one list to the other.
Any updates on this one?
Oops. Yeah, the fix for this got committed last week. Looping through the
gl_holders list in gfs_glock_dq required a spin lock to be held. However, since
all the loop was doing was checking whether the list had exactly one holder,
and whether that holder was the one being dequeued, you don't need to loop
through the list at all. So I removed the loop entirely and replaced it with a
single list check. I ran the new code overnight Thursday and most of Friday
without any problems, so I committed it to RHEL5 and head.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.