Bug 426291 - NULL pointer dereference in gfs_glock_dq during flock tests
Summary: NULL pointer dereference in gfs_glock_dq during flock tests
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: gfs-kmod
Version: 5.1
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Assignee: Ben Marzinski
QA Contact: GFS Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-12-19 20:13 UTC by Nate Straz
Modified: 2010-01-12 03:28 UTC
CC List: 2 users

Fixed In Version: RHBA-2008-0348
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-05-21 15:37:20 UTC
Target Upstream Version:
Embargoed:


Links:
Red Hat Product Errata RHBA-2008:0348 (SHIPPED_LIVE): gfs-kmod bug fix update, last updated 2008-05-20 12:47:43 UTC

Description Nate Straz 2007-12-19 20:13:35 UTC
Description of problem:

While running flock tests on a GFS file system with jdata turned on, I hit the
following panic:

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
f8dd0bcc
*pde = 6cbd4067
Oops: 0000 [#1]
SMP 
last sysfs file: /devices/pci0000:00/0000:00:02.0/0000:01:1f.0/0000:03:02.1/irq
Modules linked in: gfs(U) lock_dlm gfs2 dlm sctp configfs autofs4 hidp rfcomm
l2cap bluetooth sunrpc ipv6 dm_multipath video sbs backlight i2c_ec button
battery asus_acpi ac lp floppy i2c_i801 ide_cd i2c_core cdrom sg pcspkr
e7xxx_edac parport_pc e1000 edac_mc parport intel_rng dm_snapshot dm_zero
dm_mirror dm_mod qla2xxx scsi_transport_fc ata_piix libata sd_mod scsi_mod ext3
jbd ehci_hcd ohci_hcd uhci_hcd
CPU:    0
EIP:    0060:[<f8dd0bcc>]    Not tainted VLI
EFLAGS: 00010202   (2.6.18-53.el5 #1) 
EIP is at gfs_glock_dq+0x93/0x12e [gfs]
eax: d6019c6c   ebx: d6019c48   ecx: 00000000   edx: 00000000
esi: ddca8158   edi: 00064fab   ebp: f77f4a58   esp: cbe54efc
ds: 007b   es: 007b   ss: 0068
Process xdoio (pid: 11485, ti=cbe54000 task=f7fbd000 task.ti=cbe54000)
Stack: f8e00140 ddca8158 ddca8140 00000001 f51cb9c0 f8dd0e1e ddca8158 f8ddf2cc 
       c5a9965c 00000007 f51cb9c0 00000001 c200c0f0 c0465f9d 00000001 00000001 
       c200c0e0 b7f3f000 f5287da0 f52872cc c0461bc3 f52ac80c 00000246 f52ac80c 
Call Trace:
 [<f8dd0e1e>] gfs_glock_dq_uninit+0x8/0x10 [gfs]
 [<f8ddf2cc>] gfs_flock+0xa1/0x1ec [gfs]
 [<c0465f9d>] free_pages_and_swap_cache+0x6b/0x7f
 [<c0461bc3>] unmap_region+0xe1/0xf0
 [<c0461c08>] remove_vma+0x36/0x3b
 [<c044abd9>] audit_syscall_entry+0x11c/0x14e
 [<f8ddf22b>] gfs_flock+0x0/0x1ec [gfs]
 [<c0482d35>] sys_flock+0x114/0x147
 [<c0404eff>] syscall_call+0x7/0xb
 =======================
Code: de f8 89 f8 e8 1e c7 01 00 58 5a f6 46 15 08 74 06 f0 0f ba 6b 08 04 f6 46
15 04 74 2b 8b 4b 24 31 ed 31 ff eb 05 89 cd 47 89 d1 <8b> 11 0f 18 02 90 8d 43
24 39 c1 75 ee 39 f5 75 0c 4f 75 09 31 
EIP: [<f8dd0bcc>] gfs_glock_dq+0x93/0x12e [gfs] SS:ESP 0068:cbe54efc
 <0>Kernel panic - not syncing: Fatal exception

Version-Release number of selected component (if applicable):
kernel-2.6.18-53.el5
kmod-gfs-0.1.19-7.el5

How reproducible:
Unknown
  
Additional info:

I was running d_io, and the tags "rwsbuflarge" and "laio_lock1" were active at
the time of the panic.

Comment 1 Nate Straz 2007-12-19 20:42:43 UTC
I was able to reproduce this on the first try.  I'll see how much more I can
narrow this down.

Comment 2 Nate Straz 2007-12-19 20:50:36 UTC
I was able to hit a soft lockup while running this on one node out of four:

xiogen -S 5957 -i 10s -m sequential -o -a -t 1b -T 100b -F
400000b:/mnt/brawl/tank-01/laio_lock1 | xdoio -v -K -n 5

BUG: soft lockup detected on CPU#1!
 [<c044d21c>] softlockup_tick+0x96/0xa4
 [<c042ddb0>] update_process_times+0x39/0x5c
 [<c04196fb>] smp_apic_timer_interrupt+0x5b/0x6c
 [<c04059bf>] apic_timer_interrupt+0x1f/0x24
 [<c048007b>] do_select+0x330/0x3cb
 [<f8dcfbce>] gfs_glock_dq+0x95/0x12e [gfs]
 [<f8dcfe1e>] gfs_glock_dq_uninit+0x8/0x10 [gfs]
 [<f8dde2cc>] gfs_flock+0xa1/0x1ec [gfs]
 [<c0465f9d>] free_pages_and_swap_cache+0x6b/0x7f
 [<c0461bc3>] unmap_region+0xe1/0xf0
 [<c0461c08>] remove_vma+0x36/0x3b
 [<c044abd9>] audit_syscall_entry+0x11c/0x14e
 [<f8dde22b>] gfs_flock+0x0/0x1ec [gfs]
 [<c0482d35>] sys_flock+0x114/0x147
 [<c0404eff>] syscall_call+0x7/0xb


Comment 3 Nate Straz 2008-01-02 19:30:47 UTC
Bumping up the severity as this is a panic and causes a soft lockup on a second
node.

Comment 4 Kiersten (Kerri) Anderson 2008-01-02 20:03:39 UTC
Adding Dave to the cc-list, might be more in his space due to flock usage.

Comment 5 David Teigland 2008-01-03 15:34:59 UTC
This looks similar to the flock problems Abhi recently fixed in gfs2,
related to a process taking more than one flock on a file, or using
multiple file descriptors for flocks on a file.
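
For illustration only (this is not the d_io test itself, and the path is
hypothetical), the usage pattern Dave describes boils down to one process
re-flocking a file it already holds, or holding flocks on the same file through
more than one descriptor:

#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

int main(void)
{
        /* hypothetical path, standing in for a file on the GFS mount */
        int fd1 = open("/mnt/gfs/lockfile", O_RDWR | O_CREAT, 0644);
        int fd2 = open("/mnt/gfs/lockfile", O_RDWR);

        if (fd1 < 0 || fd2 < 0) {
                perror("open");
                return 1;
        }

        if (flock(fd1, LOCK_EX) < 0)        /* first flock on the file         */
                perror("flock fd1 LOCK_EX");
        if (flock(fd1, LOCK_SH) < 0)        /* second flock on the same fd     */
                perror("flock fd1 LOCK_SH"); /* (lock conversion)              */
        if (flock(fd2, LOCK_SH) < 0)        /* flock through a second fd       */
                perror("flock fd2 LOCK_SH");

        flock(fd2, LOCK_UN);
        flock(fd1, LOCK_UN);
        close(fd2);
        close(fd1);
        return 0;
}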


Comment 6 Ben Marzinski 2008-01-18 23:43:43 UTC
My best guess at what's going on here is that an item is getting added twice to
gl->gl_holders.  The soft lockup is happening while trying to loop through the
gl_holders list.  If an entry is added to a list twice, it can create an endless
loop of next pointers.  Once that entry is removed, it will break the list: the
list will still point to the removed element, which should have poisoned
pointers.  That isn't quite what you see with the panic, where the pointers are
NULL instead, but once the element has been removed from the list, it's not hard
to believe that they got zeroed out.
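
A minimal userspace sketch of that guess, using hand-rolled kernel-style list
helpers rather than the GFS source: adding the same entry twice creates a
next-pointer cycle that never leads back to the list head (the soft lockup),
and deleting the entry afterwards leaves the head pointing at a node whose
links are gone (the panic).

#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

static void list_add(struct list_head *new, struct list_head *head)
{
        new->next = head->next;
        new->prev = head;
        head->next->prev = new;
        head->next = new;
}

static void list_del(struct list_head *entry)
{
        entry->next->prev = entry->prev;
        entry->prev->next = entry->next;
        /* stand-in for the cleared links seen in the oops; the real
         * kernel list_del() would poison these instead */
        entry->next = entry->prev = NULL;
}

int main(void)
{
        struct list_head holders = { &holders, &holders }; /* like gl->gl_holders */
        struct list_head gh;                               /* like a holder's list node */
        struct list_head *tmp;
        int steps = 0;

        list_add(&gh, &holders);
        list_add(&gh, &holders);  /* second add: gh.next now points at gh itself */

        /* a list_for_each()-style walk never gets back to &holders */
        for (tmp = holders.next; tmp != &holders && steps < 10; tmp = tmp->next)
                steps++;
        printf("walk gave up after %d steps (the soft lockup)\n", steps);

        /* removing the entry breaks the list: the head still points at gh,
         * whose links are now NULL, so the next walk dereferences NULL */
        list_del(&gh);
        printf("holders.next->next is now %p\n", (void *)holders.next->next);
        return 0;
}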

Comment 7 Nate Straz 2008-01-21 19:25:21 UTC
Corey hit this on his grant cluster today during 5.1.Z testing.

Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: 
 [<ffffffff885573e0>] :gfs:gfs_glock_dq+0xaf/0x16b
PGD 11776a067 PUD 10ba5c067 PMD 0 
Oops: 0000 [1] SMP 
last sysfs file: /devices/pci0000:00/0000:00:09.0/0000:02:00.0/irq
CPU 2 
Modules linked in: lock_nolock gfs(U) lock_dlm gfs2 dlm gnbd(U) configfs autofs4
hidp rfcomm l2cap bluetooth sunrpc ipv6 dm_multipath video sbs backlight i2c_ec
i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport
k8_edac edac_mc ide_cd tg3 k8temp hwmon shpchp sg cdrom serio_raw pcspkr
dm_snapshot dm_zero dm_mirror dm_mod qla2xxx scsi_transport_fc sata_svw libata
sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 6707, comm: xdoio Not tainted 2.6.18-53.1.6.el5 #1
RIP: 0010:[<ffffffff885573e0>]  [<ffffffff885573e0>] :gfs:gfs_glock_dq+0xaf/0x16b
RSP: 0018:ffff810106381e08  EFLAGS: 00010202
RAX: ffff8101083cd960 RBX: ffff8101083cd928 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000000005c052 RDI: ffff810201daf0a8
RBP: ffff81010d2e7ea8 R08: 00000000000da4a5 R09: 7fffffffffffffff
R10: 000000004794e456 R11: 0000000000000246 R12: ffffc20010440000
R13: ffffffff8858b8a0 R14: 0000000000000000 R15: ffff81010f01c980
FS:  00002aaaaacbbcc0(0000) GS:ffff810103f993c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000109e4c000 CR4: 00000000000006e0
Process xdoio (pid: 6707, threadinfo ffff810106380000, task ffff81011d1f8860)
Stack:  0000000000000001 ffff81010d2e7ea8 ffff81010d2e7e88 ffff81010d2e7ea8
 ffff81010f01c980 ffffffff88557684 ffff81010d2e7e80 ffffffff88566c66
 ffff81011f002b88 ffff81010f01c980 ffff810106381e88 ffffffff800d5fb6
Call Trace:
 [<ffffffff88557684>] :gfs:gfs_glock_dq_uninit+0x9/0x12
 [<ffffffff88566c66>] :gfs:gfs_flock+0xb8/0x23e
 [<ffffffff800d5fb6>] do_readv_writev+0x272/0x295
 [<ffffffff88567254>] :gfs:gfs_read+0x0/0x5
 [<ffffffff80021d0d>] remote_llseek+0x73/0x7b
 [<ffffffff88567d20>] :gfs:gfs_llseek+0x75/0x84
 [<ffffffff800b27ba>] audit_syscall_entry+0x14d/0x180
 [<ffffffff800df31a>] sys_flock+0x11a/0x153
 [<ffffffff8005b28d>] tracesys+0xd5/0xe0


Code: 48 8b 11 0f 18 0a 48 8d 43 38 48 39 c1 75 e9 48 39 ef 75 10 
RIP  [<ffffffff885573e0>] :gfs:gfs_glock_dq+0xaf/0x16b
 RSP <ffff810106381e08>
CR2: 0000000000000000
 <0>Kernel panic - not syncing: Fatal exception


Comment 8 Nate Straz 2008-01-21 19:28:26 UTC
(In reply to comment #7)
> Corey hit this on his grant cluster today during 5.1.Z testing.

I should note that when Corey hit this, he was not running jdata.

Comment 9 Ben Marzinski 2008-01-23 00:43:34 UTC
I don't know how this is happening yet, but I think I know what's happening, at
least for the soft lockup, and it seems reasonable that the same thing is
happening for the panic too.  In gfs_glock_dq() you start traversing the
gl_holders list.  Unfortunately, while you're traversing the list, the holder
structure you're currently on just happens to get moved to the gl_waiters3 list.
Since you never hit the gl_holders list head, you loop endlessly.
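
A minimal userspace sketch of that race, again with hand-rolled kernel-style
list helpers rather than the GFS source (the move is done inline here to stand
in for the other context):

#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

static void list_add(struct list_head *new, struct list_head *head)
{
        new->next = head->next;
        new->prev = head;
        head->next->prev = new;
        head->next = new;
}

static void list_move(struct list_head *entry, struct list_head *head)
{
        entry->prev->next = entry->next;  /* unlink from the old list */
        entry->next->prev = entry->prev;
        list_add(entry, head);            /* relink onto the new one  */
}

int main(void)
{
        struct list_head holders  = { &holders,  &holders  };  /* gl_holders  */
        struct list_head waiters3 = { &waiters3, &waiters3 };  /* gl_waiters3 */
        struct list_head gh;
        struct list_head *tmp;
        int steps = 0;

        list_add(&gh, &holders);

        /* the walker starts scanning gl_holders and grabs a pointer to gh... */
        tmp = holders.next;

        /* ...while another context moves that holder onto gl_waiters3 */
        list_move(&gh, &waiters3);

        /* the walker keeps comparing against &holders, but ->next now leads
         * around the waiters3 ring, so it never terminates */
        for (; tmp != &holders && steps < 10; tmp = tmp->next)
                steps++;
        printf("walk gave up after %d steps (stuck on the wrong list)\n", steps);
        return 0;
}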

When I looked at my crash dumps from a soft-locked node, I was in the code that
checks the gl_holders list, but I was actually walking the gl_waiters3 list of
the same glock.

I'm going to up the post_fail_delay on my cluster and hopefully, if a machine
panics overnight, it'll be able to write out the whole crash dump before it gets
fenced, and I'll be able to look at a panic dump tomorrow. I'm hoping that this
will show me what process is moving the holder from one list to the other.

Comment 10 Kiersten (Kerri) Anderson 2008-01-28 17:22:06 UTC
Any updates on this one?

Comment 11 Ben Marzinski 2008-01-28 17:52:56 UTC
Oops.  Yeah, the fix for this got committed last week.  Looping through the
gl_holders list in gfs_glock_dq required a spin lock to be held.  However, since
all you were doing was checking whether the list had exactly one holder, and
whether that holder was the one being dequeued, you don't need to loop through
the list at all.  So I removed the loop entirely and replaced it with a single
list check.  I ran the new code overnight Thursday and most of Friday without
any problems, so I committed it to RHEL5 and head.
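
A sketch of the shape of that change, with hypothetical helper names rather
than the actual committed gfs-kmod patch:

struct list_head { struct list_head *next, *prev; };

/* old shape: walk gl_holders (which, per the comments above, needed the
 * glock spin lock held to be safe against concurrent list changes) */
static int only_holder_by_walking(struct list_head *holders,
                                  struct list_head *gh_list)
{
        struct list_head *tmp;
        int count = 0, found = 0;

        for (tmp = holders->next; tmp != holders; tmp = tmp->next) {
                count++;
                if (tmp == gh_list)
                        found = 1;
        }
        return count == 1 && found;
}

/* new shape: no walk at all -- the list holds exactly one entry, and that
 * entry is the holder being dequeued, precisely when both head links point
 * at its list node */
static int only_holder_by_inspection(struct list_head *holders,
                                     struct list_head *gh_list)
{
        return holders->next == gh_list && holders->prev == gh_list;
}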

Comment 15 errata-xmlrpc 2008-05-21 15:37:20 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0348.html


