485098 – NULL pointer deference in gfs2_getbuf

Bug 485098 - NULL pointer deference in gfs2_getbuf

Summary: NULL pointer deference in gfs2_getbuf

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	5.4
Hardware:	All
OS:	Linux
Priority:	low
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Steve Whitehouse
QA Contact:	Cluster QE
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	460218
TreeView+	depends on / blocked

Reported:	2009-02-11 16:09 UTC by Nate Straz
Modified:	2009-09-02 09:02 UTC (History)
CC List:	9 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2009-09-02 09:02:32 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
Sysrq-t output and glock dumps from dash cluster (303.44 KB, application/x-gzip) 2009-02-11 16:09 UTC, Nate Straz	no flags	Details
Proposed fix (568 bytes, patch) 2009-04-29 09:36 UTC, Steve Whitehouse	no flags	Details \| Diff
Updated patch (806 bytes, patch) 2009-04-30 14:57 UTC, Steve Whitehouse	no flags	Details \| Diff
Show Obsolete (1) View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2009:1243	0	normal	SHIPPED_LIVE	Important: Red Hat Enterprise Linux 5.4 kernel security and bug fix update	2009-09-01 08:53:34 UTC

Description Nate Straz 2009-02-11 16:09:32 UTC

Created attachment 331578 [details]
Sysrq-t output and glock dumps from dash cluster

Description of problem:

While trying to verify a fix for bug 460218, I started hitting this panic right away.
Unable to handle kernel NULL pointer dereference at 0000000000000108 RIP:
 [<ffffffff885b5687>] :gfs2:gfs2_getbuf+0x2f/0x115
PGD 2baea067 PUD 2a862067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /fs/gfs2/dash:dash0/lock_module/recover_done
CPU 0
Modules linked in: sctp hfsplus dm_round_robin lock_dlm gfs2 dlm configfs autofs4 hidp rfcomm l2cap bluetooth dm_log_clustered(U) sunrpc ipv6 xfrm_nalgo crypto_api dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sg i2c_i801 i2c_core tg3 serio_raw qla2xxx i3000_edac ide_cd pcspkr libphy edac_mc scsi_transport_fc shpchp cdrom dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 16775, comm: d_doio Tainted: G      2.6.18-129.gfs2abhi.001 #1
RIP: 0010:[<ffffffff885b5687>]  [<ffffffff885b5687>] :gfs2:gfs2_getbuf+0x2f/0x115
RSP: 0018:ffff81001df99ab8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff81001e24d810 RCX: ffff81001df99b28
RDX: 0000000000000001 RSI: 0000000026c94d2e RDI: ffff81001e24d810
RBP: ffff81001df99b28 R08: ffff81001df99b68 R09: 0000000026c94d2e
R10: ffff81003fd68020 R11: ffffffff8002c196 R12: 0000000026c94d2e
R13: ffff81002c37e000 R14: 0000000026c94d2e R15: 0000000000000000
FS:  00002b347c6d3020(0000) GS:ffffffff803ac000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000108 CR3: 000000002bae4000 CR4: 00000000000006e0
Process d_doio (pid: 16775, threadinfo ffff81001df98000, task ffff81003d690080)
Stack:  0000000300000000 ffff81001e24d810 ffff81001df99b28 0000000000000010
 ffff81001df99b68 ffff8100255a5c78 ffffffff885a8b9b ffffffff885b5784
 00000000ffffffff ffff810036b3f688 ffff81002c37e000 ffffffff885b5b82
Call Trace:
 [<ffffffff885a8b9b>] :gfs2:gfs2_dirent_find+0x0/0x4e
 [<ffffffff885b5784>] :gfs2:gfs2_meta_read+0x17/0x65
 [<ffffffff885b5b82>] :gfs2:gfs2_meta_indirect_buffer+0xba/0x160
 [<ffffffff885a88f9>] :gfs2:gfs2_dirent_search+0xf9/0x167
 [<ffffffff885a9e17>] :gfs2:gfs2_dir_search+0x21/0x73
 [<ffffffff8000d62d>] permission+0x81/0xc8
 [<ffffffff885b0797>] :gfs2:gfs2_lookupi+0x12e/0x16b
 [<ffffffff885b0755>] :gfs2:gfs2_lookupi+0xec/0x16b
 [<ffffffff885bc57f>] :gfs2:gfs2_lookup+0x26/0xa7
 [<ffffffff80022223>] d_alloc+0x174/0x1a9
 [<ffffffff8000cc36>] do_lookup+0xe5/0x1e6
 [<ffffffff80009fc7>] __link_path_walk+0xa01/0xf42
 [<ffffffff8000e80a>] link_path_walk+0x5c/0xe5
 [<ffffffff800b4635>] audit_syscall_entry+0x16e/0x1a1
 [<ffffffff8000c9d5>] do_path_lookup+0x270/0x2e8
 [<ffffffff800123f7>] getname+0x15b/0x1c1
 [<ffffffff800232a0>] __user_walk_fd+0x37/0x4c
 [<ffffffff80032057>] sys_faccessat+0xe4/0x18d
 [<ffffffff800b4635>] audit_syscall_entry+0x16e/0x1a1
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0


Code: 48 8b 98 08 01 00 00 44 88 f9 49 d3 ec 85 d2 74 23 8b 53 60
RIP  [<ffffffff885b5687>] :gfs2:gfs2_getbuf+0x2f/0x115
 RSP <ffff81001df99ab8>
CR2: 0000000000000108
 <0>Kernel panic - not syncing: Fatal exception



Version-Release number of selected component (if applicable):
kernel-2.6.18-129.gfs2abhi.001

How reproducible:
Easily

Steps to Reproduce:
1. Run d_metaverify with memory pressure
  

Actual results:
See above

Expected results:
No panic

Additional info:

Comment 1 Steve Whitehouse 2009-02-12 13:16:42 UTC

The offset of 108 suggests that the problem relates to a buffer_head rather than a page since the struct page is fairly small and almost certainly less than 108 bytes.

The function create_empty_buffers() ought not to return until it has actually managed to create a set of buffers. It should be impossible for the created buffers to disappear since we hold the page lock during the subsequent operations.

The other thing which makes me think that this refers to a buffer_head is that the offset within the function is fairly high, suggesting that the problem is towards the end of that function.

So this is a bit of a mystery at the moment.

Comment 2 Steve Whitehouse 2009-02-17 14:40:10 UTC

Even more of a mystery - buffer_heads are so small that an offset of 108 would be beyond the end of it. There are no other data structures used in that function that are even remotely close to that size.

Also if the buffers are causing the issue, then that should have been caught in create_empty_buffers() rather than in gfs2_getbuf.

So I'm still not quite sure what is going on here. I have a hunch it might be related to the patch for bz #273001 but I can't say for certain yet.

Comment 3 Nate Straz 2009-04-23 21:35:06 UTC

I was able to recreate this with 2.6.18-140.gfs2abhi.004.  I have a test case which creates a ton of files on each node, continuously lists all files on the file system, and applies memory pressure every five minutes.

GFS2: fsid=dash:dash0.1: warning: assertion "gl" failed
GFS2: fsid=dash:dash0.1:   function = gfs2_glock_cb, file = fs/gfs2/glock.c, line = 1370

Call Trace:
 [<ffffffff88635256>] :gfs2:gfs2_assert_warn_i+0xa1/0xc3
 [<ffffffff8009de6e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8861ea3e>] :gfs2:gfs2_glock_cb+0x83/0x122
 [<ffffffff8869785c>] :lock_dlm:gdlm_ast+0x306/0x311
 [<ffffffff886972c1>] :lock_dlm:gdlm_bast+0x0/0x8d
 [<ffffffff885ed20e>] :dlm:dlm_astd+0xd7/0x14f
 [<ffffffff885ed137>] :dlm:dlm_astd+0x0/0x14f
 [<ffffffff80032bf2>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff8009de6e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032af4>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11

Unable to handle kernel NULL pointer dereference at 0000000000000108 RIP: 
 [<ffffffff8862569f>] :gfs2:gfs2_getbuf+0x2f/0x115
PGD 12c7d0067 PUD 12b92f067 PMD 0 
Oops: 0000 [1] SMP 
last sysfs file: /fs/gfs2/dash:dash0/lock_module/recover_done
CPU 0 
Modules linked in: deflate zlib_deflate ccm serpent blowfish twofish ecb xcbc crypto_hash cbc md5 sha256 sha512 des aes_generic testmgr_cipher testmgr crypto_blkcipher aes_x86_64 ipcomp6 ipcomp ah6 ah4 esp6 xfrm6
_esp esp4 xfrm4_esp aead crypto_algapi xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_tunnel xfrm6_tunnel tunnel6 af_key autofs4 hidp rfcomm l2cap bluetooth dm_log_clu
stered(U) lock_dlm gfs2 dlm configfs sunrpc ipv6 xfrm_nalgo crypto_api ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi dm_multipath scsi_dh video hwmon backlight s
bs i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sg i3000_edac shpchp tg3 edac_mc libphy i2c_i801 i2c_core pcspkr serio_raw qla2xxx ide_cd scsi_transport_fc cdrom dm_raid45 dm_message d
m_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 9981, comm: ls Tainted: G      2.6.18-140.gfs2abhi.004 #1
RIP: 0010:[<ffffffff8862569f>]  [<ffffffff8862569f>] :gfs2:gfs2_getbuf+0x2f/0x115
RSP: 0018:ffff810132985ae8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff81012d81baa0 RCX: ffff810132985b58
RDX: 0000000000000001 RSI: 0000000013648d5c RDI: ffff81012d81baa0
RBP: ffff810132985b58 R08: ffff810132985bb8 R09: 0000000013648d5c
R10: ffff81014b1f3018 R11: ffffffff8002c7df R12: 0000000013648d5c
R13: ffff8101372f2000 R14: 0000000013648d5c R15: 0000000000000000
FS:  00002af796d92c10(0000) GS:ffffffff803b0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000108 CR3: 000000012c7e2000 CR4: 00000000000006e0
Process ls (pid: 9981, threadinfo ffff810132984000, task ffff81012c4c70c0)
Stack:  0000000000000080 ffff81012d81baa0 ffff810132985b58 0000000000000010
 ffff810132985bb8 0000000000000008 ffff8101372f2000 ffffffff8862579c
 00000000ffffffff ffff810132985c68 ffff8101372f2000 ffffffff88625b9a
Call Trace:
 [<ffffffff8862579c>] :gfs2:gfs2_meta_read+0x17/0x65
 [<ffffffff88625b9a>] :gfs2:gfs2_meta_indirect_buffer+0xba/0x160
 [<ffffffff8861853e>] :gfs2:gfs2_dir_read_data+0x7f/0x2c8
 [<ffffffff88625776>] :gfs2:gfs2_getbuf+0x106/0x115
 [<ffffffff88618b9b>] :gfs2:gfs2_dirent_find+0x0/0x4e
 [<ffffffff886187ae>] :gfs2:get_leaf_nr+0x27/0x48
 [<ffffffff88618b9b>] :gfs2:gfs2_dirent_find+0x0/0x4e
 [<ffffffff886187e5>] :gfs2:get_first_leaf+0x16/0x31
 [<ffffffff8002d440>] wake_up_bit+0x11/0x22
 [<ffffffff88618883>] :gfs2:gfs2_dirent_search+0x83/0x167
 [<ffffffff88619e8d>] :gfs2:gfs2_dir_check+0x24/0xb9
 [<ffffffff88627c20>] :gfs2:gfs2_drevalidate+0x128/0x214
 [<ffffffff88627bf5>] :gfs2:gfs2_drevalidate+0xfd/0x214
 [<ffffffff80009cc4>] __d_lookup+0xb0/0xff
 [<ffffffff8000d3de>] do_lookup+0x18f/0x1e6
 [<ffffffff8000a714>] __link_path_walk+0xa01/0xf42
 [<ffffffff8000edf3>] link_path_walk+0x42/0xb2
 [<ffffffff8000d0d4>] do_path_lookup+0x270/0x2e7
 [<ffffffff80012aa6>] getname+0x15b/0x1c2
 [<ffffffff8002392f>] __user_walk_fd+0x37/0x4c
 [<ffffffff800ed9f6>] sys_getxattr+0x2b/0x62
 [<ffffffff8005d229>] tracesys+0x71/0xe0
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0


Code: 48 8b 98 08 01 00 00 44 88 f9 49 d3 ec 85 d2 74 23 8b 53 60 
RIP  [<ffffffff8862569f>] :gfs2:gfs2_getbuf+0x2f/0x115

Comment 4 Steve Whitehouse 2009-04-24 10:32:16 UTC

That makes a lot more sense than the last trace... I can start to see what is going on now.

That assertion (only a warning in fact) is telling us that a reply has been received from the dlm via lock_dlm for which the glock appears to have gone away. That should be impossible since we hold ref counts for glocks when there are outstanding requests.

I suspect that gfs2_getbuf has failed because the ref count is no longer correct on the glock and it has already gone when gfs2_getbuf is called. So it looks like if we find the ref count issue, then we have solved this one.

Comment 5 Steve Whitehouse 2009-04-29 09:21:17 UTC

Looking at the differences between upstream and RHEL gfs2_glock_put(), I can see this:

upstream:

        /* 1 for being hashed, 1 for having state != LM_ST_UNLOCKED */
        if (atomic_read(&gl->gl_ref) == 2)
                gfs2_glock_schedule_for_reclaim(gl);
        write_unlock(gl_lock_addr(gl->gl_hash));
out:
        return rv;


rhel:

        write_unlock(gl_lock_addr(gl->gl_hash));
        /* 1 for being hashed, 1 for having state != LM_ST_UNLOCKED */
        if (atomic_read(&gl->gl_ref) == 2)
                gfs2_glock_schedule_for_reclaim(gl);
out:
        return rv;

So it might have been an error while converting the scand&glockd patch to the RHEL kernel perhaps. The rhel kernel might land up accessing the ref count after the glock has been freed if there is a race here. Its possible that it isn't the issue that we are chasing, but it needs fixing anyway.

Comment 6 Steve Whitehouse 2009-04-29 09:36:23 UTC

Created attachment 341724 [details]
Proposed fix

Lets see if this makes any difference...

Comment 7 Nate Straz 2009-04-29 21:35:51 UTC

That looks to have moved the panic to somewhere else.

Running kernel-2.6.18-142.gfs2abhi.001

G:  s:SH n:5/18951 f:lDp t:UN d:UN/166000 l:0 a:0 r:0
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at fs/gfs2/glock.c:247
invalid opcode: 0000 [1] SMP 
last sysfs file: /kernel/dlm/dash0/control
CPU 0 
Modules linked in: nls_utf8 hfsplus deflate zlib_deflate ccm serpent blowfish twofish ecb xcbc crypto_hash cbc md5 sha256 sha512 des aes_generic testmgr_cipher testmgr crypto_blkcipher aes_x86_64 ipcomp6 ipcomp a
h6 ah4 esp6 xfrm6_esp esp4 xfrm4_esp aead crypto_algapi xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_tunnel xfrm6_tunnel tunnel6 af_key autofs4 hidp rfcomm l2cap blu
etooth dm_log_clustered(U) lock_dlm gfs2 dlm configfs sunrpc ipv6 xfrm_nalgo crypto_api ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi dm_multipath scsi_dh video 
hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport sg i3000_edac qla2xxx i2c_i801 pcspkr scsi_transport_fc i2c_core tg3 edac_mc ide_cd cdrom shpchp serio_raw dm_raid45 dm
_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 8466, comm: glock_workqueue Tainted: G      2.6.18-142.gfs2abhi.001 #1
RIP: 0010:[<ffffffff885ed550>]  [<ffffffff885ed550>] :gfs2:gfs2_glock_put+0xba/0x176
RSP: 0000:ffff810074561dd0  EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff8100082381a8 RCX: 0000000000000282
RDX: 00000000ffffffff RSI: 0000000000000046 RDI: ffffffff802fdadc
RBP: ffff810008238240 R08: 000000000000000d R09: 000000000000003f
R10: ffffffff803e0520 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: ffffffff886048e0 R15: ffffffff885ee793
FS:  0000000000000000(0000) GS:ffffffff803b3000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002ab8d935e000 CR3: 0000000066236000 CR4: 00000000000006e0
Process glock_workqueue (pid: 8466, threadinfo ffff810074560000, task ffff81007da077e0)
Stack:  0000000000000000 ffff8100082381a8 ffff810008238298 ffffffff885edd27
 ffff8100082381a8 ffff810008238298 ffff81007631c9c0 0000000000000246
 ffff8100082381a8 ffffffff885ee7b0 ffff810068b691a8 ffff810008238290
Call Trace:
 [<ffffffff885edd27>] :gfs2:finish_xmote+0x48/0x2b7
 [<ffffffff885ee7b0>] :gfs2:glock_work_func+0x1d/0xb8
 [<ffffffff8004db3b>] run_workqueue+0x94/0xe4
 [<ffffffff8004a3a6>] worker_thread+0x0/0x122
 [<ffffffff8009ee57>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8004a496>] worker_thread+0xf0/0x122
 [<ffffffff8008b81c>] default_wake_function+0x0/0xe
 [<ffffffff8009ee57>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8009ee57>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032d4e>] kthread+0xfe/0x132
 [<ffffffff8005efb1>] child_rip+0xa/0x11
 [<ffffffff8009ee57>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032c50>] kthread+0x0/0x132
 [<ffffffff8005efa7>] child_rip+0x0/0x11


Code: 0f 0b 68 d8 62 60 88 c2 f7 00 48 39 ab 98 00 00 00 74 14 48 
RIP  [<ffffffff885ed550>] :gfs2:gfs2_glock_put+0xba/0x176
 RSP <ffff810074561dd0>
 <0>Kernel panic - not syncing: Fatal exception

Comment 8 Steve Whitehouse 2009-04-30 10:52:43 UTC

Well there is no doubt that this is a bug relating to glock ref counts as this very clearly shows that.

There are three calls to gfs2_glock_put() from finish_xmote if you include the indirect one via state_change(). One of them is easy to rule out - its after we clear the lock bit on the glock. One of them is more tricky to rule out, but I think we can do so due to a lack of holders on this glock.

That leaves only the gfs2_glock_put() that is in state_change(). At that point there should always be a count of at least 1 since we hold a ref count from about half way through do_xmote() until the end of finish_xmote(). That was specifically to deal with this case of demoting a lock, since otherwise we have a ref count from the holder when we are doing promotion of the glock.

There is a strong possibility that the error is elsewhere though, and it just happens to show up at this point because we are one short on the ref counting.

Comment 9 Steve Whitehouse 2009-04-30 14:57:36 UTC

Created attachment 341936 [details]
Updated patch

This is now two lines, rather than one. I think I've found the real bug this time. This would explain why its tricky to hit in "normal" operation, why its more likely to happen with memory pressure, and why we see it in the places its shown up far. The bug is also in upstream, so I'll post a patch for it to cluster-devel shortly.

Fingers crossed...

Comment 10 Nate Straz 2009-05-01 18:59:15 UTC

I hit this while running on abhi's 2.6.18-142.gfs2abhi.002 which includes the patch from comment #9.

 G:  s:EX n:2/29b1fbc f:lDpr t:UN d:UN/18003000 l:0 a:00
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at fs/gfs2/glock.c:247
invalid opcode: 0000 [1] SMP
last sysfs file: /fs/gfs2/dash:dash0/lock_module/recover_done
CPU 0
Modules linked in: deflate zlib_deflate ccm serpent blowfish twofish ecb xcbc cd
Pid: 173, comm: kswapd0 Tainted: G      2.6.18-142.gfs2abhi.002 #1
RIP: 0010:[<ffffffff885e4550>]  [<ffffffff885e4550>] :gfs2:gfs2_glock_put+0xba/6
RSP: 0000:ffff81007fb93d40  EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff810066769810 RCX: 0000000000000282
RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff8100667698a8 R08: 000000000000000d R09: 0000000000000000
R10: ffffffff803e0520 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000077 R14: 0000000000009e04 R15: 0000000000000100
FS:  0000000000000000(0000) GS:ffffffff803b3000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002abc0c0eb000 CR3: 000000001ebe1000 CR4: 00000000000006e0
Process kswapd0 (pid: 173, threadinfo ffff81007fb92000, task ffff810037ecf100)
Stack:  ffff8100667698a8 ffff8100667698a8 ffff810066769810 ffffffff885e5a94
 ffff810062721388 ffff810066015dc8 ffff810079853f00 0000000000013f24
 0000000000000092 00000000000000d0 0000000000000000 ffffffff8003f7c1
Call Trace:
 [<ffffffff885e5a94>] :gfs2:gfs2_shrink_glock_memory+0x158/0x1fa
 [<ffffffff8003f7c1>] shrink_slab+0xdc/0x153
 [<ffffffff80058089>] kswapd+0x35a/0x45a
 [<ffffffff8009f06f>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8009ee57>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80057d2f>] kswapd+0x0/0x45a
 [<ffffffff8009ee57>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032d4e>] kthread+0xfe/0x132
 [<ffffffff8009bcde>] request_module+0x0/0x14d
 [<ffffffff8005efb1>] child_rip+0xa/0x11
 [<ffffffff8009ee57>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032c50>] kthread+0x0/0x132
 [<ffffffff8005efa7>] child_rip+0x0/0x11


Code: 0f 0b 68 d8 d2 5f 88 c2 f7 00 48 39 ab 98 00 00 00 74 14 48
RIP  [<ffffffff885e4550>] :gfs2:gfs2_glock_put+0xba/0x176
 RSP <ffff81007fb93d40>
 <0>Kernel panic - not syncing: Fatal exception

Comment 11 Steve Whitehouse 2009-05-06 12:55:43 UTC

Results from comment #10 were without the final patch for this bug, due to a mix up. The final patch has been tested and does work.

This is a regression introduced by the recent scand & glockd patch and needs to be fixed before 5.4.

Comment 12 Steve Whitehouse 2009-05-06 13:02:09 UTC

Please can those of you with flag setting powers, set the various required flags. Thanks,

Steve.

Comment 13 RHEL Program Management 2009-05-06 13:29:54 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 14 Don Zickus 2009-05-12 17:41:11 UTC

in kernel-2.6.18-146.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 16 Nate Straz 2009-06-26 12:13:29 UTC

I ran the test case for about 18 hours and I was not able to reproduce this panic.  Marking VERIFIED based on kernel-2.6.18-154.el5.

Comment 18 errata-xmlrpc 2009-09-02 09:02:32 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html

Note You need to log in before you can comment on or make changes to this bug.