Description of problem:
During normal operation of a two-node RHEL 5.4 cluster using GFS2 for shared Maildir mailboxes, one node panicked.

Version-Release number of selected component (if applicable):
RHEL 5.4 64 bit, kernel 2.6.18-164.6.1.el5
DLM (built Oct 27 2009 11:29:06) installed
GFS2 (built Oct 27 2009 11:29:46) installed
Lock_DLM (built Oct 27 2009 11:29:52) installed

How reproducible:
Not predictably reproducible.

Steps to Reproduce:
1.
2.
3.

Actual results:
Node panics with associated crash dump information. Remainder of cluster (only one additional node in our case) continues to run.

Expected results:
Node should not panic.

Additional info:
original: gfs2_rename+0x19d/0x63b [gfs2] pid: 12810 lock type: 3 req lock state: 1
new: gfs2_rlist_alloc+0x5c/0x6a [gfs2] pid: 12810 lock type: 3 req lock state: 1
G:  s:EX n:3/33d0327 f:y t:EX d:EX/0 l:0 a:5 r:4
 H: s:EX f:H e:0 p:12810 [imap] gfs2_rename+0x19d/0x63b [gfs2]
 R: n:54330151 f:05 b:274/274 i:1121
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at fs/gfs2/glock.c:1074
invalid opcode: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:0a.0/0000:02:02.0/irq
CPU 1
Modules linked in: nfs fscache nfs_acl lock_dlm gfs2 dlm configfs lockd sunrpc ipv6 xfrm_nalgo crypto_api ipt_LOG xt_state ip_conntrack nfnetlink xt_tcpudp iptable_filter ip_tables x_tables 8021q dm_multipath scsi_dh video backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac parport_pc lp parport i2c_amd756 k8temp ide_cd i2c_core hwmon sg amd_rng cdrom k8_edac pcspkr tg3 floppy edac_mc e1000 dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod qla2xxx scsi_transport_fc shpchp mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod raid1 ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 12810, comm: imap Not tainted 2.6.18-164.6.1.el5 #1
RIP: 0010:[<ffffffff8862a6df>]  [<ffffffff8862a6df>] :gfs2:gfs2_glock_nq+0x231/0x273
RSP: 0018:ffff8101ba8d9868  EFLAGS: 00010292
RAX: 0000000000000000 RBX: ffff8101ba8d9cb0 RCX: 0000000000000461
RDX: ffff8101ffe27a98 RSI: ffffffff80309c28 RDI: ffffffff80309c20
RBP: ffff8101860b1340 R08: ffffffff80309c28 R09: 000000000000003f
R10: ffff8101ba8d9368 R11: 0000000000000000 R12: ffff8100e87ea590
R13: ffff8100e87ea590 R14: ffff8100ed24e000 R15: 0000000000000000
FS:  00002b18a78ac530(0000) GS:ffff810103901940(0000) knlGS:00000000acbfbb90
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002b70cf5cf000 CR3: 00000001b4d4a000 CR4: 00000000000006e0
Process imap (pid: 12810, threadinfo ffff8101ba8d8000, task ffff8101ffe277e0)
Stack:  ffff8101860b1340 0000000000000001 ffff8100b3e1b000 ffff8100b3e1a0e8
 0000000000000000 ffffffff8862a74e 0000000000000038 ffff810184e88368
 0000000000000001 ffffffff800caa0b 0000000000000005 ffff810184e88368
Call Trace:
 [<ffffffff8862a74e>] :gfs2:gfs2_glock_nq_m+0x2d/0xf4
 [<ffffffff800caa0b>] __kzalloc+0x9/0x21
 [<ffffffff88622831>] :gfs2:do_strip+0x175/0x349
 [<ffffffff886217e2>] :gfs2:recursive_scan+0xf2/0x175
 [<ffffffff886218fe>] :gfs2:trunc_dealloc+0x99/0xe7
 [<ffffffff886226bc>] :gfs2:do_strip+0x0/0x349
 [<ffffffff80090000>] sched_exit+0xb4/0xb5
 [<ffffffff88638dda>] :gfs2:gfs2_delete_inode+0xdd/0x191
 [<ffffffff88638d43>] :gfs2:gfs2_delete_inode+0x46/0x191
 [<ffffffff88628e77>] :gfs2:gfs2_glock_schedule_for_reclaim+0x5d/0x9a
 [<ffffffff88638cfd>] :gfs2:gfs2_delete_inode+0x0/0x191
 [<ffffffff8002f48f>] generic_delete_inode+0xc6/0x143
 [<ffffffff8863d9a4>] :gfs2:gfs2_inplace_reserve_i+0x63b/0x691
 [<ffffffff886248c4>] :gfs2:gfs2_dirent_find_space+0x0/0x41
 [<ffffffff88623983>] :gfs2:gfs2_dirent_search+0x147/0x16e
 [<ffffffff886377c5>] :gfs2:gfs2_rename+0x3be/0x63b
 [<ffffffff88637506>] :gfs2:gfs2_rename+0xff/0x63b
 [<ffffffff8863754c>] :gfs2:gfs2_rename+0x145/0x63b
 [<ffffffff88637571>] :gfs2:gfs2_rename+0x16a/0x63b
 [<ffffffff886375a4>] :gfs2:gfs2_rename+0x19d/0x63b
 [<ffffffff88629e29>] :gfs2:gfs2_holder_uninit+0xd/0x1f
 [<ffffffff886385bf>] :gfs2:gfs2_permission+0xaf/0xd4
 [<ffffffff88633124>] :gfs2:gfs2_drevalidate+0x158/0x214
 [<ffffffff8000d902>] permission+0x81/0xc8
 [<ffffffff8002a7d9>] vfs_rename+0x2f4/0x471
 [<ffffffff80036c20>] sys_renameat+0x180/0x1eb
 [<ffffffff800b66f5>] audit_syscall_entry+0x180/0x1b3
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Code: 0f 0b 68 f8 27 64 88 c2 32 04 be 01 00 00 00 4c 89 ef e8 df
RIP  [<ffffffff8862a6df>] :gfs2:gfs2_glock_nq+0x231/0x273
 RSP <ffff8101ba8d9868>
<0>Kernel panic - not syncing: Fatal exception
Killed by signal 15.
Reassigning to Steve Whitehouse, since he talked to you about it.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Conditions required to hit this bug:
1. The rename must result in the unlinking of an inode.
2. The rename must require the allocation of a block in order to satisfy the space requirement for adding the new directory entry.
3. The resource group containing the old inode and the resource group from which the new blocks are allocated must be the same.

At that point we end up trying to take two resource group locks at the same time. The second condition is relatively unlikely, since not only will the initial unlink have created some space in the target directory, but we also only need to allocate new blocks occasionally, as a directory grows. In fact we only need to search the same resource group for the block allocation, rather than actually select it, which makes it a bit more likely that this bug will trigger. On the other hand, any large filesystem will contain a lot of resource groups, so the chances of this happening reduce with filesystem size.

We can drop the lock on the rgrp early with a very simple patch, and that will prevent us from hitting this bug again. That's not quite the whole story, though, as there is still an issue with respect to the locking of the two resource groups and their relative ordering. That will need to be addressed in order to avoid distributed deadlock. Bearing in mind the complexity of that, and the likelihood of two nodes hitting this at the same time (considering that it's tricky to hit even on a single node), it might be better to do the simple fix first. That will no doubt cover the majority of cases.
The slightly odd thing about this bug is that the only time we need to add a new block to a directory is when there isn't enough space in it already. Given that the only time we unlink an inode during a rename is when there is a target directory entry with the same name as the source inode's directory entry, there should always be enough space (since the target inode's entry will have been removed). So there might be more to this issue than is immediately apparent.
Changing the name of this bug so that I don't confuse myself again. Also, I think I might have a fix for it now. Just testing the upstream version and a RHEL5 version will be on its way once I've done some testing upstream.
Created attachment 373130 [details] Proposed patch This is an upstream patch aimed at fixing the reported issue.
Created attachment 373134 [details] RHEL5 version of patch This is the RHEL5 version of the original patch.
Allen, if we supply you with a test kernel with the patch from comment #11, are you in a position to see if it fixes the bug?
Steve, I would be happy to. Of course, the issue is so relatively rare that it will be a bit difficult to know for sure. Thanks for all of the quick work on this!
In case this is useful, here are copies of the fsck logs that I generated over the weekend. You'll note that numerous errors were corrected, despite fsck having been run pretty recently. Perhaps these contributed to triggering this bug.
Created attachment 373253 [details] fsck log of first filesystem
Created attachment 373254 [details] fsck log of second filesystem
Created attachment 375402 [details] Test kernel Allen, please find attached a test kernel rpm. If you need the other bits and bobs (kernel headers, debug stuff, etc) then let me know and I'll attach that too. Let us know how you get on.
post2:/root # uname -a Linux post2.isye.gatech.edu 2.6.18-175.gfs2abhi.001 #1 SMP Tue Dec 1 09:59:50 EST 2009 x86_64 x86_64 x86_64 GNU/Linux Both nodes are up and running on the test kernel. Nothing unusual so far. Since this is such a rare issue, it may be a while before I can confidently state that "the problem is gone", but nothing is grossly broken. Thanks! Allen
Allen, any more news? If you've not hit any further issues then I'm seriously considering pushing this patch into our next version...
Hi Steve, I've seen no further occurrences of the problem described in this bug, and the patched kernel hasn't added any new problems that I can see. Should be safe to go for it, thanks.
in kernel-2.6.18-180.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please update the appropriate value in the Verified field (cf_verified) to indicate this fix has been successfully verified. Include a comment with verification details.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2010-0178.html