Bug 643165 - GFS2: BUG_ON kernel panic in gfs2_glock_hold on 2.6.18-226
Summary: GFS2: BUG_ON kernel panic in gfs2_glock_hold on 2.6.18-226
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.6
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Assignee: Steve Whitehouse
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 651584 651811
 
Reported: 2010-10-14 20:12 UTC by Adam Drew
Modified: 2018-11-26 19:38 UTC (History)

Fixed In Version: kernel-2.6.18-232.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 651584 (view as bug list)
Environment:
Last Closed: 2011-01-13 21:56:53 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
First attempt at a crosswrite (13.97 KB, patch)
2010-10-26 13:36 UTC, Robert Peterson
modified crosswrite patch. (13.97 KB, patch)
2010-10-28 22:45 UTC, Ben Marzinski
Yet another version of the crosswrite patch (13.92 KB, patch)
2010-10-29 19:33 UTC, Ben Marzinski
Yet a better version of the crosswrite patch (13.90 KB, patch)
2010-11-10 13:29 UTC, Robert Peterson
Comment (65.50 KB, text/plain)
2010-10-14 20:12 UTC, Adam Drew


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:0017 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 5.6 kernel security and bug fix update 2011-01-13 10:37:42 UTC

Description Adam Drew 2010-10-14 20:12:11 UTC
Created attachment 915160 [details]
Comment

(This comment was longer than 65,535 characters and has been moved to an attachment by Red Hat Bugzilla).

Comment 1 Steve Whitehouse 2010-10-15 09:30:43 UTC
From the bug message:

G:  s:UN n:2/7cd19 f: t:UN d:EX/0 l:0 a:0 r:0


we can see that this is an inode glock. We can also see which inode it refers to (not that useful in this case, I suspect) and that it looks like it's newly created and thus unused at the point where the problem occurs. The ref count is certainly zero, so it is probably an imbalance in get/put for this glock.

I've looked at this patch:
http://git.engineering.redhat.com/?p=users/jwilson/rhel5/kernel;a=commitdiff;h=fb5f8cb678bd05943c62be1abea2e2917895cc85

which is the one that adds the code in question (a relatively recent addition), but it is not obvious to me where the error is at the moment.
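
For illustration, the fields in that dump line can be decoded with a small, hypothetical parser (not part of GFS2; the field meanings and the type numbers 2 = inode, 5 = iopen are taken from the comments in this bug):

```python
import re

# Glock type numbers as identified in this bug's comments:
# n:2/... is an inode glock, n:5/... is an iopen glock.
GLOCK_TYPES = {2: "inode", 5: "iopen"}

def parse_glock_line(line):
    """Parse a glock dump line such as
    'G:  s:UN n:2/7cd19 f: t:UN d:EX/0 l:0 a:0 r:0'."""
    m = re.match(
        r"G:\s+s:(\S+) n:(\d+)/([0-9a-f]+) f:(\S*) t:(\S+) "
        r"d:(\S+) l:(\d+) a:(\d+) r:(\d+)",
        line.strip())
    if not m:
        raise ValueError("not a glock dump line")
    gltype = int(m.group(2))
    return {
        "state": m.group(1),            # current state (UN = unlocked)
        "type": GLOCK_TYPES.get(gltype, str(gltype)),
        "number": int(m.group(3), 16),  # object (inode) number, in hex
        "flags": m.group(4),
        "target": m.group(5),           # target state
        "refcount": int(m.group(9)),    # the r: field -- zero in this report
    }
```

On the line from this report, the parser yields type "inode" and refcount 0, which is the imbalance described above.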

Comment 2 Adam Drew 2010-10-15 16:39:13 UTC
Could this be how we're ending up in gfs2_glock_hold from gfs2_process_unlinked_inode?

gfs2_process_unlinked_inode() at inode.c:262>:
    gfs2_glock_get() at glock.c:802>:
        glock_work_func() at glock.c:758>:
            finish_xmote() at glock.c:522> (R):
                state_change() at glock.c:487>:
                    gfs2_glock_hold() at glock.c:207>:
                        GLOCK_BUG_ON()

I don't know if it's helpful at all, but I was trying to figure out how we were getting to gfs2_glock_hold.

Comment 4 Steve Whitehouse 2010-10-20 13:42:12 UTC
RE: Comment #2. I don't think that is very likely.

A more likely explanation is that somewhere we have an unbalanced use of ref counts and that has caused a problem elsewhere in the code. The chances are that the call to glock_hold is from the holder_init code, judging by what is in the functions further up the stack.
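
A minimal model of that failure mode (illustration only, not GFS2 code; the check mirrors the style of the kernel's GLOCK_BUG_ON): an unbalanced put elsewhere drops the reference count to zero, and the next otherwise-correct hold is the one that trips the assertion.

```python
# Toy glock with a reference count; hold() refuses to resurrect a
# zero-ref object, like the BUG_ON in gfs2_glock_hold.
class Glock:
    def __init__(self):
        self.ref = 1  # initial reference taken at creation

    def hold(self):
        # analogous to the kernel's zero-refcount BUG_ON check
        assert self.ref != 0, "glock_hold on a zero-ref glock"
        self.ref += 1

    def put(self):
        self.ref -= 1

gl = Glock()
gl.put()          # buggy extra put somewhere in the code...
try:
    gl.hold()     # ...and a legitimate hold is what actually blows up
except AssertionError as e:
    print("BUG:", e)
```

The point is that the site of the panic (glock_hold) need not be the site of the bug; the imbalance happened earlier.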

Comment 6 Steve Whitehouse 2010-10-20 16:51:48 UTC
I think we might have tracked down the cause of the problem encountered and it does look like my original idea in comment #2 is correct.

When GFS2 does block allocation, at the same time it also performs a check for inodes which are (a) unlinked and (b) still listed as allocated. These occasionally occur in the filesystem if there has been a recovery situation.

GFS2 checks for them during block allocation because at that point the check is virtually "free", in the sense that the data structures required are already cached in memory during the search for free blocks. This is basically the equivalent of ext3's orphan list, and it is perfectly normal for an inode to be found and processed in this way.

The issue introduced in the earlier patch is that the lookup of the inode scheduled for deallocation should have occurred under the resource group lock, to prevent multiple processes trying to do the same thing at once. Instead, the inode number was passed out to the deallocation code, so the required lock was not held at the right time.

There is a possible workaround. Running a recent fsck.gfs2 (after unmounting the fs on all nodes) will remove any unlinked but still allocated inodes, meaning that the cluster should continue running without hitting this bug. This might be useful until we can get the fix ready.

If you need any more info, please let us know.

Comment 9 Steve Whitehouse 2010-10-25 10:01:46 UTC
Did the customer have any nodes reboot between the fsck run and the following failure? If so, then my original thoughts on what was causing this may be right; if not, then I'll have to think again, since it would have been impossible for my original theory to be correct.

Comment 11 Steve Whitehouse 2010-10-25 15:21:20 UTC
Is the customer in a position to try a test kernel?

I don't have one ready yet, but it would be useful to know in case we can debug it that way.

Comment 12 Ben Turner 2010-10-25 15:53:32 UTC
Here is the customer's response to your question in comment 9. Note: I had them run the fsck from Bob's people page:

"None of the nodes were rebooted between the time I ran Bob's fsck.gfs2
and the time I ran my test.

Here is a detailed step-by-step of what I did:

Reboot all nodes in the cluster (it was still at the hung state from
last test).
Manually stop application cluster services (using clusvcadm) on all nodes.
Manually unmount all GFS2 filesystems (umount -t gfs2 -a) on all nodes.
Run Bob's fsck.gfs2 -y /device_name on all GFS2 FSs from node 01.
The check found lots of inode inconsistencies on all filesystems and
fixed them.
Ran it again and it came back clean.
Re-mounted all GFS2 mounts.
Manually restarted application cluster services and started my test.
Some time later (3.5 hrs or so), node 01 panicked."

To address:

"Is the customer in a position to try a test kernel?

I don't have one ready yet, but it would be useful to know in case we can debug
it that way."

>>  The customer has a reproducer that hits this issue within hours, 100% of the time. They are up for testing any package and have even volunteered access for any RH personnel who want it, if that will help resolve this issue.

Comment 13 Robert Peterson 2010-10-26 13:36:57 UTC
Created attachment 455771 [details]
First attempt at a crosswrite

This is a RHEL5.6 crosswrite of the previous patch.
I threw it together fast, so it should be double-checked.

Comment 16 Ben Turner 2010-10-27 22:39:30 UTC
Here is the trace from the log:

sd 0:0:0:0: SCSI error: return code = 0x08000002
sda: Current: sense key: Hardware Error
    Add. Sense: No defect spare location available

Info fld=0x74c8abb
end_request: I/O error, dev sda, sector 122456763
sd 0:0:0:0: SCSI error: return code = 0x08000002
sda: Current: sense key: Hardware Error
    Add. Sense: No defect spare location available

Info fld=0x74c8abb
end_request: I/O error, dev sda, sector 122456763
sd 0:0:0:0: SCSI error: return code = 0x08000002
sda: Current: sense key: Hardware Error
    Add. Sense: No defect spare location available

Info fld=0x74c8abb
end_request: I/O error, dev sda, sector 122456763
 G:  s:SH n:5/7ffa5 f:Dq t:SH d:UN/0 l:0 a:0 r:4
  H: s:SH f:EH e:0 p:19436 [(ended)] gfs2_inode_lookup+0x112/0x21b [gfs2]
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at fs/gfs2/glock.c:743
invalid opcode: 0000 [1] SMP 
last sysfs file: /devices/pci0000:00/0000:00:07.0/0000:05:00.0/0000:06:00.0/irq
CPU 1 
Modules linked in: ipt_MASQUERADE iptable_nat ip_nat xt_state ip_conntrack nfnetlink ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge nfs fscache nfsd exportfs nfs_acl auth_rpcgss autofs4 hidp l2cap bluetooth dm_log_clustered(U) lock_dlm gfs2 dlm configfs lockd sunrpc be2iscsi ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_addr iscsi_tcp bnx2i cnic ipv6 xfrm_nalgo crypto_api uio cxgb3i iw_cxgb3 ib_core cxgb3 8021q libiscsi_tcp libiscsi2 scsi_transport_iscsi2 scsi_transport_iscsi dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec i2c_core dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport ksm(U) kvm_intel(U) kvm(U) i5000_edac tpm_tis tpm edac_mc pcspkr tpm_bios bnx2 dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod mppVhba(U) usb_storage qla2xxx scsi_transport_fc ata_piix libata shpchp mptsas mptscsih mptbase scsi_transport_sas mppUpper(U) sg sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 4383, comm: delete_workqueu Tainted: G      2.6.18-228.00333124.1.el5 #1
RIP: 0010:[<ffffffff8884ada4>]  [<ffffffff8884ada4>] :gfs2:delete_work_func+0x22/0x82
RSP: 0018:ffff81082038be20  EFLAGS: 00010282
RAX: 0000000000000000 RBX: 000000000007ffa5 RCX: ffffffff80318f28
RDX: ffffffff80318f28 RSI: 0000000000000082 RDI: ffffffff80318f20
RBP: ffff810802df2c30 R08: ffffffff80318f28 R09: 0000000000000001
R10: ffffffff88862e93 R11: ffff81082038bcb0 R12: ffff81081d72f000
R13: 0000000000000282 R14: ffff810802df2c30 R15: ffffffff8884ad82
FS:  0000000000000000(0000) GS:ffff81011cb767c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002b837c8172ec CR3: 0000000824209000 CR4: 00000000000026e0
Process delete_workqueu (pid: 4383, threadinfo ffff81082038a000, task ffff8108287c70c0)
Stack:  ffff810802df2d78 ffff810802df2d80 ffff81081f7c7ec0 ffffffff8004d7aa
 ffff81082038be80 ffff81081f7c7ec0 ffffffff80049ff2 ffff810820091d98
 0000000000000282 ffff810820091d88 ffffffff800a267e ffffffff8004a0e2
Call Trace:
 [<ffffffff8004d7aa>] run_workqueue+0x99/0xf6
 [<ffffffff80049ff2>] worker_thread+0x0/0x122
 [<ffffffff800a267e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8004a0e2>] worker_thread+0xf0/0x122
 [<ffffffff8008e414>] default_wake_function+0x0/0xe
 [<ffffffff800a267e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff800a267e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff80032968>] kthread+0xfe/0x132
 [<ffffffff8005dfb1>] child_rip+0xa/0x11
 [<ffffffff800a267e>] keventd_create_kthread+0x0/0xc4
 [<ffffffff8003286a>] kthread+0x0/0x132
 [<ffffffff8005dfa7>] child_rip+0x0/0x11


Code: 0f 0b 68 38 2e 86 88 c2 e7 02 48 83 bd 90 00 00 00 00 74 0e 
RIP  [<ffffffff8884ada4>] :gfs2:delete_work_func+0x22/0x82
 RSP <ffff81082038be20>

Comment 18 Steve Whitehouse 2010-10-28 10:36:22 UTC
The messages at the top of comment #16 appear to suggest that there is a hardware problem on the array. In that case, all bets are off. Can you confirm whether that is correct?

Comment 19 Ben Turner 2010-10-28 12:10:56 UTC
Yeah, I saw those messages too; that's why I was sure to include them. This is the first time I have seen those come up in a core. I will follow up on the SCSI errors and the health of the array.

Comment 20 Ben Turner 2010-10-28 16:56:22 UTC
I followed up with the customer; it looks like those errors are on their local disk. There are 4 nodes in this cluster, and we are going to attempt to retest without this particular node. Note from the customer here:

"We considered that option and I'm looking into it.
We have four nodes and I plan to check if the HW error is only showing on one. 
If so, I'll run the test with other three nodes.

I'll check on this and get back to you."

That said, do you think that a problem on the local disk would cause a panic in fs/gfs2/glock.c?

Comment 23 Ben Marzinski 2010-10-28 22:11:27 UTC
It looks to me like the kernel panic in comment #16 is the fault of the patch. It's panicking in delete_work_func() on

GLOCK_BUG_ON(gl, gl->gl_name.ln_type != LM_TYPE_INODE);

But this work is only queued up by iopen_go_callback(), which means the glock should be of type LM_TYPE_IOPEN. Looking at

G:  s:SH n:5/7ffa5 f:Dq t:SH d:UN/0 l:0 a:0 r:4

That's exactly what you see. I need to understand the patch better to know what the right thing to do is, but if this new panic is at fs/gfs2/glock.c:743, then it looks like a patch-induced error.
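
The mismatch can be modelled in a few lines (illustration only, not kernel code; the type constants match the values seen in the dump lines of this bug): the only caller that queues this work hands it an iopen glock, so an assertion in the worker demanding an inode glock fires every time.

```python
LM_TYPE_INODE = 2  # n:2/... glocks in the dumps
LM_TYPE_IOPEN = 5  # n:5/... glocks in the dumps

def delete_work_func(gl_type):
    # analogous to the crosswrite's
    # GLOCK_BUG_ON(gl, gl->gl_name.ln_type != LM_TYPE_INODE)
    assert gl_type == LM_TYPE_INODE, "BUG: wrong glock type"

def iopen_go_callback(queue):
    # the only code path that queues delete work passes an iopen glock
    queue.append(LM_TYPE_IOPEN)

work_queue = []
iopen_go_callback(work_queue)
try:
    delete_work_func(work_queue.pop())
except AssertionError as e:
    print(e)  # fires every time this work runs
```

This is consistent with removing (or inverting) the type check, as the later patch versions in this bug do.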

Comment 24 Ben Marzinski 2010-10-28 22:45:32 UTC
Created attachment 456362 [details]
modified crosswrite patch.

The only difference between this and Bob's original patch is that delete_work_func() now checks to make sure that it has an LM_TYPE_IOPEN lock, which is what it will always have. I haven't finished completely reviewing the patch, but it seems pretty obvious that this is what was intended.

Comment 25 Ben Marzinski 2010-10-28 23:00:26 UTC
Ignore that last attachment.  It won't work any better than the original. That's what I get for messing with a patch before I've completely looked through it.

Comment 28 Ben Marzinski 2010-10-29 19:33:42 UTC
Created attachment 456544 [details]
Yet another version of the crosswrite patch

This version removes the GLOCK_BUG_ON() check altogether.

Comment 29 Steve Whitehouse 2010-11-02 14:58:54 UTC
Any news? Did the updated patch work?

Comment 30 Ben Turner 2010-11-02 15:08:49 UTC
So far so good, we'll keep you updated.

Comment 31 RHEL Program Management 2010-11-08 17:39:29 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 33 RHEL Program Management 2010-11-08 17:59:36 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 34 Adam Drew 2010-11-09 17:09:19 UTC
Based on the testing performed the issue appears resolved with this patch.

Comment 35 Steve Whitehouse 2010-11-09 19:38:53 UTC
Posted to rhkernel-list

Comment 37 Robert Peterson 2010-11-10 13:29:47 UTC
Created attachment 459442 [details]
Yet a better version of the crosswrite patch

Revised slightly due to a formatting issue

Comment 39 Jarod Wilson 2010-11-16 16:58:19 UTC
in kernel-2.6.18-232.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 41 Nate Straz 2010-12-03 19:44:05 UTC
Marking verified based on customer testing.

Comment 43 errata-xmlrpc 2011-01-13 21:56:53 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html

