Bug 659798
| Field | Value |
|---|---|
| Summary | GFS2 lock requests not being passed back to DLM |
| Product | Red Hat Enterprise Linux 5 |
| Reporter | Adam Drew <adrew> |
| Component | kernel |
| Assignee | Steve Whitehouse <swhiteho> |
| Status | CLOSED DUPLICATE |
| QA Contact | Cluster QE <mspqa-list> |
| Severity | high |
| Docs Contact | |
| Priority | urgent |
| Version | 5.7 |
| CC | adas, ahecox, bmarzins, cluster-maint, fhirtz, liko, rpeterso, sbradley, swhiteho, teigland |
| Target Milestone | rc |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Whiteboard | |
| Fixed In Version | |
| Doc Type | Bug Fix |
| Doc Text | |
| Story Points | --- |
| Clone Of | |
| Environment | |
| Last Closed | 2011-01-18 17:14:34 UTC |
| Type | --- |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Attachments | |
Description
Adam Drew
2010-12-03 17:32:25 UTC
Created attachment 464621 [details]
DLM, glock, messages, and cluster data.
The messages logs were cut off without glock_workqueue traces. I am getting newer copies of the logs that should have those in them.

Got new logs from the customer with the full sysrq -t. In common with BZ656032, there are no glock_workqueue traces from the time of the hang. A sysrq -t gathered earlier that day does contain glock_workqueue traces, but the sysrq -t gathered during the issue does not. This is the same situation as BZ656032.

try_rgrp_fit() returns 1 if the allocation fits and 0 if it does not. I can't see anything wrong with that.

The real question is why (see the opening comment) there appears to be no reference to the dlm lock on node1, when there are obviously queued glocks. node2 is caching that lock, but with no sign of any demote request having been received. Bearing in mind that in RHEL 5 it is possible for DLM locks to be reclaimed if they are not being held by the glock layer, the reason that there is no dlm lock showing might be that it was reclaimed in the meantime.

The flags on the glock say l (locked), r (reply pending) and q (queued). Of these the most interesting is r (reply pending), since that should mean that the dlm has responded to our request for an EX lock (see the target state) and that all we are waiting for is the glock_workqueue to process the reply. This is at odds with the dlm, though, since if we've requested an EX lock and a reply has been sent, then there ought to be some local reference to the dlm lock, even if the reply is an error notification for some reason.

We also need to figure out whether the real issue is that the glock_workqueue is stuck somehow, and that this is what is preventing the glock from being granted. All of the glock_workqueue threads seem to be in 'S' state, indicating that they are not stuck but are sitting idle.

Looking at __generic_block_fiemap(), I'm suspicious that in the big do loop there seems to be nothing updating the "len" variable. Surely len ought to get shorter as progress is made.

Created attachment 468910 [details]
12.02 issue data
Created attachment 468913 [details]
12.01 issue data
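To make the glock dump flags discussed in the analysis above easier to read, here is a small decoder sketch. The GLF_* names follow upstream gfs2 and are an assumption for the RHEL 5 kernel; the letter meanings are as described in the comment above.

```c
#include <stdio.h>

/*
 * Decoder for the glock dump flag letters discussed above.  The GLF_*
 * names are taken from upstream gfs2 and are an assumption for the
 * RHEL 5 kernel; the meanings follow the analysis in the comment.
 */
static const struct {
	char letter;
	const char *name;
	const char *meaning;
} glock_flags[] = {
	{ 'l', "GLF_LOCK",          "glock is locked (state machine busy)" },
	{ 'r', "GLF_REPLY_PENDING", "DLM has replied; waiting on glock_workqueue" },
	{ 'q', "GLF_QUEUED",        "holders are queued against this glock" },
};

int main(void)
{
	unsigned int i;

	for (i = 0; i < sizeof(glock_flags) / sizeof(glock_flags[0]); i++)
		printf("%c  %-18s %s\n", glock_flags[i].letter,
		       glock_flags[i].name, glock_flags[i].meaning);
	return 0;
}
```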
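On the __generic_block_fiemap() point above, the concern is easier to see in a stripped-down form. This is a minimal sketch of the loop shape being questioned, not the actual kernel code; the names and the length update are illustrative.

```c
#include <stdint.h>

/*
 * Minimal sketch of the do-loop shape questioned above; this is NOT the
 * real __generic_block_fiemap().  It only illustrates why the remaining
 * length would normally be expected to shrink as the walk advances.
 */
static void fiemap_loop_sketch(uint64_t start_blk, uint64_t len_blocks)
{
	do {
		/* ... look up and report the extent containing start_blk ... */

		start_blk++;    /* advance through the requested range */
		len_blocks--;   /* the update the comment suspects is missing;
				 * without it, the loop's idea of "len" never
				 * gets shorter as progress is made */
	} while (len_blocks > 0);
}

int main(void)
{
	fiemap_loop_sketch(0, 8);   /* walk an 8-block range */
	return 0;
}
```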
Hi Steve,

There really haven't been new datasets, as the client fell back to the old, pre-hotfix kernel, which doesn't exhibit the issue. They sent over a dataset from that setup on Monday, but it was just the original problem, not this one. I've attached the previous two datasets to this ticket so you can have a look.

Also, the client is quite interested in providing a WebEx session so we can see and interact with a cluster that's reproducing the issue. Is there any utility in that for you?

We've been working on an internal reproduction case (which has been going slowly), but aside from the dataset, I don't have anything that I would need to look for on a live host.

As mentioned, the reproducer work here has been going slowly and the workload is fairly complex, as you can see from the previous notes, so if there's anything that we could do or get in parallel which might help, please let me know and I'll see about getting it. Thanks again, Frank.

I recommend, as a test, that we bring them up to a newer, pre-release kernel (2.6.18-232 or newer) to pick up the recent dlm performance patch for bug #604139. Then let me know which kernel they're running, and I'll build a test version of my latest and greatest patch for bug #650494. In the best case, their performance should improve, because these are both performance-related patches. However, if it's not a performance issue and they still get stuck, then we can get some lock dumps and call traces to debug the problem.

Comment #36 shows that two glocks are being held by processes creating inodes and that the creation process appears to be stuck. The answer will probably be found in the stack traces of the processes concerned. I'll take a look shortly; however, this is worth comparing with bug #656032, since they look similar in some respects.

Looking at the comment #36 info, I'm assuming that it is from node 2, since pid 8966 is not a pmdtm process on node 1. From the node 2 messages file:

Jan 11 18:27:47 poq6etlsapp03 kernel: INFO: task pmdtm:8966 blocked for more than 120 seconds.
Jan 11 18:27:47 poq6etlsapp03 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 11 18:27:47 poq6etlsapp03 kernel: pmdtm D ffffffff80153676 0 8966 12285 8967 8965 (NOTLB)
Jan 11 18:27:47 poq6etlsapp03 kernel:  ffff810629f0da58 0000000000000082 ffff81063ffe0340 0000000000010243
Jan 11 18:27:47 poq6etlsapp03 kernel:  0000000000000050 0000000000000007 ffff81062bee3860 ffff810680060820
Jan 11 18:27:47 poq6etlsapp03 kernel:  00000ff79a6f2c25 00000000000068c3 ffff81062bee3a48 0000000100000000
Jan 11 18:27:47 poq6etlsapp03 kernel: Call Trace:
Jan 11 18:27:47 poq6etlsapp03 kernel:  [<ffffffff88575da7>] :gfs2:gfs2_getbuf+0x106/0x115
Jan 11 18:27:47 poq6etlsapp03 kernel:  [<ffffffff885714d3>] :gfs2:iget_test+0x0/0x1c
Jan 11 18:27:47 poq6etlsapp03 kernel:  [<ffffffff800512de>] __wait_on_freeing_inode+0x82/0xa0
Jan 11 18:27:47 poq6etlsapp03 kernel:  [<ffffffff800a28c4>] wake_bit_function+0x0/0x23
Jan 11 18:27:47 poq6etlsapp03 kernel:  [<ffffffff80025c55>] find_or_create_page+0x22/0x72
Jan 11 18:27:47 poq6etlsapp03 kernel:  [<ffffffff8004fb37>] find_inode+0x3d/0x6c
Jan 11 18:27:47 poq6etlsapp03 kernel:  [<ffffffff885714d3>] :gfs2:iget_test+0x0/0x1c
Jan 11 18:27:47 poq6etlsapp03 kernel:  [<ffffffff885714d3>] :gfs2:iget_test+0x0/0x1c
Jan 11 18:27:47 poq6etlsapp03 kernel:  [<ffffffff80059d3c>] ifind+0x34/0x96
Jan 11 18:27:47 poq6etlsapp03 kernel:  [<ffffffff800eeda2>] iget5_locked+0x6c/0x183
Jan 11 18:27:47 poq6etlsapp03 kernel:  [<ffffffff885714ef>] :gfs2:iget_set+0x0/0x1d
Jan 11 18:27:47 poq6etlsapp03 kernel:  [<ffffffff88571257>] :gfs2:gfs2_inode_lookup+0x40/0x21b
Jan 11 18:27:47 poq6etlsapp03 kernel:  [<ffffffff88572205>] :gfs2:gfs2_createi+0x9d6/0xd2c
Jan 11 18:27:47 poq6etlsapp03 kernel:  [<ffffffff801395e9>] security_compute_sid+0x307/0x329
Jan 11 18:27:47 poq6etlsapp03 kernel:  [<ffffffff8857d08c>] :gfs2:gfs2_create+0x65/0x143
Jan 11 18:27:47 poq6etlsapp03 kernel:  [<ffffffff88571892>] :gfs2:gfs2_createi+0x63/0xd2c
Jan 11 18:27:47 poq6etlsapp03 kernel:  [<ffffffff8856f756>] :gfs2:gfs2_glock_nq_num+0x3b/0x68
Jan 11 18:27:47 poq6etlsapp03 kernel:  [<ffffffff8003a6d6>] vfs_create+0xe6/0x158
Jan 11 18:27:47 poq6etlsapp03 kernel:  [<ffffffff8001b229>] open_namei+0x19d/0x712
Jan 11 18:27:47 poq6etlsapp03 kernel:  [<ffffffff80027527>] do_filp_open+0x1c/0x38
Jan 11 18:27:47 poq6etlsapp03 kernel:  [<ffffffff80019fb5>] do_sys_open+0x44/0xbe
Jan 11 18:27:47 poq6etlsapp03 kernel:  [<ffffffff8005d28d>] tracesys+0xd5/0xe0

So it looks like it is stuck waiting for an inode to be freed, which is another indicator that this could be a dup of bug #656032.

Looks increasingly like a dup of #656032.

*** This bug has been marked as a duplicate of bug 656032 ***
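For background on the VFS path in the trace above, here is a simplified, self-contained sketch of why the task sleeps in __wait_on_freeing_inode(). It assumes 2.6.18-era behaviour; the types, constants and helper names are illustrative stand-ins, not the kernel's.

```c
#include <stdio.h>

/* Illustrative flag values; the kernel's I_FREEING/I_WILL_FREE differ. */
#define I_FREEING   0x01
#define I_WILL_FREE 0x02

struct inode_sketch {
	unsigned long i_state;
};

/* Stand-in for find_inode(): return the cached in-core inode, if any. */
static struct inode_sketch *find_inode_sketch(struct inode_sketch *cached)
{
	return cached;
}

/* Stand-in for __wait_on_freeing_inode(): the real code sleeps until the
 * inode's teardown completes.  Here we just report that we would block. */
static void wait_on_freeing_inode_sketch(struct inode_sketch *inode)
{
	printf("inode %p is being freed: lookup sleeps until teardown finishes\n",
	       (void *)inode);
}

/*
 * Stand-in for the iget5_locked() path seen in the trace: if the hashed
 * lookup finds an inode that is on its way out, the caller waits for the
 * freeing to complete and then retries.  If that freeing is itself stuck
 * (for example behind a glock that is never granted), the lookup never
 * returns, which matches the hung pmdtm task above.
 */
static struct inode_sketch *iget5_locked_sketch(struct inode_sketch *cached)
{
	struct inode_sketch *inode = find_inode_sketch(cached);

	if (inode && (inode->i_state & (I_FREEING | I_WILL_FREE))) {
		wait_on_freeing_inode_sketch(inode);
		return NULL;    /* the real code would retry the lookup here */
	}
	return inode;           /* usable inode, or NULL -> allocate a new one */
}

int main(void)
{
	struct inode_sketch busy = { .i_state = I_FREEING };

	(void)iget5_locked_sketch(&busy);
	return 0;
}
```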