Bug 733418
Summary: gfs2 hung task - kernel 2.6.18-194.32.1
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.5
Status: CLOSED INSUFFICIENT_DATA
Severity: unspecified
Priority: low
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Reporter: Charlie Brady <charlieb-fedora-bugzilla>
Assignee: Steve Whitehouse <swhiteho>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: adas, anprice, bmarzins, rpeterso, rwheeler, swhiteho
Doc Type: Bug Fix
Last Closed: 2011-08-30 13:30:20 UTC
Description (Charlie Brady, 2011-08-25 16:34:29 UTC)
Created attachment 519927 [details]
gfs stack backtraces from node1
Created attachment 519928 [details]
gfs stack backtraces from node2
This issue may have already been addressed in 5.6 update kernels:

* Thu Jun 16 2011 Phillip Lougher <plougher> [2.6.18-238.15.1.el5]
  ...
  - [fs] gfs2: fix resource group bitmap corruption (Robert S Peterson) [711519 690555]
  ...
* Wed Jun 01 2011 Phillip Lougher <plougher> [2.6.18-238.13.1.el5]
  - [fs] gfs2: fix processes waiting on already-available inode glock (Phillip Lougher) [709767 694669]
* Sat May 07 2011 Phillip Lougher <plougher> [2.6.18-238.12.1.el5]
  ...
  - [fs] gfs2: fix filesystem hang caused by incorrect lock order (Robert S Peterson) [688855 656032]
  - [fs] gfs2: restructure reclaim of unlinked dinodes (Phillip Lougher) [688855 656032]
  - [fs] gfs2: unlock on gfs2_trans_begin error (Robert S Peterson) [688855 656032]
  ...
* Fri Apr 15 2011 Phillip Lougher <plougher> [2.6.18-238.10.1.el5]
  - [fs] gfs2: creating large files suddenly slow to a crawl (Robert S Peterson) [690239 683155]
  ...
* Fri Mar 04 2011 Jiri Pirko <jpirko> [2.6.18-238.7.1.el5]
  ...
  - [fs] gfs2: remove iopen glocks from cache on delete fail (Benjamin Marzinski) [675909 666080]
  ...
* Tue Jan 04 2011 Jiri Pirko <jpirko> [2.6.18-238.1.1.el5]
  ...
  - [fs] gfs2: fix statfs error after gfs2_grow (Robert S Peterson) [666792 660661]
  ...

The stack trace looks like a process waiting on an inode glock during a stat system call. The key to tracking down the issue is to look at the glock dumps on each node and see which node/process is holding the glock in question, and thus preventing the ls process from obtaining the glock required for that stat. Note that it can be useful to run ls -f, which disables stat()ing of files by ls and dramatically improves ls performance on large directories.
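As a sketch of the data-gathering step described above: GFS2 normally exposes glock state through debugfs, and `ls -f` avoids the per-file stat() calls. The debugfs path and the scratch directory below are assumptions for illustration, not taken from this report.

```shell
#!/bin/sh
# Glock dumps are read from debugfs on each cluster node (requires root
# and a mounted debugfs); shown here as comments rather than executed:
#   mount -t debugfs none /sys/kernel/debug
#   cat /sys/kernel/debug/gfs2/<clustername:fsname>/glocks
# (path assumed from the standard GFS2 debugfs interface)

# "ls -f" disables sorting and implies -a, so ls can stream directory
# entries without stat()ing each one. Demonstrated on a scratch directory:
dir=$(mktemp -d)
touch "$dir/a" "$dir/b" "$dir/c"
ls -f "$dir"        # lists ".", "..", and the three files, unsorted
rm -rf "$dir"
```

Collecting the glock dump from every node at roughly the same time is what lets you match the waiting holder on one node to the current holder on another.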
It is quite likely that one of the fixes in comment #3 will resolve this, with this one being the most likely:

* Wed Jun 01 2011 Phillip Lougher <plougher> [2.6.18-238.13.1.el5]
  - [fs] gfs2: fix processes waiting on already-available inode glock (Phillip Lougher) [709767 694669]

If there is no possibility of getting the glock dump, then I think we are going to be pretty much stuck so far as providing a root cause goes. The report is useful anyway, as it will be kept on file in case of any similar future events; however, if there is no further information available we will have to close this bug as insufficient data.

> If there is no possibility of getting the glock dump, then I think we
> are going to be pretty much stuck so far as providing a root cause goes.

That's what I thought. I will apply the cluebat when the relevant person returns from vacation. I'm pretty sure that there will be no record of that information, and I don't think he looked at the glock information.

> The report is useful anyway, as it will be kept on file in case of any
> similar future events; ...

That's exactly why I made the report.

> ... however, if there is no further information available we will
> have to close this bug as insufficient data.

Understood.

Ok, thanks for the info!