Bug 742537

Summary: System hang: cpu stuck for 10s: nfs_access_cache_shrinker
Product: Red Hat Enterprise Linux 5
Reporter: Alastair Munro <alastair>
Component: kernel
Assignee: Jeff Layton <jlayton>
Status: CLOSED DUPLICATE
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Priority: unspecified
Version: 5.5
CC: jlayton, rwheeler, steved
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-10-05 10:06:00 UTC
Attachments:
Description: Stack trace from /var/log/messages
Flags: none

Description Alastair Munro 2011-09-30 13:32:28 UTC
Description of problem: System hang: nfs_access_cache_shrinker


Version-Release number of selected component (if applicable): 5.5


How reproducible: Not easily


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info: This is a known issue and has been fixed in a number of RHEL4/5 releases. There is currently a call open for this (585935). I have checked a number of RH kernel sources, including the latest RH5 ones (2.6.18-238, 2.6.18-238.19.1, 2.6.18-274, 2.6.18-274.3.1), and this patch has not yet been applied in RHEL5. Can someone advise when it will be available?

Please see the attachment for the stack trace from /var/log/messages.

We have had another crash today on the same server; the stack trace is not exactly the same, but it looks related, as it is also lock-related (down_read on a do_page_fault).

Comment 1 Alastair Munro 2011-09-30 13:33:55 UTC
Created attachment 525771 [details]
Stack trace from /var/log/messages

Comment 2 Alastair Munro 2011-09-30 13:35:09 UTC
We have RHN subscriptions. I am submitting this for bg-group; their RHN login is petrotech. Thx.

Comment 5 Alastair Munro 2011-10-04 15:17:53 UTC
Any update on this yet? Thx.

Comment 6 Jeff Layton 2011-10-04 16:00:25 UTC
Looks like the code here is spinning while trying to take the nfs_access_lru_lock. That probably means that another thread is doing something while holding that lock (or didn't release it properly).
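
To illustrate the kind of contention described above, here is a minimal user-space sketch in Python. It is an analogy only, not the kernel code: `lru_lock`, `shrinker` and `run_with_holder` are hypothetical names, a `threading.Lock` stands in for nfs_access_lru_lock, and a timeout stands in for the point at which the kernel would report the CPU as stuck. A real spinlock has no timeout and would simply keep spinning.

```python
import threading
import time

# Hypothetical stand-in for the kernel's nfs_access_lru_lock.
lru_lock = threading.Lock()


def shrinker(timeout_s):
    """Try to take lru_lock, giving up after timeout_s seconds.

    Returns True if the lock was obtained (and then released),
    False if another thread held it for the whole timeout.
    """
    got_lock = lru_lock.acquire(timeout=timeout_s)
    if got_lock:
        lru_lock.release()
    return got_lock


def run_with_holder(hold_s, timeout_s):
    """Run shrinker() while another thread holds lru_lock for hold_s seconds."""
    holding = threading.Event()

    def holder():
        with lru_lock:
            holding.set()       # signal that the lock is now held
            time.sleep(hold_s)  # "doing something while holding that lock"

    thread = threading.Thread(target=holder)
    thread.start()
    holding.wait()              # make sure the holder owns the lock first
    result = shrinker(timeout_s)
    thread.join()
    return result
```

With no other holder, `shrinker(0.1)` succeeds immediately. `run_with_holder(0.5, 0.05)` fails because the holder outlasts the timeout, which corresponds to the stuck case; if the holder never released the lock at all, the waiter would never make progress.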

> 
> Additional info: This is a known issue and has been fixed in a number of
> RHEL4/5 releases. There is currently a call open for this (585935), and I have
> checked a number of RH kernel sources, including the latest RH5 (2.6.18-238,
> 2.6.18-238.19.1, 2.6.18-274, 2.6.18-274.3.1), and this patch has not been
> applied yet in RHEL5. Can someone advise when it will be available?
> 

What patch are you referring to here?

Comment 7 Alastair Munro 2011-10-05 08:31:43 UTC
The proposed patch in 585935. There was a patch put together in 585935, but it never hit the RH5 sources. In fact, there has been no further information about the proposed patch. Basically, the issue was identified in RH4 and fixed. The issue was then re-identified in early RH5, and a patch was applied that had typos in it. Then, in 585935, a patch to fix the typos was proposed. Did you not see all this?

Comment 8 Jeff Layton 2011-10-05 10:06:00 UTC
Thanks -- closing this bug as a duplicate of that one.

*** This bug has been marked as a duplicate of bug 585935 ***