Bug 489889
| Field | Value | Field | Value |
|---|---|---|---|
| Summary: | Deadlock seen when writing to share over loopback mount | | |
| Product: | Red Hat Enterprise Linux 5 | Reporter: | Sachin Prabhu <sprabhu> |
| Component: | kernel | Assignee: | core-kernel-bot <core-kernel-mgr> |
| kernel sub component: | Kernel-Core | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
| Status: | CLOSED INSUFFICIENT_DATA | Docs Contact: | |
| Severity: | high | | |
| Priority: | high | CC: | alfredo.moralejo, casmith, jlayton, mkranz, ndevos, prgarcial, shwu, sjohnsto, skylar2, steved, tao |
| Version: | 5.3 | Keywords: | Reopened |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-09-12 10:11:25 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 533192 | | |
| Attachments: | | | |
Description
Sachin Prabhu
2009-03-12 13:49:04 UTC
The patch above introduces a new operation, 'launder_page', to the address_space_operations structure. This breaks kABI:

    --- a/include/linux/fs.h
    +++ b/include/linux/fs.h
    @@ -426,6 +426,7 @@ struct address_space_operations {
             /* migrate the contents of a page to the specified target */
             int (*migratepage) (struct address_space *, struct page *, struct page *);
    +        int (*launder_page) (struct page *);
     };

*** Bug 491177 has been marked as a duplicate of this bug. ***

Created attachment 347621
Stack and CPU traces of a hung RHEL5 NFS server
Comment on attachment 347621
I've replicated this problem with RHEL5 kernel 2.6.18-128 over both loopback NFS and Ethernet. I've attached the kernel and CPU stack trace output.
I'm seeing the same problem with a loopback mount. Our cluster design for some applications depends heavily on NFS loopback mounts, so this is a serious problem for us. During testing we found that mounting with the sync option avoids the hang. However, I'm not sure whether that is because the release page method is no longer used, or simply because performance is so slow that the deadlock never triggers. Either way, performance with the sync option is so poor that it is almost unusable. Is there any mount or export option that could help mitigate the problem? Are there any plans to add the patch to the RHEL5 kernel?

Using the "sync" mount option on the NFS client reduces the number of dirty pages on the client. This in turn reduces the chance that the NFS server finds one of them when it needs memory. The problem with the patch is the possible kABI impact, and some way to work around it would need to be found.

The upstream discussion of the patch mentioned in the summary is available at http://lkml.org/lkml/2006/12/14/448

Stack traces of processes hung by this specific deadlock are available at http://lkml.org/lkml/2006/12/15/167

The cause of the problem described upstream is different from the one described here; see https://bugzilla.redhat.com/show_bug.cgi?id=491177#c5

In the cases reported here, the problem appears to be a cyclic dependency created by the loopback mount, which can be summarised as:

1) nfsd needs memory. It tries to obtain it by flushing the page cache.
2) To free memory, the page cache flushes pages belonging to files on the NFS mount.
3) The NFS client flushes those pages by writing them back to the NFS server.
4) Over a loopback mount, that server is the same nfsd, which still needs memory. Go back to 1.

Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.