Bug 489889

Summary: Deadlock seen when writing to share over loopback mount
Product: Red Hat Enterprise Linux 5 Reporter: Sachin Prabhu <sprabhu>
Component: kernel    Assignee: core-kernel-bot <core-kernel-mgr>
kernel sub component: Kernel-Core QA Contact: Red Hat Kernel QE team <kernel-qe>
Status: CLOSED INSUFFICIENT_DATA Docs Contact:
Severity: high    
Priority: high CC: alfredo.moralejo, casmith, jlayton, mkranz, ndevos, prgarcial, shwu, sjohnsto, skylar2, steved, tao
Version: 5.3    Keywords: Reopened
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-09-12 10:11:25 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 533192    
Attachments:
Description: Stack and CPU traces of a hung RHEL5 NFS server    Flags: none

Description Sachin Prabhu 2009-03-12 13:49:04 UTC
The design of a clustering software product necessitates mounting NFS shares over localhost. On such a system, the user is experiencing deadlocks when writing to the share.

The problem was originally reported on:
kernel-2.6.18-92.1.1.el5
nfs-utils-1.0.9-33.el5

We have since also recreated the problem on 2.6.18-131.el5.

Steps to Reproduce:
-------------------
/etc/init.d/nfs start
exportfs -o rw,no_root_squash,insecure 127.0.0.1:/safedir
mount -o tcp 127.0.0.1:/safedir /test
dd if=/dev/zero of=/test/xxx bs=1M count=12K
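
The dd step above can also be driven from a C test harness. This is a minimal sketch assuming a POSIX system; write_zeros() is a hypothetical helper, not part of nfs-utils or coreutils. Like dd without conv=fsync, it only issues plain buffered writes, so the data is left as dirty page cache on the NFS client until writeback or reclaim sends it to the local nfsd.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Rough equivalent of: dd if=/dev/zero of=PATH bs=BLOCK count=COUNT
 * (hypothetical helper). Plain buffered writes, no O_SYNC and no fsync(),
 * so the data stays dirty in the client's page cache until writeback or
 * reclaim pushes it to the server.
 * Returns the number of bytes written, or -1 on error.
 */
long long write_zeros(const char *path, size_t block, size_t count)
{
    assert(path != NULL && block > 0);

    char *buf = calloc(1, block);            /* one zero-filled block */
    if (buf == NULL)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    long long total = 0;
    for (size_t i = 0; i < count; i++) {
        size_t done = 0;
        while (done < block) {               /* cope with short writes */
            ssize_t n = write(fd, buf + done, block - done);
            if (n < 0) {
                if (errno == EINTR)
                    continue;                /* interrupted: retry */
                close(fd);
                free(buf);
                return -1;
            }
            done += (size_t)n;
        }
        total += (long long)block;
    }

    free(buf);
    if (close(fd) < 0)
        return -1;
    return total;
}
```

Calling write_zeros("/test/xxx", 1024 * 1024, 12 * 1024) corresponds to bs=1M count=12K, i.e. 12 GiB.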

Actual results:
---------------
Access to the NFS share hangs completely. For example, 'ls -l /test' does not return.
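
To detect this hang from a script or harness without blocking the checking process itself, the directory read can be done in a child process with a timeout. This is a minimal sketch assuming a POSIX system; dir_responds() is a hypothetical helper and the timeout is the caller's choice, not a value from this report. It is a weaker probe than 'ls -l', since the client may answer some requests from its caches.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Roughly what 'ls' needs: open the directory and read every entry.
 * Returns 1 if that finishes within timeout_sec seconds, 0 if it is still
 * blocked (e.g. the mount is hung), -1 if fork()/waitpid() fails.
 * The work runs in a child so the caller never blocks on the mount.
 */
int dir_responds(const char *path, unsigned timeout_sec)
{
    assert(path != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        DIR *d = opendir(path);
        if (d == NULL)
            _exit(1);
        while (readdir(d) != NULL)
            ;                                /* just walk the entries */
        closedir(d);
        _exit(0);
    }

    struct timespec tick = { 0, 100000000L }; /* poll every 100 ms */
    time_t deadline = time(NULL) + (time_t)timeout_sec;
    while (time(NULL) < deadline) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return 1;                        /* child returned (success or error) */
        if (r < 0)
            return -1;
        nanosleep(&tick, NULL);
    }

    /*
     * Still blocked. A child in uninterruptible sleep on a hard NFS mount
     * may not die until the mount recovers, so it is signalled but not
     * waited for here (it may linger as an unreaped child).
     */
    kill(pid, SIGKILL);
    return 0;
}
```

For example, dir_responds("/test", 30) returning 0 would match the symptom above, while on 2.6.29 it would be expected to return 1.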

This problem is not seen with the latest upstream kernel, 2.6.29: 'ls -l /test' does return the directory listing. There may be some delay because of the I/O load on the system, but the listing comes back within a few seconds.

Additional info:
----------------

A git bisection of the vanilla kernel shows that the following patch fixes the problem:

> author     Trond Myklebust <Trond.Myklebust>  Thu, 11 Jan 2007 07:15:39 +0000 (23:15 -0800)
> committer  Linus Torvalds <torvalds.org>      Fri, 12 Jan 2007 02:18:21 +0000 (18:18 -0800)
> commit     e3db7691e9f3dff3289f64e3d98583e28afe03db
> tree       e05542d8d8bb545545c5b535381a8c1fcb369a03
> parent     07031e14c1127fc7e1a5b98dfcc59f434e025104
> [PATCH] NFS: Fix race in nfs_release_page()
>
>     invalidate_inode_pages2() may find the dirty bit has been set on a page
>     owing to the fact that the page may still be mapped after it was locked.
>     Only after the call to unmap_mapping_range() are we sure that the page
>     can no longer be dirtied.
>     In order to fix this, NFS has hooked the releasepage() method and tries
>     to write the page out between the call to unmap_mapping_range() and the
>     call to remove_mapping(). This, however leads to deadlocks in the page
>     reclaim code, where the page may be locked without holding a reference
>     to the inode or dentry.
>
>     Fix is to add a new address_space_operation, launder_page(), which will
>     attempt to write out a dirty page without releasing the page lock.
>
> Signed-off-by: Trond Myklebust <Trond.Myklebust>
>
>     Also, the bare SetPageDirty() can skew all sort of accounting leading to
>     other nasties.
>
This patch is present in the 2.6.20 Linux kernel.
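
The launder_page() idea in the quoted commit can be illustrated with a toy model. Every type and function below is hypothetical and deliberately simplified: it only mirrors the ordering the commit message describes (lock the page, unmap it so it can no longer be re-dirtied, write it out while still holding the page lock, then drop it), loosely following the kernel's invalidate_inode_pages2_range() in mm/truncate.c rather than reproducing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page state; not the kernel's struct page. */
struct toy_page {
    bool locked;
    bool dirty;
    bool mapped;
    bool in_cache;
};

/* Toy ops table with only the new hook. */
struct toy_aops {
    /* Write back a dirty page while the caller keeps it locked; 0 on success. */
    int (*launder_page)(struct toy_page *);
};

/*
 * Simplified invalidation of one page after the patch.
 * Returns 0 if the page was removed from the cache, -1 if it had to stay.
 */
int toy_invalidate_page(struct toy_page *pg, const struct toy_aops *ops)
{
    int ret = 0;

    pg->locked = true;
    pg->mapped = false;                  /* stands in for unmap_mapping_range() */

    if (pg->dirty && ops->launder_page != NULL)
        ret = ops->launder_page(pg);     /* page lock is still held here */

    if (ret == 0 && !pg->dirty)
        pg->in_cache = false;            /* stands in for removing the mapping */
    else
        ret = -1;                        /* a dirty page cannot be dropped */

    pg->locked = false;
    return ret;
}

/* Example hook: "writes" the page out by clearing its dirty bit. */
int toy_nfs_launder(struct toy_page *pg)
{
    assert(pg->locked);                  /* must be called under the page lock */
    pg->dirty = false;
    return 0;
}
```

Without a launder hook, a page that is still dirty after unmapping simply cannot be invalidated; the hook lets the filesystem clean it without giving up the page lock, which is what the pre-patch releasepage() approach could not do safely.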

Comment 1 Sachin Prabhu 2009-03-12 13:51:12 UTC
The patch above introduces a new operation, launder_page, in the address_space_operations structure. This breaks kABI:

--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -426,6 +426,7 @@ struct address_space_operations {
        /* migrate the contents of a page to the specified target */
        int (*migratepage) (struct address_space *,
                        struct page *, struct page *);
+       int (*launder_page) (struct page *);
 };
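
The sketch below illustrates why even appending a member, as the patch does, counts as a kABI change: the structure's size changes, and because kABI checking relies on symbol checksums generated from type definitions, the checksums of exported symbols whose types reach the structure change too. The structs are trimmed, hypothetical stand-ins, not the real definition, and struct aops_ext shows one generic way of adding a hook without touching the original layout; it is an illustration only and is not taken from this report.

```c
#include <assert.h>
#include <stddef.h>

struct page;  /* opaque here */

typedef int (*launder_fn)(struct page *);

/* Trimmed, hypothetical stand-in for the tail of the pre-patch struct. */
struct aops_old {
    int (*migratepage)(struct page *, struct page *);
};

/* The same struct with the member the patch appends. */
struct aops_new {
    int (*migratepage)(struct page *, struct page *);
    launder_fn launder_page;
};

/* Generic alternative: keep the original layout unchanged and carry the new
   hook in a wrapper whose first member is the original struct. */
struct aops_ext {
    struct aops_old orig;
    launder_fn launder_page;
};

/* Bytes by which appending the member grows the struct. */
size_t aops_growth(void)
{
    return sizeof(struct aops_new) - sizeof(struct aops_old);
}

/*
 * Look up the hook only when the caller knows (here via a hypothetical
 * is_extended flag) that ops really is the first member of an aops_ext.
 */
launder_fn get_launder(const struct aops_old *ops, int is_extended)
{
    assert(ops != NULL);
    if (!is_extended)
        return NULL;
    const struct aops_ext *ext = (const struct aops_ext *)ops;
    return ext->launder_page;
}
```

A module built against the old header that defines its own operations table only provides sizeof(struct aops_old) bytes, so core code that read launder_page from it unconditionally would read past the end of the module's object. The flag check in get_launder() exists to avoid exactly that.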

Comment 3 Peter Staubach 2009-06-11 20:34:38 UTC
*** Bug 491177 has been marked as a duplicate of this bug. ***

Comment 4 Skylar Thompson 2009-06-12 17:16:28 UTC
Created attachment 347621
Stack and CPU traces of a hung RHEL5 NFS server

Comment 5 Skylar Thompson 2009-06-12 17:17:42 UTC
Comment on attachment 347621
Stack and CPU traces of a hung RHEL5 NFS server

I've replicated this problem with RHEL5 kernel 2.6.18-128 over both loopback NFS and Ethernet. I've attached the kernel and CPU stack traces.

Comment 7 Alfredo Moralejo 2009-10-23 16:00:00 UTC
I'm seeing the same problem with a loopback mount. Our cluster design for some applications depends heavily on NFS loopback mounts, so this is a serious problem for us.

During testing we found that mounting with the sync option avoids the hang. However, I'm not sure whether that is because the releasepage method is no longer used, or simply because throughput is so slow that the deadlock is never reached. In any case, performance with the sync option is so poor that it is almost unusable.

Is there any mount or export option that might help mitigate the problem?
Are there any plans to add the patch to the RHEL5 kernel?

Comment 8 Peter Staubach 2009-10-23 16:12:12 UTC
The use of the "sync" mount option on the NFS client reduces the number
of dirty pages on the client.  This makes it less likely that the NFS
server will find one of them when it needs memory.
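
The reasoning above, that fewer dirty client pages leave page reclaim less to push through nfsd, can be illustrated at the application level by bounding how much data a writer leaves unflushed. This is a hedged sketch assuming POSIX fdatasync(); both helpers are hypothetical, the approach only affects the one process that uses it (unlike the sync mount option, which applies to every writer on the mount), and it has not been tested against this deadlock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write COUNT copies of BUF (LEN bytes each) to FD and call fdatasync()
 * after every FLUSH_EVERY writes, so this loop leaves at most about
 * FLUSH_EVERY * LEN bytes unflushed at any time.
 * FLUSH_EVERY == 1 is roughly the per-file analogue of synchronous writes.
 * Returns 0 on success, -1 on error (errno is set).
 */
int write_bounded_dirty(int fd, const char *buf, size_t len,
                        size_t count, size_t flush_every)
{
    assert(flush_every > 0);
    for (size_t i = 0; i < count; i++) {
        size_t done = 0;
        while (done < len) {
            ssize_t n = write(fd, buf + done, len - done);
            if (n < 0) {
                if (errno == EINTR)
                    continue;            /* interrupted: retry */
                return -1;
            }
            done += (size_t)n;
        }
        if ((i + 1) % flush_every == 0 && fdatasync(fd) < 0)
            return -1;
    }
    return fdatasync(fd);                /* flush whatever is left */
}

/* Convenience wrapper: zero-filled blocks written to a new file at PATH. */
int write_file_bounded_dirty(const char *path, size_t len,
                             size_t count, size_t flush_every)
{
    char *buf = calloc(1, len > 0 ? len : 1);
    if (buf == NULL)
        return -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }
    int ret = write_bounded_dirty(fd, buf, len, count, flush_every);
    if (close(fd) < 0)
        ret = -1;
    free(buf);
    return ret;
}
```

The trade-off is the same one reported with the sync mount option: the smaller flush_every is, the fewer dirty pages accumulate, and the slower the writer gets.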

The problem with the patch is the possible kABI breakage.  Some way would
need to be found to work around it.

Comment 11 Sachin Prabhu 2009-11-09 12:25:41 UTC
The upstream discussion of the patch mentioned in the summary is available at
http://lkml.org/lkml/2006/12/14/448

The stack traces of processes hung by this specific deadlock are available at
http://lkml.org/lkml/2006/12/15/167

The cause of the problem described upstream is different from what is being described here. 

https://bugzilla.redhat.com/show_bug.cgi?id=491177#c5

In the cases reported here, the problem appears to be caused by a cyclic dependency introduced by the loopback mount, which can be summarised as follows:

1) nfsd needs memory. It tries to obtain it by reclaiming page cache.
2) Page reclaim tries to free memory by flushing dirty page cache belonging to files on the NFS mount.
3) The NFS client tries to flush those pages by writing them back to the NFS server, which is the local nfsd.
4) nfsd needs memory to service those writes; go back to 1.
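
The four steps above form a cycle in a wait-for graph, which is what makes this a deadlock rather than merely slow I/O. The sketch below models the actors as graph nodes and finds the cycle with a depth-first search. The enum names and edges are hypothetical simplifications of the description above, not kernel structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical actors taken from the four steps above. */
enum actor { NFSD, PAGE_RECLAIM, NFS_CLIENT_WB, REMOTE_SERVER, N_ACTORS };

/* g[a][b] is true when actor a cannot make progress until actor b does. */
typedef bool wait_graph[N_ACTORS][N_ACTORS];

/* state: 0 = unvisited, 1 = on the current DFS path, 2 = finished */
static bool dfs(wait_graph g, int node, int state[N_ACTORS])
{
    assert(node >= 0 && node < N_ACTORS);
    state[node] = 1;
    for (int next = 0; next < N_ACTORS; next++) {
        if (!g[node][next])
            continue;
        if (state[next] == 1)
            return true;                 /* back edge: a wait cycle */
        if (state[next] == 0 && dfs(g, next, state))
            return true;
    }
    state[node] = 2;
    return false;
}

/* True if the wait-for graph contains a cycle, i.e. a deadlock. */
bool has_wait_cycle(wait_graph g)
{
    int state[N_ACTORS] = { 0 };
    for (int n = 0; n < N_ACTORS; n++)
        if (state[n] == 0 && dfs(g, n, state))
            return true;
    return false;
}
```

For the loopback case the edges are NFSD to PAGE_RECLAIM (step 1), PAGE_RECLAIM to NFS_CLIENT_WB (step 2) and NFS_CLIENT_WB to NFSD (step 3), and has_wait_cycle() reports a cycle. With the server on a separate host, the writeback edge points at REMOTE_SERVER instead and this particular cycle disappears; that says nothing about other hangs, such as the one Comment 5 reports over Ethernet.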

Comment 12 RHEL Program Management 2009-11-25 19:26:40 UTC
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.