Bug 489889 - Deadlock seen when writing to share over loopback mount
Summary: Deadlock seen when writing to share over loopback mount
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
: ---
Assignee: core-kernel-bot
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Duplicates: 491177
Depends On:
Blocks: 533192
Reported: 2009-03-12 13:49 UTC by Sachin Prabhu
Modified: 2023-08-08 03:11 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-12 10:11:25 UTC
Target Upstream Version:
Embargoed:


Attachments
Stack and CPU traces of a hung RHEL5 NFS server (73.62 KB, text/plain)
2009-06-12 17:16 UTC, Skylar Thompson


Links
System                             ID     Private  Priority  Status  Summary  Last Updated
Red Hat Knowledge Base (Solution)  22231  0        None      None    None     Never

Description Sachin Prabhu 2009-03-12 13:49:04 UTC
The design of a clustering software product necessitates mounting NFS shares over localhost. On such a system, the user is experiencing deadlocks when writing to the share.

The problem was originally reported on
kernel-2.6.18-92.1.1.el5
nfs-utils-1.0.9-33.el5

We have since also recreated the problem on 2.6.18-131.el5.

Steps to Reproduce:
-------------------
/etc/init.d/nfs start
exportfs -o rw,no_root_squash,insecure 127.0.0.1:/safedir
mount -o tcp 127.0.0.1:/safedir /test
dd if=/dev/zero of=/test/xxx bs=1M count=12K
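
To drive the same write load from a test program rather than a shell, the `dd` step can be approximated in C. This is a minimal sketch assuming a POSIX system with the share already mounted; `write_zero_blocks` is a hypothetical helper, not part of nfs-utils or coreutils.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: write `count` blocks of `bs` zero bytes to `path`,
   roughly like: dd if=/dev/zero of=PATH bs=BS count=COUNT
   Returns 0 on success, -1 on error. */
int write_zero_blocks(const char *path, size_t bs, size_t count)
{
    char *buf = calloc(1, bs);          /* zero-filled buffer, like /dev/zero */
    if (buf == NULL)
        return -1;

    /* dd truncates the output file by default, so do the same here. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    for (size_t i = 0; i < count; i++) {
        size_t done = 0;
        while (done < bs) {             /* handle short writes */
            ssize_t n = write(fd, buf + done, bs - done);
            if (n < 0 && errno == EINTR)
                continue;               /* interrupted: retry */
            if (n <= 0) {
                close(fd);
                free(buf);
                return -1;
            }
            done += (size_t)n;
        }
    }

    free(buf);
    /* An NFS client may only report deferred write errors at close(). */
    return close(fd);
}
```

Calling `write_zero_blocks("/test/xxx", 1 << 20, 12 * 1024)` corresponds to `bs=1M count=12K`. The return value of `close()` is checked rather than ignored because, on NFS, write-back errors can surface only when the file is closed.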

Actual results:
---------------
Access to the NFS share hangs completely. For example, 'ls -l /test' does not return.

This problem is not seen with the latest upstream kernel, 2.6.29: 'ls -l /test' does return the directory listing. There may be some delay because of the I/O load on the system, but the listing is returned within a few seconds.

Additional info:
----------------

A git bisection of the vanilla kernel shows that the following patch fixes the problem:

> author Trond Myklebust <Trond.Myklebust>
>
> Thu, 11 Jan 2007 07:15:39 +0000 (23:15 -0800)
> committer Linus Torvalds <torvalds.org>
>
> Fri, 12 Jan 2007 02:18:21 +0000 (18:18 -0800)
> commit e3db7691e9f3dff3289f64e3d98583e28afe03db
> tree e05542d8d8bb545545c5b535381a8c1fcb369a03
> parent 07031e14c1127fc7e1a5b98dfcc59f434e025104
> [PATCH] NFS: Fix race in nfs_release_page()
>
>     NFS: Fix race in nfs_release_page()
>
>     invalidate_inode_pages2() may find the dirty bit has been set on a page
>     owing to the fact that the page may still be mapped after it was locked.
>     Only after the call to unmap_mapping_range() are we sure that the page
>     can no longer be dirtied.
>     In order to fix this, NFS has hooked the releasepage() method and tries
>     to write the page out between the call to unmap_mapping_range() and the
>     call to remove_mapping(). This, however leads to deadlocks in the page
>     reclaim code, where the page may be locked without holding a reference
>     to the inode or dentry.
>
>     Fix is to add a new address_space_operation, launder_page(), which will
>     attempt to write out a dirty page without releasing the page lock.
>
> Signed-off-by: Trond Myklebust <Trond.Myklebust>
>
>     Also, the bare SetPageDirty() can skew all sort of accounting leading to
>     other nasties.
>
This patch is present in the 2.6.20 Linux kernel.
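
The ordering the commit message describes can be illustrated with a simplified userspace model. This is not kernel code: `model_page`, `model_aops`, `demo_launder_page` and `model_invalidate_page` are hypothetical names, and the model tracks only the page state relevant to the argument: the dirty page is written out after it has been unmapped, while the page lock is still held, and before it is removed from the page cache.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the state the
   invalidation path cares about is modelled. */
struct model_page {
    bool locked;
    bool mapped;
    bool dirty;
    bool in_cache;
    bool lock_held_while_laundering;   /* recorded by the demo hook */
};

/* Mirrors the shape of the new hook: int (*launder_page)(struct page *). */
struct model_aops {
    int (*launder_page)(struct model_page *);
};

/* Demo hook: "writes out" the page while the caller still holds the
   page lock, rather than dropping the lock to do the write. */
int demo_launder_page(struct model_page *p)
{
    p->lock_held_while_laundering = p->locked;
    p->dirty = false;                  /* pretend the write-back succeeded */
    return 0;
}

/* Simplified invalidation of one page:
   lock -> unmap -> launder if dirty -> remove from page cache -> unlock. */
int model_invalidate_page(struct model_page *p, const struct model_aops *ops)
{
    int err = 0;

    p->locked = true;
    p->mapped = false;                 /* after unmapping, it cannot be re-dirtied */
    if (p->dirty && ops->launder_page != NULL)
        err = ops->launder_page(p);
    if (err == 0 && p->dirty)
        err = -1;                      /* still dirty: refuse to drop the page */
    if (err == 0)
        p->in_cache = false;           /* analogue of remove_mapping() */
    p->locked = false;
    return err;
}
```

If no `launder_page` hook is provided and the page is still dirty, the model refuses to drop the page; the real invalidation code likewise returns an error when a page cannot be removed.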

Comment 1 Sachin Prabhu 2009-03-12 13:51:12 UTC
The patch above introduces a new operation, 'launder_page', to the address_space_operations structure. This breaks the kABI.

--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -426,6 +426,7 @@ struct address_space_operations {
        /* migrate the contents of a page to the specified target */
        int (*migratepage) (struct address_space *,
                        struct page *, struct page *);
+       int (*launder_page) (struct page *);
 };
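
One concrete way an appended member can break binary compatibility: a filesystem module typically defines its own `address_space_operations` table, so a module compiled against the old header supplies a table without the new slot, while a kernel built with the new header may read `->launder_page` from it. The sketch below uses trimmed, hypothetical stand-ins (`aops_before`, `aops_after`) that keep only the last existing member and the appended one; it demonstrates the layout mismatch only, not RHEL's kABI tooling.

```c
#include <stddef.h>

struct page;
struct address_space;

/* Hypothetical, trimmed stand-in for the pre-patch struct
   address_space_operations as an old module was compiled
   (all other members omitted). */
struct aops_before {
    int (*migratepage)(struct address_space *, struct page *, struct page *);
};

/* The same trimmed struct after the patch appends launder_page. */
struct aops_after {
    int (*migratepage)(struct address_space *, struct page *, struct page *);
    int (*launder_page)(struct page *);
};

/* Nonzero if reading ->launder_page with the new layout from a table
   laid out with the old definition would go past the end of that table. */
int launder_page_read_overruns_old_table(void)
{
    size_t end_of_slot = offsetof(struct aops_after, launder_page)
                       + sizeof(((struct aops_after *)0)->launder_page);
    return end_of_slot > sizeof(struct aops_before);
}
```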

Comment 3 Peter Staubach 2009-06-11 20:34:38 UTC
*** Bug 491177 has been marked as a duplicate of this bug. ***

Comment 4 Skylar Thompson 2009-06-12 17:16:28 UTC
Created attachment 347621
Stack and CPU traces of a hung RHEL5 NFS server

Comment 5 Skylar Thompson 2009-06-12 17:17:42 UTC
Comment on attachment 347621
Stack and CPU traces of a hung RHEL5 NFS server

I've replicated this problem with RHEL5 kernel 2.6.18-128 over both loopback NFS and Ethernet. I've attached the kernel and CPU stack traces.

Comment 7 Alfredo Moralejo 2009-10-23 16:00:00 UTC
I'm seeing the same problem with a loopback mount. Our cluster design for some applications depends heavily on NFS loopback mounts, so this is a serious problem for us.

While testing, we found that if we mount with the sync option, it does not hang. However, I'm not sure whether that is because the releasepage method is not used in that case or simply because performance is so slow that the deadlock is never reached. In any case, performance with the sync option is so slow that it is almost unusable.

Is there any mount or export option that may help to mitigate the problem?
Are there any plans to add the patch to the RHEL5 kernel?

Comment 8 Peter Staubach 2009-10-23 16:12:12 UTC
The use of the "sync" mount option on the NFS client reduces the number
of dirty pages on the client.  This reduces the chance of the NFS server
finding one of them when it needs memory.

The problem with the patch is possible kABI concerns.  Some way would
need to be found to work around them.

Comment 11 Sachin Prabhu 2009-11-09 12:25:41 UTC
The upstream discussion of the patch mentioned in the summary is available at
http://lkml.org/lkml/2006/12/14/448

The stack traces of processes hung by this specific deadlock are available at
http://lkml.org/lkml/2006/12/15/167

The cause of the problem described upstream is different from what is being described here. 

https://bugzilla.redhat.com/show_bug.cgi?id=491177#c5

In the cases reported here, the problem appears to be caused by a cyclic dependency created by the loopback mount, which can be summarised as follows:

1) nfsd needs memory. It tries to obtain it by flushing the page cache.
2) The page cache tries to free memory by flushing pages belonging to files on the NFS mount.
3) The NFS client flushes those pages by writing them back to the NFS server, which is the local nfsd.
4) nfsd needs memory to service these writes. Go back to 1.
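
The cycle in steps 1 to 4 can be expressed as a wait-for graph, in which an edge from A to B means A cannot make progress until B does; a deadlock then corresponds to a cycle. This is a hypothetical model (`enum actor` and `has_wait_cycle` are illustrative names, not kernel code), using a depth-first search with three-state marking to find a back edge.

```c
#include <stdbool.h>

/* Hypothetical actors in the loopback scenario. */
enum actor { NFSD, RECLAIM, NFS_WRITEBACK, N_ACTORS };

/* state: 0 = unvisited, 1 = on the current DFS path, 2 = fully explored */
static bool dfs(bool g[N_ACTORS][N_ACTORS], int v, int state[N_ACTORS])
{
    state[v] = 1;
    for (int w = 0; w < N_ACTORS; w++) {
        if (!g[v][w])
            continue;
        if (state[w] == 1)
            return true;               /* back edge: cycle, i.e. deadlock */
        if (state[w] == 0 && dfs(g, w, state))
            return true;
    }
    state[v] = 2;
    return false;
}

/* Returns true if the wait-for graph g contains a cycle. */
bool has_wait_cycle(bool g[N_ACTORS][N_ACTORS])
{
    int state[N_ACTORS] = {0};
    for (int v = 0; v < N_ACTORS; v++)
        if (state[v] == 0 && dfs(g, v, state))
            return true;
    return false;
}
```

With the edges from steps 1 to 3 (nfsd to reclaim, reclaim to NFS writeback, NFS writeback to nfsd), `has_wait_cycle` returns true. The loopback mount is what supplies the edge from NFS writeback back to the local nfsd; removing that edge from the model removes the cycle.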

Comment 12 RHEL Program Management 2009-11-25 19:26:40 UTC
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request.

