Bug 246268

Summary: NFS failover causes VM deadlock
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.0
Hardware: i386
OS: Linux
Status: CLOSED WONTFIX
Severity: high
Priority: low
Reporter: Kris Corwin <kris.corwin>
Assignee: Jeff Layton <jlayton>
QA Contact: Martin Jenner <mjenner>
CC: lwoodman, staubach, steved
Doc Type: Bug Fix
Last Closed: 2007-07-06 20:01:18 UTC

Attachments:
typescript of loop doing cat meminfo every 10 seconds (flags: none)

Description Kris Corwin 2007-06-29 18:09:10 UTC
Description of problem:
I have 2 x86 boxes, f5 and f6, running RHEL5 with shared storage.  The 
shared storage contains an ext3 filesystem.  I mount the shared storage locally
on f5 and NFS export it.  On f6, it's mounted as NFS.  I have an IP alias
for the nfs server that fails over too.  I start an I/O job on f6 writing to
a file on the NFS filesystem.  While the application is writing, I stop NFS
on f5 and unmount the filesystem.  I attempt to mount the filesystem and
start the NFS server on f6.  Either during the process of bringing up the server
or shortly afterwards, f6 will "hang".  Windows that were already open on f6
still let me extract some information, but anything that causes a write to
disk blocks the login session in uninterruptible sleep.

I first saw this behavior with EL4 and exchanged some e-mail with Larry
Woodman.  I have now reproduced it with EL5.  The stack indicates both ext3
and nfsd threads are blocked waiting for pages to be freed.  To capture
information, I recorded a typescript of a loop running 'cat /proc/meminfo'
every 10 seconds while I reproduced it.  I'll attach the files.  I left it overnight and it's
still hung with the following output from meminfo:
Dirty:               0 kB
Writeback:      614792 kB
AnonPages:       14084 kB
Mapped:           6240 kB
Slab:            48900 kB
PageTables:       1396 kB
NFS_Unstable:   279040 kB
Bounce:              0 kB
CommitLimit:   3078040 kB
Committed_AS:    57688 kB
To correlate the meminfo output, the nfs failover happened around 16:54, 
Jun 28th.
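
The monitoring loop described above could be sketched roughly as follows. This is only an illustration, not the script that produced the attachment: the function names (parse_meminfo, writeback_summary, watch) are hypothetical, and the code assumes the usual "Name:   value kB" layout seen in the meminfo output quoted above.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field name: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def writeback_summary(text):
    """Return only the counters relevant to this report (values in kB)."""
    info = parse_meminfo(text)
    return {key: info.get(key, 0) for key in ("Dirty", "Writeback", "NFS_Unstable")}


def watch(interval=10, path="/proc/meminfo"):
    """Print a timestamped summary every `interval` seconds until interrupted."""
    while True:
        with open(path) as f:
            summary = writeback_summary(f.read())
        print(time.strftime("%Y-%m-%d %H:%M:%S"), summary)
        time.sleep(interval)
```

Calling watch() before starting the failover gives a timestamped record that can be lined up with the time of the failover; a Writeback or NFS_Unstable value that stops changing while Dirty stays at 0 matches the hung state shown above.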

Version-Release number of selected component (if applicable):
EL5, 2.6.18-8.el5-i686

How reproducible: very


Steps to Reproduce:
The easiest reproduction uses 2 machines and shared storage.  I was also able
to reproduce it with one machine by stopping NFS, waiting a few minutes, and
trying to start it again.
1. NFS export a shared-storage filesystem from one machine and NFS mount it on
another.
2. Start a heavy writing application (iozone -i 0 -n 0 -s 4G -r 1M -w -f
/mnt/NFS/iozone.dat) on the NFS client, writing over the NFS mount.
3. Stop the NFS server and try to start it on the client machine.
  
Actual results:
The client machine locks up (processes hang).

Expected results:
The I/O would complete.

Additional info:

Comment 1 Kris Corwin 2007-06-29 18:09:10 UTC
Created attachment 158232
typescript of loop doing cat meminfo every 10 seconds

Comment 2 Kris Corwin 2007-06-29 18:11:24 UTC
I forced a stack trace dump and ran dmesg; the output is at the end of the
meminfo attachment.

Comment 5 Jeff Layton 2007-07-06 19:58:18 UTC
The reproducer here involves a known problematic configuration -- namely,
attempting to relocate an NFS service onto a client of that service. There are
quite a few problems associated with this configuration, in both the kernel
and userspace, so it is not a configuration that we support.

Peter Zijlstra has some changes upstream which may help some with this. However,
they are pervasive and not a good candidate for backporting to any of our
existing releases.


Comment 6 RHEL Program Management 2007-07-06 20:01:18 UTC
Development Management has reviewed and declined this request.  You may appeal
this decision by reopening this request. 

Comment 7 Kris Corwin 2007-07-06 20:23:49 UTC
What specific part of the configuration is unsupported?  I can reproduce the
hang using only one box: the filesystem is locally mounted and NFS exported,
and the same machine NFS mounts it.
    /dev/sda on /mnt/local type ext3 (rw)
    nfsd on /proc/fs/nfsd type nfsd (rw)
    nfsserver:/mnt/local on /mnt/NFS type nfs (rw,intr,addr=192.168.2.100)
    [root@f6 2.6.18-8.el5-i686]# cat /etc/exports
    /mnt/local *(rw,no_root_squash,async,insecure,fsid=3)

Start writing a 4G file to the NFS mount.  Stop NFS and try to unmount the 
local filesystem.  It will block waiting for pages.

 =======================
umount        D 50D9BE9E  1928 10010   9972                     (NOTLB)
       f3635cec 00000082 00000001 50d9be9e 000227e5 000227d3 00000007 c21fe000
       c2138aa0 50d9d052 000227e5 000011b4 00000001 c21fe10c c20144e0 c042cdf2
       c213e000 f3635cf4 00000286 c042cf03 00000000 00000286 243cdf63 243cdf63
Call Trace:
 [<c042cdf2>] lock_timer_base+0x15/0x2f
 [<c042cf03>] __mod_timer+0x99/0xa3
 [<c05fa67c>] schedule_timeout+0x71/0x8c
 [<c042c517>] process_timeout+0x0/0x5
 [<c05fa141>] io_schedule_timeout+0x3b/0x61
 [<c04d1beb>] blk_congestion_wait+0x53/0x67
 [<c04352a1>] autoremove_wake_function+0x0/0x2d
 [<c0452876>] balance_dirty_pages_ratelimited_nr+0x147/0x1ad
 [<c044eed9>] generic_file_buffered_write+0x4be/0x5f1
 [<c046b323>] __getblk+0x30/0x270
 [<c0427f65>] current_fs_time+0x4a/0x55
 [<c044f4b2>] __generic_file_aio_write_nolock+0x4a6/0x52a
 [<c0451c99>] get_page_from_freelist+0x2a6/0x310
 [<c044f58d>] generic_file_aio_write+0x57/0xaa
 [<f888ee89>] ext3_file_write+0x19/0x83 [ext3]
 [<c04691e6>] do_sync_write+0xb6/0xf1
 [<c04352a1>] autoremove_wake_function+0x0/0x2d
 [<c05fc7cc>] do_page_fault+0x2c7/0x5d5
 [<c05fc841>] do_page_fault+0x33c/0x5d5
 [<c0469daf>] generic_file_llseek+0x8f/0x9a
 [<c0469130>] do_sync_write+0x0/0xf1
 [<c0469a9f>] vfs_write+0xa1/0x143
 [<c046a091>] sys_write+0x3c/0x63
 [<c0403eff>] syscall_call+0x7/0xb
 =======================
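
A trace like the one above can be checked mechanically for the frames that show a writer being throttled while it waits for dirty pages to drain. The sketch below is illustrative only: the function names are hypothetical, the regular expression assumes the " [<address>] symbol+0xoff/0xlen" frame layout shown in this trace, and treating blk_congestion_wait together with balance_dirty_pages_ratelimited_nr as the indicator is an assumption based on this one trace.

```python
import re

# Matches frames such as " [<c04d1beb>] blk_congestion_wait+0x53/0x67"
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Assumption: both symbols on one stack mean the task is sleeping in
# dirty-page throttling, as in the umount trace in this comment.
THROTTLE_SYMBOLS = ("balance_dirty_pages_ratelimited_nr", "blk_congestion_wait")


def frame_symbols(trace_text):
    """Return the symbol names of all call-trace frames, in order of appearance."""
    return FRAME_RE.findall(trace_text)


def is_dirty_throttled(trace_text):
    """True if every throttling symbol appears somewhere in the trace."""
    symbols = frame_symbols(trace_text)
    return all(name in symbols for name in THROTTLE_SYMBOLS)
```

Running is_dirty_throttled() over each task's section of a full stack dump would list the tasks stuck in the same wait.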

Comment 8 Peter Staubach 2007-07-09 11:13:01 UTC
The part that is unsupported and does not work reliably is the part
where the client and server are running on the same system, in the
same operating system.  This is a well-known problem that is very
difficult to fix, and it will not be fixed in an existing RHEL release.

Comment 9 Kris Corwin 2007-07-09 15:01:00 UTC
Thank you.  Could you please confirm that running the NFS client and server on
the same box IS supported if they do not involve the same filesystem?

Comment 10 Jeff Layton 2007-07-09 15:04:58 UTC
Yes, there's no problem running a host that is an NFS client and server, as long
as it's not serving to itself.
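
One way to spot the unsupported case on a given host is to look for NFS mounts whose server address belongs to the host itself. The following is a rough sketch under stated assumptions: the function names are hypothetical, the input is mount-table text in the /proc/mounts or /etc/mtab field layout (device, mount point, type, options), each NFS entry is assumed to carry an addr= option like the one in the mount listing in comment 7, and the caller supplies the host's own addresses, including any failover alias.

```python
def nfs_mounts(mounts_text):
    """Return (mount_point, device, server_addr) for each nfs/nfs4 entry."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        addr = None
        for opt in fields[3].split(","):
            if opt.startswith("addr="):
                addr = opt[len("addr="):]
        result.append((fields[1], fields[0], addr))
    return result


def self_served_mounts(mounts_text, local_addrs):
    """Return mount points whose NFS server address is one of local_addrs."""
    local = set(local_addrs)
    return [mp for mp, _dev, addr in nfs_mounts(mounts_text) if addr in local]
```

An empty result from self_served_mounts() only means that no mount matched the addresses supplied; it does not account for a service address that moves onto the host later, as in the failover scenario in this report.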