
Bug 764052 (GLUSTER-2320)

Summary: Deadlock with NFS client and glusterfs server on the same host
Product: [Community] GlusterFS
Component: nfs
Version: 3.1.2
Status: CLOSED NOTABUG
Severity: low
Priority: low
Hardware: x86_64
OS: Linux
Reporter: Artur Zaprzała <artur.zaprzala+gluster>
Assignee: Shehjar Tikoo <shehjart>
CC: amarts, gluster-bugs, jaw171, prasanth
Doc Type: Bug Fix
Mount Type: nfs
Target Milestone: ---
Target Release: ---

Description Artur Zaprzała 2011-01-26 06:07:58 EST
Glusterfs deadlocks when a volume is mounted over NFS on the same host where glusterfsd is running. I would like to know if this configuration is supported, because I know that some network filesystems can deadlock in this configuration.

The deadlock happens when writing a file big enough to fill the filesystem cache: the kernel tries to flush the cache to free memory for glusterfsd, which needs memory to commit filesystem blocks, which in turn requires freeing memory for glusterfsd...

I tested glusterfs-3.1.1 and 3.1.2 on Fedora 14 with kernels 2.6.35.10-74.fc14.x86_64 and 2.6.37-2.fc15.x86_64.

The bug is easy to replicate when the system has little memory:
# free
             total       used       free     shared    buffers     cached
Mem:        505408     490248      15160          0      21100      98772
-/+ buffers/cache:     370376     135032
Swap:      1047548     139492     908056

This bug is probably the same as #2251, but when the file is big relative to available memory, a single dd is enough:
# dd if=/dev/zero of=/home/gluster/bigfile bs=1M count=100

# ps xaww -Owchan:22
  PID WCHAN                  S TTY          TIME COMMAND
 1603 nfs_wait_bit_killable  D ?        00:00:07 /usr/sbin/glusterfs -f /etc/glusterd/nfs/nfs-server.vol -p /etc/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log
 1655 nfs_wait_bit_killable  D pts/1    00:00:00 dd if=/dev/zero of=/home/gluster/bigfile bs=1M count=100

# cat /proc/1603/stack
[<ffffffffa0100a6c>] nfs_wait_bit_killable+0x34/0x38 [nfs]
[<ffffffffa010cea7>] nfs_commit_inode+0x71/0x1d6 [nfs]
[<ffffffffa00ff128>] nfs_release_page+0x66/0x83 [nfs]
[<ffffffff810d2d47>] try_to_release_page+0x32/0x3b
[<ffffffff810de9ec>] shrink_page_list+0x2cf/0x446
[<ffffffff810deeb0>] shrink_inactive_list.clone.35+0x34d/0x5c6
[<ffffffff810df732>] shrink_zone+0x355/0x3e2
[<ffffffff810dfbaa>] do_try_to_free_pages+0x160/0x363
[<ffffffff810dff46>] try_to_free_pages+0x67/0x69
[<ffffffff810da1d3>] __alloc_pages_nodemask+0x525/0x776
[<ffffffff81100015>] alloc_pages_current+0xa9/0xc3
[<ffffffff813fc9cc>] tcp_sendmsg+0x3c5/0x809
[<ffffffff813aefe3>] __sock_sendmsg+0x6b/0x77
[<ffffffff813afd99>] sock_aio_write+0xc2/0xd6
[<ffffffff811176aa>] do_sync_readv_writev+0xc1/0x100
[<ffffffff81117900>] do_readv_writev+0xa7/0x127
[<ffffffff811179c5>] vfs_writev+0x45/0x47
[<ffffffff81117ae8>] sys_writev+0x4a/0x93
[<ffffffff81009cf2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

# cat /proc/1655/stack
[<ffffffffa0100a6c>] nfs_wait_bit_killable+0x34/0x38 [nfs]
[<ffffffffa010cfd6>] nfs_commit_inode+0x1a0/0x1d6 [nfs]
[<ffffffffa00ff128>] nfs_release_page+0x66/0x83 [nfs]
[<ffffffff810d2d47>] try_to_release_page+0x32/0x3b
[<ffffffff810de9ec>] shrink_page_list+0x2cf/0x446
[<ffffffff810deeb0>] shrink_inactive_list.clone.35+0x34d/0x5c6
[<ffffffff810df732>] shrink_zone+0x355/0x3e2
[<ffffffff810dfbaa>] do_try_to_free_pages+0x160/0x363
[<ffffffff810dff46>] try_to_free_pages+0x67/0x69
[<ffffffff810da1d3>] __alloc_pages_nodemask+0x525/0x776
[<ffffffff81100015>] alloc_pages_current+0xa9/0xc3
[<ffffffff810d3aa3>] __page_cache_alloc+0x77/0x7e
[<ffffffff810d3c6c>] grab_cache_page_write_begin+0x5c/0xa3
[<ffffffffa00ff41e>] nfs_write_begin+0xd4/0x187 [nfs]
[<ffffffff810d2f27>] generic_file_buffered_write+0xfa/0x23d
[<ffffffff810d498c>] __generic_file_aio_write+0x24f/0x27f
[<ffffffff810d4a17>] generic_file_aio_write+0x5b/0xab
[<ffffffffa010001f>] nfs_file_write+0xe0/0x172 [nfs]
[<ffffffff81116bf6>] do_sync_write+0xcb/0x108
[<ffffffff811172d0>] vfs_write+0xac/0x100
[<ffffffff811174d9>] sys_write+0x4a/0x6e
[<ffffffff81009cf2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
Comment 1 Shehjar Tikoo 2011-01-27 00:31:42 EST
*** Bug 2251 has been marked as a duplicate of this bug. ***
Comment 2 Shehjar Tikoo 2011-01-27 00:32:34 EST
Yes, this is the same as bug 763983, but we haven't been able to reproduce it in-house more than once or twice. Your report confirms that this is related to memory pressure.

I won't say this is an unsupported config, but because of differing system resources, it may work for some and not for others. The flushing of NFS caches is largely driven by the NFS client code in the kernel, so I don't think glusterfs can do much about this problem.
Comment 3 Artur Zaprzała 2011-01-27 00:59:04 EST
Mounting NFS with -o sync prevents the deadlock but kills performance.
The FUSE client code is also in the kernel, and it somehow handles moments of memory pressure. I'm sure this problem is familiar to kernel hackers.
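
For reference, the -o sync workaround corresponds to a mount along these lines (the volume name and mount point are illustrative; gluster's built-in NFS server speaks NFSv3):

# mount -t nfs -o sync,vers=3 localhost:/testvol /mnt/gluster

or the equivalent /etc/fstab entry:

localhost:/testvol  /mnt/gluster  nfs  sync,vers=3  0 0

Synchronous writes avoid accumulating dirty pages in the NFS client cache, which is what triggers the flush-under-memory-pressure cycle, at the cost of per-write latency.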
Comment 4 Shehjar Tikoo 2011-05-25 01:55:09 EDT
Resolving. Reason given in the previous comment.
Comment 5 Krishna Srinivas 2012-08-14 08:11:57 EDT
*** Bug 765335 has been marked as a duplicate of this bug. ***