Glusterfs deadlocks when a volume is mounted over NFS on the same host where glusterfsd is running. I would like to know whether this configuration is supported, because I know that some network filesystems can deadlock in this setup. The deadlock happens when writing a file big enough to fill the page cache: the kernel tries to flush dirty NFS pages to reclaim memory, but completing the flush requires the glusterfs NFS server process to allocate memory, and that allocation in turn blocks on the same reclaim, so neither side can make progress.

I tested glusterfs-3.1.1 and 3.1.2 on Fedora 14 with kernels 2.6.35.10-74.fc14.x86_64 and 2.6.37-2.fc15.x86_64. The bug is easy to replicate when the system has little memory:

# free
             total       used       free     shared    buffers     cached
Mem:        505408     490248      15160          0      21100      98772
-/+ buffers/cache:     370376     135032
Swap:      1047548     139492     908056

This bug is probably the same as #2251, but when the file is big relative to available memory, a single dd is enough:

# dd if=/dev/zero of=/home/gluster/bigfile bs=1M count=100

# ps xaww -Owchan:22
  PID WCHAN                  S TTY          TIME COMMAND
 1603 nfs_wait_bit_killable  D ?        00:00:07 /usr/sbin/glusterfs -f /etc/glusterd/nfs/nfs-server.vol -p /etc/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log
 1655 nfs_wait_bit_killable  D pts/1    00:00:00 dd if=/dev/zero of=/home/gluster/bigfile bs=1M count=100

# cat /proc/1603/stack
[<ffffffffa0100a6c>] nfs_wait_bit_killable+0x34/0x38 [nfs]
[<ffffffffa010cea7>] nfs_commit_inode+0x71/0x1d6 [nfs]
[<ffffffffa00ff128>] nfs_release_page+0x66/0x83 [nfs]
[<ffffffff810d2d47>] try_to_release_page+0x32/0x3b
[<ffffffff810de9ec>] shrink_page_list+0x2cf/0x446
[<ffffffff810deeb0>] shrink_inactive_list.clone.35+0x34d/0x5c6
[<ffffffff810df732>] shrink_zone+0x355/0x3e2
[<ffffffff810dfbaa>] do_try_to_free_pages+0x160/0x363
[<ffffffff810dff46>] try_to_free_pages+0x67/0x69
[<ffffffff810da1d3>] __alloc_pages_nodemask+0x525/0x776
[<ffffffff81100015>] alloc_pages_current+0xa9/0xc3
[<ffffffff813fc9cc>] tcp_sendmsg+0x3c5/0x809
[<ffffffff813aefe3>] __sock_sendmsg+0x6b/0x77
[<ffffffff813afd99>] sock_aio_write+0xc2/0xd6
[<ffffffff811176aa>] do_sync_readv_writev+0xc1/0x100
[<ffffffff81117900>] do_readv_writev+0xa7/0x127
[<ffffffff811179c5>] vfs_writev+0x45/0x47
[<ffffffff81117ae8>] sys_writev+0x4a/0x93
[<ffffffff81009cf2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

# cat /proc/1655/stack
[<ffffffffa0100a6c>] nfs_wait_bit_killable+0x34/0x38 [nfs]
[<ffffffffa010cfd6>] nfs_commit_inode+0x1a0/0x1d6 [nfs]
[<ffffffffa00ff128>] nfs_release_page+0x66/0x83 [nfs]
[<ffffffff810d2d47>] try_to_release_page+0x32/0x3b
[<ffffffff810de9ec>] shrink_page_list+0x2cf/0x446
[<ffffffff810deeb0>] shrink_inactive_list.clone.35+0x34d/0x5c6
[<ffffffff810df732>] shrink_zone+0x355/0x3e2
[<ffffffff810dfbaa>] do_try_to_free_pages+0x160/0x363
[<ffffffff810dff46>] try_to_free_pages+0x67/0x69
[<ffffffff810da1d3>] __alloc_pages_nodemask+0x525/0x776
[<ffffffff81100015>] alloc_pages_current+0xa9/0xc3
[<ffffffff810d3aa3>] __page_cache_alloc+0x77/0x7e
[<ffffffff810d3c6c>] grab_cache_page_write_begin+0x5c/0xa3
[<ffffffffa00ff41e>] nfs_write_begin+0xd4/0x187 [nfs]
[<ffffffff810d2f27>] generic_file_buffered_write+0xfa/0x23d
[<ffffffff810d498c>] __generic_file_aio_write+0x24f/0x27f
[<ffffffff810d4a17>] generic_file_aio_write+0x5b/0xab
[<ffffffffa010001f>] nfs_file_write+0xe0/0x172 [nfs]
[<ffffffff81116bf6>] do_sync_write+0xcb/0x108
[<ffffffff811172d0>] vfs_write+0xac/0x100
[<ffffffff811174d9>] sys_write+0x4a/0x6e
[<ffffffff81009cf2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
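For reference, a minimal sketch of the kind of setup that hits this. The volume name gv0, brick path /export/brick1, hostname server1, and mount point /mnt/gv0 are placeholders for illustration, not taken from the report above:

## placeholders: volume gv0, brick server1:/export/brick1, mount point /mnt/gv0
# gluster volume create gv0 server1:/export/brick1
# gluster volume start gv0
## mount the volume over NFS on the same host; gluster's built-in NFS server
## speaks NFSv3 over TCP, and nolock is assumed because it may not support NLM
# mount -t nfs -o vers=3,tcp,nolock server1:/gv0 /mnt/gv0
## write a file large relative to RAM to trigger the reclaim deadlock
# dd if=/dev/zero of=/mnt/gv0/bigfile bs=1M count=1000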
*** Bug 2251 has been marked as a duplicate of this bug. ***
Yes, this is the same as bug 763983, but we haven't been able to reproduce it in-house more than once or twice. Your report confirms that it is related to memory pressure. I won't say this is an unsupported config, but because of differences in system resources it may work for some setups and not for others. The flushing of NFS caches is largely driven by the NFS client code in the kernel, so I don't think glusterfs can do much about this problem.
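One hedged thought, not verified against this bug: since the trigger is memory pressure, lowering the kernel's dirty-page thresholds might reduce how much dirty NFS data has to be committed during reclaim. The values below are purely illustrative:

## illustrative values only; not verified to avoid this deadlock
# sysctl -w vm.dirty_background_ratio=2
# sysctl -w vm.dirty_ratio=5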
Mounting NFS with -o sync prevents the deadlock but kills performance. The FUSE client code is also in the kernel and somehow copes with these memory-pressure situations, so I'm sure this problem is a basic one for kernel hackers.
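For illustration, the sync mount mentioned above might look like the following; the hostname, volume name, and mount point are the same placeholders as earlier, not from the report:

## -o sync makes each write synchronous, so large amounts of dirty NFS data never accumulate
# mount -t nfs -o sync,vers=3,tcp server1:/gv0 /mnt/gv0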
Resolving. Reason given in the previous comment.
*** Bug 765335 has been marked as a duplicate of this bug. ***