Bug 764052 (GLUSTER-2320) - Deadlock with NFS client and glusterfs server on the same host
Summary: Deadlock with NFS client and glusterfs server on the same host
Keywords:
Status: CLOSED NOTABUG
Alias: GLUSTER-2320
Product: GlusterFS
Classification: Community
Component: nfs
Version: 3.1.2
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Shehjar Tikoo
QA Contact:
URL:
Whiteboard:
Duplicates: GLUSTER-2251 GLUSTER-3603
Depends On:
Blocks:
 
Reported: 2011-01-26 11:07 UTC by Artur Zaprzała
Modified: 2012-08-14 12:11 UTC
CC List: 4 users

Fixed In Version:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: nfs
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Artur Zaprzała 2011-01-26 11:07:58 UTC
GlusterFS deadlocks when a volume is mounted over NFS on the same host where glusterfsd is running. I would like to know whether this configuration is supported, because I know that some network filesystems can deadlock when used this way.
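
For reference, the local NFS mount in this setup is along these lines (the volume name "test-volume" is only a placeholder and the exact mount options may differ; the mount point matches the paths used below):

# mount -t nfs -o vers=3,proto=tcp localhost:/test-volume /home/gluster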

The deadlock happens when writing a file big enough to fill the filesystem cache: the kernel tries to flush it to free some memory for glusterfsd, but glusterfsd needs memory to commit those filesystem blocks before the flush can complete, which would in turn free memory for glusterfsd... and so the cycle never ends.

I tested glusterfs-3.1.1 and 3.1.2 on Fedora 14 with kernels 2.6.35.10-74.fc14.x86_64 and 2.6.37-2.fc15.x86_64.

The bug is easy to replicate when the system has little memory:
# free
             total       used       free     shared    buffers     cached
Mem:        505408     490248      15160          0      21100      98772
-/+ buffers/cache:     370376     135032
Swap:      1047548     139492     908056

This bug is probably the same as #2251, but when the file is big relative to the available memory, a single dd is enough:
# dd if=/dev/zero of=/home/gluster/bigfile bs=1M count=100

# ps xaww -Owchan:22
  PID WCHAN                  S TTY          TIME COMMAND
 1603 nfs_wait_bit_killable  D ?        00:00:07 /usr/sbin/glusterfs -f /etc/glusterd/nfs/nfs-server.vol -p /etc/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log
 1655 nfs_wait_bit_killable  D pts/1    00:00:00 dd if=/dev/zero of=/home/gluster/bigfile bs=1M count=100

# cat /proc/1603/stack
[<ffffffffa0100a6c>] nfs_wait_bit_killable+0x34/0x38 [nfs]
[<ffffffffa010cea7>] nfs_commit_inode+0x71/0x1d6 [nfs]
[<ffffffffa00ff128>] nfs_release_page+0x66/0x83 [nfs]
[<ffffffff810d2d47>] try_to_release_page+0x32/0x3b
[<ffffffff810de9ec>] shrink_page_list+0x2cf/0x446
[<ffffffff810deeb0>] shrink_inactive_list.clone.35+0x34d/0x5c6
[<ffffffff810df732>] shrink_zone+0x355/0x3e2
[<ffffffff810dfbaa>] do_try_to_free_pages+0x160/0x363
[<ffffffff810dff46>] try_to_free_pages+0x67/0x69
[<ffffffff810da1d3>] __alloc_pages_nodemask+0x525/0x776
[<ffffffff81100015>] alloc_pages_current+0xa9/0xc3
[<ffffffff813fc9cc>] tcp_sendmsg+0x3c5/0x809
[<ffffffff813aefe3>] __sock_sendmsg+0x6b/0x77
[<ffffffff813afd99>] sock_aio_write+0xc2/0xd6
[<ffffffff811176aa>] do_sync_readv_writev+0xc1/0x100
[<ffffffff81117900>] do_readv_writev+0xa7/0x127
[<ffffffff811179c5>] vfs_writev+0x45/0x47
[<ffffffff81117ae8>] sys_writev+0x4a/0x93
[<ffffffff81009cf2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

# cat /proc/1655/stack
[<ffffffffa0100a6c>] nfs_wait_bit_killable+0x34/0x38 [nfs]
[<ffffffffa010cfd6>] nfs_commit_inode+0x1a0/0x1d6 [nfs]
[<ffffffffa00ff128>] nfs_release_page+0x66/0x83 [nfs]
[<ffffffff810d2d47>] try_to_release_page+0x32/0x3b
[<ffffffff810de9ec>] shrink_page_list+0x2cf/0x446
[<ffffffff810deeb0>] shrink_inactive_list.clone.35+0x34d/0x5c6
[<ffffffff810df732>] shrink_zone+0x355/0x3e2
[<ffffffff810dfbaa>] do_try_to_free_pages+0x160/0x363
[<ffffffff810dff46>] try_to_free_pages+0x67/0x69
[<ffffffff810da1d3>] __alloc_pages_nodemask+0x525/0x776
[<ffffffff81100015>] alloc_pages_current+0xa9/0xc3
[<ffffffff810d3aa3>] __page_cache_alloc+0x77/0x7e
[<ffffffff810d3c6c>] grab_cache_page_write_begin+0x5c/0xa3
[<ffffffffa00ff41e>] nfs_write_begin+0xd4/0x187 [nfs]
[<ffffffff810d2f27>] generic_file_buffered_write+0xfa/0x23d
[<ffffffff810d498c>] __generic_file_aio_write+0x24f/0x27f
[<ffffffff810d4a17>] generic_file_aio_write+0x5b/0xab
[<ffffffffa010001f>] nfs_file_write+0xe0/0x172 [nfs]
[<ffffffff81116bf6>] do_sync_write+0xcb/0x108
[<ffffffff811172d0>] vfs_write+0xac/0x100
[<ffffffff811174d9>] sys_write+0x4a/0x6e
[<ffffffff81009cf2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

Comment 1 Shehjar Tikoo 2011-01-27 05:31:42 UTC
*** Bug 2251 has been marked as a duplicate of this bug. ***

Comment 2 Shehjar Tikoo 2011-01-27 05:32:34 UTC
Yes, this is the same as bug 763983, but we haven't been able to reproduce it in-house more than once or twice. Your report confirms that this is related to memory pressure.

I won't say this is an unsupported config, but because of differing system resources it may work for some and not for others. The flushing of NFS caches is largely driven by the NFS client code in the kernel, so I don't think glusterfs can do much about this problem.

Comment 3 Artur Zaprzała 2011-01-27 05:59:04 UTC
Mounting NFS with -o sync prevents the deadlock but kills performance.
The FUSE client code is also in the kernel and it somehow manages to handle moments of memory pressure. I'm sure this problem is a well-understood one for kernel hackers.
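
To be concrete, the sync workaround is a mount along these lines (volume name and options are placeholders, not the exact command used):

# mount -t nfs -o sync,vers=3,proto=tcp localhost:/test-volume /home/gluster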

Comment 4 Shehjar Tikoo 2011-05-25 05:55:09 UTC
Resolving. Reason given in the previous comment.

Comment 5 Krishna Srinivas 2012-08-14 12:11:57 UTC
*** Bug 765335 has been marked as a duplicate of this bug. ***

