Description of problem:
======================
dd became blocked on an NFS mount when the brick filesystems were unmounted while dd was in progress. The same test succeeds from a FUSE mount.

Client dmesg output:

nfs: server rhs-client11 not responding, still trying
INFO: task dd:2181 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
dd            D 0000000000000000     0  2181   1836 0x00000080
 ffff880119b4dc78 0000000000000082 0000000000000000 0007761c43cb47ca
 ffff880119b4dbe8 ffff88011817bd70 000000000012709c ffffffffae04e010
 ffff880118234638 ffff880119b4dfd8 000000000000fb88 ffff880118234638
Call Trace:
 [<ffffffff81119d40>] ? sync_page+0x0/0x50
 [<ffffffff8150e513>] io_schedule+0x73/0xc0
 [<ffffffff81119d7d>] sync_page+0x3d/0x50
 [<ffffffff8150eecf>] __wait_on_bit+0x5f/0x90
 [<ffffffff81119fb3>] wait_on_page_bit+0x73/0x80
 [<ffffffff81096d00>] ? wake_bit_function+0x0/0x50
 [<ffffffff8112efb5>] ? pagevec_lookup_tag+0x25/0x40
 [<ffffffff8111a3db>] wait_on_page_writeback_range+0xfb/0x190
 [<ffffffff8111a5a8>] filemap_write_and_wait_range+0x78/0x90
 [<ffffffff811b1b2e>] vfs_fsync_range+0x7e/0xe0
 [<ffffffff811b1bfd>] vfs_fsync+0x1d/0x20
 [<ffffffffa0250670>] nfs_file_flush+0x70/0xa0 [nfs]
 [<ffffffff8117de3c>] filp_close+0x3c/0x90
 [<ffffffff8117df35>] sys_close+0xa5/0x100
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

(The "nfs: server rhs-client11 not responding, still trying" message repeats continuously, and the hung-task report above is re-logged every 120 seconds with the same backtrace.)

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-rdma-3.4.0.14rhs-1.el6_4.x86_64
glusterfs-devel-3.4.0.14rhs-1.el6_4.x86_64
glusterfs-debuginfo-3.4.0.14rhs-1.el6_4.x86_64
glusterfs-3.4.0.14rhs-1.el6_4.x86_64
glusterfs-fuse-3.4.0.14rhs-1.el6_4.x86_64

Steps to Reproduce:
===================
1. Create and start a 6x2 volume from 4 servers (rhs-client11-14).
2. Mount the volume on a client (FUSE and NFS).
3. Create directories f and n from the FUSE mount.
4. cd to f from the FUSE mount and cd to n from the NFS mount.
5. Start dd in both mounted directories (f and n) using:
   dd if=/dev/zero of=test_file bs=1M count=10240
6. While dd is in progress, unmount the brick directories.
   I unmounted /rhs/brick1 (umount -l) on rhs-client11 and rhs-client13, which hosted the bricks from rhs-client11 (b1, b3, b5) and rhs-client13 (b7, b9, b11).
7. dd completed successfully from the FUSE mount but blocked on the NFS mount.

Actual results:
===============
dd was blocked on the NFS mount.

Additional info:
================
Status of volume: vol-dr
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/brick1/r1               N/A     N       N/A
Brick rhs-client12:/rhs/brick1/r2               49152   Y       4052
Brick rhs-client11:/rhs/brick1/r3               N/A     N       N/A
Brick rhs-client12:/rhs/brick1/r4               49153   Y       4056
Brick rhs-client11:/rhs/brick1/r5               N/A     N       N/A
Brick rhs-client12:/rhs/brick1/r6               49154   Y       4060
Brick rhs-client13:/rhs/brick1/r7               N/A     N       N/A
Brick rhs-client14:/rhs/brick1/r8               49155   Y       5448
Brick rhs-client13:/rhs/brick1/r9               N/A     N       N/A
Brick rhs-client14:/rhs/brick1/r10              49156   Y       5454
Brick rhs-client13:/rhs/brick1/r11              N/A     N       N/A
Brick rhs-client14:/rhs/brick1/r12              49157   Y       5459
NFS Server on localhost                         2049    Y       7788
Self-heal Daemon on localhost                   N/A     Y       7795
NFS Server on rhs-client14                      2049    Y       1498
Self-heal Daemon on rhs-client14                N/A     Y       1505
NFS Server on rhs-client13                      2049    Y       1018
Self-heal Daemon on rhs-client13                N/A     Y       1025
NFS Server on rhs-client12                      2049    Y       11083
Self-heal Daemon on rhs-client12                N/A     Y       11092

There are no active volume tasks.
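For reference, the reproduction above can be condensed into a shell transcript. The hostnames, brick paths, and the volume name vol-dr are taken from the status output in this report; the mount points /mnt/fuse and /mnt/nfs are placeholders, and the commands assume a live 4-node cluster, so adjust for your environment:

```shell
# On one server: create and start the 6x2 distributed-replicated volume
gluster volume create vol-dr replica 2 \
    rhs-client11:/rhs/brick1/r1  rhs-client12:/rhs/brick1/r2 \
    rhs-client11:/rhs/brick1/r3  rhs-client12:/rhs/brick1/r4 \
    rhs-client11:/rhs/brick1/r5  rhs-client12:/rhs/brick1/r6 \
    rhs-client13:/rhs/brick1/r7  rhs-client14:/rhs/brick1/r8 \
    rhs-client13:/rhs/brick1/r9  rhs-client14:/rhs/brick1/r10 \
    rhs-client13:/rhs/brick1/r11 rhs-client14:/rhs/brick1/r12
gluster volume start vol-dr

# On the client: mount the volume twice and start dd on both mounts
mount -t glusterfs rhs-client11:/vol-dr /mnt/fuse
mount -t nfs -o vers=3 rhs-client11:/vol-dr /mnt/nfs
mkdir /mnt/fuse/f /mnt/fuse/n
dd if=/dev/zero of=/mnt/fuse/f/test_file bs=1M count=10240 &
dd if=/dev/zero of=/mnt/nfs/n/test_file bs=1M count=10240 &

# On rhs-client11 and rhs-client13, while both dd processes are running:
umount -l /rhs/brick1
```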
glusterfs-3.5 and newer have a health-checker for the bricks (http://review.gluster.org/5176): each brick process periodically verifies that its underlying filesystem is still accessible, and if a brick is unmounted, the brick process is killed. This should provide more stable behaviour, so this problem is most likely fixed in current releases.
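The idea behind that health-checker can be sketched in shell. This is only an illustrative approximation, not the actual posix-translator implementation: the names brick_healthy and health_check_loop are invented here, and the real check is richer than a stat of the brick path.

```shell
#!/bin/bash
# brick_healthy: a stat of the brick directory, which fails once the
# path is gone (e.g. the filesystem disappeared out from under the brick).
brick_healthy() {
    stat "$1" >/dev/null 2>&1
}

# health_check_loop: poll the brick every $interval seconds; when the
# check fails, report it. In GlusterFS the brick process exits at this
# point, so clients fail over to the replica instead of hanging forever.
health_check_loop() {
    local brick=$1 interval=${2:-30}
    while brick_healthy "$brick"; do
        sleep "$interval"
    done
    echo "health-check failed on $brick"
}
```

In real deployments the polling period is controlled by the volume option storage.health-check-interval (settable via `gluster volume set`), which defaults to 30 seconds if I recall correctly.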