Created attachment 908223 [details] tgz of entire /var/log/glusterfs folder. sorry if it's overkill Description of problem: httpd and some other processes hang and cannot be killed with kill -9. These processes access a FUSE mount point to access image files. Version-Release number of selected component (if applicable): 1) Centos 6.5 - 2.6.32-431.17.1.el6.x86_64 #1 SMP Wed May 7 23:32:49 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux 2) Gluster 3.5.0-2.el6 ( from yum repos) How reproducible: Unknown how to reproduce. Has happened only once in the last month. Although recently I switched to the a datacenter that has much higher usage of the gluster system, so maybe the increased load of reads and writes contributed to it. Steps to Reproduce: Unknown Actual results: Expected results: Additional info: ### FROM /var/log/messages ### Jun 11 20:54:19 server-relay kernel: INFO: task glusterfs:7769 blocked for more than 120 seconds. Jun 11 20:54:19 server-relay kernel: Not tainted 2.6.32-431.17.1.el6.x86_64 #1 Jun 11 20:54:19 server-relay kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 11 20:54:19 server-relay kernel: glusterfs D 0000000000000000 0 7769 1 0x00000000 Jun 11 20:54:19 server-relay kernel: ffff880436ca1a78 0000000000000082 ffffffff81aaa5c0 0000000000000086 Jun 11 20:54:19 server-relay kernel: ffff880436ca1a08 0000000000000086 ffff880436ca1a58 ffffea0005319a80 Jun 11 20:54:19 server-relay kernel: ffff880436ccb058 ffff880436ca1fd8 000000000000fbc8 ffff880436ccb058 Jun 11 20:54:19 server-relay kernel: Call Trace: Jun 11 20:54:19 server-relay kernel: [<ffffffff810a6d01>] ? ktime_get_ts+0xb1/0xf0 Jun 11 20:54:19 server-relay kernel: [<ffffffff8111f590>] ? sync_page+0x0/0x50 Jun 11 20:54:19 server-relay kernel: [<ffffffff81528383>] io_schedule+0x73/0xc0 Jun 11 20:54:19 server-relay kernel: [<ffffffff8111f5cd>] sync_page+0x3d/0x50 Jun 11 20:54:19 server-relay kernel: [<ffffffff81528c1a>] __wait_on_bit_lock+0x5a/0xc0 Jun 11 20:54:19 server-relay kernel: [<ffffffff8111f567>] __lock_page+0x67/0x70 Jun 11 20:54:19 server-relay kernel: [<ffffffff8109af80>] ? wake_bit_function+0x0/0x50 Jun 11 20:54:19 server-relay kernel: [<ffffffff8111f6a7>] ? unlock_page+0x27/0x30 Jun 11 20:54:19 server-relay kernel: [<ffffffff81137cad>] invalidate_inode_pages2_range+0x36d/0x3b0 Jun 11 20:54:19 server-relay kernel: [<ffffffffa0278145>] fuse_reverse_inval_inode+0x75/0x90 [fuse] Jun 11 20:54:19 server-relay kernel: [<ffffffffa026d9ef>] fuse_notify+0x12f/0x2a0 [fuse] Jun 11 20:54:19 server-relay kernel: [<ffffffffa026d593>] ? fuse_copy_one+0x53/0x70 [fuse] Jun 11 20:54:19 server-relay kernel: [<ffffffffa026e6ea>] fuse_dev_write+0x16a/0x3e0 [fuse] Jun 11 20:54:19 server-relay kernel: [<ffffffff8118893a>] do_sync_write+0xfa/0x140 Jun 11 20:54:19 server-relay kernel: [<ffffffff8109af00>] ? autoremove_wake_function+0x0/0x40 Jun 11 20:54:19 server-relay kernel: [<ffffffff81226056>] ? security_file_permission+0x16/0x20 Jun 11 20:54:19 server-relay kernel: [<ffffffff81188c38>] vfs_write+0xb8/0x1a0 Jun 11 20:54:19 server-relay kernel: [<ffffffff81189531>] sys_write+0x51/0x90 Jun 11 20:54:19 server-relay kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Jun 11 20:54:19 server-relay kernel: INFO: task mimedefang.pl:16185 blocked for more than 120 seconds. Jun 11 20:54:19 server-relay kernel: Not tainted 2.6.32-431.17.1.el6.x86_64 #1 Jun 11 20:54:19 server-relay kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 11 20:54:19 server-relay kernel: mimedefang.pl D 0000000000000002 0 16185 24786 0x00000000 Jun 11 20:54:19 server-relay kernel: ffff880287b6de38 0000000000000082 ffff880287b6ddd8 ffffffff810546b9 Jun 11 20:54:19 server-relay kernel: ffff88036641abe0 0000000300000000 00000000000000d0 ffff880434ed5840 Jun 11 20:54:19 server-relay kernel: ffff8804346d1098 ffff880287b6dfd8 000000000000fbc8 ffff8804346d1098 Jun 11 20:54:19 server-relay kernel: Call Trace: Jun 11 20:54:19 server-relay kernel: [<ffffffff810546b9>] ? __wake_up_common+0x59/0x90 Jun 11 20:54:19 server-relay kernel: [<ffffffff8109b22e>] ? prepare_to_wait+0x4e/0x80 Jun 11 20:54:19 server-relay kernel: [<ffffffffa026e085>] fuse_request_send+0xe5/0x290 [fuse] Jun 11 20:54:19 server-relay kernel: [<ffffffff8109af00>] ? autoremove_wake_function+0x0/0x40 Jun 11 20:54:19 server-relay kernel: [<ffffffffa0273ff6>] fuse_flush+0x106/0x140 [fuse] Jun 11 20:54:19 server-relay kernel: [<ffffffff8118582c>] filp_close+0x3c/0x90 Jun 11 20:54:19 server-relay kernel: [<ffffffff81185925>] sys_close+0xa5/0x100 Jun 11 20:54:19 server-relay kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Jun 11 20:54:19 server-relay kernel: INFO: task rsync:6831 blocked for more than 120 seconds. Jun 11 20:54:19 server-relay kernel: Not tainted 2.6.32-431.17.1.el6.x86_64 #1 Jun 11 20:54:19 server-relay kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 11 20:54:19 server-relay kernel: rsync D 0000000000000002 0 6831 7094 0x00000000 Jun 11 20:54:19 server-relay kernel: ffff8801f28a3bb8 0000000000000086 ffff8801f28a3be8 ffffffffa027278b Jun 11 20:54:19 server-relay kernel: 0000000000000001 0000000000000000 0000000000069c66 0000000000000000 Jun 11 20:54:19 server-relay kernel: ffff88038065b058 ffff8801f28a3fd8 000000000000fbc8 ffff88038065b058 Jun 11 20:54:19 server-relay kernel: Call Trace: Jun 11 20:54:19 server-relay kernel: [<ffffffffa027278b>] ? fuse_dentry_revalidate+0x2fb/0x340 [fuse] Jun 11 20:54:19 server-relay kernel: [<ffffffff810a6d01>] ? ktime_get_ts+0xb1/0xf0 Jun 11 20:54:19 server-relay kernel: [<ffffffff8111f590>] ? sync_page+0x0/0x50 Jun 11 20:54:19 server-relay kernel: [<ffffffff81528383>] io_schedule+0x73/0xc0 Jun 11 20:54:19 server-relay kernel: [<ffffffff8111f5cd>] sync_page+0x3d/0x50 Jun 11 20:54:19 server-relay kernel: [<ffffffff81528c1a>] __wait_on_bit_lock+0x5a/0xc0 Jun 11 20:54:19 server-relay kernel: [<ffffffff8111f567>] __lock_page+0x67/0x70 Jun 11 20:54:19 server-relay kernel: [<ffffffff8109af80>] ? wake_bit_function+0x0/0x50 Jun 11 20:54:19 server-relay kernel: [<ffffffff811358b2>] ? pagevec_lookup+0x22/0x30 Jun 11 20:54:19 server-relay kernel: [<ffffffff81137783>] truncate_inode_pages_range+0x4e3/0x500 Jun 11 20:54:19 server-relay kernel: [<ffffffff81137835>] truncate_inode_pages+0x15/0x20 Jun 11 20:54:19 server-relay kernel: [<ffffffff8113788f>] truncate_pagecache+0x4f/0x70 Jun 11 20:54:19 server-relay kernel: [<ffffffffa0278342>] fuse_change_attributes+0xd2/0x100 [fuse] Jun 11 20:54:19 server-relay kernel: [<ffffffffa02700dd>] fuse_readdir+0x76d/0x7b0 [fuse] Jun 11 20:54:19 server-relay kernel: [<ffffffff8119ea30>] ? filldir+0x0/0xe0 Jun 11 20:54:19 server-relay kernel: [<ffffffff8119ea30>] ? filldir+0x0/0xe0 Jun 11 20:54:19 server-relay kernel: [<ffffffff8119ecb0>] vfs_readdir+0xc0/0xe0 Jun 11 20:54:19 server-relay kernel: [<ffffffff8118e7a6>] ? sys_newlstat+0x36/0x50 Jun 11 20:54:19 server-relay kernel: [<ffffffff8119ee39>] sys_getdents+0x89/0xf0 Jun 11 20:54:19 server-relay kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Jun 11 20:54:27 server-relay xinetd[29487]: EXIT: mysqlchk signal=13 pid=8414 duration=0(sec) Jun 11 20:54:39 server-relay xinetd[29487]: EXIT: mysqlchk signal=13 pid=8432 duration=0(sec) Jun 11 20:54:51 server-relay xinetd[29487]: EXIT: mysqlchk signal=13 pid=8477 duration=0(sec) Jun 11 20:55:03 server-relay xinetd[29487]: EXIT: mysqlchk signal=13 pid=8696 duration=0(sec) Jun 11 20:55:15 server-relay xinetd[29487]: EXIT: mysqlchk signal=13 pid=8831 duration=0(sec) Jun 11 20:55:27 server-relay xinetd[29487]: EXIT: mysqlchk signal=13 pid=8956 duration=0(sec) Jun 11 20:55:39 server-relay xinetd[29487]: EXIT: mysqlchk signal=13 pid=9079 duration=0(sec) Jun 11 20:55:51 server-relay xinetd[29487]: EXIT: mysqlchk signal=13 pid=9221 duration=0(sec) Jun 11 20:56:03 server-relay xinetd[29487]: EXIT: mysqlchk signal=13 pid=9374 duration=0(sec) Jun 11 20:56:15 server-relay xinetd[29487]: EXIT: mysqlchk status=0 pid=9505 duration=0(sec) Jun 11 20:56:19 server-relay kernel: INFO: task glusterfs:7769 blocked for more than 120 seconds. Jun 11 20:56:19 server-relay kernel: Not tainted 2.6.32-431.17.1.el6.x86_64 #1 Jun 11 20:56:19 server-relay kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 11 20:56:19 server-relay kernel: glusterfs D 0000000000000000 0 7769 1 0x00000000 Jun 11 20:56:19 server-relay kernel: ffff880436ca1a78 0000000000000082 ffffffff81aaa5c0 0000000000000086 Jun 11 20:56:19 server-relay kernel: ffff880436ca1a08 0000000000000086 ffff880436ca1a58 ffffea0005319a80 Jun 11 20:56:19 server-relay kernel: ffff880436ccb058 ffff880436ca1fd8 000000000000fbc8 ffff880436ccb058 Jun 11 20:56:19 server-relay kernel: Call Trace: Jun 11 20:56:19 server-relay kernel: [<ffffffff810a6d01>] ? ktime_get_ts+0xb1/0xf0 Jun 11 20:56:19 server-relay kernel: [<ffffffff8111f590>] ? sync_page+0x0/0x50 Jun 11 20:56:19 server-relay kernel: [<ffffffff81528c1a>] __wait_on_bit_lock+0x5a/0xc0 Jun 11 20:56:19 server-relay kernel: [<ffffffff8111f567>] __lock_page+0x67/0x70 Jun 11 20:56:19 server-relay kernel: [<ffffffff8109af80>] ? wake_bit_function+0x0/0x50 Jun 11 20:56:19 server-relay kernel: [<ffffffff8111f6a7>] ? unlock_page+0x27/0x30 Jun 11 20:56:19 server-relay kernel: [<ffffffff81137cad>] invalidate_inode_pages2_range+0x36d/0x3b0 Jun 11 20:56:19 server-relay kernel: [<ffffffffa0278145>] fuse_reverse_inval_inode+0x75/0x90 [fuse] Jun 11 20:56:19 server-relay kernel: [<ffffffffa026d9ef>] fuse_notify+0x12f/0x2a0 [fuse] Jun 11 20:56:19 server-relay kernel: [<ffffffffa026d593>] ? fuse_copy_one+0x53/0x70 [fuse] Jun 11 20:56:19 server-relay kernel: [<ffffffffa026e6ea>] fuse_dev_write+0x16a/0x3e0 [fuse] Jun 11 20:56:19 server-relay kernel: [<ffffffff8118893a>] do_sync_write+0xfa/0x140 Jun 11 20:56:19 server-relay kernel: [<ffffffff8109af00>] ? autoremove_wake_function+0x0/0x40 Jun 11 20:56:19 server-relay kernel: [<ffffffff81226056>] ? security_file_permission+0x16/0x20 Jun 11 20:56:19 server-relay kernel: [<ffffffff81188c38>] vfs_write+0xb8/0x1a0 Jun 11 20:56:19 server-relay kernel: [<ffffffff81189531>] sys_write+0x51/0x90 Jun 11 20:56:19 server-relay kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Jun 11 20:56:19 server-relay kernel: INFO: task mimedefang.pl:25148 blocked for more than 120 seconds. Jun 11 20:56:19 server-relay kernel: Not tainted 2.6.32-431.17.1.el6.x86_64 #1 Jun 11 20:56:19 server-relay kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 11 20:56:19 server-relay kernel: mimedefang.pl D 0000000000000001 0 25148 24786 0x00000000 Jun 11 20:56:19 server-relay kernel: ffff88026de95d18 0000000000000082 0000000000000000 ffff880383648034 Jun 11 20:56:19 server-relay kernel: ffff88026de95c88 ffffffff812271bf ffff88026de95d68 ffffffff811989c5 Jun 11 20:56:19 server-relay kernel: ffff88026b901098 ffff88026de95fd8 000000000000fbc8 ffff88026b901098 Jun 11 20:56:19 server-relay kernel: Call Trace: Jun 11 20:56:19 server-relay kernel: [<ffffffff812271bf>] ? security_inode_permission+0x1f/0x30 Jun 11 20:56:19 server-relay kernel: [<ffffffff811989c5>] ? __link_path_walk+0x145/0xff0 Jun 11 20:56:19 server-relay kernel: [<ffffffff8152935e>] __mutex_lock_slowpath+0x13e/0x180 Jun 11 20:56:19 server-relay kernel: [<ffffffff815291fb>] mutex_lock+0x2b/0x50 Jun 11 20:56:19 server-relay kernel: [<ffffffff8119b3e6>] do_filp_open+0x2d6/0xd20 Jun 11 20:56:19 server-relay kernel: [<ffffffff812824f5>] ? _atomic_dec_and_lock+0x55/0x80 Jun 11 20:56:19 server-relay kernel: [<ffffffff811a7ea2>] ? alloc_fd+0x92/0x160 Jun 11 20:56:19 server-relay kernel: [<ffffffff811859e9>] do_sys_open+0x69/0x140 Jun 11 20:56:19 server-relay kernel: [<ffffffff81185b00>] sys_open+0x20/0x30 Jun 11 20:56:19 server-relay kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Jun 11 20:56:19 server-relay kernel: INFO: task mimedefang.pl:16185 blocked for more than 120 seconds. Jun 11 20:56:19 server-relay kernel: Not tainted 2.6.32-431.17.1.el6.x86_64 #1 Jun 11 20:56:19 server-relay kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 11 20:56:19 server-relay kernel: mimedefang.pl D 0000000000000002 0 16185 24786 0x00000000 Jun 11 20:56:19 server-relay kernel: ffff880287b6de38 0000000000000082 ffff880287b6ddd8 ffffffff810546b9 Jun 11 20:56:19 server-relay kernel: ffff88036641abe0 0000000300000000 00000000000000d0 ffff880434ed5840 Jun 11 20:56:19 server-relay kernel: ffff8804346d1098 ffff880287b6dfd8 000000000000fbc8 ffff8804346d1098 Jun 11 20:56:19 server-relay kernel: Call Trace: Jun 11 20:56:19 server-relay kernel: [<ffffffff810546b9>] ? __wake_up_common+0x59/0x90 Jun 11 20:56:19 server-relay kernel: [<ffffffff8109b22e>] ? prepare_to_wait+0x4e/0x80 Jun 11 20:56:19 server-relay kernel: [<ffffffffa026e085>] fuse_request_send+0xe5/0x290 [fuse] Jun 11 20:56:19 server-relay kernel: [<ffffffff8109af00>] ? autoremove_wake_function+0x0/0x40 Jun 11 20:56:19 server-relay kernel: [<ffffffffa0273ff6>] fuse_flush+0x106/0x140 [fuse] Jun 11 20:56:19 server-relay kernel: [<ffffffff8118582c>] filp_close+0x3c/0x90 Jun 11 20:56:19 server-relay kernel: [<ffffffff81185925>] sys_close+0xa5/0x100 Jun 11 20:56:19 server-relay kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Jun 11 20:56:19 server-relay kernel: INFO: task mimedefang.pl:14409 blocked for more than 120 seconds. Jun 11 20:56:19 server-relay kernel: Not tainted 2.6.32-431.17.1.el6.x86_64 #1 Jun 11 20:56:19 server-relay kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 11 20:56:19 server-relay kernel: mimedefang.pl D 0000000000000000 0 14409 24786 0x00000000 Jun 11 20:56:19 server-relay kernel: ffff88026dc51d18 0000000000000086 0000000000000000 ffff8803667f0034 Jun 11 20:56:19 server-relay kernel: ffff88026dc51c88 ffffffff812271bf ffff88026dc51d68 ffffffff811989c5 Jun 11 20:56:19 server-relay kernel: ffff88038065bab8 ffff88026dc51fd8 000000000000fbc8 ffff88038065bab8 Jun 11 20:56:19 server-relay kernel: Call Trace: Jun 11 20:56:19 server-relay kernel: [<ffffffff812271bf>] ? security_inode_permission+0x1f/0x30 Jun 11 20:56:19 server-relay kernel: [<ffffffff811989c5>] ? __link_path_walk+0x145/0xff0 Jun 11 20:56:19 server-relay kernel: [<ffffffff8152935e>] __mutex_lock_slowpath+0x13e/0x180 Jun 11 20:56:19 server-relay kernel: [<ffffffff815291fb>] mutex_lock+0x2b/0x50 Jun 11 20:56:19 server-relay kernel: [<ffffffff8119b3e6>] do_filp_open+0x2d6/0xd20 Jun 11 20:56:19 server-relay kernel: [<ffffffff812824f5>] ? _atomic_dec_and_lock+0x55/0x80 Jun 11 20:56:19 server-relay kernel: [<ffffffff811a7ea2>] ? alloc_fd+0x92/0x160 Jun 11 20:56:19 server-relay kernel: [<ffffffff811859e9>] do_sys_open+0x69/0x140 Jun 11 20:56:19 server-relay kernel: [<ffffffff81185b00>] sys_open+0x20/0x30 Jun 11 20:56:19 server-relay kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Jun 11 20:56:19 server-relay kernel: INFO: task mimedefang.pl:14415 blocked for more than 120 seconds. Jun 11 20:56:19 server-relay kernel: Not tainted 2.6.32-431.17.1.el6.x86_64 #1 Jun 11 20:56:19 server-relay kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 11 20:56:19 server-relay kernel: mimedefang.pl D 0000000000000000 0 14415 24786 0x00000000 Jun 11 20:56:19 server-relay kernel: ffff880187305d18 0000000000000082 0000000000000000 ffff88040fab9034 Jun 11 20:56:19 server-relay kernel: ffff880187305c88 ffffffff812271bf ffff880187305d68 ffffffff811989c5 Jun 11 20:56:19 server-relay kernel: ffff880259d4c638 ffff880187305fd8 000000000000fbc8 ffff880259d4c638 Jun 11 20:56:19 server-relay kernel: Call Trace: Jun 11 20:56:19 server-relay kernel: [<ffffffff812271bf>] ? security_inode_permission+0x1f/0x30 Jun 11 20:56:19 server-relay kernel: [<ffffffff811989c5>] ? __link_path_walk+0x145/0xff0 Jun 11 20:56:19 server-relay kernel: [<ffffffff8152935e>] __mutex_lock_slowpath+0x13e/0x180 Jun 11 20:56:19 server-relay kernel: [<ffffffff815291fb>] mutex_lock+0x2b/0x50 Jun 11 20:56:19 server-relay kernel: [<ffffffff8119b3e6>] do_filp_open+0x2d6/0xd20 Jun 11 20:56:19 server-relay kernel: [<ffffffff812824f5>] ? _atomic_dec_and_lock+0x55/0x80 Jun 11 20:56:19 server-relay kernel: [<ffffffff811a7ea2>] ? alloc_fd+0x92/0x160 Jun 11 20:56:19 server-relay kernel: [<ffffffff811859e9>] do_sys_open+0x69/0x140 Jun 11 20:56:19 server-relay kernel: [<ffffffff81185b00>] sys_open+0x20/0x30 Jun 11 20:56:19 server-relay kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b Jun 11 20:56:19 server-relay kernel: INFO: task rsync:6831 blocked for more than 120 seconds. Jun 11 20:56:19 server-relay kernel: Not tainted 2.6.32-431.17.1.el6.x86_64 #1 Jun 11 20:56:19 server-relay kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 11 20:56:19 server-relay kernel: rsync D 0000000000000002 0 6831 7094 0x00000000 Jun 11 20:56:19 server-relay kernel: ffff8801f28a3bb8 0000000000000086 ffff8801f28a3be8 ffffffffa027278b Jun 11 20:56:19 server-relay kernel: 0000000000000001 0000000000000000 0000000000069c66 0000000000000000 Jun 11 20:56:19 server-relay kernel: ffff88038065b058 ffff8801f28a3fd8 000000000000fbc8 ffff88038065b058 Jun 11 20:56:19 server-relay kernel: Call Trace: Jun 11 20:56:19 server-relay kernel: [<ffffffffa027278b>] ? fuse_dentry_revalidate+0x2fb/0x340 [fuse] Jun 11 20:56:19 server-relay kernel: [<ffffffff810a6d01>] ? ktime_get_ts+0xb1/0xf0 Jun 11 20:56:19 server-relay kernel: [<ffffffff8111f590>] ? sync_page+0x0/0x50 Jun 11 20:56:19 server-relay kernel: [<ffffffff81528383>] io_schedule+0x73/0xc0 Jun 11 20:56:19 server-relay kernel: [<ffffffff8111f5cd>] sync_page+0x3d/0x50 Jun 11 20:56:19 server-relay kernel: [<ffffffff81528c1a>] __wait_on_bit_lock+0x5a/0xc0 Jun 11 20:56:19 server-relay kernel: [<ffffffff8111f567>] __lock_page+0x67/0x70 Jun 11 20:56:19 server-relay kernel: [<ffffffff8109af80>] ? wake_bit_function+0x0/0x50 Jun 11 20:56:19 server-relay kernel: [<ffffffff811358b2>] ? pagevec_lookup+0x22/0x30 Jun 11 20:56:19 server-relay kernel: [<ffffffff81137783>] truncate_inode_pages_range+0x4e3/0x500 Jun 11 20:56:19 server-relay kernel: [<ffffffff81137835>] truncate_inode_pages+0x15/0x20 Jun 11 20:56:19 server-relay kernel: [<ffffffff8113788f>] truncate_pagecache+0x4f/0x70 Jun 11 20:56:19 server-relay kernel: [<ffffffffa0278342>] fuse_change_attributes+0xd2/0x100 [fuse] Jun 11 20:56:19 server-relay kernel: [<ffffffffa02700dd>] fuse_readdir+0x76d/0x7b0 [fuse] Jun 11 20:56:19 server-relay kernel: [<ffffffff8119ea30>] ? filldir+0x0/0xe0 Jun 11 20:56:19 server-relay kernel: [<ffffffff8119ea30>] ? filldir+0x0/0xe0 Jun 11 20:56:19 server-relay kernel: [<ffffffff8119ecb0>] vfs_readdir+0xc0/0xe0 Jun 11 20:56:19 server-relay kernel: [<ffffffff8118e7a6>] ? sys_newlstat+0x36/0x50 Jun 11 20:56:19 server-relay kernel: [<ffffffff8119ee39>] sys_getdents+0x89/0xf0 Jun 11 20:56:19 server-relay kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Amit, Do you happen to have logs of the setup? I fixed one issue which could lead to dead locks in 3.5.0. Wondering if it is the same issue. I need logs to confirm it. Pranith
(In reply to Pranith Kumar K from comment #1) > Amit, > Do you happen to have logs of the setup? I fixed one issue which could > lead to dead locks in 3.5.0. Wondering if it is the same issue. I need logs > to confirm it. > > Pranith Logs are attached, mentioned first line of description.
(In reply to santosh pradhan from comment #2) > (In reply to Pranith Kumar K from comment #1) > > Amit, > > Do you happen to have logs of the setup? I fixed one issue which could > > lead to dead locks in 3.5.0. Wondering if it is the same issue. I need logs > > to confirm it. > > > > Pranith > > Logs are attached, mentioned first line of description. Thanks for pointing out Santosh :-). Amit, I went through the logs. I see the following logs on the mount logs: pk1@pranithk-laptop - ~/Downloads/1108850/glusterfs 10:11:04 :( ⚡ grep ping mnt-storage.log [2014-05-19 17:39:55.228936] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-storage-client-3: server 10.35.63.197:49152 has not responded in the last 42 seconds, disconnecting. [2014-05-19 17:39:55.229337] W [client-handshake.c:276:client_ping_cbk] 0-storage-client-3: timer must have expired [2014-06-12 00:49:47.379302] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-storage-client-0: server 10.35.63.196:49152 has not responded in the last 42 seconds, disconnecting. [2014-06-12 00:49:47.398995] W [client-handshake.c:276:client_ping_cbk] 0-storage-client-0: timer must have expired [2014-06-12 00:50:37.387927] C [client-handshake.c:127:rpc_client_ping_timer_expired] 0-storage-client-1: server 10.35.63.195:49152 has not responded in the last 42 seconds, disconnecting. [2014-06-12 00:50:37.407045] W [client-handshake.c:276:client_ping_cbk] 0-storage-client-1: timer must have expired Those times are UTC timestamps. Did you notice the timeouts during that time frame? This looks a LOT like the bug I fixed. Basically the bug appears if the number of overlapping writes is > 64 at an instant. Pasting the commit information here for reference. commit bc434b3ca961757ade8c6093f4ff0dbe4b3a4672 Author: Pranith Kumar K <pkarampu> Date: Wed Apr 23 14:05:10 2014 +0530 rpcsvc: Ignore INODELK/ENTRYLK/LK for throttling Problem: When iozone is in progress, number of blocking inodelks sometimes becomes greater than the threshold number of rpc requests allowed for that client (RPCSVC_DEFAULT_OUTSTANDING_RPC_LIMIT). Subsequent requests from that client will not be read until all the outstanding requests are processed and replied to. But because no more requests are read from that client, unlocks on the already granted locks will never come thus the number of outstanding requests would never come down. This leads to a ping-timeout on the client. Fix: Do not account INODELK/ENTRYLK/LK for throttling Change-Id: I59c6b54e7ec24ed7375ff977e817a9cb73469806 BUG: 1089470 Signed-off-by: Pranith Kumar K <pkarampu> Reviewed-on: http://review.gluster.org/7531 Tested-by: Gluster Build System <jenkins.com> Reviewed-by: Krutika Dhananjay <kdhananj> Reviewed-by: Anand Avati <avati> Pranith
Amit, the fix for this will be in GlusterFS 3.5.1. Pranith's patch was added about a week after 3.5.0 was released. If you hit this issue again before 3.5.1 is released (should be soon), then you may want to consider upgrading to 3.5.1 beta 2 in the meantime. It has this patch in it too. Hope that helps. :)
$ git log -1 45a0322066513259e61c7a4b1b1ed1a0bd3a0827 commit 45a0322066513259e61c7a4b1b1ed1a0bd3a0827 Author: Pranith Kumar K <pkarampu> Date: Wed Apr 23 14:05:10 2014 +0530 rpcsvc: Ignore INODELK/ENTRYLK/LK for throttling Problem: When iozone is in progress, number of blocking inodelks sometimes becomes greater than the threshold number of rpc requests allowed for that client (RPCSVC_DEFAULT_OUTSTANDING_RPC_LIMIT). Subsequent requests from that client will not be read until all the outstanding requests are processed and replied to. But because no more requests are read from that client, unlocks on the already granted locks will never come thus the number of outstanding requests would never come down. This leads to a ping-timeout on the client. Fix: Do not account INODELK/ENTRYLK/LK for throttling BUG: 1089470 Change-Id: I9cc2c259d2462159cea913d95f98a565acb8e0c0 Signed-off-by: Pranith Kumar K <pkarampu> Reviewed-on: http://review.gluster.org/7570 Tested-by: Gluster Build System <jenkins.com> Reviewed-by: Krutika Dhananjay <kdhananj> Reviewed-by: Niels de Vos <ndevos> $ git tag --contains 45a0322066513259e61c7a4b1b1ed1a0bd3a0827 v3.5.1 ... *** This bug has been marked as a duplicate of bug 1089470 ***