Description of problem:
I am running longevity tests for gluster-object, and according to dmesg the swift-container processes get blocked quite frequently. dmesg output:

INFO: task swift-container:6372 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
swift-contain D 0000000000000004 0 6372 6328 0x00000000
ffff8809cd92dbb8 0000000000000082 0000000000000000 ffff8804998450b0
ffff880499845000 0000000000000000 ffff8809cd92dc38 ffffffff81472611
ffff8809cd92baf8 ffff8809cd92dfd8 000000000000f4e8 ffff8809cd92baf8
Call Trace:
[<ffffffff81472611>] ? tcp_recvmsg+0x831/0xe90
[<ffffffff814ee8ae>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff814ee74b>] mutex_lock+0x2b/0x50
[<ffffffff81184e6f>] do_lookup+0xef/0x1e0
[<ffffffff81185794>] __link_path_walk+0x734/0x1030
[<ffffffff8118631a>] path_walk+0x6a/0xe0
[<ffffffff811864eb>] do_path_lookup+0x5b/0xa0
[<ffffffff81187157>] user_path_at+0x57/0xa0
[<ffffffff81144f51>] ? unlink_anon_vmas+0x71/0xd0
[<ffffffff8126ae15>] ? _atomic_dec_and_lock+0x55/0x80
[<ffffffff8117bb14>] ? cp_new_stat+0xe4/0x100
[<ffffffff8117bd46>] vfs_fstatat+0x46/0x80
[<ffffffff8117beab>] vfs_stat+0x1b/0x20
[<ffffffff8117bed4>] sys_newstat+0x24/0x50
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
INFO: task swift-container:6369 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
swift-contain D 0000000000000012 0 6369 6328 0x00000000
ffff8809cd8cdbb8 0000000000000082 0000000000000000 ffff88059f2ca270
ffff88059f2ca1c0 0000000000000000 ffff8809cd8cdc38 ffffffff81472611
ffff880c18372678 ffff8809cd8cdfd8 000000000000f4e8 ffff880c18372678
Call Trace:
[<ffffffff81472611>] ? tcp_recvmsg+0x831/0xe90
[<ffffffff814ee8ae>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff8118d6df>] ? d_free+0x3f/0x60
[<ffffffff814ee74b>] mutex_lock+0x2b/0x50
[<ffffffff81184e6f>] do_lookup+0xef/0x1e0
[<ffffffff81185794>] __link_path_walk+0x734/0x1030
[<ffffffff8118631a>] path_walk+0x6a/0xe0
[<ffffffff811864eb>] do_path_lookup+0x5b/0xa0
[<ffffffff81187157>] user_path_at+0x57/0xa0
[<ffffffff81144f51>] ? unlink_anon_vmas+0x71/0xd0
[<ffffffff8126ae15>] ? _atomic_dec_and_lock+0x55/0x80
[<ffffffff8117bb14>] ? cp_new_stat+0xe4/0x100
[<ffffffff8117bd46>] vfs_fstatat+0x46/0x80
[<ffffffff8117beab>] vfs_stat+0x1b/0x20
[<ffffffff8117bed4>] sys_newstat+0x24/0x50
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
INFO: task swift-container:6375 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
swift-contain D 0000000000000003 0 6375 6328 0x00000000
ffff8809cd98dbb8 0000000000000082 ffff8809cd98db28 ffff8804db969670
ffff8804db9695c0 0000000000000000 ffff8809cd98dc38 ffffffff81472611
ffff8809cd92b0b8 ffff8809cd98dfd8 000000000000f4e8 ffff8809cd92b0b8
Call Trace:
[<ffffffff81472611>] ? tcp_recvmsg+0x831/0xe90
[<ffffffff814ee8ae>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff814ee74b>] mutex_lock+0x2b/0x50
[<ffffffff81184e6f>] do_lookup+0xef/0x1e0
[<ffffffff81185794>] __link_path_walk+0x734/0x1030
[<ffffffff8118631a>] path_walk+0x6a/0xe0
[<ffffffff811864eb>] do_path_lookup+0x5b/0xa0
[<ffffffff81187157>] user_path_at+0x57/0xa0
[<ffffffff81144f51>] ? unlink_anon_vmas+0x71/0xd0
[<ffffffff8126ae15>] ? _atomic_dec_and_lock+0x55/0x80
[<ffffffff8117bb14>] ? cp_new_stat+0xe4/0x100
[<ffffffff8117bd46>] vfs_fstatat+0x46/0x80
[<ffffffff8117beab>] vfs_stat+0x1b/0x20
[<ffffffff8117bed4>] sys_newstat+0x24/0x50
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
INFO: task swift-container:6377 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
swift-contain D 0000000000000009 0 6377 6328 0x00000000
ffff8809cd9ebbb8 0000000000000086 0000000000000000 ffff8804582756f0
ffff880458275640 0000000000000000 ffff8809cd9ebc38 ffffffff81472611
ffff8809cd92a678 ffff8809cd9ebfd8 000000000000f4e8 ffff8809cd92a678
Call Trace:
[<ffffffff81472611>] ? tcp_recvmsg+0x831/0xe90
[<ffffffff814ee8ae>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff814ee74b>] mutex_lock+0x2b/0x50
[<ffffffff81184e6f>] do_lookup+0xef/0x1e0
[<ffffffff81185794>] __link_path_walk+0x734/0x1030
[<ffffffff8118631a>] path_walk+0x6a/0xe0
[<ffffffff811864eb>] do_path_lookup+0x5b/0xa0
[<ffffffff81187157>] user_path_at+0x57/0xa0
[<ffffffff81144f51>] ? unlink_anon_vmas+0x71/0xd0
[<ffffffff8126ae15>] ? _atomic_dec_and_lock+0x55/0x80
[<ffffffff8117bb14>] ? cp_new_stat+0xe4/0x100
[<ffffffff8117bd46>] vfs_fstatat+0x46/0x80
[<ffffffff8117beab>] vfs_stat+0x1b/0x20
[<ffffffff8117bed4>] sys_newstat+0x24/0x50
[<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

The same traces for PIDs 6369, 6372, 6375 and 6377 are logged repeatedly (ten hung-task reports in total).

Version-Release number of selected component (if applicable):
[root@gqac028 ~]# glusterfs -V
glusterfs 3.3.0 built on Jul 19 2012 14:08:45
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

How reproducible:
It happens on the present setup.

Steps to Reproduce:
1. Create several thousand containers.
2. Keep issuing REST API requests (PUT/GET/DELETE) in parallel.

Actual results:
The swift-container processes get blocked, as shown in the description.

Expected results:
The container server processes should not get blocked.

Additional info:
[root@gqac028 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_gqac028-lv_root 50G 24G 24G 51% /
tmpfs 24G 0 24G 0% /dev/shm
/dev/sda1 485M 31M 429M 7% /boot
/dev/mapper/vg_gqac028-lv_home 366G 28G 338G 8% /home
localhost:test1 2.2T 171G 2.0T 8% /mnt/gluster-object/AUTH_test1
df: `/mnt/gluster-object/AUTH_test': Transport endpoint is not connected

[root@gqac028 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         48383      26827      21556          0        129      16148
-/+ buffers/cache:      10550      37833
Swap:        50431          0      50431

[root@gqac028 ~]# gluster volume info test1
Volume Name: test1
Type: Distributed-Replicate
Volume ID: ae4f5ddc-dcf1-4298-b705-c71e81a5b12f
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.16.157.81:/home/test1-dr
Brick2: 10.16.157.75:/home/test1-dr2
Brick3: 10.16.157.78:/home/test1-d2r
Brick4: 10.16.157.21:/home/test1-d2r2
Brick5: 10.16.157.81:/home/test1-d3r
Brick6: 10.16.157.75:/home/test1-d3r2
Brick7: 10.16.157.78:/home/test1-d4r
Brick8: 10.16.157.21:/home/test1-d4r2
Brick9: 10.16.157.81:/home/test1-d5r
Brick10: 10.16.157.75:/home/test1-d5r2
Brick11: 10.16.157.78:/home/test1-d6r
Brick12: 10.16.157.21:/home/test1-d6r2
Options Reconfigured:
geo-replication.indexing: off
features.quota: off
diagnostics.brick-log-level: CRITICAL

At present, though, none of the container-server processes is in D state:
[root@gqac028 ~]# ps -aux | grep container-server
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
root 6328 0.0 0.0 230468 15388 ? Ss Aug02 0:00 /usr/bin/python /usr/bin/swift-container-server /etc/swift/container-server/1.conf
root 6369 3.2 0.0 247380 29948 ? S Aug02 42:35 /usr/bin/python /usr/bin/swift-container-server /etc/swift/container-server/1.conf
root 6372 3.2 0.0 247640 30420 ? S Aug02 42:37 /usr/bin/python /usr/bin/swift-container-server /etc/swift/container-server/1.conf
root 6375 3.2 0.0 246452 29280 ? S Aug02 42:45 /usr/bin/python /usr/bin/swift-container-server /etc/swift/container-server/1.conf
root 6377 3.2 0.0 247468 30236 ? S Aug02 42:50 /usr/bin/python /usr/bin/swift-container-server /etc/swift/container-server/1.conf
root 6454 0.0 0.0 103236 864 pts/0 S+ 06:56 0:00 grep container-server
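The reproduction steps can be approximated with a small load generator. This is a hypothetical sketch, not the script used for the longevity test: it assumes a Swift proxy reachable at a placeholder storage URL with an already-issued token (STORAGE_URL and AUTH_TOKEN are illustrative values), and the helper names container_names, plan_ops, send and run are made up for this example. It relies only on the standard Swift REST calls: PUT on a container path creates the container, and PUT/GET/DELETE on an object path operate on objects.

```python
import concurrent.futures
import random
import urllib.error
import urllib.request

# Hypothetical values: replace with your proxy URL, account and token.
STORAGE_URL = "http://127.0.0.1:8080/v1/AUTH_test1"
AUTH_TOKEN = "AUTH_tk_placeholder"


def container_names(count, prefix="cont"):
    """Names for the containers created in step 1 (naming scheme is arbitrary)."""
    return [f"{prefix}{i:06d}" for i in range(count)]


def plan_ops(containers, n_ops, objects_per_container=10, seed=0):
    """Build a reproducible random mix of (method, path) object requests for step 2."""
    rng = random.Random(seed)
    ops = []
    for _ in range(n_ops):
        container = rng.choice(containers)
        obj = f"obj{rng.randrange(objects_per_container)}"
        method = rng.choice(("PUT", "GET", "DELETE"))
        ops.append((method, f"/{container}/{obj}"))
    return ops


def send(method, path, base=STORAGE_URL, token=AUTH_TOKEN):
    """Issue one Swift REST request and return the HTTP status code."""
    body = b"x" * 1024 if method == "PUT" else None
    req = urllib.request.Request(base + path, data=body, method=method)
    req.add_header("X-Auth-Token", token)
    try:
        # Generous timeout, since a blocked backend can stall a request for minutes.
        with urllib.request.urlopen(req, timeout=300) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 for GET/DELETE of an object not yet created


def run(n_containers=5000, n_ops=100000, workers=32):
    containers = container_names(n_containers)
    with concurrent.futures.ThreadPoolExecutor(workers) as pool:
        # Step 1: create the containers (PUT on the container path).
        list(pool.map(lambda c: send("PUT", f"/{c}"), containers))
        # Step 2: keep object PUT/GET/DELETE requests in flight in parallel.
        ops = plan_ops(containers, n_ops)
        return list(pool.map(lambda op: send(*op), ops))
```

Calling run() with small n_containers and n_ops first is a sensible sanity check; connection failures raise urllib.error.URLError and stop the run. While it is running, dmesg can be watched for new "blocked for more than 120 seconds" messages.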
Hi Saurabh, what was the load on the server where this backtrace was seen? Could you also provide more information about the test case you were running and the machine configuration used?
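One way to answer the load question is to capture a snapshot whenever the hung-task messages appear. The following is a minimal, Linux-only sketch with hypothetical function names: it reads the 1/5/15-minute load averages from /proc/loadavg and lists tasks in uninterruptible sleep (state D), which is the state the kernel's hung-task detector reports on.

```python
import os


def parse_loadavg(text):
    """Return the 1/5/15-minute load averages from /proc/loadavg contents."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_stat(text):
    """Return (comm, state) from a /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' rather than on whitespace.
    """
    left, right = text.rsplit(")", 1)
    comm = left.split("(", 1)[1]
    state = right.split()[0]
    return comm, state


def d_state_tasks(proc="/proc"):
    """List (pid, comm) for tasks in uninterruptible sleep (state 'D')."""
    tasks = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                comm, state = parse_stat(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            tasks.append((int(entry), comm))
    return tasks


def snapshot(proc="/proc"):
    """Load averages plus the current D-state tasks."""
    with open(os.path.join(proc, "loadavg")) as f:
        load = parse_loadavg(f.read())
    return {"loadavg": load, "d_state": d_state_tasks(proc)}
```

Running snapshot() periodically during the longevity test would show whether the swift-container workers were in D state at the time of a load spike.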
RHS 2.0 UFO bugs are being set to low priority.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html