Bug 845456 - gluster-object:swift-container process gets blocked quite frequently
Summary: gluster-object:swift-container process gets blocked quite frequently
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: gluster-swift
Version: 2.0
Hardware: x86_64
OS: All
Priority: low
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Luis Pabón
QA Contact: pushpesh sharma
URL:
Whiteboard:
Depends On:
Blocks: 858437
Reported: 2012-08-03 06:59 UTC by Saurabh
Modified: 2016-11-08 22:25 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 858437
Environment:
Last Closed: 2013-09-23 22:32:17 UTC
Embargoed:


Description Saurabh 2012-08-03 06:59:23 UTC
Description of problem:
I am running longevity tests for gluster-object.

According to dmesg, the swift-container processes get blocked quite frequently.

dmesg output:

INFO: task swift-container:6372 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
swift-contain D 0000000000000004     0  6372   6328 0x00000000
 ffff8809cd92dbb8 0000000000000082 0000000000000000 ffff8804998450b0
 ffff880499845000 0000000000000000 ffff8809cd92dc38 ffffffff81472611
 ffff8809cd92baf8 ffff8809cd92dfd8 000000000000f4e8 ffff8809cd92baf8
Call Trace:
 [<ffffffff81472611>] ? tcp_recvmsg+0x831/0xe90
 [<ffffffff814ee8ae>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814ee74b>] mutex_lock+0x2b/0x50
 [<ffffffff81184e6f>] do_lookup+0xef/0x1e0
 [<ffffffff81185794>] __link_path_walk+0x734/0x1030
 [<ffffffff8118631a>] path_walk+0x6a/0xe0
 [<ffffffff811864eb>] do_path_lookup+0x5b/0xa0
 [<ffffffff81187157>] user_path_at+0x57/0xa0
 [<ffffffff81144f51>] ? unlink_anon_vmas+0x71/0xd0
 [<ffffffff8126ae15>] ? _atomic_dec_and_lock+0x55/0x80
 [<ffffffff8117bb14>] ? cp_new_stat+0xe4/0x100
 [<ffffffff8117bd46>] vfs_fstatat+0x46/0x80
 [<ffffffff8117beab>] vfs_stat+0x1b/0x20
 [<ffffffff8117bed4>] sys_newstat+0x24/0x50
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
INFO: task swift-container:6369 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
swift-contain D 0000000000000012     0  6369   6328 0x00000000
 ffff8809cd8cdbb8 0000000000000082 0000000000000000 ffff88059f2ca270
 ffff88059f2ca1c0 0000000000000000 ffff8809cd8cdc38 ffffffff81472611
 ffff880c18372678 ffff8809cd8cdfd8 000000000000f4e8 ffff880c18372678
Call Trace:
 [<ffffffff81472611>] ? tcp_recvmsg+0x831/0xe90
 [<ffffffff814ee8ae>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff8118d6df>] ? d_free+0x3f/0x60
 [<ffffffff814ee74b>] mutex_lock+0x2b/0x50
 [<ffffffff81184e6f>] do_lookup+0xef/0x1e0
 [<ffffffff81185794>] __link_path_walk+0x734/0x1030
 [<ffffffff8118631a>] path_walk+0x6a/0xe0
 [<ffffffff811864eb>] do_path_lookup+0x5b/0xa0
 [<ffffffff81187157>] user_path_at+0x57/0xa0
 [<ffffffff81144f51>] ? unlink_anon_vmas+0x71/0xd0
 [<ffffffff8126ae15>] ? _atomic_dec_and_lock+0x55/0x80
 [<ffffffff8117bb14>] ? cp_new_stat+0xe4/0x100
 [<ffffffff8117bd46>] vfs_fstatat+0x46/0x80
 [<ffffffff8117beab>] vfs_stat+0x1b/0x20
 [<ffffffff8117bed4>] sys_newstat+0x24/0x50
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
INFO: task swift-container:6372 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
swift-contain D 0000000000000004     0  6372   6328 0x00000000
 ffff8809cd92dbb8 0000000000000082 0000000000000000 ffff8804998450b0
 ffff880499845000 0000000000000000 ffff8809cd92dc38 ffffffff81472611
 ffff8809cd92baf8 ffff8809cd92dfd8 000000000000f4e8 ffff8809cd92baf8
Call Trace:
 [<ffffffff81472611>] ? tcp_recvmsg+0x831/0xe90
 [<ffffffff814ee8ae>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814ee74b>] mutex_lock+0x2b/0x50
 [<ffffffff81184e6f>] do_lookup+0xef/0x1e0
 [<ffffffff81185794>] __link_path_walk+0x734/0x1030
 [<ffffffff8118631a>] path_walk+0x6a/0xe0
 [<ffffffff811864eb>] do_path_lookup+0x5b/0xa0
 [<ffffffff81187157>] user_path_at+0x57/0xa0
 [<ffffffff81144f51>] ? unlink_anon_vmas+0x71/0xd0
 [<ffffffff8126ae15>] ? _atomic_dec_and_lock+0x55/0x80
 [<ffffffff8117bb14>] ? cp_new_stat+0xe4/0x100
 [<ffffffff8117bd46>] vfs_fstatat+0x46/0x80
 [<ffffffff8117beab>] vfs_stat+0x1b/0x20
 [<ffffffff8117bed4>] sys_newstat+0x24/0x50
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
INFO: task swift-container:6375 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
swift-contain D 0000000000000003     0  6375   6328 0x00000000
 ffff8809cd98dbb8 0000000000000082 ffff8809cd98db28 ffff8804db969670
 ffff8804db9695c0 0000000000000000 ffff8809cd98dc38 ffffffff81472611
 ffff8809cd92b0b8 ffff8809cd98dfd8 000000000000f4e8 ffff8809cd92b0b8
Call Trace:
 [<ffffffff81472611>] ? tcp_recvmsg+0x831/0xe90
 [<ffffffff814ee8ae>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814ee74b>] mutex_lock+0x2b/0x50
 [<ffffffff81184e6f>] do_lookup+0xef/0x1e0
 [<ffffffff81185794>] __link_path_walk+0x734/0x1030
 [<ffffffff8118631a>] path_walk+0x6a/0xe0
 [<ffffffff811864eb>] do_path_lookup+0x5b/0xa0
 [<ffffffff81187157>] user_path_at+0x57/0xa0
 [<ffffffff81144f51>] ? unlink_anon_vmas+0x71/0xd0
 [<ffffffff8126ae15>] ? _atomic_dec_and_lock+0x55/0x80
 [<ffffffff8117bb14>] ? cp_new_stat+0xe4/0x100
 [<ffffffff8117bd46>] vfs_fstatat+0x46/0x80
 [<ffffffff8117beab>] vfs_stat+0x1b/0x20
 [<ffffffff8117bed4>] sys_newstat+0x24/0x50
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
INFO: task swift-container:6377 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
swift-contain D 0000000000000009     0  6377   6328 0x00000000
 ffff8809cd9ebbb8 0000000000000086 0000000000000000 ffff8804582756f0
 ffff880458275640 0000000000000000 ffff8809cd9ebc38 ffffffff81472611
 ffff8809cd92a678 ffff8809cd9ebfd8 000000000000f4e8 ffff8809cd92a678
Call Trace:
 [<ffffffff81472611>] ? tcp_recvmsg+0x831/0xe90
 [<ffffffff814ee8ae>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814ee74b>] mutex_lock+0x2b/0x50
 [<ffffffff81184e6f>] do_lookup+0xef/0x1e0
 [<ffffffff81185794>] __link_path_walk+0x734/0x1030
 [<ffffffff8118631a>] path_walk+0x6a/0xe0
 [<ffffffff811864eb>] do_path_lookup+0x5b/0xa0
 [<ffffffff81187157>] user_path_at+0x57/0xa0
 [<ffffffff81144f51>] ? unlink_anon_vmas+0x71/0xd0
 [<ffffffff8126ae15>] ? _atomic_dec_and_lock+0x55/0x80
 [<ffffffff8117bb14>] ? cp_new_stat+0xe4/0x100
 [<ffffffff8117bd46>] vfs_fstatat+0x46/0x80
 [<ffffffff8117beab>] vfs_stat+0x1b/0x20
 [<ffffffff8117bed4>] sys_newstat+0x24/0x50
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
INFO: task swift-container:6369 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
swift-contain D 0000000000000012     0  6369   6328 0x00000000
 ffff8809cd8cdbb8 0000000000000082 0000000000000000 ffff88059f2ca270
 ffff88059f2ca1c0 0000000000000000 ffff8809cd8cdc38 ffffffff81472611
 ffff880c18372678 ffff8809cd8cdfd8 000000000000f4e8 ffff880c18372678
Call Trace:
 [<ffffffff81472611>] ? tcp_recvmsg+0x831/0xe90
 [<ffffffff814ee8ae>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff8118d6df>] ? d_free+0x3f/0x60
 [<ffffffff814ee74b>] mutex_lock+0x2b/0x50
 [<ffffffff81184e6f>] do_lookup+0xef/0x1e0
 [<ffffffff81185794>] __link_path_walk+0x734/0x1030
 [<ffffffff8118631a>] path_walk+0x6a/0xe0
 [<ffffffff811864eb>] do_path_lookup+0x5b/0xa0
 [<ffffffff81187157>] user_path_at+0x57/0xa0
 [<ffffffff81144f51>] ? unlink_anon_vmas+0x71/0xd0
 [<ffffffff8126ae15>] ? _atomic_dec_and_lock+0x55/0x80
 [<ffffffff8117bb14>] ? cp_new_stat+0xe4/0x100
 [<ffffffff8117bd46>] vfs_fstatat+0x46/0x80
 [<ffffffff8117beab>] vfs_stat+0x1b/0x20
 [<ffffffff8117bed4>] sys_newstat+0x24/0x50
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
INFO: task swift-container:6372 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
swift-contain D 0000000000000004     0  6372   6328 0x00000000
 ffff8809cd92dbb8 0000000000000082 0000000000000000 ffff8804998450b0
 ffff880499845000 0000000000000000 ffff8809cd92dc38 ffffffff81472611
 ffff8809cd92baf8 ffff8809cd92dfd8 000000000000f4e8 ffff8809cd92baf8
Call Trace:
 [<ffffffff81472611>] ? tcp_recvmsg+0x831/0xe90
 [<ffffffff814ee8ae>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814ee74b>] mutex_lock+0x2b/0x50
 [<ffffffff81184e6f>] do_lookup+0xef/0x1e0
 [<ffffffff81185794>] __link_path_walk+0x734/0x1030
 [<ffffffff8118631a>] path_walk+0x6a/0xe0
 [<ffffffff811864eb>] do_path_lookup+0x5b/0xa0
 [<ffffffff81187157>] user_path_at+0x57/0xa0
 [<ffffffff81144f51>] ? unlink_anon_vmas+0x71/0xd0
 [<ffffffff8126ae15>] ? _atomic_dec_and_lock+0x55/0x80
 [<ffffffff8117bb14>] ? cp_new_stat+0xe4/0x100
 [<ffffffff8117bd46>] vfs_fstatat+0x46/0x80
 [<ffffffff8117beab>] vfs_stat+0x1b/0x20
 [<ffffffff8117bed4>] sys_newstat+0x24/0x50
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
INFO: task swift-container:6375 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
swift-contain D 0000000000000003     0  6375   6328 0x00000000
 ffff8809cd98dbb8 0000000000000082 ffff8809cd98db28 ffff8804db969670
 ffff8804db9695c0 0000000000000000 ffff8809cd98dc38 ffffffff81472611
 ffff8809cd92b0b8 ffff8809cd98dfd8 000000000000f4e8 ffff8809cd92b0b8
Call Trace:
 [<ffffffff81472611>] ? tcp_recvmsg+0x831/0xe90
 [<ffffffff814ee8ae>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814ee74b>] mutex_lock+0x2b/0x50
 [<ffffffff81184e6f>] do_lookup+0xef/0x1e0
 [<ffffffff81185794>] __link_path_walk+0x734/0x1030
 [<ffffffff8118631a>] path_walk+0x6a/0xe0
 [<ffffffff811864eb>] do_path_lookup+0x5b/0xa0
 [<ffffffff81187157>] user_path_at+0x57/0xa0
 [<ffffffff81144f51>] ? unlink_anon_vmas+0x71/0xd0
 [<ffffffff8126ae15>] ? _atomic_dec_and_lock+0x55/0x80
 [<ffffffff8117bb14>] ? cp_new_stat+0xe4/0x100
 [<ffffffff8117bd46>] vfs_fstatat+0x46/0x80
 [<ffffffff8117beab>] vfs_stat+0x1b/0x20
 [<ffffffff8117bed4>] sys_newstat+0x24/0x50
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
INFO: task swift-container:6377 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
swift-contain D 0000000000000009     0  6377   6328 0x00000000
 ffff8809cd9ebbb8 0000000000000086 0000000000000000 ffff8804582756f0
 ffff880458275640 0000000000000000 ffff8809cd9ebc38 ffffffff81472611
 ffff8809cd92a678 ffff8809cd9ebfd8 000000000000f4e8 ffff8809cd92a678
Call Trace:
 [<ffffffff81472611>] ? tcp_recvmsg+0x831/0xe90
 [<ffffffff814ee8ae>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814ee74b>] mutex_lock+0x2b/0x50
 [<ffffffff81184e6f>] do_lookup+0xef/0x1e0
 [<ffffffff81185794>] __link_path_walk+0x734/0x1030
 [<ffffffff8118631a>] path_walk+0x6a/0xe0
 [<ffffffff811864eb>] do_path_lookup+0x5b/0xa0
 [<ffffffff81187157>] user_path_at+0x57/0xa0
 [<ffffffff81144f51>] ? unlink_anon_vmas+0x71/0xd0
 [<ffffffff8126ae15>] ? _atomic_dec_and_lock+0x55/0x80
 [<ffffffff8117bb14>] ? cp_new_stat+0xe4/0x100
 [<ffffffff8117bd46>] vfs_fstatat+0x46/0x80
 [<ffffffff8117beab>] vfs_stat+0x1b/0x20
 [<ffffffff8117bed4>] sys_newstat+0x24/0x50
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
INFO: task swift-container:6369 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
swift-contain D 0000000000000012     0  6369   6328 0x00000000
 ffff8809cd8cdbb8 0000000000000082 0000000000000000 ffff88059f2ca270
 ffff88059f2ca1c0 0000000000000000 ffff8809cd8cdc38 ffffffff81472611
 ffff880c18372678 ffff8809cd8cdfd8 000000000000f4e8 ffff880c18372678
Call Trace:
 [<ffffffff81472611>] ? tcp_recvmsg+0x831/0xe90
 [<ffffffff814ee8ae>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff8118d6df>] ? d_free+0x3f/0x60
 [<ffffffff814ee74b>] mutex_lock+0x2b/0x50
 [<ffffffff81184e6f>] do_lookup+0xef/0x1e0
 [<ffffffff81185794>] __link_path_walk+0x734/0x1030
 [<ffffffff8118631a>] path_walk+0x6a/0xe0
 [<ffffffff811864eb>] do_path_lookup+0x5b/0xa0
 [<ffffffff81187157>] user_path_at+0x57/0xa0
 [<ffffffff81144f51>] ? unlink_anon_vmas+0x71/0xd0
 [<ffffffff8126ae15>] ? _atomic_dec_and_lock+0x55/0x80
 [<ffffffff8117bb14>] ? cp_new_stat+0xe4/0x100
 [<ffffffff8117bd46>] vfs_fstatat+0x46/0x80
 [<ffffffff8117beab>] vfs_stat+0x1b/0x20
 [<ffffffff8117bed4>] sys_newstat+0x24/0x50
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
[root@gqac028 ~]# 
[root@gqac028 ~]# 
[root@gqac028 ~]# dmesg 
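
The repeated hung-task reports above can be tallied per PID to see which swift-container workers stall most often. The following is a minimal sketch and not part of the original report: the function name `count_hung_tasks` is hypothetical, and it assumes only the "task NAME:PID blocked for more than N seconds" message format visible in the output above.

```python
import re
from collections import Counter

# Matches the hung-task line format seen in the dmesg output above.
# The "INFO:" prefix is deliberately not required, since a paste may clip it.
HUNG_TASK_RE = re.compile(
    r"task (?P<name>[^\s:]+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def count_hung_tasks(dmesg_text):
    """Return a Counter mapping (task name, pid) to the number of reports."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            counts[(match.group("name"), int(match.group("pid")))] += 1
    return counts
```

Passing the text of `dmesg` to `count_hung_tasks` and calling `most_common()` on the result lists the most frequently blocked PIDs first.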
 

Version-Release number of selected component (if applicable):
[root@gqac028 ~]# glusterfs -V
glusterfs 3.3.0 built on Jul 19 2012 14:08:45
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@gqac028 ~]# 


How reproducible:

Observed on the current setup.

Steps to Reproduce:
1. Create a few thousand containers.
2. Keep issuing REST API requests (PUT/GET/DELETE) in parallel.
3.
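
The steps above can be sketched roughly as follows. This is an illustrative load generator, not the test harness actually used for this report: the helper names (`build_plan`, `make_http_sender`, `run_parallel`), the container naming scheme, and the host, port, and token values are all assumptions. Only the Swift-style `/v1/<account>/<container>` paths and the `X-Auth-Token` header follow the standard Swift API. Ordering between PUT, GET, and DELETE is not enforced, so some 404 responses are to be expected.

```python
import http.client
from concurrent.futures import ThreadPoolExecutor


def build_plan(account, n_containers):
    """Build (method, path) pairs: PUT, GET and DELETE for each container."""
    plan = []
    for i in range(n_containers):
        # Hypothetical container naming scheme, for illustration only.
        path = "/v1/%s/container-%d" % (account, i)
        for method in ("PUT", "GET", "DELETE"):
            plan.append((method, path))
    return plan


def make_http_sender(host, port, token):
    """Return a function that sends one request and returns the HTTP status."""
    def send(method, path):
        conn = http.client.HTTPConnection(host, port, timeout=30)
        try:
            conn.request(method, path, headers={"X-Auth-Token": token})
            response = conn.getresponse()
            response.read()  # drain the body before closing
            return response.status
        finally:
            conn.close()
    return send


def run_parallel(plan, send, workers=16):
    """Issue all requests from a thread pool; return statuses in plan order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda req: send(*req), plan))
```

For example, `run_parallel(build_plan("AUTH_test1", 5000), make_http_sender("localhost", 8080, token))` would drive the workload against a proxy server, where `localhost`, `8080`, and `token` stand in for the real endpoint and credentials.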
  
Actual results:
The swift-container processes get blocked, as shown in the description.

Expected results:
The swift-container processes should not get blocked.

Additional info:


[root@gqac028 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg_gqac028-lv_root
                       50G   24G   24G  51% /
tmpfs                  24G     0   24G   0% /dev/shm
/dev/sda1             485M   31M  429M   7% /boot
/dev/mapper/vg_gqac028-lv_home
                      366G   28G  338G   8% /home
localhost:test1       2.2T  171G  2.0T   8% /mnt/gluster-object/AUTH_test1
df: `/mnt/gluster-object/AUTH_test': Transport endpoint is not connected
[root@gqac028 ~]# 
[root@gqac028 ~]# 
[root@gqac028 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         48383      26827      21556          0        129      16148
-/+ buffers/cache:      10550      37833
Swap:        50431          0      50431
[root@gqac028 ~]# gluster volume info test1
 
Volume Name: test1
Type: Distributed-Replicate
Volume ID: ae4f5ddc-dcf1-4298-b705-c71e81a5b12f
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.16.157.81:/home/test1-dr
Brick2: 10.16.157.75:/home/test1-dr2
Brick3: 10.16.157.78:/home/test1-d2r
Brick4: 10.16.157.21:/home/test1-d2r2
Brick5: 10.16.157.81:/home/test1-d3r
Brick6: 10.16.157.75:/home/test1-d3r2
Brick7: 10.16.157.78:/home/test1-d4r
Brick8: 10.16.157.21:/home/test1-d4r2
Brick9: 10.16.157.81:/home/test1-d5r
Brick10: 10.16.157.75:/home/test1-d5r2
Brick11: 10.16.157.78:/home/test1-d6r
Brick12: 10.16.157.21:/home/test1-d6r2
Options Reconfigured:
geo-replication.indexing: off
features.quota: off
diagnostics.brick-log-level: CRITICAL
[root@gqac028 ~]# 


At present, though, the processes are in this state:


[root@gqac028 ~]# ps -aux | grep container-server
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
root      6328  0.0  0.0 230468 15388 ?        Ss   Aug02   0:00 /usr/bin/python /usr/bin/swift-container-server /etc/swift/container-server/1.conf
root      6369  3.2  0.0 247380 29948 ?        S    Aug02  42:35 /usr/bin/python /usr/bin/swift-container-server /etc/swift/container-server/1.conf
root      6372  3.2  0.0 247640 30420 ?        S    Aug02  42:37 /usr/bin/python /usr/bin/swift-container-server /etc/swift/container-server/1.conf
root      6375  3.2  0.0 246452 29280 ?        S    Aug02  42:45 /usr/bin/python /usr/bin/swift-container-server /etc/swift/container-server/1.conf
root      6377  3.2  0.0 247468 30236 ?        S    Aug02  42:50 /usr/bin/python /usr/bin/swift-container-server /etc/swift/container-server/1.conf
root      6454  0.0  0.0 103236   864 pts/0    S+   06:56   0:00 grep container-server

Comment 2 Junaid 2012-08-07 06:40:07 UTC
Hi Saurabh,

What was the load on the server on which this backtrace was seen? Can you please provide some more information on the test case you were running and the machine configuration that was used?

Comment 4 Luis Pabón 2013-07-17 01:00:26 UTC
RHS 2.0 UFO Bugs are being set to low priority.

Comment 8 Scott Haines 2013-09-23 22:32:17 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

