+++ This bug was initially created as a clone of Bug #1226005 +++

When we did a graph switch on a rebalance daemon, a second call to gf_defrag_start() was made. This led to multiple threads doing migration. When multiple threads try to move the same file, there can be deadlocks.

--- Additional comment from Anand Avati on 2015-05-28 14:01:39 EDT ---

REVIEW: http://review.gluster.org/10977 (When we did a graph switch on a rebalance daemon, a second call to gf_degrag_start() was done. This lead to multiple threads doing migration. When multiple threads try to move the same file there can be deadlocks.) posted (#1) for review on master by Dan Lambright (dlambrig)

--- Additional comment from Anand Avati on 2015-05-28 17:29:10 EDT ---

REVIEW: http://review.gluster.org/10977 (When we did a graph switch on a rebalance daemon, a second call to gf_degrag_start() was done. This lead to multiple threads doing migration. When multiple threads try to move the same file there can be deadlocks.) posted (#3) for review on master by Dan Lambright (dlambrig)

--- Additional comment from Anand Avati on 2015-05-29 15:56:51 EDT ---

REVIEW: http://review.gluster.org/10977 (When we did a graph switch on a rebalance daemon, a second call to gf_degrag_start() was done. This lead to multiple threads doing migration. When multiple threads try to move the same file there can be deadlocks.) posted (#4) for review on master by Dan Lambright (dlambrig)

--- Additional comment from Anand Avati on 2015-05-30 06:34:53 EDT ---

REVIEW: http://review.gluster.org/10977 (When we did a graph switch on a rebalance daemon, a second call to gf_degrag_start() was done. This lead to multiple threads doing migration. When multiple threads try to move the same file there can be deadlocks.) posted (#5) for review on master by Niels de Vos (ndevos)

--- Additional comment from Anand Avati on 2015-05-30 11:59:50 EDT ---

REVIEW: http://review.gluster.org/10977 (cluster/dht: maintain start state of rebalance daemon across graph switch.) posted (#6) for review on master by Dan Lambright (dlambrig)

--- Additional comment from Anand Avati on 2015-06-01 14:12:41 EDT ---

COMMIT: http://review.gluster.org/10977 committed in master by Vijay Bellur (vbellur)

------

commit 3f11b8e8ec6d78ebe33636b64130d5d133729f2c
Author: Dan Lambright <dlambrig>
Date:   Thu May 28 14:00:37 2015 -0400

    cluster/dht: maintain start state of rebalance daemon across graph switch.

    When we did a graph switch on a rebalance daemon, a second call to
    gf_degrag_start() was done. This lead to multiple threads doing migration.
    When multiple threads try to move the same file there can be deadlocks.

    Change-Id: I931ca7fe600022f245e3dccaabb1ad004f732c56
    BUG: 1226005
    Signed-off-by: Dan Lambright <dlambrig>
    Reviewed-on: http://review.gluster.org/10977
    Tested-by: NetBSD Build System <jenkins.org>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
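For context on the shape of the fix, below is a minimal, self-contained C sketch of the idea behind "maintain start state of rebalance daemon across graph switch". It is not the actual GlusterFS code: the names rebal_state_t, rebal_ensure_started() and migrate_crawl() are hypothetical. The point it illustrates is that the "crawler already started" state is kept where a graph switch cannot reset it, so a repeated start request becomes a no-op instead of spawning a second set of migration threads.

/*
 * Minimal sketch only -- NOT the GlusterFS implementation. Hypothetical
 * names: rebal_state_t, rebal_ensure_started(), migrate_crawl().
 * Build: cc -pthread sketch.c
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

typedef struct {
    pthread_mutex_t lock;
    bool            started;  /* start state that survives a graph switch */
    pthread_t       crawler;
} rebal_state_t;

/* Process-wide state, deliberately not tied to any per-graph context. */
static rebal_state_t rebal = { PTHREAD_MUTEX_INITIALIZER, false };

static void *migrate_crawl(void *arg)
{
    (void)arg;
    for (int i = 0; i < 3; i++) {        /* stand-in for the migration crawl */
        printf("crawler: scanning and migrating files...\n");
        sleep(1);
    }
    return NULL;
}

/* Called on every (re)configuration of the graph; must be idempotent. */
static int rebal_ensure_started(void)
{
    int ret = 0;

    pthread_mutex_lock(&rebal.lock);
    if (!rebal.started) {
        ret = pthread_create(&rebal.crawler, NULL, migrate_crawl, NULL);
        if (ret == 0)
            rebal.started = true;
    } else {
        printf("crawler already running, not starting a second one\n");
    }
    pthread_mutex_unlock(&rebal.lock);
    return ret;
}

int main(void)
{
    rebal_ensure_started();   /* initial graph */
    rebal_ensure_started();   /* graph switch: no second crawler is spawned */
    pthread_join(rebal.crawler, NULL);
    return 0;
}

Without such a guard, the second call would create a second crawler thread pool, which is exactly the "two tier_start threads" situation QE checks for below.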
Hi Dan, can you tell us what QE must do to verify this bug? Thanks, nagpavan
QE validation test case
=======================
1) Created a tiered volume.
2) Checked that the tier rebalance process was started and noted its PID:

[root@nchilaka-tier-01 ~]# ps -ef|grep tier
root 15440 15015 0 17:33 pts/0 00:00:00 grep tier
root 30815 1 0 00:06 ? 00:00:08 /usr/sbin/glusterfs -s localhost --volfile-id rebalance/vol1 --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *replicate*.readdir-failover=off --xlator-option *dht.readdir-optimize=on --xlator-option *tier-dht.xattr-name=trusted.tier-gfid --xlator-option *dht.rebalance-cmd=6 --xlator-option *dht.node-uuid=d1a42b37-f19d-4b76-a676-a2443b0ccec8 --xlator-option *dht.commit-hash=2905257114 --socket-file /var/run/gluster/gluster-tier-bc3c0dd1-dfd0-4a58-9652-aa231333202c.sock --pid-file /var/lib/glusterd/vols/vol1/tier/d1a42b37-f19d-4b76-a676-a2443b0ccec8.pid -l /var/log/glusterfs/vol1-tier.log

3) Ran pstack against that PID to see the tier_start thread information:

[root@nchilaka-tier-01 ~]# pstack 30815|grep tier
#2 0x00007f6420121895 in tier_start () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/tier.so

4) Turned some volume performance options on and off (this triggers graph switches on the rebalance daemon).

Previous behavior: another tier_start thread was triggered alongside the existing one, i.e. two rebalance thread pools for the tier, which could cause a deadlock.
With the fix: there is still only one tier_start thread, hence no deadlock, and the rebalance daemon's PID was not killed. (An illustrative sketch of the deadlock risk follows the test version details below.)

test version :
=============
[root@nchilaka-tier-01 ~]# gluster --version
glusterfs 3.7.1 built on Jul 12 2015 22:27:42
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

[root@nchilaka-tier-01 ~]# rpm -qa|grep gluster
gluster-nagios-common-0.2.0-1.el6rhs.noarch
glusterfs-3.7.1-9.el6rhs.x86_64
glusterfs-cli-3.7.1-9.el6rhs.x86_64
gluster-nagios-addons-0.2.4-4.el6rhs.x86_64
glusterfs-libs-3.7.1-9.el6rhs.x86_64
glusterfs-client-xlators-3.7.1-9.el6rhs.x86_64
glusterfs-api-3.7.1-9.el6rhs.x86_64
glusterfs-server-3.7.1-9.el6rhs.x86_64
glusterfs-rdma-3.7.1-9.el6rhs.x86_64
python-gluster-3.7.1-9.el6rhs.x86_64
vdsm-gluster-4.16.20-1.2.el6rhs.noarch
glusterfs-fuse-3.7.1-9.el6rhs.x86_64
glusterfs-geo-replication-3.7.1-9.el6rhs.x86_64
[root@nchilaka-tier-01 ~]#
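To make the deadlock mentioned under "Previous behavior" concrete, here is a deliberately simplified pthread illustration. It is not GlusterFS code and not its actual locking scheme; it only shows the general hazard when two independent migration workers operate on the same file and take its two locks in opposite orders. When run, the program is expected to hang, which is the deadlock being demonstrated.

/*
 * Simplified illustration only -- not GlusterFS locking. Two "migration
 * workers" contend for the same file and acquire its two locks in opposite
 * orders (classic AB-BA deadlock). Build: cc -pthread demo.c
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t src_lock = PTHREAD_MUTEX_INITIALIZER; /* e.g. source copy      */
static pthread_mutex_t dst_lock = PTHREAD_MUTEX_INITIALIZER; /* e.g. destination copy */

static void *worker_a(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&src_lock);
    sleep(1);                          /* widen the race window */
    pthread_mutex_lock(&dst_lock);     /* blocks forever once B holds dst_lock */
    puts("worker A migrated the file");
    pthread_mutex_unlock(&dst_lock);
    pthread_mutex_unlock(&src_lock);
    return NULL;
}

static void *worker_b(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&dst_lock);
    sleep(1);
    pthread_mutex_lock(&src_lock);     /* blocks forever once A holds src_lock */
    puts("worker B migrated the file");
    pthread_mutex_unlock(&src_lock);
    pthread_mutex_unlock(&dst_lock);
    return NULL;
}

int main(void)
{
    pthread_t a, b;

    pthread_create(&a, NULL, worker_a, NULL);
    pthread_create(&b, NULL, worker_b, NULL);
    pthread_join(a, NULL);             /* never returns: the two threads deadlock */
    pthread_join(b, NULL);
    return 0;
}

The fix avoids this class of problem not by changing the locking, but by ensuring that only one migration thread pool exists in the first place, which is what the CLI logs below verify.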
CLI logs
========
[root@nchilaka-tier-01 ~]# gluster --version
glusterfs 3.7.1 built on Jul 12 2015 22:27:42
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@nchilaka-tier-01 ~]# rpm -qa|grep gluster
gluster-nagios-common-0.2.0-1.el6rhs.noarch
glusterfs-3.7.1-9.el6rhs.x86_64
glusterfs-cli-3.7.1-9.el6rhs.x86_64
gluster-nagios-addons-0.2.4-4.el6rhs.x86_64
glusterfs-libs-3.7.1-9.el6rhs.x86_64
glusterfs-client-xlators-3.7.1-9.el6rhs.x86_64
glusterfs-api-3.7.1-9.el6rhs.x86_64
glusterfs-server-3.7.1-9.el6rhs.x86_64
glusterfs-rdma-3.7.1-9.el6rhs.x86_64
python-gluster-3.7.1-9.el6rhs.x86_64
vdsm-gluster-4.16.20-1.2.el6rhs.noarch
glusterfs-fuse-3.7.1-9.el6rhs.x86_64
glusterfs-geo-replication-3.7.1-9.el6rhs.x86_64
[root@nchilaka-tier-01 ~]#
######################
bash-4.3$ ssh root.42.129
root.42.129's password:
Last login: Tue Jul 14 21:30:15 2015 from dhcp35-163.lab.eng.blr.redhat.com
[root@nchilaka-tier-01 ~]# ps -ef|grep tier
root 15119 15015 0 17:30 pts/0 00:00:00 grep tier
root 30815 1 0 00:06 ? 00:00:08 /usr/sbin/glusterfs -s localhost --volfile-id rebalance/vol1 --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *replicate*.readdir-failover=off --xlator-option *dht.readdir-optimize=on --xlator-option *tier-dht.xattr-name=trusted.tier-gfid --xlator-option *dht.rebalance-cmd=6 --xlator-option *dht.node-uuid=d1a42b37-f19d-4b76-a676-a2443b0ccec8 --xlator-option *dht.commit-hash=2905257114 --socket-file /var/run/gluster/gluster-tier-bc3c0dd1-dfd0-4a58-9652-aa231333202c.sock --pid-file /var/lib/glusterd/vols/vol1/tier/d1a42b37-f19d-4b76-a676-a2443b0ccec8.pid -l /var/log/glusterfs/vol1-tier.log
[root@nchilaka-tier-01 ~]# pstack 30815
Thread 19 (Thread 0x7f642547f700 (LWP 30816)):
#0 0x00007f642d031fbd in nanosleep () from /lib64/libpthread.so.0
#1 0x00007f642df6168a in gf_timer_proc () from /usr/lib64/libglusterfs.so.0
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 18 (Thread 0x7f6424a7e700 (LWP 30817)):
#0 0x00007f642d032535 in sigwait () from /lib64/libpthread.so.0
#1 0x00007f642e40902b in glusterfs_sigwaiter ()
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 17 (Thread 0x7f642407d700 (LWP 30818)):
#0 0x00007f642c958aad in nanosleep () from /lib64/libc.so.6
#1 0x00007f642c958920 in sleep () from /lib64/libc.so.6
#2 0x00007f6420121895 in tier_start () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/tier.so
#3 0x00007f642034fd27 in gf_defrag_start_crawl () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/distribute.so
#4 0x00007f642df872b2 in synctask_wrap () from /usr/lib64/libglusterfs.so.0
#5 0x00007f642c8ef8f0 in ?? () from /lib64/libc.so.6
#6 0x0000000000000000 in ?? ()
Thread 16 (Thread 0x7f642367c700 (LWP 30819)):
#0 0x00007f642d02ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f642df86d6b in syncenv_task () from /usr/lib64/libglusterfs.so.0
#2 0x00007f642df8bb80 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
#3 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 15 (Thread 0x7f6421449700 (LWP 30820)):
#0 0x00007f642c994f63 in epoll_wait () from /lib64/libc.so.6
#1 0x00007f642dfa38c1 in ?? () from /usr/lib64/libglusterfs.so.0
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 14 (Thread 0x7f6419fce700 (LWP 30824)):
#0 0x00007f642c994f63 in epoll_wait () from /lib64/libc.so.6
#1 0x00007f642dfa38c1 in ?? () from /usr/lib64/libglusterfs.so.0
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 13 (Thread 0x7f640fce8700 (LWP 30830)):
#0 0x00007f642d02e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f642034d995 in gf_defrag_task () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/distribute.so
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 12 (Thread 0x7f640f2e7700 (LWP 30831)):
#0 0x00007f642d02e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f642034d995 in gf_defrag_task () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/distribute.so
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 11 (Thread 0x7f640e8e6700 (LWP 30832)):
#0 0x00007f642d02e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f642034d995 in gf_defrag_task () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/distribute.so
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 10 (Thread 0x7f640dee5700 (LWP 30833)):
#0 0x00007f642d02e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f642034d995 in gf_defrag_task () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/distribute.so
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 9 (Thread 0x7f640d4e4700 (LWP 30834)):
#0 0x00007f642d02e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f642034d995 in gf_defrag_task () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/distribute.so
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x7f640cae3700 (LWP 30835)):
#0 0x00007f642d02e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f642034d995 in gf_defrag_task () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/distribute.so
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f63f3fff700 (LWP 30836)):
#0 0x00007f642d02e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f642034db13 in gf_defrag_task () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/distribute.so
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f63f35fe700 (LWP 30837)):
#0 0x00007f642d02e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f642034db13 in gf_defrag_task () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/distribute.so
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f63f2bfd700 (LWP 30838)):
#0 0x00007f642d02e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f642034db13 in gf_defrag_task () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/distribute.so
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f63f21fc700 (LWP 30839)):
#0 0x00007f642d02e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f642034db13 in gf_defrag_task () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/distribute.so
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f63f17fb700 (LWP 30840)):
#0 0x00007f642d02e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f642034db13 in gf_defrag_task () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/distribute.so
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f63f0dfa700 (LWP 30841)):
#0 0x00007f642d02e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f642034db13 in gf_defrag_task () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/distribute.so
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f642e3ee740 (LWP 30815)):
#0 0x00007f642d02b2ad in pthread_join () from /lib64/libpthread.so.0
#1 0x00007f642dfa353d in ?? () from /usr/lib64/libglusterfs.so.0
#2 0x00007f642e40aef1 in main ()
[root@nchilaka-tier-01 ~]# pstack 30815|grep tier
#2 0x00007f6420121895 in tier_start () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/tier.so
[root@nchilaka-tier-01 ~]# pstack 30815
Thread 19 (Thread 0x7f642547f700 (LWP 30816)):
#0 0x00007f642d031fbd in nanosleep () from /lib64/libpthread.so.0
#1 0x00007f642df6168a in gf_timer_proc () from /usr/lib64/libglusterfs.so.0
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 18 (Thread 0x7f6424a7e700 (LWP 30817)):
#0 0x00007f642d032535 in sigwait () from /lib64/libpthread.so.0
#1 0x00007f642e40902b in glusterfs_sigwaiter ()
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 17 (Thread 0x7f642407d700 (LWP 30818)):
#0 0x00007f642c958aad in nanosleep () from /lib64/libc.so.6
#1 0x00007f642c958920 in sleep () from /lib64/libc.so.6
#2 0x00007f6420121895 in tier_start () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/tier.so
#3 0x00007f642034fd27 in gf_defrag_start_crawl () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/distribute.so
#4 0x00007f642df872b2 in synctask_wrap () from /usr/lib64/libglusterfs.so.0
#5 0x00007f642c8ef8f0 in ?? () from /lib64/libc.so.6
#6 0x0000000000000000 in ?? ()
Thread 16 (Thread 0x7f642367c700 (LWP 30819)):
#0 0x00007f642d02ea0e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f642df86d6b in syncenv_task () from /usr/lib64/libglusterfs.so.0
#2 0x00007f642df8bb80 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
#3 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#4 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 15 (Thread 0x7f6421449700 (LWP 30820)):
#0 0x00007f642c994f63 in epoll_wait () from /lib64/libc.so.6
#1 0x00007f642dfa38c1 in ?? () from /usr/lib64/libglusterfs.so.0
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 14 (Thread 0x7f6419fce700 (LWP 30824)):
#0 0x00007f642c994f63 in epoll_wait () from /lib64/libc.so.6
#1 0x00007f642dfa38c1 in ?? () from /usr/lib64/libglusterfs.so.0
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 13 (Thread 0x7f640fce8700 (LWP 30830)):
#0 0x00007f642d02e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f642034d995 in gf_defrag_task () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/distribute.so
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 12 (Thread 0x7f640f2e7700 (LWP 30831)):
#0 0x00007f642d02e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f642034d995 in gf_defrag_task () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/distribute.so
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 11 (Thread 0x7f640e8e6700 (LWP 30832)):
#0 0x00007f642d02e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f642034d995 in gf_defrag_task () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/distribute.so
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 10 (Thread 0x7f640dee5700 (LWP 30833)):
#0 0x00007f642d02e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f642034d995 in gf_defrag_task () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/distribute.so
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 9 (Thread 0x7f640d4e4700 (LWP 30834)):
#0 0x00007f642d02e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f642034d995 in gf_defrag_task () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/distribute.so
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x7f640cae3700 (LWP 30835)):
#0 0x00007f642d02e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f642034d995 in gf_defrag_task () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/distribute.so
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f63f3fff700 (LWP 30836)):
#0 0x00007f642d02e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f642034db13 in gf_defrag_task () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/distribute.so
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f63f35fe700 (LWP 30837)):
#0 0x00007f642d02e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f642034db13 in gf_defrag_task () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/distribute.so
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f63f2bfd700 (LWP 30838)):
#0 0x00007f642d02e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f642034db13 in gf_defrag_task () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/distribute.so
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f63f21fc700 (LWP 30839)):
#0 0x00007f642d02e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f642034db13 in gf_defrag_task () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/distribute.so
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f63f17fb700 (LWP 30840)):
#0 0x00007f642d02e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f642034db13 in gf_defrag_task () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/distribute.so
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f63f0dfa700 (LWP 30841)):
#0 0x00007f642d02e63c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f642034db13 in gf_defrag_task () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/distribute.so
#2 0x00007f642d02aa51 in start_thread () from /lib64/libpthread.so.0
#3 0x00007f642c99496d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f642e3ee740 (LWP 30815)):
#0 0x00007f642d02b2ad in pthread_join () from /lib64/libpthread.so.0
#1 0x00007f642dfa353d in ?? () from /usr/lib64/libglusterfs.so.0
#2 0x00007f642e40aef1 in main ()
[root@nchilaka-tier-01 ~]# pstack 30815tier
Process 30815tier not found.
[root@nchilaka-tier-01 ~]# pstack 30815|grep tier
#2 0x00007f6420121895 in tier_start () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/tier.so
[root@nchilaka-tier-01 ~]# pstack 30815|grep tier_start
#2 0x00007f6420121895 in tier_start () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/tier.so
[root@nchilaka-tier-01 ~]# pstack 30815|grep tier_start
#2 0x00007f6420121895 in tier_start () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/tier.so
[root@nchilaka-tier-01 ~]# pstack 30815|grep tier_start
#2 0x00007f6420121895 in tier_start () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/tier.so
[root@nchilaka-tier-01 ~]# ps -ef|grep tier
root 15354 15015 0 17:32 pts/0 00:00:00 grep tier
root 30815 1 0 00:06 ? 00:00:08 /usr/sbin/glusterfs -s localhost --volfile-id rebalance/vol1 --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *replicate*.readdir-failover=off --xlator-option *dht.readdir-optimize=on --xlator-option *tier-dht.xattr-name=trusted.tier-gfid --xlator-option *dht.rebalance-cmd=6 --xlator-option *dht.node-uuid=d1a42b37-f19d-4b76-a676-a2443b0ccec8 --xlator-option *dht.commit-hash=2905257114 --socket-file /var/run/gluster/gluster-tier-bc3c0dd1-dfd0-4a58-9652-aa231333202c.sock --pid-file /var/lib/glusterd/vols/vol1/tier/d1a42b37-f19d-4b76-a676-a2443b0ccec8.pid -l /var/log/glusterfs/vol1-tier.log
[root@nchilaka-tier-01 ~]#
[root@nchilaka-tier-01 ~]# pstack 30815|grep tier
#2 0x00007f6420121895 in tier_start () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/tier.so
[root@nchilaka-tier-01 ~]# pstack 30815|grep tier
#2 0x00007f6420121895 in tier_start () from /usr/lib64/glusterfs/3.7.1/xlator/cluster/tier.so
[root@nchilaka-tier-01 ~]# ps -ef|grep tier
root 15440 15015 0 17:33 pts/0 00:00:00 grep tier
root 30815 1 0 00:06 ? 00:00:08 /usr/sbin/glusterfs -s localhost --volfile-id rebalance/vol1 --xlator-option *dht.use-readdirp=yes --xlator-option *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off --xlator-option *replicate*.readdir-failover=off --xlator-option *dht.readdir-optimize=on --xlator-option *tier-dht.xattr-name=trusted.tier-gfid --xlator-option *dht.rebalance-cmd=6 --xlator-option *dht.node-uuid=d1a42b37-f19d-4b76-a676-a2443b0ccec8 --xlator-option *dht.commit-hash=2905257114 --socket-file /var/run/gluster/gluster-tier-bc3c0dd1-dfd0-4a58-9652-aa231333202c.sock --pid-file /var/lib/glusterd/vols/vol1/tier/d1a42b37-f19d-4b76-a676-a2443b0ccec8.pid -l /var/log/glusterfs/vol1-tier.log
[root@nchilaka-tier-01 ~]#

Moving bug to verified

[root@nchilaka-tier-01 ~]#
[root@nchilaka-tier-01 ~]# gluster --version
glusterfs 3.7.1 built on Jul 12 2015 22:27:42
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@nchilaka-tier-01 ~]# rpm -qa|grep gluster
gluster-nagios-common-0.2.0-1.el6rhs.noarch
glusterfs-3.7.1-9.el6rhs.x86_64
glusterfs-cli-3.7.1-9.el6rhs.x86_64
gluster-nagios-addons-0.2.4-4.el6rhs.x86_64
glusterfs-libs-3.7.1-9.el6rhs.x86_64
glusterfs-client-xlators-3.7.1-9.el6rhs.x86_64
glusterfs-api-3.7.1-9.el6rhs.x86_64
glusterfs-server-3.7.1-9.el6rhs.x86_64
glusterfs-rdma-3.7.1-9.el6rhs.x86_64
python-gluster-3.7.1-9.el6rhs.x86_64
vdsm-gluster-4.16.20-1.2.el6rhs.noarch
glusterfs-fuse-3.7.1-9.el6rhs.x86_64
glusterfs-geo-replication-3.7.1-9.el6rhs.x86_64
[root@nchilaka-tier-01 ~]#
Got the information on how to verify this bug from Rafi.
sosreports for the QE verification are available at:

[qe-admin@rhsqe-repo bug.1227469]$ pwd
/home/repo/sosreports/bug.1227469
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html