Bug 1392899

Summary: stat of file is hung with possible deadlock
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: core
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: urgent
Priority: unspecified
Reporter: Nag Pavan Chilakam <nchilaka>
Assignee: Pranith Kumar K <pkarampu>
QA Contact: Nag Pavan Chilakam <nchilaka>
Docs Contact:
CC: amukherj, pkarampu, rcyriac, rhinduja, rhs-bugs, storage-qa-internal
Target Milestone: ---
Target Release: RHGS 3.2.0
Whiteboard:
Fixed In Version: glusterfs-3.8.4-5
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1393259 (view as bug list)
Environment:
Last Closed: 2017-03-23 06:17:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1351528, 1393259, 1393677, 1393682

Description Nag Pavan Chilakam 2016-11-08 12:48:13 UTC
Description of problem:
=========================
I have set up a systemic testbed with a 4x2 volume spanning 4 nodes.
I have enabled the features below; see the volume info:
Volume Name: drvol
Type: Distributed-Replicate
Volume ID: 2f0e5510-fe47-4ce8-906e-6ddc7f9334ca
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: 10.70.35.191:/rhs/brick1/drvol
Brick2: 10.70.37.108:/rhs/brick1/drvol
Brick3: 10.70.35.3:/rhs/brick1/drvol
Brick4: 10.70.37.66:/rhs/brick1/drvol
Brick5: 10.70.35.191:/rhs/brick2/drvol
Brick6: 10.70.37.108:/rhs/brick2/drvol
Brick7: 10.70.35.3:/rhs/brick2/drvol
Brick8: 10.70.37.66:/rhs/brick2/drvol
Options Reconfigured:
cluster.use-compound-fops: on
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
features.uss: enable
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
features.barrier: disable
cluster.shd-max-threads: 16
performance.md-cache-timeout: 600
performance.cache-invalidation: true
features.cache-invalidation-timeout: 300
features.cache-invalidation: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on


I then mounted the volume on 10 different clients and ran the following I/O:

From all clients: took a statedump of the fuse mount process every 5 minutes and moved the dumps into a dedicated per-host directory on the mount point (i.e. onto the gluster volume).
From all clients: collected top and CPU usage every 2 minutes and appended the output to a per-host file on the mount point (i.e. onto the gluster volume).

From 2 clients (say A and B): created a directory structure of large depth and breadth. Both of these fuse client processes were OOM-killed, as the machines had only 4 GB of memory.

From 1 client (say C): created about 1 million files using touch.

From 1 client (say D): ran ls -l on each file, by name, as it was created by client C.

From 1 client (say E): ran stat on each file, by name, as it was created by client C.
On client E, the stat hung indefinitely; a minimal sketch of that kind of stat loop is shown below.

The client is RHEL 6.8.
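
For reference, a minimal sketch of the kind of stat loop client E was running. The mount path /mnt/drvol and the file-name pattern are assumptions for illustration only; the actual test drove stat from the shell.

/* stat_loop.c -- illustrative only: stat files by name while another
 * client creates them.  Build: gcc -o stat_loop stat_loop.c */
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

int
main (void)
{
        struct stat  stbuf;
        char         path[4096];
        long         i;

        for (i = 1; i <= 1000000; i++) {
                /* assumed naming scheme, not the one used in the test */
                snprintf (path, sizeof (path), "/mnt/drvol/file-%ld", i);
                if (stat (path, &stbuf) != 0)   /* on client E this call hung */
                        perror (path);
        }
        return 0;
}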

Memory consumption is as follows:
[root@rhs-client14 ~]# top -n 1 -b|grep 5377
 5377 root      20   0 7029m 5.9g 4436 T  0.0 38.0 408:49.23 glusterfs      




pstack output for the PID:

[root@rhs-client14 ~]# pstack 5377
Thread 10 (Thread 0x7fce12ab6700 (LWP 5378)):
#0  0x00007fce1abbb00d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007fce1bade265 in gf_timer_proc () from /usr/lib64/libglusterfs.so.0
#2  0x00007fce1abb3aa1 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fce1a51caad in clone () from /lib64/libc.so.6
Thread 9 (Thread 0x7fce120b5700 (LWP 5379)):
#0  0x00007fce1abb8465 in ?? () from /lib64/libpthread.so.0
#1  0x00007fce1baf4c8c in fd_ctx_dump () from /usr/lib64/libglusterfs.so.0
#2  0x00007fce1bade793 in inode_dump () from /usr/lib64/libglusterfs.so.0
#3  0x00007fce1bae0957 in inode_table_dump () from /usr/lib64/libglusterfs.so.0
#4  0x00007fce12aca289 in fuse_itable_dump () from /usr/lib64/glusterfs/3.8.4/xlator/mount/fuse.so
#5  0x00007fce1bafdd92 in gf_proc_dump_xlator_info () from /usr/lib64/libglusterfs.so.0
#6  0x00007fce1bafe6c5 in gf_proc_dump_info () from /usr/lib64/libglusterfs.so.0
#7  0x00007fce1bfc53ed in glusterfs_sigwaiter ()
#8  0x00007fce1abb3aa1 in start_thread () from /lib64/libpthread.so.0
#9  0x00007fce1a51caad in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x7fce116b4700 (LWP 5380)):
#0  0x00007fce1abb7a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fce1bb09cdc in syncenv_task () from /usr/lib64/libglusterfs.so.0
#2  0x00007fce1bb17d20 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
#3  0x00007fce1abb3aa1 in start_thread () from /lib64/libpthread.so.0
#4  0x00007fce1a51caad in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7fce0ea7e700 (LWP 5383)):
#0  0x00007fce1abba334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fce1abb55d8 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00007fce1abb54a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007fce1bae1741 in inode_unref () from /usr/lib64/libglusterfs.so.0
#4  0x00007fce1bacdbf2 in loc_wipe () from /usr/lib64/libglusterfs.so.0
#5  0x00007fce0de3d94e in client_local_wipe () from /usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so
#6  0x00007fce0de5686c in client3_3_open_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so
#7  0x00007fce1b89a925 in rpc_clnt_handle_reply () from /usr/lib64/libgfrpc.so.0
#8  0x00007fce1b89ba8c in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
#9  0x00007fce1b896bc8 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#10 0x00007fce100a456d in socket_event_poll_in () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#11 0x00007fce100a585e in socket_event_handler () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#12 0x00007fce1bb2cc96 in event_dispatch_epoll_worker () from /usr/lib64/libglusterfs.so.0
#13 0x00007fce1abb3aa1 in start_thread () from /lib64/libpthread.so.0
#14 0x00007fce1a51caad in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7fce076ff700 (LWP 5384)):
#0  0x00007fce1abba334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fce1abb55d8 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00007fce1abb54a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007fce1bae1741 in inode_unref () from /usr/lib64/libglusterfs.so.0
#4  0x00007fce1bacdbf2 in loc_wipe () from /usr/lib64/libglusterfs.so.0
#5  0x00007fce0cce1b7c in ob_fd_free () from /usr/lib64/glusterfs/3.8.4/xlator/performance/open-behind.so
#6  0x00007fce0cce1dd0 in ob_wake_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/performance/open-behind.so
#7  0x00007fce1bb5e186 in default_open_cbk () from /usr/lib64/libglusterfs.so.0
#8  0x00007fce0d0f2c39 in ioc_open_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/performance/io-cache.so
#9  0x00007fce0d50f754 in ra_open_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/performance/read-ahead.so
#10 0x00007fce0d99139e in dht_open_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/cluster/distribute.so
#11 0x00007fce0dbdb26d in afr_open_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so
#12 0x00007fce0de5685a in client3_3_open_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/protocol/client.so
#13 0x00007fce1b89a925 in rpc_clnt_handle_reply () from /usr/lib64/libgfrpc.so.0
#14 0x00007fce1b89ba8c in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
#15 0x00007fce1b896bc8 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#16 0x00007fce100a456d in socket_event_poll_in () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#17 0x00007fce100a585e in socket_event_handler () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#18 0x00007fce1bb2cc96 in event_dispatch_epoll_worker () from /usr/lib64/libglusterfs.so.0
#19 0x00007fce1abb3aa1 in start_thread () from /lib64/libpthread.so.0
#20 0x00007fce1a51caad in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7fce0c26e700 (LWP 5385)):
#0  0x00007fce1abb7a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fce0c8c1a6c in iot_worker () from /usr/lib64/glusterfs/3.8.4/xlator/performance/io-threads.so
#2  0x00007fce1abb3aa1 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fce1a51caad in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7fcdf7efe700 (LWP 5389)):
#0  0x00007fce1abba334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007fce1abb55d8 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00007fce1abb54a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007fce1bae1ac1 in inode_ref () from /usr/lib64/libglusterfs.so.0
#4  0x00007fce12abc01f in fuse_ino_to_inode () from /usr/lib64/glusterfs/3.8.4/xlator/mount/fuse.so
#5  0x00007fce12abd01d in fuse_resolve_inode_init () from /usr/lib64/glusterfs/3.8.4/xlator/mount/fuse.so
#6  0x00007fce12ac832a in fuse_getattr () from /usr/lib64/glusterfs/3.8.4/xlator/mount/fuse.so
#7  0x00007fce12ad6270 in fuse_thread_proc () from /usr/lib64/glusterfs/3.8.4/xlator/mount/fuse.so
#8  0x00007fce1abb3aa1 in start_thread () from /lib64/libpthread.so.0
#9  0x00007fce1a51caad in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7fcdf74fd700 (LWP 5390)):
#0  0x00007fce1abb768c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fce12ac7d9b in notify_kernel_loop () from /usr/lib64/glusterfs/3.8.4/xlator/mount/fuse.so
#2  0x00007fce1abb3aa1 in start_thread () from /lib64/libpthread.so.0
#3  0x00007fce1a51caad in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7fcca2ffa700 (LWP 6850)):
#0  0x00007fce1abb7a5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fce1bb09cdc in syncenv_task () from /usr/lib64/libglusterfs.so.0
#2  0x00007fce1bb17d20 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
#3  0x00007fce1abb3aa1 in start_thread () from /lib64/libpthread.so.0
#4  0x00007fce1a51caad in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7fce1bfa8740 (LWP 5377)):
#0  0x00007fce1abb42fd in pthread_join () from /lib64/libpthread.so.0
#1  0x00007fce1bb2c75d in event_dispatch_epoll () from /usr/lib64/libglusterfs.so.0
#2  0x00007fce1bfc749b in main ()
[root@rhs-client14 ~]# 




[root@rhs-client14 ~]# rpm -qa|grep gluster
glusterfs-client-xlators-3.8.4-3.el6.x86_64
glusterfs-3.8.4-3.el6.x86_64
glusterfs-fuse-3.8.4-3.el6.x86_64
glusterfs-libs-3.8.4-3.el6.x86_64
glusterfs-debuginfo-3.8.4-3.el6.x86_64
[root@rhs-client14 ~]#

Comment 2 Nag Pavan Chilakam 2016-11-08 13:03:42 UTC
Also note that two bricks were brought down.
[root@dhcp37-108 ~]# gluster v status
Status of volume: drvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.191:/rhs/brick1/drvol        N/A       N/A        N       N/A  
Brick 10.70.37.108:/rhs/brick1/drvol        49154     0          Y       14087
Brick 10.70.35.3:/rhs/brick1/drvol          49154     0          Y       4790 
Brick 10.70.37.66:/rhs/brick1/drvol         49154     0          Y       14602
Brick 10.70.35.191:/rhs/brick2/drvol        49156     0          Y       15070
Brick 10.70.37.108:/rhs/brick2/drvol        N/A       N/A        N       N/A  
Brick 10.70.35.3:/rhs/brick2/drvol          49155     0          Y       4810 
Brick 10.70.37.66:/rhs/brick2/drvol         49155     0          Y       14622
Snapshot Daemon on localhost                49152     0          Y       14278
Self-heal Daemon on localhost               N/A       N/A        Y       14128
Quota Daemon on localhost                   N/A       N/A        Y       14226
Snapshot Daemon on 10.70.35.3               49152     0          Y       4974 
Self-heal Daemon on 10.70.35.3              N/A       N/A        Y       4835 
Quota Daemon on 10.70.35.3                  N/A       N/A        Y       4928 
Snapshot Daemon on 10.70.37.66              49152     0          Y       14795
Self-heal Daemon on 10.70.37.66             N/A       N/A        Y       14643
Quota Daemon on 10.70.37.66                 N/A       N/A        Y       14742
Snapshot Daemon on dhcp35-191.lab.eng.blr.r
edhat.com                                   49153     0          Y       15307
Self-heal Daemon on dhcp35-191.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       15091
Quota Daemon on dhcp35-191.lab.eng.blr.redh
at.com                                      N/A       N/A        Y       15239
 
Task Status of Volume drvol
------------------------------------------------------------------------------
There are no active volume tasks

Comment 3 Pranith Kumar K 2016-11-08 17:33:54 UTC
open-behind takes fd->lock and then inode->lock, whereas statedump takes inode->lock and then fd->lock, so the two can deadlock against each other. The relevant excerpts are below; a minimal stand-alone sketch of the same lock-order inversion follows the excerpts.

void
ob_fd_free (ob_fd_t *ob_fd)
{
        loc_wipe (&ob_fd->loc); <<--- this takes (inode->lock)

        if (ob_fd->xdata)
                dict_unref (ob_fd->xdata);

        if (ob_fd->open_frame)
                STACK_DESTROY (ob_fd->open_frame->root);

        GF_FREE (ob_fd);
}


int
ob_wake_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
             int op_ret, int op_errno, fd_t *fd_ret, dict_t *xdata)
{
        fd_t              *fd = NULL;
        struct list_head   list;
        ob_fd_t           *ob_fd = NULL;
        call_stub_t       *stub = NULL, *tmp = NULL;

        fd = frame->local;
        frame->local = NULL;

        INIT_LIST_HEAD (&list);

        LOCK (&fd->lock); <<---- fd->lock
        {
                ob_fd = __ob_fd_ctx_get (this, fd);

                list_splice_init (&ob_fd->list, &list);

                if (op_ret < 0) {
                        /* mark fd BAD for ever */
                        ob_fd->op_errno = op_errno;
                } else {
                        __fd_ctx_del (fd, this, NULL);
                        ob_fd_free (ob_fd);
                }
        }
        UNLOCK (&fd->lock);


==============================================================

inode_dump (inode_t *inode, char *prefix)
{
        int                ret       = -1;
        xlator_t          *xl        = NULL;
        int                i         = 0;
        fd_t              *fd        = NULL;
        struct _inode_ctx *inode_ctx = NULL;
        struct list_head fd_list;

        if (!inode)
                return;

        INIT_LIST_HEAD (&fd_list);

        ret = TRY_LOCK(&inode->lock); <<---- takes inode->lock
        if (ret != 0) {
                return;
        }

        {
                gf_proc_dump_write("gfid", "%s", uuid_utoa (inode->gfid));
                gf_proc_dump_write("nlookup", "%ld", inode->nlookup);
                gf_proc_dump_write("fd-count", "%u", inode->fd_count);
                gf_proc_dump_write("ref", "%u", inode->ref);
                gf_proc_dump_write("ia_type", "%d", inode->ia_type);
                if (inode->_ctx) {
                        inode_ctx = GF_CALLOC (inode->table->ctxcount,
                                               sizeof (*inode_ctx),
                                               gf_common_mt_inode_ctx);
                        if (inode_ctx == NULL) {
                                goto unlock;
                        }

                        for (i = 0; i < inode->table->ctxcount;
                             i++) {
                                inode_ctx[i] = inode->_ctx[i];
                        }
                }

                if (dump_options.xl_options.dump_fdctx != _gf_true)
                        goto unlock;


                list_for_each_entry (fd, &inode->fd_list, inode_list) {
                        fd_ctx_dump (fd, prefix); <<<-----------------
                }
        }

fd_ctx_dump (fd_t *fd, char *prefix)
{
        struct _fd_ctx *fd_ctx = NULL;
        xlator_t       *xl     = NULL;
        int    i               = 0;


        if ((fd == NULL) || (fd->_ctx == NULL)) {
                goto out;
        }

        LOCK (&fd->lock); <<<-------------------
        {
                if (fd->_ctx != NULL) {
                        fd_ctx = GF_CALLOC (fd->xl_count, sizeof (*fd_ctx),
                                            gf_common_mt_fd_ctx);
                        if (fd_ctx == NULL) {
                                goto unlock;
                        }

                        for (i = 0; i < fd->xl_count; i++) {
                                fd_ctx[i] = fd->_ctx[i];
                        }
                }
        }
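
The inversion above is the classic AB-BA deadlock. Here is a minimal, self-contained sketch that wedges the same way, using plain pthreads rather than GlusterFS code; the mutex names A and B, the sleeps, and the thread names are ours, with A standing in for inode->lock and B for fd->lock.

/* abba.c -- two threads taking the same two mutexes in opposite order.
 * Build: gcc -pthread -o abba abba.c */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER;  /* stand-in for inode->lock */
static pthread_mutex_t B = PTHREAD_MUTEX_INITIALIZER;  /* stand-in for fd->lock    */

static void *
statedump_like (void *arg)
{
        (void) arg;
        pthread_mutex_lock (&A);        /* inode_dump: takes inode->lock     */
        usleep (100000);                /* widen the race window             */
        pthread_mutex_lock (&B);        /* fd_ctx_dump: then wants fd->lock  */
        pthread_mutex_unlock (&B);
        pthread_mutex_unlock (&A);
        return NULL;
}

static void *
open_behind_like (void *arg)
{
        (void) arg;
        pthread_mutex_lock (&B);        /* ob_wake_cbk: takes fd->lock       */
        usleep (100000);
        pthread_mutex_lock (&A);        /* ob_fd_free -> loc_wipe: then wants inode->lock */
        pthread_mutex_unlock (&A);
        pthread_mutex_unlock (&B);
        return NULL;
}

int
main (void)
{
        pthread_t t1, t2;

        pthread_create (&t1, NULL, statedump_like, NULL);
        pthread_create (&t2, NULL, open_behind_like, NULL);
        pthread_join (t1, NULL);        /* never returns once both threads block */
        pthread_join (t2, NULL);
        printf ("no deadlock on this run\n");
        return 0;
}

Each thread ends up holding one lock and waiting on the other. In the pstack from the description, thread 9 (statedump, stuck in fd_ctx_dump) and thread 6 (open-behind callback, stuck in inode_unref) are the two sides of this cycle, and the fuse thread serving the stat (thread 4, blocked in inode_ref) queues up behind the same contention, which is why the stat never returns.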

Comment 4 Pranith Kumar K 2016-11-09 09:29:38 UTC
http://review.gluster.org/15808
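
Purely to illustrate one way such an inversion can be broken (this is a sketch built from the ob_wake_cbk() excerpt in comment 3, not necessarily what the linked patch actually does), the work that needs inode->lock can be deferred until fd->lock has been dropped:

/* Sketch of a rework of ob_wake_cbk() from comment 3 -- not the actual
 * upstream change.  The ob_fd is detached under fd->lock, but
 * ob_fd_free() (which takes inode->lock via loc_wipe) runs only after
 * fd->lock is released, so the fd->lock -> inode->lock ordering goes away. */
        ob_fd_t *ob_fd_to_free = NULL;

        LOCK (&fd->lock);
        {
                ob_fd = __ob_fd_ctx_get (this, fd);

                list_splice_init (&ob_fd->list, &list);

                if (op_ret < 0) {
                        /* mark fd BAD for ever */
                        ob_fd->op_errno = op_errno;
                } else {
                        __fd_ctx_del (fd, this, NULL);
                        ob_fd_to_free = ob_fd;    /* defer the free */
                }
        }
        UNLOCK (&fd->lock);

        if (ob_fd_to_free)
                ob_fd_free (ob_fd_to_free);       /* inode->lock taken with no fd->lock held */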

Comment 7 Pranith Kumar K 2016-11-11 10:08:31 UTC
https://code.engineering.redhat.com/gerrit/#/c/89794/

Comment 9 Nag Pavan Chilakam 2016-11-15 07:34:08 UTC
I have run this case for two days on my systemic setup and didn't hit any hang/deadlock, hence moving to verified.
Test version: 3.8.4-5

Comment 11 errata-xmlrpc 2017-03-23 06:17:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html