Bug 1729052

Summary: glusterfs-fuse client mount point process stuck due to inode->lock and table->lock deadlock
Product: [Community] GlusterFS
Reporter: Rongjunfei <rongjunfei>
Component: core
Assignee: Xavi Hernandez <jahernan>
Status: CLOSED NEXTRELEASE
Severity: high
Priority: unspecified
Version: 6
CC: atumball, bugs
Hardware: x86_64
OS: Linux
Fixed In Version: glusterfs-8.x
Last Closed: 2019-07-15 07:27:26 UTC
Type: Bug

Description Rongjunfei 2019-07-11 09:53:33 UTC
Description of problem:
The glusterfs-fuse client mount point process gets stuck, which causes my "postmark" workload to stop writing.

Version-Release number of selected component (if applicable):
release-6.3

How reproducible:
It was always reproducible

Steps to Reproduce:
1. On 3 nodes, create an EC volume "test" with a 12 x (2+1) layout.
2. Enable sharding on the volume "test".
3. mount -t glusterfs 127.0.0.1:/test /mnt (on all 3 nodes)
4. Run "postmark" writes on /mnt (on all 3 nodes).
5. On node-3, run "ps aux | grep mnt" to get the pid of the fuse client mount point process,
   then run "kill -SIGUSR1 pid" to generate a statedump file for the client mount point.
6. Repeat "kill -SIGUSR1 pid" to generate the statedump several times,
   until "postmark" gets stuck and the statedump file is incomplete.

Actual results:

I get an incomplete statedump file named "dump8QsKgE".
The gdb info is copied below:
(gdb) info threads
  Id   Target Id         Frame 
  18   Thread 0x7f375f891700 (LWP 28461) "glfs_timer" 0x00007f37672adbdd in nanosleep () from /lib64/libpthread.so.0
  17   Thread 0x7f375f090700 (LWP 28462) "glfs_sigwait" 0x00007f37672ad1bd in __lll_lock_wait () from /lib64/libpthread.so.0
  16   Thread 0x7f375e88f700 (LWP 28463) "glfs_memsweep" 0x00007f3766bb266d in nanosleep () from /lib64/libc.so.6
  15   Thread 0x7f375e08e700 (LWP 28464) "glfs_sproc0" 0x00007f37672aaa82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  14   Thread 0x7f375d88d700 (LWP 28465) "glfs_sproc1" 0x00007f37672aaa82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  13   Thread 0x7f375d08c700 (LWP 28466) "digioceanfs" 0x00007f3766be2ba3 in select () from /lib64/libc.so.6
  12   Thread 0x7f375adcb700 (LWP 28471) "glfs_epoll000" 0x00007f37672ad1bd in __lll_lock_wait () from /lib64/libpthread.so.0
  11   Thread 0x7f375a5ca700 (LWP 28472) "glfs_epoll001" 0x00007f37672ad1bd in __lll_lock_wait () from /lib64/libpthread.so.0
  10   Thread 0x7f37512f3700 (LWP 28497) "glfs_fuseproc" 0x00007f37672ad1bd in __lll_lock_wait () from /lib64/libpthread.so.0
  9    Thread 0x7f3750af2700 (LWP 28498) "glfs_fusedlyd" 0x00007f37672aa6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  8    Thread 0x7f373ffff700 (LWP 28499) "glfs_fusenoti" 0x00007f37672aa6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  7    Thread 0x7f373eefe700 (LWP 28714) "glfs_epoll000" 0x00007f37672aa6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6    Thread 0x7f373e6fd700 (LWP 28715) "glfs_epoll000" 0x00007f3766bb266d in nanosleep () from /lib64/libc.so.6
  5    Thread 0x7f375008c700 (LWP 28716) "glfs_iotwr000" 0x00007f37672ad1bd in __lll_lock_wait () from /lib64/libpthread.so.0
  4    Thread 0x7f373c825700 (LWP 14311) "glfs_sproc003" 0x00007f37672aaa82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3    Thread 0x7f3712dfd700 (LWP 16002) "glfs_sproc006" 0x00007f37672aaa82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2    Thread 0x7f37125fc700 (LWP 31651) "glfs_sproc007" 0x00007f37672aaa82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 1    Thread 0x7f3768956780 (LWP 28459) "digioceanfs" 0x00007f37672a7ef7 in pthread_join () from /lib64/libpthread.so.0


(gdb) thread 18
[Switching to thread 18 (Thread 0x7f375f090700 (LWP 28462))]
#0  0x00007f37672ad1bd in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f37672ad1bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f37672a8d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007f37672a8c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f3768477bd6 in inode_ctx_get0 (inode=inode@entry=0x7f373848cca8, xlator=xlator@entry=0x7f37542ecba0, value1=value1@entry=0x7f375f087418) at inode.c:2279
#4  0x00007f3758ba595f in ioc_inode_dump (this=0x7f37542ecba0, inode=0x7f373848cca8) at io-cache.c:2037
#5  0x00007f37684748c0 in inode_dump (inode=0x7f373848cca8, prefix=<optimized out>) at inode.c:2499
#6  0x00007f376847821e in inode_table_dump (itable=0x7f375423f8f0, prefix=prefix@entry=0x7f375f8b8df3 "xlator.mount.fuse.itable") at inode.c:2540
#7  0x00007f375f89a3fb in fuse_itable_dump (this=<optimized out>) at fuse-bridge.c:6039
#8  0x00007f3768496e33 in gf_proc_dump_single_xlator_info (trav=0x1a35140) at statedump.c:507
#9  0x00007f37684978e4 in gf_proc_dump_info (signum=signum@entry=10, ctx=0x19f6010) at statedump.c:887
#10 0x0000000000409851 in digioceanfs_sigwaiter (arg=<optimized out>) at digioceanfsd.c:2393
#11 0x00007f37672a6dc5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f3766beb73d in clone () from /lib64/libc.so.6
(gdb) up
#1  0x00007f37672a8d02 in _L_lock_791 () from /lib64/libpthread.so.0
(gdb) 
#2  0x00007f37672a8c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
(gdb) 
#3  0x00007f3768477bd6 in inode_ctx_get0 (inode=inode@entry=0x7f373848cca8, xlator=xlator@entry=0x7f37542ecba0, value1=value1@entry=0x7f375f087418) at inode.c:2279
2279	    LOCK(&inode->lock);
(gdb) l
2274	    int ret = 0;
2275	
2276	    if (!inode || !xlator)
2277	        return -1;
2278	
2279	    LOCK(&inode->lock);
2280	    {
2281	        ret = __inode_ctx_get0(inode, xlator, value1);
2282	    }
2283	    UNLOCK(&inode->lock);
(gdb) 


(gdb) thread 13
[Switching to thread 13 (Thread 0x7f375adcb700 (LWP 28471))]
#0  0x00007f37672ad1bd in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f37672ad1bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f37672a8d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007f37672a8c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f3768475849 in inode_ref (inode=inode@entry=0x7f373848cca8) at inode.c:669
#4  0x00007f376848d13b in __fd_create (inode=inode@entry=0x7f373848cca8, pid=pid@entry=0) at fd.c:590
#5  0x00007f376848e1ea in __fd_anonymous (inode=inode@entry=0x7f373848cca8, flags=flags@entry=2) at fd.c:734
#6  0x00007f376848e53d in fd_anonymous (inode=0x7f373848cca8) at fd.c:757
#7  0x00007f37598def58 in ec_writev_start (fop=0x7f36e43d8008) at ec-inode-write.c:1994
#8  0x00007f37598df8a5 in ec_manager_writev (fop=0x7f36e43d8008, state=<optimized out>) at ec-inode-write.c:2150
#9  0x00007f37598bdc0b in __ec_manager (fop=0x7f36e43d8008, error=0) at ec-common.c:2951
#10 0x00007f37598bdd68 in ec_resume (fop=0x7f36e43d8008, error=0) at ec-common.c:507
#11 0x00007f37598c05a0 in ec_lock_resume_shared (list=0x7f375adca790) at ec-common.c:1842
#12 0x00007f37598c1ee2 in ec_lock_next_owner (release=false, cbk=0x7f3756dffdd8, link=0x7f374f2053e0) at ec-common.c:2136
#13 ec_lock_reuse (fop=0x7f374f205338) at ec-common.c:2911
#14 0x00007f37598c59d5 in ec_manager_xattrop (fop=0x7f374f205338, state=<optimized out>) at ec-generic.c:1254
#15 0x00007f37598bdc0b in __ec_manager (fop=0x7f374f205338, error=0) at ec-common.c:2951
#16 0x00007f37598bdd68 in ec_resume (fop=0x7f374f205338, error=0) at ec-common.c:507
#17 0x00007f37598be39f in ec_complete (fop=0x7f374f205338) at ec-common.c:584
#18 0x00007f37598c5769 in ec_xattrop_cbk (frame=<optimized out>, cookie=0x1, this=<optimized out>, op_ret=<optimized out>, op_errno=0, xattr=0x7f37575e54d8, 
    xdata=0x7f37575e0db8) at ec-generic.c:1161
#19 0x00007f3759b90b67 in client4_0_fxattrop_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f374f5f6438) at client-rpc-fops_v2.c:1626
#20 0x00007f3768210721 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f37543cd580, pollin=pollin@entry=0x7f3754dacfe0) at rpc-clnt.c:755
#21 0x00007f3768210a87 in rpc_clnt_notify (trans=0x7f37543cd890, mydata=0x7f37543cd5b0, event=<optimized out>, data=0x7f3754dacfe0) at rpc-clnt.c:922
#22 0x00007f376820d4e3 in rpc_transport_notify (this=this@entry=0x7f37543cd890, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f3754dacfe0)
    at rpc-transport.c:542
#23 0x00007f375c676f39 in socket_event_poll_in (notify_handled=true, this=0x7f37543cd890) at socket.c:2522
#24 socket_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7f37543cd890, poll_in=<optimized out>, poll_out=<optimized out>, 
    poll_err=0, event_thread_died=0 '\000') at socket.c:2924
#25 0x00007f37684c68c6 in event_dispatch_epoll_handler (event=0x7f375adcae80, event_pool=0x1a2ce90) at event-epoll.c:648
#26 event_dispatch_epoll_worker (data=0x1a8e610) at event-epoll.c:761
#27 0x00007f37672a6dc5 in start_thread () from /lib64/libpthread.so.0
#28 0x00007f3766beb73d in clone () from /lib64/libc.so.6
(gdb) 
(gdb) up
#1  0x00007f37672a8d02 in _L_lock_791 () from /lib64/libpthread.so.0
(gdb) 
#2  0x00007f37672a8c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
(gdb) 
#3  0x00007f3768475849 in inode_ref (inode=inode@entry=0x7f373848cca8) at inode.c:669
669	    pthread_mutex_lock(&table->lock);
(gdb) l
664	    if (!inode)
665	        return NULL;
666	
667	    table = inode->table;
668	
669	    pthread_mutex_lock(&table->lock);
670	    {
671	        inode = __inode_ref(inode, false);
672	    }
673	    pthread_mutex_unlock(&table->lock);
(gdb) 


Expected results:


Additional info:
I tried to track down the reason:
1. ps aux | grep mnt
   get the pid of the glusterfs-fuse client mount point process
2. gdb -p xxx
3. info threads
4. I find the "statedump" thread stuck waiting on "inode->lock",
   and another thread stuck waiting on "table->lock".
5. I find the reason in the gluster source code below:
   the thread handling "write" walks the routine below:
   the "fd_anonymous" function acquires "inode->lock", then acquires "table->lock" in the "inode_ref" function:
   fd_anonymous(inode_t *inode)
{
    ......

    LOCK(&inode->lock);
    {
        fd = __fd_anonymous(inode, GF_ANON_FD_FLAGS);
    }
    UNLOCK(&inode->lock);

    ......
}
inode_ref(inode_t *inode)
{
    ......

    pthread_mutex_lock(&table->lock);
    {
        inode = __inode_ref(inode, false);
    }
    pthread_mutex_unlock(&table->lock);

    ......
}
6. the thread handling the "statedump" walks the routine below:
    "inode_table_dump" acquires "table->lock", then acquires "inode->lock" in the "inode_dump" function:
inode_table_dump (inode_table_t *itable, char *prefix)
{
     .....
     ret = pthread_mutex_trylock(&itable->lock);

     /* ... -> inode_dump (inode_t *inode, char *prefix) */
     pthread_mutex_unlock(&itable->lock);
     .....
}

7. the thread handling "write" and the thread handling the "statedump" are each waiting for the lock the other thread holds, so they deadlock (see the standalone sketch below).
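
To illustrate the inversion outside of gluster, here is a minimal standalone C sketch (not GlusterFS code; the names write_path/statedump_path and the sleeps are only for illustration). It takes two mutexes in the same opposite orders as the two stacks above. Built with "gcc -pthread", both threads block in pthread_mutex_lock() within seconds, just like the stuck threads in the gdb output.

/* Minimal standalone sketch of the lock inversion (not GlusterFS code).
 * write_path() mimics fd_anonymous() -> inode_ref(): inode lock, then table lock.
 * statedump_path() mimics inode_table_dump() -> inode_ctx_get0(): table lock, then inode lock. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t inode_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for inode->lock */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for table->lock */

static void *write_path(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&inode_lock);  /* LOCK(&inode->lock) in fd_anonymous() */
        usleep(10);                       /* widen the race window */
        pthread_mutex_lock(&table_lock);  /* pthread_mutex_lock(&table->lock) in inode_ref() */
        pthread_mutex_unlock(&table_lock);
        pthread_mutex_unlock(&inode_lock);
    }
    return NULL;
}

static void *statedump_path(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&table_lock);  /* itable->lock held while walking the inode table */
        usleep(10);
        pthread_mutex_lock(&inode_lock);  /* LOCK(&inode->lock) in inode_ctx_get0() */
        pthread_mutex_unlock(&inode_lock);
        pthread_mutex_unlock(&table_lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t w, s;
    fprintf(stderr, "running; both threads should hang within a few seconds\n");
    pthread_create(&w, NULL, write_path, NULL);
    pthread_create(&s, NULL, statedump_path, NULL);
    /* Once each thread holds one mutex and waits for the other, neither join returns. */
    pthread_join(w, NULL);
    pthread_join(s, NULL);
    return 0;
}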

Comment 1 Amar Tumballi 2019-07-15 07:27:26 UTC
https://review.gluster.org/22995 fixes the issue.
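
The sketch below is not taken from the linked review; it is only a generic, hypothetical illustration of how this class of inversion is normally broken: make both paths take the two locks in one agreed order (here, table lock before inode lock), so the wait-for cycle can no longer form. The function names are made up for the example.

/* Hypothetical sketch only -- NOT the code from https://review.gluster.org/22995.
 * A fixed global lock order (table lock before inode lock) on every path removes the cycle. */
#include <pthread.h>

static pthread_mutex_t inode_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

static void write_path_fixed(void)
{
    pthread_mutex_lock(&table_lock);   /* agreed order: table lock first ... */
    pthread_mutex_lock(&inode_lock);   /* ... then inode lock */
    /* ... create the anonymous fd and take the inode ref here ... */
    pthread_mutex_unlock(&inode_lock);
    pthread_mutex_unlock(&table_lock);
}

static void statedump_path_fixed(void)
{
    pthread_mutex_lock(&table_lock);   /* same order on the dump path */
    pthread_mutex_lock(&inode_lock);
    /* ... dump the inode state here ... */
    pthread_mutex_unlock(&inode_lock);
    pthread_mutex_unlock(&table_lock);
}

int main(void)
{
    write_path_fixed();
    statedump_path_fixed();
    return 0;
}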
