Description of problem:

The following test case was given by Nithya.

Create a pure replicate volume and enable the following options:

Volume Name: xvol
Type: Replicate
Volume ID: 095d6083-ea82-4ec9-a3a9-498fbd5f8dbe
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.122.7:/bricks/brick1/xvol-1
Brick2: 192.168.122.7:/bricks/brick1/xvol-2
Brick3: 192.168.122.7:/bricks/brick1/xvol-3
Options Reconfigured:
server.event-threads: 4
client.event-threads: 4
performance.parallel-readdir: on
performance.readdir-ahead: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

FUSE mount using:
mount -t glusterfs -o lru-limit=500 -s 192.168.122.7:/xvol /mnt/g1

mkdir /mnt/g1/dirdd

From terminal 1:
cd /mnt/g1/dirdd
while (true); do ls -lR dirdd; done

From terminal 2:
while true; do dd if=/dev/urandom of=/mnt/g1/dirdd/1G.file bs=1M count=1; rm -f /mnt/g1/dirdd/1G.file; done

On running this test, both dd and ls hang after some time.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create and start the 1x3 replicate volume with the options listed above.
2. FUSE-mount the volume with lru-limit=500 and create the directory /mnt/g1/dirdd.
3. Run the ls loop (terminal 1) and the dd/rm loop (terminal 2) in parallel.

Actual results:
Both dd and ls hang after some time.

Expected results:
Both loops keep running; no hang.

Additional info:
(gdb) thr 8
[Switching to thread 8 (Thread 0x7f28072d1700 (LWP 26397))]
#0  0x00007f2813a404cd in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f2813a404cd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f2813a3bdcb in _L_lock_812 () from /lib64/libpthread.so.0
#2  0x00007f2813a3bc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f2805e3122f in rda_inode_ctx_get_iatt (inode=0x7f27ec0010b8, this=0x7f2800012560, attr=0x7f28072d0700) at readdir-ahead.c:286
#4  0x00007f2805e3134d in __rda_fill_readdirp (ctx=0x7f27f800f290, request_size=<optimized out>, entries=0x7f28072d0890, this=0x7f2800012560) at readdir-ahead.c:326
#5  __rda_serve_readdirp (this=this@entry=0x7f2800012560, ctx=ctx@entry=0x7f27f800f290, size=size@entry=4096, entries=entries@entry=0x7f28072d0890, op_errno=op_errno@entry=0x7f28072d085c) at readdir-ahead.c:353
#6  0x00007f2805e32732 in rda_fill_fd_cbk (frame=0x7f27f801c1e8, cookie=<optimized out>, this=0x7f2800012560, op_ret=3, op_errno=2, entries=<optimized out>, xdata=0x0) at readdir-ahead.c:581
#7  0x00007f2806097447 in client4_0_readdirp_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f27f800f498) at client-rpc-fops_v2.c:2339
#8  0x00007f28149a29d1 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f2800051120, pollin=pollin@entry=0x7f280006a180) at rpc-clnt.c:755
#9  0x00007f28149a2d37 in rpc_clnt_notify (trans=0x7f28000513e0, mydata=0x7f2800051150, event=<optimized out>, data=0x7f280006a180) at rpc-clnt.c:922
#10 0x00007f281499f5e3 in rpc_transport_notify (this=this@entry=0x7f28000513e0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f280006a180) at rpc-transport.c:542
#11 0x00007f2808d88f77 in socket_event_poll_in (notify_handled=true, this=0x7f28000513e0) at socket.c:2522
#12 socket_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7f28000513e0, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:2924
#13 0x00007f2814c5a926 in event_dispatch_epoll_handler (event=0x7f28072d0e80, event_pool=0x90d560) at event-epoll.c:648
#14 event_dispatch_epoll_worker (data=0x96f1e0) at event-epoll.c:762
#15 0x00007f2813a39dd5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f2813302b3d in clone () from /lib64/libc.so.6

[Switching to thread 7 (Thread 0x7f2806ad0700 (LWP 26398))]
#0  0x00007f2813a404cd in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f2813a404cd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f2813a3bdcb in _L_lock_812 () from /lib64/libpthread.so.0
#2  0x00007f2813a3bc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f2805e2cd85 in rda_mark_inode_dirty (this=this@entry=0x7f2800012560, inode=0x7f27ec009da8) at readdir-ahead.c:234
#4  0x00007f2805e2f3cc in rda_writev_cbk (frame=0x7f27f800ef48, cookie=<optimized out>, this=0x7f2800012560, op_ret=131072, op_errno=0, prebuf=0x7f2806acf870, postbuf=0x7f2806acf910, xdata=0x0) at readdir-ahead.c:769
#5  0x00007f2806094064 in client4_0_writev_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f27f801a7f8) at client-rpc-fops_v2.c:685
#6  0x00007f28149a29d1 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f2800051120, pollin=pollin@entry=0x7f27f8008320) at rpc-clnt.c:755
#7  0x00007f28149a2d37 in rpc_clnt_notify (trans=0x7f28000513e0, mydata=0x7f2800051150, event=<optimized out>, data=0x7f27f8008320) at rpc-clnt.c:922
#8  0x00007f281499f5e3 in rpc_transport_notify (this=this@entry=0x7f28000513e0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f27f8008320) at rpc-transport.c:542
#9  0x00007f2808d88f77 in socket_event_poll_in (notify_handled=true, this=0x7f28000513e0) at socket.c:2522
#10 socket_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7f28000513e0, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000') at socket.c:2924
#11 0x00007f2814c5a926 in event_dispatch_epoll_handler (event=0x7f2806acfe80, event_pool=0x90d560) at event-epoll.c:648
#12 event_dispatch_epoll_worker (data=0x96f4b0) at event-epoll.c:762
#13 0x00007f2813a39dd5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f2813302b3d in clone () from /lib64/libc.so.6

In the writev and readdirp codepaths the inode and fd-ctx locks are acquired in opposite order, causing a deadlock: the readdirp callback (thread 8) blocks in rda_inode_ctx_get_iatt waiting for the inode lock while holding the fd ctx lock, and the writev callback (thread 7) blocks in rda_mark_inode_dirty waiting for the fd ctx lock while holding the inode lock.
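To make the circular wait concrete, below is a minimal stand-alone sketch of the same AB-BA pattern using plain pthread mutexes. The names fd_ctx_lock and inode_ctx_lock, the two thread functions, and the sleep() calls are illustrative stand-ins, not GlusterFS code:

/* Minimal sketch of the AB-BA deadlock above -- illustrative only.
 * fd_ctx_lock / inode_ctx_lock stand in for the per-fd and per-inode
 * context locks in readdir-ahead. Compile with: cc -pthread deadlock.c */
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t fd_ctx_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t inode_ctx_lock = PTHREAD_MUTEX_INITIALIZER;

/* readdirp callback path (thread 8): takes the fd ctx lock, then asks
 * for the inode lock (rda_fill_fd_cbk -> rda_inode_ctx_get_iatt). */
static void *readdirp_path(void *arg)
{
    pthread_mutex_lock(&fd_ctx_lock);
    sleep(1);                              /* widen the race window */
    pthread_mutex_lock(&inode_ctx_lock);   /* blocks: other thread holds it */
    pthread_mutex_unlock(&inode_ctx_lock);
    pthread_mutex_unlock(&fd_ctx_lock);
    return NULL;
}

/* writev callback path (thread 7): takes the inode lock, then asks
 * for the fd ctx lock (rda_writev_cbk -> rda_mark_inode_dirty). */
static void *writev_path(void *arg)
{
    pthread_mutex_lock(&inode_ctx_lock);
    sleep(1);
    pthread_mutex_lock(&fd_ctx_lock);      /* blocks: other thread holds it */
    pthread_mutex_unlock(&fd_ctx_lock);
    pthread_mutex_unlock(&inode_ctx_lock);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, readdirp_path, NULL);
    pthread_create(&t2, NULL, writev_path, NULL);
    pthread_join(t1, NULL);   /* never returns: each thread waits for the
                               * lock the other one holds */
    pthread_join(t2, NULL);
    return 0;
}

Attaching gdb to this program shows both threads parked in pthread_mutex_lock (__lll_lock_wait), just like the two backtraces above.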
REVIEW: https://review.gluster.org/22321 (performance/readdir-ahead: fix deadlock) posted (#1) for review on master by Raghavendra G
REVIEW: https://review.gluster.org/22321 (performance/readdir-ahead: fix deadlock) merged (#2) on master by Raghavendra G
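For reference, the canonical remedy for this class of bug is to make every codepath acquire the two locks in one agreed global order (or release one lock before taking the other). The sketch below continues the illustrative example above and shows only that generic discipline; it is not a claim about how the merged patch actually restructures readdir-ahead:

/* Continuation of the earlier sketch: both paths now agree on a single
 * acquisition order (inode lock first, then fd ctx lock), so the
 * circular wait can no longer form. Illustrative only -- see the merged
 * patch above for the actual change. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t fd_ctx_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t inode_ctx_lock = PTHREAD_MUTEX_INITIALIZER;

static void *readdirp_path(void *arg)
{
    pthread_mutex_lock(&inode_ctx_lock);   /* agreed order: inode first */
    pthread_mutex_lock(&fd_ctx_lock);
    /* ... serve cached dentries, read cached iatt ... */
    pthread_mutex_unlock(&fd_ctx_lock);
    pthread_mutex_unlock(&inode_ctx_lock);
    return NULL;
}

static void *writev_path(void *arg)
{
    pthread_mutex_lock(&inode_ctx_lock);   /* same order as readdirp */
    pthread_mutex_lock(&fd_ctx_lock);
    /* ... mark the directory's cached state dirty ... */
    pthread_mutex_unlock(&fd_ctx_lock);
    pthread_mutex_unlock(&inode_ctx_lock);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, readdirp_path, NULL);
    pthread_create(&t2, NULL, writev_path, NULL);
    pthread_join(t1, NULL);   /* returns: no circular wait is possible */
    pthread_join(t2, NULL);
    puts("no deadlock");
    return 0;
}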