Bug 1686399 - listing a file while writing to it causes deadlock
Summary: listing a file while writing to it causes deadlock
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: 6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On: 1674412
Blocks:
 
Reported: 2019-03-07 11:34 UTC by Raghavendra G
Modified: 2019-03-25 16:33 UTC (History)
CC List: 1 user

Fixed In Version: glusterfs-6.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1674412
Environment:
Last Closed: 2019-03-08 14:08:58 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Links:
Gluster.org Gerrit 22322 - performance/readdir-ahead: fix deadlock - Status: Merged - Last Updated: 2019-03-08 14:08:57 UTC

Description Raghavendra G 2019-03-07 11:34:23 UTC
+++ This bug was initially created as a clone of Bug #1674412 +++

Description of problem:

The following test case was provided by Nithya.
Create a pure replicate volume and enable the following options:
Volume Name: xvol
Type: Replicate
Volume ID: 095d6083-ea82-4ec9-a3a9-498fbd5f8dbe
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.122.7:/bricks/brick1/xvol-1
Brick2: 192.168.122.7:/bricks/brick1/xvol-2
Brick3: 192.168.122.7:/bricks/brick1/xvol-3
Options Reconfigured:
server.event-threads: 4
client.event-threads: 4
performance.parallel-readdir: on
performance.readdir-ahead: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off


FUSE mount using:
mount -t glusterfs -o lru-limit=500 -s 192.168.122.7:/xvol /mnt/g1
mkdir /mnt/g1/dirdd

From terminal 1:
cd /mnt/g1/dirdd
while (true); do ls -lR dirdd; done

From terminal 2:
while true; do dd if=/dev/urandom of=/mnt/g1/dirdd/1G.file bs=1M count=1; rm -f /mnt/g1/dirdd/1G.file; done

On running this test, both dd and ls hang after some time.


--- Additional comment from Raghavendra G on 2019-02-11 10:01:41 UTC ---

(gdb) thr 8
[Switching to thread 8 (Thread 0x7f28072d1700 (LWP 26397))]
#0  0x00007f2813a404cd in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f2813a404cd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f2813a3bdcb in _L_lock_812 () from /lib64/libpthread.so.0
#2  0x00007f2813a3bc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f2805e3122f in rda_inode_ctx_get_iatt (inode=0x7f27ec0010b8, this=0x7f2800012560, attr=0x7f28072d0700) at readdir-ahead.c:286
#4  0x00007f2805e3134d in __rda_fill_readdirp (ctx=0x7f27f800f290, request_size=<optimized out>, entries=0x7f28072d0890, this=0x7f2800012560) at readdir-ahead.c:326
#5  __rda_serve_readdirp (this=this@entry=0x7f2800012560, ctx=ctx@entry=0x7f27f800f290, size=size@entry=4096, entries=entries@entry=0x7f28072d0890, op_errno=op_errno@entry=0x7f28072d085c) at readdir-ahead.c:353
#6  0x00007f2805e32732 in rda_fill_fd_cbk (frame=0x7f27f801c1e8, cookie=<optimized out>, this=0x7f2800012560, op_ret=3, op_errno=2, entries=<optimized out>, xdata=0x0) at readdir-ahead.c:581
#7  0x00007f2806097447 in client4_0_readdirp_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f27f800f498) at client-rpc-fops_v2.c:2339
#8  0x00007f28149a29d1 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f2800051120, pollin=pollin@entry=0x7f280006a180) at rpc-clnt.c:755
#9  0x00007f28149a2d37 in rpc_clnt_notify (trans=0x7f28000513e0, mydata=0x7f2800051150, event=<optimized out>, data=0x7f280006a180) at rpc-clnt.c:922
#10 0x00007f281499f5e3 in rpc_transport_notify (this=this@entry=0x7f28000513e0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f280006a180) at rpc-transport.c:542
#11 0x00007f2808d88f77 in socket_event_poll_in (notify_handled=true, this=0x7f28000513e0) at socket.c:2522
#12 socket_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7f28000513e0, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000')
    at socket.c:2924
#13 0x00007f2814c5a926 in event_dispatch_epoll_handler (event=0x7f28072d0e80, event_pool=0x90d560) at event-epoll.c:648
#14 event_dispatch_epoll_worker (data=0x96f1e0) at event-epoll.c:762
#15 0x00007f2813a39dd5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f2813302b3d in clone () from /lib64/libc.so.6
[Switching to thread 7 (Thread 0x7f2806ad0700 (LWP 26398))]
#0  0x00007f2813a404cd in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f2813a404cd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f2813a3bdcb in _L_lock_812 () from /lib64/libpthread.so.0
#2  0x00007f2813a3bc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f2805e2cd85 in rda_mark_inode_dirty (this=this@entry=0x7f2800012560, inode=0x7f27ec009da8) at readdir-ahead.c:234
#4  0x00007f2805e2f3cc in rda_writev_cbk (frame=0x7f27f800ef48, cookie=<optimized out>, this=0x7f2800012560, op_ret=131072, op_errno=0, prebuf=0x7f2806acf870, postbuf=0x7f2806acf910, xdata=0x0)
    at readdir-ahead.c:769
#5  0x00007f2806094064 in client4_0_writev_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f27f801a7f8) at client-rpc-fops_v2.c:685
#6  0x00007f28149a29d1 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f2800051120, pollin=pollin@entry=0x7f27f8008320) at rpc-clnt.c:755
#7  0x00007f28149a2d37 in rpc_clnt_notify (trans=0x7f28000513e0, mydata=0x7f2800051150, event=<optimized out>, data=0x7f27f8008320) at rpc-clnt.c:922
#8  0x00007f281499f5e3 in rpc_transport_notify (this=this@entry=0x7f28000513e0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f27f8008320) at rpc-transport.c:542
#9  0x00007f2808d88f77 in socket_event_poll_in (notify_handled=true, this=0x7f28000513e0) at socket.c:2522
#10 socket_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7f28000513e0, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000')
    at socket.c:2924
#11 0x00007f2814c5a926 in event_dispatch_epoll_handler (event=0x7f2806acfe80, event_pool=0x90d560) at event-epoll.c:648
#12 event_dispatch_epoll_worker (data=0x96f4b0) at event-epoll.c:762
#13 0x00007f2813a39dd5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f2813302b3d in clone () from /lib64/libc.so.6


In the writev and readdirp code paths, the inode and fd-ctx locks are acquired in opposite order, causing a deadlock.
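
The backtraces above show the classic two-lock inversion. Below is a minimal, self-contained sketch (not GlusterFS code; the names inode_lock, fd_ctx_lock, writer_path and lister_path are illustrative stand-ins for the locks and callbacks involved, and which path takes which lock first is assumed purely for illustration) that reproduces the same hang with plain pthreads:

/* build: gcc -pthread rda_deadlock_sketch.c */
#include <pthread.h>

static pthread_mutex_t inode_lock  = PTHREAD_MUTEX_INITIALIZER; /* stand-in for the inode ctx lock */
static pthread_mutex_t fd_ctx_lock = PTHREAD_MUTEX_INITIALIZER; /* stand-in for the fd ctx lock */

/* Stand-in for the write completion path: takes the locks in one order. */
static void *writer_path(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&inode_lock);
        pthread_mutex_lock(&fd_ctx_lock);
        /* ... update/invalidate cached attributes ... */
        pthread_mutex_unlock(&fd_ctx_lock);
        pthread_mutex_unlock(&inode_lock);
    }
    return NULL;
}

/* Stand-in for the readdirp fill path: takes the same locks in the reverse order. */
static void *lister_path(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&fd_ctx_lock);
        pthread_mutex_lock(&inode_lock);
        /* ... read cached attributes while serving readdirp ... */
        pthread_mutex_unlock(&inode_lock);
        pthread_mutex_unlock(&fd_ctx_lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, writer_path, NULL);
    pthread_create(&b, NULL, lister_path, NULL);
    /* As soon as each thread holds one lock and waits for the other, both sit
       in __lll_lock_wait (as in the backtraces above); the joins below never
       return and the process hangs, much like dd and ls in the report. */
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}

The usual fix for this pattern is to enforce a single lock-ordering rule, or to release one lock before acquiring the other; the actual change made to readdir-ahead is in the Gerrit review linked above.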

--- Additional comment from Worker Ant on 2019-03-07 11:24:16 UTC ---

REVIEW: https://review.gluster.org/22321 (performance/readdir-ahead: fix deadlock) posted (#1) for review on master by Raghavendra G

Comment 1 Worker Ant 2019-03-07 15:14:07 UTC
REVIEW: https://review.gluster.org/22322 (performance/readdir-ahead: fix deadlock) posted (#2) for review on release-6 by Raghavendra G

Comment 2 Worker Ant 2019-03-08 14:08:58 UTC
REVIEW: https://review.gluster.org/22322 (performance/readdir-ahead: fix deadlock) merged (#3) on release-6 by Shyamsundar Ranganathan

Comment 3 Shyamsundar 2019-03-25 16:33:31 UTC
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/

