Bug 1361519 - [Disperse] dd + rm + ls lead to IO hang
Summary: [Disperse] dd + rm + ls lead to IO hang
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: disperse
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Pranith Kumar K
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Duplicates: 1362420
Depends On: 1231224 1346719 1362420 1371397 1371404 1373392 1373396
Blocks: 1351522
 
Reported: 2016-07-29 09:22 UTC by Pranith Kumar K
Modified: 2017-03-23 05:43 UTC
CC List: 11 users

Fixed In Version: glusterfs-3.8.4-3
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1346719
Environment:
Last Closed: 2017-03-23 05:43:35 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:0486 0 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update 2017-03-23 09:18:45 UTC

Description Pranith Kumar K 2016-07-29 09:22:07 UTC
+++ This bug was initially created as a clone of Bug #1346719 +++

Description of problem:

Creation of files and ls hang while rm -rf is being run in an infinite loop.

Version-Release number of selected component (if applicable):
[root@apandey gluster]# glusterfs --version
glusterfs 3.9dev built on Jun 15 2016 11:39:11
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.


How reproducible:
1/1

Steps to Reproduce:
1. Create a disperse volume.
2. Mount this volume on 3 mount points: m1, m2, m3.
3. Create 10000 files on m1 using a for loop and dd. After some time, start rm -rf on m2 in an infinite loop. Start ls -lRT on m3 (a shell sketch of these steps follows below).
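
A minimal shell sketch of these steps, assuming the brick layout and mount points shown under Additional info below (the mapping of m1/m2/m3 to /mnt/glu, /mnt/gfs and /mnt/vol is an assumption):

# 1. Create and start a 1 x (4 + 2) disperse volume (bricks as in Additional info)
gluster volume create vol disperse-data 4 redundancy 2 \
        apandey:/brick/gluster/vol-{1..6} force
gluster volume start vol

# 2. Mount the volume on three mount points
mount -t glusterfs apandey:vol /mnt/glu    # m1
mount -t glusterfs apandey:vol /mnt/gfs    # m2
mount -t glusterfs apandey:vol /mnt/vol    # m3

# 3a. On m1: create 10000 files with dd
for i in $(seq 1 10000); do
    dd if=/dev/zero of=/mnt/glu/file-$i bs=1M count=1
done

# 3b. On m2, after some time: delete everything in an infinite loop
while true; do
    rm -rf /mnt/gfs/*
done

# 3c. On m3: recursive long listing
ls -lRt /mnt/vol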

Actual results:
An IO hang was seen on m1 and m3.

Expected results:
There should not be any hang.

Additional info:

Volume Name: vol
Type: Disperse
Volume ID: c81743b4-ab0e-4d9b-931b-4d67f4d24a75
Status: Started
Number of Bricks: 1 x (4 + 2) = 6
Transport-type: tcp
Bricks:
Brick1: apandey:/brick/gluster/vol-1
Brick2: apandey:/brick/gluster/vol-2
Brick3: apandey:/brick/gluster/vol-3
Brick4: apandey:/brick/gluster/vol-4
Brick5: apandey:/brick/gluster/vol-5
Brick6: apandey:/brick/gluster/vol-6
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: off
Status of volume: vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick apandey:/brick/gluster/vol-1          49152     0          Y       13179
Brick apandey:/brick/gluster/vol-2          49153     0          Y       13198
Brick apandey:/brick/gluster/vol-3          49154     0          Y       13217
Brick apandey:/brick/gluster/vol-4          49155     0          Y       13236
Brick apandey:/brick/gluster/vol-5          49156     0          Y       13255
Brick apandey:/brick/gluster/vol-6          49157     0          Y       13274
NFS Server on localhost                     N/A       N/A        N       N/A  
Self-heal Daemon on localhost               N/A       N/A        Y       13302
 
Task Status of Volume vol
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@apandey gluster]#  mount

fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
apandey:vol on /mnt/glu type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
apandey:vol on /mnt/gfs type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
apandey:vol on /mnt/vol type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
[root@apandey gluster]#

--- Additional comment from Ashish Pandey on 2016-06-15 05:01:07 EDT ---

statedump shows some blocked inodelks (a sketch of how to take such a dump follows below) -

[conn.1.bound_xl./brick/gluster/vol-1.active.1]
gfid=00000000-0000-0000-0000-000000000001
nlookup=3
fd-count=3
ref=1
ia_type=2

[xlator.features.locks.vol-locks.inode]
path=/
mandatory=0
inodelk-count=3
lock-dump.domain.domain=dht.layout.heal
lock-dump.domain.domain=vol-disperse-0:self-heal
lock-dump.domain.domain=vol-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 3327, owner=dc710738fd7e0000, client=0x7f283c1a7b00, connection-id=apandey-15766-2016/06/15-07:59:38:894408-vol-client-0-0-0, blocked at 2016-06-15 08:02:13, granted at 2016-06-15 08:02:13
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 22451, owner=cc338ae8f07f0000, client=0x7f2834006660, connection-id=apandey-13531-2016/06/15-07:58:50:360055-vol-client-0-0-0, blocked at 2016-06-15 08:02:13
inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 22530, owner=6cd51d48da7f0000, client=0x7f28342db820, connection-id=apandey-19856-2016/06/15-08:01:05:258794-vol-client-0-0-0, blocked at 2016-06-15 08:02:22
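
For reference, a statedump like the one quoted above can be taken with the gluster CLI; a minimal sketch, assuming the default dump directory /var/run/gluster on the brick node:

gluster volume statedump vol    # ask every brick process of the volume to dump its state
ls /var/run/gluster/            # typically one file per process, named <brick-path>.<pid>.dump.<timestamp>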

--- Additional comment from Ashish Pandey on 2016-06-15 05:08:42 EDT ---


Just observed that the disperse.eager-lock option has come to the rescue:
setting disperse.eager-lock to off got the IOs and the ls -lR command running again.

gluster v set vol disperse.eager-lock off
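
A sketch of applying and later reverting the workaround (volume name vol as above; the gluster volume get/reset sub-commands are assumed to be available in this release):

gluster volume get vol disperse.eager-lock       # check the current value
gluster volume set vol disperse.eager-lock off   # workaround: lets the blocked IO and ls proceed
gluster volume reset vol disperse.eager-lock     # restore the default once a fixed build is installed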

Comment 2 Ravishankar N 2016-08-02 07:35:24 UTC
*** Bug 1362420 has been marked as a duplicate of this bug. ***

Comment 4 Atin Mukherjee 2016-08-30 05:44:11 UTC
http://review.gluster.org/15309 has made it into upstream. One more patch, http://review.gluster.org/#/c/11204/, is currently under review.

Comment 7 Nag Pavan Chilakam 2016-11-17 06:44:55 UTC
Currently blocked due to bug 1395699 - getting an Input/output error when doing deletes simultaneously from two clients.

Comment 8 Nag Pavan Chilakam 2016-11-29 06:52:14 UTC
QATP:
====
As bug 1395699 is being deferred beyond 3.2, I have limited my execution to validating whether there is a hang or not.
I ran rm -rf, ls -lRt, and file creation using dd in parallel from 3 clients and did not hit any hang.

Hence, moving to verified.


Build verified: 3.8.4-5
on a 2 x (4 + 2) EC volume

Comment 10 errata-xmlrpc 2017-03-23 05:43:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

