Bug 1395161 - [Disperse] : I/O errors on all FUSE mounts during rm -rf from multiple clients
Summary: [Disperse] : I/O errors on all FUSE mounts during rm -rf from multiple clients
Keywords:
Status: CLOSED DUPLICATE of bug 1812789
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: disperse
Version: rhgs-3.2
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Xavi Hernandez
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Duplicates: 1395699 1719321
Depends On:
Blocks: 1397283
 
Reported: 2016-11-15 10:09 UTC by Ambarish
Modified: 2020-09-29 11:12 UTC
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1397283
Environment:
Last Closed: 2020-09-29 11:12:06 UTC
Embargoed:



Description Ambarish 2016-11-15 10:09:02 UTC
Description of problem:
-----------------------

Created a 12 x (4+2) volume on 6 servers and mounted it via FUSE on 6 clients.
Created ~500,000 (5 lakh) small files (64 KB each) on the mount point.

Triggered "rm -rf <mount-point> -v" from all the clients, one by one.

**At the application side**:

rm: cannot remove ‘/gluster-mount/file_srcdir/gqac012.sbu.lab.eng.bos.redhat.com/thrd_04/d_001/d_003’: Input/output error

rm: cannot remove ‘/gluster-mount/file_srcdir/gqac008.sbu.lab.eng.bos.redhat.com/thrd_00/d_001/d_001’: Input/output error

rm: cannot remove ‘/gluster-mount/file_srcdir/gqac008.sbu.lab.eng.bos.redhat.com/thrd_02/d_001/d_003’: Input/output error

**From client logs**:

[2016-11-15 06:40:37.471404] W [MSGID: 122040] [ec-common.c:940:ec_prepare_update_cbk] 0-butcher-disperse-1: Failed to get size and version [Input/output error]

[2016-11-15 07:00:28.978743] W [MSGID: 109065] [dht-common.c:7826:dht_rmdir_lock_cbk] 0-butcher-dht: acquiring inodelk failed rmdir for /file_srcdir/gqac012.sbu.lab.eng.bos.redhat.com/thrd_03/d_001/d_005) [Input/output error]
[2016-11-15 07:00:28.978806] W [fuse-bridge.c:1355:fuse_unlink_cbk] 0-glusterfs-fuse: 858890: RMDIR() /file_srcdir/gqac012.sbu.lab.eng.bos.redhat.com/thrd_03/d_001/d_005 => -1 (Input/output error)



Version-Release number of selected component (if applicable):
-------------------------------------------------------------

glusterfs-3.8.4-5.el7rhgs.x86_64

How reproducible:
-----------------

2/2

Steps to Reproduce:
-------------------

1. Create an EC volume and mount it via FUSE.

2. Create a large number of files on the mount point in a deep directory structure.

3. Run "rm -rf <mount-point>/*" from multiple clients.

Actual results:
--------------

I/O errors on the client side.

Expected results:
-----------------

No errors on the application side.

Additional info:
----------------

*Server & Client OS*: RHEL 7.3

*Vol Info*:

[root@gqas003 bricks]# gluster v info
 
Volume Name: butcher
Type: Distributed-Disperse
Volume ID: 1742f033-0029-43fb-9469-cd31ffa258f6
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Bricks:
Brick1: gqas003.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick2: gqas004.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick3: gqas007.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick4: gqas009.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick5: gqas010.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick6: gqas012.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick7: gqas003.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick8: gqas004.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick9: gqas007.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick10: gqas009.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick11: gqas010.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick12: gqas012.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick13: gqas003.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick14: gqas004.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick15: gqas007.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick16: gqas009.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick17: gqas010.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick18: gqas012.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick19: gqas003.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick20: gqas004.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick21: gqas007.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick22: gqas009.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick23: gqas010.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick24: gqas012.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick25: gqas003.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick26: gqas004.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick27: gqas007.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick28: gqas009.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick29: gqas010.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick30: gqas012.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick31: gqas003.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick32: gqas004.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick33: gqas007.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick34: gqas009.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick35: gqas010.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick36: gqas012.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick37: gqas003.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick38: gqas004.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick39: gqas007.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick40: gqas009.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick41: gqas010.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick42: gqas012.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick43: gqas003.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick44: gqas004.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick45: gqas007.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick46: gqas009.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick47: gqas010.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick48: gqas012.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick49: gqas003.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick50: gqas004.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick51: gqas007.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick52: gqas009.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick53: gqas010.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick54: gqas012.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick55: gqas003.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick56: gqas004.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick57: gqas007.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick58: gqas009.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick59: gqas010.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick60: gqas012.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick61: gqas003.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick62: gqas004.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick63: gqas007.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick64: gqas009.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick65: gqas010.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick66: gqas012.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick67: gqas003.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick68: gqas004.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick69: gqas007.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick70: gqas009.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick71: gqas010.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick72: gqas012.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Options Reconfigured:
client.event-threads: 4
server.event-threads: 4
cluster.lookup-optimize: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
[root@gqas003 bricks]#

Comment 3 Pranith Kumar K 2016-11-17 08:47:53 UTC
*** Bug 1395699 has been marked as a duplicate of this bug. ***

Comment 6 Ashish Pandey 2016-11-22 07:16:23 UTC
I investigated the issue and found that the combining of callbacks, and then the assignment of fop->answer, has a problem while taking the inodelk.
The point is that, in the above case, at least one subvolume has only the minimum number of bricks UP. While deletion is running on the directories,
if an inodelk fop gets ESTALE from any one of the callbacks, we are not able to prepare fop->answer.

Now, in ec_lock_check(), this part assigns EIO to the error:

if (fop->answer && fop->answer->op_ret < 0)
        error = fop->answer->op_errno;
else
        error = EIO;


I think this is the root cause of the issue. I also placed some gf_msg() calls at this spot and at a few other places, and found that this is where EIO is being set.
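
For reference, a debug message of the kind I added might look like the following. This is only a sketch: the message ID EC_MSG_SIZE_VERS_GET_FAIL is reused purely for illustration, and fop->xl is assumed to be the ec xlator reference carried by the fop.

/* Hypothetical debug aid: log whenever ec_lock_check() falls back
 * to EIO. A real patch would define its own message ID. */
gf_msg(fop->xl->name, GF_LOG_WARNING, error, EC_MSG_SIZE_VERS_GET_FAIL,
       "ec_lock_check: no usable combined answer, returning error=%d",
       error);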

I think that to solve this we should walk through the cbk list at this point and, if the errors from some of the callbacks are ESTALE (or any similar error that can be ignored), assign the errno accordingly.
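
A minimal sketch of that idea, assuming the ec_cbk_data_t entries are linked on fop->cbk_list via their 'list' member as in ec-data.h; the exact set of ignorable errors and the placement are assumptions, and this is untested:

/* Instead of defaulting to EIO when no combined answer could be
 * formed, scan the individual callbacks and propagate an ignorable
 * error such as ESTALE/ENOENT if that is what a brick reported. */
int32_t error = EIO;

if (fop->answer && fop->answer->op_ret < 0) {
        error = fop->answer->op_errno;
} else {
        ec_cbk_data_t *cbk = NULL;

        list_for_each_entry(cbk, &fop->cbk_list, list) {
                if ((cbk->op_ret < 0) &&
                    ((cbk->op_errno == ESTALE) ||
                     (cbk->op_errno == ENOENT))) {
                        error = cbk->op_errno;
                        break;
                }
        }
}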

Comment 16 Ashish Pandey 2019-06-13 05:08:30 UTC
*** Bug 1719321 has been marked as a duplicate of this bug. ***

Comment 18 Yaniv Kaul 2019-06-13 11:58:55 UTC
Any updates?

Comment 31 Ashish Pandey 2020-09-29 11:12:06 UTC

*** This bug has been marked as a duplicate of bug 1812789 ***

