Description of problem:
-----------------------
Created a 12 x (4+2) volume on 6 servers and mounted it via FUSE on 6 clients. Created ~5 lakh (500,000) small files (64 KB each) on the mount point, then triggered "rm -rf <mount-point> -v" from all the clients, one by one.

**At the application side** :

rm: cannot remove ‘/gluster-mount/file_srcdir/gqac012.sbu.lab.eng.bos.redhat.com/thrd_04/d_001/d_003’: Input/output error
rm: cannot remove ‘/gluster-mount/file_srcdir/gqac008.sbu.lab.eng.bos.redhat.com/thrd_00/d_001/d_001’: Input/output error
rm: cannot remove ‘/gluster-mount/file_srcdir/gqac008.sbu.lab.eng.bos.redhat.com/thrd_02/d_001/d_003’: Input/output error

**From client logs** :

[2016-11-15 06:40:37.471404] W [MSGID: 122040] [ec-common.c:940:ec_prepare_update_cbk] 0-butcher-disperse-1: Failed to get size and version [Input/output error]
[2016-11-15 07:00:28.978743] W [MSGID: 109065] [dht-common.c:7826:dht_rmdir_lock_cbk] 0-butcher-dht: acquiring inodelk failed rmdir for /file_srcdir/gqac012.sbu.lab.eng.bos.redhat.com/thrd_03/d_001/d_005) [Input/output error]
[2016-11-15 07:00:28.978806] W [fuse-bridge.c:1355:fuse_unlink_cbk] 0-glusterfs-fuse: 858890: RMDIR() /file_srcdir/gqac012.sbu.lab.eng.bos.redhat.com/thrd_03/d_001/d_005 => -1 (Input/output error)

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
glusterfs-3.8.4-5.el7rhgs.x86_64

How reproducible:
-----------------
2/2

Steps to Reproduce:
-------------------
1. Create an EC (disperse) volume and mount it via FUSE.
2. Create a large number of files on the mount point in a deep directory structure.
3. Run "rm -rf <mount point>/*" from several clients.

Actual results:
---------------
I/O errors on the client side.

Expected results:
-----------------
No errors on the application side.
Additional info:
----------------
*Server & Client OS* : RHEL 7.3

*Vol Info* :

[root@gqas003 bricks]# gluster v info

Volume Name: butcher
Type: Distributed-Disperse
Volume ID: 1742f033-0029-43fb-9469-cd31ffa258f6
Status: Started
Snapshot Count: 0
Number of Bricks: 12 x (4 + 2) = 72
Transport-type: tcp
Bricks:
Brick1: gqas003.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick2: gqas004.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick3: gqas007.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick4: gqas009.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick5: gqas010.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick6: gqas012.sbu.lab.eng.bos.redhat.com:/bricks1/brick
Brick7: gqas003.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick8: gqas004.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick9: gqas007.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick10: gqas009.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick11: gqas010.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick12: gqas012.sbu.lab.eng.bos.redhat.com:/bricks2/brick
Brick13: gqas003.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick14: gqas004.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick15: gqas007.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick16: gqas009.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick17: gqas010.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick18: gqas012.sbu.lab.eng.bos.redhat.com:/bricks3/brick
Brick19: gqas003.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick20: gqas004.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick21: gqas007.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick22: gqas009.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick23: gqas010.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick24: gqas012.sbu.lab.eng.bos.redhat.com:/bricks4/brick
Brick25: gqas003.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick26: gqas004.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick27: gqas007.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick28: gqas009.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick29: gqas010.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick30: gqas012.sbu.lab.eng.bos.redhat.com:/bricks5/brick
Brick31: gqas003.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick32: gqas004.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick33: gqas007.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick34: gqas009.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick35: gqas010.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick36: gqas012.sbu.lab.eng.bos.redhat.com:/bricks6/brick
Brick37: gqas003.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick38: gqas004.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick39: gqas007.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick40: gqas009.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick41: gqas010.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick42: gqas012.sbu.lab.eng.bos.redhat.com:/bricks7/brick
Brick43: gqas003.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick44: gqas004.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick45: gqas007.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick46: gqas009.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick47: gqas010.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick48: gqas012.sbu.lab.eng.bos.redhat.com:/bricks8/brick
Brick49: gqas003.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick50: gqas004.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick51: gqas007.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick52: gqas009.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick53: gqas010.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick54: gqas012.sbu.lab.eng.bos.redhat.com:/bricks9/brick
Brick55: gqas003.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick56: gqas004.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick57: gqas007.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick58: gqas009.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick59: gqas010.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick60: gqas012.sbu.lab.eng.bos.redhat.com:/bricks10/brick
Brick61: gqas003.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick62: gqas004.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick63: gqas007.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick64: gqas009.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick65: gqas010.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick66: gqas012.sbu.lab.eng.bos.redhat.com:/bricks11/brick
Brick67: gqas003.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick68: gqas004.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick69: gqas007.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick70: gqas009.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick71: gqas010.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Brick72: gqas012.sbu.lab.eng.bos.redhat.com:/bricks12/brick
Options Reconfigured:
client.event-threads: 4
server.event-threads: 4
cluster.lookup-optimize: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
[root@gqas003 bricks]#
*** Bug 1395699 has been marked as a duplicate of this bug. ***
I investigated the issue and found that the combining of callbacks, and the subsequent assignment of fop->answer, go wrong while taking the inodelk. The point is that for any one subvolume in the case above, only the minimum number of bricks is up. Now, while deletion is running on the directories, if an inodelk fop gets ESTALE from any one of its callbacks, we are unable to prepare fop->answer. ec_lock_check then assigns EIO to errno in this block:

    if (fop->answer && fop->answer->op_ret < 0)
            error = fop->answer->op_errno;
    else
            error = EIO;

I think this is the root cause of the issue. I also placed some gf_msg() calls at this and other places and confirmed that this is where EIO is being set. To solve this, I think we should walk the cbk list here and check whether the errors from some of the callbacks are ESTALE (or any similar error which can be ignored); if so, we should assign errno accordingly instead of falling back to EIO.
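The error selection proposed above could be sketched as follows. This is a minimal, self-contained model, not actual GlusterFS code: the struct cbk shape, the pick_error() name, and the choice of ESTALE as the ignorable error are illustrative assumptions standing in for the real cbk-list combining in ec-common.c.

```c
/* Sketch of the proposed fix: instead of returning a blanket EIO
 * whenever the callbacks could not be combined into fop->answer,
 * walk the list of per-brick answers and pick a meaningful errno,
 * treating ESTALE as ignorable. Hypothetical types and names. */
#include <errno.h>
#include <stddef.h>

struct cbk {
    int op_ret;        /* >= 0 on success, -1 on failure */
    int op_errno;      /* errno reported by this brick   */
    struct cbk *next;  /* next answer in the list        */
};

/* Returns the errno to report: 0 if any brick succeeded, the first
 * non-ignorable error if one exists, ESTALE if that is all we saw,
 * and EIO only when there are no answers at all (old behavior). */
static int pick_error(struct cbk *list)
{
    int real_err = 0;
    int saw_estale = 0;

    for (struct cbk *c = list; c != NULL; c = c->next) {
        if (c->op_ret >= 0)
            return 0;                  /* at least one brick succeeded  */
        if (c->op_errno == ESTALE)
            saw_estale = 1;            /* ignorable: entry already gone */
        else if (real_err == 0)
            real_err = c->op_errno;    /* remember first real failure   */
    }
    if (real_err)
        return real_err;               /* a genuine, non-ignorable error */
    if (saw_estale)
        return ESTALE;                 /* only stale answers: not EIO    */
    return EIO;                        /* no answers at all              */
}
```

During a concurrent rm -rf from several clients, ESTALE usually just means another client already removed the entry, so surfacing it (and letting DHT/FUSE treat the entry as gone) is friendlier to the application than a hard EIO.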
*** Bug 1719321 has been marked as a duplicate of this bug. ***
Any updates?
*** This bug has been marked as a duplicate of bug 1812789 ***