Bug 1571217

Summary: [Disperse] rm -rf failed with EIO error
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Prasad Desala <tdesala>
Component: disperse
Assignee: Sunil Kumar Acharya <sheggodu>
Status: CLOSED UPSTREAM
QA Contact: Nag Pavan Chilakam <nchilaka>
Severity: medium
Docs Contact:
Priority: low
Version: rhgs-3.4
CC: aspandey, jahernan, rhs-bugs, sankarshan, storage-qa-internal, tdesala, ubansal, vdas
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-11-19 04:34:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Prasad Desala 2018-04-24 10:46:16 UTC
Description of problem:
=======================
When rm -rf was run from a single client, it failed to remove the data and returned an EIO error.

[root@client-1 ec_new]# rm -rvf *
rm: cannot remove ‘b/b/b/b/b/b’: Input/output error

Client logs:

[2018-04-24 09:57:12.097558] W [MSGID: 122053] [ec-common.c:307:ec_check_status] 0-ec_new-disperse-1: Operation failed on 1 of 6 subvolumes.(up=111111, mask=111110, remaining=000000, good=111110, bad=000001)
[2018-04-24 09:57:12.948095] W [MSGID: 122040] [ec-common.c:1144:ec_prepare_update_cbk] 0-ec_new-disperse-7: Failed to get size and version [Input/output error]
[2018-04-24 09:57:13.185568] W [fuse-bridge.c:1460:fuse_unlink_cbk] 0-glusterfs-fuse: 1124: RMDIR() /b/b/b/b/b/b => -1 (Input/output error)
[2018-04-24 09:57:12.274744] W [MSGID: 122053] [ec-common.c:307:ec_check_status] 0-ec_new-disperse-1: Operation failed on 1 of 6 subvolumes.(up=111111, mask=111110, remaining=000000, good=111110, bad=000001)
[2018-04-24 09:57:13.181768] W [MSGID: 122040] [ec-common.c:1144:ec_prepare_update_cbk] 0-ec_new-disperse-7: Failed to get size and version [Input/output error]
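
For triage, the bitmask fields in the ec_check_status warning can be pulled out with a few lines of shell. This is only a sketch: the helper names ec_field and count_ones are made up for illustration, and it assumes each mask carries one digit per subvolume, which matches "1 of 6 subvolumes" and the six-digit masks in the log above. It does not try to say which brick a given digit refers to.

```shell
#!/bin/sh
# Extract one bitmask field (up, mask, remaining, good, bad) from an
# ec_check_status log line read on stdin. Helper name is hypothetical.
ec_field() {
    sed -n "s/.*[(, ]$1=\([01]*\).*/\1/p"
}

# Count the '1' characters in a bitmask string.
count_ones() {
    printf '%s' "$1" | tr -cd '1' | wc -c | tr -d ' '
}

# Log line copied from the client log in this report.
line='[2018-04-24 09:57:12.097558] W [MSGID: 122053] [ec-common.c:307:ec_check_status] 0-ec_new-disperse-1: Operation failed on 1 of 6 subvolumes.(up=111111, mask=111110, remaining=000000, good=111110, bad=000001)'

bad=$(printf '%s\n' "$line" | ec_field bad)
good=$(printf '%s\n' "$line" | ec_field good)

# Report how many subvolumes are flagged in each mask.
echo "bad=$bad flagged=$(count_ones "$bad") of ${#bad}"
echo "good=$good flagged=$(count_ones "$good") of ${#good}"
```

Piping a whole client log through the same helper (for example after grepping for ec_check_status) would show whether the same subvolume set keeps being flagged.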

Version-Release number of selected component (if applicable):
3.12.2-8.el7rhgs.x86_64

How reproducible:
1/1

Steps to Reproduce:
===================
The exact reproduction steps are not confirmed, but the following steps led to the failure:
1) Create a Distributed-Disperse volume and start it.
2) Mount it on multiple clients.
3) From client-1: while true; do mkdir a; done
   From client-2: while true; do mv a b; mv b a; done
4) Run step 3 for 10 minutes, then stop both scripts.
5) Perform an offline update from 3.12.2-7.el7rhgs.x86_64 to 3.12.2-8.el7rhgs.x86_64 (both clients and servers are updated).
6) Create a link to the directory.
7) Run rm -rf * from one client.
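
The mkdir/mv race and the final rm from the steps above can be scripted roughly as below. This is a sketch under several assumptions, not the exact procedure used for this report: both loops run on a single host instead of two clients, the loops are bounded by a stop flag and a short DURATION instead of running for 10 minutes, and the offline update and link creation steps are left out. WORKDIR, DURATION and CTRL are names introduced here; WORKDIR defaults to a throwaway local directory and would need to point at the volume's mount point to exercise the real setup.

```shell
#!/bin/sh
# WORKDIR is an assumption: set it to the volume's mount point to exercise the
# real setup. By default a throwaway local directory is used.
WORKDIR=${WORKDIR:-$(mktemp -d)}
DURATION=${DURATION:-1}        # seconds; the report ran the loops for 10 minutes
CTRL=$(mktemp -d)              # holds the stop flag, kept outside WORKDIR

cd "$WORKDIR" || exit 1

# client-1 loop: keep creating directory "a" until the stop flag appears.
( while [ ! -e "$CTRL/stop" ]; do
      mkdir a 2>/dev/null || true
  done ) &
mkdir_pid=$!

# client-2 loop: keep renaming a -> b -> a until the stop flag appears.
( while [ ! -e "$CTRL/stop" ]; do
      mv a b 2>/dev/null || true
      mv b a 2>/dev/null || true
  done ) &
mv_pid=$!

sleep "$DURATION"
touch "$CTRL/stop"             # ask both loops to finish their current iteration
wait "$mkdir_pid" "$mv_pid" || true

# Final step: remove everything and record rm's exit status.
rm -rf ./*
rm_status=$?
rm -rf "$CTRL"
echo "rm exit status: $rm_status"
```

On a local filesystem the script is expected to finish with a zero rm exit status; a non-zero status on the mounted volume would correspond to the failure described here. Because mv moves a directory into an existing target directory, the loops tend to build nested trees, which is presumably how paths such as b/b/b/b/b/b came about.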

Actual results:
===============
rm -rf * failed with EIO error.

Expected results:
=================
rm -rf * should remove the data without any errors/issues.

Comment 5 Sunil Kumar Acharya 2018-05-08 13:40:47 UTC
I am not able to hit the issue on my test setup, even after a couple of tries.