Bug 1575927
Summary: | [Ganesha+EC] rm -rf failed with Input/output Error when ran from 2 clients | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Manisha Saini <msaini> |
Component: | nfs-ganesha | Assignee: | Kaleb KEITHLEY <kkeithle> |
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Manisha Saini <msaini> |
Severity: | low | Docs Contact: | |
Priority: | medium | ||
Version: | rhgs-3.4 | CC: | dang, ffilz, grajoria, jahernan, jijoy, jthottan, pasik, rhs-bugs, sankarshan, storage-qa-internal |
Target Milestone: | --- | Keywords: | ZStream |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-04-29 12:00:52 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Manisha Saini
2018-05-08 10:07:01 UTC
Needs RCA before we can decide whether to take this into 3.4.0.

So, I don't believe this is a bug. This is a consequence of the way POSIX APIs work. Here's what happens: each client does a readdir, gets back dirents, and starts deleting them. However, because multiple clients are doing this, there's a chance that some of the objects represented by the dirents have already been deleted by the other client. There's no way to know this other than attempting to unlink(), which will (of course) fail. It's arguable whether EIO or ENOENT is the correct error (the manpage is somewhat unclear on this, IMO), but EIO is a valid return from unlink(), so this is not something that should hold up a release.

I was able to get errors on my local filesystem with this scenario. It's much, much more difficult on my SSD than on a remote FS, but it does happen.

It looks like the I/O error may have originated here:

[2018-05-08 08:25:46.005811] W [MSGID: 122033] [ec-common.c:1793:ec_locked] 1-Ganeshavol1-disperse-0: Failed to complete preop lock [Input/output error]

I also see stale file handle errors, which would be expected in this scenario.

I definitely agree that two rm -Rf racing with each other are going to trip over each other in unpredictable ways. While this is an interesting test, and something we should not crash or do something else horrible on, it also seems unrealistic.

Frank
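For illustration, the readdir-then-unlink race described in this thread can be sketched in Python. This is a minimal sketch that runs against a local directory, not the Ganesha or EC code path, and the names `remove_tree_tolerant` and `race_two_deleters` are hypothetical helpers made up for this example. It assumes the only other activity on the tree is another deleter, and it treats only ENOENT (`FileNotFoundError`) as "the other client got there first"; any other error, including EIO, is still reported.

```python
import os
import tempfile
import threading


def remove_tree_tolerant(path):
    """Recursively delete `path`, treating 'already gone' as success.

    Hypothetical helper: assumes concurrent actors only delete, never create.
    """
    try:
        # Snapshot the directory, like each client's readdir.
        with os.scandir(path) as it:
            entries = list(it)
    except FileNotFoundError:
        return  # the other deleter removed this directory first

    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            remove_tree_tolerant(entry.path)
        else:
            try:
                os.unlink(entry.path)
            except FileNotFoundError:
                pass  # dirent was stale: the other deleter unlinked it

    try:
        os.rmdir(path)
    except FileNotFoundError:
        pass  # the other deleter removed the now-empty directory


def race_two_deleters(root):
    """Run two deleters against the same tree; return unexpected errors."""
    errors = []

    def worker():
        try:
            remove_tree_tolerant(root)
        except OSError as exc:  # anything other than ENOENT, e.g. EIO
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

A plain `rm -rf` does not necessarily behave like this sketch, and a client cannot recover from an EIO returned by the server in this way, so the sketch only shows why stale dirents are expected when two deleters race.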