Bug 1460661
Summary: | "split-brain observed [Input/output error]" error messages in samba logs during parallel rm -rf | |
---|---|---|---
Product: | [Community] GlusterFS | Reporter: | Ravishankar N <ravishankar>
Component: | replicate | Assignee: | Ravishankar N <ravishankar>
Status: | CLOSED CURRENTRELEASE | QA Contact: |
Severity: | medium | Docs Contact: |
Priority: | unspecified | |
Version: | 3.8 | CC: | bugs
Target Milestone: | --- | Keywords: | Triaged
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | glusterfs-3.8.13 | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | 1456582 | Environment: |
Last Closed: | 2017-06-29 09:55:42 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 1449988, 1454689, 1456582 | |
Bug Blocks: | 1457616, 1457732 | |
Description
Ravishankar N
2017-06-12 11:04:03 UTC
REVIEW: https://review.gluster.org/17517 (posix: use the correct op_errno) posted (#1) for review on release-3.8 by Ravishankar N (ravishankar)

REVIEW: https://review.gluster.org/17518 (afr: add errno to afr_inode_refresh_done()) posted (#1) for review on release-3.8 by Ravishankar N (ravishankar)

COMMIT: https://review.gluster.org/17518 committed in release-3.8 by jiffin tony Thottan (jthottan)
------
commit 5d13784dc6aa406f69061f2608a19ef0c8a80581
Author: Ravishankar N <ravishankar>
Date: Mon Jun 5 09:40:51 2017 +0530

afr: add errno to afr_inode_refresh_done()

Backport of https://review.gluster.org/17413 and https://review.gluster.org/17436

Problem:
When parallel `rm -rf`s were being done from cifs clients, opendir might fail on some replicas with ENOENT. DHT ignores partial opendir failures in dht_fd_cbk() and winds readdirs on those replicas. Afr inode refresh (as a part of readdirp read_txn) sees in its fd context that the state of the fds is *not* AFR_FD_OPENED and bails out to afr_inode_refresh_done() without doing a refresh. When this happens, the errno is set as EIO due to lack of readable subvols, logging split-brain messages in the logs.

Fix:
Introduce an errno argument to afr_inode_refresh_do() to bail out with the right error value when inode refresh is not performed.
Change-Id: I8eed4d6e6c85332c1f5813c74cb54ae73693a369
BUG: 1460661
Signed-off-by: Ravishankar N <ravishankar>
Reviewed-on: https://review.gluster.org/17518
Smoke: Gluster Build System <jenkins.org>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu>

REVIEW: https://review.gluster.org/17517 (posix: use the correct op_errno) posted (#2) for review on release-3.8 by Ravishankar N (ravishankar)

COMMIT: https://review.gluster.org/17517 committed in release-3.8 by jiffin tony Thottan (jthottan)
------
commit 0903b76c88ed47d819372763fdccbe1486bf4943
Author: Ravishankar N <ravishankar>
Date: Mon May 29 21:38:14 2017 +0530

posix: use the correct op_errno

Problem:
If readdir/fstat was performed on a directory that was removed, posix_fd_ctx_get() fails with ENOENT but we incorrectly use the ret value (-1 in this case) as op_errno, logging "Operation not permitted" messages in the brick logs. In the case of fstat, the -1 op_errno was also propagated to the client via stack unwind, causing the message to appear in protocol/client logs as well.

Fix:
Use the right op_errno in readdir, fstat and writev.
A̶l̶s̶o̶,̶ ̶i̶f̶ p̶o̶s̶i̶x̶_̶f̶d̶_̶c̶t̶x̶_̶g̶e̶t̶(̶)̶ ̶f̶a̶i̶l̶e̶d̶ ̶w̶i̶t̶h̶ ̶E̶N̶O̶E̶N̶T̶,̶ ̶c̶o̶n̶v̶e̶r̶t̶ ̶i̶t̶ ̶i̶n̶t̶o̶ ̶E̶B̶A̶D̶F̶ ̶b̶e̶c̶a̶u̶s̶e̶ ̶E̶N̶O̶E̶N̶T̶ i̶s̶ ̶n̶o̶t̶ ̶a̶ ̶v̶a̶l̶i̶d̶ ̶e̶r̶r̶o̶r̶ ̶f̶o̶r̶ ̶a̶n̶ ̶f̶d̶ ̶o̶p̶e̶r̶a̶t̶i̶o̶n̶.̶
Don't do this as it breaks DHT. See 17565

> Reviewed-on: https://review.gluster.org/17414
> Smoke: Gluster Build System <jenkins.org>
> Reviewed-by: Pranith Kumar Karampuri <pkarampu>
> Tested-by: Pranith Kumar Karampuri <pkarampu>
> NetBSD-regression: NetBSD Build System <jenkins.org>
> Reviewed-by: Amar Tumballi <amarts>
> CentOS-regression: Gluster Build System <jenkins.org>
(cherry picked from commit de92c363c95d16966dbcc9d8763fd4448dd84d13)

Change-Id: Ie43c0789d5040ec73b7cf885d015a183b8c64d70
BUG: 1460661
Signed-off-by: Ravishankar N <ravishankar>
Reviewed-on: https://review.gluster.org/17517
Smoke: Gluster Build System <jenkins.org>
Reviewed-by: Pranith Kumar Karampuri <pkarampu>
NetBSD-regression: NetBSD Build System <jenkins.org>
CentOS-regression: Gluster Build System <jenkins.org>

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.13, please open a new bug report.

glusterfs-3.8.13 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2017-June/000075.html
[2] https://www.gluster.org/pipermail/gluster-users/
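
For anyone reading the posix commit above and wondering how a failed fd context lookup ends up logged as "Operation not permitted", here is a minimal sketch of the pattern. It is not the GlusterFS source. `lookup_fd_ctx()`, `op_errno_buggy()`, `op_errno_fixed()` and `check_examples()` are hypothetical names, and the sketch assumes the error code was derived by negating the -1 return value, which gives 1 (EPERM on Linux and other common POSIX systems) and would explain the log text.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a helper like posix_fd_ctx_get(): on failure it
 * returns -1 and reports the real cause through *cause. */
static int lookup_fd_ctx(int dir_exists, int *cause)
{
    if (!dir_exists) {
        *cause = ENOENT;   /* the directory was removed underneath the fd */
        return -1;
    }
    *cause = 0;
    return 0;
}

/* Buggy pattern (assumed): build op_errno from the return value.
 * ret is -1, so -ret is 1, which is EPERM rather than the real cause. */
int op_errno_buggy(int dir_exists)
{
    int cause = 0;
    int ret = lookup_fd_ctx(dir_exists, &cause);

    if (ret < 0)
        return -ret;
    return 0;
}

/* Fixed pattern: report the errno recorded at the point of failure. */
int op_errno_fixed(int dir_exists)
{
    int cause = 0;
    int ret = lookup_fd_ctx(dir_exists, &cause);

    if (ret < 0)
        return cause;
    return 0;
}

/* Small self-check covering the failure and success paths of both variants. */
void check_examples(void)
{
    assert(op_errno_buggy(0) == EPERM);   /* misleading error code */
    assert(op_errno_fixed(0) == ENOENT);  /* actual cause */
    assert(op_errno_buggy(1) == 0);
    assert(op_errno_fixed(1) == 0);
}
```

Calling `check_examples()` from a `main()` exercises both variants. The point is only that a generic -1 return value carries no information about the cause, so the errno has to be taken from wherever the failure was detected.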