Bug 1456582
Summary: | "split-brain observed [Input/output error]" error messages in samba logs during parallel rm -rf | |||
---|---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Ravishankar N <ravishankar> | |
Component: | replicate | Assignee: | Ravishankar N <ravishankar> | |
Status: | CLOSED CURRENTRELEASE | QA Contact: | ||
Severity: | medium | Docs Contact: | ||
Priority: | unspecified | |||
Version: | mainline | CC: | bugs, ksubrahm | |
Target Milestone: | --- | |||
Target Release: | --- | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | glusterfs-3.12.0 | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | ||
Clone Of: | 1454689 | |||
: | 1457616 1457732 1460661 (view as bug list) | Environment: | ||
Last Closed: | 2017-08-23 09:08:24 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1454689 | |||
Bug Blocks: | 1457616, 1457732, 1460661 |
Description
Ravishankar N
2017-05-29 16:44:08 UTC
REVIEW: https://review.gluster.org/17413 (afr: add errno to afr_inode_refresh_done()) posted (#1) for review on master by Ravishankar N (ravishankar)

REVIEW: https://review.gluster.org/17414 (posix: use the correct op_errno) posted (#1) for review on master by Ravishankar N (ravishankar)

COMMIT: https://review.gluster.org/17413 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit feaea7fa541b81a4988b8f394037bfedb5017f4c
Author: Ravishankar N <ravishankar>
Date: Mon May 29 21:56:12 2017 +0530

    afr: add errno to afr_inode_refresh_done()

    Problem:
    When parallel `rm -rf`s were being done from cifs clients, opendir might
    fail on some replicas with ENOENT. DHT ignores partial opendir failures in
    dht_fd_cbk() and winds readdirs on those replicas. Afr inode refresh (as a
    part of readdirp read_txn) sees in its fd context that the state of the fds
    is *not* AFR_FD_OPENED and bails out to afr_inode_refresh_done() without
    doing a refresh. When this happens, the errno is set as EIO due to lack of
    readable subvols, logging split-brain messages in the logs.

    Fix:
    Introduce an errno argument to afr_inode_refresh_do() to bail out with the
    right error value when inode refresh is not performed.

    Change-Id: I075707fbb73fd93a923b77b923a96aac79e847f9
    BUG: 1456582
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: https://review.gluster.org/17413
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>

REVIEW: https://review.gluster.org/17414 (posix: use the correct op_errno) posted (#2) for review on master by Ravishankar N (ravishankar)

COMMIT: https://review.gluster.org/17414 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit de92c363c95d16966dbcc9d8763fd4448dd84d13
Author: Ravishankar N <ravishankar>
Date: Mon May 29 21:38:14 2017 +0530

    posix: use the correct op_errno

    Problem:
    If readdir/fstat was performed on a directory that was removed,
    posix_fd_ctx_get() fails with ENOENT but we incorrectly use the ret value
    (-1 in this case) as op_errno, logging "Operation not permitted" messages
    in the brick logs. Also in case of fstat, the -1 op_errno was also
    propagated to the client via stack unwind, causing the message to appear
    in protocol/client logs as well.

    Fix:
    Use the right op_errno in readdir, fstat and writev. Also, if
    posix_fd_ctx_get() failed with ENOENT, convert it into EBADF because ENOENT
    is not a valid error for an fd operation.

    Change-Id: Ie43c0789d5040ec73b7cf885d015a183b8c64d70
    BUG: 1456582
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: https://review.gluster.org/17414
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Tested-by: Pranith Kumar Karampuri <pkarampu>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Amar Tumballi <amarts>
    CentOS-regression: Gluster Build System <jenkins.org>

One more patch being sent.

REVIEW: https://review.gluster.org/17436 (afr: update errno check in afr_inode_refresh_do) posted (#1) for review on master by Ravishankar N (ravishankar)

COMMIT: https://review.gluster.org/17436 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit 9aa81c15a429dc2817f0149a33b6a9e88ead8110
Author: Ravishankar N <ravishankar>
Date: Thu Jun 1 14:20:03 2017 +0530

    afr: update errno check in afr_inode_refresh_do

    Addresses review comment in https://review.gluster.org/#/c/17413

    Change-Id: Ic247729e5e92a5bb0148543764e0b30790444004
    BUG: 1456582
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: https://review.gluster.org/17436
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>

REVIEW: https://review.gluster.org/17565 (posix: Revert modifying op_errno in __posix_fd_ctx_get) posted (#1) for review on master by Ravishankar N (ravishankar)

COMMIT: https://review.gluster.org/17565 committed in master by Pranith Kumar Karampuri (pkarampu)
------
commit 61924b98a61108a7ec453fb7f1fc5487d1386e56
Author: Ravishankar N <ravishankar>
Date: Mon Jun 19 13:45:55 2017 +0530

    posix: Revert modifying op_errno in __posix_fd_ctx_get

    https://review.gluster.org/#/c/17414/ converted ENOENT to EBADFD because
    ENOENT is not a valid error for fd based operations, but this apparently
    breaks dht rebalance behaviour (see comments in the backport 17517). So
    reverting that part of the change.

    Change-Id: Idcf5c65a47b096a3766cf7f20ca938d988572052
    BUG: 1456582
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: https://review.gluster.org/17565
    Reviewed-by: Pranith Kumar Karampuri <pkarampu>
    Tested-by: Pranith Kumar Karampuri <pkarampu>
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.12.0, please open a new bug report.

glusterfs-3.12.0 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-September/000082.html
[2] https://www.gluster.org/pipermail/gluster-users/