Red Hat Bugzilla – Bug 1291701
Renames/deletes failed with "No such file or directory" when a few of the bricks from the hot tier went offline
Last modified: 2016-06-16 09:50:45 EDT
+++ This bug was initially created as a clone of Bug #1291560 +++
Description of problem:
On a tiered volume with a 2x2 dis-rep cold tier and a 2x3 dis-rep hot tier, renames/deletes were being performed on files/dirs. When some of the bricks went offline, the renames/deletes failed with "No such file or directory". All the bricks of the cold tier stayed online since quorum was set; only one brick from each sub-volume of the hot tier went offline.
Version-Release number of selected component (if applicable):
glusterfs 3.7.5 built on Dec 3 2015 11:30:45
Steps to Reproduce:
1. Create a tiered volume with a 2x2 dis-rep cold tier and a 2x3 dis-rep hot tier. Start the volume. Mount the volume.
2. From the mount point, create files/dirs.
3. Rename a few of the created files/dirs to different names.
4. While the renames are in progress, crash the filesystem backing one brick from each hot-tier sub-volume using the "godown" utility (available in xfsprogs).
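For reference, the rename workload in steps 2-4 can be approximated with a small C program along these lines (a minimal sketch; the mount path and file names are hypothetical):

#include <stdio.h>

int main(void)
{
    char oldpath[256], newpath[256];

    /* Rename files in a loop while bricks are being taken offline. */
    for (int i = 0; i < 100; i++) {
        snprintf(oldpath, sizeof(oldpath), "/mnt/testvol/E_file_%d", i);
        snprintf(newpath, sizeof(newpath), "/mnt/testvol/E_file_%d_new", i);
        if (rename(oldpath, newpath) == -1)
            perror(oldpath); /* fails with ENOENT when the bug is hit */
    }
    return 0;
}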
Actual results:
On the mount, the rename fails with "No such file or directory".
Expected results:
Renames/deletes shouldn't fail.
Volume Name: testvol
Volume ID: 5a2f042d-ee04-4b3d-b5d5-d36e29cea325
Number of Bricks: 10
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 3 = 6
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Error messages seen in client log:
[2015-12-14 07:45:12.156546] E [MSGID: 114031] [client-rpc-fops.c:251:client3_3_mknod_cbk] 0-testvol-client-9: remote operation failed. Path: (null) [Input/output error]
[2015-12-14 07:45:12.159452] W [MSGID: 114031] [client-rpc-fops.c:664:client3_3_unlink_cbk] 0-testvol-client-0: remote operation failed [Device or resource busy]
[2015-12-14 07:45:12.159480] W [MSGID: 114031] [client-rpc-fops.c:664:client3_3_unlink_cbk] 0-testvol-client-1: remote operation failed [Device or resource busy]
[2015-12-14 07:45:12.160293] I [MSGID: 109069] [dht-common.c:1159:dht_lookup_unlink_stale_linkto_cbk] 0-testvol-tier-dht: Returned with op_ret -1 and op_errno 16 for /E_file_44
REVIEW: http://review.gluster.org/12973 (cluster/afr: During name heal, propagate EIO only on gfid or type mismatch) posted (#1) for review on master by Krutika Dhananjay (firstname.lastname@example.org)
COMMIT: http://review.gluster.org/12973 committed in master by Pranith Kumar Karampuri (email@example.com)
Author: Krutika Dhananjay <firstname.lastname@example.org>
Date: Tue Dec 15 18:48:20 2015 +0530
cluster/afr: During name heal, propagate EIO only on gfid or type mismatch
When the disk associated with a brick returns EIO during lookup, name heal is likely
to return EIO as well, because one of the syncop_XXX() operations it performs
returned EIO. afr_lookup_selfheal_wrap() treats this as a split-brain, and the
lookup is aborted prematurely with EIO even though it succeeded on the other
replica(s).
Signed-off-by: Krutika Dhananjay <email@example.com>
Tested-by: NetBSD Build System <firstname.lastname@example.org>
Tested-by: Gluster Build System <email@example.com>
Reviewed-by: Pranith Kumar Karampuri <firstname.lastname@example.org>
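The effect of the fix can be illustrated with a short sketch (the struct and function names below are invented for illustration and are not the actual cluster/afr code): while inspecting the replies gathered during name heal, only a gfid or type mismatch between replicas is allowed to surface as EIO; a plain EIO from a single brick no longer aborts a lookup that succeeded elsewhere.

#include <errno.h>
#include <stdbool.h>

struct reply {
    bool valid;         /* did this replica answer at all?       */
    int  op_errno;      /* errno from the syncop on this replica */
    bool gfid_mismatch; /* set while inspecting the replies      */
    bool type_mismatch;
};

/* Decide which errno name heal should propagate to the caller. */
static int
name_heal_errno(const struct reply *replies, int child_count)
{
    bool any_success = false;

    for (int i = 0; i < child_count; i++) {
        if (!replies[i].valid)
            continue;
        if (replies[i].gfid_mismatch || replies[i].type_mismatch)
            return EIO; /* genuine split-brain: fail hard */
        if (replies[i].op_errno == 0)
            any_success = true;
    }

    /* A per-brick EIO without a mismatch is not split-brain; let the
     * lookup proceed with the healthy replica(s). */
    return any_success ? 0 : EIO;
}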
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.
glusterfs-3.8.0 has been announced on the Gluster mailing lists; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list and the update infrastructure for your distribution.