Bug 1286042

Summary: rebalance :- rm -rf failed with error 'No such file or directory' for a few files and directories while rebalance is in progress
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Susant Kumar Palai <spalai>
Component: distribute
Assignee: Raghavendra G <rgowdapp>
Status: CLOSED CURRENTRELEASE
QA Contact: storage-qa-internal <storage-qa-internal>
Severity: high
Docs Contact:
Priority: unspecified
Version: rhgs-3.1
CC: kramdoss, mzywusko, nbalacha, racpatel, rgowdapp, rhs-bugs, smohan, spalai, tdesala
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard: dht-add-brick, dht-fops-while-rebal, dht-3.2.0-proposed, dht-fixed
Fixed In Version: 3.7.9-10
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1128737
Environment:
Last Closed: 2017-03-25 14:24:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1128737
Bug Blocks:

Comment 3 Raghavendra G 2016-07-13 07:21:39 UTC
From the logs posted in the initial bug report, opendir failed with ENOENT on the newly added bricks (client36 through client47). I did not see any failure logs on the bricks that were part of DHT before the add-brick (client0 through client35). So, I assume this bug is due to the directory not being healed onto the new bricks after add-brick. There are fixes [1][2] in rhgs-3.1.3 which add healing of the directory and its layout even in the nameless lookup codepath. Since at least one nameless lookup is done on the gfid before opendir is sent in the new graph (which is aware of the new bricks), this issue should be fixed in 3.1.3.
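
For reference, a quick way to check whether a directory has been healed onto a newly added brick is to look for the directory and its DHT layout xattr directly on the brick backend. The brick path and directory name below are placeholders, not taken from this setup:

  # On a newly added brick: the directory should exist and carry a DHT layout xattr
  ls -ld /bricks/brick36/testvol/dir1
  getfattr -d -m . -e hex /bricks/brick36/testvol/dir1 | grep trusted.glusterfs.dht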

Also, Sakshi reported that she did not see any issues with a parallel rm -rf and rebalance after add-brick on rhgs-3.1.3.

Waiting for Karthick's confirmation of our observations.

[1] https://code.engineering.redhat.com/gerrit/61036
[2] http://review.gluster.org/14365

Please note that [2] is not a necessary fix for this bug (it is not present in rhgs-3.1.3). However, it addresses a related issue where directories are left with holes in their layout after lookup.

Comment 4 Prasad Desala 2016-10-25 05:30:24 UTC
This issue is no longer seen with glusterfs version: 3.8.4-2.el7rhgs.x86_64.

Here are the steps that were followed (a command-level sketch is given after the list):
1. Created a distributed-replicate volume and started it.
2. Created files and directories on it.
3. Added a few bricks to the volume.
4. Started rebalance with the 'start force' option.
5. From the mount point, started deleting the data using rm -rf *.
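
Roughly, the commands would look like the following; the volume name, server and brick paths, and mount point are placeholders and not taken from the actual test setup:

  # Create and start a 2x2 distributed-replicate volume
  gluster volume create testvol replica 2 server1:/bricks/b1 server2:/bricks/b2 server1:/bricks/b3 server2:/bricks/b4
  gluster volume start testvol
  mkdir -p /mnt/testvol && mount -t glusterfs server1:/testvol /mnt/testvol
  # Create some directories and files from the client
  mkdir -p /mnt/testvol/dir{1..10} && touch /mnt/testvol/dir{1..10}/file{1..100}
  # Add one more replica pair and start a forced rebalance
  gluster volume add-brick testvol server1:/bricks/b5 server2:/bricks/b6
  gluster volume rebalance testvol start force
  # While rebalance is in progress, delete everything from the mount point
  cd /mnt/testvol && rm -rf *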

The error reported in this BZ was not seen, and the rm -rf completed successfully without any errors.
Hence, moving this BZ to Verified.