1286042 – rebalance: rm -rf failed with error 'No such file or directory' for a few files and directories while rebalance is in progress


Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	distribute
Sub Component:
Version:	rhgs-3.1
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Raghavendra G
QA Contact:	storage-qa-internal@redhat.com
Docs Contact:
URL:
Whiteboard:	dht-add-brick, dht-fops-while-rebal, ...
Depends On:	1128737
Blocks:

Reported:	2015-11-27 10:11 UTC by Susant Kumar Palai
Modified:	2017-03-25 14:24 UTC
CC List:	9 users
Fixed In Version:	3.7.9-10
Doc Type:	Bug Fix
Doc Text:
Clone Of:	1128737
Environment:
Last Closed:	2017-03-25 14:24:17 UTC
Embargoed:
Dependent Products:


Comment 3 Raghavendra G 2016-07-13 07:21:39 UTC

From the logs posted in the initial bug report, opendir failed with ENOENT on the newly added bricks (client36 through client47). I saw no failure logs on the bricks that were part of dht before add-brick (client0 through client35). So I assume this bug is caused by the directory not being healed on the new bricks after add-brick. There are some fixes [1][2] that add healing of the directory and its layout even in the nameless lookup codepath; of these, [1] is in rhgs-3.1.3 (see the note on [2] below). Since at least one nameless lookup is done on the gfid before opendir is sent in the new graph (which is aware of the new bricks), this issue should be fixed in 3.1.3.
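
To make the reasoning above concrete, here is a toy model of the failure and of the heal. It is only a sketch under stated assumptions: `Brick`, `opendir_on_all`, and `lookup_with_heal` are hypothetical names invented for illustration, not GlusterFS code, and the model ignores layouts, gfids, and graphs entirely.

```python
import errno


class Brick:
    """Hypothetical stand-in for one dht subvolume (not a GlusterFS type)."""

    def __init__(self, name):
        self.name = name
        self.dirs = set()  # directories that exist on this brick

    def opendir(self, path):
        if path not in self.dirs:
            raise OSError(errno.ENOENT, "No such file or directory", path)
        return path


def opendir_on_all(bricks, path):
    """Send opendir to every brick; return the names that failed with ENOENT."""
    failed = []
    for brick in bricks:
        try:
            brick.opendir(path)
        except OSError as exc:
            if exc.errno != errno.ENOENT:
                raise
            failed.append(brick.name)
    return failed


def lookup_with_heal(bricks, path):
    """Toy 'heal': if the directory exists anywhere, create it everywhere."""
    if any(path in brick.dirs for brick in bricks):
        for brick in bricks:
            brick.dirs.add(path)


# client0..client35 existed before add-brick; client36..client47 are new.
old_bricks = [Brick(f"client{i}") for i in range(36)]
new_bricks = [Brick(f"client{i}") for i in range(36, 48)]
for brick in old_bricks:
    brick.dirs.add("/dir1")
bricks = old_bricks + new_bricks

failed_before = opendir_on_all(bricks, "/dir1")  # only new bricks fail
lookup_with_heal(bricks, "/dir1")                # lookup heals the directory
failed_after = opendir_on_all(bricks, "/dir1")   # no brick fails now
```

In this model the ENOENT failures come only from the new bricks, and they disappear once a lookup that heals the directory runs before opendir, which mirrors the assumption made above.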

Also, Sakshi reported that she saw no issues with parallel rm -rf and rebalance after add-brick in rhgs-3.1.3.

Waiting for Karthick's confirmation of our observations.

[1] https://code.engineering.redhat.com/gerrit/61036
[2] http://review.gluster.org/14365

Please note that [2] is not a necessary fix for this bug (it is not present in rhgs-3.1.3). However, it solves the related issue of a directory having holes in its layout post lookup.

Comment 4 Prasad Desala 2016-10-25 05:30:24 UTC

This issue is no longer seen with glusterfs version 3.8.4-2.el7rhgs.x86_64.

Here are the steps that were followed:
1. Created a distributed-replicate volume and started it.
2. Created files and directories on it.
3. Added a few bricks to the volume.
4. Started rebalance with the start force option.
5. From the mount point, started deleting data using rm -rf *.
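
The steps above can be sketched as a dry-run script that only builds and prints the commands. This is a rough sketch, not the exact commands used in verification: the volume name, server names, brick paths, mount point, and replica count are all assumptions chosen for illustration.

```python
import shlex


def repro_commands(volume="testvol", mount="/mnt/testvol", replica=2):
    """Build (but do not run) the commands for the five steps.

    All names below (servers, brick paths, volume, mount) are hypothetical.
    """
    old_bricks = [
        "server1:/bricks/b1", "server2:/bricks/b1",
        "server1:/bricks/b2", "server2:/bricks/b2",
    ]
    new_bricks = ["server1:/bricks/b3", "server2:/bricks/b3"]
    in_mount = f"cd {shlex.quote(mount)} && "
    return [
        # 1. Create a distributed-replicate volume and start it.
        ["gluster", "volume", "create", volume, "replica", str(replica), *old_bricks],
        ["gluster", "volume", "start", volume],
        ["mount", "-t", "glusterfs", f"server1:/{volume}", mount],
        # 2. Create files and directories on it.
        ["sh", "-c", in_mount + "mkdir -p dir1/dir2 && touch dir1/file1 dir1/dir2/file2"],
        # 3. Add a few bricks (a multiple of the replica count).
        ["gluster", "volume", "add-brick", volume, *new_bricks],
        # 4. Start rebalance with the force option.
        ["gluster", "volume", "rebalance", volume, "start", "force"],
        # 5. Delete data from the mount point while rebalance is running.
        ["sh", "-c", in_mount + "rm -rf -- *"],
    ]


for command in repro_commands():
    print(shlex.join(command))
```

The script prints the commands instead of executing them, so it can be reviewed and adapted to a real test cluster before anything is run. The data set in step 2 would need to be large enough that rebalance is still in progress when step 5 starts.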

The error reported in this BZ was not seen, and the command completed successfully without any errors.
Hence, moving this BZ to Verified.
