Bug 1286042 - rebalance: rm -rf failed with error 'No such file or directory' for a few files and directories while rebalance is in progress
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: distribute
Hardware: x86_64
OS: Linux
Severity: high
Assigned To: Raghavendra G
dht-add-brick, dht-fops-while-rebal, ...
Keywords: Triaged, ZStream
Depends On: 1128737
Reported: 2015-11-27 05:11 EST by Susant Kumar Palai
Modified: 2017-03-25 10:24 EDT (History)
9 users

See Also:
Fixed In Version: 3.7.9-10
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1128737
Last Closed: 2017-03-25 10:24:17 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments: None
Comment 3 Raghavendra G 2016-07-13 03:21:39 EDT
From the logs posted in the initial bug report, opendir failed with ENOENT errors on the newly added bricks (client36 through client47). I didn't see any failure logs on bricks that were part of dht before the add-brick (client0 through client35). So I assume this bug is due to the directory not being healed after add-brick. There are fixes [1][2] in rhgs-3.1.3 that add healing of the directory and layout even in the nameless-lookup codepath. Since at least one nameless lookup is done on the gfid before opendir is sent in the new graph (which is aware of the new bricks), this issue should be fixed in 3.1.3.

Also, Sakshi reported that she didn't see any issues with parallel rm -rf and rebalance after add-brick in rhgs-3.1.3.

Waiting for Karthick's confirmation of our observations.

[1] https://code.engineering.redhat.com/gerrit/61036
[2] http://review.gluster.org/14365

Please note that [2] is not a necessary fix for this bug (it is not present in rhgs-3.1.3). However, it solves a related issue of the directory layout having holes after lookup.
Comment 4 Prasad Desala 2016-10-25 01:30:24 EDT
This issue is no longer seen with glusterfs version: 3.8.4-2.el7rhgs.x86_64.

Here are the steps that were followed:
1. Created a distributed-replicate volume and started it.
2. Created files and directories on it.
3. Added a few bricks to the volume.
4. Started rebalance with the "start force" option.
5. From the mount point, deleted the data using rm -rf *.
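The steps above can be sketched as shell commands. This is a minimal sketch, not the exact procedure from the test run: the server names, brick paths, volume name (testvol), replica count, and file layout are all hypothetical placeholders.

```shell
# Hedged sketch of the verification steps; server names, brick paths,
# and the volume name (testvol) are hypothetical placeholders.

# 1. Create a distributed-replicate volume (replica 2) and start it.
gluster volume create testvol replica 2 \
    server1:/bricks/b1 server2:/bricks/b1 \
    server1:/bricks/b2 server2:/bricks/b2
gluster volume start testvol

# Mount it on a client.
mount -t glusterfs server1:/testvol /mnt/testvol

# 2. Create files and directories from the mount point.
mkdir -p /mnt/testvol/dir{1..10}
for d in /mnt/testvol/dir*; do touch "$d"/file{1..100}; done

# 3. Add a few bricks (one replica pair) to the volume.
gluster volume add-brick testvol \
    server1:/bricks/b3 server2:/bricks/b3

# 4. Start rebalance with the force option.
gluster volume rebalance testvol start force

# 5. While rebalance is in progress, delete everything from the mount point.
rm -rf /mnt/testvol/*
```

On an affected build, step 5 would fail with 'No such file or directory' on some entries; on 3.8.4-2 the deletion completes cleanly.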

The error reported in this BZ was not seen, and the command completed successfully without any errors.
Hence, moving this BZ to Verified.
