Bug 1231195

Summary: rm -rf throws 'Is a directory' error for few directories while add-brick operation is done
Product: [Community] GlusterFS Reporter: Sakshi <sabansal>
Component: distribute Assignee: Nithya Balachandran <nbalacha>
Status: CLOSED WORKSFORME QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: mainline CC: amukherj, bugs, kramdoss, nbalacha, rgowdapp, smohan
Target Milestone: --- Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: dht-directory-consistency
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-08-30 16:18:27 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Sakshi 2015-06-12 11:25:23 UTC
Description of problem:
rm -rf throws an 'Is a directory' error for a few directories when an add-brick operation is performed while it is running.

Version-Release number of selected component (if applicable):


How reproducible:
Frequent

Steps to Reproduce:
1. Create a distribute volume and FUSE-mount it.
2. Start rm -rf on the mount point.
3. While the rm operation is running, run add-brick on the volume.
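
A minimal sketch of the steps above, written as a Python helper that only builds the command lines and does not run them. The volume name, host name, mount point, brick paths and brick counts are hypothetical placeholders, not values from this report. The steps also assume the mount already holds a directory tree to delete, and populating it is left out here.

```python
# Sketch only: VOLUME, HOST, MOUNT and the brick paths are hypothetical
# placeholders, and the brick counts are arbitrary.
VOLUME = "testvol"
HOST = "server1"
MOUNT = "/mnt/testvol"


def bricks(first, last):
    """Return brick specs HOST:/bricks/brickN for N in first..last."""
    return [f"{HOST}:/bricks/brick{n}" for n in range(first, last + 1)]


def reproduction_commands(initial=3, final=7):
    """Build the shell command lines for the three steps, in order."""
    return [
        # Step 1: create and start a pure distribute volume, then FUSE-mount it.
        " ".join(["gluster", "volume", "create", VOLUME] + bricks(1, initial)),
        f"gluster volume start {VOLUME}",
        f"mount -t glusterfs {HOST}:/{VOLUME} {MOUNT}",
        # Step 2: start the recursive delete in the background.
        f"rm -rf {MOUNT}/* &",
        # Step 3: expand the volume while rm is still running.
        " ".join(["gluster", "volume", "add-brick", VOLUME]
                 + bricks(initial + 1, final)),
    ]


if __name__ == "__main__":
    for line in reproduction_commands():
        print(line)
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere. The timing of step 3 relative to step 2 is what matters for the race, so on a real setup the add-brick has to be issued while rm is still working through the tree.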

Actual results:
rm starts throwing 'Is a directory' error for a few directories and does not delete them.

Expected results:
rm -rf should proceed without errors.

Observation:
A lookup is issued (as part of the rm operation) on, say, 'child_dir'. Since some subvols do not have the directory created, a selfheal is triggered. However, because of the ongoing rm operation, the parent directory of 'child_dir' may already have been deleted on a particular subvol. Selfheal then returns an ESTALE error and an unlink is done on the directory. This makes the application throw an 'Is a directory' error for that 'child_dir'.
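
For context on the error text itself: 'Is a directory' is the message for EISDIR, which Linux returns when unlink() is called on a directory (POSIX allows EPERM instead, so other platforms may differ). The sketch below shows this on a local temporary directory. It does not involve Gluster, and the helper name is hypothetical.

```python
import errno
import os
import tempfile


def unlink_directory_errno():
    """Call unlink() on a directory and return the errno name it fails with."""
    with tempfile.TemporaryDirectory() as root:
        child = os.path.join(root, "child_dir")
        os.mkdir(child)
        try:
            # unlink() is only valid for non-directories; rmdir() is the
            # call that removes a directory.
            os.unlink(child)
        except OSError as exc:
            return errno.errorcode[exc.errno]
    return None


if __name__ == "__main__":
    print(unlink_directory_errno())
```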

Additional info:

Comment 3 Raghavendra G 2017-01-31 04:09:49 UTC
(In reply to Sakshi from comment #0)
> 
> Observation:
> A lookup is issued(as a part of the rm operation) on say 'child_dir'. Since
> some subvols do not have the directory created, a selfheal is triggered.
> However due to the on-going rm operation it may so happen that the parent
> directory of 'child_dir' on a particular subvol may be deleted. Hence
> selfheal returns ESTALE error and an unlink on the directory is done. This
> makes the application throw 'Is a directory' error for that 'child_dir'.

The flaw in this argument is that:
1. The lookup heal is issued as part of "rmdir child_dir".
2. "rmdir parent_dir" won't be issued until "rmdir child_dir" is complete.

So 1 and 2 can't happen in parallel, but the RCA requires that they do. Hence I think the actual root cause is something different.
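
The ordering in points 1 and 2 can be illustrated with a small single-threaded depth-first delete on a local filesystem. This is only a sketch of how a client such as rm walks a tree, assuming one process and no concurrent deleters. It says nothing about DHT internals, and the function names are hypothetical.

```python
import os
import tempfile


def rm_rf(path, log):
    """Depth-first removal that records each operation in `log`."""
    with os.scandir(path) as it:
        entries = list(it)
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            # Children are fully removed before the parent is touched.
            rm_rf(entry.path, log)
        else:
            os.unlink(entry.path)
            log.append(("unlink", entry.path))
    # Only reached once every entry under `path` is gone.
    os.rmdir(path)
    log.append(("rmdir", path))


def demo():
    """Build parent_dir/child_dir/f, delete it, and return the operation log."""
    root = tempfile.mkdtemp()
    parent = os.path.join(root, "parent_dir")
    child = os.path.join(parent, "child_dir")
    os.makedirs(child)
    with open(os.path.join(child, "f"), "w"):
        pass
    log = []
    rm_rf(parent, log)
    os.rmdir(root)
    return [(op, os.path.relpath(p, root)) for op, p in log]


if __name__ == "__main__":
    for op, path in demo():
        print(op, path)
```

In the recorded log, the rmdir on child_dir always precedes the rmdir on parent_dir, which is the sequencing the argument above relies on.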

Comment 4 Nithya Balachandran 2017-08-30 16:18:27 UTC
I am unable to reproduce this with the latest master. 
I used a single node cluster and a pure distribute volume which I expanded from 3 bricks to 7 bricks during the rm -rf operation.


As there are no logs available, I am closing this with the resolution WorksForMe.

Please file a new BZ if you hit this again.