Bug 1231195 - rm -rf throws 'Is a directory' error for few directories while add-brick operation is done
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Nithya Balachandran
QA Contact:
URL:
Whiteboard: dht-directory-consistency
Depends On:
Blocks:
Reported: 2015-06-12 11:25 UTC by Sakshi
Modified: 2017-08-30 16:18 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-30 16:18:27 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Sakshi 2015-06-12 11:25:23 UTC
Description of problem:
rm -rf throws an 'Is a directory' error for a few directories when an add-brick operation is performed while the rm is running.

Version-Release number of selected component (if applicable):


How reproducible:
Frequent

Steps to Reproduce:
1. Create a distribute volume and FUSE-mount it.
2. Start rm -rf on the mount point.
3. While the rm operation is in progress, run add-brick on the volume.
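
The steps above can be sketched as a small Python helper, shown here only as an illustration. The volume name, host name, brick paths and mount point are placeholders that do not come from this report, and the brick counts are arbitrary. The script defaults to a dry run that only prints the commands. A real run would also need the mount point populated with nested directories first, and `gluster volume create` may need `force` depending on where the bricks live.

```python
import shlex
import subprocess


def repro_commands(volume="testvol", host="server1", mount="/mnt/testvol",
                   bricks=("/bricks/b1", "/bricks/b2", "/bricks/b3"),
                   new_bricks=("/bricks/b4",)):
    """Return (setup, remove, expand) as argv lists.

    Every default here is a hypothetical placeholder, not a value from the bug.
    """
    initial = [f"{host}:{b}" for b in bricks]
    added = [f"{host}:{b}" for b in new_bricks]
    setup = [
        # No replica/disperse count is given, so this is a pure distribute volume.
        ["gluster", "volume", "create", volume, *initial],
        ["gluster", "volume", "start", volume],
        # Step 1: FUSE mount.
        ["mount", "-t", "glusterfs", f"{host}:/{volume}", mount],
    ]
    # Step 2: rm -rf on the mount point (run through sh so the glob expands).
    remove = ["sh", "-c", f"rm -rf {shlex.quote(mount)}/*"]
    # Step 3: add-brick while the rm is still running.
    expand = ["gluster", "volume", "add-brick", volume, *added]
    return setup, remove, expand


def run(dry_run=True):
    setup, remove, expand = repro_commands()
    if dry_run:
        for cmd in [*setup, remove, expand]:
            print(" ".join(cmd))
        return
    for cmd in setup:
        subprocess.run(cmd, check=True)
    rm = subprocess.Popen(remove)        # rm runs in the background
    subprocess.run(expand, check=True)   # add-brick overlaps with it
    rm.wait()


if __name__ == "__main__":
    run(dry_run=True)
```

On a small tree the rm may finish before add-brick starts, so a large, deeply nested directory tree is probably needed to hit the window.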

Actual results:
rm starts throwing 'Is a directory' errors for a few directories and does not delete them.

Expected results:
rm -rf should proceed without errors.

Observation:
A lookup is issued (as part of the rm operation) on, say, 'child_dir'. Since some subvols do not yet have the directory, a selfheal is triggered. However, because the rm operation is still running, the parent directory of 'child_dir' may already have been deleted on a particular subvol. The selfheal then returns ESTALE and an unlink is issued on the directory. This makes the application report an 'Is a directory' error for that 'child_dir'.

Additional info:

Comment 3 Raghavendra G 2017-01-31 04:09:49 UTC
(In reply to Sakshi from comment #0)
> 
> Observation:
> A lookup is issued(as a part of the rm operation) on say 'child_dir'. Since
> some subvols do not have the directory created, a selfheal is triggered.
> However due to the on-going rm operation it may so happen that the parent
> directory of 'child_dir' on a particular subvol may be deleted. Hence
> selfheal returns ESTALE error and an unlink on the directory is done. This
> makes the application throw 'Is a directory' error for that 'child_dir'.

The flaw in this argument is that:
1. The lookup heal is issued as part of "rmdir child_dir".
2. "rmdir parent_dir" won't be issued until "rmdir child_dir" is complete.

So 1 and 2 can't happen in parallel, but the RCA requires that they do. Hence I think the actual root cause is something different.
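
The ordering this argument relies on can be illustrated with a minimal Python sketch of a single sequential client removing a tree bottom-up, the way rm -rf must, since a directory can only be removed once it is empty. This runs on a local temporary directory and models only the client-side call order, not GlusterFS or DHT internals. The helper names are made up for the illustration.

```python
import os
import tempfile


def rmdir_order(root):
    """Remove the tree under root bottom-up; return directory names in rmdir order."""
    order = []
    # topdown=False yields each directory only after all of its subdirectories.
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            os.unlink(os.path.join(dirpath, name))
        os.rmdir(dirpath)  # succeeds only because the children are already gone
        order.append(os.path.basename(dirpath))
    return order


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.join(tmp, "parent_dir")
        os.makedirs(os.path.join(root, "child_dir"))
        return rmdir_order(root)


if __name__ == "__main__":
    print(demo())
```

Here the rmdir of child_dir always completes before the rmdir of parent_dir is issued, which is the serialization that points 1 and 2 describe.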

Comment 4 Nithya Balachandran 2017-08-30 16:18:27 UTC
I am unable to reproduce this with the latest master. 
I used a single node cluster and a pure distribute volume which I expanded from 3 bricks to 7 bricks during the rm -rf operation.


As there are no logs available, I am closing this with the resolution WorksForMe.

Please file a new BZ if you hit this again.

