+++ This bug was initially created as a clone of Bug #1676400 +++

Description of problem:
When 2 clients run `rm -rf <dirname>` concurrently, the operation sometimes fails with "Directory not empty". An ls on the directory from the gluster mount point does not show any entries; however, there are directories on some of the bricks.

Version-Release number of selected component (if applicable):

How reproducible:
Rare. This is a race condition.

Steps to Reproduce:
1. Create a 3x (2+1) arbiter volume and fuse mount it. Make sure lookup-optimize is enabled.
2. mkdir -p dir0/dir1/dir2
3. Unmount and remount the volume to ensure a fresh lookup is sent. GDB into the fuse process and set a breakpoint at dht_lookup.
4. From the client mount: rm -rf dir0
5. When gdb breaks at dht_lookup for dir0/dir1/dir2, set a breakpoint at dht_lookup_cbk. Allow the process to continue until it hits dht_lookup_cbk. dht_lookup_cbk will return with op_ret = 0.
6. Delete dir0/dir1/dir2 from every brick on the non-hashed subvols.
7. Set a breakpoint in dht_selfheal_dir_mkdir and allow gdb to continue.
8. When the process breaks at dht_selfheal_dir_mkdir, delete the directory from the hashed subvolume bricks.
9. In dht_selfheal_dir_mkdir_lookup_cbk, set a breakpoint at the line:
       if (local->selfheal.hole_cnt == layout->cnt) {
   When gdb breaks at this point, set local->selfheal.hole_cnt to a value different from that of layout->cnt. Allow gdb to proceed.

DHT will create the directories only on the non-hashed subvolumes, as the layout has not been updated to indicate that the dir no longer exists on the hashed subvolume. This directory will no longer be visible on the mount point, causing the rm -rf to fail.

Actual results:
[root@server fuse1]# rm -rf dir0
rm: cannot remove ‘dir0/dir1’: Directory not empty

Expected results:
rm -rf should succeed.

Additional info:
As lookup-optimize is enabled, subsequent lookups cannot heal the directory.
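For reference, steps 1-2 above can be sketched as follows. This is an untested illustration, not a script from the bug: the host names (host1..host3), brick paths, volume name (testvol), and mount point are all invented, and the commands assume a working 3-node trusted storage pool.

```shell
# Create a 3x (2+1) arbiter volume: 3 distribute subvols, each a replica-3
# set whose third brick is the arbiter (9 bricks total).
gluster volume create testvol replica 3 arbiter 1 \
    host1:/bricks/b0 host2:/bricks/b0 host3:/bricks/b0 \
    host1:/bricks/b1 host2:/bricks/b1 host3:/bricks/b1 \
    host1:/bricks/b2 host2:/bricks/b2 host3:/bricks/b2
gluster volume start testvol

# Step 1: make sure lookup-optimize is enabled.
gluster volume set testvol cluster.lookup-optimize on

# Fuse mount and create the test directories (step 2).
mount -t glusterfs host1:/testvol /mnt/fuse1
mkdir -p /mnt/fuse1/dir0/dir1/dir2
```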
The same steps with lookup-optimize disabled will work, as a subsequent lookup will look up everywhere even if the entry does not exist on the hashed subvol.

--- Additional comment from Nithya Balachandran on 2019-02-12 08:08:31 UTC ---

RCA for the invisible directory left behind with concurrent rm -rf:
--------------------------------------------------------------------

dht_selfheal_dir_mkdir_lookup_cbk (...)
{
...
1381         this_call_cnt = dht_frame_return (frame);
1382
1383         LOCK (&frame->lock);
1384         {
1385                 if ((op_ret < 0) &&
1386                     (op_errno == ENOENT || op_errno == ESTALE)) {
1387                         local->selfheal.hole_cnt = !local->selfheal.hole_cnt ? 1
1388                                                    : local->selfheal.hole_cnt + 1;
1389                 }
1390
1391                 if (!op_ret) {
1392                         dht_iatt_merge (this, &local->stbuf, stbuf, prev);
1393                 }
1394                 check_mds = dht_dict_get_array (xattr, conf->mds_xattr_key,
1395                                                 mds_xattr_val, 1, &errst);
1396                 if (dict_get (xattr, conf->mds_xattr_key) && check_mds && !errst) {
1397                         dict_unref (local->xattr);
1398                         local->xattr = dict_ref (xattr);
1399                 }
1400
1401         }
1402         UNLOCK (&frame->lock);
1403
1404         if (is_last_call (this_call_cnt)) {
1405                 if (local->selfheal.hole_cnt == layout->cnt) {
1406                         gf_msg_debug (this->name, op_errno,
1407                                       "Lookup failed, an rmdir could have "
1408                                       "deleted this entry %s", loc->name);
1409                         local->op_errno = op_errno;
1410                         goto err;
1411                 } else {
1412                         for (i = 0; i < layout->cnt; i++) {
1413                                 if (layout->list[i].err == ENOENT ||
1414                                     layout->list[i].err == ESTALE ||
1415                                     local->selfheal.force_mkdir)
1416                                         missing_dirs++;
1417                         }

There are 2 problems here:

1. The layout is not updated with the new subvol status on error. In this case, the initial lookup found a directory on the hashed subvol, so only 2 entries in the layout indicate missing directories. However, by the time the selfheal code is executed, the racing rmdir has deleted the directory from all the subvols.
At this point, the directory does not exist on any subvol and dht_selfheal_dir_mkdir_lookup_cbk gets an error from all 3 subvols, but this new status is not updated in the layout, which still has only 2 missing dirs marked.

2. this_call_cnt = dht_frame_return (frame); is called before processing the frame. So with a call cnt of 3, it is possible that the second response has reached line 1404 before the third one has started processing the return values. At this point, local->selfheal.hole_cnt != layout->cnt, so control goes to line 1412. At line 1412, since we are still using the old layout, only the directories on the non-hashed subvols are considered when incrementing missing_dirs and for the healing.

The combination of these two causes the selfheal to start healing the directories on the non-hashed subvols. It succeeds in creating the dirs on the non-hashed subvols. However, to set the layout, dht takes an inodelk on the hashed subvol, which fails because the directory does not exist there. We therefore end up with directories on the non-hashed subvols with no layouts set.

--- Additional comment from Worker Ant on 2019-02-12 08:34:01 UTC ---

REVIEW: https://review.gluster.org/22195 (cluster/dht: Fix lookup selfheal and rmdir race) posted (#1) for review on master by N Balachandran

--- Additional comment from Worker Ant on 2019-02-13 18:20:26 UTC ---

REVIEW: https://review.gluster.org/22195 (cluster/dht: Fix lookup selfheal and rmdir race) merged (#3) on master by Raghavendra G
REVIEW: https://review.gluster.org/22486 (cluster/dht: Fix lookup selfheal and rmdir race) posted (#1) for review on release-5 by N Balachandran
REVIEW: https://review.gluster.org/22486 (cluster/dht: Fix lookup selfheal and rmdir race) merged (#2) on release-5 by Shyamsundar Ranganathan
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-5.6, please open a new bug report.

glusterfs-5.6 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-April/000123.html
[2] https://www.gluster.org/pipermail/gluster-users/