Description of problem:
While deleting data from a Distributed volume (NFS mount), rm -rf failed for a few directories with a 'Directory not empty' error.

[u2@rhs-client22 nfsdht]$ rm -rf mv7
rm: cannot remove `mv7/8/etc1': Directory not empty
rm: cannot remove `mv7/8/etc3': Directory not empty
rm: cannot remove `mv7/8/etc5': Directory not empty
rm: cannot remove `mv7/8/etc6/pki/tls/misc': Directory not empty
rm: cannot remove `mv7/8/etc7': Directory not empty

--> Checking on the backend showed that the data (directories and files) is present on one brick/sub-volume, but the data inside that directory is not accessible from the mount point.

[root@OVM1 brick1]# ls /rhs/brick1/*/mv7/8/etc3/
/rhs/brick1/212/mv7/8/etc3/:

/rhs/brick1/d1/mv7/8/etc3/:
testdata

Version-Release number of selected component (if applicable):
3.4.0.59rhs-1.el6rhs.x86_64

How reproducible:
2/5

Steps to Reproduce:
1. Had a Distributed volume.
2. Created many files and directories on it. 4 out of 5 bricks were 100% full.
3. Started deleting data from the mount using rm -rf. rm -rf failed for a few directories:

[u2@rhs-client22 nfsdht]$ rm -rf mv7
rm: cannot remove `mv7/8/etc1': Directory not empty
rm: cannot remove `mv7/8/etc3': Directory not empty
rm: cannot remove `mv7/8/etc5': Directory not empty
rm: cannot remove `mv7/8/etc6/pki/tls/misc': Directory not empty
rm: cannot remove `mv7/8/etc7': Directory not empty

--> The mount point did not show any data for a failed directory:

[root@rhs-client22 24]# ls /mnt/dht1/mv7/8/etc3
[root@rhs-client22 24]# ls -l /mnt/dht1/mv7/8/etc3
total 0

--> Verified on the bricks: one brick has data for that directory.

Bricks:

[root@OVM2 brick1]# ls /rhs/brick1/*/mv7/8/etc3/
/rhs/brick1/d1/mv7/8/etc3/:

/rhs/brick1/d2/mv7/8/etc3/:

[root@OVM1 brick1]# ls -l /rhs/brick1/*/mv7/8/etc3
/rhs/brick1/212/mv7/8/etc3:
total 0

/rhs/brick1/d1/mv7/8/etc3:
total 8
drwxrwxr-x 51 503 503 8192 Feb  6 14:49 testdata    <------------ data present

[root@OVM3 brick1]# ls -l /rhs/brick1/*/mv7/8/etc3
total 0

Layout info for that directory:

Bricks:

getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/d1/mv7/8/etc3/
trusted.gfid=0xb65142294a2d46dc9f884072ade87cab
trusted.glusterfs.dht=0x000000010000000099999999cccccccb

# file: rhs/brick1/d2/mv7/8/etc3/
trusted.gfid=0xb65142294a2d46dc9f884072ade87cab
trusted.glusterfs.dht=0x00000001000000006666666699999998

[root@OVM1 brick1]# getfattr -d -m . -e hex /rhs/brick1/*/mv7/8/etc3/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/212/mv7/8/etc3/
trusted.gfid=0xb65142294a2d46dc9f884072ade87cab
trusted.glusterfs.dht=0x0000000100000000ccccccccffffffff

# file: rhs/brick1/d1/mv7/8/etc3/
trusted.gfid=0xb65142294a2d46dc9f884072ade87cab
trusted.glusterfs.dht=0x00000001000000003333333366666665

[root@OVM1 brick1]# getfattr -d -m . -e hex /rhs/brick1/*/mv7/8/etc3/tes*
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/d1/mv7/8/etc3/testdata
trusted.gfid=0x3d44cbc52bc64139a536303ab7f2a4cb
trusted.glusterfs.dht=0x00000001000000007fffffffffffffff

[root@OVM3 brick1]# getfattr -d -m . -e hex /rhs/brick1/*/mv7/8/etc3/
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/d1/mv7/8/etc3/
trusted.gfid=0xb65142294a2d46dc9f884072ade87cab
trusted.glusterfs.dht=0x00000001000000000000000033333332

Actual results:
rm -rf failed for a few directories with a 'Directory not empty' error on the mount point, and the data inside those directories is not accessible from the mount point.

Expected results:
rm -rf should remove all data and should not fail.
If any directory has data, that data should be visible on the mount point.

Additional info:
Bug 960910 talks about rm -rf failing with the same error, but looking at comment #7 onwards that is a case of stale link files (the backend has only stale link files). In this case the backend has directories and files (not stale link files), and those are not accessible from the mount point, so a new bug was opened.
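Note for reading the layout info above: a trusted.glusterfs.dht value is normally 16 bytes made up of four network-order 32-bit fields: a range count, a second field used for the layout type/commit hash (0 in these dumps), and the start and stop of the brick's hash range. For example, 0x000000010000000099999999cccccccb reads as count=1 and range [0x99999999, 0xcccccccb]. Below is a minimal stand-alone decoder sketch assuming that on-disk format; the helper itself is hypothetical and not part of GlusterFS.

/* Hypothetical helper, not part of GlusterFS: decode a hex-dumped
 * trusted.glusterfs.dht value into its four 32-bit fields. */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    const char *hex = (argc > 1) ? argv[1]
                                 : "0x000000010000000099999999cccccccb";

    if (strncmp(hex, "0x", 2) == 0)
        hex += 2;
    if (strlen(hex) != 32) {
        fprintf(stderr, "expected a 16-byte (32 hex digit) value\n");
        return 1;
    }

    uint32_t field[4];                  /* count, type, start, stop */
    for (int i = 0; i < 4; i++) {
        char buf[9];
        memcpy(buf, hex + i * 8, 8);
        buf[8] = '\0';
        field[i] = (uint32_t) strtoul(buf, NULL, 16);
    }

    printf("count=%u type=%u range=[0x%08x, 0x%08x]\n",
           (unsigned) field[0], (unsigned) field[1],
           (unsigned) field[2], (unsigned) field[3]);
    return 0;
}

Run against each value dumped above, this shows the hash range each brick holds for the directory.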
Sent one possible fix for the bug: http://review.gluster.org/#/c/7733/. The fix addresses the following issue.

* POSIX_READDIRP fills in the stat information for all the entries present in the directory. If lstat of an entry fails, it used to fill the stat information of the current entry with that of the previous entry read. For example, say the current entry is a file and the previous entry read was a directory: if lstat of the current file fails, the stat info for the current file is filled with that of the previous directory, so the file is treated as a directory.

Since dht_readdirp takes directory entries only from the first up subvolume, one of the following two scenarios may then happen:

1) If the file (now a directory for DHT because of the wrong stat) is not present on the first up subvolume, it won't be processed for deletion.

2) Even if it is present on the first up subvolume, an rmdir call will be issued for the file (with the corrupted stat), which results in a "Not a directory" error, and we then see a "Directory not empty" error while trying to unlink the parent directory.

*** This bug has been marked as a duplicate of bug 960910 ***
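To make the failure mode concrete, here is an illustrative sketch in C. It is not the actual posix.c code from the patch; the structure and function names are simplified and hypothetical. The point is the pattern: the stat buffer is reused across loop iterations, so on an lstat failure the current entry silently inherits the previous entry's stat unless the buffer is cleared or the entry is skipped, which is the kind of change the fix makes.

/*
 * Illustrative sketch only -- not the actual posix.c code.
 * `stbuf' is reused for every directory entry.  In the buggy version,
 * an lstat() failure left `stbuf' holding the previous entry's stat,
 * so the current entry inherited it (e.g. a regular file reported to
 * DHT as a directory).  Clearing it on failure, as below, avoids that.
 */
#include <string.h>
#include <sys/stat.h>

struct dir_entry {
    const char       *path;
    struct stat       st;
    struct dir_entry *next;
};

void fill_readdirp_stats(struct dir_entry *entries)
{
    struct stat stbuf;                  /* reused across iterations */

    memset(&stbuf, 0, sizeof(stbuf));

    for (struct dir_entry *e = entries; e; e = e->next) {
        if (lstat(e->path, &stbuf) != 0) {
            /* Buggy version fell through here with the stale `stbuf'
             * from the previous entry.  Reset it instead so this entry
             * does not pick up another entry's file type. */
            memset(&stbuf, 0, sizeof(stbuf));
        }
        e->st = stbuf;                  /* per-entry stat copied out */
    }
}

With the stale-stat behaviour, a file whose lstat fails right after a directory entry is reported to DHT with the directory's file type, which leads directly to scenarios 1) and 2) above.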
Marked as a duplicate because the fix http://review.gluster.org/#/c/7733/ is a possible fix for both the "Directory not empty" and "Is a directory" errors. Please reopen this bug if it is reproduced in future.