Bug 960910 - "rm -rf" failed to remove directory complained "directory not empty" from fuse mount
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Sakshi
QA Contact: shylesh
URL:
Whiteboard: dht-rm-rf , triaged
Duplicates: 1065287
Depends On: 1115367 1245065 1257894
Blocks: 966848 1096578
 
Reported: 2013-05-08 09:49 UTC by Rahul Hinduja
Modified: 2019-11-14 06:21 UTC (History)

Fixed In Version: glusterfs-3.6.0.14-1.el6rhs
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 966848 1096578
Environment:
Last Closed: 2016-01-01 11:48:15 UTC
Embargoed:
spalai: needinfo-



Description Rahul Hinduja 2013-05-08 09:49:32 UTC
Description of problem:
=======================

Stopped an in-progress rebalance with "rebalance stop", then tried to delete the data from the fuse mount using rm -rf *. The removal failed with "Directory not empty".

Output:
=======

[root@darrel fuse]# rm -rf *
rm: cannot remove `etc.10/latrace.d': Directory not empty
rm: cannot remove `etc.10/selinux/targeted/modules/active/modules': Directory not empty


Version-Release number of selected component (if applicable):
=============================================================

[root@darrel ~]# rpm -qa | grep gluster
glusterfs-fuse-3.4.0.4rhs-1.el6.x86_64
glusterfs-3.4.0.4rhs-1.el6.x86_64
glusterfs-debuginfo-3.4.0.4rhs-1.el6.x86_64
glusterfs-devel-3.4.0.4rhs-1.el6.x86_64
glusterfs-rdma-3.4.0.4rhs-1.el6.x86_64
[root@darrel ~]# 


Steps carried out:
==================
1. Created 6*2 volume named vol-dis-rep
2. Fuse mount on client (darrel) at /mnt/fuse
3. Created lots of directories and files from the fuse mount as
"for i in {1..100}; do cp -rf /etc etc.$i ; done"
4. Converted volume from 6*2 to 7*2 by adding 2 bricks
5. Started rebalance using "gluster volume rebalance vol-dis-rep start force"
6. After about 1.5 hours, confirmed that the rebalance was still in progress.
7. Stopped the rebalance. Confirmed that it had stopped, both from the rebalance status CLI output and by checking that no rebalance process was running on any of the servers.
8. Tried to delete the files and directories from the fuse mount using "rm -rf *"

 
Actual results:
===============

[root@darrel fuse]# rm -rf *
rm: cannot remove `etc.10/latrace.d': Directory not empty
rm: cannot remove `etc.10/selinux/targeted/modules/active/modules': Directory not empty
rm: cannot remove `etc.10/rc.d/rc3.d': Directory not empty
rm: cannot remove `etc.10/libreport/events.d': Directory not empty
rm: cannot remove `etc.100/udev/rules.d': Directory not empty
rm: cannot remove `etc.100/selinux/targeted/modules/active/modules': Directory not empty
rm: cannot remove `etc.11/selinux/targeted/modules/active/modules': Directory not empty
rm: cannot remove `etc.11/rc.d/init.d': Directory not empty
rm: cannot remove `etc.11/init': Directory not empty
rm: cannot remove `etc.12/rc.d/init.d': Directory not empty
rm: cannot remove `etc.12/rc.d/rc4.d': Directory not empty
rm: cannot remove `etc.13/selinux/targeted/modules/active/modules': Directory not empty
rm: cannot remove `etc.13/rc.d/init.d': Directory not empty
rm: cannot remove `etc.13/rc.d/rc4.d': Directory not empty
rm: cannot remove `etc.13/rc.d/rc5.d': Directory not empty
rm: cannot remove `etc.15/rc.d/rc6.d': Directory not empty
rm: cannot remove `etc.15/postfix': Directory not empty
rm: cannot remove `etc.16/rc.d/rc2.d': Directory not empty
rm: cannot remove `etc.17/fonts/conf.avail': Directory not empty
rm: cannot remove `etc.17/profile.d': Directory not empty
rm: cannot remove `etc.18/sysconfig/ha/web/secure': Directory not empty
rm: cannot remove `etc.19/security': Directory not empty
rm: cannot remove `etc.20/fonts/conf.d': Directory not empty
rm: cannot remove `etc.20/dbus-1/system.d': Directory not empty
rm: cannot remove `etc.22/selinux/targeted/modules/active/modules': Directory not empty
rm: cannot remove `etc.23/libreport/events.d': Directory not empty
rm: cannot remove `etc.25/rc.d/rc0.d': Directory not empty
rm: cannot remove `etc.26/makedev.d': Directory not empty
rm: cannot remove `etc.26/latrace.d': Directory not empty
rm: cannot remove `etc.27/init': Directory not empty
rm: cannot remove `etc.27/alternatives': Directory not empty
rm: cannot remove `etc.29/rc.d/init.d': Directory not empty
rm: cannot remove `etc.30/selinux/targeted/modules/active/modules': Directory not empty
rm: cannot remove `etc.30/sysconfig': Directory not empty
rm: cannot remove `etc.31/selinux/targeted/modules/active/modules': Directory not empty
rm: cannot remove `etc.31/pam.d': Directory not empty
rm: cannot remove `etc.32/rc.d/init.d': Directory not empty
rm: cannot remove `etc.34/sysconfig': Directory not empty
rm: cannot remove `etc.34/rc.d/rc5.d': Directory not empty
rm: cannot remove `etc.35/sysconfig': Directory not empty
rm: cannot remove `etc.35/alternatives': Directory not empty
rm: cannot remove `etc.37/fonts/conf.avail': Directory not empty
rm: cannot remove `etc.38/makedev.d': Directory not empty
rm: cannot remove `etc.39/oddjobd.conf.d': Directory not empty
rm: cannot remove `etc.40/rc.d/rc5.d': Directory not empty
rm: cannot remove `etc.41/rc.d/rc3.d': Directory not empty
rm: cannot remove `etc.41/rc.d/rc0.d': Directory not empty
rm: cannot remove `etc.44/postfix': Directory not empty
rm: cannot remove `etc.45/sysconfig': Directory not empty
rm: cannot remove `etc.47/fonts/conf.avail': Directory not empty
rm: cannot remove `etc.47/rc.d/rc5.d': Directory not empty
rm: cannot remove `etc.48/rc.d/init.d': Directory not empty
rm: cannot remove `etc.5/rc.d/init.d': Directory not empty
rm: cannot remove `etc.50/rc.d/rc3.d': Directory not empty
rm: cannot remove `etc.51/rc.d/rc4.d': Directory not empty
rm: cannot remove `etc.51/rc.d/rc5.d': Directory not empty
rm: cannot remove `etc.52/X11/fontpath.d': Directory not empty
rm: cannot remove `etc.53/rc.d/rc4.d': Directory not empty
rm: cannot remove `etc.53/rc.d/rc2.d': Directory not empty
rm: cannot remove `etc.54/lsb-release.d': Directory not empty
rm: cannot remove `etc.54/iproute2': Directory not empty
rm: cannot remove `etc.55/sysconfig': Directory not empty
rm: cannot remove `etc.56/alternatives': Directory not empty
rm: cannot remove `etc.57/rc.d/rc1.d': Directory not empty
rm: cannot remove `etc.58/rc.d/init.d': Directory not empty
rm: cannot remove `etc.58/rc.d/rc4.d': Directory not empty
rm: cannot remove `etc.59/rc.d/rc1.d': Directory not empty
rm: cannot remove `etc.6/pam.d': Directory not empty
rm: cannot remove `etc.60/selinux/targeted/modules/active/modules': Directory not empty
rm: cannot remove `etc.61/logrotate.d': Directory not empty
rm: cannot remove `etc.61/pam.d': Directory not empty
rm: cannot remove `etc.63/libreport/events.d': Directory not empty
rm: cannot remove `etc.64/fonts/conf.avail': Directory not empty
rm: cannot remove `etc.64/alternatives': Directory not empty
rm: cannot remove `etc.65/sysconfig/network-scripts': Directory not empty
rm: cannot remove `etc.65/rc.d/init.d': Directory not empty
rm: cannot remove `etc.65/rc.d/rc0.d': Directory not empty
rm: cannot remove `etc.66/sysconfig': Directory not empty
rm: cannot remove `etc.66/init': Directory not empty

rm: cannot remove `etc.7/selinux/targeted/modules/active/modules': Directory not empty
rm: cannot remove `etc.7/rc.d/init.d': Directory not empty
rm: cannot remove `etc.7/dbus-1/system.d': Directory not empty
rm: cannot remove `etc.71': Directory not empty
rm: cannot remove `etc.72/selinux/targeted/modules/active/modules': Directory not empty
rm: cannot remove `etc.74/sysconfig': Directory not empty
rm: cannot remove `etc.74/pam.d': Directory not empty
rm: cannot remove `etc.75/fonts/conf.d': Directory not empty
rm: cannot remove `etc.75/rc.d/rc1.d': Directory not empty
rm: cannot remove `etc.75/pam.d': Directory not empty
rm: cannot remove `etc.76/profile.d': Directory not empty
rm: cannot remove `etc.76/rc.d/rc4.d': Directory not empty
rm: cannot remove `etc.76/rc.d/rc2.d': Directory not empty
rm: cannot remove `etc.77/rc.d/rc4.d': Directory not empty
rm: cannot remove `etc.77/rc.d/rc0.d': Directory not empty
rm: cannot remove `etc.77/ssh': Directory not empty
rm: cannot remove `etc.79/rc.d/rc3.d': Directory not empty
rm: cannot remove `etc.8/ppp': Directory not empty
rm: cannot remove `etc.8/selinux/targeted/modules/active/modules': Directory not empty
rm: cannot remove `etc.8/rc.d/init.d': Directory not empty
rm: cannot remove `etc.8/rc.d/rc5.d': Directory not empty
rm: cannot remove `etc.8/ssh': Directory not empty
rm: cannot remove `etc.82/rc.d/init.d': Directory not empty
rm: cannot remove `etc.82/init': Directory not empty
rm: cannot remove `etc.83/sysconfig/network-scripts': Directory not empty
rm: cannot remove `etc.83/rc.d/init.d': Directory not empty
rm: cannot remove `etc.83/rc.d/rc5.d': Directory not empty
rm: cannot remove `etc.84/rc.d/rc0.d': Directory not empty
rm: cannot remove `etc.86/fonts/conf.avail': Directory not empty
rm: cannot remove `etc.86/alternatives': Directory not empty
rm: cannot remove `etc.88/rc.d/init.d': Directory not empty
rm: cannot remove `etc.88/alternatives': Directory not empty
rm: cannot remove `etc.89/sysconfig/network-scripts': Directory not empty
rm: cannot remove `etc.89/sysconfig/ha/web/secure': Directory not empty
rm: cannot remove `etc.89/rc.d/init.d': Directory not empty
rm: cannot remove `etc.89/rc.d/rc1.d': Directory not empty
rm: cannot remove `etc.89/rc.d/rc0.d': Directory not empty
rm: cannot remove `etc.89/init': Directory not empty
rm: cannot remove `etc.90/fonts/conf.d': Directory not empty
rm: cannot remove `etc.90/makedev.d': Directory not empty
rm: cannot remove `etc.90/latrace.d': Directory not empty
rm: cannot remove `etc.90/sysconfig': Directory not empty
rm: cannot remove `etc.91/sysconfig/ha/web/secure': Directory not empty
rm: cannot remove `etc.92/selinux/targeted/modules/active/modules': Directory not empty
rm: cannot remove `etc.92/alternatives': Directory not empty
rm: cannot remove `etc.93/profile.d': Directory not empty
rm: cannot remove `etc.93/rc.d/rc1.d': Directory not empty
rm: cannot remove `etc.94/sysconfig/network-scripts': Directory not empty
rm: cannot remove `etc.94/rc.d/rc4.d': Directory not empty
rm: cannot remove `etc.94/rc.d/rc2.d': Directory not empty
rm: cannot remove `etc.95/fonts/conf.avail': Directory not empty
rm: cannot remove `etc.95/makedev.d': Directory not empty
rm: cannot remove `etc.97/rc.d/rc1.d': Directory not empty
rm: cannot remove `etc.97/rc.d/rc3.d': Directory not empty
rm: cannot remove `etc.99/selinux/targeted/modules/active/modules': Directory not empty
rm: cannot remove `etc.99/sysconfig/network-scripts': Directory not empty
[root@darrel fuse]# 



Expected results:
=================

rm -rf should be successful.


Additional info:
================

Initial setup:
==============

[root@rhs-client11 ~]# gluster volume info 
 
Volume Name: vol-dis-rep
Type: Distributed-Replicate
Volume ID: 946c9a5a-db42-42eb-82e5-42a09ec1b18f
Status: Started
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.36.35:/rhs/brick1/b1
Brick2: 10.70.36.36:/rhs/brick1/b2
Brick3: 10.70.36.35:/rhs/brick1/b3
Brick4: 10.70.36.36:/rhs/brick1/b4
Brick5: 10.70.36.35:/rhs/brick1/b5
Brick6: 10.70.36.36:/rhs/brick1/b6
Brick7: 10.70.36.37:/rhs/brick1/b7
Brick8: 10.70.36.38:/rhs/brick1/b8
Brick9: 10.70.36.37:/rhs/brick1/b9
Brick10: 10.70.36.38:/rhs/brick1/b10
Brick11: 10.70.36.37:/rhs/brick1/b11
Brick12: 10.70.36.38:/rhs/brick1/b12
[root@rhs-client11 ~]# 
[root@rhs-client11 ~]# 
[root@rhs-client11 ~]# gluster volume status
Status of volume: vol-dis-rep
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 10.70.36.35:/rhs/brick1/b1			49152	Y	5622
Brick 10.70.36.36:/rhs/brick1/b2			49152	Y	5269
Brick 10.70.36.35:/rhs/brick1/b3			49153	Y	5458
Brick 10.70.36.36:/rhs/brick1/b4			49153	Y	5278
Brick 10.70.36.35:/rhs/brick1/b5			49154	Y	5467
Brick 10.70.36.36:/rhs/brick1/b6			49154	Y	5287
Brick 10.70.36.37:/rhs/brick1/b7			49155	Y	3388
Brick 10.70.36.38:/rhs/brick1/b8			49152	Y	5269
Brick 10.70.36.37:/rhs/brick1/b9			49156	Y	3393
Brick 10.70.36.38:/rhs/brick1/b10			49153	Y	5278
Brick 10.70.36.37:/rhs/brick1/b11			49157	Y	3394
Brick 10.70.36.38:/rhs/brick1/b12			49154	Y	5287
NFS Server on localhost					2049	Y	5632
Self-heal Daemon on localhost				N/A	Y	5639
NFS Server on c6b5d4e9-3782-457c-8542-f32b0941ed05	2049	Y	5514
Self-heal Daemon on c6b5d4e9-3782-457c-8542-f32b0941ed05	N/A	Y	5521
NFS Server on f9cc4b9c-97e1-4f65-9657-3b050d45296e	2049	Y	5505
Self-heal Daemon on f9cc4b9c-97e1-4f65-9657-3b050d45296e	N/A	Y	5512
NFS Server on 6962d204-37c8-436b-8ea6-a9698be40ec6	2049	Y	5224
Self-heal Daemon on 6962d204-37c8-436b-8ea6-a9698be40ec6	N/A	Y	5231
 
There are no active volume tasks
[root@rhs-client11 ~]# 


[root@rhs-client11 ~]# gluster volume add-brick vol-dis-rep 10.70.36.35:/rhs/brick1/nb1 10.70.36.36:/rhs/brick1/nb2
volume add-brick: success
[root@rhs-client11 ~]# 
[root@rhs-client11 ~]# 
[root@rhs-client11 ~]# gluster volume info
 
Volume Name: vol-dis-rep
Type: Distributed-Replicate
Volume ID: 946c9a5a-db42-42eb-82e5-42a09ec1b18f
Status: Started
Number of Bricks: 7 x 2 = 14
Transport-type: tcp
Bricks:
Brick1: 10.70.36.35:/rhs/brick1/b1
Brick2: 10.70.36.36:/rhs/brick1/b2
Brick3: 10.70.36.35:/rhs/brick1/b3
Brick4: 10.70.36.36:/rhs/brick1/b4
Brick5: 10.70.36.35:/rhs/brick1/b5
Brick6: 10.70.36.36:/rhs/brick1/b6
Brick7: 10.70.36.37:/rhs/brick1/b7
Brick8: 10.70.36.38:/rhs/brick1/b8
Brick9: 10.70.36.37:/rhs/brick1/b9
Brick10: 10.70.36.38:/rhs/brick1/b10
Brick11: 10.70.36.37:/rhs/brick1/b11
Brick12: 10.70.36.38:/rhs/brick1/b12
Brick13: 10.70.36.35:/rhs/brick1/nb1
Brick14: 10.70.36.36:/rhs/brick1/nb2
[root@rhs-client11 ~]# 
[root@rhs-client11 ~]# gluster volume rebalance vol-dis-rep start force
volume rebalance: vol-dis-rep: success: Starting rebalance on volume vol-dis-rep has been successful.
ID: 50afc5f2-aa18-415b-b869-22cb3ac96c06
[root@rhs-client11 ~]# 
[root@rhs-client11 ~]# 
[root@rhs-client11 ~]# 
[root@rhs-client11 ~]# 
[root@rhs-client11 ~]# gluster volume rebalance vol-dis-rep status
                                    Node Rebalanced-files          size       scanned      failures         status run time in secs
                               ---------      -----------   -----------   -----------   -----------   ------------   --------------
                               localhost               12        18.2KB            68             0    in progress            24.00
                             10.70.36.36                0        0Bytes          4207             0    in progress            24.00
                             10.70.36.37               13        20.5KB           101             0    in progress            25.00
                             10.70.36.38                0        0Bytes          4207             0    in progress            24.00
volume rebalance: vol-dis-rep: success: 
[root@rhs-client11 ~]# 
[root@rhs-client11 ~]# 
[root@rhs-client11 ~]# gluster volume rebalance vol-dis-rep status
                                    Node Rebalanced-files          size       scanned      failures         status run time in secs
                               ---------      -----------   -----------   -----------   -----------   ------------   --------------
                               localhost             3334        39.1MB         10233             0    in progress          4471.00
                             10.70.36.36                0        0Bytes        174800             0      completed          1061.00
                             10.70.36.37             3303        50.8MB          9108             0    in progress          4471.00
                             10.70.36.38                0        0Bytes        174800             0      completed          1061.00
volume rebalance: vol-dis-rep: success: 
[root@rhs-client11 ~]# gluster volume rebalance vol-dis-rep stop
                                    Node Rebalanced-files          size       scanned      failures         status run time in secs
                               ---------      -----------   -----------   -----------   -----------   ------------   --------------
                               localhost             3336        39.1MB         10235             0        stopped          4474.00
                             10.70.36.36                0        0Bytes        174800             0      completed          1061.00
                             10.70.36.37             3305        50.8MB          9110             0        stopped          4475.00
                             10.70.36.38                0        0Bytes        174800             0      completed          1061.00
volume rebalance: vol-dis-rep: success: rebalance process may be in the middle of a file migration.
The process will be fully stopped once the migration of the file is complete.
Please check rebalance process for completion before doing any further brick related tasks on the volume.
[root@rhs-client11 ~]# 
[root@rhs-client11 ~]# 
[root@rhs-client11 ~]# ps -eaf | grep rebalance
root      7419  5189  0 14:46 pts/3    00:00:00 grep rebalance
[root@rhs-client11 ~]# 

[root@rhs-client12 ~]# ps -eaf | grep rebalance
root      6483  5185  0 14:47 pts/3    00:00:00 grep rebalance
[root@rhs-client12 ~]# 

[root@rhs-client13 ~]# ps -eaf | grep rebalance
root      6232  5211  0 14:47 pts/0    00:00:00 grep rebalance
[root@rhs-client13 ~]# 

[root@rhs-client14 ~]# ps -eaf | grep rebalance
root      6462  5185  0 14:47 pts/3    00:00:00 grep rebalance
[root@rhs-client14 ~]#

Comment 3 shishir gowda 2013-05-08 12:54:09 UTC
xattrs and ls -l of /etc.10/latrace.d from the bricks:

[root@rhs-client11 ~]# getfattr -m . -d -e hex /rhs/brick1/*/etc.10/latrace.d
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/b1/etc.10/latrace.d
trusted.gfid=0xf4b6877a26c1494b9d3f2333edaede4f
trusted.glusterfs.dht=0x0000000100000000492492486db6db6b

# file: rhs/brick1/b3/etc.10/latrace.d
trusted.gfid=0xf4b6877a26c1494b9d3f2333edaede4f
trusted.glusterfs.dht=0x00000001000000006db6db6c9249248f

# file: rhs/brick1/b5/etc.10/latrace.d
trusted.gfid=0xf4b6877a26c1494b9d3f2333edaede4f
trusted.glusterfs.dht=0x000000010000000092492490b6db6db3

# file: rhs/brick1/nb1/etc.10/latrace.d
trusted.gfid=0xf4b6877a26c1494b9d3f2333edaede4f
trusted.glusterfs.dht=0x00000001000000002492492449249247

[root@rhs-client11 ~]# ls -l /rhs/brick1/*/etc.10/latrace.d
/rhs/brick1/b1/etc.10/latrace.d:
total 0

/rhs/brick1/b3/etc.10/latrace.d:
total 0

/rhs/brick1/b5/etc.10/latrace.d:
total 0

/rhs/brick1/nb1/etc.10/latrace.d:
total 0

[root@rhs-client12 ~]# getfattr -m . -d -e hex /rhs/brick1/*/etc.10/latrace.d
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/b2/etc.10/latrace.d
trusted.gfid=0xf4b6877a26c1494b9d3f2333edaede4f
trusted.glusterfs.dht=0x0000000100000000492492486db6db6b

# file: rhs/brick1/b4/etc.10/latrace.d
trusted.gfid=0xf4b6877a26c1494b9d3f2333edaede4f
trusted.glusterfs.dht=0x00000001000000006db6db6c9249248f

# file: rhs/brick1/b6/etc.10/latrace.d
trusted.gfid=0xf4b6877a26c1494b9d3f2333edaede4f
trusted.glusterfs.dht=0x000000010000000092492490b6db6db3

# file: rhs/brick1/nb2/etc.10/latrace.d
trusted.gfid=0xf4b6877a26c1494b9d3f2333edaede4f
trusted.glusterfs.dht=0x00000001000000002492492449249247

[root@rhs-client12 ~]# ls -l /rhs/brick1/*/etc.10/latrace.d
/rhs/brick1/b2/etc.10/latrace.d:
total 0

/rhs/brick1/b4/etc.10/latrace.d:
total 0

/rhs/brick1/b6/etc.10/latrace.d:
total 0

/rhs/brick1/nb2/etc.10/latrace.d:
total 0


[root@rhs-client13 ~]# getfattr -m . -d -e hex /rhs/brick1/*/etc.10/latrace.d
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/b11/etc.10/latrace.d
trusted.gfid=0xf4b6877a26c1494b9d3f2333edaede4f
trusted.glusterfs.dht=0x00000001000000000000000024924923

# file: rhs/brick1/b7/etc.10/latrace.d
trusted.gfid=0xf4b6877a26c1494b9d3f2333edaede4f
trusted.glusterfs.dht=0x0000000100000000b6db6db4db6db6d7

# file: rhs/brick1/b9/etc.10/latrace.d
trusted.gfid=0xf4b6877a26c1494b9d3f2333edaede4f
trusted.glusterfs.dht=0x0000000100000000db6db6d8ffffffff

[root@rhs-client13 ~]# ls -l /rhs/brick1/*/etc.10/latrace.d
/rhs/brick1/b11/etc.10/latrace.d:
total 4
-rw-r--r-- 2 root root 2289 May  8 13:13 resource.conf

/rhs/brick1/b7/etc.10/latrace.d:
total 32
-rw-r--r-- 2 root root  273 May  8 13:13 getopt.conf
-rw-r--r-- 2 root root  655 May  8 13:13 inet.conf
-rw-r--r-- 2 root root   68 May  8 13:13 ioctl.conf
-rw-r--r-- 2 root root  869 May  8 13:13 pwd.conf
-rw-r--r-- 2 root root 3365 May  8 13:13 socket.conf
-rw-r--r-- 2 root root 6327 May  8 13:13 stdlib.conf
-rw-r--r-- 2 root root 2834 May  8 13:13 string.conf

/rhs/brick1/b9/etc.10/latrace.d:
total 40
-rw-r--r-- 2 root root  651 May  8 13:13 libintl.conf
-rw-r--r-- 2 root root  286 May  8 13:13 locale.conf
-rw-r--r-- 2 root root 3937 May  8 13:13 netdb.conf
-rw-r--r-- 2 root root 7686 May  8 13:13 pthread.conf
-rw-r--r-- 2 root root 3943 May  8 13:13 stdio.conf
-rw-r--r-- 2 root root  198 May  8 13:13 syslog.conf
-rw-r--r-- 2 root root  934 May  8 13:13 term.conf
-rw-r--r-- 2 root root 4399 May  8 13:13 unistd.conf


[root@rhs-client14 ~]# getfattr -m . -d -e hex /rhs/brick1/*/etc.10/latrace.d
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/b10/etc.10/latrace.d
trusted.gfid=0xf4b6877a26c1494b9d3f2333edaede4f
trusted.glusterfs.dht=0x0000000100000000db6db6d8ffffffff

# file: rhs/brick1/b12/etc.10/latrace.d
trusted.gfid=0xf4b6877a26c1494b9d3f2333edaede4f
trusted.glusterfs.dht=0x00000001000000000000000024924923

# file: rhs/brick1/b8/etc.10/latrace.d
trusted.gfid=0xf4b6877a26c1494b9d3f2333edaede4f
trusted.glusterfs.dht=0x0000000100000000b6db6db4db6db6d7

[root@rhs-client14 ~]# ls -l /rhs/brick1/*/etc.10/latrace.d
/rhs/brick1/b10/etc.10/latrace.d:
total 40
-rw-r--r-- 2 root root  651 May  8 13:13 libintl.conf
-rw-r--r-- 2 root root  286 May  8 13:13 locale.conf
-rw-r--r-- 2 root root 3937 May  8 13:13 netdb.conf
-rw-r--r-- 2 root root 7686 May  8 13:13 pthread.conf
-rw-r--r-- 2 root root 3943 May  8 13:13 stdio.conf
-rw-r--r-- 2 root root  198 May  8 13:13 syslog.conf
-rw-r--r-- 2 root root  934 May  8 13:13 term.conf
-rw-r--r-- 2 root root 4399 May  8 13:13 unistd.conf

/rhs/brick1/b12/etc.10/latrace.d:
total 4
-rw-r--r-- 2 root root 2289 May  8 13:13 resource.conf

/rhs/brick1/b8/etc.10/latrace.d:
total 32
-rw-r--r-- 2 root root  273 May  8 13:13 getopt.conf
-rw-r--r-- 2 root root  655 May  8 13:13 inet.conf
-rw-r--r-- 2 root root   68 May  8 13:13 ioctl.conf
-rw-r--r-- 2 root root  869 May  8 13:13 pwd.conf
-rw-r--r-- 2 root root 3365 May  8 13:13 socket.conf
-rw-r--r-- 2 root root 6327 May  8 13:13 stdlib.conf
-rw-r--r-- 2 root root 2834 May  8 13:13 string.conf
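
The trusted.glusterfs.dht values above can be read as hash ranges. What follows is a minimal Python sketch, assuming each value is four big-endian 32-bit words whose last two words are the inclusive start and stop of the range assigned to that replica pair. That reading fits the numbers shown, but the field meaning and the helper names are assumptions for illustration, not taken from the GlusterFS source. The sketch checks whether the seven ranges cover the 32-bit hash space without holes or overlaps.

```python
import struct

# Hex values copied from the getfattr output above, one per replica pair.
LAYOUT_XATTRS = {
    "b11/b12": "0x00000001000000000000000024924923",
    "nb1/nb2": "0x00000001000000002492492449249247",
    "b1/b2": "0x0000000100000000492492486db6db6b",
    "b3/b4": "0x00000001000000006db6db6c9249248f",
    "b5/b6": "0x000000010000000092492490b6db6db3",
    "b7/b8": "0x0000000100000000b6db6db4db6db6d7",
    "b9/b10": "0x0000000100000000db6db6d8ffffffff",
}


def decode_layout(hex_value):
    """Return (start, stop), assuming four big-endian 32-bit words
    with the last two holding the inclusive hash range."""
    raw = bytes.fromhex(hex_value[2:])  # drop the leading "0x"
    _word0, _word1, start, stop = struct.unpack(">4I", raw)
    return start, stop


def layout_is_complete(xattrs):
    """True if the ranges tile 0x00000000..0xffffffff with no hole or overlap."""
    expected_start = 0
    for start, stop in sorted(decode_layout(v) for v in xattrs.values()):
        if start != expected_start or stop < start:
            return False
        expected_start = stop + 1
    return expected_start == 0x1_0000_0000


for pair, value in LAYOUT_XATTRS.items():
    start, stop = decode_layout(value)
    print(f"{pair:8s} {start:#010x} .. {stop:#010x}")
print("complete:", layout_is_complete(LAYOUT_XATTRS))
```

Under that assumption, the seven ranges from this comment tile the hash space exactly, which would suggest the directory layout is intact after the add-brick and that the rm -rf failure comes from somewhere else.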

Comment 4 shishir gowda 2013-05-08 13:20:55 UTC
The client log for /etc.10 reports the files as not present, and unlink fails with ENOENT:

[2013-05-08 13:17:14.466861] W [fuse-bridge.c:1193:fuse_unlink_cbk] 0-glusterfs-fuse: 1392508: UNLINK() /etc.10/latrace.d/string.conf => -1 (No such file or directory)
[2013-05-08 13:17:14.468042] W [fuse-bridge.c:1193:fuse_unlink_cbk] 0-glusterfs-fuse: 1392509: UNLINK() /etc.10/latrace.d/stdlib.conf => -1 (No such file or directory)
[2013-05-08 13:17:14.469448] W [fuse-bridge.c:1193:fuse_unlink_cbk] 0-glusterfs-fuse: 1392510: UNLINK() /etc.10/latrace.d/getopt.conf => -1 (No such file or directory)
[2013-05-08 13:17:14.470810] W [fuse-bridge.c:1193:fuse_unlink_cbk] 0-glusterfs-fuse: 1392511: UNLINK() /etc.10/latrace.d/inet.conf => -1 (No such file or directory)
[2013-05-08 13:17:14.472160] W [fuse-bridge.c:1193:fuse_unlink_cbk] 0-glusterfs-fuse: 1392512: UNLINK() /etc.10/latrace.d/ioctl.conf => -1 (No such file or directory)
[2013-05-08 13:17:14.473428] W [fuse-bridge.c:1193:fuse_unlink_cbk] 0-glusterfs-fuse: 1392513: UNLINK() /etc.10/latrace.d/pwd.conf => -1 (No such file or directory)
[2013-05-08 13:17:14.474896] W [fuse-bridge.c:1193:fuse_unlink_cbk] 0-glusterfs-fuse: 1392514: UNLINK() /etc.10/latrace.d/socket.conf => -1 (No such file or directory)
[2013-05-08 13:17:14.517258] W [fuse-bridge.c:1193:fuse_unlink_cbk] 0-glusterfs-fuse: 1392538: UNLINK() /etc.10/latrace.d/unistd.conf => -1 (No such file or directory)
[2013-05-08 13:17:14.518506] W [fuse-bridge.c:1193:fuse_unlink_cbk] 0-glusterfs-fuse: 1392539: UNLINK() /etc.10/latrace.d/pthread.conf => -1 (No such file or directory)
[2013-05-08 13:17:14.519553] W [fuse-bridge.c:1193:fuse_unlink_cbk] 0-glusterfs-fuse: 1392540: UNLINK() /etc.10/latrace.d/locale.conf => -1 (No such file or directory)
[2013-05-08 13:17:14.520635] W [fuse-bridge.c:1193:fuse_unlink_cbk] 0-glusterfs-fuse: 1392541: UNLINK() /etc.10/latrace.d/libintl.conf => -1 (No such file or directory)
[2013-05-08 13:17:14.521641] W [fuse-bridge.c:1193:fuse_unlink_cbk] 0-glusterfs-fuse: 1392542: UNLINK() /etc.10/latrace.d/term.conf => -1 (No such file or directory)
[2013-05-08 13:17:14.522741] W [fuse-bridge.c:1193:fuse_unlink_cbk] 0-glusterfs-fuse: 1392543: UNLINK() /etc.10/latrace.d/stdio.conf => -1 (No such file or directory)
[2013-05-08 13:17:14.523777] W [fuse-bridge.c:1193:fuse_unlink_cbk] 0-glusterfs-fuse: 1392544: UNLINK() /etc.10/latrace.d/syslog.conf => -1 (No such file or directory)
[2013-05-08 13:17:14.524899] W [fuse-bridge.c:1193:fuse_unlink_cbk] 0-glusterfs-fuse: 1392545: UNLINK() /etc.10/latrace.d/netdb.conf => -1 (No such file or directory)
[2013-05-08 13:17:14.530367] W [fuse-bridge.c:1193:fuse_unlink_cbk] 0-glusterfs-fuse: 1392552: UNLINK() /etc.10/latrace.d/resource.conf => -1 (No such file or directory)

But the files are visible from the client:

[root@darrel latrace.d]# ls -l
total 42
-rw-r--r--. 1 root root  273 May  8  2013 getopt.conf
-rw-r--r--. 1 root root  655 May  8  2013 inet.conf
-rw-r--r--. 1 root root   68 May  8  2013 ioctl.conf
-rw-r--r--. 1 root root  651 May  8  2013 libintl.conf
-rw-r--r--. 1 root root  286 May  8  2013 locale.conf
-rw-r--r--. 1 root root 3937 May  8  2013 netdb.conf
-rw-r--r--. 1 root root 7686 May  8  2013 pthread.conf
-rw-r--r--. 1 root root  869 May  8  2013 pwd.conf
-rw-r--r--. 1 root root 2289 May  8  2013 resource.conf
-rw-r--r--. 1 root root 3365 May  8  2013 socket.conf
-rw-r--r--. 1 root root 3943 May  8  2013 stdio.conf
-rw-r--r--. 1 root root 6327 May  8  2013 stdlib.conf
-rw-r--r--. 1 root root 2834 May  8  2013 string.conf
-rw-r--r--. 1 root root  198 May  8  2013 syslog.conf
-rw-r--r--. 1 root root  934 May  8  2013 term.conf
-rw-r--r--. 1 root root 4399 May  8  2013 unistd.conf
[root@darrel latrace.d]# pwd
/mnt/fuse/etc.10/latrace.d

The issue seems to be this:

1. dht_unlink sends the unlink to hashed_subvol (when hashed != cached).
2. dht_unlink_linkfile_cbk gets op_ret = -1 with an ENOENT error.
3. We error out instead of continuing with the unlink on the cached subvolume.
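
The three steps above can be modelled with a toy Python sketch. This is not the GlusterFS C code: Subvol, unlink_buggy and unlink_tolerant are hypothetical names, and the tolerant variant is only one plausible way to handle the ENOENT case, not a description of the change in the actual patch.

```python
import errno


class Subvol:
    """Toy stand-in for a DHT subvolume: just a set of file names."""

    def __init__(self, names):
        self.names = set(names)

    def unlink(self, name):
        """Return (op_ret, op_errno) in the style of a translator callback."""
        if name not in self.names:
            return -1, errno.ENOENT
        self.names.remove(name)
        return 0, 0


def unlink_buggy(hashed, cached, name):
    """Behaviour described above: give up when the linkfile unlink fails."""
    if hashed is not cached:
        op_ret, op_errno = hashed.unlink(name)
        if op_ret == -1:
            return op_ret, op_errno  # the data file on cached is never touched
    return cached.unlink(name)


def unlink_tolerant(hashed, cached, name):
    """Assumed correction: a missing linkfile counts as already removed."""
    if hashed is not cached:
        op_ret, op_errno = hashed.unlink(name)
        if op_ret == -1 and op_errno != errno.ENOENT:
            return op_ret, op_errno
    return cached.unlink(name)


# No linkfile on the hashed subvolume, data file present on the cached one.
hashed, cached = Subvol([]), Subvol(["string.conf"])
print(unlink_buggy(hashed, cached, "string.conf"), sorted(cached.names))
print(unlink_tolerant(hashed, cached, "string.conf"), sorted(cached.names))
```

In the buggy path the client sees ENOENT while the file stays on the cached subvolume, which matches the combination of UNLINK failures in the log and files still listed on the mount.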

Comment 5 shishir gowda 2013-05-09 07:17:21 UTC
Upstream fix sent @ http://review.gluster.org/#/c/4971/. Fix https://code.engineering.redhat.com/gerrit/#/c/7308/ is merged downstream

Comment 7 shishir gowda 2013-05-14 07:05:56 UTC
[root@rhs-client11 bricks]# ls -l /rhs/brick1/*/fuse/etc.21/selinux/*/*/*/*
/rhs/brick1/b1/fuse/etc.21/selinux/targeted/modules/active/modules:
total 0

/rhs/brick1/b3/fuse/etc.21/selinux/targeted/modules/active/modules:
total 0
---------T 2 root root 0 May 13 17:25 aisexec.pp
---------T 2 root root 0 May 13 17:25 antivirus.pp
---------T 2 root root 0 May 13 17:25 audioentropy.pp
---------T 2 root root 0 May 13 17:25 awstats.pp
---------T 2 root root 0 May 13 17:25 git.pp
---------T 2 root root 0 May 13 17:25 sanlock.pp

/rhs/brick1/b5/fuse/etc.21/selinux/targeted/modules/active/modules:
total 0
---------T 2 root root 0 May 13 17:25 boinc.pp
---------T 2 root root 0 May 13 17:25 ctdbd.pp
---------T 2 root root 0 May 13 17:25 openshift-origin.pp
---------T 2 root root 0 May 13 17:25 sblim.pp
---------T 2 root root 0 May 13 17:25 ulogd.pp
---------T 2 root root 0 May 13 17:25 xfs.pp

/rhs/brick1/n1/fuse/etc.21/selinux/targeted/modules/active/modules:
total 0
---------T 2 root root 0 May 13 17:25 portreserve.pp
---------T 2 root root 0 May 13 17:25 samba.pp

[root@rhs-client12 ~]# ls -l /rhs/brick1/*/fuse/etc.21/selinux/*/*/*/*
ls: cannot access ls: No such file or directory
/rhs/brick1/b2/fuse/etc.21/selinux/targeted/modules/active/modules:
total 0

/rhs/brick1/b4/fuse/etc.21/selinux/targeted/modules/active/modules:
total 0
---------T 2 root root 0 May 13 17:26 aisexec.pp
---------T 2 root root 0 May 13 17:26 antivirus.pp
---------T 2 root root 0 May 13 17:26 audioentropy.pp
---------T 2 root root 0 May 13 17:26 awstats.pp
---------T 2 root root 0 May 13 17:26 git.pp
---------T 2 root root 0 May 13 17:26 sanlock.pp

/rhs/brick1/b6/fuse/etc.21/selinux/targeted/modules/active/modules:
total 0
---------T 2 root root 0 May 13 17:26 boinc.pp
---------T 2 root root 0 May 13 17:26 ctdbd.pp
---------T 2 root root 0 May 13 17:26 openshift-origin.pp
---------T 2 root root 0 May 13 17:26 sblim.pp
---------T 2 root root 0 May 13 17:26 ulogd.pp
---------T 2 root root 0 May 13 17:26 xfs.pp

/rhs/brick1/n2/fuse/etc.21/selinux/targeted/modules/active/modules:
total 0
---------T 2 root root 0 May 13 17:26 portreserve.pp
---------T 2 root root 0 May 13 17:26 samba.pp

[root@rhs-client13 ~]# ls -l /rhs/brick1/*/fuse/etc.21/selinux/*/*/*/*
/rhs/brick1/b11/fuse/etc.21/selinux/targeted/modules/active/modules:
total 0

/rhs/brick1/b7/fuse/etc.21/selinux/targeted/modules/active/modules:
total 0

/rhs/brick1/b9/fuse/etc.21/selinux/targeted/modules/active/modules:

[root@rhs-client14 ~]# ls -l /rhs/brick1/*/fuse/etc.21/selinux/*/*/*/*
/rhs/brick1/b10/fuse/etc.21/selinux/targeted/modules/active/modules:
total 0

/rhs/brick1/b12/fuse/etc.21/selinux/targeted/modules/active/modules:
total 0

/rhs/brick1/b8/fuse/etc.21/selinux/targeted/modules/active/modules:
total 0

It looks like the stale linkfiles were not being deleted.
Changing into each deepest leaf directory and removing the directories individually seems to work around the issue at hand.

[root@darrel fuse]# pwd
/mnt/test-fuse/fuse
[root@darrel fuse]# ls -l
total 0
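
To pick the leftover linkfiles out of brick listings like the ones above, here is a small Python sketch that scans `ls -l` text for zero-byte entries with mode ---------T. The function name and the sample are illustrative only (the sample is a shortened copy of the b1/b3 listing). Mode and size are just a heuristic: a DHT linkfile should also carry a trusted.glusterfs.dht.linkto xattr, which this sketch does not check.

```python
def find_linkfile_candidates(ls_output):
    """Map each directory header in `ls -l` output to the entries whose
    mode is ---------T and whose size is 0."""
    candidates = {}
    current_dir = None
    for line in ls_output.splitlines():
        line = line.strip()
        if line.startswith("/") and line.endswith(":"):
            current_dir = line[:-1]  # directory header printed by ls
            continue
        fields = line.split()
        # Expected fields: mode, links, owner, group, size, month, day, time, name
        if len(fields) >= 9 and fields[0] == "---------T" and fields[4] == "0":
            candidates.setdefault(current_dir, []).append(" ".join(fields[8:]))
    return candidates


SAMPLE = """\
/rhs/brick1/b1/fuse/etc.21/selinux/targeted/modules/active/modules:
total 0

/rhs/brick1/b3/fuse/etc.21/selinux/targeted/modules/active/modules:
total 0
---------T 2 root root 0 May 13 17:25 aisexec.pp
---------T 2 root root 0 May 13 17:25 git.pp
"""

for directory, names in find_linkfile_candidates(SAMPLE).items():
    print(directory, names)
```

Directories with no matching entries, such as b1 in the sample, do not appear in the result.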

Comment 8 shishir gowda 2013-05-14 07:56:00 UTC
A few observations:

1. dht_rmdir related calls to unlink linkfiles never seem to be triggered
   This observation is based on
   a. link files still present on backend
   b. No unlink errors reported by clients/servers
   c. No failures reported by DHT too

2. But, on a subsequent cd and rm -rf call, dht seems to be getting these linkfiles cleaned up in dht_lookup

[2013-05-13 17:35:38.792671] I [dht-common.c:1029:dht_lookup_everywhere_cbk] 1-vol-dis-rep-dht: deleting stale linkfile <gfid:af99922e-9a67-4f12-bf70-028b143f5a2c>/etc.99/alternatives/print-lpman on vol-dis-rep-replicate-5
[2013-05-13 17:35:38.797407] I [dht-common.c:1029:dht_lookup_everywhere_cbk] 1-vol-dis-rep-dht: deleting stale linkfile <gfid:af99922e-9a67-4f12-bf70-028b143f5a2c>/etc.99/alternatives/keytool on vol-dis-rep-replicate-6
[2013-05-13 17:35:38.801594] I [dht-common.c:1029:dht_lookup_everywhere_cbk] 1-vol-dis-rep-dht: deleting stale linkfile <gfid:af99922e-9a67-4f12-bf70-028b143f5a2c>/etc.99/alternatives/mta-sendmail on vol-dis-rep-replicate-5
[2013-05-13 17:35:38.827379] I [dht-common.c:1029:dht_lookup_everywhere_cbk] 1-vol-dis-rep-dht: deleting stale linkfile <gfid:af99922e-9a67-4f12-bf70-028b143f5a2c>/etc.99/alternatives/man-ip6tables-restore.x86_64 on vol-dis-rep-replicate-5
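
To see which subvolumes the lookup-time cleanup touched, the log lines above can be tallied with a short Python sketch. The regular expression is written against the message format in these four lines only and is an assumption about the format in general. The function name is illustrative.

```python
import re
from collections import Counter

# Matches the "deleting stale linkfile <path> on <subvol>" messages shown above.
STALE_RE = re.compile(r"deleting stale linkfile (?P<path>\S+) on (?P<subvol>\S+)")

LOG = """\
[2013-05-13 17:35:38.792671] I [dht-common.c:1029:dht_lookup_everywhere_cbk] 1-vol-dis-rep-dht: deleting stale linkfile <gfid:af99922e-9a67-4f12-bf70-028b143f5a2c>/etc.99/alternatives/print-lpman on vol-dis-rep-replicate-5
[2013-05-13 17:35:38.797407] I [dht-common.c:1029:dht_lookup_everywhere_cbk] 1-vol-dis-rep-dht: deleting stale linkfile <gfid:af99922e-9a67-4f12-bf70-028b143f5a2c>/etc.99/alternatives/keytool on vol-dis-rep-replicate-6
[2013-05-13 17:35:38.801594] I [dht-common.c:1029:dht_lookup_everywhere_cbk] 1-vol-dis-rep-dht: deleting stale linkfile <gfid:af99922e-9a67-4f12-bf70-028b143f5a2c>/etc.99/alternatives/mta-sendmail on vol-dis-rep-replicate-5
[2013-05-13 17:35:38.827379] I [dht-common.c:1029:dht_lookup_everywhere_cbk] 1-vol-dis-rep-dht: deleting stale linkfile <gfid:af99922e-9a67-4f12-bf70-028b143f5a2c>/etc.99/alternatives/man-ip6tables-restore.x86_64 on vol-dis-rep-replicate-5
"""


def count_stale_deletions(log_lines):
    """Count stale-linkfile deletions per subvolume name."""
    counts = Counter()
    for line in log_lines:
        match = STALE_RE.search(line)
        if match:
            counts[match.group("subvol")] += 1
    return counts


for subvol, count in sorted(count_stale_deletions(LOG.splitlines()).items()):
    print(subvol, count)
```

For the four lines quoted in this comment, three deletions land on vol-dis-rep-replicate-5 and one on vol-dis-rep-replicate-6.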

Comment 9 Ravishankar N 2013-05-21 09:43:53 UTC
Not able to reproduce the bug on RHS2.1 downstream repo (3.4.0.8)

Comment 11 Rachana Patel 2013-05-31 07:27:17 UTC
Found a similar issue in 3.3.0.10rhs-1.el6.x86_64.

Steps to Reproduce:
1. Had a cluster of 3 peers and a 3x2 distributed-replicate volume (dist-rep), FUSE-mounted and containing some data.


[root@rhsauto031 ~]# gluster v info dist-rep
 
Volume Name: dist-rep
Type: Distributed-Replicate
Volume ID: 9dee64a6-9f86-4463-bacc-47d97c750803
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick4/2
Brick2: rhsauto038.lab.eng.blr.redhat.com:/rhs/brick4/2
Brick3: rhsauto031.lab.eng.blr.redhat.com:/rhs/brick4/1
Brick4: rhsauto031.lab.eng.blr.redhat.com:/rhs/brick4/2
Brick5: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick4/3
Brick6: rhsauto031.lab.eng.blr.redhat.com:/rhs/brick4/3

Mount point:
[root@localhost system_light]# mount | grep rahul
glusterfs#rhsauto018.lab.eng.blr.redhat.com:/dist-rep on /mnt/rahulsanity type fuse (rw,default_permissions,allow_other,max_read=131072)

[root@localhost system_light]# ls /mnt/rahulsanity/
d27  d36  d44  d52  d60  d69  d77  d85  d93     file200  file29  file37  file45  file53  file61  file69  file77  file85  file93
d29  d37  d45  d53  d61  d7   d78  d86  d94     file21   file3   file38  file46  file54  file61  file7   file78  file86  file94
d3   d38  d46  d54  d62  d70  d79  d87  d95     file22   file30  file39  file47  file55  file62  file70  file79  file87  file95
d30  d39  d47  d55  d63  d71  d8   d88  d96     file23   file31  file4   file48  file56  file63  file71  file8   file88  file96
d31  d4   d48  d56  d64  d72  d80  d89  d97     file24   file32  file40  file49  file57  file64  file72  file80  file89  file97
d32  d40  d49  d57  d65  d73  d81  d9   d98     file25   file33  file41  file5   file58  file65  file73  file81  file9   file98
d33  d41  d5   d58  d66  d74  d82  d90  d99     file26   file34  file42  file50  file59  file66  file74  file82  file90  file99
d34  d42  d50  d59  d67  d75  d83  d91  file2   file27   file35  file43  file51  file6   file67  file75  file83  file91  run32490
d35  d43  d51  d6   d68  d76  d84  d92  file20  file28   file36  file44  file52  file60  file68  file76  file84  file92

2. Added bricks to make it 6x2
[root@rhsauto031 ~]# gluster volume add-brick dist-rep rhsauto018.lab.eng.blr.redhat.com:/rhs/brick3/8 rhsauto038.lab.eng.blr.redhat.com:/rhs/brick3/8 rhsauto031.lab.eng.blr.redhat.com:/rhs/brick3/8  rhsauto031.lab.eng.blr.redhat.com:/rhs/brick3/6 rhsauto018.lab.eng.blr.redhat.com:/rhs/brick3/6 rhsauto038.lab.eng.blr.redhat.com:/rhs/brick3/6
Add Brick successful

3. Started a rebalance with the start force option

4. While the rebalance was in progress, removed data from the mount point using rm -rf *


[root@rhsauto031 ~]# gluster volume rebalance dist-rep status
                                    Node Rebalanced-files          size       scanned      failures         status
                               ---------      -----------   -----------   -----------   -----------   ------------
                               localhost               27     28311552           62            0    in progress
       rhsauto018.lab.eng.blr.redhat.com               32     26214400          167            0    in progress
                             10.70.37.13                0            0          254            0    in progress

[root@localhost system_light]# rm -rf  /mnt/rahulsanity/*
rm: cannot remove `/mnt/rahulsanity/d36/file30': Invalid argument
rm: cannot remove `/mnt/rahulsanity/d40/f99': Invalid argument
rm: cannot remove `/mnt/rahulsanity/d40/f100': Invalid argument
rm: cannot remove `/mnt/rahulsanity/d40/f16': Invalid argument
rm: cannot remove `/mnt/rahulsanity/d40/f54': Invalid argument
rm: cannot remove `/mnt/rahulsanity/d40/file24': Invalid argument
rm: cannot remove `/mnt/rahulsanity/d41/f1': Invalid argument
rm: cannot remove `/mnt/rahulsanity/d41/f51': Invalid argument
rm: cannot remove `/mnt/rahulsanity/d41/f83': Invalid argument
rm: cannot remove `/mnt/rahulsanity/d41/f79': Invalid argument
rm: cannot remove `/mnt/rahulsanity/d41/f66': Invalid argument
rm: cannot remove `/mnt/rahulsanity/d42/f83': Invalid argument
rm: cannot remove `/mnt/rahulsanity/d42/f94': Invalid argument
rm: cannot remove `/mnt/rahulsanity/d42/file26': Invalid argument
rm: cannot remove `/mnt/rahulsanity/d42/file39': Invalid argument
rm: cannot remove `/mnt/rahulsanity/d42/f41': Invalid argument
...
....

5. Once the rebalance had completed on all servers, issued rm -rf * again from the mount point

[root@localhost system_light]# ls /mnt/rahulsanity/
d36  d42  d45  d48  d51  d54  d57  d60  d63  d66  d69  d72  d75  d78  d81  d84  d87  d90  d93  d96
d40  d43  d46  d49  d52  d55  d58  d61  d64  d67  d70  d73  d76  d79  d82  d85  d88  d91  d94  d97
d41  d44  d47  d50  d53  d56  d59  d62  d65  d68  d71  d74  d77  d80  d83  d86  d89  d92  d95  d98
[root@localhost system_light]# rm -rf  /mnt/rahulsanity/*
rm: cannot remove directory `/mnt/rahulsanity/d42': Directory not empty
rm: cannot remove directory `/mnt/rahulsanity/d75': Directory not empty
[root@localhost system_light]# ls /mnt/rahulsanity/
d42  d75
[root@localhost system_light]# ls -lR /mnt/rahulsanity/
/mnt/rahulsanity/:
total 0
d--------- 2 root root 162 May 31 12:42 d42
d--------- 2 root root 152 May 31 12:42 d75

/mnt/rahulsanity/d42:
total 0

/mnt/rahulsanity/d75:
total 0


############
Checked on the bricks; this looks like a stale link file problem:
[root@rhsauto018 ~]# ls -lR /rhs/brick4/2/d*
/rhs/brick4/2/d42:
total 0

/rhs/brick4/2/d75:
total 0
[root@rhsauto018 ~]# ls -lR /rhs/brick4/3/d*
/rhs/brick4/3/d42:
total 0
---------T 1 root root 0 May 31 12:15 f82

/rhs/brick4/3/d75:
total 0
[root@rhsauto018 ~]# ls -lR /rhs/brick3/8/d*
/rhs/brick3/8/d42:
total 0

/rhs/brick3/8/d75:
total 0
[root@rhsauto018 ~]# ls -lR /rhs/brick3/6/d*
/rhs/brick3/6/d42:
total 0
---------T 2 root root 0 May 31 12:15 file2
---------T 2 root root 0 May 31 12:15 file27
---------T 2 root root 0 May 31 12:15 file29
---------T 2 root root 0 May 31 12:15 file33
---------T 2 root root 0 May 31 12:15 file36
---------T 2 root root 0 May 31 12:15 file49
---------T 2 root root 0 May 31 12:15 file56
---------T 2 root root 0 May 31 12:15 file57
---------T 2 root root 0 May 31 12:15 file59

/rhs/brick3/6/d75:
total 0
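
The ---------T entries in the brick listings above are zero-byte files whose only mode bit is the sticky bit, which is how DHT link files appear on a brick. A minimal Python sketch for listing such candidates under a brick directory is given here. The function names are illustrative assumptions, and the mode-and-size check is only a heuristic taken from the listings; a stricter check would also confirm the trusted.glusterfs.dht.linkto xattr on each file, which needs root on the brick host.

```python
import os
import stat


def looks_like_linkfile(st_mode, st_size):
    """Heuristic: regular file, zero bytes, mode exactly ---------T."""
    return (
        stat.S_ISREG(st_mode)
        and st_size == 0
        and stat.S_IMODE(st_mode) == stat.S_ISVTX
    )


def find_linkfile_candidates(brick_root):
    """Yield paths under brick_root that match the heuristic above."""
    for dirpath, dirnames, filenames in os.walk(brick_root):
        # Skip the brick's internal .glusterfs directory, if present.
        if ".glusterfs" in dirnames:
            dirnames.remove(".glusterfs")
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if looks_like_linkfile(st.st_mode, st.st_size):
                yield path


# Example use on a brick host (path taken from the listing above):
#   for path in find_linkfile_candidates("/rhs/brick3/6"):
#       print(path)
```

This only reports candidates. Whether a link file is actually stale depends on whether the data file it points to still exists, which this sketch does not check.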

Comment 14 Amar Tumballi 2013-05-31 09:11:08 UTC
I am in favour of documenting this particular issue under 'Known issues' in update5.

The main reason not to fix it in RHS 2.0 update5 (the fix is available in the RHS-2.1 bits) is that the code has undergone significant changes around layout optimization and related areas. We might have to import many patches to fix it in update5, and that would destabilize the current branch for other use cases as well.

Comment 17 spandura 2014-02-14 11:19:36 UTC
Able to re-create the issue on "glusterfs 3.4.0.59rhs built on Feb  4 2014 08:44:13"

I was unable to remove files/dirs from the mount point. Also, from the mount, "ls" on the directory on which rmdir failed didn't list any contents. On checking the backend, there were a lot of stale linkto files.

Comment 19 Susant Kumar Palai 2014-05-02 12:56:42 UTC
From the bug report, what I observed is that "stale link files" are not getting deleted. But there can be other issues too.

1. Regular file issue:
   -------------------
   If a regular file is present on a non-hashed subvolume and the cluster.lookup-unhashed option is off, then we will run into this situation. (Remedy: set cluster.lookup-unhashed to on.)

2. Directory entry issue:
   ---------------------
a. Due to a race between rmdir and lookup heal, we end up with directory entries even though rmdir thinks it has succeeded. A fix has been sent upstream: http://review.gluster.org/#/c/4846/ (not merged).

b. After add-brick (+ fix-layout), if the new brick becomes the hashed subvolume for the directory entry, then dht_readdir will not list the directory entry. Hence, rm -rf * will fail with ENOTEMPTY. (A bug in the dht_readdir implementation.)

c. This one is very much a corner case. dht_readdirp currently takes the directory entries from the first_up_subvol. If for some reason an operation (let's say mkdir) has not finished on first_up_subvol, then readdirp will not list this directory entry and we will end up with ENOTEMPTY.
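
Case 2c can be illustrated with a toy model in Python. It is a deliberately simplified sketch and not DHT code: each subvolume is modelled as a set of sub-directory names inside one parent directory, and the names readdirp_dirs and rm_rf are hypothetical. It only shows why a listing taken from the first up subvolume can leave entries behind, so that the final rmdir of the parent fails.

```python
import errno


def readdirp_dirs(subvols):
    """Toy model: sub-directory entries are reported from the first
    (up) subvolume only, as described for dht_readdirp above."""
    return set(subvols[0])


def rm_rf(subvols):
    """Remove every sub-directory the client can see, then try to
    rmdir the parent on all subvolumes.

    Returns 0 on success or errno.ENOTEMPTY if any subvolume still
    holds an entry the client never saw.
    """
    for name in readdirp_dirs(subvols):
        for entries in subvols:
            entries.discard(name)
    if any(entries for entries in subvols):
        return errno.ENOTEMPTY
    return 0


# mkdir of "d1" reached the second subvolume but not the first one.
inconsistent = [set(), {"d1"}]
print(rm_rf(inconsistent) == errno.ENOTEMPTY)

# The same directory present everywhere is removed cleanly.
consistent = [{"d1"}, {"d1"}]
print(rm_rf(consistent) == 0)
```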

Comment 20 Susant Kumar Palai 2014-05-05 08:57:30 UTC
Tested the link file case:

TEST CASE:
(1). Created a 2 x 2 distributed-replicate volume

[root@localhost ~]# gluster v i
 
Volume Name: test1
Type: Distributed-Replicate
Volume ID: a6db438d-327a-42bc-9545-d452bea34be4
Status: Created
Snap Volume: no
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 192.168.122.11:/brick/1
Brick2: 192.168.122.11:/brick/2
Brick3: 192.168.122.11:/brick/3
Brick4: 192.168.122.11:/brick/4

Created linkfiles by renaming files so that the new names hash to a different subvolume, and then deleted the actual data file.

mnt : fuse mount 

[root@localhost mnt]# ll /brick/*/*
/brick/1/dir1:
total 0

/brick/2/dir1:
total 0

/brick/3/dir1:
total 4
---------T. 2 root root 0 May  5 03:02 tile

/brick/4/dir1:
total 4
---------T. 2 root root 0 May  5 03:02 tile
[root@localhost mnt]# rm -rf dir1

# Operation was successful. 


The patch that should fix the issue is: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=188 (I was not able to open this link)

Commit id in upstream code: 9dbae0c80569689533c92a29871e3fa6dbbae1b9

Hence, at least we should not see ENOTEMPTY for stale link files.

Comment 21 Susant Kumar Palai 2014-05-05 09:01:02 UTC
Hi Rahul,
    Can you try to reproduce the issue again with the latest build, to see if we still get the ENOTEMPTY error for stale link files?

Comment 22 Nagaprasad Sathyanarayana 2014-05-06 11:43:37 UTC
Dev ack to 3.0 RHS BZs

Comment 23 Susant Kumar Palai 2014-05-22 15:45:25 UTC
Sent one possible fix for the bug: http://review.gluster.org/#/c/7733/.
The fix addresses the following issue.

* The POSIX_READDIRP function fills in the stat information of all the entries present in the directory. If lstat of an entry failed, it used to fill the stat information of the current entry with that of the previous entry read.

E.g. let's say the current entry is a file and the previous entry read was a directory. If the lstat of the current file fails, the stat info for the current file will be filled with that of the previous directory. Hence, the file will be treated as a directory.

Now one of the following two scenarios may happen, as dht_readdirp takes directory entries only from the first up subvolume.

1) If the file (now a directory as far as dht is concerned, because of the wrong stat) is not present on the first_up_subvolume, then it won't be processed for deletion.

2) Even if it is present on the first_up_subvolume, an rmdir call will be sent for the file (corrupted stat), which will result in a "Not a directory" error.


In either case, we will see a "Directory not empty" error while trying to remove the parent directory.
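
The stale-stat behaviour described above can be modelled in a few lines of Python. This is a sketch of the failure pattern only, not the posix translator code: fill_stats_buggy, fill_stats_fixed and fake_lstat are hypothetical names, and what the actual patch does on an lstat failure is not shown in this report (returning None here is an assumption made for illustration).

```python
import stat


def fill_stats_buggy(names, lstat):
    """The stat buffer is reused across entries, so a failed lstat
    leaves the previous entry's stat attached to the current entry."""
    result = {}
    st_mode = None
    for name in names:
        try:
            st_mode = lstat(name)
        except OSError:
            pass  # bug: st_mode still holds the previous entry's value
        result[name] = st_mode
    return result


def fill_stats_fixed(names, lstat):
    """Each entry gets its own result; a failed lstat yields no stat."""
    result = {}
    for name in names:
        try:
            result[name] = lstat(name)
        except OSError:
            result[name] = None  # assumption: report "no stat" instead
    return result


def fake_lstat(name):
    """Stand-in for lstat that succeeds for dir_a and fails otherwise."""
    modes = {"dir_a": stat.S_IFDIR | 0o755}
    if name not in modes:
        raise FileNotFoundError(name)
    return modes[name]


entries = ["dir_a", "file_b"]
buggy = fill_stats_buggy(entries, fake_lstat)
fixed = fill_stats_fixed(entries, fake_lstat)
print(stat.S_ISDIR(buggy["file_b"]))  # file_b inherits dir_a's mode
print(fixed["file_b"])
```

In the buggy variant file_b is reported with the directory's mode, which matches the "file treated as a directory" symptom described in this comment.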

Comment 24 Susant Kumar Palai 2014-05-23 08:42:19 UTC
Will move the bug to ON_QA once the fix is merged: http://review.gluster.org/#/c/7733/

Comment 25 Susant Kumar Palai 2014-06-03 08:54:19 UTC
*** Bug 999496 has been marked as a duplicate of this bug. ***

Comment 26 Susant Kumar Palai 2014-06-03 08:54:52 UTC
*** Bug 1065287 has been marked as a duplicate of this bug. ***

Comment 27 shylesh 2014-06-20 10:54:46 UTC
This issue is still reproducible.

The mount still says:
[root@rhs-client4 test]# rm -rf 1
rm: cannot remove `1': Directory not empty


The mount log says:
===============
[2014-06-20 10:22:58.789105] I [client-handshake.c:188:client_set_lk_version_cbk] 0-test-client-2: Server lk version = 1
[2014-06-20 10:22:58.789154] I [client-handshake.c:1200:client_setvolume_cbk] 0-test-client-3: Connected to test-client-3, attached to remote volume '/home/t3'.
[2014-06-20 10:22:58.789172] I [client-handshake.c:1212:client_setvolume_cbk] 0-test-client-3: Server and Client lk-version numbers are not same, reopening the fds
[2014-06-20 10:22:58.789350] I [client-handshake.c:188:client_set_lk_version_cbk] 0-test-client-0: Server lk version = 1
[2014-06-20 10:22:58.789414] I [client-handshake.c:188:client_set_lk_version_cbk] 0-test-client-1: Server lk version = 1
[2014-06-20 10:22:58.789453] I [client-handshake.c:1200:client_setvolume_cbk] 0-test-client-4: Connected to test-client-4, attached to remote volume '/home/t4'.
[2014-06-20 10:22:58.789472] I [client-handshake.c:1212:client_setvolume_cbk] 0-test-client-4: Server and Client lk-version numbers are not same, reopening the fds
[2014-06-20 10:22:58.796312] I [fuse-bridge.c:5042:fuse_graph_setup] 0-fuse: switched to graph 0
[2014-06-20 10:22:58.796491] I [client-handshake.c:188:client_set_lk_version_cbk] 0-test-client-4: Server lk version = 1
[2014-06-20 10:22:58.796556] I [client-handshake.c:188:client_set_lk_version_cbk] 0-test-client-3: Server lk version = 1
[2014-06-20 10:22:58.796626] I [fuse-bridge.c:3971:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.22 kernel 7.13
[2014-06-20 10:35:39.924058] W [client-rpc-fops.c:679:client3_3_rmdir_cbk] 0-test-client-4: remote operation failed: Directory not empty
[2014-06-20 10:35:39.924521] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 0-test-client-4: remote operation failed: File exists. Path: /1
[2014-06-20 10:35:39.924566]  [MSGID: 109005] [dht-selfheal.c:574:dht_selfheal_dir_mkdir_cbk] 0-test-dht: Directory selfheal failed: path = /1, gfid = 00000000-0000-0000-0000-000000000000 [File exists]
[2014-06-20 10:35:39.924721] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 0-test-client-0: remote operation failed: File exists. Path: /1
[2014-06-20 10:35:39.924752]  [MSGID: 109005] [dht-selfheal.c:574:dht_selfheal_dir_mkdir_cbk] 0-test-dht: Directory selfheal failed: path = /1, gfid = 00000000-0000-0000-0000-000000000000 [File exists]

Comment 31 Harold Miller 2014-09-02 20:04:34 UTC
Customer is asking if we have any additional guidance on when this will be included in a published release.

Comment 32 Wesley Duffee-Braun 2014-09-05 18:52:23 UTC
Echoing Harold's ask, is there anything we can provide to help firm up the 3.0.z target?

Comment 39 shylesh 2014-09-26 09:08:03 UTC
I could reproduce this bug on 3.6.0.29-1.el6rhs.x86_64 once. I tried again to collect the logs but couldn't reproduce it consistently. I will update once it is reproduced.

