Bug 1394244

Summary: directory deletion failing with directory not empty
Product: Red Hat Gluster Storage
Reporter: Nag Pavan Chilakam <nchilaka>
Component: distribute
Assignee: Nithya Balachandran <nbalacha>
Status: CLOSED NOTABUG
QA Contact: Prasad Desala <tdesala>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: rhgs-3.2
CC: nchilaka, rhs-bugs, storage-qa-internal
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-14 09:38:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Nag Pavan Chilakam 2016-11-11 13:13:42 UTC
Description of problem:
======================
In my systemic setup, which I started fresh, I have a 4x2 volume spanning 4 nodes.
I have enabled the features below; see the volume status and info:
[root@dhcp35-191 ~]# gluster v status salvol
Status of volume: salvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.191:/rhs/brick1/salvol       49153     0          Y       15470
Brick 10.70.37.108:/rhs/brick1/salvol       49152     0          Y       25158
Brick 10.70.35.3:/rhs/brick1/salvol         49152     0          Y       8975 
Brick 10.70.37.66:/rhs/brick1/salvol        49152     0          Y       26096
Brick 10.70.35.191:/rhs/brick2/salvol       49154     0          Y       15489
Brick 10.70.37.108:/rhs/brick2/salvol       49153     0          Y       25177
Brick 10.70.35.3:/rhs/brick2/salvol         49153     0          Y       8994 
Brick 10.70.37.66:/rhs/brick2/salvol        49153     0          Y       26115
Snapshot Daemon on localhost                49155     0          Y       15598
Self-heal Daemon on localhost               N/A       N/A        Y       15509
Quota Daemon on localhost                   N/A       N/A        Y       15545
Snapshot Daemon on 10.70.35.3               49154     0          Y       9091 
Self-heal Daemon on 10.70.35.3              N/A       N/A        Y       9014 
Quota Daemon on 10.70.35.3                  N/A       N/A        Y       9045 
Snapshot Daemon on 10.70.37.66              49154     0          Y       26214
Self-heal Daemon on 10.70.37.66             N/A       N/A        Y       26135
Quota Daemon on 10.70.37.66                 N/A       N/A        Y       26167
Snapshot Daemon on 10.70.37.108             49154     0          Y       25276
Self-heal Daemon on 10.70.37.108            N/A       N/A        Y       25201
Quota Daemon on 10.70.37.108                N/A       N/A        Y       25228
 
Task Status of Volume salvol
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@dhcp35-191 ~]# gluster v statedump
Usage: volume statedump <VOLNAME> [nfs|quotad] [all|mem|iobuf|callpool|priv|fd|inode|history]...
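# Note: the bare "gluster v statedump" above fails because the volume name is
# missing; per the usage line just printed, a valid invocation would be, for
# example: gluster volume statedump salvol all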
[root@dhcp35-191 ~]# 
[root@dhcp35-191 ~]# 
[root@dhcp35-191 ~]# 
[root@dhcp35-191 ~]# gluster v info
 
Volume Name: salvol
Type: Distributed-Replicate
Volume ID: cca6a599-ec09-4409-89d5-7cb00c20856b
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: 10.70.35.191:/rhs/brick1/salvol
Brick2: 10.70.37.108:/rhs/brick1/salvol
Brick3: 10.70.35.3:/rhs/brick1/salvol
Brick4: 10.70.37.66:/rhs/brick1/salvol
Brick5: 10.70.35.191:/rhs/brick2/salvol
Brick6: 10.70.37.108:/rhs/brick2/salvol
Brick7: 10.70.35.3:/rhs/brick2/salvol
Brick8: 10.70.37.66:/rhs/brick2/salvol
Options Reconfigured:
features.cache-invalidation: on
features.cache-invalidation-timeout: 400
performance.cache-invalidation: on
performance.md-cache-timeout: 300
cluster.shd-max-threads: 10
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
features.uss: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
[root@dhcp35-191 ~]# gluster v status
Status of volume: salvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.191:/rhs/brick1/salvol       49153     0          Y       15470
Brick 10.70.37.108:/rhs/brick1/salvol       49152     0          Y       25158
Brick 10.70.35.3:/rhs/brick1/salvol         49152     0          Y       8975 
Brick 10.70.37.66:/rhs/brick1/salvol        49152     0          Y       26096
Brick 10.70.35.191:/rhs/brick2/salvol       49154     0          Y       15489
Brick 10.70.37.108:/rhs/brick2/salvol       49153     0          Y       25177
Brick 10.70.35.3:/rhs/brick2/salvol         49153     0          Y       8994 
Brick 10.70.37.66:/rhs/brick2/salvol        49153     0          Y       26115
Snapshot Daemon on localhost                49155     0          Y       15598
Self-heal Daemon on localhost               N/A       N/A        Y       15509
Quota Daemon on localhost                   N/A       N/A        Y       15545
Snapshot Daemon on 10.70.35.3               49154     0          Y       9091 
Self-heal Daemon on 10.70.35.3              N/A       N/A        Y       9014 
Quota Daemon on 10.70.35.3                  N/A       N/A        Y       9045 
Snapshot Daemon on 10.70.37.108             49154     0          Y       25276
Self-heal Daemon on 10.70.37.108            N/A       N/A        Y       25201
Quota Daemon on 10.70.37.108                N/A       N/A        Y       25228
Snapshot Daemon on 10.70.37.66              49154     0          Y       26214
Self-heal Daemon on 10.70.37.66             N/A       N/A        Y       26135
Quota Daemon on 10.70.37.66                 N/A       N/A        Y       26167
 
Task Status of Volume salvol
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@dhcp35-191 ~]# 

I then mounted the volume on 5 different clients and ran the following I/O:
From all clients: ===> took a statedump of the FUSE mount process every 5 minutes and moved the dumps into a dedicated directory for each host on the mount point (i.e. into the gluster volume)
From all clients: ===> collected top and CPU usage every 2 minutes and appended the output to a per-host file on the mount point (i.e. into the gluster volume); a rough sketch of both collection loops is shown below
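
A minimal sketch of what these client-side loops might have looked like (the paths and the pgrep pattern are assumptions about the setup, not the exact scripts used; GlusterFS FUSE clients write a statedump to /var/run/gluster when sent SIGUSR1):

#!/bin/bash
# Hypothetical statedump collection loop (sketch only)
HOST=$(hostname)
DEST=/mnt/salvol/test-arena/statedumps/$HOST          # per-host dir on the gluster mount
mkdir -p "$DEST"
while true; do
    PID=$(pgrep -f 'glusterfs.*salvol' | head -n 1)   # FUSE client process
    [ -n "$PID" ] && kill -USR1 "$PID"                # ask the client for a statedump
    sleep 5                                           # give the dump time to be written
    mv /var/run/gluster/glusterdump.* "$DEST"/ 2>/dev/null
    sleep 295                                         # repeat every ~5 minutes
done

# Hypothetical top/CPU collection loop, appending to a per-host file
LOG=/mnt/salvol/test-arena/top.$(hostname).log
while true; do
    { date; top -b -n 1 | head -n 20; } >> "$LOG"
    sleep 120                                         # repeat every 2 minutes
done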

Now, from two of the clients, I started creating a deep directory structure in parallel (a sketch of the kind of creation loop used follows the client list):

Client1: rhs-client11, mounted from 10.70.35.191:/salvol
Client2: rhs-client32, mounted from 10.70.37.66:/salvol
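
A minimal sketch of a nested creation loop that would produce the level1.N/level2.N/... tree seen in the listings below (the depth and fan-out are guesses; the actual script was not captured here):

#!/bin/bash
# Hypothetical deep-directory creation, run concurrently from both clients;
# produces paths like level1.1/level2.1/level3.1/level4.4/level5.100
cd /mnt/salvol/test-arena/same-dir-create || exit 1
for a in $(seq 1 2); do
  for b in $(seq 1 2); do
    for c in $(seq 1 2); do
      for d in $(seq 1 30); do
        for e in $(seq 1 100); do
          mkdir -p "level1.$a/level2.$b/level3.$c/level4.$d/level5.$e"
        done
      done
    done
  done
done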


However, after only about 3 minutes I stopped the directory creation.
I then ran a parallel rm -rf * from both of these clients.

The rm -rf failed with "Directory not empty" on both clients:


Client1:
[root@rhs-client11 same-dir-create]# rm -rf *
rm: cannot remove `level1.1/level2.1/level3.1/level4.4': Directory not empty
[root@rhs-client11 same-dir-create]# 
[root@rhs-client11 same-dir-create]# 
[root@rhs-client11 same-dir-create]# ls
dir.rhs-client11.lab.eng.blr.redhat.com.log  level1.1
[root@rhs-client11 same-dir-create]# rm -rf *
rm: cannot remove `level1.1/level2.1/level3.1/level4.6': Directory not empty
[root@rhs-client11 same-dir-create]# ls
dir.rhs-client11.lab.eng.blr.redhat.com.log  level1.1
[root@rhs-client11 same-dir-create]# ls
dir.rhs-client11.lab.eng.blr.redhat.com.log  level1.1
[root@rhs-client11 same-dir-create]# pwd
/mnt/salvol/test-arena/same-dir-create
[root@rhs-client11 same-dir-create]# ls
dir.rhs-client11.lab.eng.blr.redhat.com.log  level1.1
[root@rhs-client11 same-dir-create]# rm -rf *
rm: cannot remove `level1.1/level2.1/level3.1': Directory not empty
[root@rhs-client11 same-dir-create]# ls
dir.rhs-client11.lab.eng.blr.redhat.com.log  level1.1
[root@rhs-client11 same-dir-create]# 
[root@rhs-client11 same-dir-create]# 
[root@rhs-client11 same-dir-create]# 
[root@rhs-client11 same-dir-create]# ls level1.1/level2.1/level3.1/level4.26/
level5.100  level5.74  level5.78  level5.82  level5.86  level5.90  level5.94  level5.98
level5.71   level5.75  level5.79  level5.83  level5.87  level5.91  level5.95  level5.99
level5.72   level5.76  level5.80  level5.84  level5.88  level5.92  level5.96
level5.73   level5.77  level5.81  level5.85  level5.89  level5.93  level5.97
[root@rhs-client11 same-dir-create]# ls level1.1/level2.1/level3.1/level4.26/*



Client2:
[root@rhs-client32 same-dir-create]# ls
dir.rhs-client11.lab.eng.blr.redhat.com.log  dir.rhs-client32.lab.eng.blr.redhat.com.log  level1.1
[root@rhs-client32 same-dir-create]# ls
dir.rhs-client11.lab.eng.blr.redhat.com.log  dir.rhs-client32.lab.eng.blr.redhat.com.log  level1.1
[root@rhs-client32 same-dir-create]# owd
-bash: owd: command not found
[root@rhs-client32 same-dir-create]# pwd
/mnt/salvol/test-arena/same-dir-create
[root@rhs-client32 same-dir-create]# ls
dir.rhs-client11.lab.eng.blr.redhat.com.log  dir.rhs-client32.lab.eng.blr.redhat.com.log  level1.1
[root@rhs-client32 same-dir-create]# cd sam
-bash: cd: sam: No such file or directory
[root@rhs-client32 same-dir-create]# ls
dir.rhs-client11.lab.eng.blr.redhat.com.log  dir.rhs-client32.lab.eng.blr.redhat.com.log  level1.1
[root@rhs-client32 same-dir-create]# rm -rf *
rm: cannot remove ‘level1.1/level2.1/level3.1/level4.4’: Directory not empty
[root@rhs-client32 same-dir-create]# 
[root@rhs-client32 same-dir-create]# 
[root@rhs-client32 same-dir-create]# ls
dir.rhs-client11.lab.eng.blr.redhat.com.log  level1.1
[root@rhs-client32 same-dir-create]# rm -rf *
rm: cannot remove ‘level1.1/level2.1/level3.1/level4.6’: Directory not empty
[root@rhs-client32 same-dir-create]# ls
dir.rhs-client11.lab.eng.blr.redhat.com.log  level1.1
[root@rhs-client32 same-dir-create]# ls
dir.rhs-client11.lab.eng.blr.redhat.com.log  level1.1
[root@rhs-client32 same-dir-create]# pwd
/mnt/salvol/test-arena/same-dir-create
[root@rhs-client32 same-dir-create]# ls
dir.rhs-client11.lab.eng.blr.redhat.com.log  level1.1
[root@rhs-client32 same-dir-create]# rm -rf *
rm: cannot remove ‘level1.1/level2.1/level3.1’: Directory not empty
[root@rhs-client32 same-dir-create]# ls
dir.rhs-client11.lab.eng.blr.redhat.com.log  level1.1
[root@rhs-client32 same-dir-create]# rm -rf *
rm: cannot remove ‘level1.1/level2.1/level3.1’: Directory not empty
[root@rhs-client32 same-dir-create]# ls
dir.rhs-client11.lab.eng.blr.redhat.com.log  level1.1
[root@rhs-client32 same-dir-create]# rm -rf *
rm: cannot remove ‘level1.1/level2.1/level3.1/level4.14’: Directory not empty
[root@rhs-client32 same-dir-create]# ls
dir.rhs-client11.lab.eng.blr.redhat.com.log  level1.1
[root@rhs-client32 same-dir-create]# ls level1.1/level2.1/level3.1/level4.1
level4.14/ level4.15/ 
[root@rhs-client32 same-dir-create]# ls level1.1/level2.1/level3.1/level4.1
level4.14/ level4.15/ 
[root@rhs-client32 same-dir-create]# ls level1.1/level2.1/level3.1/level4.1
level4.14/ level4.15/ 
[root@rhs-client32 same-dir-create]# ls level1.1/level2.1/level3.1/level4.14/level5.
level5.100/ level5.53/  level5.57/  level5.61/  level5.65/  level5.69/  level5.73/  level5.77/  level5.81/  level5.85/  level5.89/  level5.93/  level5.97/  
level5.50/  level5.54/  level5.58/  level5.62/  level5.66/  level5.70/  level5.74/  level5.78/  level5.82/  level5.86/  level5.90/  level5.94/  level5.98/  
level5.51/  level5.55/  level5.59/  level5.63/  level5.67/  level5.71/  level5.75/  level5.79/  level5.83/  level5.87/  level5.91/  level5.95/  level5.99/  
level5.52/  level5.56/  level5.60/  level5.64/  level5.68/  level5.72/  level5.76/  level5.80/  level5.84/  level5.88/  level5.92/  level5.96/  
[root@rhs-client32 same-dir-create]# ls level1.1/level2.1/level3.1/level4.14/level5.
level5.100/ level5.53/  level5.57/  level5.61/  level5.65/  level5.69/  level5.73/  level5.77/  level5.81/  level5.85/  level5.89/  level5.93/  level5.97/  
level5.50/  level5.54/  level5.58/  level5.62/  level5.66/  level5.70/  level5.74/  level5.78/  level5.82/  level5.86/  level5.90/  level5.94/  level5.98/  
level5.51/  level5.55/  level5.59/  level5.63/  level5.67/  level5.71/  level5.75/  level5.79/  level5.83/  level5.87/  level5.91/  level5.95/  level5.99/  
level5.52/  level5.56/  level5.60/  level5.64/  level5.68/  level5.72/  level5.76/  level5.80/  level5.84/  level5.88/  level5.92/  level5.96/  
[root@rhs-client32 same-dir-create]# ls level1.1/level2.1/level3.1/level4.14/level5.
level5.100/ level5.53/  level5.57/  level5.61/  level5.65/  level5.69/  level5.73/  level5.77/  level5.81/  level5.85/  level5.89/  level5.93/  level5.97/  
level5.50/  level5.54/  level5.58/  level5.62/  level5.66/  level5.70/  level5.74/  level5.78/  level5.82/  level5.86/  level5.90/  level5.94/  level5.98/  
level5.51/  level5.55/  level5.59/  level5.63/  level5.67/  level5.71/  level5.75/  level5.79/  level5.83/  level5.87/  level5.91/  level5.95/  level5.99/  
level5.52/  level5.56/  level5.60/  level5.64/  level5.68/  level5.72/  level5.76/  level5.80/  level5.84/  level5.88/  level5.92/  level5.96/  
[root@rhs-client32 same-dir-create]# ls level1.1/level2.1/level3.1/level4.14/level5.
level5.100/ level5.53/  level5.57/  level5.61/  level5.65/  level5.69/  level5.73/  level5.77/  level5.81/  level5.85/  level5.89/  level5.93/  level5.97/  
level5.50/  level5.54/  level5.58/  level5.62/  level5.66/  level5.70/  level5.74/  level5.78/  level5.82/  level5.86/  level5.90/  level5.94/  level5.98/  
level5.51/  level5.55/  level5.59/  level5.63/  level5.67/  level5.71/  level5.75/  level5.79/  level5.83/  level5.87/  level5.91/  level5.95/  level5.99/  
level5.52/  level5.56/  level5.60/  level5.64/  level5.68/  level5.72/  level5.76/  level5.80/  level5.84/  level5.88/  level5.92/  level5.96/  
[root@rhs-client32 same-dir-create]# #ls level1.1/level2.1/level3.1/level4.14/level5.100/
[root@rhs-client32 same-dir-create]# rm -rf *
rm: cannot remove ‘level1.1/level2.1/level3.1/level4.16’: Directory not empty
[root@rhs-client32 same-dir-create]# ls
dir.rhs-client11.lab.eng.blr.redhat.com.log  level1.1





I then tried deleting only from Client2, but it still failed with "Directory not empty".

I checked the mount logs but found no new log entries on the retry.

I also checked the brick logs while retrying and found only one log entry, on the last brick, i.e. brick2 of node4:

[2016-11-11 12:40:28.128298] E [MSGID: 113039] [posix.c:3018:posix_open] 0-salvol-posix: open on /rhs/brick2/salvol/.glusterfs/e4/df/e4df858e-c6c6-4fdb-bdbb-e3c07a3187ba, flags: 1025 [No such file or directory]

Comment 2 Nithya Balachandran 2016-11-11 15:21:55 UTC
Nag,

Can you please leave the system in the same state until Monday? I will take a look at it then.

Thanks,
Nithya

Comment 3 Nag Pavan Chilakam 2016-11-14 09:38:17 UTC
I noticed later that directories were still being created from one of the clients while the delete was being attempted, which could explain why the directory deletion failed: rm -rf first reads a directory's entries, removes what it saw, and then calls rmdir on the directory itself, so any entry created concurrently in between makes the rmdir fail with "Directory not empty" (ENOTEMPTY). A minimal reproduction sketch is appended below.
Closing this as Not A Bug.
Will reopen or raise a new bug if I see this in a healthy setup.
Sorry for the inconvenience
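
A minimal sketch reproducing this create/delete race (two mounts of the same volume, or even two shells on one mount, will do; the paths here are hypothetical):

# Shell 1 (e.g. on rhs-client11): keep creating entries under a directory
while true; do mkdir -p /mnt/salvol/test-arena/race/dir/sub.$RANDOM; done

# Shell 2 (e.g. on rhs-client32): try to remove the tree; rm -rf enumerates
# the directory, unlinks the entries it saw, then calls rmdir -- any entry
# created in between makes the rmdir fail with ENOTEMPTY
rm -rf /mnt/salvol/test-arena/race/dir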