Bug 1422822

Summary: [RGW:NFS]: Directories not getting removed from NFS mount
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: RGW
Reporter: Ramakrishnan Periyasamy <rperiyas>
Assignee: Matt Benjamin (redhat) <mbenjamin>
QA Contact: Ramakrishnan Periyasamy <rperiyas>
Status: CLOSED ERRATA
Type: Bug
Severity: medium
Priority: unspecified
Version: 2.2
Target Milestone: rc
Target Release: 2.3
Hardware: Unspecified
OS: Unspecified
CC: cbodley, ceph-eng-bugs, hnallurv, hyelloji, kbader, kdreyer, mbenjamin, owasserm, rperiyas, sweil, tserlin
Fixed In Version: RHEL: ceph-10.2.7-20.el7cp; Ubuntu: ceph_10.2.7-22redhat1
Last Closed: 2017-06-19 13:29:26 UTC

Description Ramakrishnan Periyasamy 2017-02-16 10:52:40 UTC
Description of problem:
Tried to remove an empty directory from the NFS mount, but it failed with the error "rm: cannot remove ‘bigbucket/’: Directory not empty".

Some buckets with data had been created via S3 several days earlier; two days ago all the data was removed, leaving only empty directories.

Updated Ceph and NFS to the latest builds manually. As part of this, the NFS share was unmounted, the machines were rebooted, and the share was then mounted back again (there is no fstab entry).

Then, as part of testing, tried to remove the directory, but it kept failing with the "rm: cannot remove ‘bigbucket/’: Directory not empty" error.

[ubuntu@host003 nfs]$ ls
bigbucket  client1  client1_run4  client1_run5  client2_run1  client2_run2  comp
[ubuntu@host003 nfs]$ cd bigbucket/
[ubuntu@host003 bigbucket]$ ls
[ubuntu@host003 bigbucket]$ ll -h
total 0
[ubuntu@host003 bigbucket]$ cd ..
[ubuntu@host003 nfs]$ rm -rf bigbucket/
rm: cannot remove ‘bigbucket/’: Directory not empty

The 'comp' directory was created directly on the NFS mount; the other directories were all created using an S3 Python boto script (a sketch follows the listing below).
[ubuntu@host056 nfs]$ ll -h
total 0
drwxrwxrwx. 3 root   root   0 Jan  1  1970 bigbucket
drwxrwxrwx. 3 root   root   0 Jan  1  1970 client1
drwxrwxrwx. 3 root   root   0 Jan  1  1970 client1_run4
drwxrwxrwx. 3 root   root   0 Jan  1  1970 client1_run5
drwxrwxrwx. 3 root   root   0 Jan  1  1970 client2_run1
drwxrwxrwx. 3 root   root   0 Jan  1  1970 client2_run2
drwxrwxr-x. 3 ubuntu ubuntu 0 Feb  2 09:56 comp
[ubuntu@host056 nfs]$ date
Thu Feb 16 10:39:52 UTC 2017
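
For reference, the S3-side setup used a boto script along these lines. This is a minimal sketch only; the endpoint, port, credentials, and object names are placeholders, not the actual test script.

# Minimal boto (v2) sketch: creating a bucket and one object through the
# RGW S3 API. Buckets appear as top-level directories on the NFS mount.
# Host, port, credentials, and names below are placeholders.
import boto
import boto.s3.connection

conn = boto.connect_s3(
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
    host='rgw.example.com', port=8080,
    is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

bucket = conn.create_bucket('bigbucket')   # shows up as an NFS directory
key = bucket.new_key('obj1')               # shows up as a file inside it
key.set_contents_from_string('some data')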

Below is the output for the other directories. In this list only 'client1_run5' contains data; the other folders are all empty, but the command fails to remove any of them.
[ubuntu@host056 nfs]$ rm -rf *
rm: cannot remove ‘bigbucket’: Directory not empty
rm: cannot remove ‘client1’: Directory not empty
rm: cannot remove ‘client1_run4’: Directory not empty
rm: cannot remove ‘client1_run5’: Directory not empty
rm: cannot remove ‘client2_run1’: Directory not empty
rm: cannot remove ‘client2_run2’: Directory not empty
rm: cannot remove ‘comp’: Directory not empty

These directories and their data had been in use for more than a month. Not sure whether the directories are stale, but they remain visible even after remounting or restarting the service.

Version-Release number of selected component (if applicable):
ceph: 10.2.5-26.el7cp
NFS: nfs-ganesha-rgw-2.4.2-5.el7cp.x86_64

How reproducible:
1/1

Steps to Reproduce:
1. Configure a cluster with RGW and NFS.
2. Create some directories with data using S3, and one directory directly on the NFS mount.
3. Perform read/write I/O to the folders and buckets through different interfaces (S3 or the kernel NFS client).
4. Delete all the files in the directory.
5. Update the packages, unmount NFS, reboot the machine, and remount it.
6. Try to remove the directory (see the condensed sketch below).
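
A condensed shell sketch of steps 5 and 6 follows; the hostname, mount point, and mount options are assumptions based on a typical Ganesha/RGW NFSv4 setup, not details taken from this report.

# Sketch of steps 5-6; ganesha-host and /mnt/nfs are placeholders.
sudo umount /mnt/nfs
# ...update packages and reboot, then remount (no fstab entry, so by hand):
sudo mount -t nfs -o nfsvers=4.1,sync ganesha-host:/ /mnt/nfs
rm -rf /mnt/nfs/bigbucket    # fails: Directory not empty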

Actual results:
Deleting directories fails

Expected results:
The directory should get deleted.

Additional info:
N/A

Comment 16 Ramakrishnan Periyasamy 2017-02-22 10:03:42 UTC
Added "Cache_Inode" section in ganesha.conf file before verifying this bug.

Some directories were deleted without any issues, e.g. directories with a small number of files, and empty directories.

While deleting an old directory that was almost 100 levels deep, observed the ganesha service stop.

This pastebin location has all the commands that were executed:
http://pastebin.test.redhat.com/457924

Number of levels in that directory:
cd level30/level40/level50/level60/level70/level80/level90/level100/level110/level120/level130/level140/level150/level160/level170/level180/level190/level200/level210/level220/level230/level240/level250/level260/level270/level280/level290/level300/level310/level320/level330/level340/level350/level360/level370/level380/level390/level400/level410/level420/level430/level440/level450/level460/level470/level480/level490/level500/level510/level520/level530/level540/level550/level560/level570/level580/level590/level600/level610/level620/level630/level640/level650/level660/level670/level680/level690/level700/level710/level720/level730/level740/level750/level760/level770/level780/level790/level800/level810/level820/level830/level840/level850/level860/level870/level880/level890/level900/level910/level920/level930/

Each level also contains files of different sizes. (A shell loop that recreates a tree of this shape is sketched below.)
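
A tree of that shape can be recreated with a short shell loop; the path, file sizes, and level numbers here are illustrative.

# Build a deeply nested tree like the one above (level30..level930 in
# steps of 10) with a small file at each level; paths are placeholders.
p=/mnt/nfs/deepdir
for i in $(seq 30 10 930); do
    p="$p/level$i"
    mkdir -p "$p"
    dd if=/dev/urandom of="$p/file.bin" bs=1K count=4 2>/dev/null
done
rm -rf /mnt/nfs/deepdir    # the removal that stopped the ganesha service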

Comment 21 Ramakrishnan Periyasamy 2017-02-27 09:56:08 UTC
Moving this bug to the verified state. Directories are getting deleted without any issues.

http://pastebin.test.redhat.com/459363

Verified in:
[ubuntu@host003 nfs]$ sudo ceph -v
ceph version 10.2.5-34.el7cp (d11a516e27459d970ff00b54315f5ba66185f046)

[ubuntu@host023 ~]$ sudo rpm -qa | grep nfs
nfs-ganesha-rgw-2.4.2-7.el7cp.x86_64
libnfsidmap-0.25-15.el7.x86_64
nfs-ganesha-2.4.2-7.el7cp.x86_64
nfs-utils-1.3.0-0.33.el7.x86_64

Comment 23 Hemanth Kumar 2017-03-01 12:34:23 UTC
Re-opening the BZ as deletion of files and directories is failing again.

I had created files using crefi in one of the folders created on the mount point. Deleting the files from that folder exited without any error, but the files still remain (a generic stand-in for the crefi run is sketched after the transcript below).

[root@thunderbolt level00]# ls
58b571b6%%EC3HU1UYM8  58b571b6%%MQTNSIH5IY  58b571b6%%U9IEYN02QP  58b571b7%%EX1RPUMI9Q  58b571b7%%NO42T9XN8N  58b571b8%%1UQ2OIGDVO  58b571b8%%DQ5JY34KB0  58b571b8%%NL1QOFBERA
58b571b6%%ET3G7KN81U  58b571b6%%PZZ1PW898F  58b571b7%%0ZAKZ5QB0F  58b571b7%%H8OWLQXCQT  58b571b7%%SM84JZ4O9X  58b571b8%%47S7LY8KMN  58b571b8%%E53ECU00OL
58b571b6%%G2D2F79I4J  58b571b6%%Q2RYXJD1YZ  58b571b7%%17K29EIZDT  58b571b7%%HVHC2IT0S4  58b571b7%%SVF2LS9J17  58b571b8%%4ZIIBQ0ATW  58b571b8%%GKYBRA0QN5
58b571b6%%INBTCWYRDS  58b571b6%%S3IAKA11QH  58b571b7%%1CK501FLYL  58b571b7%%HYXOKPQ48O  58b571b7%%U9PXNKERIB  58b571b8%%5GX3V3ZPRU  58b571b8%%KQSJ01ENKA
58b571b6%%KJPVAXNMG6  58b571b6%%S3PXYOND2W  58b571b7%%35VTDUQEEE  58b571b7%%IICDH8779B  58b571b7%%ZV3IBWMRMC  58b571b8%%B8P0LNS3BB  58b571b8%%MAFVF3TS7F
58b571b6%%KY4PF5VDD9  58b571b6%%SBBO35QVM6  58b571b7%%6TS38VMBB7  58b571b7%%K0ONK2VPCJ  58b571b8%%079GG29DJT  58b571b8%%BQVVL0JZQG  58b571b8%%N2BO6M1U35
58b571b6%%M2FZ4JDJ5R  58b571b6%%SNF4E9M5YA  58b571b7%%8QJRU6LF5I  58b571b7%%MFI505JJY8  58b571b8%%0ACKH6THXT  58b571b8%%BUEIMW9GRT  58b571b8%%N3W6JKRYU0

[root@thunderbolt level00]# rm -rf *

[root@thunderbolt level00]# ls
58b571b8%%NP3KL90ADP  58b571b9%%0P5ZV2KKGZ  58b571b9%%5JOVB63C71  58b571b9%%KNPNNPAVXP  58b571ba%%2TQU3BM2EG  58b571ba%%HIN1BCPSU6  58b571ba%%XMSBVAKH14  58b571bb%%AL00RT19W4
58b571b8%%P5NEN2MJGD  58b571b9%%2TJJFH34J5  58b571b9%%5SMMPRA34Z  58b571b9%%QYUFUTA1LF  58b571ba%%44P906JE5G  58b571ba%%IG7I4SDC6X  58b571ba%%ZV8UPC9LTC
58b571b8%%RSWLL8SML0  58b571b9%%3MCCD7KB1M  58b571b9%%5SP5BR2A8V  58b571b9%%S5MUQ6HWW4  58b571ba%%6VBIZP0J61  58b571ba%%K2P1IP80CS  58b571bb%%2F4ARLF8OE
58b571b8%%SKSJJF6C30  58b571b9%%43VGBC2G8G  58b571b9%%6369M83681  58b571b9%%TGL100DM57  58b571ba%%6XU0ULTJXE  58b571ba%%MQ92YTE3QU  58b571bb%%55Z8UXRBOS
58b571b8%%UAPTAZU962  58b571b9%%45O6HOVK9V  58b571b9%%CMTIA7BBBZ  58b571b9%%VM3OZM0TM7  58b571ba%%B74HXNUZ7W  58b571ba%%NQPB5XU1J0  58b571bb%%6GRMMDT53W
58b571b8%%X2QU3TLY3K  58b571b9%%48S059I1PO  58b571b9%%JO0TP5KKVS  58b571b9%%X1OQA4PAN8  58b571ba%%DAPXM9UAMV  58b571ba%%QYYMNC0B3Y  58b571bb%%731CW6T994
58b571b8%%YGA7SUMNR3  58b571b9%%55CI1HQ8KX  58b571b9%%K5M5ODW4JB  58b571b9%%XVPC6DN46X  58b571ba%%FRP2Q47A1O  58b571ba%%TAQ4OGJYIY  58b571bb%%77B5AWIVT7
[root@thunderbolt level00]#
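
As a generic stand-in for the crefi run (crefi's actual invocation and flags are not recorded in this report), a directory can be filled with many small files like so:

# Stand-in for the crefi run, not crefi's actual syntax: fill level00
# with many small random files. Path and counts are placeholders.
mkdir -p /hello/folder1/level00
for i in $(seq 1 100); do
    head -c 1024 /dev/urandom > "/hello/folder1/level00/file_$i"
done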



===================================================
Tried deleting them from another client

[root@skytrain ~]# cd /hello/folder1/

[root@skytrain folder1]# ls
level00

[root@skytrain folder1]# rm -rf *

[root@skytrain folder1]# ls
level00

[root@skytrain folder1]# cd /hello

[root@skytrain hello]# ls
folder1

[root@skytrain hello]# rm -rf folder1/
rm: cannot remove ‘folder1/’: Directory not empty

[root@skytrain hello]# ls
folder1

[root@skytrain hello]# cd folder1

[root@skytrain folder1]#

[root@skytrain folder1]# ls
ls: cannot access level00: Stale file handle
level00


Tried the same steps on another setup and the issue is seen there as well: http://pastebin.test.redhat.com/460199
It is reproducible consistently.

Build:
---------
nfs-ganesha-2.4.2-7.el7cp 
ceph-radosgw-10.2.5-37.el7cp.x86_64.rpm

Comment 30 Ken Dreyer (Red Hat) 2017-04-07 18:33:24 UTC
Matt, do you expect that this BZ requires further code changes, or can QE test with Ceph v10.2.7 + nfs-ganesha 2.4.5?

Comment 31 Matt Benjamin (redhat) 2017-04-07 19:09:53 UTC
(In reply to Ken Dreyer (Red Hat) from comment #30)
> Matt, do you expect that this BZ requires further code changes, or can QE
> test with Ceph v10.2.7 + nfs-ganesha 2.4.5?

Hi Ken,

There will be a few more commits to resolve large directory enumeration, though it is close.

Matt

Comment 32 Ken Dreyer (Red Hat) 2017-04-07 20:11:22 UTC
Thanks Matt, what Redmine ticket(s) track those?

Comment 40 John Poelstra 2017-05-17 15:09:10 UTC
Discussed at the program meeting; Matt agrees this is a blocker and will fix it by Monday, May 22nd.

Comment 54 Ramakrishnan Periyasamy 2017-05-26 15:49:32 UTC
Moving this bug to the verified state.

Bug verified in "ceph version 10.2.7-21.el7cp (ebe0fca146985f59e6ab136a860d1f063a26c700)" build.

Directories are getting removed without any issues.

Comment 56 errata-xmlrpc 2017-06-19 13:29:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1497