Bug 1558974
Summary: | [Ganesha] Unable to delete few files from mount point while performing rm -rf post linux untars and lookups | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Manisha Saini <msaini> |
Component: | nfs-ganesha | Assignee: | Kaleb KEITHLEY <kkeithle> |
Status: | CLOSED NOTABUG | QA Contact: | Manisha Saini <msaini> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | rhgs-3.4 | CC: | amukherj, dang, ffilz, grajoria, jijoy, jthottan, msaini, pasik, rhinduja, rhs-bugs, storage-qa-internal |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2019-05-06 12:04:02 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Manisha Saini
2018-03-21 12:36:54 UTC
Ah, you've managed to find a test that hits what I suspected could be an issue... The problem is the way directory cookies are generated.

With the POSIX readdir system call (actually getdents now), the d_off value that we use as the cookie is NOT the "address" of the entry in question; it's the "address" of the NEXT dirent. With a single readdir going on and no files being removed, that works just fine. With a lot of churn in the directory, however, a file can get added between the dirent that has a particular d_off and the next dirent, the one that is actually at that d_off. The new file then has the same d_off that the earlier dirent used to report.

For example, with multiple readdirs going on, let's say the files are:

    "first" (100)
    "new"   (200)
    "xlast" (300)

(Note that d_off is probably not actually in alphabetical order; I'm just pretending it is to make the example easier to follow. The numbers in parentheses are the actual addresses of each dirent.) So a readdir happening AFTER "new" has already been added would show:

    "first" (d_off = 200)
    "new"   (d_off = 300)
    "xlast" (d_off = 400)

A readdir happening BEFORE "new" is added would show:

    "first" (d_off = 300)
    "xlast" (d_off = 400)

So if we had a BEFORE readdir followed by an AFTER readdir, see how the cookie for "first" has changed, and also see how the cookie for "new" duplicates the cookie of the original instance of "first".

As it happens, there's a way to fix this. The FSAL readdir can keep track of the PREVIOUS d_off/cookie for each dirent and use that one instead, since it is the actual "address" of the dirent. Each dirent then has a deterministic cookie under most modern filesystems (where the value is actually a hash of the file name rather than an offset into a flat directory file). I used this mechanism in this patch:

https://review.gerrithub.io/#/c/354400/

That patch implements a brute-force compute_readdir_cookie operation and, to have a consistent cookie for an entry, relies on using the d_off from the previous dirent. So has this been run with my proposed patch?
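The cookie problem and the proposed fix can be replayed with a toy model. This is a minimal Python sketch, not Ganesha or FSAL code: the function `scan`, the constant `END_OF_DIR` (400, the address just past "xlast"), and a scan start position of 100 are all assumptions made only to mirror the addresses in the example above. Each simulated dirent is returned with d_off set to the address of the NEXT entry, as described for getdents; the fixed scheme instead reports the d_off carried over from the previous dirent.

```python
from typing import Dict, List, Tuple

# Toy directory (hypothetical): (name, address) pairs sorted by address,
# matching the example in the comment above.
Entry = Tuple[str, int]

END_OF_DIR = 400  # assumed "address" just past the last entry ("xlast" at 300)


def scan(directory: List[Entry], use_prev_doff: bool, start_pos: int) -> Dict[str, int]:
    """Simulate one readdir pass and return the cookie assigned to each name.

    Each entry comes back with d_off = address of the NEXT entry. With
    use_prev_doff=False that d_off is used as the cookie (the old scheme).
    With use_prev_doff=True the d_off of the PREVIOUS entry is used instead
    (start_pos for the first entry), which equals the entry's own address.
    """
    cookies: Dict[str, int] = {}
    prev_doff = start_pos
    for i, (name, _addr) in enumerate(directory):
        d_off = directory[i + 1][1] if i + 1 < len(directory) else END_OF_DIR
        cookies[name] = prev_doff if use_prev_doff else d_off
        prev_doff = d_off
    return cookies


before = [("first", 100), ("xlast", 300)]               # readdir BEFORE "new" exists
after = [("first", 100), ("new", 200), ("xlast", 300)]  # readdir AFTER "new" is added

for use_prev in (False, True):
    b = scan(before, use_prev, start_pos=100)
    a = scan(after, use_prev, start_pos=100)
    label = "previous d_off" if use_prev else "next d_off"
    print(f"{label}: first {b['first']} -> {a['first']}, new {a['new']}")
# next d_off: first 300 -> 200, new 300
# previous d_off: first 100 -> 100, new 200
```

With the next-d_off scheme, the cookie for "first" moves from 300 to 200 and "new" inherits 300, the old cookie of "first". With the previous-d_off scheme every name keeps one cookie across both passes, which is the property the patch relies on.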
Are we still seeing the WARN and CRIT messages?

Kaleb mentioned he was able to recreate this with a single client doing an untar followed by rm -Rf, but could not duplicate it with FSAL_VFS. This suggests to me that the issue is in FSAL_GLUSTER or libgfapi. If you unmount and remount the client, does that fix the issue?

Observing this issue with the readdir-disabled build as well, i.e.:

    # rpm -qa | grep ganesha
    nfs-ganesha-gluster-2.5.5-10.el7rhgs.x86_64
    nfs-ganesha-debuginfo-2.5.5-10.el7rhgs.x86_64
    nfs-ganesha-2.5.5-10.el7rhgs.x86_64
    glusterfs-ganesha-3.12.2-16.el7rhgs.x86_64
    -----------------
    [root@rhs-client9 ganesha]# rm -rf *
    rm: cannot remove ‘dir2/linux-4.9.5/tools/lib/lockdep/uinclude/linux’: Directory not empty
    [root@rhs-client9 ganesha]# rm -rf *
    rm: cannot remove ‘dir2/linux-4.9.5/tools/lib/lockdep/uinclude/linux’: Directory not empty
    [root@rhs-client9 ganesha]# rm -rf *
    rm: cannot remove ‘dir2/linux-4.9.5/tools/lib/lockdep/uinclude/linux’: Directory not empty
    [root@rhs-client9 ganesha]# rm -rf *
    rm: cannot remove ‘dir2/linux-4.9.5/tools/lib/lockdep/uinclude/linux’: Directory not empty
    [root@rhs-client9 ganesha]# rm -rf *
    rm: cannot remove ‘dir2/linux-4.9.5/tools/lib/lockdep/uinclude/linux’: Directory not empty
    [root@rhs-client9 ganesha]# rm -rf *
    rm: cannot remove ‘dir2/linux-4.9.5/tools/lib/lockdep/uinclude/linux’: Directory not empty
    [root@rhs-client9 ganesha]# rm -rf *
    rm: cannot remove ‘dir2/linux-4.9.5/tools/lib/lockdep/uinclude/linux’: Directory not empty
    [root@rhs-client9 ganesha]# rm -rf *
    rm: cannot remove ‘dir2/linux-4.9.5/tools/lib/lockdep/uinclude/linux’: Directory not empty
    [root@rhs-client9 ganesha]# ls
    dir2
    [root@rhs-client9 ganesha]# rm -rf *
    rm: cannot remove ‘dir2/linux-4.9.5/tools/lib/lockdep/uinclude/linux’: Directory not empty
    [root@rhs-client9 ganesha]# rm -rf *
    rm: cannot remove ‘dir2/linux-4.9.5/tools/lib/lockdep/uinclude/linux’: Directory not empty
    [root@rhs-client9 ganesha]# rm -rf *
    rm: cannot remove ‘dir2/linux-4.9.5/tools/lib/lockdep/uinclude/linux’: Directory not empty
    -------------------

This makes it very likely that this is caused by bug #1458215.
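The symptom in the log (a directory that rm -rf keeps failing to remove with "Directory not empty") can be checked for with a small script run against a freshly untarred scratch tree on the mount. This is a minimal Python sketch under stated assumptions: `remove_tree` is a hypothetical helper, not how GNU rm is implemented; it snapshots each directory's readdir results before deleting, and it treats ENOTEMPTY or EEXIST from rmdir (POSIX allows either for a non-empty directory) as the failure of interest. Instead of raising, it rescans the directory and returns what a fresh scan still finds.

```python
import errno
import os
from typing import List


def remove_tree(path: str) -> List[str]:
    """Remove `path` depth-first, roughly the way `rm -rf` does: read the
    directory, unlink non-directory entries, recurse into subdirectories,
    then rmdir.

    Returns [] on success. If rmdir reports the directory is not empty,
    the directory is listed again and the paths a fresh scan still sees are
    returned instead of raising.
    """
    survivors: List[str] = []
    with os.scandir(path) as it:
        entries = list(it)  # snapshot this directory's readdir results first
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            survivors.extend(remove_tree(entry.path))
        else:
            os.unlink(entry.path)  # files and symlinks (even to directories)
    try:
        os.rmdir(path)
    except OSError as exc:
        # POSIX allows either ENOTEMPTY or EEXIST for a non-empty directory.
        if exc.errno not in (errno.ENOTEMPTY, errno.EEXIST):
            raise
        survivors.extend(os.path.join(path, name) for name in sorted(os.listdir(path)))
    return survivors
```

On a local filesystem the returned list should be empty. On the Ganesha mount from the log, calling it on a copy like dir2 and printing the result would show which entries readdir skipped during the first pass but a later listing still reports.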