Bug 1756002

Summary: git clone fails on gluster volumes exported via nfs-ganesha
Product: [Community] GlusterFS
Reporter: Soumya Koduri <skoduri>
Component: libgfapi
Assignee: bugs <bugs>
Status: CLOSED NEXTRELEASE
QA Contact: bugs <bugs>
Severity: medium
Priority: unspecified
Version: 7
CC: bugs
Hardware: All
OS: All
Last Closed: 2019-09-27 12:41:51 UTC

Description Soumya Koduri 2019-09-26 14:47:18 UTC
This bug was initially created as a copy of Bug #1753569

I am copying this bug because: 



+++ This bug was initially created as a clone of Bug #1735480 +++

Description of problem:

'git clone' fails on Gluster volumes exported via nfs-ganesha (mounted via NFS). 'git clone' succeeds on the same volume exported natively (mounted via glusterfs).


Version-Release number of selected component (if applicable):
Server: 
CentOS 7

NFS Ganesha:
nfs-ganesha-2.7.6-1.el7.x86_64
nfs-ganesha-gluster-2.7.6-1.el7.x86_64

Gluster:
glusterfs-server-6.4-1.el7.x86_64
glusterfs-6.4-1.el7.x86_64

Client:
Debian 10

How reproducible:

Always

Steps to Reproduce:
1. mount -t nfs <gluster-server>:/gluster/vol /mnt/gluster
2. cd /mnt/gluster
3. git clone https://github.com/torvalds/linux.git

Actual results:
git clone https://github.com/torvalds/linux.git
Cloning into 'linux'...
remote: Enumerating objects: 1640, done.
remote: Counting objects: 100% (1640/1640), done.
remote: Compressing objects: 100% (881/881), done.
fatal: Unable to create temporary file '/mnt/gluster/linux/.git/objects/pack/tmp_pack_XXXXXX': Permission denied
fatal: index-pack failed


Expected results:
Successful git clone

Additional info:
This appears to be related to this:
https://github.com/nfs-ganesha/nfs-ganesha/issues/262

which in turn references a couple of other bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1543996
which also references this one:
https://bugzilla.redhat.com/show_bug.cgi?id=1405147

Not sure if this is a separate issue, related, or a regression.

--- Additional comment from Soumya Koduri on 2019-09-13 11:03:02 UTC ---

An EACCES error was returned on the COMMIT operation. nfs-ganesha tries to open an fd as part of the COMMIT operation, which the backend glusterfs server denies because the file permissions are 0444.

A similar problem was recently reported upstream - https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/447012. The issue here is that

- nfs-ganesha/fsal-gluster switches to user creds before performing any operations on the backend. [ This is needed to be able to run nfs-ganesha as non-root user ]
- And to perform any stateless I/O (like NFSv3 operations or NFSv4.x COMMIT operations), the ganesha server tries to open/use a global fd maintained for each file.

In this particular case,

- the NFSv4 client created a file with permissions 0444 and wrote some data into it through the same fd.
- When the client then performs a COMMIT operation to flush the cached data, the ganesha server tries to open a new global fd, which the gluster server denies (as expected, since the file has 0444 perms).
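The sequence above comes down to a basic POSIX permission rule, sketched below in Python (illustrative only; note that as root the re-open would succeed, since root bypasses DAC permission checks):

```python
import os
import tempfile

# A file created with mode 0444 is still writable through the fd returned
# by the creating open() -- but any *new* open for writing is denied for a
# non-root user. This is exactly what ganesha's COMMIT-time re-open hits.
path = os.path.join(tempfile.mkdtemp(), "tmp_pack_demo")

fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o444)
os.write(fd, b"data written via the original fd\n")  # succeeds

try:
    os.close(os.open(path, os.O_WRONLY))  # fresh open, like the global fd
    reopened = True
except PermissionError:
    reopened = False

os.close(fd)
print("re-open for write allowed:", reopened)
```

For a non-root user this prints `re-open for write allowed: False`, matching the EACCES the gluster server returns to ganesha.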

The fix for this issue is not trivial. The possible options we are exploring are to either

a) dup the fd (when the file is OPENed with CREAT) and store it as the global fd, taking an extra ref so that it does not get released during CLOSE. This same fd, if used in COMMIT, can bypass the access checks.

(or)
b) Frank suggested that we maintain a list of all open states of the file and, for any stateless I/O, find a matching state (same client and creds) and use it to perform the I/O. This would also help enforce share reservations for stateless I/O.

Need to check whether we can use approach (a) as an interim workaround until (b) gets done.

--- Additional comment from Soumya Koduri on 2019-09-13 11:03:29 UTC ---

I have done a PoC to fix this particular case. There are multiple places where fixes are needed:


Issue 1) Right now in FSAL_GLUSTER, we use "glfs_h_creat" to create the handle and then "glfs_h_open" to fetch the glfd. These two operations need to be combined into one atomic fop that creates the handle and also returns the glfd, in order to handle file creations with 0444 perms.

Fix: Add a new API, "glfs_h_creat_glfd", to libgfapi for this purpose.

Issue 2) Sometimes the NFS client seems to open a file twice without closing the first OPEN (first with OPEN_SHARE_ACCESS_BOTH and then with OPEN_SHARE_ACCESS_READ). In such cases, NFS-Ganesha tries to reopen the file the second time, which may fail with EPERM.

Fix: If the first OPEN state/fd already grants the access needed by the second OPEN, avoid re-opening the file.
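The "already granted" check can be sketched as a bitmask comparison (the constant values follow the NFSv4 OPEN4_SHARE_ACCESS_* encoding; the helper name is hypothetical, not ganesha's actual code):

```python
# NFSv4 share-access bits (values per RFC 7530's OPEN4_SHARE_ACCESS_*).
OPEN_SHARE_ACCESS_READ = 0x1
OPEN_SHARE_ACCESS_WRITE = 0x2
OPEN_SHARE_ACCESS_BOTH = OPEN_SHARE_ACCESS_READ | OPEN_SHARE_ACCESS_WRITE

def needs_reopen(existing_access, requested_access):
    """Re-open only if the new OPEN requests access bits not already held."""
    return (requested_access & ~existing_access) != 0

# First OPEN with BOTH, second with READ: the existing fd already suffices,
# so the EPERM-prone re-open on the 0444 file can be skipped entirely.
print(needs_reopen(OPEN_SHARE_ACCESS_BOTH, OPEN_SHARE_ACCESS_READ))   # False
print(needs_reopen(OPEN_SHARE_ACCESS_READ, OPEN_SHARE_ACCESS_WRITE))  # True
```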

Issue 3) As mentioned in the comment above, since there is no state associated, the COMMIT operation tries to re-open and obtain the global fd, which fails with EPERM.

Approach taken: Dup the glfd returned by the OPEN operation and store it as the global fd. The dup takes an extra ref, and this new glfd/global fd gets closed as part of LRU purge or file removal.
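The dup-based approach relies on a duplicated fd keeping the underlying open file description alive. A minimal Python sketch of that mechanism (illustrative, using plain POSIX fds rather than glfds):

```python
import os
import tempfile

# An fd duplicated at CREATE time survives the CLOSE of the original open
# state, so it can later service a stateless COMMIT on a 0444 file without
# a fresh open() -- which is what would otherwise be denied.
path = os.path.join(tempfile.mkdtemp(), "tmp_pack_demo")

fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o444)  # create with 0444 perms
global_fd = os.dup(fd)  # extra ref, analogous to ganesha's global fd
os.close(fd)            # the original open state is CLOSEd

os.write(global_fd, b"flushed at COMMIT time\n")  # still writable via the dup
os.close(global_fd)

with open(path, "rb") as f:
    content = f.read()
print(content)
```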

Comment 1 Worker Ant 2019-09-26 14:52:52 UTC
REVIEW: https://review.gluster.org/23497 (gfapi: 'glfs_h_creat_open' - new API to create handle and open fd) posted (#1) for review on release-7 by soumya k

Comment 2 Worker Ant 2019-09-27 12:41:51 UTC
REVIEW: https://review.gluster.org/23497 (gfapi: 'glfs_h_creat_open' - new API to create handle and open fd) merged (#1) on release-7 by soumya k