Description of problem:
'git clone' fails on Gluster volumes exported via nfs-ganesha (mounted via NFS). 'git clone' succeeds on the same volume exported natively (mounted via glusterfs).
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. mount -t nfs <gluster-server>:/gluster/vol /mnt/gluster
2. cd /mnt/gluster
3. git clone https://github.com/torvalds/linux.git
Actual results:
git clone https://github.com/torvalds/linux.git
Cloning into 'linux'...
remote: Enumerating objects: 1640, done.
remote: Counting objects: 100% (1640/1640), done.
remote: Compressing objects: 100% (881/881), done.
fatal: Unable to create temporary file '/mnt/gluster/linux/.git/objects/pack/tmp_pack_XXXXXX': Permission denied
fatal: index-pack failed
Expected results:
Successful git clone
This appears to be related to this:
Which then references a couple of other bugs:
which also references this one:
Not sure if this is a separate issue, related, or a regression.
An EACCES error was returned on the COMMIT operation. nfs-ganesha tries to open an fd as part of the COMMIT operation, which the backend glusterfs server denies because the file permissions are 0444.
A similar problem was posted upstream recently: https://review.gerrithub.io/c/ffilz/nfs-ganesha/+/447012. The issue here is that
- nfs-ganesha/fsal-gluster switches to user creds before performing any operations on the backend. [ This is needed to be able to run nfs-ganesha as non-root user ]
- And to perform any stateless I/O (like NFSv3 operations or NFSv4.x COMMIT operations), the ganesha server tries to open/use a global fd maintained for each file.
In this particular case,
- The NFSv4 client created a file with permissions 0444 and, using the same fd, wrote some data into it.
- When it then performs a COMMIT operation to flush the cached data, the ganesha server tries to open a new global fd, which the gluster server denies, as expected, because the file has 0444 perms.
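The sequence above has a local POSIX analogue that can be tried without NFS or Gluster. The sketch below is only an illustration, not nfs-ganesha code; the function name and file name are hypothetical. It creates a file with mode 0444, writes through the fd returned by the creating open, and then attempts a fresh write open. As a non-root user the second open is expected to fail with EACCES; as root the permission check is bypassed and it succeeds.

```python
import errno
import os
import tempfile


def write_then_reopen(directory):
    """Create a 0444 file, write via the creating fd, then try to reopen it.

    Returns (bytes_written, reopen_errno); reopen_errno is 0 if the
    second open succeeded (for example when running as root).
    """
    path = os.path.join(directory, "tmp_pack_demo")  # hypothetical name
    # The mode bits apply to later opens, not to the fd returned here,
    # so writing through this fd works even though the file is read-only.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o444)
    try:
        written = os.write(fd, b"pack data")
    finally:
        os.close(fd)

    # A fresh open for writing is checked against the 0444 mode.
    try:
        fd2 = os.open(path, os.O_WRONLY)
    except OSError as exc:
        return written, exc.errno
    os.close(fd2)
    return written, 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        written, err = write_then_reopen(tmp)
        print("wrote", written, "bytes; reopen errno:",
              errno.errorcode.get(err, "none"))
```

This is the same shape as the failing COMMIT: the write through the original fd is fine, and only the later re-open runs into the permission check.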
The fix for this issue is not trivial. The options we are exploring are to either
a) dup the fd (when the file is OPENed with CREAT) and store it as the global fd, taking an extra ref so that it doesn't get flushed during CLOSE. If this same fd is used in COMMIT, it bypasses the access checks.
b) Frank suggested that we maintain a list of all open states of the file and, for any stateless I/O, find a matching state (with the same client and creds) and use it to perform the I/O. This would also help enforce share reservations for stateless I/O.
Need to check if we can use approach (a) as an interim workaround until (b) gets done.
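Approach (a) can be illustrated with plain POSIX file descriptors. This is a rough sketch under assumptions, not the FSAL_GLUSTER change itself: `os.dup` stands in for whatever duplication call the gluster FSAL would use, and `global_fd` and the function name are hypothetical. The duplicate refers to the same open file, so it stays usable after the original fd is closed and never goes through a new permission check.

```python
import os
import tempfile


def commit_via_dup(directory):
    """Write via the OPEN fd, close it, then flush and read via a dup."""
    path = os.path.join(directory, "dup_demo")  # hypothetical name
    open_fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o444)
    # Stand-in for the "global fd": a duplicate that holds its own
    # reference to the same open file.
    global_fd = os.dup(open_fd)

    os.write(open_fd, b"cached data")
    os.close(open_fd)       # CLOSE analogue: the duplicate stays valid

    os.fsync(global_fd)     # COMMIT analogue: no new open, no access check
    os.lseek(global_fd, 0, os.SEEK_SET)  # dup'd fds share the file offset
    data = os.read(global_fd, 64)
    os.close(global_fd)     # would happen on LRU purge or file removal
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(commit_via_dup(tmp))
```

The trade-off is that the extra reference keeps the file open until the global fd is explicitly released, so its lifetime has to be managed separately from the client's OPEN/CLOSE.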
I have done a PoC to fix this particular case. Fixes are needed in multiple places:
1) Right now in FSAL_GLUSTER, we use "glfs_h_creat" to create the handle and then "glfs_h_open" to fetch the glfd. These two operations need to be combined into one atomic fop that creates the handle and also returns the glfd, so that file creations with 0444 perms can be handled.
Fix: Add a new API "glfs_h_creat_glfd" for this in libgfapi.
2) Sometimes the NFS client seems to open a file twice without closing the first OPEN (first with OPEN_SHARE_ACCESS_BOTH and then with OPEN_SHARE_ACCESS_READ). In such cases NFS-Ganesha tries to reopen the file the second time, which may fail with an EPERM error.
Fix: If the first OPEN state/fd already contains the access needed for the second OPEN, avoid re-opening the file.
3) As mentioned in the comment above, since there is no state associated, the COMMIT operation tries to re-open the file to obtain a globalfd, which fails with EPERM.
Approach taken: Dup the glfd returned in the OPEN operation and store it as the globalfd. The dup takes an extra ref, and this new glfd/globalfd gets closed as part of LRU purge or file removal.
Patches posted for review:
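The check described in fix 2 above (skip the re-open when the existing OPEN already covers the requested access) comes down to a bitmask containment test. A minimal sketch follows; the constant values are the NFSv4 share-access bits, while `needs_reopen` is a hypothetical helper name and not a function from nfs-ganesha.

```python
# NFSv4 share_access bits; BOTH is READ | WRITE.
OPEN4_SHARE_ACCESS_READ = 0x1
OPEN4_SHARE_ACCESS_WRITE = 0x2
OPEN4_SHARE_ACCESS_BOTH = 0x3


def needs_reopen(held_access, requested_access):
    """Return True if the fd held by the first OPEN lacks a requested bit."""
    return (held_access & requested_access) != requested_access


if __name__ == "__main__":
    # First OPEN with BOTH, second with READ: the existing fd is enough.
    print(needs_reopen(OPEN4_SHARE_ACCESS_BOTH, OPEN4_SHARE_ACCESS_READ))
    # First OPEN with READ, second with BOTH: a re-open is still required.
    print(needs_reopen(OPEN4_SHARE_ACCESS_READ, OPEN4_SHARE_ACCESS_BOTH))
```

In the BOTH-then-READ case from the report, the containment test passes, so the second OPEN could reuse the first fd instead of hitting the backend permission check again.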
I'm using the following version of NFS Ganesha/GlusterFS and facing this bug:
Is there any workaround for this issue?
Are there plans to roll this fix into the 2.8 versions, or will it be in the latest Ganesha release (3.0)?
It'd be nice to have the fixes backported to the nfs-ganesha 2.8.x branch as well, for the 2.8.3 release.
Yes, that would be great.
I can help to test if those patches fix the problem on 2.8.
@Soumya, can you please provide a patch on the latest 2.8 branch?
I did a backport of Soumya's patches to GlusterFS 6.5 and nfs-ganesha 2.8 and can confirm that they solve the problem on those specific versions.
(In reply to Matheus Morais from comment #7)
> I did a backport of Soumya patches to GlusterFS 6.5 and nfs-ganesha 2.8 and
> can confirm that it solve the problem on that specific versions.
Do you have RPMs or source somewhere for your backport?
If this is still an issue, please open an issue in the GitHub tracker at https://github.com/nfs-ganesha/nfs-ganesha/issues