Bug 2136452
| Summary: | v4 server is not interpreting ACL inheritance correctly | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Ondrej <ondrej.valousek> | 
| Component: | kernel | Assignee: | Jeff Layton <jlayton> | 
| kernel sub component: | NFS | QA Contact: | Yongcheng Yang <yoyang> | 
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | unspecified | ||
| Priority: | unspecified | CC: | agruenba, chuck.lever, jiyin, jlayton, xzhou, yoyang | 
| Version: | 9.0 | Keywords: | Triaged | 
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-10-21 08:56:36 UTC | Type: | Bug | 
| Regression: | --- | Mount Type: | --- | 
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||

Description
Ondrej
2022-10-20 10:18:31 UTC

Hi Ondrej,

Have you configured idmap on both your NFS server and client? Is the UID of your test user the same on the server side and the client side? And have you tried mounting with a Kerberos security mode (sec=krb5)?

I can reproduce this, and I think I sort of understand what's going on. The mount looks like this:
    localhost:/export on /mnt/local type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp6,timeo=600,retrans=2,sec=sys,clientaddr=::1,local_lock=none,addr=::1)
[jlayton@tleilax ~]$ sudo nfs4_setfacl -a A:fd:jlayton@localdomain:RWX /mnt/local/test
[jlayton@tleilax ~]$ cd /mnt/local/test
[jlayton@tleilax test]$ mkdir test2
[jlayton@tleilax test]$ cd test2
bash: cd: test2: Permission denied
[jlayton@tleilax test]$ nfs4_getfacl .
# file: .
A::OWNER@:rwaDxtTcCy
A::1111:rwaDxtcy
A::GROUP@:rxtcy
A::EVERYONE@:rxtcy
D:fdi:OWNER@:rwaDx
A:fdi:OWNER@:tTcCy
A:fdi:1111:rwaDxtcy
A:fdi:GROUP@:tcy
A:fdi:EVERYONE@:tcy
[jlayton@tleilax test]$ nfs4_getfacl ./test2
# file: ./test2
D::OWNER@:rwaDx
A::OWNER@:tTcCy
A::1111:rwaDxtcy
A::GROUP@:tcy
A::EVERYONE@:tcy
D:fdi:OWNER@:rwaDx
A:fdi:OWNER@:tTcCy
A:fdi:1010:rwaDxtcy
A:fdi:GROUP@:tcy
A:fdi:EVERYONE@:tcy
[jlayton@tleilax test]$ cd /export/test
[jlayton@tleilax test]$ getfacl ./test2
# file: test2
# owner: jlayton
# group: jlayton
user::---
user:jlayton:rwx
group::---
mask::rwx
other::---
default:user::---
default:user:jlayton:rwx
default:group::---
default:mask::rwx
default:other::---
The NFSv4 ACLs get translated to regular POSIX ACLs by the server (since the Linux kernel never got richacl support). When it goes to enforce this ACL, your user matches the owner, and so it gets denied before ever reaching the next explicit ACE for ondrejv. Probably, you can work around this by adding another ACE that ensures that the owner always has full access. In my example:
     $ nfs4_setfacl -a A:fd:OWNER@:RWX /mnt/local/test
Whether this represents a bug in the server, or just a limitation of the translation to POSIX ACLs, I'm not sure.
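
The denial above follows from the order in which POSIX ACL entries are checked (see the access check algorithm in acl(5)): the owner entry is consulted first and its answer is final. Below is a minimal Python sketch of that algorithm, not the kernel code. The `PosixAcl` class and the `perms` and `check_access` helpers are hypothetical names introduced only for this illustration, and the `test2` values are copied from the getfacl output above.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, Optional


def perms(text: str) -> FrozenSet[str]:
    """Turn a getfacl-style string such as 'r-x' into a set of letters."""
    return frozenset(c for c in text if c != "-")


@dataclass
class PosixAcl:
    # Hypothetical container for one POSIX access ACL (illustration only).
    owner: str
    group: str
    user_obj: FrozenSet[str]            # user::
    group_obj: FrozenSet[str]           # group::
    other: FrozenSet[str]               # other::
    mask: Optional[FrozenSet[str]] = None
    users: Dict[str, FrozenSet[str]] = field(default_factory=dict)
    groups: Dict[str, FrozenSet[str]] = field(default_factory=dict)


def check_access(acl: PosixAcl, user: str, user_groups: Iterable[str], want: str) -> bool:
    """Model of the acl(5) access check: the first matching class decides."""
    wanted = perms(want)
    mask = acl.mask if acl.mask is not None else perms("rwx")
    if user == acl.owner:
        # The owner entry is final; named entries are never consulted.
        return wanted <= acl.user_obj
    if user in acl.users:
        return wanted <= (acl.users[user] & mask)
    member_of = set(user_groups)
    matched_group = False
    if acl.group in member_of:
        matched_group = True
        if wanted <= (acl.group_obj & mask):
            return True
    for name, entry in acl.groups.items():
        if name in member_of:
            matched_group = True
            if wanted <= (entry & mask):
                return True
    if matched_group:
        return False
    return wanted <= acl.other


# test2 as shown by getfacl on the server side.
test2 = PosixAcl(owner="jlayton", group="jlayton",
                 user_obj=perms("---"), group_obj=perms("---"), other=perms("---"),
                 mask=perms("rwx"), users={"jlayton": perms("rwx")})

# Same ACL, but owned by someone else (assumed owner name, for contrast).
test2_other_owner = PosixAcl(owner="root", group="root",
                             user_obj=perms("---"), group_obj=perms("---"), other=perms("---"),
                             mask=perms("rwx"), users={"jlayton": perms("rwx")})

print(check_access(test2, "jlayton", ["jlayton"], "x"))
print(check_access(test2_other_owner, "jlayton", ["jlayton"], "x"))
```

In this model jlayton is denied on `test2` because he is the owner and `user::` is empty, while the same named entry grants access as soon as the directory is owned by a different user.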
After you do this test vs. the netapp, could you run this and post the output? I'm curious as to whether the resulting ACLs look different on Netapp vs. Linux server.
    nfs4_getfacl test2
    nfs4_getfacl test2/test3

I am getting this:
[ondrejv@rambo /archive2/acl]$ nfs4_getfacl .
# file: .
A:fd:ondrejv:rwaDxtTnNcCy
A::OWNER@:rwaDxtTnNcCy
A:g:GROUP@:rxtncy
A::EVERYONE@:rxtncy
[ondrejv@rambo /archive2/acl]$ nfs4_getfacl test2
# file: test2
A:fd:ondrejv:rwaDxtTnNcCy
A::OWNER@:rwaDxtTnNcCy
A:g:GROUP@:rwaDxtTnNcy
A::EVERYONE@:rxtncy
... i.e. pretty much what I would expect. BTW I know we do not have richacl support, but AFAIK we do not need them for the example above to work - POSIX acls are "rich enough" to make it work.

What happens on the server is that a command like "nfs4_setfacl -a A:fd:12345:RWX" changes the existing file permissions from, say, a trivial POSIX ACL:
    user::rwx,group::r-x,other::r-x
to the following POSIX ACL:
    user::rwx,user:12345:rwx,group::rwx,mask::rwx,other::r-x,
    default:user::---,default:user:12345:rwx,default:group::---,default:mask::rwx,default:other::---
The default ACL in this case only gives permissions to user 12345. In particular, the owner gets no permissions. When the subdirectory is created, it ends up with the following POSIX ACL:
    user::---,user:12345:rwx,group::---,mask::rwx,other::---
    default:user::---,default:user:12345:rwx,default:group::---,default:mask::rwx,default:other::---
As the owner, you only get the user:: permissions, even if you also match user:12345. That's why access is denied.

So it's nfs4_setfacl that causes the problem in this case. A command like "nfs4_setfacl -a A:fd:OWNER@:RWX,A:fd:12345:RWX" leads to something closer to the result you are looking for. When you compare that with Linux's setfacl command, "setfacl -dm u:12345:rwx test2" copies the existing u::, g::, and o:: entries into the new default ACL to avoid creating a mostly-empty default ACL.
The nfs4_setfacl command could do something similar: when a file has existing OWNER@, GROUP@, or EVERYONE@ entries, it could knock the inheritance flags of those entries into place. But it's not clear to me that this will always do what the user intended, especially when the server isn't using POSIX ACLs. And it certainly isn't what the user was asking for.

Thanks Andreas. I suspect the problem is actually in knfsd. nfs4_setfacl just causes a SETATTR to be sent to the server, so we should be sending the same ACL to both servers in the above test. knfsd is what translates the v4 ACL into a POSIX ACL. The netapp almost certainly has native NFSv4 ACL support so it doesn't need to do that sort of thing.

As to whether we ought to make them inherit the OWNER@, GROUP@, EVERYONE@ entries, I'm not sure. It certainly seems like the netapp filer behaves that way, so it's possible that we should be doing just that. I probably need to experiment a bit with how that sort of inheritance works on a netapp.

(In reply to Ondrej from comment #4)
> I am getting this:
> [...]
And what about the NFSv4 ACL of test2/test3?

(In reply to Jeff Layton from comment #6)
> Thanks Andreas.
>
> I suspect the problem is actually in knfsd. nfs4_setfacl just causes a
> SETATTR to be sent to the server, so we should be sending the same ACL to
> both servers in the above test.
I'm pretty sure that's also what's happening.

> knfsd is what translates the v4 ACL into a POSIX ACL.
On the server side, when presented with an NFSv4 ACL like:
    A::OWNER@:RWX,A::GROUP@:RWX,A::EVERYONE@:RWX,A:fd:12345:RWX
it would be a far stretch to assume that the client actually meant the following:
    A:fd:OWNER@:RWX,A:fd:GROUP@:RWX,A:fd:EVERYONE@:RWX,A:fd:12345:RWX
But let's see what the resulting NFSv4 ACL is of test2/test3. The Netapp probably simply grants the aggregate permissions of OWNER@ and all other ACL entries that match the specific process, in violation of the POSIX permission model.
I am getting this: [ondrejv@login02 /proj/it/tmp/acltest/test2]$ nfs4_getfacl . # file: . A:fd:ondrejv:rwaDxtTnNcCy A::OWNER@:rwaDxtTnNcCy A:g:GROUP@:rwaDxtTnNcy A::EVERYONE@:rxtncy [ondrejv@login02 /proj/it/tmp/acltest/test2]$ nfs4_getfacl test3 # file: test3 A:fd:ondrejv:rwaDxtTnNcCy A::OWNER@:rwaDxtTnNcCy A:g:GROUP@:rwaDxtTnNcy A::EVERYONE@:rxtncy ... i.e. pretty much what I would expect. > The Netapp probably simply grants the aggregate permissions of OWNER@ and all other ACL entries that match the specific process, in violation of the POSIX permission model. I do not think so, but as a matter of fact Netapp has also very nice option called "v4-inherited-acl-preserve" or long version "Ignore Client Specified Mode Bits and Preserve Inherited NFSv4 ACL When Creating New Files or Directories" which does break POSIX permission model, but on the other side eliminates the extra complexity introduced by ACL_MASKs which renders acls practically unusable for normal human. http://michael.orlitzky.com/articles/there_was_an_attempt_to_save_linux_filesystem_acls.xhtml I wish we had something similar, too (could be handy for pure Samba exported filesystems, too). Different story I know, but regardless of the value to the option above, on Netapp the test above works always well. also, for the group ACE we should set the 'g' flag just like Netapp is doing:
...
  ACE FLAGS:
       There are three kinds of ACE flags: group, inheritance, and administrative.  An Allow or Deny ACE may contain zero or more flags, while an Audit or  Alarm  ACE  must  contain  at
       least one of the successful-access and failed-access flags.
       Note  that  ACEs are inherited from the parent directory's ACL at the time a file or subdirectory is created.  Accordingly, inheritance flags can be used only in ACEs in a directory's ACL (and are therefore stripped from inherited ACEs in a new file's ACL).  Please see the INHERITANCE FLAGS COMMENTARY section for more information.
       GROUP FLAG - can be used in any ACE
       g      group - indicates that principal represents a group instead of a user.
....
Comment 10: Section 6.2.1.5. "ACE Who" of RFC 7530 says otherwise: The ACE4_IDENTIFIER_GROUP flag MUST be ignored on entries with these special identifiers. When encoding entries with these special identifiers, the ACE4_IDENTIFIER_GROUP flag SHOULD be set to zero. Any updates here? Note I tried to replicate this on OmniOS based NFSv4 server (basically OpenSolaris on ZFS) and I failed. Works fine there. So the only problem seems to be with Linux kernel nfs server. Also side note that OmniOS is also giving 'g' flag on "GROUP@" entries - but that's just a FYI to Andreas. Looking at this again today, and I'm struck wondering what the rationale is for adding the implicit DENY ACE here: After making a new directory: [jlayton@tleilax local]$ nfs4_getfacl /mnt/local/test ; getfacl /export/test # file: /mnt/local/test A::OWNER@:rwaDxtTcCy A::GROUP@:rxtcy A::EVERYONE@:rxtcy getfacl: Removing leading '/' from absolute path names # file: export/test # owner: jlayton # group: jlayton user::rwx group::r-x other::r-x [jlayton@tleilax local]$ sudo nfs4_setfacl -a A:fd:jlayton@localdomain:RWX /mnt/local/test [jlayton@tleilax local]$ nfs4_getfacl /mnt/local/test ; getfacl /export/test # file: /mnt/local/test A::OWNER@:rwaDxtTcCy A::4447:rwaDxtcy A::GROUP@:rxtcy A::EVERYONE@:rxtcy D:fdi:OWNER@:rwaDx A:fdi:OWNER@:tTcCy A:fdi:4447:rwaDxtcy A:fdi:GROUP@:tcy A:fdi:EVERYONE@:tcy getfacl: Removing leading '/' from absolute path names # file: export/test # owner: jlayton # group: jlayton user::rwx user:jlayton:rwx group::r-x mask::rwx other::r-x default:user::--- <<< Why add this? default:user:jlayton:rwx default:group::--- <<< and this default:mask::rwx default:other::--- <<< and this I don't have the best grasp of how the v4 to POSIX ACL translation is supposed to work, but I don't understand why we end up with an implicit default deny ACEs here for user,group and other. 
The user only asked for a new inheritable ACE to be added, not to only exclusively allow that user to access the children. I guess I need to crawl over nfs4_acl_nfsv4_to_posix and try to understand the logic (again). I would (sort of) understand the implicit deny ACEs for OWNER@ and GROUP@ because thing is that EVERYONE@ != others (by mode bits), hence file with mode bits say ------rwx Has to be translated to deny ACEs for OWNER@ and GROUP@ to ensure access is not granted to owner by the EVERYONE@ ACE That's the only possible explanation I have Comment 13: I only see a single "D:fdi:OWNER@:rwaDx" entry in that example, and as Ondrej says, that entry is necessary because the default acl grants fewer permissions to the owner than it grants to user jlayton, so if the owner is jlayton, he wouldn't get access. (But as the owner, he would be able to change permissions.) (In reply to Andreas Gruenbacher from comment #15) > Comment 13: I only see a single "D:fdi:OWNER@:rwaDx" entry in that example, > and as Ondrej says, that entry is necessary because the default acl grants > fewer permissions to the owner than it grants to user jlayton, so if the > owner is jlayton, he wouldn't get access. (But as the owner, he would be > able to change permissions.) Not sure if I understand you - you say that the default ACL inserted by sudo nfs4_setfacl -a A:fd:jlayton@localdomain:RWX /mnt/local/test means the default permissions given by mode bit should be ignored? I do not think so - at least I can't find it RFC 8881 and Netapp does not work that way either. Well, in a little more detail, when the server maps entry "A:fd:jlayton@localdomain:RWX" to POSIX ACLs, it ends up adding "user:jlayton:rwx" to the access ACL, and "default:user:jlayton:rwx" to the default ACL. Since "default:user:jlayton:rwx" alone is not a valid ACL in its own right, it needs to come up with "default:user::", "default:group::", "default:mask::", and "default:other::" entries as well. 
To err on the safe side, it puts no permissions into the "default:owner::", "default:group::", and "default:other::" entries it inserts implicitly; the permissions of the "default:mask::" entry are set to "rwx" based on the "default:user:jlayton:rwx" entry. Even though I didn't mention it in comment 15, having a default ACL on that directory means that the umask should be ignored when creating files in that directory; see section OBJECT CREATION AND DEFAULT ACLs in the acl(5) man page. This behavior may not be consistent on all NFSv4 implementations; before RFC 8275, NFSv4 didn't actually allow servers to ignore the umask as it was already applied on the client side. I assume that by "default permissions given by mode bit should be ignored", you mean the fact that when creating files, Linux sets the permissions of the "user::", "group::", and "other::" entries to the intersection of the permissions of the entries of the directory's "default:user::", "default:group::", and "default:other::" entries and the create mode (i.e., the mode bits specified in the create operation). That is the correct behavior for POSIX ACLs. I know that other NFSv4 implementations have instead chosen to set the permissions of the "user::", "group::", and "other::" entries to the create mode, ignoring the permissions in the directory's default ACL. I believe that the NFSv4 specification allows that behavior, but it is in conflict with the idea that the create mode defines an "upper limit" on the resulting permissions. Linux has always implemented the POSIX ACL behavior, locally and over NFSv4. Well by "default permissions given by mode bit should be ignored" I meant that newly created files should be owned by the creator (well, unless sticky bits are set) + umask is used to set mode bits. 
And if the top directory has inheritance ACLs set, then no problem, we just _add_ the access of the corresponding ACE, keeping the access set by classic Unix permission mode untouched - because we are just adding an ALLOW ACE. Anyway, so in a nutshell, you say that excessive deny ACEs in the example above are inserted as a result of a POSIX acls (that we have to map to) implementation limitation right? Because as I say, if I try the same on Netapp/PureStorage/Solaris based NFSv4 servers who are lucky enough to implement native v4 ACLs, then all work fine. (In reply to Ondrej from comment #18) > Anyway, so in a nutshell, you say that excessive deny ACEs in the example > above are inserted as a result of a POSIX acls (that we have to map to) > implementation limitation right? > Because as I say, if I try the same on Netapp/PureStorage/Solaris based > NFSv4 servers who are lucky enough to implement native v4 ACLs, then all > work fine. Those other servers all have native NFSv4 ACL support and don't have to do that sort of translation. That effort died upstream for Linux (after a valiant attempt by Andreas). FWIW, I think nfs-ganesha has NFSv4<->POSIX ACL translation too. It might be interesting to compare its behavior to this -- perhaps it does something better here? At this point, it doesn't sound like this is something fixable and is just an inherent limitation when we translate into POSIX ACLs. Should we close this CANTFIX ? Tried Ganesha nfs server but looks like their VFS handler does not really support ACLs Out of curiosity, I also tried how Samba does things - it has to map Windows ACLs (very similar to NFSv4 ones) to Posix ACLs too. With the same example, I do not see any Deny ACEs injected: [root@mnsrvcomp-38 samba-test]# ls -al Ondrej total 4 drwxrwxr-x+ 2 ovalouse eng 6 Apr 17 09:17 . 
[root@mnsrvcomp-38 samba-test]# getfacl Ondrej
# file: Ondrej
# owner: ovalouse
# group: eng
user::rwx
user:ovalouse:rwx
user:dobrown:r-x
group::r-x
group:eng:r-x
mask::rwx
other::r-x
default:user::rwx
default:user:ovalouse:rwx
default:group::---
default:group:eng:---
default:mask::rwx
default:other::---
I would expect something similar happens in our case, too.

I made some progress on this today, mostly through some brute-force trace_printk debugging. The way this all works is that the NFSv4 acl is translated into a struct posix_acl_state (which really describes the nfsv4 acl). That is then translated into struct posix_acl via posix_state_to_acl. posix_acl_state has a number of "static" entries for the OWNER/GROUP/OTHER entries. When we go to translate that to the posix_acl, it sets empty ACEs for those entries because those static fields in the posix_acl_state are zeroed out (from when they were initialized) and never touched. There is some special handling of "empty" ACLs which works around this in a lot of trivial cases, but I think this is probably a bug, and that we need a way to indicate that a default entry just isn't present at all in the posix_acl_state.

Ok, I think I have a patch that should fix this particular case. I'm not sure what side effects it will have. One thing that is different between POSIX and NFSv4 ACLs is that well-formed POSIX acls require entries for owner/group/other, so when you set a single default ACL for a user, you get entries for those as well.
With my new patch:
[jlayton@knfsd ~]$ nfs4_getfacl /mnt/local/test
# file: /mnt/local/test
A::OWNER@:rwaDxtTcCy
A::GROUP@:rxtcy
A::EVERYONE@:rxtcy
[jlayton@knfsd ~]$ nfs4_setfacl -a A:fd:jlayton@localdomain:RWX /mnt/local/test
[jlayton@knfsd ~]$ nfs4_getfacl /mnt/local/test
# file: /mnt/local/test
A::OWNER@:rwaDxtTcCy
A::1000:rwaDxtcy
A::GROUP@:rxtcy
A::EVERYONE@:rxtcy
A:fdi:OWNER@:rwaDxtTcCy
A:fdi:1000:rwaDxtcy
A:fdi:GROUP@:rxtcy
A:fdi:EVERYONE@:rxtcy
Which is not exactly the behavior of a fs with native v4 ACL support, but is the best we can do with the POSIX ACL translation. Patch posted here:
    https://lore.kernel.org/linux-nfs/20230719-nfsd-acl-v1-1-eb0faf3d2917@kernel.org/T/#u
We'll have to see what the consensus is. Unfortunately, our ACL related test coverage in existing testsuites is pretty lacking.
Updated patch here, after some review and discussion:
    https://lore.kernel.org/linux-nfs/20230724-nfsd-acl-v2-1-1cfaac973498@kernel.org/T/#u
Good, would you mind producing some rpm package with the patch applied so I can test it? Sure. Scratch build here:
    https://koji.fedoraproject.org/koji/taskinfo?taskID=103833111
Setting this for 9.4.0 since this probably won't go into mainline until v6.6.
Earlier scratch build failed, as I tried to build against eln instead of epel9-next. Do-over build here:
    https://koji.fedoraproject.org/koji/taskinfo?taskID=103833993
Looks like the ppc64le build failed due to some BPF issue, but the others seemed to build just fine. Let me know how it goes! Kernel installed, tested but still does not work: [ondrejv@slsrvadm-03v slsrvadm-02v]$ ls -l total 0 -rw-r--r--. 1 root root 0 Dec 25 2022 file.test dr-xrwxr-x. 2 ondrejv dlgusers 6 Jul 25 10:12 ondrejv # I intentionally dropped w access for the OWNER (me) [ondrejv@slsrvadm-03v slsrvadm-02v]$ nfs4_setfacl -a A:fd:ondrejv:RWX ondrejv [ondrejv@slsrvadm-03v slsrvadm-02v]$ nfs4_getfacl ondrejv # file: ondrejv D::OWNER@:waD A::OWNER@:rxtTcCy A::62133:rwaDxtcy A::GROUP@:rxtcy A::EVERYONE@:rxtcy D:fdi:OWNER@:waD A:fdi:OWNER@:rxtTcCy A:fdi:62133:rwaDxtcy A:fdi:GROUP@:rxtcy A:fdi:EVERYONE@:rxtcy [ondrejv@slsrvadm-03v slsrvadm-02v]$ cd ondrejv [ondrejv@slsrvadm-03v ondrejv]$ touch test.txt touch: cannot touch 'test.txt': Permission denied -> ideally, we should not see any DENY rules here either and the touch should work OK That looks correct to me, given the starting conditions and the way the patch works. The default OWNER@ ACE in this case is inherited from the effective OWNER@ ACE. You intentionally denied W access to OWNER@, so the default ACL ends up with the same deny ACE as the effective. Again, we are dealing with a NFSv4->POSIX ACL translation here and that translation is (necessarily) lossy. 
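
The two behaviors discussed in this bug can be compared with a small model. This is only a sketch of what the comments describe, not the knfsd implementation: a well-formed POSIX default ACL needs user::, group::, and other:: entries, and the question is whether the missing ones are filled with empty permissions (the original behavior) or copied from the access ACL (the patched behavior as described above). The `complete_default_acl` function and its parameters are hypothetical names, and computing the mask as the union of the group-class entries is an assumption made for the illustration.

```python
from typing import Dict, FrozenSet


def perms(text: str) -> FrozenSet[str]:
    """Turn a getfacl-style string such as 'r-x' into a set of letters."""
    return frozenset(c for c in text if c != "-")


def complete_default_acl(named_users: Dict[str, FrozenSet[str]],
                         access_acl: Dict[str, FrozenSet[str]],
                         inherit_from_access: bool) -> Dict[str, object]:
    """Fill in the mandatory entries of a default ACL (illustrative model).

    inherit_from_access=False models the original behavior (empty entries),
    True models the patched behavior described in this bug.
    """
    empty: FrozenSet[str] = frozenset()
    if inherit_from_access:
        base = access_acl
    else:
        base = {"user": empty, "group": empty, "other": empty}
    # Assumption: mask is the union of the owning-group and named entries.
    mask = set(base["group"])
    for entry in named_users.values():
        mask |= entry
    return {
        "user": base["user"],
        "users": dict(named_users),
        "group": base["group"],
        "mask": frozenset(mask),
        "other": base["other"],
    }


# Directory with mode rwxr-xr-x plus one inheritable named-user entry.
access = {"user": perms("rwx"), "group": perms("r-x"), "other": perms("r-x")}
named = {"jlayton": perms("rwx")}

old_default = complete_default_acl(named, access, inherit_from_access=False)
new_default = complete_default_acl(named, access, inherit_from_access=True)

# Directory whose owner intentionally lacks write permission (mode r-xrwxr-x).
restricted = {"user": perms("r-x"), "group": perms("rwx"), "other": perms("r-x")}
restricted_default = complete_default_acl({"ondrejv": perms("rwx")}, restricted, True)
```

With the original behavior the default owner entry is empty, so an owner who also matches the named entry is locked out of new subdirectories. With the patched behavior the default owner entry mirrors the access ACL, which is why removing w from the owner also removes it from the inherited owner entry.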
Well, yes: more investigation shows the problem is actually deeper as the similar erroneous behavior can be replicated on the local filesystem as well: [ondrejv@slsrvadm-02v /share]$ getfacl ondrejv2 # file: ondrejv2 # owner: ondrejv # group: dlgusers user::r-x group::rwx other::r-x [ondrejv@slsrvadm-02v /share]$ setfacl -m default:u:ondrejv:rwx ondrejv2 [ondrejv@slsrvadm-02v /share]$ getfacl ondrejv2 # file: ondrejv2 # owner: ondrejv # group: dlgusers user::r-x group::rwx other::r-x default:user::r-x default:user:ondrejv:rwx default:group::rwx default:mask::rwx default:other::r-x [ondrejv@slsrvadm-02v /share]$ cd ondrejv2 [ondrejv@slsrvadm-02v ondrejv2]$ touch test3 touch: cannot touch 'test3': Permission denied ....-> so this is actually no problem with NFSD and POSIX<->NFSv4 ACL translation, but rather with the POSIX ACLs implementation in kernel, i.e. when OWNER has less rights than the user which happens to be equal to the OWNER, then OWNER permissions should be also updated, correct? Tested with OmniOS&ZFS (yes I know it does not know Posix ACL but this bit should actually work as well on Linux): [ondrejv@skynet19 /mnt/omnios]$ chmod ug-w ondrejv [ondrejv@skynet19 /mnt/omnios]$ ls -al total 1 dr-xr-xr-x. 2 ondrejv corp 2 Jul 25 12:50 ondrejv # <----- w flag missing there for user [ondrejv@skynet19 /mnt/omnios]$ nfs4_setfacl -a A:fd:14019:RWX ondrejv [ondrejv@skynet19 /mnt/omnios]$ ls -al total 1 drwxr-xr-x. 2 ondrejv corp 2 Jul 25 12:50 ondrejv # we updated standard Unix permissions as well by the previous nfs4_setfacl [ondrejv@skynet19 /mnt/omnios]$ nfs4_getfacl ondrejv # just for a reference, I know we can't get that far # file: ondrejv A:fd:ondrejv:rwaDxtTnNcCy A::OWNER@:rxtTnNcCoy A:g:GROUP@:rxtncy A::EVERYONE@:rxtncy [ondrejv@skynet19 /mnt/omnios]$ cd ondrejv [ondrejv@skynet19 /mnt/omnios/ondrejv]$ touch test.txt # Success! 
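
For contrast with the POSIX result, here is a minimal sketch of how an NFSv4 ACL is evaluated in order, following the ACE processing described in the NFSv4 RFCs (ACEs are walked top to bottom, a matching ALLOW clears requested bits, a matching DENY on a still-unresolved bit refuses the request). The function name `nfs4_check_access`, the tuple layout, and the handling of principals are assumptions for this illustration: named groups are not modeled, and refusing access when bits remain unresolved at the end of the list is a choice the specification leaves to the server.

```python
from typing import Iterable, List, Tuple

Ace = Tuple[str, str, str]  # (type "A" or "D", principal, permission letters)


def ace_matches(principal: str, requester: str, owner: str,
                owning_group: str, requester_groups: Iterable[str]) -> bool:
    """Decide whether an ACE applies to the requester (named groups not modeled)."""
    if principal == "OWNER@":
        return requester == owner
    if principal == "GROUP@":
        return owning_group in set(requester_groups)
    if principal == "EVERYONE@":
        return True
    return principal == requester


def nfs4_check_access(acl: List[Ace], requester: str, owner: str,
                      owning_group: str, requester_groups: Iterable[str],
                      want: str) -> bool:
    """Walk the ACEs in order until every requested bit is allowed or one is denied."""
    remaining = set(want)
    groups = list(requester_groups)
    for ace_type, principal, mask in acl:
        if not remaining:
            break
        if not ace_matches(principal, requester, owner, owning_group, groups):
            continue
        if ace_type == "D" and remaining & set(mask):
            return False
        if ace_type == "A":
            remaining -= set(mask)
    # Assumption: anything still unresolved is treated as denied.
    return not remaining


# ACL reported by the OmniOS server above: the named-user ACE comes first.
omnios_acl = [
    ("A", "ondrejv", "rwaDxtTnNcCy"),
    ("A", "OWNER@", "rxtTnNcCoy"),
    ("A", "GROUP@", "rxtncy"),
    ("A", "EVERYONE@", "rxtncy"),
]

# ACL reported by the Linux server with the test kernel: a DENY for OWNER@ leads.
linux_acl = [
    ("D", "OWNER@", "waD"),
    ("A", "OWNER@", "rxtTcCy"),
    ("A", "62133", "rwaDxtcy"),
    ("A", "GROUP@", "rxtcy"),
    ("A", "EVERYONE@", "rxtcy"),
]

omnios_write = nfs4_check_access(omnios_acl, "ondrejv", "ondrejv", "corp", ["corp"], "w")
linux_write = nfs4_check_access(linux_acl, "62133", "62133", "dlgusers", ["dlgusers"], "w")
```

In this model the OmniOS list grants write to the owner because the named-user ACE is reached before OWNER@, while the list reported by the Linux server refuses it at the leading OWNER@ deny entry.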
============================================================================================================================= To wrap up - I would agree that this BZ can be closed as it works OK in the case I described earlier in this BZ (no Deny ACEs,and access works as expected), but I feel a different BZ should be perhaps opened regarding the problem with the POSIX ACLs implementation in the kernel (see above). Do you agree? (In reply to Ondrej from comment #32) > ....-> so this is actually no problem with NFSD and POSIX<->NFSv4 ACL > translation, but rather with the POSIX ACLs implementation in kernel, i.e. > when OWNER has less rights than the user which happens to be equal to the > OWNER, then OWNER permissions should be also updated, correct? > I would say no. Those are explicitly different ACEs. One denotes the owner of the file and one is for a user that just happens to be the current owner of the file. Ownership of files can change, and AFAIU we don't clear the mode or ACL when that happens (other than setuid/gid bits). This is interesting, however: [ondrejv@skynet19 /mnt/omnios]$ nfs4_getfacl ondrejv # just for a reference, I know we can't get that far # file: ondrejv A:fd:ondrejv:rwaDxtTnNcCy A::OWNER@:rxtTnNcCoy A:g:GROUP@:rxtncy A::EVERYONE@:rxtncy You've set a default ACL here, but that's the only default ACL being displayed in the resulting pile. What happens if you do this? $ mkdir ondrejv/foo Does the resulting directory get any OWNER@/GROUP@/EVERYONE@ permissions or does it inherit nothing but the explicit default ACE that you set (which is what the above ACL seems to imply)? Oh, and we don't want to close this just yet. This was opened against RHEL9, so we'd need to backport it there before we close this as resolved. If we want to discuss this further refinements to the ACL handling, linux-nfs.org would probably be a better venue, as the ACL handling is a place where we need the opinions and advice of the larger community. > I would say no. 
Those are explicitly different ACEs. One denotes the owner of the file and one is for a user that just happens to be the current owner of the file. > Ownership of files can change, and AFAIU we don't clear the mode or ACL when that happens (other than setuid/gid bits). Right, OmniOS does not actually change Unix permissions either - see the nfs4_getfacl above: [ondrejv@skynet19 /mnt/omnios]$ nfs4_getfacl ondrejv # just for a reference, I know we can't get that far # file: ondrejv A:fd:ondrejv:rwaDxtTnNcCy A::OWNER@:rxtTnNcCoy # <----- No write access here A:g:GROUP@:rxtncy A::EVERYONE@:rxtncy So only "ls -l" output suggests it did, but it did actually not. Nevertheless, write access is granted, which is what we would expect here, correct? I don't think linux-nfs.org is the best place to discuss because it has actually nothing to do with NFS right? > You've set a default ACL here, but that's the only default ACL being displayed in the resulting pile. What happens if you do this? [ondrejv@skynet19 /mnt/omnios]$ mkdir ondrejv/test [ondrejv@skynet19 /mnt/omnios]$ nfs4_getfacl ondrejv/test # file: ondrejv/test A:fd:ondrejv:rwaDxtTnNcy A::OWNER@:rwaDxtTnNcCoy A:g:GROUP@:rwaDxtncy A::EVERYONE@:rxtncy BTW, NetApp behaves the same way, FYI (In reply to Ondrej from comment #35) > > I would say no. Those are explicitly different ACEs. One denotes the owner of the file and one is for a user that just happens to be the current owner of the file. > Ownership of files can change, and AFAIU we don't clear the mode or ACL when that happens (other than setuid/gid bits). > Right, OmniOS does not actually change Unix permissions either - see the > nfs4_getfacl above: > [ondrejv@skynet19 /mnt/omnios]$ nfs4_getfacl ondrejv # just for a > reference, I know we can't get that far > # file: ondrejv > A:fd:ondrejv:rwaDxtTnNcCy > A::OWNER@:rxtTnNcCoy # <----- No write > access here > A:g:GROUP@:rxtncy > A::EVERYONE@:rxtncy > > So only "ls -l" output suggests it did, but it did actually not. 
> Nevertheless, write access is granted, which is what we would expect here, > correct? That's because NFSv4 ACLs are evaluated and enforced in the order displayed. You're able to match the ondrej@ ACE first because it's the first in the list. POSIX ACLs are not like that. They are evaluated in a well-defined order that starts with the owner of the file (see acl(5)). Note that when you set the ACL against a linux server, they end up reordered. This is due to that fact. > I don't think linux-nfs.org is the best place to discuss because > it has actually nothing to do with NFS right? > NFSv4<->POSIX ACL translation is certainly relevant, and this is related to that and its limitations. linux-fsdevel would also be relevant for this. Any problems in this area will need to be resolved there first before we'll take a fix into RHEL. > > You've set a default ACL here, but that's the only default ACL being displayed in the resulting pile. What happens if you do this? > > [ondrejv@skynet19 /mnt/omnios]$ mkdir ondrejv/test > [ondrejv@skynet19 /mnt/omnios]$ nfs4_getfacl ondrejv/test > # file: ondrejv/test > A:fd:ondrejv:rwaDxtTnNcy > A::OWNER@:rwaDxtTnNcCoy > A:g:GROUP@:rwaDxtncy > A::EVERYONE@:rxtncy > > BTW, NetApp behaves the same way, FYI That's interesting. Where do the OWNER@/GROUP@/EVERYONE@ ACEs come from? I'm guessing that they're derived from file create mode & ~umask ? That makes sense I guess. Thanks for testing it. > That's because NFSv4 ACLs are evaluated and enforced in the order displayed. You're able to match the ondrej@ ACE first because it's the first in the list. POSIX > ACLs are not like that. They are evaluated in a well-defined order that starts with the owner of the file (see acl(5)). Note that when you set the ACL against a > linux server, they end up reordered. This is due to that fact. Ok got it, so not a bug, it's a feature :) If it's documented in 'man acl', then fine. > That's interesting. Where do the OWNER@/GROUP@/EVERYONE@ ACEs come from? 
I'm guessing that they're derived from file create mode & ~umask ? That comes from RFC 8881, chapter 6.4: "The server that supports both mode and ACL must take care to synchronize the MODE4_*USR, MODE4_*GRP, and MODE4_*OTH bits with the ACEs that have respective who fields of "OWNER@", "GROUP@", and "EVERYONE@"...." But (for sake of the completeness), Netapp has a special config flag called "v4-inherited-acl-preserve" or long version "Ignore Client Specified Mode Bits and Preserve Inherited NFSv4 ACL When Creating New Files or Directories" which, when turned on, would render the newly created directory ondrejv/test in the example above to only contain the single inheritance ACE of "A:fd:ondrejv:rwaDxtTnNcy" meaning that when there is any inheritance ACE defined, it is used instead of the classic Unix permission behavior. (In reply to Jeff Layton from comment #41) > I don't get the same result as you with the centos test kernel either. Can > you verify that you didn't just make a mistake in the testing? I don't see > any reason why the NFSv4 minorversion should make any difference here. Hi Jeff, I can 100% reproduce this issue by steps: 1. Mount in v4.2 2. A user create new directory in that mountpoint and update with ACL inheritance: e.g. # su bob -c 'nfs4_setfacl -a A:fd:bob:RWX /mnt/nfsmp-user_permission_check/testdir_bob' 3. The user go to that directory and create new directory/files: e.g. 
# su bob -c 'cd /mnt/nfsmp-user_permission_check/testdir_bob && mkdir subdir && touch files'
Here are some manual test logs:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[root@dell-per740-89 ~]# uname -r
5.14.0-344.bz2136452.1.el9.next.x86_64
[root@dell-per740-89 ~]# su testuser
[testuser@dell-per740-89 root]$ nfsstat -m
/mnt_test from dell-per740-89.rhts.eng.pek2.redhat.com:/export_test
 Flags: rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.73.180.15,local_lock=none,addr=10.73.180.15
[testuser@dell-per740-89 root]$ cd /mnt_test/testdir/
[testuser@dell-per740-89 testdir]$ ll
total 0
[testuser@dell-per740-89 testdir]$ mkdir dirv4.0
[testuser@dell-per740-89 testdir]$ touch filev4.0
[testuser@dell-per740-89 testdir]$ ll
total 0
drwxr-xr-x+ 2 testuser testuser 6 Jul 27 21:33 dirv4.0
-rw-r--r--+ 1 testuser testuser 0 Jul 27 21:33 filev4.0
^^^^
[testuser@dell-per740-89 testdir]$ id
uid=1001(testuser) gid=1001(testuser) groups=1001(testuser) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[testuser@dell-per740-89 testdir]$ rm -rf ./*
[testuser@dell-per740-89 testdir]$ exit
exit
[root@dell-per740-89 ~]# umount /mnt_test/
[root@dell-per740-89 ~]# mount -o vers=4.2 $HOSTNAME:/export_test/ /mnt_test/
^^^^^^^^^^
[root@dell-per740-89 ~]# su testuser
[testuser@dell-per740-89 root]$ nfsstat -m
/mnt_test from dell-per740-89.rhts.eng.pek2.redhat.com:/export_test
 Flags: rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.73.180.15,local_lock=none,addr=10.73.180.15
[testuser@dell-per740-89 root]$ cd /mnt_test/testdir/
[testuser@dell-per740-89 testdir]$ ll
total 0
[testuser@dell-per740-89 testdir]$ touch file4.2
[testuser@dell-per740-89 testdir]$ mkdir dir4.2
[testuser@dell-per740-89 testdir]$ ll
total 0
drwxrwxr-x+ 2 testuser testuser 6 Jul 27 21:40 dir4.2
-rw-rw-r--+ 1 testuser testuser 0 Jul 27 21:40 file4.2
^^^^
[testuser@dell-per740-89 testdir]$ umask
0022
^^^^
[testuser@dell-per740-89 testdir]$ id
uid=1001(testuser) gid=1001(testuser) groups=1001(testuser) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[testuser@dell-per740-89 testdir]$
P.s. per the `umask` output 0022 looks like the problem is in v4.2 IMO.
Not seeing that here at all:
[jlayton@centos9 ~]$ grep /mnt/local /etc/fstab
localhost:/export /mnt/local nfs noauto 0 0
[jlayton@centos9 ~]$ mount -v | grep local
localhost:/export on /mnt/local type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp6,timeo=600,retrans=2,sec=sys,clientaddr=::1,local_lock=none,addr=::1)
[jlayton@centos9 ~]$ id -a
uid=4447(jlayton) gid=4447(jlayton) groups=4447(jlayton),10(wheel) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
[jlayton@centos9 ~]$ umask
0022
[jlayton@centos9 ~]$ sudo mount /mnt/local
[jlayton@centos9 ~]$ rm -rf /mnt/local/testdir/
[jlayton@centos9 ~]$ mkdir /mnt/local/testdir/
[jlayton@centos9 ~]$ nfs4_getfacl /mnt/local/testdir/
# file: /mnt/local/testdir/
A::OWNER@:rwaDxtTcCy
A::GROUP@:rxtcy
A::EVERYONE@:rxtcy
[jlayton@centos9 ~]$ nfs4_setfacl -a A:fd:jlayton@localdomain:rwx /mnt/local/testdir
[jlayton@centos9 ~]$ nfs4_getfacl /mnt/local/testdir/
# file: /mnt/local/testdir/
A::OWNER@:rwaDxtTcCy
A::4447:rxtcy
A::GROUP@:rxtcy
A::EVERYONE@:rxtcy
A:fdi:OWNER@:rwaDxtTcCy
A:fdi:4447:rxtcy
A:fdi:GROUP@:rxtcy
A:fdi:EVERYONE@:rxtcy
[jlayton@centos9 ~]$ mkdir /mnt/local/testdir/dir4.2
[jlayton@centos9 ~]$ touch /mnt/local/testdir/file4.2
[jlayton@centos9 ~]$ ls -l /mnt/local/testdir
total 0
drwxr-xr-x+ 2 jlayton jlayton 6 Jul 28 12:50 dir4.2
-rw-r--r--+ 1 jlayton jlayton 0 Jul 28 12:51 file4.2
...which looks correct to me. v4.0 works identically for me. We must be missing some difference in our setups. Do you get the same result in yours with v4.1 ?
(In reply to Jeff Layton from comment #44)
> ..
> ...which looks correct to me. v4.0 works identically for me.
> We must be missing some difference in our setups. Do you get the same result in yours
> with v4.1 ?
Hey, I only see this problem with v4.2 (v4.0 and v4.1 are good).
Btw, please try once more with permission "RWX" instead of "rwx":
#
# It won't reproduce with "A:fd:testuser@localdomain:rwx"
#
[testuser@ibm-x3850x5-03-vm-01 mnt_test]$ nfs4_setfacl -a A:fd:testuser@localdomain:rwx testdir/
[testuser@ibm-x3850x5-03-vm-01 mnt_test]$ nfs4_getfacl testdir/
# file: testdir/
A::OWNER@:rwaDxtTcCy
A::1001:rxtcy
A::GROUP@:rxtcy
A::EVERYONE@:rxtcy
A:fdi:OWNER@:rwaDxtTcCy
A:fdi:1001:rxtcy
A:fdi:GROUP@:rxtcy
A:fdi:EVERYONE@:rxtcy
[testuser@ibm-x3850x5-03-vm-01 mnt_test]$ mkdir testdir/subdir
[testuser@ibm-x3850x5-03-vm-01 mnt_test]$ ll -ld testdir/subdir
drwxr-xr-x+ 2 testuser testuser 6 Jul 29 03:00 testdir/subdir
[testuser@ibm-x3850x5-03-vm-01 mnt_test]$ rm -rf testdir/subdir
#
# Only with "RWX" (mounted in v4.2)
#
[testuser@ibm-x3850x5-03-vm-01 mnt_test]$ nfs4_setfacl -a A:fd:testuser@localdomain:RWX testdir/
[testuser@ibm-x3850x5-03-vm-01 mnt_test]$ nfs4_getfacl testdir/
# file: testdir/
A::OWNER@:rwaDxtTcCy
A::1001:rwaDxtcy
A::GROUP@:rxtcy
A::EVERYONE@:rxtcy
A:fdi:OWNER@:rwaDxtTcCy
A:fdi:1001:rwaDxtcy
A:fdi:GROUP@:rxtcy
A:fdi:EVERYONE@:rxtcy
[testuser@ibm-x3850x5-03-vm-01 mnt_test]$ mkdir testdir/subdir
[testuser@ibm-x3850x5-03-vm-01 mnt_test]$ ll -ld testdir/subdir
drwxrwxr-x+ 2 testuser testuser 6 Jul 29 03:01 testdir/subdir
[testuser@ibm-x3850x5-03-vm-01 mnt_test]$
[testuser@ibm-x3850x5-03-vm-01 ~]$ exit
exit
[root@ibm-x3850x5-03-vm-01 ~]#
#
# Check again with v4.1
#
[root@ibm-x3850x5-03-vm-01 ~]# umount /mnt_test/
[root@ibm-x3850x5-03-vm-01 ~]# mount $HOSTNAME:/export_test/ /mnt_test/ -o vers=4.1
[root@ibm-x3850x5-03-vm-01 ~]#
[root@ibm-x3850x5-03-vm-01 ~]# grep test /proc/mounts
ibm-x3850x5-03-vm-01.rhts.eng.pek2.redhat.com:/export_test /mnt_test nfs4 rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.73.4.115,local_lock=none,addr=10.73.4.115 0 0
[root@ibm-x3850x5-03-vm-01 ~]# su testuser
[testuser@ibm-x3850x5-03-vm-01 root]$ ls -ld /mnt_test/testdir/subdir/
drwxrwxr-x+ 2 testuser testuser 6 Jul 29 03:01 /mnt_test/testdir/subdir/
[testuser@ibm-x3850x5-03-vm-01 root]$ mkdir /mnt_test/testdir/subv4.1
[testuser@ibm-x3850x5-03-vm-01 root]$ ll /mnt_test/testdir/
total 0
drwxrwxr-x+ 2 testuser testuser 6 Jul 29 03:01 subdir
drwxr-xr-x+ 2 testuser testuser 6 Jul 29 03:06 subv4.1
^^^
[testuser@ibm-x3850x5-03-vm-01 root]$
Making a few of the prior comments public, so non-RH folks are aware that we're still looking at problems here:
Got it, thanks. rwx -> RWX made the difference. Interestingly, the mode on the directory changes after the nfs4_setfacl call. That seems to be unexpected, but maybe I'm missing some subtlety with translation of the more obscure NFSv4 ACE permission bits?
[jlayton@knfsd ~]$ mkdir /mnt/local/testdir
[jlayton@knfsd ~]$ stat !$
stat /mnt/local/testdir
  File: /mnt/local/testdir
  Size: 4096      	Blocks: 8          IO Block: 1048576 directory
Device: 0,51	Inode: 131075      Links: 2
Access: (0755/drwxr-xr-x)  Uid: ( 1000/ jlayton)   Gid: ( 1000/ jlayton)
Access: 2023-07-31 07:39:53.059381794 -0400
Modify: 2023-07-31 07:39:53.059381794 -0400
Change: 2023-07-31 07:39:53.059381794 -0400
 Birth: -
[jlayton@knfsd ~]$ nfs4_setfacl  -a A:fd:jlayton@localdomain:RWX /mnt/local/testdir
[jlayton@knfsd ~]$ stat /mnt/local/testdir
  File: /mnt/local/testdir
  Size: 4096      	Blocks: 8          IO Block: 1048576 directory
Device: 0,51	Inode: 131075      Links: 2
Access: (0775/drwxrwxr-x)  Uid: ( 1000/ jlayton)   Gid: ( 1000/ jlayton)            <<<< MODE CHANGE
Access: 2023-07-31 07:39:53.059381794 -0400
Modify: 2023-07-31 07:39:53.059381794 -0400
Change: 2023-07-31 07:40:10.071919438 -0400
 Birth: -
[jlayton@knfsd ~]$ getfacl /export/testdir
getfacl: Removing leading '/' from absolute path names
# file: export/testdir
# owner: jlayton
# group: jlayton
user::rwx
user:jlayton:rwx
group::r-x
mask::rwx
other::r-x
default:user::---
default:user:jlayton:rwx
default:group::---
default:mask::rwx
default:other::---
This happens regardless of the NFS version in use, and it also happens without the patch for this BZ. I think this is most likely a different, but related issue involving permissions in the ACL translation code. It was somewhat papered over before by the extra deny ACEs, but that may make the problem more acute.
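One possible reading of the mode change, offered as a sketch rather than a confirmed diagnosis: under POSIX ACL rules, once an ACL carries a mask entry, the group bits of the file mode report the mask instead of the owning group's entry. The getfacl output above shows mask::rwx, which would yield 0775. The helper names below (perm_value, mode_from_access_acl) are hypothetical, and the "before" ACL is an assumption: the minimal ACL equivalent to mode 0755.

```python
# Illustrative only: this is not the kernel's translation code. It mirrors the
# POSIX ACL rule that, when a mask entry exists, the group bits of the mode
# show the mask rather than the owning group's own permissions.
PERM_BITS = {"r": 4, "w": 2, "x": 1}


def perm_value(perms: str) -> int:
    """Convert a getfacl-style string such as 'r-x' to its octal digit."""
    return sum(bit for ch, bit in PERM_BITS.items() if ch in perms)


def mode_from_access_acl(entries: dict[str, str]) -> int:
    """Derive the permission bits of the mode from access ACL entries."""
    owner = perm_value(entries["user::"])
    # Fall back to the owning group's entry only when no mask is present.
    group_class = perm_value(entries.get("mask::", entries["group::"]))
    other = perm_value(entries["other::"])
    return owner << 6 | group_class << 3 | other


# Assumed state before nfs4_setfacl: minimal ACL matching mode 0755.
before = {"user::": "rwx", "group::": "r-x", "other::": "r-x"}
# Access entries copied from the getfacl output above.
after = {
    "user::": "rwx",
    "user:jlayton:": "rwx",
    "group::": "r-x",
    "mask::": "rwx",
    "other::": "r-x",
}

print(oct(mode_from_access_acl(before)))  # 0o755
print(oct(mode_from_access_acl(after)))   # 0o775
```

If this reading holds, the named-user entry with rwx widens the mask, and the mask is what stat then reports in the group position.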
The create behavior is almost certainly related to the mode_umask that the client sends in NFSv4.2 CREATE calls. See RFC 8275:
    https://datatracker.ietf.org/doc/html/rfc8275
The 4.0/4.1 client sends a mode of 0755, but the v4.2+ client sends the mode_umask parameter instead. That may mean that it's not being interpreted correctly by the server, but I'll need to do some investigation to confirm it.
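As a rough sketch of the rule RFC 8275 describes, and not the nfsd implementation: with mode_umask the client sends the raw create mode and the umask separately, and the server is expected to apply the umask only when the new object inherits no ACEs from its parent. The function name and the 0777/022 values are assumptions chosen to match a plain mkdir with the umask shown in the logs.

```python
# Hypothetical helper illustrating the mode_umask rule; names are invented.
def effective_create_mode(mode: int, umask: int,
                          parent_has_inheritable_aces: bool) -> int:
    """Return the mode a server would start from for a newly created object."""
    if parent_has_inheritable_aces:
        # The umask is ignored; the inherited ACL then shapes the permissions.
        return mode
    # No inheritable ACEs: classic Unix behaviour, the umask is applied.
    return mode & ~umask


# v4.0/v4.1: the client applies the umask itself and sends the result.
print(oct(0o777 & ~0o022))                             # 0o755
# v4.2 with mode_umask, parent without inheritable ACEs.
print(oct(effective_create_mode(0o777, 0o022, False)))  # 0o755
# v4.2 with mode_umask, parent with inheritable ACEs.
print(oct(effective_create_mode(0o777, 0o022, True)))   # 0o777
```

In the third case the final permissions depend on how the server combines the unmasked mode with the inherited ACL, which is the step under investigation in this bug.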
Not sure if it helps, but Netapp does not behave that way:
[ondrejv@skynet19 /proj/it/tmp]$ stat acltest2
  File: acltest2
  Size: 4096 Blocks: 8 IO Block: 65536 directory
Device: 0,79 Inode: 130010300 Links: 2
Access: (0755/drwxr-xr-x) Uid: (14019/ ondrejv) Gid: (10116/ it)
Context: system_u:object_r:nfs_t:s0
Access: 2023-09-04 22:19:02.661375000 +0200
Modify: 2023-09-04 22:19:02.661375000 +0200
Change: 2023-09-04 22:20:28.881284000 +0200
 Birth: -
[ondrejv@skynet19 /proj/it/tmp]$ nfs4_setfacl -a A:fd:ondrejv:RWX acltest2
[ondrejv@skynet19 /proj/it/tmp]$ stat acltest2
  File: acltest2
  Size: 4096 Blocks: 8 IO Block: 65536 directory
Device: 0,79 Inode: 130010300 Links: 2
Access: (0755/drwxr-xr-x) Uid: (14019/ ondrejv) Gid: (10116/ it)
Context: system_u:object_r:nfs_t:s0
Access: 2023-09-04 22:19:02.661375000 +0200
Modify: 2023-09-04 22:19:02.661375000 +0200
Change: 2023-09-04 22:21:33.962203000 +0200
 Birth: -
[ondrejv@skynet19 /proj/it/tmp]$ nfs4_getfacl acltest2
# file: acltest2
A:fd:ondrejv:rwaDxtTnNcCy
A::OWNER@:rwaDxtTnNcCy
A:g:GROUP@:rxtncy
A::EVERYONE@:rxtncy
This patch will be as part of the MR of BZ 2178799 a.k.a. https://issues.redhat.com/browse/RHEL-7936
*** This bug has been marked as a duplicate of bug 2178799 ***
Hello,
May I know how this bug relates to #2178799? To me it looks like two different issues.
Hi Ondrej,
Sorry for the confusion. I just change the previous comment as public. Yes they are 2 different issues but this patch will be included as part of the Merge Request of that one.
Ok, got it - thanks!