Bug 1463472

Summary: Failed setxattr operation: Invalid argument for nfs4_setfacl as sec=null when mounting NetApp v4.1 by default
Product: Red Hat Enterprise Linux 7
Reporter: Yongcheng Yang <yoyang>
Component: nfs4-acl-tools
Assignee: Steve Dickson <steved>
Status: CLOSED NOTABUG
QA Contact: Yongcheng Yang <yoyang>
Severity: high
Docs Contact:
Priority: unspecified
Version: 7.4
CC: bfields, xzhou, yoyang
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-03-11 09:50:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                        Flags
successful (4.0) network traces    none
failed (4.1) network traces        none

Description Yongcheng Yang 2017-06-21 02:20:00 UTC
Description of problem:

During bz1427974 testing against a NetApp server mounted with
NFSv4.1, "nfs4_setfacl" always fails with "Failed setxattr operation:
Invalid argument" when adding an ACE for any user other than
"OWNER", "GROUP" and "EVERYONE". The same operation succeeds when
the export is mounted with v4.0.

At first I thought it was a configuration issue on our NetApp server,
but I still cannot figure out how to fix it. This issue may affect
others, as we now mount with v4.1 by default.


Version-Release number of selected component (if applicable):
nfs4-acl-tools-0.3.3-15.el7


How reproducible:
100% easily


Steps to Reproduce:
1. mount netapp server with v4.1
2. nfs4_setfacl -a "A::${some_number}:RW" $mountpoint/${some_file}


Actual results:
######### mounted with nfs version 4.1 #########
[root@ ~]# mount -o vers=4.1 netapp-pnfs-02.rhts.eng.pek2.redhat.com:/export/qe-test /mnt
[root@ ~]# get_mp_nfsvers /mnt
4.1
[root@ ~]# nfsstat -m
/mnt from netapp-pnfs-02.rhts.eng.pek2.redhat.com:/export/qe-test
 Flags:	rw,relatime,vers=4.1,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=null,clientaddr=10.73.4.163,local_lock=none,addr=10.73.4.5

[root@ ~]# touch /mnt/testfile
[root@ ~]# nfs4_getfacl /mnt/testfile
A::OWNER@:rwatTnNcCy
A:g:GROUP@:rtncy
A::EVERYONE@:rtncy
[root@ ~]# nfs4_setfacl -a "A::10000:RW" /mnt/testfile
Failed setxattr operation: Invalid argument
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[root@ ~]# echo $?
255
^^^
[root@ ~]# rm -f /mnt/testfile
[root@ ~]# umount /mnt/


Expected results:
######### mounted with nfs version 4.0 #########
[root@ ~]# mount -o vers=4.0 netapp-pnfs-02.rhts.eng.pek2.redhat.com:/export/qe-test /mnt
[root@ ~]# get_mp_nfsvers /mnt/
4.0
[root@ ~]# nfsstat -m
/mnt from netapp-pnfs-02.rhts.eng.pek2.redhat.com:/export/qe-test
 Flags:	rw,relatime,vers=4.0,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.73.4.163,local_lock=none,addr=10.73.4.5

[root@ ~]# touch /mnt/testfile
[root@ ~]# nfs4_getfacl /mnt/testfile
A::OWNER@:rwatTnNcCy
A:g:GROUP@:rtncy
A::EVERYONE@:rtncy
[root@ ~]# nfs4_setfacl -a "A::10000:RW" /mnt/testfile
[root@ ~]# nfs4_setfacl -a "A::10001:RW" /mnt/testfile
[root@ ~]# echo $?
0
[root@ ~]# nfs4_getfacl /mnt/testfile
A::10001:rwatTnNcCy
A::10000:rwatTnNcCy
A::OWNER@:rwatTnNcCy
A:g:GROUP@:rtncy
A::EVERYONE@:rtncy
[root@ ~]# rm -f /mnt/testfile && umount /mnt/
[root@ ~]# 


Additional info:
The filer settings follow. Both v4.0-acl and v4.1-acl are enabled.

redhat::>  vserver nfs show -vserver qe-test -fields v4.0-acl,v4.1-acl
vserver v4.0-acl v4.1-acl 
------- -------- -------- 
qe-test enabled  enabled  

redhat::> 
redhat::> vserver nfs show -vserver qe-test                          

                         Vserver: qe-test
              General NFS Access: true
                          NFS v3: enabled
                        NFS v4.0: enabled
                    UDP Protocol: enabled
                    TCP Protocol: enabled
             Spin Authentication: disabled
            Default Windows User: -
             NFSv4.0 ACL Support: enabled
 NFSv4.0 Read Delegation Support: enabled
NFSv4.0 Write Delegation Support: enabled
         NFSv4 ID Mapping Domain: mgmt.lab.eng.nay.redhat.com
   NFSv4.1 Minor Version Support: enabled
                   Rquota Enable: disabled
    NFSv4.1 Parallel NFS Support: enabled
             NFSv4.1 ACL Support: enabled
            NFS vStorage Support: disabled
           Default Windows Group: -
 NFSv4.1 Read Delegation Support: enabled
NFSv4.1 Write Delegation Support: enabled
             NFS Mount Root Only: enabled
                   NFS Root Only: disabled

redhat::> exit
Goodbye

Comment 1 Yongcheng Yang 2017-06-26 09:28:31 UTC
(In reply to Yongcheng Yang from comment #0)

> Steps to Reproduce:
> 1. mount netapp server with v4.1
> 2. nfs4_setfacl -a "A::${some_number}:RW" $mountpoint/${some_file}

I'm not sure whether the above operation is common or not.

If it's a configuration issue, it looks like another
issue we need to document (like Bug 1450447).

Comment 2 Andrey 2018-08-08 08:53:48 UTC
This problem is also present in CentOS Linux release 7.5.1804 (Core).
With both NFS protocol versions 4.0 and 4.1, I get errors similar to those described in this report.

[root@testserver ~]# cat /etc/exports
/opt	 *(rw,acl)

[root@testserver ~]# mount -t nfs -o vers=4.0 127.0.0.1:/opt /mnt/
[root@testserver ~]# mount
127.0.0.1:/opt on /mnt type nfs4 (rw,relatime,vers=4.0,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=127.0.0.1,local_lock=none,addr=127.0.0.1)

[root@testserver ~]# nfs4_getfacl /mnt/test
A::OWNER@:rwaxtTcCy
A::GROUP@:rwaxtcy
A::EVERYONE@:rwaxtcy

[root@testserver ~]# nfs4_setfacl -a A::test.0.1:rw /mnt/test
Failed setxattr operation: Invalid argument
[root@testserver ~]# echo $?
255

Comment 3 J. Bruce Fields 2018-08-09 18:47:06 UTC
(In reply to Yongcheng Yang from comment #0)
> Description of problem:
> 
> During bz1427974 testing against a NetApp server mounted with
> NFSv4.1, "nfs4_setfacl" always fails with "Failed setxattr operation:
> Invalid argument" when adding an ACE for any user other than
> "OWNER", "GROUP" and "EVERYONE". The same operation succeeds when
> the export is mounted with v4.0.

That's very strange!

Would it be possible to get network traces showing both the successful (4.0) and unsuccessful (4.1) cases?

Comment 4 J. Bruce Fields 2018-08-09 18:50:36 UTC
(In reply to Andrey from comment #2)
> This problem is also present in CentOS Linux release 7.5.1804 (Core).
> With both NFS protocol versions 4.0 and 4.1, I get errors similar to those
> described in this report.

That (and the fact that you're testing against a CentOS server instead of Netapp) suggests to me that this is a different bug (or possibly a misconfiguration of some sort).

I'd also suggest starting with a network trace.

My first suspicion would be something about the name test.0.1 that the server doesn't like.
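
To make it easier to see which kind of principal a given nfs4_setfacl call uses, here is a minimal sketch that classifies the principal field of an ACE spec written as type:flags:principal:permissions. The helper name ace_principal_kind and its output labels are made up for illustration and are not part of nfs4-acl-tools. It only inspects the string and says nothing about what a particular server will accept.

```shell
#!/bin/sh
# Hypothetical helper: classify the principal in an ACE spec such as
# "A::10000:RW" (type:flags:principal:permissions).
ace_principal_kind() {
    # The principal is the third colon-separated field.
    principal=$(printf '%s\n' "$1" | cut -d: -f3)
    case "$principal" in
        OWNER@|GROUP@|EVERYONE@) echo "special" ;;
        '')                      echo "empty" ;;
        *[!0-9]*)
            # Contains at least one non-digit: a name, with or without a domain.
            case "$principal" in
                *@*) echo "name@domain" ;;
                *)   echo "bare-name" ;;
            esac ;;
        *)                       echo "numeric" ;;
    esac
}
```

With the specs used in this report, "A::10000:RW" is classified as numeric and "A::test.0.1:rw" as a bare name with no @domain part, so the two failures do not involve the same kind of principal.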

Comment 5 Yongcheng Yang 2018-08-10 09:59:10 UTC
Created attachment 1474935 [details]
successful (4.0) network traces

Comment 6 Yongcheng Yang 2018-08-10 10:00:07 UTC
Created attachment 1474936 [details]
failed (4.1) network traces

Comment 7 Yongcheng Yang 2018-08-10 10:09:35 UTC
(In reply to J. Bruce Fields from comment #3)
> ...
> Would it be possible to get network traces showing both the successful (4.0)
> and unsuccessful (4.1) cases?

Sorry for not investigating that earlier.

I just found that in the failing case the server returned "NFS4ERR_BADOWNER" for SETATTR.
However, the created file shows up as "nobody:nobody" in both v4.0 and v4.1, so I still can't figure out where the difference is.

Simple tshark output:
~~~~~~~~~~~~~~~~~~~~
[root ~]# tshark -n -r v4.0.pcap | grep -A1 SETATTR
...
--
 61 19  10.73.4.149 -> 10.73.4.5    NFS 362 V4 Call SETATTR FH: 0x80fbef23
 62 19    10.73.4.5 -> 10.73.4.149  NFS 130 V4 Reply (Call In 61)[Malformed Packet]
[root ~]# 
[root ~]# tshark -n -r v4.1.pcap | grep -A1 SETATTR
...
--
 66 29  10.73.4.149 -> 10.73.4.5    NFS 334 V4 Call SETATTR FH: 0x7d021756
 67 29    10.73.4.5 -> 10.73.4.149  NFS 170 V4 Reply (Call In 66) SETATTR Status: NFS4ERR_BADOWNER
 68 29  10.73.4.149 -> 10.73.4.5    NFS 334 V4 Call SETATTR FH: 0x7d021756
 69 29    10.73.4.5 -> 10.73.4.149  NFS 170 V4 Reply (Call In 68) SETATTR Status: NFS4ERR_BADOWNER
 70 29  10.73.4.149 -> 10.73.4.5    TCP 66 756 > 2049 [ACK] Seq=4705 Ack=5537 Win=52864 Len=0 TSval=281894477 TSecr=877673443
[root ~]#
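
As a rough way to pull just the SETATTR reply statuses out of such a trace, the following sketch filters tshark's one-line summaries. The function name setattr_statuses is hypothetical, and the pattern assumes the summary format shown above, which may differ between tshark versions.

```shell
#!/bin/sh
# Hypothetical helper: read tshark summary lines on stdin and print the
# status of each SETATTR reply, one per line.
setattr_statuses() {
    sed -n 's/.*Reply (Call In [0-9]*) SETATTR Status: \([A-Z0-9_]*\).*/\1/p'
}

# Possible usage (not run here):
#   tshark -n -r v4.1.pcap | setattr_statuses | sort | uniq -c
```

Replies that tshark cannot decode, such as the "[Malformed Packet]" line in the v4.0 trace, carry no status text and are skipped by this filter.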

Comment 8 J. Bruce Fields 2018-08-10 20:29:03 UTC
I wonder why the client's using AUTH_NULL in the 4.1 case and AUTH_UNIX in the 4.0 case?

It shouldn't make a difference, but I'm curious.

Is /sys/module/nfsd/parameters/nfs4_disable_idmapping on the server the same in both cases?

Comment 9 Yongcheng Yang 2018-08-13 08:13:51 UTC
(In reply to J. Bruce Fields from comment #8)
Hi Bruce,
> Is /sys/module/nfsd/parameters/nfs4_disable_idmapping on the server the same
> in both cases?

The tests are all with the same nfs server, i.e. Netapp (see 
"Additional info" in comment #0).

> I wonder why the client's using AUTH_NULL in the 4.1 case and AUTH_UNIX in
> the 4.0 case?
> 
> It shouldn't make a difference, but I'm curious.

Thanks for the hint! It does matter whether AUTH_NULL or AUTH_UNIX is used:

####################################
# When specifying sec=sys explicitly
####################################
[root ~]# mount -o vers=4.1,sec=sys netapp-pnfs-02.rhts.eng.pek2.redhat.com:/export/qe-test /mnt
[root ~]# nfsstat -m
/mnt from netapp-pnfs-02.rhts.eng.pek2.redhat.com:/export/qe-test
 Flags: rw,relatime,vers=4.1,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.66.144.84,local_lock=none,addr=10.73.4.5

[root ~]# touch /mnt/testfile
[root ~]# nfs4_getfacl /mnt/testfile

# file: /mnt/testfile
A::OWNER@:rwatTnNcCy
A:g:GROUP@:rtncy
A::EVERYONE@:rtncy
[root ~]# nfs4_setfacl -a "A::10000:RW" /mnt/testfile
[root ~]# echo $?
0
^^^       <<<<<<<<<<<<<<<<< success!
[root ~]# umount /mnt/
[root ~]# 
####################################
# It's sec=null by default with v4.1
####################################
[root ~]# mount -o vers=4.1 netapp-pnfs-02.rhts.eng.pek2.redhat.com:/export/qe-test /mnt
[root ~]# nfsstat -m
/mnt from netapp-pnfs-02.rhts.eng.pek2.redhat.com:/export/qe-test
 Flags: rw,relatime,vers=4.1,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=null,clientaddr=10.66.144.84,local_lock=none,addr=10.73.4.5

[root ~]# nfs4_setfacl -a "A::10000:RW" /mnt/testfile
Failed setxattr operation: Invalid argument
[root ~]# echo $?
255
[root ~]# rm -f /mnt/testfile
[root ~]# umount /mnt/


So the question now becomes: why does mounting NetApp with v4.1 end up with sec=null instead of sec=sys?
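
Since the negotiated flavor only shows up in the mount flags, a small check like the one below can flag the sec=null case before running ACL tests. This is a sketch only: sec_flavor is a hypothetical helper that extracts the sec= value from a comma-separated option string such as the Flags line printed by nfsstat -m.

```shell
#!/bin/sh
# Hypothetical helper: print the value of sec= from a comma-separated
# mount option string; prints nothing if no sec= option is present.
sec_flavor() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^sec=//p'
}

# Possible usage on a mounted client (not run here):
#   opts=$(findmnt -n -o OPTIONS /mnt)
#   [ "$(sec_flavor "$opts")" = "null" ] && echo "warning: /mnt uses sec=null"
```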

Comment 10 J. Bruce Fields 2018-08-13 18:53:26 UTC
(In reply to Yongcheng Yang from comment #9)
> (In reply to J. Bruce Fields from comment #8)
> Hi Bruce,
> > Is /sys/module/nfsd/parameters/nfs4_disable_idmapping on the server the same
> > in both cases?
> 
> The tests are all with the same nfs server, i.e. Netapp (see 
> "Additional info" in comment #0).

Oh, thanks for pointing out my oversight.

This definitely looks like a server bug.

Note that according to RFC 7530, servers and clients should allow numeric id's (like "1000") on the wire only when not using krb5.  I wonder if the server has incorrectly implemented that check so that it also rejects them over auth_null.

Anyway, we should report this to Netapp.

That said, my guess would be that auth_null is not that widely used, so perhaps our tests should just be using auth_sys.
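
One way a test harness could follow that suggestion is to add sec=sys whenever the caller has not chosen a flavor. The sketch below is an assumption about how such a wrapper might look; ensure_sec is a hypothetical name, and it only rewrites the option string passed to mount -o.

```shell
#!/bin/sh
# Hypothetical helper: return the given mount options unchanged if they
# already contain a sec= option, otherwise append sec=sys.
ensure_sec() {
    case ",$1," in
        *,sec=*) printf '%s\n' "$1" ;;          # caller already picked a flavor
        ,,)      printf 'sec=sys\n' ;;          # no options were given at all
        *)       printf '%s,sec=sys\n' "$1" ;;  # append the default flavor
    esac
}

# Possible usage (not run here; server and export are placeholders):
#   mount -o "$(ensure_sec vers=4.1)" server:/export /mnt
```

Leaving an explicit sec= untouched keeps tests that deliberately exercise another flavor working as before.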

Comment 11 Yongcheng Yang 2019-03-11 09:50:26 UTC
I'm closing this as NOTABUG for the RHEL client, based on comment #10 and comment #9.

However, I haven't reported this issue to NetApp, as I have no account there.