Bug 1602836 - Kerberos-enabled NFSv4 doesn't work on armv7hl
Summary: Kerberos-enabled NFSv4 doesn't work on armv7hl
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: nfs-utils
Version: 7.6
Hardware: armv7hl
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Steve Dickson
QA Contact: Yongcheng Yang
URL:
Whiteboard:
Depends On: 1595927
Blocks:
Reported: 2018-07-18 15:23 UTC by Steve Dickson
Modified: 2018-10-30 11:48 UTC
CC List: 10 users

Fixed In Version: nfs-utils-1.3.0-0.57.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1595927
Environment:
Last Closed: 2018-10-30 11:48:04 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:3311 0 None None None 2018-10-30 11:48:44 UTC

Description Steve Dickson 2018-07-18 15:23:37 UTC
+++ This bug was initially created as a clone of Bug #1595927 +++

Description of problem:
There's something going on in the ARM build of the Kerberos bits of NFSv4 that stop it working on that platform (at least for me). Whenever I try to access an NFSv4 share I get either 'stale file handle' or 'permission denied'. 

I've tried it with F28 on an RPi 3, and seen similar results on other small ARM boxes. The only other thing they have in common is no hardware RTC.

The machines are configured as FreeIPA clients, enrolled using ipa-client-install and ipa-client-automount. All the other Kerberos bits are working fine: I get tickets, can use GSSAPI ssh to log into other machines, etc.

I'm using exactly the same steps and configuration as I do on my x86-64 boxes, and they all work fine. I also have Krb5+NFSv4 working on my realm for SPARC, PA-RISC and POWER boxes (albeit in another distro) so I really can't understand this one!

I currently suspect gssproxy because when I try to access a share it produces the following:


[2018/06/27 19:16:44]: Debug Enabled (level: 3)
[2018/06/27 19:16:44]: Service: nfs-server, Keytab: /etc/krb5.keytab, Enctype: 18
[2018/06/27 19:16:44]: Service: nfs-client, Keytab: /etc/krb5.keytab, Enctype: 18
[2018/06/27 19:16:44]: Client [2018/06/27 19:16:44]: (/usr/sbin/gssproxy) [2018/06/27 19:16:44]:  connected (fd = 12)[2018/06/27 19:16:44]:  (pid = 1810) (uid = 0) (gid = 0)[2018/06/27 19:16:44]:  (context = system_u:system_r:kernel_t:s0)[2018/06/27 19:16:44]: 
[2018/06/27 19:16:50]: Client [2018/06/27 19:16:50]: (/usr/sbin/rpc.gssd) [2018/06/27 19:16:50]:  connected (fd = 13)[2018/06/27 19:16:50]:  (pid = 617) (uid = 60673) (gid = 60673)[2018/06/27 19:16:50]:  (context = system_u:system_r:gssd_t:s0)[2018/06/27 19:16:50]: 
[CID 13][2018/06/27 19:16:50]: [status] Handling query input: 0xdf43a8 (116)
[CID 13][2018/06/27 19:16:50]: Connection matched service nfs-client
[CID 13][2018/06/27 19:16:50]: [status] Processing request [0xdf43a8 (116)]
[CID 13][2018/06/27 19:16:50]: [status] Executing request 6 (GSSX_ACQUIRE_CRED) from [0xdf43a8 (116)]
[CID 13][2018/06/27 19:16:50]: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "nfs-client", euid: 60673,socket: (null)
    GSSX_ARG_ACQUIRE_CRED( call_ctx: { "" [  ] } input_cred_handle: <Null> add_cred: 0 desired_name: <Null> time_req: 4294967295 desired_mechs: { { 1 2 840 113554 1 2 2 } } cred_usage: INITIATE initiator_time_req: 0 acceptor_time_req: 0 )
gssproxy[1810]: (OID: { 1 2 840 113554 1 2 2 }) Unspecified GSS failure.  Minor code may provide more information, No credentials cache found
    GSSX_RES_ACQUIRE_CRED( status: { 851968 { 1 2 840 113554 1 2 2 } 2529639107 "Unspecified GSS failure.  Minor code may provide more information" "No credentials cache found" [  ] } output_cred_handle: <Null> )
[CID 13][2018/06/27 19:16:50]: [status] Returned buffer 6 (GSSX_ACQUIRE_CRED) from [0xdf43a8 (116)]: [0xb3943870 (176)]
[CID 13][2018/06/27 19:16:50]: [status] Handling query output: 0xb3943870 (176)
[2018/06/27 19:16:50]: [status] Handling query reply: 0xb3943870 (176)
[2018/06/27 19:16:50]: [status] Sending data: 0xb3943870 (176)
[2018/06/27 19:16:50]: [status] Sending data [0xb3943870 (176)]: successful write of 176


I don't know why it's not finding the credential cache. The stuff in /etc/gssproxy is the same as on the working x86-64 boxes. The machine keytab seems to be in order. I've tried both kernel keyring and KCM, no difference.


Version-Release number of selected component (if applicable):
gssproxy-0.8.0-4.fc28.armv7hl
kernel-4.17.2-200.fc28.armv7hl
nfs-utils-2.3.2-0.fc28.armv7hl
sssd-1.16.2-1.fc28.armv7hl
sssd-kcm-1.16.2-1.fc28.armv7hl

How reproducible:
Consistently.

Steps to Reproduce:
1. Install F28 on an RPi 3.
2. Enroll it to a FreeIPA domain with Kerberised NFSv4, add the relevant automounts.
3. Attempt to access the share.

Actual results:
No share.

Expected results:
Shared stuff.

--- Additional comment from Robbie Harwood on 2018-06-28 17:23:32 EDT ---

Hi, unfortunately, I don't have hardware for this architecture to test this on.  Could you capture an ltrace of gssproxy during the failure?  Also, `klist -ekt` on the keytab, and `klist -e` as the user?

--- Additional comment from James Ettle on 2018-06-28 17:59:26 EDT ---

$ klist -e
Ticket cache: KCM:1402400001:43805
Default principal: james

Valid starting     Expires            Service principal
28/06/18 22:54:06  29/06/18 22:54:06  krbtgt/CB.ETTLE
        Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96


# klist -ekt
Keytab name: FILE:/etc/krb5.keytab
KVNO Timestamp         Principal
---- ----------------- --------------------------------------------------------
   1 27/06/18 00:25:39 host/buck.cb.ettle (aes256-cts-hmac-sha1-96)
   1 27/06/18 00:25:39 host/buck.cb.ettle (aes128-cts-hmac-sha1-96)


I see pretty much the same on one of my x86-64 boxes. (ltrace to follow shortly.)

--- Additional comment from James Ettle on 2018-06-28 18:56:36 EDT ---

From ltrace gssproxy -i, when doing an ls /<nfsv4 path>:


calloc(1, 148)                                                                      = 0xf2ed10
verto_get_private(0xf2db70, 0, -1, 0xf2edac)                                        = 0xf2f850
verto_get_fd(0xf2db70, 0, -1, -1)                                                   = 6
accept(6, 0xf2ed18, 0xf2ed88, 1)                                                    = 11
fcntl(11, 3, 0, 0)                                                                  = 2
fcntl(11, 4, 2050, 0xc75fe700)                                                      = 0
fcntl(11, 1, 0, 0xc75fe700)                                                         = 0
fcntl(11, 2, 1, 0xc75fe700)                                                         = 0
getsockopt(11, 1, 17, 0xf2ed90)                                                     = 0
getpeercon(11, 0xbe8e571c, 17, 1)                                                   = 0
context_new(0xf2cc00, 4, 0xc75fe700, 3)                                             = 0xf2cc38
freecon(0xf2cc00, 0xf2cc1b, 0, 3)                                                   = 0xf29010
__snprintf_chk(0xbe8e578c, 20, 1, 21)                                               = 14
realpath(0xbe8e578c, 0, 0xbe8e57a8, 0)                                              = 0xf2fa28
calloc(1, 16)                                                                       = 0xf2f908
verto_add_io(0xf2dc40, 16, 0x445de0, 11)                                            = 0xf35448
verto_set_private(0xf35448, 0xf2f908, 0, 0x465c08)                                  = 0xf35448
verto_get_fd(0xf35448, 0xf35448, 0x465c08, 0xc75fe700)                              = 11
verto_get_private(0xf35448, 0xf35448, 0x465c08, 1)                                  = 0xf2f908
read(11, "\200", 4)                                                                 = 4
malloc(116)                                                                         = 0xf36148
__errno_location()                                                                  = 0xb6efc9ac
read(11, "\022l\202\221", 116)                                                      = 116
calloc(1, 20)                                                                       = 0xf2f920
pthread_mutex_lock(0xf2a558, 0xf2f920, 20, 0)                                       = 0
pthread_mutex_unlock(0xf2a558, 0, 0, 0xf2d398)                                      = 0
pthread_mutex_lock(0xf2ecbc, 1, 0, 0)                                               = 0
pthread_cond_signal(0xf2ecd8, 0, 2351, 2)                                           = 0
pthread_mutex_unlock(0xf2ecbc, 129, 1, 0)                                           = 0
free(0xf2f908)                                                                      = <void>
gssproxy[2351]: (OID: { 1 2 840 113554 1 2 2 }) Unspecified GSS failure.  Minor code may provide more information, No credentials cache found
verto_get_private(0xf2f3c0, 0xf2f3c0, 0x446824, 0xc75fe700)                         = 0xf2a558
read(8, "", 1)                                                                      = 1
pthread_mutex_lock(0xf2a558, 2, 0, 0xf2f920)                                        = 0
pthread_mutex_unlock(0xf2a558, 0, 2351, 0)                                          = 0
calloc(1, 16)                                                                       = 0xf35480
verto_add_io(0xf2dc40, 32, 0x445bb8, 11)                                            = 0xf35448
verto_set_private(0xf35448, 0xf35480, 0, 0x465c08)                                  = 0xf35448
free(0xf2f920)                                                                      = <void>
verto_get_fd(0xf35448, 0xf35448, 0x445bb8, 0xc75fe700)                              = 11
verto_get_private(0xf35448, 0xf35448, 0x445bb8, 1)                                  = 0xf35480
__errno_location()                                                                  = 0xb6efc9ac
writev(11, 0xbe8e579c, 2, 0xbe8e5798)                                               = 180
calloc(1, 16)                                                                       = 0xf35498
verto_add_io(0xf2dc40, 16, 0x445de0, 11)                                            = 0xf2d400
verto_set_private(0xf2d400, 0xf35498, 0, 0x465c08)                                  = 0xf2d400
free(0xb3908bf0)                                                                    = <void>
free(0xf35480)                                                                      = <void>

--- Additional comment from Robbie Harwood on 2018-07-03 16:38:09 EDT ---

Thanks.  I don't immediately see anything amiss in the keytab or the ltrace.  Could I trouble you for ptrace output as well?

--- Additional comment from James Ettle on 2018-07-03 16:43:11 EDT ---

Do you mean strace? For anything else I'll need some guidance...

--- Additional comment from James Ettle on 2018-07-03 17:18:29 EDT ---

Executing speculatively... I attached strace to the gssproxy process. When I did my first ls /nas/scratch, I got 'stale file handle' and from strace:


epoll_wait(5,   <-- where it paused before ls

[{EPOLLIN, {u32=11, u64=77309411339}}], 64, 59743) = 1
clock_gettime(CLOCK_MONOTONIC, {tv_sec=425, tv_nsec=858583211}) = 0
gettimeofday({tv_sec=1530652550, tv_usec=465087}, NULL) = 0
read(11, "\200\0\0t", 4)                = 4
read(11, "3\253\366\275\0\0\0\0\0\0\0\2\0\6\32\360\0\0\0\1\0\0\0\6\0\0\0\0\0\0\0\0"..., 116) = 116
futex(0x10ebd00, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x10ebcbc, FUTEX_WAKE_PRIVATE, 1) = 1
clock_gettime(CLOCK_MONOTONIC, {tv_sec=425, tv_nsec=876667276}) = 0
epoll_wait(5, [{EPOLLIN, {u32=4, u64=4294967300}}], 64, 59743) = 1
clock_gettime(CLOCK_MONOTONIC, {tv_sec=425, tv_nsec=887067009}) = 0
read(4, "\0", 1)                        = 1
epoll_ctl(5, EPOLL_CTL_ADD, 11, {EPOLLOUT, {u32=11, u64=81604378635}}) = -1 EEXIST (File exists)
epoll_ctl(5, EPOLL_CTL_MOD, 11, {EPOLLOUT, {u32=11, u64=81604378635}}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=425, tv_nsec=889832722}) = 0
epoll_wait(5, [{EPOLLOUT, {u32=11, u64=81604378635}}], 64, 59743) = 1
clock_gettime(CLOCK_MONOTONIC, {tv_sec=425, tv_nsec=890922924}) = 0
writev(11, [{iov_base="\200\0\0\260", iov_len=4}, {iov_base="3\253\366\275\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\r\0\0"..., iov_len=176}], 2) = 180
epoll_ctl(5, EPOLL_CTL_MOD, 11, {EPOLLIN, {u32=11, u64=85899345931}}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=425, tv_nsec=892383541}) = 0
epoll_wait(5, [{EPOLLIN, {u32=11, u64=85899345931}}], 64, 59743) = 1
clock_gettime(CLOCK_MONOTONIC, {tv_sec=426, tv_nsec=412722422}) = 0
gettimeofday({tv_sec=1530652551, tv_usec=19199}, NULL) = 0
read(11, "\200\0\0t", 4)                = 4
read(11, "3\253\366\276\0\0\0\0\0\0\0\2\0\6\32\360\0\0\0\1\0\0\0\6\0\0\0\0\0\0\0\0"..., 116) = 116
futex(0x10ebd04, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x10ebcbc, FUTEX_WAKE_PRIVATE, 1) = 1
clock_gettime(CLOCK_MONOTONIC, {tv_sec=426, tv_nsec=416787347}) = 0
epoll_wait(5, [{EPOLLIN, {u32=4, u64=4294967300}}], 64, 59743) = 1
clock_gettime(CLOCK_MONOTONIC, {tv_sec=426, tv_nsec=420828261}) = 0
read(4, "\0", 1)                        = 1
epoll_ctl(5, EPOLL_CTL_ADD, 11, {EPOLLOUT, {u32=11, u64=90194313227}}) = -1 EEXIST (File exists)
epoll_ctl(5, EPOLL_CTL_MOD, 11, {EPOLLOUT, {u32=11, u64=90194313227}}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=426, tv_nsec=421796954}) = 0
epoll_wait(5, [{EPOLLOUT, {u32=11, u64=90194313227}}], 64, 59743) = 1
clock_gettime(CLOCK_MONOTONIC, {tv_sec=426, tv_nsec=422194660}) = 0
writev(11, [{iov_base="\200\0\0\260", iov_len=4}, {iov_base="3\253\366\276\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\r\0\0"..., iov_len=176}], 2) = 180
epoll_ctl(5, EPOLL_CTL_MOD, 11, {EPOLLIN, {u32=11, u64=94489280523}}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=426, tv_nsec=423228456}) = 0
epoll_wait(5, 


I tried the ls again and got 'permission denied' this time. From strace:


[{EPOLLIN, {u32=11, u64=94489280523}}], 64, 59743) = 1
clock_gettime(CLOCK_MONOTONIC, {tv_sec=457, tv_nsec=317881963}) = 0
gettimeofday({tv_sec=1530652581, tv_usec=924402}, NULL) = 0
read(11, "\200\0\0t", 4)                = 4
read(11, "3\253\366\277\0\0\0\0\0\0\0\2\0\6\32\360\0\0\0\1\0\0\0\6\0\0\0\0\0\0\0\0"..., 116) = 116
futex(0x10ebd00, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x10ebcbc, FUTEX_WAKE_PRIVATE, 1) = 1
clock_gettime(CLOCK_MONOTONIC, {tv_sec=457, tv_nsec=323206360}) = 0
epoll_wait(5, [{EPOLLIN, {u32=4, u64=4294967300}}], 64, 59743) = 1
clock_gettime(CLOCK_MONOTONIC, {tv_sec=457, tv_nsec=325111037}) = 0
read(4, "\0", 1)                        = 1
epoll_ctl(5, EPOLL_CTL_ADD, 11, {EPOLLOUT, {u32=11, u64=98784247819}}) = -1 EEXIST (File exists)
epoll_ctl(5, EPOLL_CTL_MOD, 11, {EPOLLOUT, {u32=11, u64=98784247819}}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=457, tv_nsec=328941432}) = 0
epoll_wait(5, [{EPOLLOUT, {u32=11, u64=98784247819}}], 64, 59743) = 1
clock_gettime(CLOCK_MONOTONIC, {tv_sec=457, tv_nsec=329499919}) = 0
writev(11, [{iov_base="\200\0\0\260", iov_len=4}, {iov_base="3\253\366\277\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\r\0\0"..., iov_len=176}], 2) = 180
epoll_ctl(5, EPOLL_CTL_MOD, 11, {EPOLLIN, {u32=11, u64=103079215115}}) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=457, tv_nsec=331903708}) = 0
epoll_wait(5, [], 64, 59743)            = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=517, tv_nsec=90213467}) = 0
gettimeofday({tv_sec=1530652641, tv_usec=697179}, NULL) = 0
clock_gettime(CLOCK_MONOTONIC, {tv_sec=517, tv_nsec=92249229}) = 0
epoll_wait(5,

--- Additional comment from Robbie Harwood on 2018-07-05 13:50:31 EDT ---

Yes, thanks, I meant strace.

What I don't understand from the trace is why I don't see any GSSAPI calls at all.  Nothing seems to *fail*, either.

If you're running `ls` as root: can you show `klist -ekt` of the keytab?  If not: can you show `klist -e` before (and after) running the command?

Can you also show your gssproxy configuration?

--- Additional comment from James Ettle on 2018-07-05 17:51:58 EDT ---

Before ls, klist -e gives:


Ticket cache: KCM:1402400001:43805
Default principal: james

Valid starting     Expires            Service principal
05/07/18 22:45:53  06/07/18 22:45:53  krbtgt/CB.ETTLE
	Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96


and after (with 'stale file handle')


Ticket cache: KCM:1402400001:43805
Default principal: james

Valid starting     Expires            Service principal
05/07/18 22:45:53  06/07/18 22:45:53  krbtgt/CB.ETTLE
	Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96


which is interesting as I'd expect an nfs/skipper.cb.ettle service ticket.

The gssproxy config is the one created by ipa-client-install (same as on my x86-64 boxes), plus sssd-kcm. (Same results on kernel keyring.) I've got:

gssproxy.conf:
[gssproxy]

99-nfs-client.conf:
[service/nfs-client]
  mechs = krb5
  cred_store = keytab:/etc/krb5.keytab
  cred_store = ccache:FILE:/var/lib/gssproxy/clients/krb5cc_%U
  cred_store = client_keytab:/var/lib/gssproxy/clients/%U.keytab
  cred_usage = initiate
  allow_any_uid = yes
  trusted = yes
  euid = 0

24-nfs-server.conf:
[service/nfs-server]
  mechs = krb5
  socket = /run/gssproxy.sock
  cred_store = keytab:/etc/krb5.keytab
  trusted = yes
  kernel_nfsd = yes
  euid = 0

(This machine is not configured as a server.)

--- Additional comment from Robbie Harwood on 2018-07-13 11:19:31 EDT ---

Okay, another idea: what libverto backend are you using?  (Check `rpm -qa | grep libverto`).  Could you try a different one - glib, libev, and libevent are the options in Fedora these days.

--- Additional comment from James Ettle on 2018-07-13 14:59:01 EDT ---

Initially it was libverto-libev.

I then removed the -libev package and installed the -libevent one -- no difference after reboot.

I installed the -glib backend. It wouldn't let me remove the -libevent package, so I just moved the .so and symlink out of the way -- still stale file handles.

I assume I've switched the backends correctly -- is there a config option I've missed?

[I'm open to this not being a gssproxy bug, but it just seemed the likeliest place to start debugging this...]

--- Additional comment from Robbie Harwood on 2018-07-13 16:35:03 EDT ---

Thanks for checking.  Config file switching is probably something I should look into adding to libverto, but there isn't a better way right now than what you did.

I agree that gssproxy makes sense to start with (or krb5, or libverto, but those are also me) - just trying to eliminate causes as much as I can.  Next guess is to stare at the epoll calls, I think.

--- Additional comment from James Ettle on 2018-07-14 04:32:20 EDT ---

Hmm... curiously I can access the 'public' NFS4 shares when I log in as root. There are some weird credentials in the cache.


[root@buck ~]# klist 
klist: Credentials cache keyring 'persistent:0:0' not found 
[root@buck ~]# ls /nas/public 
<content correctly listed>
[root@buck ~]# klist 
Ticket cache: KEYRING:persistent:0:krb_ccache_2Jumg4T 
Default principal: host/buck.cb.ettle 
 
Valid starting     Expires            Service principal 
01/01/70 01:00:00  01/01/70 01:00:00  Encrypted/Credentials/v1@X-GSSPROXY: 
[root@buck ~]# exit 


Note the times - I really can't shake the suspicion that this is somehow related to the Pi having a zero clock at power-on until it gets a network time.

(Obviously I still get permission denied on home directories.)


THEN AGAIN (heh...) below I logged in remotely as me just after reboot. With the usual tickets I can see the public share as root, but not as the actual user associated with the principal.


[james@buck ~]$ su
Password:
[root@buck james]# klist
Ticket cache: KEYRING:persistent:1402400001:krb_ccache_RdKlQey
Default principal: james

Valid starting     Expires            Service principal
14/07/18 09:19:49  15/07/18 09:19:49  krbtgt/CB.ETTLE
[root@buck james]# ls /nas/public
<content correctly listed>
[root@buck james]# klist
Ticket cache: KEYRING:persistent:1402400001:krb_ccache_RdKlQey
Default principal: james

Valid starting     Expires            Service principal
14/07/18 09:19:49  15/07/18 09:19:49  krbtgt/CB.ETTLE
[root@buck james]# exit
[james@buck ~]$ klist
Ticket cache: KEYRING:persistent:1402400001:krb_ccache_RdKlQey
Default principal: james

Valid starting     Expires            Service principal
14/07/18 09:19:49  15/07/18 09:19:49  krbtgt/CB.ETTLE
[james@buck ~]$ ls /nas/public
ls: cannot access '/nas/public': Permission denied

--- Additional comment from Simo Sorce on 2018-07-16 09:27:48 EDT ---

(In reply to James Ettle from comment #12)
> Hmm... curiously I can access the 'public' NFS4 shares when I log in as
> root. There are some weird credentials in the cache.

This is because you are allowed to use the machine creds as root, I would guess.

> Valid starting     Expires            Service principal 
> 01/01/70 01:00:00  01/01/70 01:00:00  Encrypted/Credentials/v1@X-GSSPROXY: 
> [root@buck ~]# exit 
> 
> 
> Note the times - I really can't shake the suspicion that this is somehow
> related to the Pi having a zero clock at power-on until it gets a network
> time.

It is not; that's a special ticket we use in gssproxy, and its times are always set to the epoch 0 time.

> (Obviously I still get permission denied on home directories.)
> 
> 
> THEN AGAIN (heh...) below I logged in remotely as me just after reboot. With
> the usual tickets I can see the public share as root, but not the actual
> user associated with the principal.
> 
> 
> [james@buck ~]$ su
> Password:
> [root@buck james]# klist
> Ticket cache: KEYRING:persistent:1402400001:krb_ccache_RdKlQey
> Default principal: james
> 
> Valid starting     Expires            Service principal
> 14/07/18 09:19:49  15/07/18 09:19:49  krbtgt/CB.ETTLE
> [root@buck james]# ls /nas/public
> <content correctly listed>
> [root@buck james]# klist
> Ticket cache: KEYRING:persistent:1402400001:krb_ccache_RdKlQey
> Default principal: james

Use klist -A in this case, you'll see more ...

> Valid starting     Expires            Service principal
> 14/07/18 09:19:49  15/07/18 09:19:49  krbtgt/CB.ETTLE
> [root@buck james]# exit
> [james@buck ~]$ klist
> Ticket cache: KEYRING:persistent:1402400001:krb_ccache_RdKlQey
> Default principal: james
> 
> Valid starting     Expires            Service principal
> 14/07/18 09:19:49  15/07/18 09:19:49  krbtgt/CB.ETTLE
> [james@buck ~]$ ls /nas/public
> ls: cannot access '/nas/public': Permission denied

Something is failing to get you a ticket. rpc.gssd should fall back and try to use your ticket if gssproxy fails. Could you try setting GSSPROXY_BEHAVIOR=LOCAL_ONLY as an env var for rpc.gssd and see if that still fails?
Also try to make rpc.gssd stop using GSS-Proxy completely by removing the GSS_USE_PROXY env var.
These experiments will help us narrow down where the bug is.

--- Additional comment from James Ettle on 2018-07-16 17:32:14 EDT ---

I tried klist -A, couldn't see any additional info.

Also had a go at disabling gssproxy, then the rpc.gssd env var. Same as before in either case.

One difference I did notice is that *without* GSSPROXY_BEHAVIOR=LOCAL_ONLY I see 'Unspecified GSS failure'/'No credentials cache found' errors logged by gssproxy whenever I try to access a share.

I just tried using the FILE: credential cache instead. rpc.gssd -f -vvvv complains that the cache file 'is expired or corrupt' but the klist results look OK. The cache also works fine for ssh.

--- Additional comment from James Ettle on 2018-07-16 18:14:37 EDT ---

Found a solution. See here:

https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1779962

Seems rpc.gssd is using the wrong syscall and mangling large uids in the process. I applied the changes as described in the link above to nfs-utils-2.3.2-1.rc2.fc28 and it works.

Why this only shows up for me on ARM must be some quirk of the ABI...
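
The truncation can be checked against the logs earlier in this report. The sketch below is an illustration only, not code from nfs-utils: the helper name truncate_uid16 is hypothetical, and it assumes the legacy setresuid syscall on 32-bit ARM keeps only the low 16 bits of the UID. The user's UID appears as 1402400001 in the klist output (KCM:1402400001:...), while gssproxy logged the rpc.gssd client as uid = 60673.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of nfs-utils): models what survives
 * when a 32-bit UID is passed through an interface that only carries
 * 16 bits, which is the assumed behaviour of the legacy syscall here.
 */
static uint32_t truncate_uid16(uint32_t uid)
{
    return uid & 0xFFFFu; /* keep the low 16 bits only */
}
```

Under that assumption, truncate_uid16(1402400001) gives 60673, which matches the euid in the gssproxy debug log, and small UIDs (below 65536) pass through unchanged. That would also be consistent with the problem not showing up on x86-64, where the plain syscall is expected to take full 32-bit IDs already.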

--- Additional comment from Simo Sorce on 2018-07-17 05:48:08 EDT ---

Wow, great catch James.

Steve, this is a bad bad bug, we want to fix this ASAP on all platforms.

--- Additional comment from James Ettle on 2018-07-17 06:05:11 EDT ---

(In reply to Simo Sorce from comment #16)
> Wow, great catch James.
> 
> Steve, this is a bad bad bug, we want to fix this ASAP on all platforms.

With all due credit to 'sree314' on Launchpad. Hopefully this can get upstream and backported to the 1.3 branch (if still maintained) since I've found this is a cross-distro issue.

Just demonstrates the value of cross-platform testing...

--- Additional comment from Steve Dickson on 2018-07-17 15:29 EDT ---

Here is the patch I'm about to propose upstream.

I've also added to the original bug report. 
https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1779962

James, if possible, could you please test this 
patch before I post it.

--- Additional comment from James Ettle on 2018-07-17 17:48:42 EDT ---

Fix confirmed with the patch in attachment 1459549.

--- Additional comment from Steve Dickson on 2018-07-18 10:19:30 EDT ---

(In reply to James Ettle from comment #19)
> Fix confirmed with the patch in attachment 1459549.

Thank you!

Comment 2 Steve Dickson 2018-07-18 15:41:11 UTC
The upstream commit
commit 2a6b8307fa4243a7921270aedf8ce6506e31569a (HEAD -> master, origin/master, origin/HEAD)
Author: Steve Dickson <steved>
Date:   Tue Jul 17 15:09:37 2018 -0400

    rpc.gssd: truncates 32-bit UIDs/GIDs to 16 bits architectures.
    
    utils/gssd_proc.c uses SYS_setresuid and SYS_setresgid in
    change_identity when it should use SYS_setresuid32 and
    SYS_setresgid32 instead. This causes it to truncate
    UIDs/GIDs > 65536.
    
    Fixes: https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1779962
    Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1595927
    
    Tested-by: James Ettle <theholyettlz>
    Tested-by: Sree <Sree>
    Signed-off-by: Steve Dickson <steved>
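
A minimal sketch of the idea in the commit message above, not the actual patch: pick the 32-bit syscall variants where the architecture defines them and fall back to the plain ones elsewhere. The macro names GSSD_SETRESUID/GSSD_SETRESGID and the function set_thread_ids are hypothetical; only the SYS_setresuid, SYS_setresgid, SYS_setresuid32 and SYS_setresgid32 names come from the commit message.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Architectures with legacy 16-bit UID syscalls (e.g. 32-bit ARM)
 * also define *32 variants that take full 32-bit IDs. Where those
 * do not exist (e.g. x86-64), the plain syscalls are used instead.
 */
#ifdef SYS_setresuid32
#define GSSD_SETRESUID SYS_setresuid32
#define GSSD_SETRESGID SYS_setresgid32
#else
#define GSSD_SETRESUID SYS_setresuid
#define GSSD_SETRESGID SYS_setresgid
#endif

/*
 * Hypothetical stand-in for the identity switch: set the real,
 * effective and saved GID, then UID, via direct syscalls.
 * Returns 0 on success, -1 on failure (errno set by syscall()).
 */
static int set_thread_ids(uid_t uid, gid_t gid)
{
    /* Change the group first; after dropping the UID it may no longer be permitted. */
    if (syscall(GSSD_SETRESGID, gid, gid, gid) != 0)
        return -1;
    if (syscall(GSSD_SETRESUID, uid, uid, uid) != 0)
        return -1;
    return 0;
}
```

Passing (uid_t)-1 and (gid_t)-1 leaves the IDs unchanged, so set_thread_ids((uid_t)-1, (gid_t)-1) can be used as a harmless check that the selected syscall numbers are accepted on a given machine.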

Comment 6 Yongcheng Yang 2018-09-29 04:15:00 UTC
Checked that the patch from comment #2 has already been merged, and no regressions were found in recent tests.

Moving to VERIFIED now.

Comment 8 errata-xmlrpc 2018-10-30 11:48:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3311

