Bug 1488629

Summary: gssproxy interposer code broken in RHEL 7.4 / gssproxy-0.7.0?
Product: Red Hat Enterprise Linux 7 Reporter: James Ralston <ralston>
Component: gssproxyAssignee: Robbie Harwood <rharwood>
Status: CLOSED ERRATA QA Contact: ipa-qe <ipa-qe>
Severity: unspecified Docs Contact: Aneta Šteflová Petrová <apetrova>
Priority: unspecified    
Version: 7.4CC: adam.winberg, amore, apetrova, atolani, dwysocha, fsorenso, fs-qe, hkhot, jlyle, jpazdziora, jvilicic, kfiresmith, ksiddiqu, mkosek, modolo, nsoman, pasik, ralston, R.Eggermont, rharwood, riehecky, rkothiya, rraghuwa, yoyang
Target Milestone: rcKeywords: Regression
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
URL: https://pagure.io/gssproxy/pull-request/214
Whiteboard:
Fixed In Version: gssproxy-0.7.0-13.el7 Doc Type: Bug Fix
Doc Text:
_gssproxy_ fixed to properly update ccaches Previously, the _gssproxy_ package did not correctly handle the key version number (kvno) incrementation in Kerberos credential caches (ccaches). As a consequence, stale ccaches were not properly overwritten. This update fixes these problems in _gssproxy_ ccache caching. As a result, ccaches are now properly updated, and the caching prevents excessive requests for updates.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-10 11:09:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1420851    
Attachments:
Description Flags
ssl_error_log
none
pulp_gssapi.conf
none
Handle outdated encrypted ccaches none

Description James Ralston 2017-09-05 22:05:49 UTC
Description of problem:

On our RHEL7 systems, we configured multiple services (httpd, cyrus-imapd) to use the gssproxy GSSAPI interposer plugin, via the instructions here:

https://pagure.io/gssproxy/blob/master/f/proxy/docs/Apache.md

This has worked well for us.

However, when RHEL 7.4 was released, and gssproxy moved from version 0.4.1 to 0.7.0, the GSSAPI interposer plugin ceased to work for the services for which we enabled it (httpd, cyrus-imapd).

Version-Release number of selected component (if applicable):

gssproxy-0.4.1-13.el7

How reproducible:

Follow the instructions here to configure an Apache httpd server that uses the GSSAPI interposer plugin:

https://pagure.io/gssproxy/blob/master/f/proxy/docs/Apache.md

Actual results:

On RHEL7, httpd will be unable to use the interposer plugin, and gssproxy will log debugging messages similar to these:

2017-09-01T14:11:26.793965-04:00 webserver gssproxy: [2017/09/01 18:11:26]: Client connected (fd = 14)[2017/09/01 18:11:26]:  (pid = 2241) (uid = 48) (gid = 48)[2017/09/01 18:11:26]:  (context = system_u:system_r:httpd_t:s0)[2017/09/01 18:11:26]:
2017-09-01T14:11:26.794028-04:00 webserver gssproxy: [2017/09/01 18:11:26]: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "HTTP", euid: 48,socket: (null)
2017-09-01T14:11:26.794055-04:00 webserver gssproxy: GSSX_ARG_ACQUIRE_CRED( call_ctx: { "" [  ] } input_cred_handle: { "host/webserver.example.org" [ { "host/webserver.example.org" { 1 2 840 113554 1 2 2 } ACCEPT 0 4294967295 } ] [ ..3.....B.A........ ] 0 } add_cred: 0 desired_name: <Null> time_req: 4294967295 desired_mechs: { { 1 2 840 113554 1 2 2 } } cred_usage: ACCEPT initiator_time_req: 0 acceptor_time_req: 0 )
2017-09-01T14:11:26.794090-04:00 webserver gssproxy: GSSX_RES_ACQUIRE_CRED( status: { 851968 <None> 2 "Unspecified GSS failure.  Minor code may provide more information" "" [  ] } output_cred_handle: <Null> )

Expected results:

The interposer plugin should work correctly, as it did in RHEL 7.3.

Additional info:

Downgrading to gssproxy-0.4.1-13.el7, restarting gssproxy, and restarting any applications that use the interposer plugin will resolve the issue.

Comment 2 Robbie Harwood 2017-09-06 18:40:26 UTC
Please provide configs and apache logs.

Comment 3 Yongcheng Yang 2017-09-11 08:18:18 UTC
(recovery flag needinfo)

Please see comment #2. Sorry for confusion.

Comment 4 Kodiak Firesmith 2017-09-12 11:47:42 UTC
Created attachment 1324836 [details]
ssl_error_log

Comment 5 Kodiak Firesmith 2017-09-12 11:48:13 UTC
Created attachment 1324837 [details]
pulp_gssapi.conf

Comment 6 Robbie Harwood 2017-09-12 21:40:18 UTC
Thank you.

As a workaround, they can create the file /etc/gssproxy/80-httpd.conf that looks like this:

[service/HTTP]
  mechs = krb5
  cred_store = keytab:/etc/gssproxy/http.keytab
  cred_store = ccache:/var/lib/gssproxy/clients/krb5cc_%U
  euid = apache

Comment 8 James Ralston 2017-09-13 16:38:10 UTC
I don't understand what this workaround is supposed to accomplish. What is the purpose of pointing gssproxy at a non-existent keytab file (/etc/gssproxy/http.keytab)? That will always fail, for ANY service:

Sep 13 12:14:47 server gssproxy: gssproxy[14155]: (OID: { 1 2 840 113554 1 2 2 }) Unspecified GSS failure.  Minor code may provide more information, Key table file '/etc/gssproxy/http.keytab' not found

Due to bug 1458913, we have commented out the contents of all /etc/gssproxy/[0-9][0-9]-*.conf files. We use a single monolithic /etc/gssproxy/gssproxy.conf file, which contains exactly:

[gssproxy]

[service/HTTP]
  mechs = krb5
  cred_store = keytab:/etc/krb5.keytab
  cred_store = ccache:/var/lib/gssproxy/clients/krb5cc_%U
  euid = 48
  krb5_principal = HTTP

[service/ldap]
  mechs = krb5
  cred_store = keytab:/etc/krb5.keytab
  cred_store = ccache:/var/lib/gssproxy/clients/krb5cc_%U
  euid = 55
  krb5_principal = ldap

[service/imap]
  mechs = krb5
  cred_store = keytab:/etc/krb5.keytab
  cred_store = ccache:/var/lib/gssproxy/clients/krb5cc_%U
  euid = 76
  krb5_principal = imap

[service/nfs-server]
  mechs = krb5
  socket = /run/gssproxy.sock
  cred_store = keytab:/etc/krb5.keytab
  trusted = yes
  kernel_nfsd = yes
  euid = 0

[service/nfs-client]
  mechs = krb5
  cred_store = keytab:/etc/krb5.keytab
  cred_store = ccache:FILE:/var/lib/gssproxy/clients/krb5cc_%U
  cred_store = client_keytab:/var/lib/gssproxy/clients/%U.keytab
  cred_usage = initiate
  allow_any_uid = yes
  trusted = yes
  euid = 0

This file works perfectly for gssproxy-0.4.1-13.el7, for all services. After upgrading to gssproxy-0.7.0-4.el7, the nfs-server and nfs-client services continue to work, but the HTTP and imap services do not.

For example, if I test cyrus-imapd with 0.7.0, attempting to use GSSAPI authentication fails with:

S: A01 NO generic failure

…and this is the only thing that is logged:

Sep 13 12:26:55 server imap[21382]: badlogin: client.example.org [1.2.3.4] GSSAPI [SASL(-13): authentication failure: GSSAPI Failure: gss_accept_sec_context]

There are two possibilities (not necessarily mutually exclusive):

1. Our config file (above) is compatible with gssproxy 0.4.1 but not 0.7.0.

2. The interposer code in gssproxy 0.7.0 is broken.

If #1 is the issue, I don't see how.  While 0.7.0 adds more options to gssproxy configuration files, I don't see how what we have is incorrect. Am I missing something?

Comment 9 Robbie Harwood 2017-09-13 17:40:38 UTC
(In reply to James Ralston from comment #8)
> I don't understand what this workaround is supposed to accomplish. What is
> the purpose of pointing gssproxy at a non-existent keytab file
> (/etc/gssproxy/http.keytab)? That will always fail, for ANY service:

I mean, adapt as appropriate to your keytab location.  That should go without saying, but I did say "that looks like" there.  You'll note that this is exactly what is specified in the instructions you linked, as well.

> Due to bug 1458913, we have commented out the contents of all
> /etc/gssproxy/[0-9][0-9]-*.conf files. We use a single monolithic
> /etc/gssproxy/gssproxy.conf file, which contains exactly:

I don't know how that follows from that bug (why not just remove the duplicate sections...), but sure, that's fine.

In the future, when filing bugs against a component, it should go without saying that you provide any nonstandard configurations, especially when asked for them.

klist -k the keytab, please.

Comment 10 Robbie Harwood 2017-09-13 17:40:54 UTC
er, klist -ek, sorry.

Comment 11 James Ralston 2017-09-13 17:44:54 UTC
Here's the (sanitized) /etc/krb5.keytab file:

   2 host/server.example.org (des-cbc-crc) 
   2 host/server.example.org (des-cbc-md5) 
   2 host/server.example.org (aes128-cts-hmac-sha1-96) 
   2 host/server.example.org (aes256-cts-hmac-sha1-96) 
   2 host/server.example.org (arcfour-hmac) 
   2 host/SERVER (des-cbc-crc) 
   2 host/SERVER (des-cbc-md5) 
   2 host/SERVER (aes128-cts-hmac-sha1-96) 
   2 host/SERVER (aes256-cts-hmac-sha1-96) 
   2 host/SERVER (arcfour-hmac) 
   2 SERVER$@EXAMPLE.ORG (des-cbc-crc) 
   2 SERVER$@EXAMPLE.ORG (des-cbc-md5) 
   2 SERVER$@EXAMPLE.ORG (aes128-cts-hmac-sha1-96) 
   2 SERVER$@EXAMPLE.ORG (aes256-cts-hmac-sha1-96) 
   2 SERVER$@EXAMPLE.ORG (arcfour-hmac) 
   2 nfs/server.example.org (des-cbc-crc) 
   2 nfs/server.example.org (des-cbc-md5) 
   2 nfs/server.example.org (aes128-cts-hmac-sha1-96) 
   2 nfs/server.example.org (aes256-cts-hmac-sha1-96) 
   2 nfs/server.example.org (arcfour-hmac) 
   2 nfs/SERVER (des-cbc-crc) 
   2 nfs/SERVER (des-cbc-md5) 
   2 nfs/SERVER (aes128-cts-hmac-sha1-96) 
   2 nfs/SERVER (aes256-cts-hmac-sha1-96) 
   2 nfs/SERVER (arcfour-hmac) 
   2 imap/server.example.org (des-cbc-crc) 
   2 imap/server.example.org (des-cbc-md5) 
   2 imap/server.example.org (aes128-cts-hmac-sha1-96) 
   2 imap/server.example.org (aes256-cts-hmac-sha1-96) 
   2 imap/server.example.org (arcfour-hmac) 
   2 imap/SERVER (des-cbc-crc) 
   2 imap/SERVER (des-cbc-md5) 
   2 imap/SERVER (aes128-cts-hmac-sha1-96) 
   2 imap/SERVER (aes256-cts-hmac-sha1-96) 
   2 imap/SERVER (arcfour-hmac) 
   2 sieve/server.example.org (des-cbc-crc) 
   2 sieve/server.example.org (des-cbc-md5) 
   2 sieve/server.example.org (aes128-cts-hmac-sha1-96) 
   2 sieve/server.example.org (aes256-cts-hmac-sha1-96) 
   2 sieve/server.example.org (arcfour-hmac) 
   2 sieve/SERVER (des-cbc-crc) 
   2 sieve/SERVER (des-cbc-md5) 
   2 sieve/SERVER (aes128-cts-hmac-sha1-96) 
   2 sieve/SERVER (aes256-cts-hmac-sha1-96) 
   2 sieve/SERVER (arcfour-hmac)

Comment 12 James Ralston 2017-09-13 17:50:49 UTC
The output in comment 11 is from a server running cyrus-imapd. Here's one from a server running httpd:

   2 host/server.example.org (des-cbc-crc) 
   2 host/server.example.org (des-cbc-md5) 
   2 host/server.example.org (aes128-cts-hmac-sha1-96) 
   2 host/server.example.org (aes256-cts-hmac-sha1-96) 
   2 host/server.example.org (arcfour-hmac) 
   2 host/SERVER (des-cbc-crc) 
   2 host/SERVER (des-cbc-md5) 
   2 host/SERVER (aes128-cts-hmac-sha1-96) 
   2 host/SERVER (aes256-cts-hmac-sha1-96) 
   2 host/SERVER (arcfour-hmac) 
   2 SERVER$@EXAMPLE.ORG (des-cbc-crc) 
   2 SERVER$@EXAMPLE.ORG (des-cbc-md5) 
   2 SERVER$@EXAMPLE.ORG (aes128-cts-hmac-sha1-96) 
   2 SERVER$@EXAMPLE.ORG (aes256-cts-hmac-sha1-96) 
   2 SERVER$@EXAMPLE.ORG (arcfour-hmac) 
   2 nfs/server.example.org (des-cbc-crc) 
   2 nfs/server.example.org (des-cbc-md5) 
   2 nfs/server.example.org (aes128-cts-hmac-sha1-96) 
   2 nfs/server.example.org (aes256-cts-hmac-sha1-96) 
   2 nfs/server.example.org (arcfour-hmac) 
   2 nfs/SERVER (des-cbc-crc) 
   2 nfs/SERVER (des-cbc-md5) 
   2 nfs/SERVER (aes128-cts-hmac-sha1-96) 
   2 nfs/SERVER (aes256-cts-hmac-sha1-96) 
   2 nfs/SERVER (arcfour-hmac) 
   2 HTTP/server.example.org (des-cbc-crc) 
   2 HTTP/server.example.org (des-cbc-md5) 
   2 HTTP/server.example.org (aes128-cts-hmac-sha1-96) 
   2 HTTP/server.example.org (aes256-cts-hmac-sha1-96) 
   2 HTTP/server.example.org (arcfour-hmac) 
   2 HTTP/SERVER (des-cbc-crc) 
   2 HTTP/SERVER (des-cbc-md5) 
   2 HTTP/SERVER (aes128-cts-hmac-sha1-96) 
   2 HTTP/SERVER (aes256-cts-hmac-sha1-96) 
   2 HTTP/SERVER (arcfour-hmac)

Comment 13 Frank Sorenson 2017-09-15 16:06:08 UTC
Attaching a customer case which appears to be hitting this as well, after upgrading nfs clients from 7.3 to 7.4:

[root@system700 apache]# pwd
/data/apache

[root@system700 apache]# cd docroot
-bash: cd: docroot: Permission denied


nfs server is RHEL 7.4 NFSv4+KRB5


1. Clients running nfs-utils-1.3.0-0.33.el7_3.x86_64 + gssproxy-0.4.1-13.el7.x86_64 have no issues connecting to the NFS server.
- local users with credentials are able to access the NFS share
- local users without credentials, but gssproxy tokens, are able to access the NFS share
- local services (eg: httpd) are able to access the NFS share

Sep 15 08:18:06 system810 systemd: Started RPC security service for NFS client and server.
Sep 15 08:18:24 system810 rpc.gssd[13498]: #012handle_gssd_upcall: 'mech=krb5 uid=0 service=* enctypes=18,17,16,23,3,1,2 ' (nfs/clnt18)
Sep 15 08:18:24 system810 rpc.gssd[13498]: krb5_use_machine_creds: uid 0 tgtname (null)
Sep 15 08:18:24 system810 rpc.gssd[13498]: Full hostname for 'system530-pre.domain.corp' is 'system530-pre.domain.corp'
Sep 15 08:18:24 system810 rpc.gssd[13498]: Full hostname for 'system810.domain.corp' is 'system810.domain.corp'
Sep 15 08:18:24 system810 rpc.gssd[13498]: No key table entry found for system810$@DOMAIN.CORP while getting keytab entry for 'system810$@DOMAIN.CORP'
Sep 15 08:18:24 system810 rpc.gssd[13498]: Success getting keytab entry for 'SYSTEM810$@DOMAIN.CORP'
Sep 15 08:18:24 system810 rpc.gssd[13498]: gssd_get_single_krb5_cred: principal 'SYSTEM810$@DOMAIN.CORP' ccache:'FILE:/tmp/krb5ccmachine_DOMAIN.CORP'
Sep 15 08:18:24 system810 rpc.gssd[13498]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_DOMAIN.CORP' are good until 1505517503
Sep 15 08:18:24 system810 gssproxy: Client connected (fd = 12) (pid = 13498) (uid = 0) (gid = 0) (context = system_u:system_r:gssd_t:s0)
Sep 15 08:18:24 system810 gssproxy: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "nfs-client", euid: 0, socket: (null)
Sep 15 08:18:24 system810 rpc.gssd[13498]: creating tcp client for server system530-pre.domain.corp
Sep 15 08:18:24 system810 rpc.gssd[13498]: creating context with server nfs.corp
Sep 15 08:18:24 system810 gssproxy: gp_rpc_execute: executing 8 (GSSX_INIT_SEC_CONTEXT) for service "nfs-client", euid: 0, socket: (null)
Sep 15 08:18:24 system810 gssproxy: gp_rpc_execute: executing 8 (GSSX_INIT_SEC_CONTEXT) for service "nfs-client", euid: 0, socket: (null)
Sep 15 08:18:24 system810 rpc.gssd[13498]: doing downcall: lifetime_rec=35582 acceptor=nfs.corp
Sep 15 08:18:24 system810 rpc.gssd[13498]: #012handle_gssd_upcall: 'mech=krb5 uid=0 enctypes=18,17,16,23,3,1,2 ' (nfs/clnt18)
Sep 15 08:18:24 system810 rpc.gssd[13498]: krb5_use_machine_creds: uid 0 tgtname (null)
Sep 15 08:18:24 system810 rpc.gssd[13498]: Full hostname for 'system530-pre.domain.corp' is 'system530-pre.domain.corp'
Sep 15 08:18:24 system810 rpc.gssd[13498]: Full hostname for 'system810.domain.corp' is 'system810.domain.corp'
Sep 15 08:18:24 system810 rpc.gssd[13498]: No key table entry found for system810$@DOMAIN.CORP while getting keytab entry for 'system810$@DOMAIN.CORP'
Sep 15 08:18:24 system810 rpc.gssd[13498]: Success getting keytab entry for 'SYSTEM810$@DOMAIN.CORP'
Sep 15 08:18:24 system810 rpc.gssd[13498]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_DOMAIN.CORP' are good until 1505517503
Sep 15 08:18:24 system810 rpc.gssd[13498]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_DOMAIN.CORP' are good until 1505517503
Sep 15 08:18:24 system810 gssproxy: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "nfs-client", euid: 0, socket: (null)
Sep 15 08:18:24 system810 rpc.gssd[13498]: creating tcp client for server system530-pre.domain.corp
Sep 15 08:18:24 system810 rpc.gssd[13498]: creating context with server nfs.corp
Sep 15 08:18:24 system810 gssproxy: gp_rpc_execute: executing 8 (GSSX_INIT_SEC_CONTEXT) for service "nfs-client", euid: 0, socket: (null)
Sep 15 08:18:24 system810 gssproxy: gp_rpc_execute: executing 8 (GSSX_INIT_SEC_CONTEXT) for service "nfs-client", euid: 0, socket: (null)
Sep 15 08:18:24 system810 rpc.gssd[13498]: doing downcall: lifetime_rec=35582 acceptor=nfs.corp
Sep 15 08:18:25 system810 rpc.gssd[13498]: #012handle_gssd_upcall: 'mech=krb5 uid=48 enctypes=18,17,16,23,3,1,2 ' (nfs/clnt18)
Sep 15 08:18:25 system810 gssproxy: Client connected (fd = 12) (pid = 13498) (uid = 48) (gid = 48) (context = system_u:system_r:gssd_t:s0)
Sep 15 08:18:25 system810 gssproxy: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "HTTP", euid: 48, socket: (null)
Sep 15 08:18:25 system810 rpc.gssd[13498]: krb5_not_machine_creds: uid 48 tgtname (null)
Sep 15 08:18:25 system810 rpc.gssd[13498]: creating tcp client for server system530-pre.domain.corp
Sep 15 08:18:25 system810 rpc.gssd[13498]: creating context with server nfs.corp
Sep 15 08:18:25 system810 gssproxy: gp_rpc_execute: executing 8 (GSSX_INIT_SEC_CONTEXT) for service "HTTP", euid: 48, socket: (null)
Sep 15 08:18:25 system810 gssproxy: gp_rpc_execute: executing 8 (GSSX_INIT_SEC_CONTEXT) for service "HTTP", euid: 48, socket: (null)
Sep 15 08:18:25 system810 rpc.gssd[13498]: doing downcall: lifetime_rec=35583 acceptor=nfs.corp


2. Clients running nfs-utils-1.3.0-0.48.el7.x86_64 + gssproxy-0.7.0-4.el7.x86_64 have issues with services connecting to the NFS server.
- local users with credentials are able to access the NFS share
- local users without credentials, but gssproxy tokens, are unable to access the NFS share
- local services (eg: httpd) are unable to access the NFS share

Sep 15 08:13:33 system700 systemd: Started RPC security service for NFS client and server.
Sep 15 08:14:11 system700 rpc.gssd[23981]: #012handle_gssd_upcall: 'mech=krb5 uid=48 enctypes=18,17,16,23,3,1,2 ' (nfs/clnt70e)
Sep 15 08:14:11 system700 rpc.gssd[23981]: krb5_not_machine_creds: uid 48 tgtname (null)
Sep 15 08:14:11 system700 gssproxy: [2017/09/15 13:14:11]: Client connected (fd = 11)[2017/09/15 13:14:11]:  (pid = 23981) (uid = 48) (gid = 48)[2017/09/15 13:14:11]:  (context = system_u:system_r:gssd_t:s0)[2017/09/15 13:14:11]:
Sep 15 08:14:11 system700 gssproxy: [2017/09/15 13:14:11]: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "nfs-client", euid: 48,socket: (null)
Sep 15 08:14:11 system700 gssproxy: GSSX_ARG_ACQUIRE_CRED( call_ctx: { "" [  ] } input_cred_handle: { { "ss2user" { 1 2 840 113554 1 2 2 1 } [ ... ] [ ...  ] [ ] } [ { { "ss2user" { 1 2 840 113554 1 2 2 1 } [ ... ] [ ... ] [ ] } { 1 2 840 113554 1 2 2 } INITIATE 35999 0 } ] [ ... ] 0 } add_cred: 0 desired_name: <Null> time_req: 4294967295 desired_mechs: { { 1 2 840 113554 1 2 2 } } cred_usage: INITIATE initiator_time_req: 0 acceptor_time_req: 0 )
Sep 15 08:14:11 system700 gssproxy: GSSX_RES_ACQUIRE_CRED( status: { 851968 <None> 2 "Unspecified GSS failure.  Minor code may provide more information" "" [  ] } output_cred_handle: <Null> )
Sep 15 08:14:11 system700 rpc.gssd[23981]: creating tcp client for server system530-qa.domain.corp
Sep 15 08:14:11 system700 rpc.gssd[23981]: creating context with server nfs.corp
Sep 15 08:14:11 system700 rpc.gssd[23981]: in authgss_create_default()
Sep 15 08:14:11 system700 rpc.gssd[23981]: in authgss_create()
Sep 15 08:14:11 system700 rpc.gssd[23981]: in authgss_refresh()
Sep 15 08:14:11 system700 rpc.gssd[23981]: authgss_refresh: gss_init_sec_context: Unspecified GSS failure.  Minor code may provide more information - Credential cache is empty
Sep 15 08:14:11 system700 rpc.gssd[23981]: in authgss_destroy()
Sep 15 08:14:11 system700 rpc.gssd[23981]: in authgss_destroy_context()
Sep 15 08:14:11 system700 rpc.gssd[23981]: WARNING: Failed to create krb5 context for user with uid 48 for server nfs.corp
Sep 15 08:14:11 system700 rpc.gssd[23981]: looking for client creds with uid 48 for server system530-qa.domain.corp in /tmp
Sep 15 08:14:11 system700 rpc.gssd[23981]: looking for client creds with uid 48 for server system530-qa.domain.corp in /run/user/%U
Sep 15 08:14:11 system700 rpc.gssd[23981]: Error doing scandir on directory '/run/user/48': No such file or directory
Sep 15 08:14:11 system700 rpc.gssd[23981]: doing error downcall
Sep 15 08:14:11 system700 rpc.gssd[23981]: ERROR: event_dispatch() returned!
Sep 15 08:14:11 system700 systemd: rpc-gssd.service: main process exited, code=exited, status=1/FAILURE
Sep 15 08:14:11 system700 systemd: Unit rpc-gssd.service entered failed state.
Sep 15 08:14:11 system700 systemd: rpc-gssd.service failed.

Packet captures can be provided as well, if they will be helpful.

Comment 14 Robbie Harwood 2017-09-15 21:26:35 UTC
Okay.  After some debugging, I think what may have happened is that some processes have stale keys around.

Normally this wouldn't be a problem, but with gssproxy, these keys are encrypted.  The result is that when you change the keytab in gssproxy, they all go invalid.

Rebooting the machine should fix it.  Clearing out the appropriate ccache should also fix it.  (For instance, apache/mod_auth_gssapi by default uses the kernel keyring; if you log in, you'll see something like:

-bash-4.2$ klist
Ticket cache: KEYRING:persistent:48:48
Default principal: HTTP/7-2

Valid starting       Expires              Service principal
12/31/1969 19:00:00  12/31/1969 19:00:00  Encrypted/Credentials/v1@X-GSSPROXY:

and a kdestroy should take care of that.)

To actually fix this, we'll probably need to modify the lifetime of this ccache, and probably also more gracefully handle the failure here.

Thanks for your patience.

Comment 16 James Ralston 2017-09-18 16:49:04 UTC
When we apply updates to a host, we always reboot afterward. So the 0.7.0 gssproxy was starting on a cleanly-booted system. And it still failed.

Furthermore, if I downgrade to 0.4.1, and restart the service in question but don't reboot, 0.4.1 works. I can upgrade/downgrade back and forth between 0.7.0 and 0.4.1, without rebooting, and I consistently see that 0.4.1 works, but 0.7.0 fails.

I'll build a local gssproxy with the patch from pull request 214, but I'm skeptical that it will resolve the problem.

Comment 17 James Ralston 2017-09-18 22:17:56 UTC
Created attachment 1327638 [details]
Handle outdated encrypted ccaches

I rebased the current state of PR#214 for 0.7.0-4.el7 (attached), and rolled a local gssproxy package with this rebased patch.

I can confirm that the patch DOES resolve the issue, for both the cyrus-imapd and httpd services.

I can't confirm that this also resolves the issue Frank's customer is seeing, because we didn't see those issues.

Comment 18 James Ralston 2017-10-02 19:11:13 UTC
Ah; I see that https://pagure.io/gssproxy/pull-request/214 was merged. I'll rebuild gssproxy 0.7.0 locally with that patch and kick the tires again.

Comment 19 James Ralston 2017-10-03 19:28:42 UTC
OK, I've rebuilt local gssproxy packages with the commit from https://pagure.io/gssproxy/pull-request/214 included as a patch.

But the results we're seeing are troubling. Specifically, what we're seeing with pull request 214 is that almost *every* call to gssproxy originating from httpd hits the GSS_C_NO_CREDENTIAL code path:

if (ret) {
    /* Allow for re-issuance of the keytab. */
    GPDEBUG("Stored ccache failed to decrypt; treating as empty\n");
    *out = GSS_C_NO_CREDENTIAL;
    goto done;
}

Here's the result from a clean start of gssproxy, using a very simple test web page that uses mod_auth_gssapi, and with default_ccache_name = KEYRING:persistent:%{uid} in /etc/krb5.conf:

[2017/10/03 18:26:12]: Debug Enabled (level: 0)
[2017/10/03 18:26:12]: Problem with kernel communication!  NFS server will not work
[2017/10/03 18:26:12]: Client connected (fd = 10)[2017/10/03 18:26:12]:  (pid = 14884) (uid = 0) (gid = 0)[2017/10/03 18:26:12]:  (context = system_u:system_r:kernel_t:s0)[2017/10/03 18:26:12]:
[2017/10/03 18:26:12]: Client connected (fd = 11)[2017/10/03 18:26:12]:  (pid = 14892) (uid = 0) (gid = 0)[2017/10/03 18:26:12]:  (context = system_u:system_r:gssd_t:s0)[2017/10/03 18:26:12]:
[2017/10/03 18:26:12]: gp_rpc_execute: executing 1 (GSSX_INDICATE_MECHS) for service "nfs-client", euid: 0,socket: (null)
[2017/10/03 18:26:13]: Client connected (fd = 11)[2017/10/03 18:26:13]:  (pid = 14899) (uid = 0) (gid = 0)[2017/10/03 18:26:13]:  (context = system_u:system_r:httpd_t:s0)[2017/10/03 18:26:13]:
[2017/10/03 18:26:13]: gp_rpc_execute: executing 1 (GSSX_INDICATE_MECHS) for service "nfs-client", euid: 0,socket: (null)
[2017/10/03 18:26:18]: Client connected (fd = 12)[2017/10/03 18:26:18]:  (pid = 14907) (uid = 48) (gid = 48)[2017/10/03 18:26:18]:  (context = system_u:system_r:httpd_t:s0)[2017/10/03 18:26:18]:
[2017/10/03 18:26:18]: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "HTTP", euid: 48,socket: (null)
[2017/10/03 18:26:18]: Stored ccache failed to decrypt; treating as empty
[2017/10/03 18:26:18]: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "HTTP", euid: 48,socket: (null)
[2017/10/03 18:26:18]: Stored ccache failed to decrypt; treating as empty
[2017/10/03 18:26:18]: gp_rpc_execute: executing 9 (GSSX_ACCEPT_SEC_CONTEXT) for service "HTTP", euid: 48,socket: (null)
[2017/10/03 18:26:18]: Stored ccache failed to decrypt; treating as empty

I can see the cached, encrypted key:

$ sudo -u apache klist
Ticket cache: KEYRING:persistent:48:48
Default principal: host/client.example.org

Valid starting       Expires              Service principal
1969-12-31 19:00:00  1969-12-31 19:00:00  Encrypted/Credentials/v1@X-GSSPROXY:

And I can remove it:

$ sudo -u apache kdestroy
$ sudo -u apache klist
klist: Credentials cache keyring 'persistent:48:48' not found

But then the very next hit on the test web page generates this:

[2017/10/03 18:30:55]: Client connected (fd = 13)[2017/10/03 18:30:55]:  (pid = 14903) (uid = 48) (gid = 48)[2017/10/03 18:30:55]:  (context = system_u:system_r:httpd_t:s0)[2017/10/03 18:30:55]:
[2017/10/03 18:30:55]: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "HTTP", euid: 48,socket: (null)
[2017/10/03 18:30:55]: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "HTTP", euid: 48,socket: (null)
[2017/10/03 18:30:55]: Stored ccache failed to decrypt; treating as empty
[2017/10/03 18:30:55]: gp_rpc_execute: executing 9 (GSSX_ACCEPT_SEC_CONTEXT) for service "HTTP", euid: 48,socket: (null)
[2017/10/03 18:30:55]: Stored ccache failed to decrypt; treating as empty

The very first GSSX_ACQUIRE_CRED call doesn't generate the "failed to decrypt" error, but all subsequent calls do. And the cached, encrypted key is back:

$ sudo -u apache klist
Ticket cache: KEYRING:persistent:48:48
Default principal: host/client.example.org

Valid starting       Expires              Service principal
1969-12-31 19:00:00  1969-12-31 19:00:00  Encrypted/Credentials/v1@X-GSSPROXY:

So what seems to be happening here is that gssproxy's handling of encrypted, cached keys seems to be broken for some environments: the code to cache the key succeeds, but then all subsequent attempts to read back that cached key fail, regardless of whether the key is actually stale. That would explain why the 0.7.0 series is completely non-functional for us without pull request 214 applied.

We use AD directly as our KDC for Linux hosts, so perhaps the PAC (or some other difference between a Microsoft KDC and an MIT Kerberos KDC) is the root cause?

We're happy to GPDEBUG() the sh*t out of a locally-rebuilt gssproxy. Just give us a patch that has the GPDEBUG() statements you want.

Comment 20 James Ralston 2017-10-03 19:30:40 UTC
Also, as an aside: while I set default_ccache_name to the RHEL7 default for the purpose of the above tests, we *do not* set default_ccache_name in /etc/krb5.conf on any of our hosts, so the default value (FILE:/tmp/krb5cc_%{uid}) is used.

If I test gssproxy with our standard configuration (default_ccache_name not set in /etc/krb5.conf), from performing an strace of gssproxy, I see no evidence that gssproxy is attempting to cache the encrypted key in any filesystem location: not in /tmp (which would be denied by SELinux anyway), and not in /var/lib/gssproxy/clients.

Instead, gssproxy appears to cache the encrypted key somewhere in memory, because as with the default_ccache_name = KEYRING:persistent:%{uid} case, the very first GSSX_ACQUIRE_CRED call after a restart does not throw the "failed to decrypt" error, but all subsequent calls do:

[2017/10/03 18:59:47]: Client connected (fd = 12)[2017/10/03 18:59:47]:  (pid = 17969) (uid = 48) (gid = 48)[2017/10/03 18:59:47]:  (context = system_u:system_r:httpd_t:s0)[2017/10/03 18:59:47]:
[2017/10/03 18:59:47]: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "HTTP", euid: 48,socket: (null)
[2017/10/03 18:59:47]: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "HTTP", euid: 48,socket: (null)
[2017/10/03 18:59:47]: Stored ccache failed to decrypt; treating as empty
[2017/10/03 18:59:47]: gp_rpc_execute: executing 9 (GSSX_ACCEPT_SEC_CONTEXT) for service "HTTP", euid: 48,socket: (null)
[2017/10/03 18:59:47]: Stored ccache failed to decrypt; treating as empty

I don't *think* the difference between a KEYRING or FILE cache matters for this issue (both credential cache locations show the same problem with recalling the encrypted key). But in case it might, I wanted to mention it.

Comment 24 adam winberg 2017-10-13 05:59:54 UTC
We don't use gssproxy in production yet but we have a test environment, and we have similar problems after upgrading to gssproxy-0.7.0-4.el7.x86_64. 

I haven't done any further debugging, but with the new version I don't get access to my NFS fileshares even though I have a kerberos keytab:

-bash-4.2$ klist
Ticket cache: KEYRING:persistent:66293:66293
Default principal: user

Valid starting       Expires              Service principal
01/01/1970 00:00:00  01/01/1970 00:00:00  Encrypted/Credentials/v1@X-GSSPROXY:


Downgrading gssproxy to gssproxy-0.4.1-13.el7.x86_64 (and nfs-utils) makes it work again.

Comment 28 Arpit Tolani 2017-11-16 09:35:23 UTC
*** Bug 1497532 has been marked as a duplicate of this bug. ***

Comment 46 Robbie Harwood 2018-01-17 16:51:22 UTC
Pasi, please don't mess with tags on other people's bugs, thanks.

Comment 47 Pasi Karkkainen 2018-01-17 17:13:37 UTC
Uhm, sorry about that. I have no idea how that happened. I thought I was only adding myself to the CC list.. I must have mis-clicked something.

Comment 48 Martin Kosek 2018-01-18 11:26:45 UTC
(In reply to Pasi Karkkainen from comment #47)
> Uhm, sorry about that. I have no idea how that happened. I thought I was
> only adding myself to the CC list.. I must have mis-clicked something.

When needinfo? flag is not targeting a specific person, then check box "I am providing the requested information for this bug (Will clear all needinfo requests)." is always checked in by default.

Comment 52 James Ralston 2018-03-14 22:15:27 UTC
I kicked the tires on gssproxy-0.7.0-17.el7 from the 7.5 beta. Everything I posted in comment 19 is still true.

Specifically, if I configure a simple Apache httpd test page:

<Location /private>

  Order Allow,Deny
  Satisfy any
  Allow from all

  AuthType GSSAPI
  AuthName "EXAMPLE.ORG (EXAMPLE)"
  GssapiSSLonly Off
  GssapiAllowedMech krb5
  Require valid-user

</Location>

Before I hit the test page, the apache user has no encrypted credentials cached:

$ klist
klist: Credentials cache keyring 'persistent:48:48' not found

When I hit the test page, gssproxy logs the following:

2018-03-14T17:46:39.874781-04:00 myhost.example.org gssproxy: [2018/03/14 21:46:39]: Client connected (fd = 11)[2018/03/14 21:46:39]:  (pid = 14205) (uid = 48) (gid = 48)[2018/03/14 21:46:39]:  (context = system_u:system_r:httpd_t:s0)[2018/03/14 21:46:39]:
2018-03-14T17:46:39.874837-04:00 myhost.example.org gssproxy: [CID 11][2018/03/14 21:46:39]: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "HTTP", euid: 48,socket: (null)
2018-03-14T17:46:39.875693-04:00 myhost.example.org gssproxy: [CID 11][2018/03/14 21:46:39]: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "HTTP", euid: 48,socket: (null)
2018-03-14T17:46:39.875764-04:00 myhost.example.org gssproxy: [CID 11][2018/03/14 21:46:39]: Stored ccache failed to decrypt; treating as empty
2018-03-14T17:46:39.876280-04:00 myhost.example.org gssproxy: [CID 11][2018/03/14 21:46:39]: gp_rpc_execute: executing 9 (GSSX_ACCEPT_SEC_CONTEXT) for service "HTTP", euid: 48,socket: (null)
2018-03-14T17:46:39.876356-04:00 myhost.example.org gssproxy: [CID 11][2018/03/14 21:46:39]: Stored ccache failed to decrypt; treating as empty

…and now there are cached credentials:

$ klist
Ticket cache: KEYRING:persistent:48:krb_ccache_yI7n81k
Default principal: host/myhost.example.org.ORG

Valid starting       Expires              Service principal
1969-12-31 19:00:00  1969-12-31 19:00:00  Encrypted/Credentials/v1@X-GSSPROXY:

But since gp_decrypt_buffer() *always* fails, either gssproxy is failing to cache the credentials properly, or it is failing to read them back properly. Regardless, the effect is the same: the GSS_C_NO_CREDENTIAL code path is taken on *every* attempt to use cached credentials:

if (ret) {
    /* Allow for re-issuance of the keytab. */
    GPDEBUG("Stored ccache failed to decrypt; treating as empty\n");
    *out = GSS_C_NO_CREDENTIAL;
    goto done;
}

This code to allow re-issuance of the keytab prevents gssproxy from failing entirely, but this is clearly a bug.

As I said in comment 19, we use AD directly as our KDC for Linux hosts, so I suspect the PAC (or some other difference between a Microsoft KDC and an MIT Kerberos KDC) is the root cause.

Robbie, how do you want me to proceed? Should I keep the discussion on this bug, open a new Bugzilla bug, open a pagure bug, or open a Red Hat support request?

Comment 53 Robbie Harwood 2018-03-19 21:56:12 UTC
Thanks James.  This needs a separate bug; a case is helpful if you want it fixed sooner, though of course I'll try to fix it regardless.

Comment 55 errata-xmlrpc 2018-04-10 11:09:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0709