Bug 1488629
| Field | Value |
| --- | --- |
| Summary | gssproxy interposer code broken in RHEL 7.4 / gssproxy-0.7.0? |
| Product | Red Hat Enterprise Linux 7 |
| Component | gssproxy |
| Version | 7.4 |
| Status | CLOSED ERRATA |
| Reporter | James Ralston <ralston> |
| Assignee | Robbie Harwood <rharwood> |
| QA Contact | ipa-qe <ipa-qe> |
| Docs Contact | Aneta Šteflová Petrová <apetrova> |
| Severity | unspecified |
| Priority | unspecified |
| CC | adam.winberg, amore, apetrova, atolani, dwysocha, fsorenso, fs-qe, hkhot, jlyle, jpazdziora, jvilicic, kfiresmith, ksiddiqu, mkosek, modolo, nsoman, pasik, ralston, R.Eggermont, rharwood, riehecky, rkothiya, rraghuwa, yoyang |
| Target Milestone | rc |
| Keywords | Regression |
| Hardware | Unspecified |
| OS | Unspecified |
| URL | https://pagure.io/gssproxy/pull-request/214 |
| Fixed In Version | gssproxy-0.7.0-13.el7 |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2018-04-10 11:09:51 UTC |
| Bug Blocks | 1420851 |

Doc Text:

> _gssproxy_ fixed to properly update ccaches
>
> Previously, the _gssproxy_ package did not correctly handle incrementing of the key version number (kvno) in Kerberos credential caches (ccaches). As a consequence, stale ccaches were not properly overwritten. This update fixes these problems in _gssproxy_ ccache caching. As a result, ccaches are now properly updated, and the caching prevents excessive requests for updates.
Description
James Ralston
2017-09-05 22:05:49 UTC
Please provide configs and apache logs.

Please see comment #2. Sorry for the confusion.

Created attachment 1324836 [details]: ssl_error_log

Created attachment 1324837 [details]: pulp_gssapi.conf

Thank you. As a workaround, they can create the file /etc/gssproxy/80-httpd.conf that looks like this:

```
[service/HTTP]
mechs = krb5
cred_store = keytab:/etc/gssproxy/http.keytab
cred_store = ccache:/var/lib/gssproxy/clients/krb5cc_%U
euid = apache
```

I don't understand what this workaround is supposed to accomplish. What is the purpose of pointing gssproxy at a non-existent keytab file (/etc/gssproxy/http.keytab)? That will always fail, for ANY service:

```
Sep 13 12:14:47 server gssproxy: gssproxy[14155]: (OID: { 1 2 840 113554 1 2 2 }) Unspecified GSS failure. Minor code may provide more information, Key table file '/etc/gssproxy/http.keytab' not found
```

Due to bug 1458913, we have commented out the contents of all /etc/gssproxy/[0-9][0-9]-*.conf files. We use a single monolithic /etc/gssproxy/gssproxy.conf file, which contains exactly:

```
[gssproxy]

[service/HTTP]
mechs = krb5
cred_store = keytab:/etc/krb5.keytab
cred_store = ccache:/var/lib/gssproxy/clients/krb5cc_%U
euid = 48
krb5_principal = HTTP

[service/ldap]
mechs = krb5
cred_store = keytab:/etc/krb5.keytab
cred_store = ccache:/var/lib/gssproxy/clients/krb5cc_%U
euid = 55
krb5_principal = ldap

[service/imap]
mechs = krb5
cred_store = keytab:/etc/krb5.keytab
cred_store = ccache:/var/lib/gssproxy/clients/krb5cc_%U
euid = 76
krb5_principal = imap

[service/nfs-server]
mechs = krb5
socket = /run/gssproxy.sock
cred_store = keytab:/etc/krb5.keytab
trusted = yes
kernel_nfsd = yes
euid = 0

[service/nfs-client]
mechs = krb5
cred_store = keytab:/etc/krb5.keytab
cred_store = ccache:FILE:/var/lib/gssproxy/clients/krb5cc_%U
cred_store = client_keytab:/var/lib/gssproxy/clients/%U.keytab
cred_usage = initiate
allow_any_uid = yes
trusted = yes
euid = 0
```

This file works perfectly with gssproxy-0.4.1-13.el7, for all services. After upgrading to gssproxy-0.7.0-4.el7, the nfs-server and nfs-client services continue to work, but the HTTP and imap services do not.
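The keytab-not-found failure above (gssproxy pointed at a keytab that does not exist) can be checked for mechanically. The following is a rough shell sketch, not anything shipped with gssproxy: the `check_gssproxy_keytabs` helper and the throwaway demo config are mine. It pulls the `keytab:`/`client_keytab:` paths out of `cred_store` lines and tests each for existence.

```shell
# check_gssproxy_keytabs: for every keytab referenced by a cred_store line
# in the given gssproxy config, print OK if the file exists on disk and
# MISSING otherwise. (Helper name and demo config are illustrative only.)
check_gssproxy_keytabs() {
  sed -n 's/^[[:space:]]*cred_store[[:space:]]*=[[:space:]]*\(client_\)\{0,1\}keytab:\(.*\)$/\2/p' "$1" |
  while read -r kt; do
    if [ -e "$kt" ]; then
      printf 'OK: %s\n' "$kt"
    else
      printf 'MISSING: %s\n' "$kt"
    fi
  done
}

# Demo against a throwaway config file:
demo=$(mktemp)
cat > "$demo" <<'EOF'
[service/HTTP]
mechs = krb5
cred_store = keytab:/etc/gssproxy/http.keytab
cred_store = ccache:/var/lib/gssproxy/clients/krb5cc_%U
euid = 48
EOF
check_gssproxy_keytabs "$demo"
rm -f "$demo"
```

Running this against the monolithic config above would immediately show whether /etc/krb5.keytab is actually present on the host gssproxy reads it from.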
For example, if I test cyrus-imapd with 0.7.0, attempting to use GSSAPI authentication fails with:

```
S: A01 NO generic failure
```

…and this is the only thing that is logged:

```
Sep 13 12:26:55 server imap[21382]: badlogin: client.example.org [1.2.3.4] GSSAPI [SASL(-13): authentication failure: GSSAPI Failure: gss_accept_sec_context]
```

There are two possibilities (not necessarily mutually exclusive):

1. Our config file (above) is compatible with gssproxy 0.4.1 but not 0.7.0.
2. The interposer code in gssproxy 0.7.0 is broken.

If #1 is the issue, I don't see how. While 0.7.0 adds more options to gssproxy configuration files, I don't see how what we have is incorrect. Am I missing something?

(In reply to James Ralston from comment #8)
> I don't understand what this workaround is supposed to accomplish. What is
> the purpose of pointing gssproxy at a non-existent keytab file
> (/etc/gssproxy/http.keytab)? That will always fail, for ANY service:

I mean, adapt as appropriate to your keytab location. That should go without saying, but I did say "that looks like" there. You'll note that this is exactly what is specified in the instructions you linked, as well.

> Due to bug 1458913, we have commented out the contents of all
> /etc/gssproxy/[0-9][0-9]-*.conf files. We use a single monolithic
> /etc/gssproxy/gssproxy.conf file, which contains exactly:

I don't know how that follows from that bug (why not just remove the duplicate sections...), but sure, that's fine. In the future, when filing bugs against a component, it should go without saying that you provide any nonstandard configurations, especially when asked for them.

klist -k the keytab, please.

Er, klist -ek, sorry.
Here's the (sanitized) /etc/krb5.keytab file:

```
2 host/server.example.org (des-cbc-crc)
2 host/server.example.org (des-cbc-md5)
2 host/server.example.org (aes128-cts-hmac-sha1-96)
2 host/server.example.org (aes256-cts-hmac-sha1-96)
2 host/server.example.org (arcfour-hmac)
2 host/SERVER (des-cbc-crc)
2 host/SERVER (des-cbc-md5)
2 host/SERVER (aes128-cts-hmac-sha1-96)
2 host/SERVER (aes256-cts-hmac-sha1-96)
2 host/SERVER (arcfour-hmac)
2 SERVER$@EXAMPLE.ORG (des-cbc-crc)
2 SERVER$@EXAMPLE.ORG (des-cbc-md5)
2 SERVER$@EXAMPLE.ORG (aes128-cts-hmac-sha1-96)
2 SERVER$@EXAMPLE.ORG (aes256-cts-hmac-sha1-96)
2 SERVER$@EXAMPLE.ORG (arcfour-hmac)
2 nfs/server.example.org (des-cbc-crc)
2 nfs/server.example.org (des-cbc-md5)
2 nfs/server.example.org (aes128-cts-hmac-sha1-96)
2 nfs/server.example.org (aes256-cts-hmac-sha1-96)
2 nfs/server.example.org (arcfour-hmac)
2 nfs/SERVER (des-cbc-crc)
2 nfs/SERVER (des-cbc-md5)
2 nfs/SERVER (aes128-cts-hmac-sha1-96)
2 nfs/SERVER (aes256-cts-hmac-sha1-96)
2 nfs/SERVER (arcfour-hmac)
2 imap/server.example.org (des-cbc-crc)
2 imap/server.example.org (des-cbc-md5)
2 imap/server.example.org (aes128-cts-hmac-sha1-96)
2 imap/server.example.org (aes256-cts-hmac-sha1-96)
2 imap/server.example.org (arcfour-hmac)
2 imap/SERVER (des-cbc-crc)
2 imap/SERVER (des-cbc-md5)
2 imap/SERVER (aes128-cts-hmac-sha1-96)
2 imap/SERVER (aes256-cts-hmac-sha1-96)
2 imap/SERVER (arcfour-hmac)
2 sieve/server.example.org (des-cbc-crc)
2 sieve/server.example.org (des-cbc-md5)
2 sieve/server.example.org (aes128-cts-hmac-sha1-96)
2 sieve/server.example.org (aes256-cts-hmac-sha1-96)
2 sieve/server.example.org (arcfour-hmac)
2 sieve/SERVER (des-cbc-crc)
2 sieve/SERVER (des-cbc-md5)
2 sieve/SERVER (aes128-cts-hmac-sha1-96)
2 sieve/SERVER (aes256-cts-hmac-sha1-96)
2 sieve/SERVER (arcfour-hmac)
```

The output in comment 11 is from a server running cyrus-imapd.
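Keytab listings of this size are easier to compare when collapsed to one line per principal. A small sketch (the `summarize_keytab` helper is mine; it assumes `klist -ek`-style `KVNO principal (enctype)` lines as shown above):

```shell
# summarize_keytab: read "KVNO principal (enctype)" lines on stdin and
# print each distinct principal once with its number of enctype entries.
summarize_keytab() {
  awk 'NF >= 2 { count[$2]++ } END { for (p in count) printf "%s %d\n", p, count[p] }' | LC_ALL=C sort
}

# Demo with three entries quoted from the listing above:
printf '%s\n' \
  '2 imap/server.example.org (aes256-cts-hmac-sha1-96)' \
  '2 imap/server.example.org (arcfour-hmac)' \
  '2 sieve/server.example.org (arcfour-hmac)' |
summarize_keytab
# prints:
#   imap/server.example.org 2
#   sieve/server.example.org 1
```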
Here's one from a server running httpd:

```
2 host/server.example.org (des-cbc-crc)
2 host/server.example.org (des-cbc-md5)
2 host/server.example.org (aes128-cts-hmac-sha1-96)
2 host/server.example.org (aes256-cts-hmac-sha1-96)
2 host/server.example.org (arcfour-hmac)
2 host/SERVER (des-cbc-crc)
2 host/SERVER (des-cbc-md5)
2 host/SERVER (aes128-cts-hmac-sha1-96)
2 host/SERVER (aes256-cts-hmac-sha1-96)
2 host/SERVER (arcfour-hmac)
2 SERVER$@EXAMPLE.ORG (des-cbc-crc)
2 SERVER$@EXAMPLE.ORG (des-cbc-md5)
2 SERVER$@EXAMPLE.ORG (aes128-cts-hmac-sha1-96)
2 SERVER$@EXAMPLE.ORG (aes256-cts-hmac-sha1-96)
2 SERVER$@EXAMPLE.ORG (arcfour-hmac)
2 nfs/server.example.org (des-cbc-crc)
2 nfs/server.example.org (des-cbc-md5)
2 nfs/server.example.org (aes128-cts-hmac-sha1-96)
2 nfs/server.example.org (aes256-cts-hmac-sha1-96)
2 nfs/server.example.org (arcfour-hmac)
2 nfs/SERVER (des-cbc-crc)
2 nfs/SERVER (des-cbc-md5)
2 nfs/SERVER (aes128-cts-hmac-sha1-96)
2 nfs/SERVER (aes256-cts-hmac-sha1-96)
2 nfs/SERVER (arcfour-hmac)
2 HTTP/server.example.org (des-cbc-crc)
2 HTTP/server.example.org (des-cbc-md5)
2 HTTP/server.example.org (aes128-cts-hmac-sha1-96)
2 HTTP/server.example.org (aes256-cts-hmac-sha1-96)
2 HTTP/server.example.org (arcfour-hmac)
2 HTTP/SERVER (des-cbc-crc)
2 HTTP/SERVER (des-cbc-md5)
2 HTTP/SERVER (aes128-cts-hmac-sha1-96)
2 HTTP/SERVER (aes256-cts-hmac-sha1-96)
2 HTTP/SERVER (arcfour-hmac)
```

Attaching a customer case which appears to be hitting this as well, after upgrading NFS clients from 7.3 to 7.4:

```
[root@system700 apache]# pwd
/data/apache
[root@system700 apache]# cd docroot
-bash: cd: docroot: Permission denied
```

The NFS server is RHEL 7.4, NFSv4 + KRB5.

1. Clients running nfs-utils-1.3.0-0.33.el7_3.x86_64 + gssproxy-0.4.1-13.el7.x86_64 have no issues connecting to the NFS server.
- local users with credentials are able to access the NFS share
- local users without credentials, but gssproxy tokens, are able to access the NFS share
- local services (eg: httpd) are able to access the NFS share

```
Sep 15 08:18:06 system810 systemd: Started RPC security service for NFS client and server.
Sep 15 08:18:24 system810 rpc.gssd[13498]: #012handle_gssd_upcall: 'mech=krb5 uid=0 service=* enctypes=18,17,16,23,3,1,2 ' (nfs/clnt18)
Sep 15 08:18:24 system810 rpc.gssd[13498]: krb5_use_machine_creds: uid 0 tgtname (null)
Sep 15 08:18:24 system810 rpc.gssd[13498]: Full hostname for 'system530-pre.domain.corp' is 'system530-pre.domain.corp'
Sep 15 08:18:24 system810 rpc.gssd[13498]: Full hostname for 'system810.domain.corp' is 'system810.domain.corp'
Sep 15 08:18:24 system810 rpc.gssd[13498]: No key table entry found for system810$@DOMAIN.CORP while getting keytab entry for 'system810$@DOMAIN.CORP'
Sep 15 08:18:24 system810 rpc.gssd[13498]: Success getting keytab entry for 'SYSTEM810$@DOMAIN.CORP'
Sep 15 08:18:24 system810 rpc.gssd[13498]: gssd_get_single_krb5_cred: principal 'SYSTEM810$@DOMAIN.CORP' ccache:'FILE:/tmp/krb5ccmachine_DOMAIN.CORP'
Sep 15 08:18:24 system810 rpc.gssd[13498]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_DOMAIN.CORP' are good until 1505517503
Sep 15 08:18:24 system810 gssproxy: Client connected (fd = 12) (pid = 13498) (uid = 0) (gid = 0) (context = system_u:system_r:gssd_t:s0)
Sep 15 08:18:24 system810 gssproxy: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "nfs-client", euid: 0, socket: (null)
Sep 15 08:18:24 system810 rpc.gssd[13498]: creating tcp client for server system530-pre.domain.corp
Sep 15 08:18:24 system810 rpc.gssd[13498]: creating context with server nfs.corp
Sep 15 08:18:24 system810 gssproxy: gp_rpc_execute: executing 8 (GSSX_INIT_SEC_CONTEXT) for service "nfs-client", euid: 0, socket: (null)
Sep 15 08:18:24 system810 gssproxy: gp_rpc_execute: executing 8 (GSSX_INIT_SEC_CONTEXT) for service "nfs-client", euid: 0, socket: (null)
Sep 15 08:18:24 system810 rpc.gssd[13498]: doing downcall: lifetime_rec=35582 acceptor=nfs.corp
Sep 15 08:18:24 system810 rpc.gssd[13498]: #012handle_gssd_upcall: 'mech=krb5 uid=0 enctypes=18,17,16,23,3,1,2 ' (nfs/clnt18)
Sep 15 08:18:24 system810 rpc.gssd[13498]: krb5_use_machine_creds: uid 0 tgtname (null)
Sep 15 08:18:24 system810 rpc.gssd[13498]: Full hostname for 'system530-pre.domain.corp' is 'system530-pre.domain.corp'
Sep 15 08:18:24 system810 rpc.gssd[13498]: Full hostname for 'system810.domain.corp' is 'system810.domain.corp'
Sep 15 08:18:24 system810 rpc.gssd[13498]: No key table entry found for system810$@DOMAIN.CORP while getting keytab entry for 'system810$@DOMAIN.CORP'
Sep 15 08:18:24 system810 rpc.gssd[13498]: Success getting keytab entry for 'SYSTEM810$@DOMAIN.CORP'
Sep 15 08:18:24 system810 rpc.gssd[13498]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_DOMAIN.CORP' are good until 1505517503
Sep 15 08:18:24 system810 rpc.gssd[13498]: INFO: Credentials in CC 'FILE:/tmp/krb5ccmachine_DOMAIN.CORP' are good until 1505517503
Sep 15 08:18:24 system810 gssproxy: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "nfs-client", euid: 0, socket: (null)
Sep 15 08:18:24 system810 rpc.gssd[13498]: creating tcp client for server system530-pre.domain.corp
Sep 15 08:18:24 system810 rpc.gssd[13498]: creating context with server nfs.corp
Sep 15 08:18:24 system810 gssproxy: gp_rpc_execute: executing 8 (GSSX_INIT_SEC_CONTEXT) for service "nfs-client", euid: 0, socket: (null)
Sep 15 08:18:24 system810 gssproxy: gp_rpc_execute: executing 8 (GSSX_INIT_SEC_CONTEXT) for service "nfs-client", euid: 0, socket: (null)
Sep 15 08:18:24 system810 rpc.gssd[13498]: doing downcall: lifetime_rec=35582 acceptor=nfs.corp
Sep 15 08:18:25 system810 rpc.gssd[13498]: #012handle_gssd_upcall: 'mech=krb5 uid=48 enctypes=18,17,16,23,3,1,2 ' (nfs/clnt18)
Sep 15 08:18:25 system810 gssproxy: Client connected (fd = 12) (pid = 13498) (uid = 48) (gid = 48) (context = system_u:system_r:gssd_t:s0)
Sep 15 08:18:25 system810 gssproxy: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "HTTP", euid: 48, socket: (null)
Sep 15 08:18:25 system810 rpc.gssd[13498]: krb5_not_machine_creds: uid 48 tgtname (null)
Sep 15 08:18:25 system810 rpc.gssd[13498]: creating tcp client for server system530-pre.domain.corp
Sep 15 08:18:25 system810 rpc.gssd[13498]: creating context with server nfs.corp
Sep 15 08:18:25 system810 gssproxy: gp_rpc_execute: executing 8 (GSSX_INIT_SEC_CONTEXT) for service "HTTP", euid: 48, socket: (null)
Sep 15 08:18:25 system810 gssproxy: gp_rpc_execute: executing 8 (GSSX_INIT_SEC_CONTEXT) for service "HTTP", euid: 48, socket: (null)
Sep 15 08:18:25 system810 rpc.gssd[13498]: doing downcall: lifetime_rec=35583 acceptor=nfs.corp
```

2. Clients running nfs-utils-1.3.0-0.48.el7.x86_64 + gssproxy-0.7.0-4.el7.x86_64 have issues with services connecting to the NFS server.

- local users with credentials are able to access the NFS share
- local users without credentials, but gssproxy tokens, are unable to access the NFS share
- local services (eg: httpd) are unable to access the NFS share

```
Sep 15 08:13:33 system700 systemd: Started RPC security service for NFS client and server.
Sep 15 08:14:11 system700 rpc.gssd[23981]: #012handle_gssd_upcall: 'mech=krb5 uid=48 enctypes=18,17,16,23,3,1,2 ' (nfs/clnt70e)
Sep 15 08:14:11 system700 rpc.gssd[23981]: krb5_not_machine_creds: uid 48 tgtname (null)
Sep 15 08:14:11 system700 gssproxy: [2017/09/15 13:14:11]: Client connected (fd = 11)[2017/09/15 13:14:11]: (pid = 23981) (uid = 48) (gid = 48)[2017/09/15 13:14:11]: (context = system_u:system_r:gssd_t:s0)[2017/09/15 13:14:11]:
Sep 15 08:14:11 system700 gssproxy: [2017/09/15 13:14:11]: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "nfs-client", euid: 48,socket: (null)
Sep 15 08:14:11 system700 gssproxy: GSSX_ARG_ACQUIRE_CRED( call_ctx: { "" [ ] } input_cred_handle: { { "ss2user" { 1 2 840 113554 1 2 2 1 } [ ... ] [ ... ] [ ] } [ { { "ss2user" { 1 2 840 113554 1 2 2 1 } [ ... ] [ ... ] [ ] } { 1 2 840 113554 1 2 2 } INITIATE 35999 0 } ] [ ... ] 0 } add_cred: 0 desired_name: <Null> time_req: 4294967295 desired_mechs: { { 1 2 840 113554 1 2 2 } } cred_usage: INITIATE initiator_time_req: 0 acceptor_time_req: 0 )
Sep 15 08:14:11 system700 gssproxy: GSSX_RES_ACQUIRE_CRED( status: { 851968 <None> 2 "Unspecified GSS failure. Minor code may provide more information" "" [ ] } output_cred_handle: <Null> )
Sep 15 08:14:11 system700 rpc.gssd[23981]: creating tcp client for server system530-qa.domain.corp
Sep 15 08:14:11 system700 rpc.gssd[23981]: creating context with server nfs.corp
Sep 15 08:14:11 system700 rpc.gssd[23981]: in authgss_create_default()
Sep 15 08:14:11 system700 rpc.gssd[23981]: in authgss_create()
Sep 15 08:14:11 system700 rpc.gssd[23981]: in authgss_refresh()
Sep 15 08:14:11 system700 rpc.gssd[23981]: authgss_refresh: gss_init_sec_context: Unspecified GSS failure. Minor code may provide more information - Credential cache is empty
Sep 15 08:14:11 system700 rpc.gssd[23981]: in authgss_destroy()
Sep 15 08:14:11 system700 rpc.gssd[23981]: in authgss_destroy_context()
Sep 15 08:14:11 system700 rpc.gssd[23981]: WARNING: Failed to create krb5 context for user with uid 48 for server nfs.corp
Sep 15 08:14:11 system700 rpc.gssd[23981]: looking for client creds with uid 48 for server system530-qa.domain.corp in /tmp
Sep 15 08:14:11 system700 rpc.gssd[23981]: looking for client creds with uid 48 for server system530-qa.domain.corp in /run/user/%U
Sep 15 08:14:11 system700 rpc.gssd[23981]: Error doing scandir on directory '/run/user/48': No such file or directory
Sep 15 08:14:11 system700 rpc.gssd[23981]: doing error downcall
Sep 15 08:14:11 system700 rpc.gssd[23981]: ERROR: event_dispatch() returned!
Sep 15 08:14:11 system700 systemd: rpc-gssd.service: main process exited, code=exited, status=1/FAILURE
Sep 15 08:14:11 system700 systemd: Unit rpc-gssd.service entered failed state.
Sep 15 08:14:11 system700 systemd: rpc-gssd.service failed.
```

Packet captures can be provided as well, if they will be helpful.

Okay. After some debugging, I think what may have happened is that some processes have stale keys around. Normally this wouldn't be a problem, but with gssproxy, these keys are encrypted. The result is that when you change the keytab in gssproxy, they all become invalid. Rebooting the machine should fix it. Clearing out the appropriate ccache should also fix it.

(For instance, apache/mod_auth_gssapi by default uses the kernel keyring; if you log in, you'll see something like:

```
-bash-4.2$ klist
Ticket cache: KEYRING:persistent:48:48
Default principal: HTTP/7-2

Valid starting       Expires              Service principal
12/31/1969 19:00:00  12/31/1969 19:00:00  Encrypted/Credentials/v1@X-GSSPROXY:
```

and a kdestroy should take care of that.)

To actually fix this, we'll probably need to modify the lifetime of this ccache, and probably also handle the failure here more gracefully. Thanks for your patience.

When we apply updates to a host, we always reboot afterward, so the 0.7.0 gssproxy was starting on a cleanly-booted system. And it still failed. Furthermore, if I downgrade to 0.4.1 and restart the service in question but don't reboot, 0.4.1 works. I can upgrade and downgrade back and forth between 0.7.0 and 0.4.1, without rebooting, and I consistently see that 0.4.1 works, but 0.7.0 fails.

I'll build a local gssproxy with the patch from pull request 214, but I'm skeptical that it will resolve the problem.

Created attachment 1327638 [details]
Handle outdated encrypted ccaches
I rebased the current state of PR#214 for 0.7.0-4.el7 (attached), and rolled a local gssproxy package with this rebased patch.
I can confirm that the patch DOES resolve the issue, for both the cyrus-imapd and httpd services.
I can't confirm that this also resolves the issue Frank's customer is seeing, because we didn't see those issues.
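The stale-credential symptom that the kdestroy workaround addresses is easy to spot: a ccache that gssproxy has populated shows a single entry whose service principal is `Encrypted/Credentials/v1@X-GSSPROXY:`. A hedged sketch for checking captured klist output for that marker (the helper name is mine, not part of any tool):

```shell
# has_gssproxy_blob: succeed if klist output on stdin contains the
# placeholder entry gssproxy stores for its encrypted credentials.
has_gssproxy_blob() {
  grep -q 'Encrypted/Credentials/v1@X-GSSPROXY'
}

# Demo with klist output quoted earlier in this bug:
klist_out='Ticket cache: KEYRING:persistent:48:48
Default principal: HTTP/7-2

Valid starting       Expires              Service principal
12/31/1969 19:00:00  12/31/1969 19:00:00  Encrypted/Credentials/v1@X-GSSPROXY:'

if printf '%s\n' "$klist_out" | has_gssproxy_blob; then
  echo 'gssproxy placeholder present; a kdestroy for that user would clear it'
fi
# prints: gssproxy placeholder present; a kdestroy for that user would clear it
```

In practice one would feed it `sudo -u <user> klist` output and, on a match, run `sudo -u <user> kdestroy` as described above; whether that actually helps is exactly what the rest of this bug is about.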
Ah; I see that https://pagure.io/gssproxy/pull-request/214 was merged. I'll rebuild gssproxy 0.7.0 locally with that patch and kick the tires again.

OK, I've rebuilt local gssproxy packages with the commit from https://pagure.io/gssproxy/pull-request/214 included as a patch. But the results we're seeing are troubling.

Specifically, what we're seeing with pull request 214 is that almost *every* call to gssproxy originating from httpd hits the GSS_C_NO_CREDENTIAL code path:

```c
if (ret) {
    /* Allow for re-issuance of the keytab. */
    GPDEBUG("Stored ccache failed to decrypt; treating as empty\n");
    *out = GSS_C_NO_CREDENTIAL;
    goto done;
}
```

Here's the result from a clean start of gssproxy, using a very simple test web page that uses mod_auth_gssapi, and with default_ccache_name = KEYRING:persistent:%{uid} in /etc/krb5.conf:

```
[2017/10/03 18:26:12]: Debug Enabled (level: 0)
[2017/10/03 18:26:12]: Problem with kernel communication! NFS server will not work
[2017/10/03 18:26:12]: Client connected (fd = 10)[2017/10/03 18:26:12]: (pid = 14884) (uid = 0) (gid = 0)[2017/10/03 18:26:12]: (context = system_u:system_r:kernel_t:s0)[2017/10/03 18:26:12]:
[2017/10/03 18:26:12]: Client connected (fd = 11)[2017/10/03 18:26:12]: (pid = 14892) (uid = 0) (gid = 0)[2017/10/03 18:26:12]: (context = system_u:system_r:gssd_t:s0)[2017/10/03 18:26:12]:
[2017/10/03 18:26:12]: gp_rpc_execute: executing 1 (GSSX_INDICATE_MECHS) for service "nfs-client", euid: 0,socket: (null)
[2017/10/03 18:26:13]: Client connected (fd = 11)[2017/10/03 18:26:13]: (pid = 14899) (uid = 0) (gid = 0)[2017/10/03 18:26:13]: (context = system_u:system_r:httpd_t:s0)[2017/10/03 18:26:13]:
[2017/10/03 18:26:13]: gp_rpc_execute: executing 1 (GSSX_INDICATE_MECHS) for service "nfs-client", euid: 0,socket: (null)
[2017/10/03 18:26:18]: Client connected (fd = 12)[2017/10/03 18:26:18]: (pid = 14907) (uid = 48) (gid = 48)[2017/10/03 18:26:18]: (context = system_u:system_r:httpd_t:s0)[2017/10/03 18:26:18]:
[2017/10/03 18:26:18]: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "HTTP", euid: 48,socket: (null)
[2017/10/03 18:26:18]: Stored ccache failed to decrypt; treating as empty
[2017/10/03 18:26:18]: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "HTTP", euid: 48,socket: (null)
[2017/10/03 18:26:18]: Stored ccache failed to decrypt; treating as empty
[2017/10/03 18:26:18]: gp_rpc_execute: executing 9 (GSSX_ACCEPT_SEC_CONTEXT) for service "HTTP", euid: 48,socket: (null)
[2017/10/03 18:26:18]: Stored ccache failed to decrypt; treating as empty
```

I can see the cached, encrypted key:

```
$ sudo -u apache klist
Ticket cache: KEYRING:persistent:48:48
Default principal: host/client.example.org

Valid starting       Expires              Service principal
1969-12-31 19:00:00  1969-12-31 19:00:00  Encrypted/Credentials/v1@X-GSSPROXY:
```

And I can remove it:

```
$ sudo -u apache kdestroy
$ sudo -u apache klist
klist: Credentials cache keyring 'persistent:48:48' not found
```

But then the very next hit on the test web page generates this:

```
[2017/10/03 18:30:55]: Client connected (fd = 13)[2017/10/03 18:30:55]: (pid = 14903) (uid = 48) (gid = 48)[2017/10/03 18:30:55]: (context = system_u:system_r:httpd_t:s0)[2017/10/03 18:30:55]:
[2017/10/03 18:30:55]: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "HTTP", euid: 48,socket: (null)
[2017/10/03 18:30:55]: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "HTTP", euid: 48,socket: (null)
[2017/10/03 18:30:55]: Stored ccache failed to decrypt; treating as empty
[2017/10/03 18:30:55]: gp_rpc_execute: executing 9 (GSSX_ACCEPT_SEC_CONTEXT) for service "HTTP", euid: 48,socket: (null)
[2017/10/03 18:30:55]: Stored ccache failed to decrypt; treating as empty
```

The very first GSSX_ACQUIRE_CRED call doesn't generate the "failed to decrypt" error, but all subsequent calls do.

And the cached, encrypted key is back:

```
$ sudo -u apache klist
Ticket cache: KEYRING:persistent:48:48
Default principal: host/client.example.org

Valid starting       Expires              Service principal
1969-12-31 19:00:00  1969-12-31 19:00:00  Encrypted/Credentials/v1@X-GSSPROXY:
```

So what seems to be happening here is that gssproxy's handling of encrypted, cached keys is broken for some environments: the code to cache the key succeeds, but all subsequent attempts to read back that cached key fail, regardless of whether the key is actually stale. That would explain why the 0.7.0 series is completely non-functional for us without pull request 214 applied.

We use AD directly as our KDC for Linux hosts, so perhaps the PAC (or some other difference between a Microsoft KDC and an MIT Kerberos KDC) is the root cause?

We're happy to GPDEBUG() the sh*t out of a locally-rebuilt gssproxy. Just give us a patch that has the GPDEBUG() statements you want.

Also, as an aside: while I set default_ccache_name to the RHEL 7 default for the purpose of the above tests, we *do not* set default_ccache_name in /etc/krb5.conf on any of our hosts, so the default value (FILE:/tmp/krb5cc_%{uid}) is used. If I test gssproxy with our standard configuration (default_ccache_name not set in /etc/krb5.conf), an strace of gssproxy shows no evidence that it is attempting to cache the encrypted key in any filesystem location: not in /tmp (which would be denied by SELinux anyway), and not in /var/lib/gssproxy/clients. Instead, gssproxy appears to cache the encrypted key somewhere in memory, because as with the default_ccache_name = KEYRING:persistent:%{uid} case, the very first GSSX_ACQUIRE_CRED call after a restart does not throw the "failed to decrypt" error, but all subsequent calls do:

```
[2017/10/03 18:59:47]: Client connected (fd = 12)[2017/10/03 18:59:47]: (pid = 17969) (uid = 48) (gid = 48)[2017/10/03 18:59:47]: (context = system_u:system_r:httpd_t:s0)[2017/10/03 18:59:47]:
[2017/10/03 18:59:47]: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "HTTP", euid: 48,socket: (null)
[2017/10/03 18:59:47]: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "HTTP", euid: 48,socket: (null)
[2017/10/03 18:59:47]: Stored ccache failed to decrypt; treating as empty
[2017/10/03 18:59:47]: gp_rpc_execute: executing 9 (GSSX_ACCEPT_SEC_CONTEXT) for service "HTTP", euid: 48,socket: (null)
[2017/10/03 18:59:47]: Stored ccache failed to decrypt; treating as empty
```

I don't *think* the difference between a KEYRING or FILE cache matters for this issue (both credential cache locations show the same problem with recalling the encrypted key), but in case it might, I wanted to mention it.

We don't use gssproxy in production yet, but we have a test environment, and we have similar problems after upgrading to gssproxy-0.7.0-4.el7.x86_64. I haven't done any further debugging, but with the new version I don't get access to my NFS file shares even though I have a Kerberos keytab:

```
-bash-4.2$ klist
Ticket cache: KEYRING:persistent:66293:66293
Default principal: user

Valid starting       Expires              Service principal
01/01/1970 00:00:00  01/01/1970 00:00:00  Encrypted/Credentials/v1@X-GSSPROXY:
```

Downgrading gssproxy to gssproxy-0.4.1-13.el7.x86_64 (and nfs-utils) makes it work again.

*** Bug 1497532 has been marked as a duplicate of this bug. ***

Pasi, please don't mess with tags on other people's bugs, thanks.

Uhm, sorry about that. I have no idea how that happened.
I thought I was only adding myself to the CC list... I must have mis-clicked something.

(In reply to Pasi Karkkainen from comment #47)
> Uhm, sorry about that. I have no idea how that happened. I thought I was
> only adding myself to the CC list.. I must have mis-clicked something.

When the needinfo? flag is not targeting a specific person, the checkbox "I am providing the requested information for this bug (Will clear all needinfo requests)." is checked by default.

I kicked the tires on gssproxy-0.7.0-17.el7 from the 7.5 beta. Everything I posted in comment 19 is still true. Specifically, if I configure a simple Apache httpd test page:

```
<Location /private>
  Order Allow,Deny
  Satisfy any
  Allow from all
  AuthType GSSAPI
  AuthName "EXAMPLE.ORG (EXAMPLE)"
  GssapiSSLonly Off
  GssapiAllowedMech krb5
  Require valid-user
</Location>
```

Before I hit the test page, the apache user has no encrypted credentials cached:

```
$ klist
klist: Credentials cache keyring 'persistent:48:48' not found
```

When I hit the test page, gssproxy logs the following:

```
2018-03-14T17:46:39.874781-04:00 myhost.example.org gssproxy: [2018/03/14 21:46:39]: Client connected (fd = 11)[2018/03/14 21:46:39]: (pid = 14205) (uid = 48) (gid = 48)[2018/03/14 21:46:39]: (context = system_u:system_r:httpd_t:s0)[2018/03/14 21:46:39]:
2018-03-14T17:46:39.874837-04:00 myhost.example.org gssproxy: [CID 11][2018/03/14 21:46:39]: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "HTTP", euid: 48,socket: (null)
2018-03-14T17:46:39.875693-04:00 myhost.example.org gssproxy: [CID 11][2018/03/14 21:46:39]: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "HTTP", euid: 48,socket: (null)
2018-03-14T17:46:39.875764-04:00 myhost.example.org gssproxy: [CID 11][2018/03/14 21:46:39]: Stored ccache failed to decrypt; treating as empty
2018-03-14T17:46:39.876280-04:00 myhost.example.org gssproxy: [CID 11][2018/03/14 21:46:39]: gp_rpc_execute: executing 9 (GSSX_ACCEPT_SEC_CONTEXT) for service "HTTP", euid: 48,socket: (null)
2018-03-14T17:46:39.876356-04:00 myhost.example.org gssproxy: [CID 11][2018/03/14 21:46:39]: Stored ccache failed to decrypt; treating as empty
```

…and now there are cached credentials:

```
$ klist
Ticket cache: KEYRING:persistent:48:krb_ccache_yI7n81k
Default principal: host/myhost.example.org.ORG

Valid starting       Expires              Service principal
1969-12-31 19:00:00  1969-12-31 19:00:00  Encrypted/Credentials/v1@X-GSSPROXY:
```

But since gp_decrypt_buffer() *always* fails, either gssproxy is failing to cache the credentials properly, or it is failing to read them back properly. Regardless, the effect is the same: the GSS_C_NO_CREDENTIAL code path is taken on *every* attempt to use cached credentials:

```c
if (ret) {
    /* Allow for re-issuance of the keytab. */
    GPDEBUG("Stored ccache failed to decrypt; treating as empty\n");
    *out = GSS_C_NO_CREDENTIAL;
    goto done;
}
```

This code to allow re-issuance of the keytab prevents gssproxy from failing entirely, but this is clearly a bug. As I said in comment 19, we use AD directly as our KDC for Linux hosts, so I suspect the PAC (or some other difference between a Microsoft KDC and an MIT Kerberos KDC) is the root cause.

Robbie, how do you want me to proceed? Should I keep the discussion on this bug, open a new Bugzilla bug, open a pagure bug, or open a Red Hat support request?

Thanks James. This needs a separate bug; a case is helpful if you want it fixed sooner, though of course I'll try to fix it regardless.

Since the problem described in this bug report should be resolved by a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0709
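As a footnote for anyone reproducing the diagnosis above: the "first GSSX_ACQUIRE_CRED succeeds, every later call logs a decrypt failure" pattern can be tallied from a captured gssproxy debug log. A rough sketch (the `count_decrypt_failures` helper is mine; feed it the log on stdin):

```shell
# count_decrypt_failures: tally gp_rpc_execute operations and
# "Stored ccache failed to decrypt" messages in a gssproxy debug log
# read from stdin. (Helper name and usage are illustrative only.)
count_decrypt_failures() {
  awk '
    /gp_rpc_execute: executing/ {
      for (i = 1; i <= NF; i++)
        if ($i ~ /GSSX_/) { op = $i; gsub(/[()]/, "", op); ops[op]++ }
    }
    /Stored ccache failed to decrypt/ { fails++ }
    END {
      for (o in ops) printf "%s %d\n", o, ops[o]
      printf "decrypt_failures %d\n", fails + 0
    }' | LC_ALL=C sort
}

# Demo with log lines quoted in this bug:
printf '%s\n' \
  '[2017/10/03 18:26:18]: gp_rpc_execute: executing 6 (GSSX_ACQUIRE_CRED) for service "HTTP", euid: 48,socket: (null)' \
  '[2017/10/03 18:26:18]: Stored ccache failed to decrypt; treating as empty' \
  '[2017/10/03 18:26:18]: gp_rpc_execute: executing 9 (GSSX_ACCEPT_SEC_CONTEXT) for service "HTTP", euid: 48,socket: (null)' \
  '[2017/10/03 18:26:18]: Stored ccache failed to decrypt; treating as empty' |
count_decrypt_failures
# prints:
#   GSSX_ACCEPT_SEC_CONTEXT 1
#   GSSX_ACQUIRE_CRED 1
#   decrypt_failures 2
```

On a healthy system the decrypt-failure count should stay near zero; a count tracking the number of ACQUIRE_CRED/ACCEPT_SEC_CONTEXT calls is the symptom described above.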