
Bug 1432643

Summary: segfault in rpc.gssd in find_keytab_entry
Product: Red Hat Enterprise Linux 7
Reporter: Orion Poplawski <orion>
Component: nfs-utils
Assignee: Steve Dickson <steved>
Status: CLOSED ERRATA
QA Contact: ChunYu Wang <chunwang>
Severity: medium
Priority: unspecified
Version: 7.3
CC: chunwang, jiyin, yoyang
Keywords: Patch
Hardware: x86_64
OS: Linux
Fixed In Version: nfs-utils-1.3.0-0.40.el7
Last Closed: 2017-08-01 19:50:23 UTC
Type: Bug

Attachments:

Description	Flags
A simple script to trigger equivalent segfault	none

Description Orion Poplawski 2017-03-15 20:38:32 UTC

Description of problem:

Program terminated with signal 11, Segmentation fault.
#0  0x00007f89c2a2358a in __strcmp_sse42 () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f89c2a2358a in __strcmp_sse42 () from /lib64/libc.so.6
#1  0x000055d96fcd8fca in find_keytab_entry (context=0x7f89bc000910, kt=0x7f89bc000ae0,
    tgtname=tgtname@entry=0x55d970d73c30 "saga.cora.nwra.com", kte=kte@entry=0x7f89c15e1bb0,
    svcnames=svcnames@entry=0x7f89c15e1b80) at krb5_util.c:851
#2  0x000055d96fcd9d6d in gssd_refresh_krb5_machine_credential (
    hostname=0x55d970d73c30 "saga.cora.nwra.com", ple=ple@entry=0x0,
    service=service@entry=0x55d970d834c0 "*") at krb5_util.c:1277
#3  0x000055d96fcd71c0 in krb5_use_machine_creds (clp=clp@entry=0x55d970d73410, uid=uid@entry=0,
    tgtname=tgtname@entry=0x0, service=service@entry=0x55d970d834c0 "*",
    rpc_clnt=rpc_clnt@entry=0x7f89c15e1cf0) at gssd_proc.c:543
#4  0x000055d96fcd73ed in process_krb5_upcall (clp=clp@entry=0x55d970d73410, uid=uid@entry=0,
    fd=9, tgtname=tgtname@entry=0x0, service=service@entry=0x55d970d834c0 "*") at gssd_proc.c:652
#5  0x000055d96fcd7baf in handle_gssd_upcall (info=0x55d970d834a0) at gssd_proc.c:803
#6  0x00007f89c2cb8dc5 in start_thread () from /lib64/libpthread.so.0
#7  0x00007f89c29e773d in clone () from /lib64/libc.so.6
(gdb) up
#1  0x000055d96fcd8fca in find_keytab_entry (context=0x7f89bc000910, kt=0x7f89bc000ae0,
    tgtname=tgtname@entry=0x55d970d73c30 "saga.cora.nwra.com", kte=kte@entry=0x7f89c15e1bb0,
    svcnames=svcnames@entry=0x7f89c15e1b80) at krb5_util.c:851
851             if (strcmp (realm, preferred_realm) != 0) {
(gdb) list
846              * the host and local default realm (if that hasn't already been tried).
847              */
848             i = 0;
849             realm = realmnames[i];
850
851             if (strcmp (realm, preferred_realm) != 0) {
852                     realm = preferred_realm;
853                     /* resetting the realmnames index */
854                     i = -1;
855             }
(gdb) print realm
$1 = 0x7f89bc005060 "NWRA.COM"
(gdb) print preferred_realm
$2 = 0x0

Version-Release number of selected component (if applicable):
nfs-utils-1.3.0-0.33.el7_3.x86_64

Looks like this is a duplicate of bug #1108615

Comment 2 Yongcheng Yang 2017-03-16 01:14:39 UTC

(In reply to Orion Poplawski from comment #0)

> Version-Release number of selected component (if applicable):
> nfs-utils-1.3.0-0.33.el7_3.x86_64
> 
> Looks like this is a duplicate of bug #1108615

This should be fixed by the following upstream patch:

commit 8399548e6b904116e0e41d83e4a4b571af8ea578
Author: Jeff Layton <jlayton>
Date:   Fri Sep 12 13:20:13 2014 -0400

    gssd: ensure that preferred_realm is non-NULL before passing it to strcmp
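
The guard named in that commit message can be sketched as below. This is a minimal, self-contained illustration and not the nfs-utils source: pick_first_realm is a hypothetical helper, and only the names realmnames, preferred_realm, realm and i are taken from the gdb listing in comment 0. It assumes realmnames holds at least one non-NULL entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper modeled on the listing around krb5_util.c:851.
 * Returns the realm to try first and stores the starting index in *i
 * (-1 means "try preferred_realm first, then restart at realmnames[0]").
 */
const char *pick_first_realm(const char *const *realmnames,
                             const char *preferred_realm, int *i)
{
    const char *realm;

    /* Assumption for this sketch: the caller supplies at least one realm. */
    assert(realmnames != NULL && realmnames[0] != NULL);

    *i = 0;
    realm = realmnames[*i];

    /* Only hand preferred_realm to strcmp() when it is non-NULL. */
    if (preferred_realm && strcmp(realm, preferred_realm) != 0) {
        realm = preferred_realm;
        /* resetting the realmnames index */
        *i = -1;
    }
    return realm;
}
```

With preferred_realm == NULL, the situation shown in the core dump, the sketch skips the comparison and simply starts with realmnames[0].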

Comment 3 ChunYu Wang 2017-03-16 03:17:19 UTC

(In reply to Orion Poplawski from comment #0)
> Description of problem:
> 
> Program terminated with signal 11, Segmentation fault.

Hi, Orion,

We can confirm that the bug you reported is the same as bug 1108615, but I am not sure how to reproduce it. Reading utils/gssd/krb5_util.c in the nfs-utils package, I found that the problem occurs when the function krb5_get_default_realm returns a NULL realm, but that is very hard for me to trigger.

krb5_error_code krb5_get_default_realm(krb5_context context, krb5_realm *realm);

I have tried the methods described in bug 1108615, but the crash does not seem to be reproducible by setting DNS-related records. Do you have a reliable way to reproduce it?

Thanks,
ChunYu Wang

Comment 4 Orion Poplawski 2017-03-16 17:31:31 UTC

Unfortunately, the crash occurred during some very strange circumstances due to errors with our IPA configuration and I'm not sure I can reproduce it either.  It happened after system boot and during user login.

Mar 15 13:13:48 amakihi sssd: Starting up
Mar 15 13:13:48 amakihi sssd[be[nwra.com]]: Starting up
Mar 15 13:13:48 amakihi systemd: Starting RPC security service for NFS client and server...
Mar 15 13:13:48 amakihi sssd[nss]: Starting up
Mar 15 13:13:48 amakihi sssd[sudo]: Starting up
Mar 15 13:13:48 amakihi sssd[autofs]: Starting up
Mar 15 13:13:48 amakihi sssd[ssh]: Starting up
Mar 15 13:13:48 amakihi sssd[pam]: Starting up
Mar 15 13:13:48 amakihi systemd: Started RPC security service for NFS client and server.
Mar 15 13:13:48 amakihi sssd[pac]: Starting up
Mar 15 13:13:58 amakihi kernel: FS-Cache: Loaded
Mar 15 13:13:58 amakihi kernel: FS-Cache: Netfs 'nfs' registered for caching
Mar 15 13:13:58 amakihi kernel: Key type dns_resolver registered
Mar 15 13:13:58 amakihi kernel: NFS: Registering the id_resolver key type
Mar 15 13:13:58 amakihi kernel: Key type id_resolver registered
Mar 15 13:13:58 amakihi kernel: Key type id_legacy registered
Mar 15 13:13:58 amakihi kernel: rpc.gssd[2207]: segfault at 0 ip 00007f89c2a2358a sp 00007f89c15dee08 error 4 in libc-2.17.so[7f89c28f0000+1b6000]
Mar 15 13:13:58 amakihi abrt-hook-ccpp: Process 733 (rpc.gssd) of user 0 killed by SIGSEGV - dumping core
Mar 15 13:13:58 amakihi kernel: NFS: nfs4_discover_server_trunking unhandled error -32. Exiting with error EIO
Mar 15 13:13:58 amakihi systemd: rpc-gssd.service: main process exited, code=killed, status=11/SEGV

I've noticed that it appears that sssd touches /etc/krb5.conf when it starts.  Could there be some kind of race between that and rpc.gssd startup?

Comment 5 ChunYu Wang 2017-03-17 01:09:35 UTC

(In reply to Orion Poplawski from comment #4)
> Unfortunately, the crash occurred during some very strange circumstances due
> to errors with our IPA configuration and I'm not sure I can reproduce it
> either.  It happened after system boot and during user login.

Yes, SSSD reads its configuration from krb5.conf, since it is the system daemon that provides access to identity and authentication services.

> Mar 15 13:13:48 amakihi sssd: Starting up
> Mar 15 13:13:48 amakihi sssd[be[nwra.com]]: Starting up
> Mar 15 13:13:48 amakihi systemd: Starting RPC security service for NFS
> client and server...
^^^ rpc.gssd starts using realm nwra.com (normally, the krb5_get_default_realm function ensures that we get at least one realm other than NULL)
...
> Mar 15 13:13:58 amakihi kernel: rpc.gssd[2207]: segfault at 0 ip
> 00007f89c2a2358a sp 00007f89c15dee08 error 4 in
> libc-2.17.so[7f89c28f0000+1b6000]
^^^ rpc.gssd restarts with "preferred_realm" changed to NULL, and the problem occurs

A resource race during reboot may trigger this bug, but I cannot reproduce it manually. Since the code defect is clear and obvious, I will set "qe_test_coverage" to "-" and keep an eye on the rpc.gssd status during other tests.

Thanks,
ChunYu Wang

Comment 9 ChunYu Wang 2017-04-09 14:52:54 UTC

Created attachment 1270251
A simple script to trigger equivalent segfault

Comment 10 ChunYu Wang 2017-04-09 15:04:10 UTC

(In reply to ChunYu Wang from comment #3)
> We can make sure this bug you reported is just the same as the bug 1108615,
> but I am very confused about how to reproduce it, by reading code in file
> utils/gssd/krb5_util.c of package nfs-utils, I found the problem will happen
> when function krb5_get_default_realm returns a NULL realm, but it is really
> hard for me to reproduce it again.

Passing NULL as a string pointer to strcmp() forces a dereference of NULL in order to compare character codes (e.g. ASCII codes); this is undefined behavior at run time.

I will try to prove the effectiveness of this patch in a simplified scenario.

Using the script from comment 9, I first set test_env = getenv("TEST") and then observe how the strcmp() statement behaves before and after the fix when test_env is NULL, which is exactly the situation described in this bug:

#ifndef FIXED
    /* comment 0: if (strcmp (realm, preferred_realm) != 0) { */
    if (strcmp(pair, test_env) != 0)
#else
    /* fixed version: if (preferred_realm && strcmp (realm, preferred_realm) != 0) { */
    if (test_env && strcmp(pair, test_env) != 0)
#endif

--
[root@hp-dl380pg8-09 ~]# gcc ./segFault.c -o test.out
[root@hp-dl380pg8-09 ~]# ./test.out
Segmentation fault
^^^ Reproduced
[root@hp-dl380pg8-09 ~]# gcc ./segFault.c -o test.out -D FIXED
[root@hp-dl380pg8-09 ~]# ./test.out
Same or var 'TEST' is NULL
^^^ Resolved
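
The attachment itself is not reproduced in this report, so the following is only a hypothetical reconstruction of the guarded branch, built from the fragment quoted in this comment. The names pair and test_env, the TEST environment variable and the "Same or var 'TEST' is NULL" message come from the report; the function names compare_with_env and check_test_env and the "Different" message are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Guarded comparison, mirroring the -D FIXED branch quoted above.
 * test_env may be NULL; pair must not be.
 */
const char *compare_with_env(const char *pair, const char *test_env)
{
    assert(pair != NULL);

    /* strcmp() is only reached when test_env is non-NULL. */
    if (test_env && strcmp(pair, test_env) != 0)
        return "Different";                 /* assumed message */

    return "Same or var 'TEST' is NULL";    /* message shown in the transcript */
}

/*
 * getenv() returns NULL when the variable is not set, which is how the
 * script ends up with a NULL test_env.
 */
const char *check_test_env(const char *pair)
{
    return compare_with_env(pair, getenv("TEST"));
}
```

Calling check_test_env("NWRA.COM") with TEST unset takes the guarded path and returns the same message that the -D FIXED build printed above, instead of dereferencing NULL.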

Comment 12 ChunYu Wang 2017-04-09 15:33:19 UTC

As explained in comment 10, this kind of fix is common and effective. Regression testing also shows that this "common fix" does not cause problems in the normal rpc.gssd workflow.

I will move the status to VERIFIED for now; please feel free to reopen it if an equivalent issue occurs again.

I will keep an eye on this area during future RHEL-7.4 tests.

Thanks,
ChunYu Wang

Comment 13 errata-xmlrpc 2017-08-01 19:50:23 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2233