Bug 1432643
| Summary: | segfault in rpc.gssd in find_keytab_entry | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Orion Poplawski <orion> |
| Component: | nfs-utils | Assignee: | Steve Dickson <steved> |
| Status: | CLOSED ERRATA | QA Contact: | ChunYu Wang <chunwang> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.3 | CC: | chunwang, jiyin, yoyang |
| Target Milestone: | rc | Keywords: | Patch |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | nfs-utils-1.3.0-0.40.el7 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-08-01 19:50:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

(In reply to Orion Poplawski from comment #0)
> Version-Release number of selected component (if applicable):
> nfs-utils-1.3.0-0.33.el7_3.x86_64
>
> Looks like this is a duplicate of bug #1108615

It should need the following upstream patch:

    commit 8399548e6b904116e0e41d83e4a4b571af8ea578
    Author: Jeff Layton <jlayton>
    Date: Fri Sep 12 13:20:13 2014 -0400

        gssd: ensure that preferred_realm is non-NULL before passing it to strcmp

(In reply to Orion Poplawski from comment #0)
> Description of problem:
>
> Program terminated with signal 11, Segmentation fault.

Hi, Orion,

We can confirm that the bug you reported is the same as bug 1108615, but I am not sure how to reproduce it. From reading utils/gssd/krb5_util.c in the nfs-utils package, the problem occurs when the function krb5_get_default_realm returns a NULL realm, but I have not been able to trigger that condition.

    krb5_error_code krb5_get_default_realm(krb5_context context, krb5_realm *realm);

I have tried the methods logged in bug 1108615, but the crash does not seem to reproduce by setting DNS-related records. Do you have a reliable way to reproduce it?

Thanks,
ChunYu Wang

Unfortunately, the crash occurred under some very strange circumstances caused by errors in our IPA configuration, and I'm not sure I can reproduce it either. It happened after system boot, during user login.

    Mar 15 13:13:48 amakihi sssd: Starting up
    Mar 15 13:13:48 amakihi sssd[be[nwra.com]]: Starting up
    Mar 15 13:13:48 amakihi systemd: Starting RPC security service for NFS client and server...
    Mar 15 13:13:48 amakihi sssd[nss]: Starting up
    Mar 15 13:13:48 amakihi sssd[sudo]: Starting up
    Mar 15 13:13:48 amakihi sssd[autofs]: Starting up
    Mar 15 13:13:48 amakihi sssd[ssh]: Starting up
    Mar 15 13:13:48 amakihi sssd[pam]: Starting up
    Mar 15 13:13:48 amakihi systemd: Started RPC security service for NFS client and server.
    Mar 15 13:13:48 amakihi sssd[pac]: Starting up
    Mar 15 13:13:58 amakihi kernel: FS-Cache: Loaded
    Mar 15 13:13:58 amakihi kernel: FS-Cache: Netfs 'nfs' registered for caching
    Mar 15 13:13:58 amakihi kernel: Key type dns_resolver registered
    Mar 15 13:13:58 amakihi kernel: NFS: Registering the id_resolver key type
    Mar 15 13:13:58 amakihi kernel: Key type id_resolver registered
    Mar 15 13:13:58 amakihi kernel: Key type id_legacy registered
    Mar 15 13:13:58 amakihi kernel: rpc.gssd[2207]: segfault at 0 ip 00007f89c2a2358a sp 00007f89c15dee08 error 4 in libc-2.17.so[7f89c28f0000+1b6000]
    Mar 15 13:13:58 amakihi abrt-hook-ccpp: Process 733 (rpc.gssd) of user 0 killed by SIGSEGV - dumping core
    Mar 15 13:13:58 amakihi kernel: NFS: nfs4_discover_server_trunking unhandled error -32. Exiting with error EIO
    Mar 15 13:13:58 amakihi systemd: rpc-gssd.service: main process exited, code=killed, status=11/SEGV

I've noticed that sssd appears to touch /etc/krb5.conf when it starts. Could there be some kind of race between that and rpc.gssd startup?

(In reply to Orion Poplawski from comment #4)
> Unfortunately, the crash occurred during some very strange circumstances due
> to errors with our IPA configuration and I'm not sure I can reproduce it
> either. It happened after system boot and during user login.

Yes, SSSD reads its configuration from krb5.conf, since it is the system daemon that provides access to identity and authentication.

> Mar 15 13:13:48 amakihi sssd: Starting up
> Mar 15 13:13:48 amakihi sssd[be[nwra.com]]: Starting up
> Mar 15 13:13:48 amakihi systemd: Starting RPC security service for NFS
> client and server...

^^^ rpc.gssd starts using realm nwra.com (normally, the krb5_get_default_realm function ensures that we get at least one realm other than NULL)

...

> Mar 15 13:13:58 amakihi kernel: rpc.gssd[2207]: segfault at 0 ip
> 00007f89c2a2358a sp 00007f89c15dee08 error 4 in
> libc-2.17.so[7f89c28f0000+1b6000]

^^^ rpc.gssd restarts with "preferred_realm" changed to NULL, and this problem occurs

Some resource race during reboot may trigger this bug, but I cannot reproduce it manually. Since the code defect is clear and obvious, I will set "qe_test_coverage" to "-" and keep an eye on rpc.gssd status during other tests.

Thanks,
ChunYu Wang

Created attachment 1270251
A simple script to trigger equivalent segfault
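
The comments above attribute the crash to a default-realm lookup that leaves `preferred_realm` unset. A minimal sketch of that failure mode follows, using a stand-in lookup rather than libkrb5: `fake_get_default_realm` and `init_preferred_realm` are hypothetical names, not nfs-utils or Kerberos code. The point it illustrates is that when the lookup fails, the output pointer is never written, so the caller ends up holding NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a default-realm lookup (NOT the libkrb5 API).
 * Returns 0 and sets *realm on success; returns nonzero and leaves
 * *realm untouched when no default realm is available. */
static int fake_get_default_realm(const char *configured, const char **realm)
{
    if (configured == NULL || configured[0] == '\0')
        return -1;              /* assumed failure: nothing configured */
    *realm = configured;
    return 0;
}

/* Hypothetical start-up helper: returns the realm to prefer, or NULL if
 * the lookup failed. Every caller must be prepared for NULL. */
static const char *init_preferred_realm(const char *configured)
{
    const char *realm = NULL;

    if (fake_get_default_realm(configured, &realm) != 0)
        return NULL;            /* failure surfaces as a NULL realm */
    assert(realm != NULL);      /* success implies a usable realm */
    return realm;
}
```

Under this assumption, any later `strcmp()` against the stored realm needs a NULL check, which is what the upstream patch adds.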

(In reply to ChunYu Wang from comment #3)
> We can make sure this bug you reported is just the same as the bug 1108615,
> but I am very confused about how to reproduce it, by reading code in file
> utils/gssd/krb5_util.c of package nfs-utils, I found the problem will happen
> when function krb5_get_default_realm returns a NULL realm, but it is really
> hard for me to reproduce it again.

Passing NULL as a string pointer to strcmp() forces a dereference of NULL when the function compares character codes (e.g. ASCII codes); this is undefined behavior at run time.

I will try to demonstrate the effectiveness of this patch in a simplified scenario. Using the script in comment 9, I first set test_env=getenv("TEST"), then observe how the strcmp() statement behaves before and after the fix when test_env is NULL. This matches the situation described in this bug:

    #ifndef FIXED
        if(strcmp(pair, test_env)!=0)
        // comment 0: if (strcmp (realm, preferred_realm) != 0) {
    #else
        if(test_env && strcmp(pair, test_env)!=0)
        // fixed version: if (preferred_realm && strcmp (realm, preferred_realm) != 0) {
    #endif

--

    [root@hp-dl380pg8-09 ~]# gcc ./segFault.c -o test.out
    [root@hp-dl380pg8-09 ~]# ./test.out
    Segmentation fault
    ^^^ Reproduced

    [root@hp-dl380pg8-09 ~]# gcc ./segFault.c -o test.out -D FIXED
    [root@hp-dl380pg8-09 ~]# ./test.out
    Same or var 'TEST' is NULL
    ^^^ Resolved

As explained in comment 10, this kind of fix is common and effective. Regression testing also shows that it does not cause problems in the normal rpc.gssd workflow. I will move the status to VERIFIED for now; please feel free to reopen this bug if an equivalent issue reproduces. I will keep an eye on this area during future RHEL-7.4 tests.

Thanks,
ChunYu Wang

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2233
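
The before/after comparison used for verification can be sketched as a pair of helpers. This is a hypothetical reconstruction, not the attached segFault.c: the function names `differs_unfixed` and `differs_fixed` are assumptions, while the parameter names `pair` and `test_env` mirror the identifiers quoted in the verification comment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Unfixed form of the check. Calling it with test_env == NULL is
 * undefined behavior (a segfault in this report), so it is shown for
 * comparison only and must never be handed NULL. */
static int differs_unfixed(const char *pair, const char *test_env)
{
    return strcmp(pair, test_env) != 0;
}

/* Fixed form: NULL is tested first, so strcmp() never sees NULL.
 * A NULL test_env is reported as "not different". */
static int differs_fixed(const char *pair, const char *test_env)
{
    return test_env != NULL && strcmp(pair, test_env) != 0;
}
```

With the `TEST` variable unset, `getenv("TEST")` returns NULL, so `differs_fixed("NWRA.COM", getenv("TEST"))` evaluates to 0 instead of crashing.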

Description of problem:

Program terminated with signal 11, Segmentation fault.

    #0 0x00007f89c2a2358a in __strcmp_sse42 () from /lib64/libc.so.6
    (gdb) bt
    #0 0x00007f89c2a2358a in __strcmp_sse42 () from /lib64/libc.so.6
    #1 0x000055d96fcd8fca in find_keytab_entry (context=0x7f89bc000910, kt=0x7f89bc000ae0, tgtname=tgtname@entry=0x55d970d73c30 "saga.cora.nwra.com", kte=kte@entry=0x7f89c15e1bb0, svcnames=svcnames@entry=0x7f89c15e1b80) at krb5_util.c:851
    #2 0x000055d96fcd9d6d in gssd_refresh_krb5_machine_credential (hostname=0x55d970d73c30 "saga.cora.nwra.com", ple=ple@entry=0x0, service=service@entry=0x55d970d834c0 "*") at krb5_util.c:1277
    #3 0x000055d96fcd71c0 in krb5_use_machine_creds (clp=clp@entry=0x55d970d73410, uid=uid@entry=0, tgtname=tgtname@entry=0x0, service=service@entry=0x55d970d834c0 "*", rpc_clnt=rpc_clnt@entry=0x7f89c15e1cf0) at gssd_proc.c:543
    #4 0x000055d96fcd73ed in process_krb5_upcall (clp=clp@entry=0x55d970d73410, uid=uid@entry=0, fd=9, tgtname=tgtname@entry=0x0, service=service@entry=0x55d970d834c0 "*") at gssd_proc.c:652
    #5 0x000055d96fcd7baf in handle_gssd_upcall (info=0x55d970d834a0) at gssd_proc.c:803
    #6 0x00007f89c2cb8dc5 in start_thread () from /lib64/libpthread.so.0
    #7 0x00007f89c29e773d in clone () from /lib64/libc.so.6
    (gdb) up
    #1 0x000055d96fcd8fca in find_keytab_entry (context=0x7f89bc000910, kt=0x7f89bc000ae0, tgtname=tgtname@entry=0x55d970d73c30 "saga.cora.nwra.com", kte=kte@entry=0x7f89c15e1bb0, svcnames=svcnames@entry=0x7f89c15e1b80) at krb5_util.c:851
    851             if (strcmp (realm, preferred_realm) != 0) {
    (gdb) list
    846              * the host and local default realm (if that hasn't already been tried).
    847              */
    848             i = 0;
    849             realm = realmnames[i];
    850
    851             if (strcmp (realm, preferred_realm) != 0) {
    852                     realm = preferred_realm;
    853                     /* resetting the realmnames index */
    854                     i = -1;
    855             }
    (gdb) print realm
    $1 = 0x7f89bc005060 "NWRA.COM"
    (gdb) print preferred_realm
    $2 = 0x0

Version-Release number of selected component (if applicable):
nfs-utils-1.3.0-0.33.el7_3.x86_64

Looks like this is a duplicate of bug #1108615
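
For reference, the realm selection at lines 848-855 of the listing, with the guard from the upstream patch applied, might look like the sketch below. It is not the nfs-utils source: `pick_start_realm` and `struct realm_choice` are hypothetical, and only the guarded condition itself comes from the fix quoted in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: the realm to try first and the index to
 * continue from in realmnames[] afterwards. */
struct realm_choice {
    const char *realm;
    int index;
};

/* Assumes realmnames[0] is a valid string, as in the gdb session above. */
static struct realm_choice pick_start_realm(const char *const realmnames[],
                                            const char *preferred_realm)
{
    struct realm_choice choice = { realmnames[0], 0 };

    /* Guarded check: skip strcmp() entirely when no preferred realm is set. */
    if (preferred_realm != NULL &&
        strcmp(choice.realm, preferred_realm) != 0) {
        choice.realm = preferred_realm;
        choice.index = -1;  /* resetting the realmnames index */
    }
    return choice;
}
```

With `preferred_realm` equal to NULL, as in the core dump, the sketch simply starts from `realmnames[0]` ("NWRA.COM" here) rather than dereferencing NULL.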