Bug 1490467
| Summary: | systemd[1]: rpc-gssd.service: main process exited, code=killed, status=6/ABRT | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Orion Poplawski <orion> |
| Component: | nfs-utils | Assignee: | Steve Dickson <steved> |
| Status: | CLOSED WONTFIX | QA Contact: | Yongcheng Yang <yoyang> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.5 | CC: | baumanmo, fsorenso, orion, rbergant, rharwood, ssorce, steved, xzhou, yoyang |
| Target Milestone: | rc | Keywords: | Reopened |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-11-11 21:55:34 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Orion Poplawski
2017-09-11 16:34:23 UTC
Just curious... If you take gssproxy out of the picture by setting GSS_USE_PROXY="no" in /etc/sysconfig/nfs, does the abort still happen?

It may be too early to tell, but early testing seems to indicate that setting GSS_USE_PROXY=no prevents the crash. Unfortunately, I also cannot reproduce the crash with gdb attached to rpc.gssd.

Did abrt catch the rpc.gssd stack trace? I would like to take a look at it to see where it blows up.

No, it didn't. I don't know why.
abrt-hook-ccpp[7829]: Process 1283 (rpc.gssd) of user 0 killed by SIGABRT - dumping core
abrt-hook-ccpp[7829]: Failed to create core_backtrace: waitpid failed: No child processes
Not sure why it isn't catching the coredump.

Still present with nfs-utils-1.3.0-0.61.el7.x86_64, but still not producing a coredump.

Created attachment 1523541
core_backtrace
I can't get a good backtrace with gdb on the coredump, but this is what abrtd collected.
, { "address": 139826512251128
, "build_id": "95cdabda24bcd671d2876c8d7c5d6411902a8566"
, "build_id_offset": 227576
, "function_name": "abort"
, "file_name": "/lib64/libc.so.6"
}
, { "address": 139826512518343
, "build_id": "95cdabda24bcd671d2876c8d7c5d6411902a8566"
, "build_id_offset": 494791
, "function_name": "__libc_message"
, "file_name": "/lib64/libc.so.6"
}
, { "address": 139826512553001
, "build_id": "95cdabda24bcd671d2876c8d7c5d6411902a8566"
, "build_id_offset": 529449
, "function_name": "_int_free"
, "file_name": "/lib64/libc.so.6"
}
, { "address": 94794261335291
, "build_id": "5b24daf020ad3925c1805d79c7152bbdaa7b2715"
, "build_id_offset": 40187
, "function_name": "gssd_get_single_krb5_cred.constprop.4"
, "file_name": "/usr/sbin/rpc.gssd"
}
, { "address": 94794261336012
, "build_id": "5b24daf020ad3925c1805d79c7152bbdaa7b2715"
, "build_id_offset": 40908
, "function_name": "gssd_refresh_krb5_machine_credential"
, "file_name": "/usr/sbin/rpc.gssd"
}
, { "address": 94794261324896
, "build_id": "5b24daf020ad3925c1805d79c7152bbdaa7b2715"
, "build_id_offset": 29792
, "function_name": "krb5_use_machine_creds"
, "file_name": "/usr/sbin/rpc.gssd"
}
Finally seem to have a viable coredump - looks like we have memory corruption:
(gdb) bt
#0 0x00007f9e03947207 in __GI_raise (sig=sig@entry=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
#1 0x00007f9e039488f8 in __GI_abort () at abort.c:90
#2 0x00007f9e03989d27 in __libc_message (do_abort=do_abort@entry=2,
fmt=fmt@entry=0x7f9e03a9b678 "*** Error in `%s': %s: 0x%s ***\n")
at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3 0x00007f9e03992489 in malloc_printerr (ar_ptr=0x7f9dfc000020, ptr=<optimized out>,
str=0x7f9e03a9b738 "double free or corruption (fasttop)", action=3) at malloc.c:5004
#4 _int_free (av=0x7f9dfc000020, p=<optimized out>, have_lock=0) at malloc.c:3843
#5 0x0000557c2be4acfb in gssd_get_single_krb5_cred (context=0x7f9dfc0045e0, kt=<optimized out>,
ple=ple@entry=0x7f9dfc005fa0, nocache=0) at krb5_util.c:427
#6 0x0000557c2be4afcc in gssd_refresh_krb5_machine_credential (
hostname=0x557c2c87da00 "csdisk4ib.cora.nwra.com", ple=0x7f9dfc005fa0, ple@entry=0x0,
service=service@entry=0x557c2c892410 "*") at krb5_util.c:1302
#7 0x0000557c2be48460 in krb5_use_machine_creds (clp=clp@entry=0x557c2c87de40, uid=uid@entry=0,
tgtname=tgtname@entry=0x0, service=service@entry=0x557c2c892410 "*",
rpc_clnt=rpc_clnt@entry=0x7f9e00f4acf0) at gssd_proc.c:546
#8 0x0000557c2be4868d in process_krb5_upcall (clp=clp@entry=0x557c2c87de40, uid=uid@entry=0,
fd=10, tgtname=tgtname@entry=0x0, service=service@entry=0x557c2c892410 "*") at gssd_proc.c:655
#9 0x0000557c2be48ed9 in handle_gssd_upcall (info=0x557c2c8923f0) at gssd_proc.c:814
#10 0x00007f9e03ce5dd5 in start_thread (arg=0x7f9e00f4b700) at pthread_create.c:307
#11 0x00007f9e03a0eead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) up 5
#5 0x0000557c2be4acfb in gssd_get_single_krb5_cred (context=0x7f9dfc0045e0, kt=<optimized out>,
ple=ple@entry=0x7f9dfc005fa0, nocache=0) at krb5_util.c:427
427 free(ple->ccname);
(gdb) list
422 cache_type,
423 ccachesearch[0], GSSD_DEFAULT_CRED_PREFIX,
424 GSSD_DEFAULT_MACHINE_CRED_SUFFIX, ple->realm);
425 ple->endtime = my_creds.times.endtime;
426 if (ple->ccname != NULL)
427 free(ple->ccname);
428 ple->ccname = strdup(cc_name);
429 if (ple->ccname == NULL) {
430 printerr(0, "ERROR: no storage to duplicate credentials "
431 "cache name '%s'\n", cc_name);
(gdb) print *ple
$1 = {next = 0x0, princ = 0x7f9dfc006460,
ccname = 0x7f9df4006060 "FILE:/tmp/krb5ccmachine_NWRA.COM", realm = 0x7f9dfc0061a0 "NWRA.COM",
endtime = 1549433693}
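The backtrace shows handle_gssd_upcall running under start_thread, so the free()/strdup() pair at krb5_util.c:426-428 executes in a worker thread. One possible explanation, which nothing in this report confirms, is that two upcall threads refresh the same entry at once and both free the same ccname. A minimal sketch of how that window could be closed, assuming that hypothesis holds; struct cred_entry, cred_lock and set_ccname are hypothetical names and are not taken from nfs-utils:

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup() under strict -std= modes */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the entry printed above; only ccname matters. */
struct cred_entry {
    char *ccname;
};

/* Hypothetical lock: a single global mutex keeps the sketch short. */
static pthread_mutex_t cred_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Replace ple->ccname with a copy of cc_name.
 * The copy is made before the lock is taken, and the old pointer is freed
 * only after it has been detached from the entry, so two threads calling
 * this function can never free the same pointer.
 * Returns 0 on success, -1 if strdup() fails (the old name is kept).
 */
static int set_ccname(struct cred_entry *ple, const char *cc_name)
{
    char *fresh, *stale;

    assert(ple != NULL && cc_name != NULL);

    fresh = strdup(cc_name);
    if (fresh == NULL)
        return -1;

    pthread_mutex_lock(&cred_lock);
    stale = ple->ccname;
    ple->ccname = fresh;
    pthread_mutex_unlock(&cred_lock);

    free(stale);  /* free(NULL) is a no-op, so no NULL check is needed */
    return 0;
}
```

Any code that reads ccname from another thread would have to take the same lock, or copy the string while holding it. A run under valgrind would still be needed to tell whether the corruption actually comes from this path.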
Robby, I seem to recall some recent fixes with ccaches and double frees. Can you take a look at this one and see if it is related?

Unless you're using a MEMORY ccache, it wouldn't be related to all that. (And that stuff only matters for the case of manipulating multiple handles to the same one anyway.) But if you wanted to be sure, you can try krb5-1.15.1-37 (7.6.z). Unfortunately, corruption issues are going to be nigh-impossible to debug without a trace from under valgrind (with debug symbols installed).

Uhm, looking more closely at the backtrace, this is not a libkrb5 call; this is still pure gssd code. Steve, sounds like this is in your court.

Red Hat Enterprise Linux 7 shipped its final minor release on September 29th, 2020. 7.9 was the last minor release scheduled for RHEL 7.

From initial triage it does not appear the remaining Bugzillas meet the inclusion criteria for Maintenance Phase 2, and they will now be closed.

From the RHEL life cycle page:
https://access.redhat.com/support/policy/updates/errata#Maintenance_Support_2_Phase
"During Maintenance Support 2 Phase for Red Hat Enterprise Linux version 7, Red Hat defined Critical and Important impact Security Advisories (RHSAs) and selected (at Red Hat discretion) Urgent Priority Bug Fix Advisories (RHBAs) may be released as they become available."

If this BZ was closed in error and meets the above criteria, please re-open it, flag it for 7.9.z, provide suitable business and technical justifications, and follow the process for Accelerated Fixes:
https://source.redhat.com/groups/public/pnt-cxno/pnt_customer_experience_and_operations_wiki/support_delivery_accelerated_fix_release_handbook

Feature Requests can be re-opened and moved to RHEL 8 if the desired functionality is not already present in the product.

Please reach out to the applicable Product Experience Engineer[0] if you have any questions or concerns.

[0] https://bugzilla.redhat.com/page.cgi?id=agile_component_mapping.html&product=Red+Hat+Enterprise+Linux+7