Hide Forgot
Description of problem: this is about an outdated version of certmonger-0.61-3.el6.x86_64 on RHEL 6.4 for a request to review a stack trace and explain what is happening. in this case, certmonger shuts down for a request that seem correct, Request ID '20160110091219': status: NEED_TO_SAVE_CERT see some gdb details below. in particular, could this be the same as the following fix on RHEL 7, either: * Mon Jan 27 2014 Nalin Dahyabhai <nalin> 0.71-1 ...snip... - only handle one watch at a time, which should avoid abort() during attempts to reconnect to the message bus after losing our connection to it (#1055521) or * Mon Dec 09 2013 Nalin Dahyabhai <nalin> 0.69-1 ...snip,,, - don't abort() due to incorrect locking when an attempt to save an issued certificate to the designated location fails (part of #1032760/#1033333, ticket #22) ( but bz 1032760 - Certmonger crashes when trying to decode certificate with invalid data in header says fixed in version certmonger-0.75.2-1.el6 while upstream ticket 22 indicates fix is in 0.69 and later, which version has the fix? ) https://fedorahosted.org/certmonger/ticket/22 Certmonger aborts after unsuccessful attempt to save certificate "Certmonger acquires writing lock in the NEED_TO_SAVE_CERT state but doesn't release it after unsucessful attempt to save the certificate before it goes to NEED_TO_SAVE_CERT again, causing it to abort itself: ...snip... Fixed upstream in 0.69 and later. The RHEL6-specific resolution of this is being tracked at https://bugzilla.redhat.com/show_bug.cgi?id=1032760, and since we no longer need this ticket to track it, I'm marking this one as fixed. " ? In which case the recommendation would be an update to a recent version of certmonger. I can provide access to a VM to read a core file collected from the customer's case number 01556025 (sev1 for multiple morphing issues) Thanks, M. Version-Release number of selected component (if applicable): RHEL 6.4 certmonger-0.61-3.el6.x86_64 How reproducible: N/A Steps to Reproduce: 1. N/A 2. 3. Actual results: certmonger SIGABRT Expected results: Additional info: information collected from customer's attachment 880-sosreport-puncmnldap01.01556025-20160111094416-ed89.tar.xz certmonger-0.61-3.el6.x86_64 redhat-release-server-6Server-6.4.0.4.el6.x86_64 880-sosreport-puncmnldap01.01556025-20160111094416-ed89.tar.xz/puncmnldap01-2016011109411452485482/var/log/messages ... Jan 10 14:42:18 puncmnldap01 certmonger: Certificate named "auditSigningCert cert-pki-ca" in token "NSS Certificate DB" in database "/var/lib/pki-ca/alias" is no longer valid. Jan 10 14:42:19 puncmnldap01 certmonger: Certificate named "ocspSigningCert cert-pki-ca" in token "NSS Certificate DB" in database "/var/lib/pki-ca/alias" is no longer valid. Jan 10 14:42:20 puncmnldap01 certmonger: Certificate named "subsystemCert cert-pki-ca" in token "NSS Certificate DB" in database "/var/lib/pki-ca/alias" is no longer valid. Jan 10 14:42:21 puncmnldap01 python: Exception trying to retrieve auditSigningCert cert-pki-ca: Error initializing principal host/puncmnldap01.accelyakale.com in /etc/krb5.keytab: (-1765328230, 'Cannot find KDC for requested realm') Jan 10 14:42:21 puncmnldap01 certmonger: #033[?1034h Jan 10 14:42:22 puncmnldap01 python: Exception trying to retrieve ocspSigningCert cert-pki-ca: Error initializing principal host/puncmnldap01.accelyakale.com in /etc/krb5.keytab: (-1765328230, 'Cannot find KDC for requested realm') Jan 10 14:42:22 puncmnldap01 certmonger: #033[?1034h Jan 10 14:42:22 puncmnldap01 python: Exception trying to retrieve subsystemCert cert-pki-ca: Error initializing principal host/puncmnldap01.accelyakale.com in /etc/krb5.keytab: (-1765328230, 'Cannot find KDC for requested realm') Jan 10 14:42:23 puncmnldap01 certmonger: #033[?1034h Jan 10 14:43:37 puncmnldap01 ntpd[25973]: synchronized to LOCAL(0), stratum 10 Jan 10 14:43:39 puncmnldap01 python: Exception trying to retrieve ocspSigningCert cert-pki-ca: Error initializing principal host/puncmnldap01.accelyakale.com in /etc/krb5.keytab: (-1765328230, 'Cannot find KDC for requested realm') Jan 10 14:43:39 puncmnldap01 certmonger: #033[?1034h Jan 10 14:43:39 puncmnldap01 python: Exception trying to retrieve auditSigningCert cert-pki-ca: Error initializing principal host/puncmnldap01.accelyakale.com in /etc/krb5.keytab: (-1765328230, 'Cannot find KDC for requested realm') Jan 10 14:43:39 puncmnldap01 certmonger: #033[?1034h Jan 10 14:43:40 puncmnldap01 python: Exception trying to retrieve subsystemCert cert-pki-ca: Error initializing principal host/puncmnldap01.accelyakale.com in /etc/krb5.keytab: (-1765328230, 'Cannot find KDC for requested realm') Jan 10 14:43:40 puncmnldap01 certmonger: #033[?1034h Jan 10 14:44:51 puncmnldap01 ns-slapd: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (Credentials cache file '/tmp/krb5cc_495' not found) ...snip... Jan 11 09:29:11 puncmnldap01 python: certmonger started pki-cad, nickname 'ocspSigningCert cert-pki-ca' Jan 11 09:29:11 puncmnldap01 certmonger: Certificate named "ocspSigningCert cert-pki-ca" in token "NSS Certificate DB" in database "/var/lib/pki-ca/alias" issued by CA and saved. Jan 11 09:29:12 puncmnldap01 named[9357]: zone accelyakale.com/IN: sending notifies (serial 1452484748) Jan 11 09:29:15 puncmnldap01 python: certmonger stopping pki-cad Jan 11 09:29:16 puncmnldap01 python: Cannot stop pki-cad: Command '/sbin/service pki-cad stop pki-ca' returned non-zero exit status 4 Jan 11 09:29:16 puncmnldap01 certmonger: Certificate named "ocspSigningCert cert-pki-ca" in token "NSS Certificate DB" in database "/var/lib/pki-ca/alias" is no longer valid. Jan 11 09:29:18 puncmnldap01 python: Updating certificate for ocspSigningCert cert-pki-ca ...snip... Jan 11 09:29:46 puncmnldap01 certmonger: Certificate named "auditSigningCert cert-pki-ca" in token "NSS Certificate DB" in database "/var/lib/pki-ca/alias" issued by CA but not saved. Jan 11 09:30:13 puncmnldap01 named[9357]: client 10.10.9.64#12977: received notify for zone 'accelyakale.com' Jan 11 09:30:15 puncmnldap01 named[9357]: zone accelyakale.com/IN: sending notifies (serial 1452484815) Jan 11 09:30:16 puncmnldap01 abrtd: Directory 'ccpp-2016-01-11-09:30:16-12680' creation detected Jan 11 09:30:16 puncmnldap01 abrt[14239]: Saved core dump of pid 12680 (/usr/sbin/certmonger) to /var/spool/abrt/ccpp-2016-01-11-09:30:16-12680 (1064960 bytes) ...snip... Jan 11 10:39:59 puncmnldap01 abrtd: Directory 'ccpp-2016-01-11-10:39:59-21333' creation detected Jan 11 10:39:59 puncmnldap01 abrt[23407]: Saved core dump of pid 21333 (/usr/sbin/certmonger) to /var/spool/abrt/ccpp-2016-01-11-10:39:59-21333 (1073152 bytes) I do not seem to see a relevant fix for this kind of crash per the certmonger rpm changelog from RHEL 6.7: on rhel67 rpm -q certmonger certmonger-0.77.5-1.el6.x86_64 rpm -q certmonger --changelog | less * Thu May 28 2015 Nalin Dahyabhai <nalin> 0.77.5-1 ... * Fri Feb 27 2015 Nalin Dahyabhai <nalin> 0.77.1-1 - update to 0.77 - ipa-getcert: fix a crash when using DNS discovery to locate servers (#39) ... * Tue Jan 08 2013 Nalin Dahyabhai <nalin> 0.65-1 - fix a crash in the self-tests ... * Thu Nov 29 2012 Nalin Dahyabhai <nalin> 0.62-1 - don't crash when OpenSSL can't build a template certificate from a request when we're in FIPS mode ... * Tue Nov 27 2012 Nalin Dahyabhai <nalin> 0.61-3 - backport change from git to not choke if X509_REQ_to_X509() fails when we're self-signing using OpenSSL - backport another change from git to represent this as a CA-rejected error and reading from attachment 10-core.tar.gz Core was generated by `/usr/sbin/certmonger -S -p /var/run/certmonger.pid'. Program terminated with signal 6, Aborted. #0 0x00000031d02328a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 64 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig); Thread 1 (Thread 0x7f4d38f16800 (LWP 21333)): #0 0x00000031d02328a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 resultvar = 0 pid = <value optimized out> selftid = 21333 #1 0x00000031d0234085 in abort () at abort.c:92 save_stage = 2 act = {__sigaction_handler = {sa_handler = 0x436, sa_sigaction = 0x436}, sa_mask = {__val = {1079, 1168, 213945652101, 0, 1159, 213948886760, 4305167, 6, 2, 0, 160, 0, 0, 0, 214748364808, 532575944795}}, sa_flags = 119, sa_restorer = 0xb} sigs = {__val = {32, 0 <repeats 15 times>}} #2 0x00000000004096a0 in cm_writing_lock (entry=0x225a490) at iterate.c:256 No locals. #3 0x000000000040a2e0 in cm_iterate (entry=0x225a490, ca=0x2265f50, context=0x2255590, get_ca_by_index=0x407b50 <cm_get_ca_by_index>, get_n_cas=0x407b70 <cm_get_n_cas>, emit_entry_saved_cert=0x41bbd0 <cm_tdbush_property_emit_entry_saved_cert>, emit_entry_changes=0x41bc90 <cm_tdbush_property_emit_entry_changes>, cm_iterate_state=0x22576f0, when=0x7fff17b3c204, delay=0x7fff17b3c20c, readfd=0x7fff17b3c208) at iterate.c:822 i = <value optimized out> j = <value optimized out> remaining = <value optimized out> state = 0x22576f0 tmp_ca = <value optimized out> old_entry = 0x226af50 serial = <value optimized out> tmp_ca_name = <value optimized out> #4 0x000000000040895f in cm_service_one (context=0x2255590, i=2, current_time=0x0) at cm.c:322 ret = <value optimized out> delay = <value optimized out> fd = -1 now = {tv_sec = 1452488999, tv_usec = 753394} then = {tv_sec = 140733591044608, tv_usec = 140733591845052} when = <value optimized out> t = <value optimized out> #5 0x0000000000408d69 in cm_timer_h (ec=<value optimized out>, te=<value optimized out>, current_time=..., pvt=0x2255590) at cm.c:146 context = 0x2255590 i = <value optimized out> #6 0x00000031d8207bd9 in tevent_common_loop_timer_delay (ev=0x2255450) at ../tevent_timed.c:254 current_time = {tv_sec = 1452488999, tv_usec = 753389} te = 0x2273760 #7 0x00000031d8207490 in epoll_event_loop (ev=<value optimized out>, location=<value optimized out>) at ../tevent_standard.c:298 ret = 0 i = <value optimized out> events = {{events = 16, data = {ptr = 0x22738c0, fd = 36124864, u32 = 36124864, u64 = 36124864}}} timeout = <value optimized out> #8 std_event_loop_once (ev=<value optimized out>, location=<value optimized out>) at ../tevent_standard.c:567 std_ev = 0x2255520 tval = {tv_sec = 2, tv_usec = 15464} #9 0x00000031d82038f0 in _tevent_loop_once (ev=0x2255450, location=0x42ab1c "main.c:287") at ../tevent.c:507 ret = <value optimized out> nesting_stack_ptr = 0x0 #10 0x00000000004075bd in main (argc=<value optimized out>, argv=0x7fff17b3c4d8) at main.c:287 ...snip... (gdb) where #0 0x00000031d02328a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #1 0x00000031d0234085 in abort () at abort.c:92 #2 0x00000000004096a0 in cm_writing_lock (entry=0x225a490) at iterate.c:256 #3 0x000000000040a2e0 in cm_iterate (entry=0x225a490, ca=0x2265f50, context=0x2255590, get_ca_by_index=0x407b50 <cm_get_ca_by_index>, get_n_cas=0x407b70 <cm_get_n_cas>, emit_entry_saved_cert=0x41bbd0 <cm_tdbush_property_emit_entry_saved_cert>, emit_entry_changes=0x41bc90 <cm_tdbush_property_emit_entry_changes>, cm_iterate_state=0x22576f0, when=0x7fff17b3c204, delay=0x7fff17b3c20c, readfd=0x7fff17b3c208) at iterate.c:822 #4 0x000000000040895f in cm_service_one (context=0x2255590, i=2, current_time=0x0) at cm.c:322 #5 0x0000000000408d69 in cm_timer_h (ec=<value optimized out>, te=<value optimized out>, current_time=..., pvt=0x2255590) at cm.c:146 #6 0x00000031d8207bd9 in tevent_common_loop_timer_delay (ev=0x2255450) at ../tevent_timed.c:254 #7 0x00000031d8207490 in epoll_event_loop (ev=<value optimized out>, location=<value optimized out>) at ../tevent_standard.c:298 #8 std_event_loop_once (ev=<value optimized out>, location=<value optimized out>) at ../tevent_standard.c:567 #9 0x00000031d82038f0 in _tevent_loop_once (ev=0x2255450, location=0x42ab1c "main.c:287") at ../tevent.c:507 #10 0x00000000004075bd in main (argc=<value optimized out>, argv=0x7fff17b3c4d8) at main.c:287 (gdb) (gdb) up #1 0x00000031d0234085 in abort () at abort.c:92 92 raise (SIGABRT); (gdb) up #2 0x00000000004096a0 in cm_writing_lock (entry=0x225a490) at iterate.c:256 256 abort(); (gdb) list 251 if (writing_lock == NULL) { 252 cm_log(3, "%s('%s') taking writing lock\n", 253 entry->cm_busname, entry->cm_nickname); 254 writing_lock = entry; 255 } else { 256 abort(); 257 } 258 return TRUE; 259 } else { 260 return FALSE; (gdb) │247 static dbus_bool_t │ │248 cm_writing_lock(struct cm_store_entry *entry) │ │249 { │ │250 if ((writing_lock == entry) || (writing_lock == NULL)) { │ │251 if (writing_lock == NULL) { │ │252 cm_log(3, "%s('%s') taking writing lock\n", │ │253 entry->cm_busname, entry->cm_nickname); │ │254 writing_lock = entry; │ │255 } else { │ >│256 abort(); │ │257 } │ │258 return TRUE; │ │259 } else { │ │260 return FALSE; │ │261 } │ │262 } │ (gdb) p writing_lock $1 = (struct cm_store_entry *) 0x225a490 (gdb) (gdb) p entry $2 = (struct cm_store_entry *) 0x225a490 (gdb) (gdb) p * writing_lock $3 = {cm_busname = 0x225c4f0 "Request3", cm_store_private = 0x225c560, cm_nickname = 0x2258ef0 "20160110091219", cm_key_type = {cm_key_algorithm = cm_key_rsa, cm_key_gen_algorithm = cm_key_rsa, cm_key_size = 2048, cm_key_gen_size = 2048}, cm_key_storage_type = cm_key_storage_nssdb, cm_key_storage_location = 0x225a700 "/var/lib/pki-ca/alias", cm_key_token = 0x2256ee0 "NSS Certificate DB", cm_key_nickname = 0x225a130 "ocspSigningCert cert-pki-ca", cm_key_pin = 0x2258320 "501310396051", cm_key_pin_file = 0x0, cm_key_pubkey = 0x225a8b0 "3082010A0282010100E004C44FD3C01B34C5137DA7D0C555695B2323935594251DC1758E53183B77CFDC19081395AA9738DB39B30349111A85EC15DB883F6A82110573340776CB4917797795D0A7B4476702F5CDC715989094E0D915FEE66BE34E7A7C6B"..., cm_cert_storage_type = cm_cert_storage_nssdb, cm_cert_storage_location = 0x225a690 "/var/lib/pki-ca/alias", cm_cert_token = 0x2257860 "NSS Certificate DB", cm_cert_nickname = 0x2258390 "ocspSigningCert cert-pki-ca", cm_cert_issuer = 0x2259390 "CN=Certificate Authority,O=ACCELYAKALE.COM", cm_cert_serial = 0x225abd0 "02", cm_cert_subject = 0x225ac30 "CN=OCSP Subsystem,O=ACCELYAKALE.COM", cm_cert_spki = 0x225ae10 "30820122300d06092a864886f70d01010105000382010f003082010a0282010100e004c44fd3c01b34c5137da7d0c555695b2323935594251dc1758e53183b77cfdc19081395aa9738db39b30349111a85ec15db883f6a82110573340776cb4917797795"..., cm_cert_not_before = 1378736462, cm_cert_not_after = 1440944462, cm_cert_hostname = 0x225acb0, cm_cert_email = 0x225ab40, cm_cert_principal = 0x2257790, cm_cert_ku = 0x2257800 "1100011", cm_cert_eku = 0x225ad20 "1.3.6.1.5.5.7.3.9", cm_cert_profile = 0x0, cm_last_need_notify_check = 1452488898, cm_last_need_enroll_check = 1452488898, cm_notification_method = cm_notification_unspecified, cm_notification_destination = 0x0, cm_template_subject = 0x225b1f0 "CN=OCSP Subsystem,O=ACCELYAKALE.COM", cm_template_hostname = 0x0, cm_template_email = 0x0, cm_template_principal = 0x0, cm_template_ku = 0x225b280 "1100011", cm_template_eku = 0x225ad90 "1.3.6.1.5.5.7.3.9", cm_challenge_password = 0x0, cm_csr = 0x2271960 "-----BEGIN NEW CERTIFICATE REQUEST-----\nMIIC3TCCAcUCAQAwMzEYMBYGA1UEChMPQUNDRUxZQUtBTEUuQ09NMRcwFQYDVQQD\nEw5PQ1NQIFN1YnN5c3RlbTCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEB\nAOAExE/TwBs0xRN9p9DFVWlbIyOTVZ"..., cm_spkac = 0x225b430 "MIICQDCCASgwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQDgBMRP08AbNMUTfafQxVVpWyMjk1WUJR3BdY5TGDt3z9wZCBOVqpc42zmzA0kRGoXsFduIP2qCEQVzNAd2y0kXeXeV0Ke0R2cC9c3HFZiQlODZFf7ma+NOenxrLt6L5ygBvWqmFD+6qyMO6Tgj"..., cm_state = CM_NEED_TO_SAVE_CERT, cm_autorenew = 1, cm_monitor = 1, cm_ca_nickname = 0x225b360 "dogtag-ipa-retrieve-agent-submit", cm_ca_profile = 0x225b8d0 "ocspSigningCert cert-pki-ca", cm_submitted = 1452488901, cm_ca_cookie = 0x0, cm_ca_error = 0x0, cm_cert = 0x2269df0 "\033[?1034h-----BEGIN CERTIFICATE-----\nMIIDgDCCAmigAwIBAgICAOIwDQYJKoZIhvcNAQELBQAwOjEYMBYGA1UEChMPQUND\nRUxZQUtBTEUuQ09NMR4wHAYDVQQDExVDZXJ0aWZpY2F0ZSBBdXRob3JpdHkwHhcN\nMTUwODI5MDMwOTU4WhcNMTcwODE4MDMwOT"..., cm_pre_certsave_command = 0x225be10 "/usr/lib64/ipa/certmonger/stop_pkicad", cm_pre_certsave_uid = 0x225beb0 "0", cm_post_certsave_command = 0x225c690 "/usr/lib64/ipa/certmonger/restart_pkicad \"ocspSigningCert cert-pki-ca\"", cm_post_certsave_uid = 0x225bf20 "0"} (gdb) this matches the third record from output of getcert list ... Request ID '20160110091219': status: NEED_TO_SAVE_CERT stuck: no key pair storage: type=NSSDB,location='/var/lib/pki-ca/alias',nickname='ocspSigningCert cert-pki-ca',token='NSS Certificate DB',pin='501310396051' certificate: type=NSSDB,location='/var/lib/pki-ca/alias',nickname='ocspSigningCert cert-pki-ca',token='NSS Certificate DB' CA: dogtag-ipa-retrieve-agent-submit issuer: CN=Certificate Authority,O=ACCELYAKALE.COM subject: CN=OCSP Subsystem,O=ACCELYAKALE.COM expires: 2015-08-30 14:21:02 UTC eku: id-kp-OCSPSigning pre-save command: /usr/lib64/ipa/certmonger/stop_pkicad post-save command: /usr/lib64/ipa/certmonger/restart_pkicad "ocspSigningCert cert-pki-ca" track: yes auto-renew: yes gdb) up #3 0x000000000040a2e0 in cm_iterate (entry=0x225a490, ca=0x2265f50, context=0x2255590, get_ca_by_index=0x407b50 <cm_get_ca_by_index>, get_n_cas=0x407b70 <cm_get_n_cas>, emit_entry_saved_cert=0x41bbd0 <cm_tdbush_property_emit_entry_saved_cert>, emit_entry_changes=0x41bc90 <cm_tdbush_property_emit_entry_changes>, cm_iterate_state=0x22576f0, when=0x7fff17b3c204, delay=0x7fff17b3c20c, readfd=0x7fff17b3c208) at iterate.c:822 (gdb) │821 case CM_NEED_TO_SAVE_CERT: >│822 if (!cm_writing_lock(entry)) { │823 /* Just hang out in this state while we're messing │824 * around with the outside world for another entry. */ │825 cm_log(3, "%s('%s') waiting for saving lock\n", │826 entry->cm_busname, entry->cm_nickname); │827 *when = cm_time_soon; │828 break; │829 }