Bug 1103090

Summary: certmonger coredumps -- dbus related
Product: Red Hat Enterprise Linux 6 Reporter: Jan Pazdziora (Red Hat) <jpazdziora>
Component: certmonger Assignee: Nalin Dahyabhai <nalin>
Status: CLOSED ERRATA QA Contact: Kaleem <ksiddiqu>
Severity: unspecified Docs Contact:
Priority: medium    
Version: 6.5 CC: jpazdziora, kchamart, nalin, nsoman
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: certmonger-0.75.8-1.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-10-14 07:12:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1098208    
Bug Blocks:    

Description Jan Pazdziora (Red Hat) 2014-05-30 08:28:16 UTC
Description of problem:

I've recently seen certmonger abort with a core dump on multiple occasions.

Version-Release number of selected component (if applicable):

certmonger-0.61-3.el6.x86_64

How reproducible:

I cannot reproduce it deterministically on a minimal setup, but I've seen it multiple times on a large setup.

Steps to Reproduce:
1. Have certmonger running, do something.

Actual results:

Certmonger coredumps.

Expected results:

Certmonger does not coredump.

Additional info:

The backtrace is

Core was generated by `/usr/sbin/certmonger -S -p /var/run/certmonger.pid'.
Program terminated with signal 6, Aborted.
#0  0x00007fd94caac925 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007fd94caac925 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007fd94caae105 in abort () at abort.c:92
#2  0x00007fd94efa7af5 in _dbus_abort () at dbus-sysdeps.c:88
#3  0x00007fd94efa39a5 in _dbus_warn_check_failed (
    format=0x7fd94efb0c80 "arguments to %s() were incorrect, assertion \"%s\" failed in file %s line %d.\nThis is normally a bug in some application using the D-Bus library.\n") at dbus-internals.c:283
#4  0x00007fd94ef8aaa3 in dbus_connection_get_is_connected (connection=0x0) at dbus-connection.c:2876
#5  0x0000000000415a0a in cm_tdbus_reconnect (ec=0x9cb3b0, timer=<value optimized out>, current_time=..., pvt=0x9cd750) at tdbus.c:463
#6  0x00007fd94eb6fc91 in tevent_common_loop_timer_delay (ev=0x9cb3b0) at ../tevent_timed.c:341
#7  0x00007fd94eb70cbb in epoll_event_loop_once (ev=0x9cb3b0, location=<value optimized out>) at ../tevent_epoll.c:916
#8  0x00007fd94eb6f2e6 in std_event_loop_once (ev=0x9cb3b0, location=0x42ab1c "main.c:287") at ../tevent_standard.c:112
#9  0x00007fd94eb6b49d in _tevent_loop_once (ev=0x9cb3b0, location=0x42ab1c "main.c:287") at ../tevent.c:530
#10 0x00000000004075bd in main (argc=<value optimized out>, argv=0x7fff48661d88) at main.c:287
(gdb) quit

Comment 3 Nalin Dahyabhai 2014-05-30 15:40:41 UTC
This looks like bug #1055521, the fix for which will be pulled in as part of the rebase that we're doing for bug #1098208.

Comment 4 Jan Pazdziora (Red Hat) 2014-06-02 06:32:31 UTC
I made this one depend on bug 1098208.

Also, the build for bug 1055521 was in testing for some time; would it make sense to push it to stable updates so that it gets more exposure in Fedora 20?

Comment 7 Kaleem 2014-07-24 12:59:25 UTC
How can we verify this?
From the description, it seems that the crash was not consistent, and the scenario for reproducing it is unclear apart from the mention of a large setup.
So I think we will verify this as SanityOnly.

Comment 8 Nalin Dahyabhai 2014-07-24 15:21:10 UTC
(In reply to Kaleem from comment #7)
> How can we verify this?
> From the description, it seems that the crash was not consistent, and the
> scenario for reproducing it is unclear apart from the mention of a large setup.
> So I think we will verify this as SanityOnly.

One method for triggering this should be to start the certmonger service and then restart the messagebus service out from under it.  Restarting the message bus is not recommended in general, but certmonger's attempt to reconnect in case the outage is temporary shouldn't cause it to crash.
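
The trigger described above could be sketched as a small shell check. This is only a sketch under stated assumptions: the function name `survives_bus_restart` and the `SERVICE_CMD` and `SETTLE_SECONDS` variables are hypothetical and not part of certmonger or the init scripts, the `service` command is assumed to exit 0 from its `status` action when the daemon is running, and the five-second settle time is an arbitrary guess. Since restarting the message bus is not recommended in general, this is meant for a test machine only.

```shell
#!/bin/sh
# Sketch: does a daemon survive a restart of the system message bus?
# SERVICE_CMD and SETTLE_SECONDS are hypothetical knobs added for illustration.
SERVICE_CMD="${SERVICE_CMD:-service}"

survives_bus_restart() {
    daemon="$1"

    # The daemon must already be running before the bus is restarted.
    "$SERVICE_CMD" "$daemon" status >/dev/null 2>&1 || return 2

    # Restart the message bus out from under the daemon.
    "$SERVICE_CMD" messagebus restart >/dev/null 2>&1 || return 3

    # Give the daemon's reconnect attempt time to run (arbitrary delay).
    sleep "${SETTLE_SECONDS:-5}"

    # Exit status 0 here means the daemon is still reported as running.
    "$SERVICE_CMD" "$daemon" status >/dev/null 2>&1
}
```

Run as root, this might be used as `survives_bus_restart certmonger && echo "still running"`. A return status of 2 or 3 means the check itself could not be set up (daemon not running beforehand, or the bus restart failed), so only the final status says anything about the crash.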

Comment 10 Kaleem 2014-08-06 09:37:17 UTC
Verified.

certmonger version:
==================
[root@rhel66-master ~]# rpm -q certmonger
certmonger-0.75.9-1.el6.x86_64
[root@rhel66-master ~]#

[root@rhel66-master ~]# service certmonger status
certmonger (pid  3511) is running...
[root@rhel66-master ~]# service messagebus restart
Stopping system message bus:                               [  OK  ]
Starting system message bus:                               [  OK  ]
[root@rhel66-master ~]# service certmonger status
certmonger (pid  3511) is running...
[root@rhel66-master ~]#

Comment 11 errata-xmlrpc 2014-10-14 07:12:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1512.html