Bug 1117151 - lrmd segfaults when 'service' resource class is used
Summary: lrmd segfaults when 'service' resource class is used
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: pacemaker
Version: 20
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Andrew Beekhof
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-07-08 07:31 UTC by Jan Provaznik
Modified: 2015-06-30 01:19 UTC (History)
CC: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-06-30 01:19:46 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
pacemaker crm_report output (1002.38 KB, application/gzip)
2014-07-08 11:33 UTC, Jan Provaznik

Description Jan Provaznik 2014-07-08 07:31:59 UTC
Description of problem:
When I add a resource with class 'service', the lrmd process segfaults. Error in the log:
kernel: lrmd[6787]: segfault at 0 ip 00007ffd1184d625 sp 00007fff3d777a00 error 4 in libcrmservice.so.1.0.0[7ffd11842000+10000]

I added this resource (which causes segfault):

/usr/sbin/cibadmin -o resources -C -X "
<primitive class=\"service\" id=\"haproxy\" type=\"haproxy\">
  <instance_attributes id=\"haproxy-instance_attributes\"/>
  <operations>
    <op id=\"haproxy-monitor-start-delay-10s\" interval=\"30s\" name=\"monitor\" start-delay=\"10s\"/>
  </operations>
</primitive>
"


When the 'systemd' resource class is used, the resource is added properly; this works:

/usr/sbin/cibadmin -o resources -C -X "
<primitive class=\"systemd\" id=\"haproxy\" type=\"haproxy\">
  <instance_attributes id=\"haproxy-instance_attributes\"/>
  <operations>
    <op id=\"haproxy-monitor-start-delay-10s\" interval=\"30s\" name=\"monitor\" start-delay=\"10s\"/>
  </operations>
</primitive>
"

Version-Release number of selected component (if applicable):
pacemaker-cli-1.1.11-1.fc20.x86_64
pacemaker-cluster-libs-1.1.11-1.fc20.x86_64
pacemaker-1.1.11-1.fc20.x86_64
pacemaker-libs-1.1.11-1.fc20.x86_64


How reproducible:
Always; I hit this issue in 100% of attempts when using the 'service' resource class.


Actual results:
lrmd segfaults

Expected results:
resource is properly added

Additional info:
log when adding a resource with 'service' class:

Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi cibadmin[8429]: notice: crm_log_args: Invoked: /usr/sbin/cibadmin -o resources -C -X 
                                                                   <primitive class="service" id="haproxy" type="haproxy">
                                                                     <instance_attributes id="haproxy-instance_attributes"/>
                                                                     <operations>                                                                  
                                                                       <op id="haproxy-monitor-start-delay-10s" interval="30s" name="monitor" start-delay="10s"/>
                                                                     </operations>
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi cib[6785]: notice: cib:diff: Diff: --- 0.11.5
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi cib[6785]: notice: cib:diff: Diff: +++ 0.12.1 64e44b6c40afbc659e02c0505daa2e7e
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi cib[6785]: notice: cib:diff: -- <cib admin_epoch="0" epoch="11" num_updates="5"/>
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi cib[6785]: notice: cib:diff: ++       <primitive class="service" id="haproxy" type="haproxy">
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi cib[6785]: notice: cib:diff: ++         <instance_attributes id="haproxy-instance_attributes"/>
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi cib[6785]: notice: cib:diff: ++         <operations>
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi cib[6785]: notice: cib:diff: ++           <op id="haproxy-monitor-start-delay-10s" interval="30s" name="monitor" start-delay="10s"/>
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi cib[6785]: notice: cib:diff: ++         </operations>
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi cib[6785]: notice: cib:diff: ++       </primitive>
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi pengine[6789]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi pengine[6789]: notice: LogActions: Start   haproxy	(overcloud-controller1-vcyt7vbtaun3)
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: notice: te_rsc_command: Initiating action 9: monitor haproxy_monitor_0 on overcloud-controller2-ymk6smqx37vw
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: notice: te_rsc_command: Initiating action 7: monitor haproxy_monitor_0 on overcloud-controller1-vcyt7vbtaun3
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: notice: te_rsc_command: Initiating action 5: monitor haproxy_monitor_0 on overcloud-controller0-sjepigxqnoqi (local)
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi pengine[6789]: notice: process_pe_message: Calculated Transition 139: /var/lib/pacemaker/pengine/pe-input-14.bz2
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi kernel: lrmd[6787]: segfault at 0 ip 00007ffd1184d625 sp 00007fff3d777a00 error 4 in libcrmservice.so.1.0.0[7ffd11842000+10000]
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: error: crm_ipc_read: Connection to lrmd failed
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: error: mainloop_gio_callback: Connection to lrmd[0x26c7ed0] closed (I/O condition=17)
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: crit: lrm_connection_destroy: LRM Connection failed
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: error: do_log: FSA: Input I_ERROR from lrm_connection_destroy() received in state S_TRANSITION_ENGINE
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: warning: do_state_transition: State transition S_TRANSITION_ENGINE -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=lrm_connection_destroy ]
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: warning: do_recover: Fast-tracking shutdown in response to errors
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: warning: do_election_vote: Not voting in election, we're in state S_RECOVERY
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: warning: destroy_action: Cancelling timer for action 9 (src=473)
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: warning: destroy_action: Cancelling timer for action 7 (src=474)
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: warning: destroy_action: Cancelling timer for action 5 (src=475)
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: error: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: error: lrm_state_verify_stopped: 1 pending LRM operations at shutdown
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: error: lrm_state_verify_stopped: Pending action: ceilometer-agent-central:11 (ceilometer-agent-central_monitor_30000)
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: error: lrm_state_verify_stopped: Pending action: haproxy:21 (haproxy_monitor_0)
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: notice: do_lrm_control: Disconnected from the LRM
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: notice: terminate_cs_connection: Disconnecting from Corosync
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: error: crmd_fast_exit: Could not recover from internal error
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi pacemakerd[6777]: error: child_death_dispatch: Managed process 6787 (lrmd) dumped core
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi pacemakerd[6777]: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=6787, core=1)
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi pacemakerd[6777]: notice: pcmk_process_exit: Respawning failed child process: lrmd
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi pacemakerd[6777]: error: pcmk_child_exit: Child process crmd (6790) exited: Generic Pacemaker error (201)
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi pacemakerd[6777]: notice: pcmk_process_exit: Respawning failed child process: crmd

Comment 1 Andrew Beekhof 2014-07-08 07:48:54 UTC
Can we get a crm_report for this please? Be sure to install the pacemaker debug packages first so that the stack trace it generates is useful.

Comment 2 Jan Provaznik 2014-07-08 11:33:07 UTC
Created attachment 916334 [details]
pacemaker crm_report output

Comment 3 Andrew Beekhof 2014-07-08 23:53:02 UTC
Looks like a pretty straightforward use-of-NULL:

#0  systemd_unit_by_name (arg_name=arg_name@entry=0x10c1fd0 "haproxy", out_unit=out_unit@entry=0x0) at systemd.c:145
145	    while(*out_unit == NULL) {
(gdb) p out_unit
$1 = (gchar **) 0x0

This was fixed upstream in:
   https://github.com/ClusterLabs/pacemaker/commit/0597697

We'll pick it up once 1.1.12-final is released (in the next few days).

Comment 4 Fedora End Of Life 2015-05-29 12:19:18 UTC
This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue, and we are sorry that we were not
able to fix it before Fedora 20 reached end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed, as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 5 Fedora End Of Life 2015-06-30 01:19:46 UTC
Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

