Description of problem:
When I add a resource with class 'service', the lrmd process segfaults. Error in the log:

kernel: lrmd[6787]: segfault at 0 ip 00007ffd1184d625 sp 00007fff3d777a00 error 4 in libcrmservice.so.1.0.0[7ffd11842000+10000]

I added this resource (which causes the segfault):

/usr/sbin/cibadmin -o resources -C -X "
<primitive class=\"service\" id=\"haproxy\" type=\"haproxy\">
  <instance_attributes id=\"haproxy-instance_attributes\"/>
  <operations>
    <op id=\"haproxy-monitor-start-delay-10s\" interval=\"30s\" name=\"monitor\" start-delay=\"10s\"/>
  </operations>
"

When the 'systemd' resource class is used instead, the resource is added properly. This works:

/usr/sbin/cibadmin -o resources -C -X "
<primitive class=\"systemd\" id=\"haproxy\" type=\"haproxy\">
  <instance_attributes id=\"haproxy-instance_attributes\"/>
  <operations>
    <op id=\"haproxy-monitor-start-delay-10s\" interval=\"30s\" name=\"monitor\" start-delay=\"10s\"/>
  </operations>
"

Version-Release number of selected component (if applicable):
pacemaker-cli-1.1.11-1.fc20.x86_64
pacemaker-cluster-libs-1.1.11-1.fc20.x86_64
pacemaker-1.1.11-1.fc20.x86_64
pacemaker-libs-1.1.11-1.fc20.x86_64

How reproducible:
I hit this issue in 100% of cases when using the 'service' resource class.
Actual results:
lrmd segfaults

Expected results:
resource is properly added

Additional info:
log when adding a resource with 'service' class:

Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi cibadmin[8429]: notice: crm_log_args: Invoked: /usr/sbin/cibadmin -o resources -C -X <primitive class="service" id="haproxy" type="haproxy"> <instance_attributes id="haproxy-instance_attributes"/> <operations> <op id="haproxy-monitor-start-delay-10s" interval="30s" name="monitor" start-delay="10s"/> </operations>
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi cib[6785]: notice: cib:diff: Diff: --- 0.11.5
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi cib[6785]: notice: cib:diff: Diff: +++ 0.12.1 64e44b6c40afbc659e02c0505daa2e7e
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi cib[6785]: notice: cib:diff: -- <cib admin_epoch="0" epoch="11" num_updates="5"/>
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi cib[6785]: notice: cib:diff: ++ <primitive class="service" id="haproxy" type="haproxy">
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi cib[6785]: notice: cib:diff: ++ <instance_attributes id="haproxy-instance_attributes"/>
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi cib[6785]: notice: cib:diff: ++ <operations>
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi cib[6785]: notice: cib:diff: ++ <op id="haproxy-monitor-start-delay-10s" interval="30s" name="monitor" start-delay="10s"/>
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi cib[6785]: notice: cib:diff: ++ </operations>
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi cib[6785]: notice: cib:diff: ++ </primitive>
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi pengine[6789]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi pengine[6789]: notice: LogActions: Start haproxy (overcloud-controller1-vcyt7vbtaun3)
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: notice: te_rsc_command: Initiating action 9: monitor haproxy_monitor_0 on overcloud-controller2-ymk6smqx37vw
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: notice: te_rsc_command: Initiating action 7: monitor haproxy_monitor_0 on overcloud-controller1-vcyt7vbtaun3
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: notice: te_rsc_command: Initiating action 5: monitor haproxy_monitor_0 on overcloud-controller0-sjepigxqnoqi (local)
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi pengine[6789]: notice: process_pe_message: Calculated Transition 139: /var/lib/pacemaker/pengine/pe-input-14.bz2
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi kernel: lrmd[6787]: segfault at 0 ip 00007ffd1184d625 sp 00007fff3d777a00 error 4 in libcrmservice.so.1.0.0[7ffd11842000+10000]
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: error: crm_ipc_read: Connection to lrmd failed
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: error: mainloop_gio_callback: Connection to lrmd[0x26c7ed0] closed (I/O condition=17)
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: crit: lrm_connection_destroy: LRM Connection failed
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: error: do_log: FSA: Input I_ERROR from lrm_connection_destroy() received in state S_TRANSITION_ENGINE
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: warning: do_state_transition: State transition S_TRANSITION_ENGINE -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL origin=lrm_connection_destroy ]
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: warning: do_recover: Fast-tracking shutdown in response to errors
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: warning: do_election_vote: Not voting in election, we're in state S_RECOVERY
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: warning: destroy_action: Cancelling timer for action 9 (src=473)
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: warning: destroy_action: Cancelling timer for action 7 (src=474)
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: warning: destroy_action: Cancelling timer for action 5 (src=475)
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: error: do_log: FSA: Input I_TERMINATE from do_recover() received in state S_RECOVERY
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: error: lrm_state_verify_stopped: 1 pending LRM operations at shutdown
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: error: lrm_state_verify_stopped: Pending action: ceilometer-agent-central:11 (ceilometer-agent-central_monitor_30000)
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: error: lrm_state_verify_stopped: Pending action: haproxy:21 (haproxy_monitor_0)
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: notice: do_lrm_control: Disconnected from the LRM
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: notice: terminate_cs_connection: Disconnecting from Corosync
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi crmd[6790]: error: crmd_fast_exit: Could not recover from internal error
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi pacemakerd[6777]: error: child_death_dispatch: Managed process 6787 (lrmd) dumped core
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi pacemakerd[6777]: notice: pcmk_child_exit: Child process lrmd terminated with signal 11 (pid=6787, core=1)
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi pacemakerd[6777]: notice: pcmk_process_exit: Respawning failed child process: lrmd
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi pacemakerd[6777]: error: pcmk_child_exit: Child process crmd (6790) exited: Generic Pacemaker error (201)
Jul 08 07:15:52 overcloud-controller0-sjepigxqnoqi pacemakerd[6777]: notice: pcmk_process_exit: Respawning failed child process: crmd
Can we get a crm_report for this, please? Be sure to install the pacemaker debug packages first so that the stack trace it generates is useful.
Created attachment 916334
pacemaker crm_report output
Looks like a pretty straightforward use-of-NULL:

#0 systemd_unit_by_name (arg_name=arg_name@entry=0x10c1fd0 "haproxy", out_unit=out_unit@entry=0x0) at systemd.c:145
145 while(*out_unit == NULL) {
(gdb) p out_unit
$1 = (gchar **) 0x0

This was fixed upstream in:
https://github.com/ClusterLabs/pacemaker/commit/0597697

We'll pick it up once 1.1.12-final is released (next few days).
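For anyone reading the trace later: the frame shows a caller passing NULL for out_unit while the callee dereferences it unconditionally. The upstream patch is not reproduced here. What follows is only a minimal sketch of one common way to guard an optional out-parameter in plain C. The function lookup_unit and its ".service" suffix logic are hypothetical stand-ins for illustration, not Pacemaker code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in (NOT the Pacemaker function): resolves a unit name
 * and optionally hands the result back through out_unit.
 * Returns 0 on success, -1 on error. */
int lookup_unit(const char *name, char **out_unit)
{
    char *local = NULL;

    if (name == NULL) {
        return -1;
    }

    /* Callers that only care whether the lookup succeeds may pass NULL.
     * Point out_unit at a local so every dereference below is valid. */
    if (out_unit == NULL) {
        out_unit = &local;
    }
    *out_unit = NULL;

    size_t len = strlen(name) + sizeof(".service");
    *out_unit = malloc(len);
    if (*out_unit == NULL) {
        return -1;
    }
    snprintf(*out_unit, len, "%s.service", name);

    /* If the caller passed NULL, the result landed in local and is released
     * here. Otherwise local is still NULL and free() does nothing. */
    free(local);
    return 0;
}

/* Small usage demo: exercises both calling conventions. */
int demo(void)
{
    char *unit = NULL;

    if (lookup_unit("haproxy", NULL) != 0) {
        return -1;
    }
    if (lookup_unit("haproxy", &unit) != 0) {
        return -1;
    }
    printf("%s\n", unit);
    free(unit);
    return 0;
}
```

The alternative is to reject a NULL out_unit with an early error return. Redirecting to a local keeps existence-only callers working without changing them.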
This message is a reminder that Fedora 20 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 20. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '20'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 20 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.