Bug 1300729 - Pacemaker failed to start a systemd resource: 'not running'
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pacemaker
Version: 7.2
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Ken Gaillot
QA Contact: cluster-qe@redhat.com
Reported: 2016-01-21 09:56 EST by Matti Linnanvuori
Modified: 2016-01-21 10:47 EST
Doc Type: Bug Fix
Last Closed: 2016-01-21 10:47:32 EST
Type: Bug


Description Matti Linnanvuori 2016-01-21 09:56:38 EST
Description of problem:

Pacemaker failed to start a systemd resource: 'not running'

Version-Release number of selected component (if applicable):

pacemaker 1.1.13-10.el7

How reproducible:

Not reproducible.

Steps to Reproduce:
1. Create a three-node cluster with a standby node.
2. Add a systemd resource.
3. Watch the resource.

Actual results:

The resource failed to start on an online node.

Expected results:

The resource should start on an online node.

Additional info:

sudo pcs status
Cluster name: MDCS
Last updated: Thu Jan 21 14:00:03 2016		Last change: Thu Jan 21 13:31:31 2016 by hacluster via crm_attribute on tauti
Stack: corosync
Current DC: tauti (version 1.1.13-10.el7-44eb2dd) - partition with quorum
3 nodes and 13 resources configured

Node tauti: standby
Node teema: standby
Online: [ tauko ]

Full list of resources:

 DMS-IP	(ocf::heartbeat:IPaddr2):	Started tauko
 Resource Group: DMS
     apache2	(systemd:httpd):	Started tauko
     DMS-GW	(lsb:dms):	Started tauko
 Resource Group: PMC
     pmc-routing	(systemd:pmc-routing):	Stopped
     pmc-email-amqp-dispatcher	(systemd:pmc-email-amqp-dispatcher):	Stopped
     pmc-email-main	(systemd:pmc-email-main):	Stopped
     pmc-smpp-receive-json	(systemd:pmc-smpp-receive-json):	Stopped
     pmc-smpp-receive-dlr	(systemd:pmc-smpp-receive-dlr):	Stopped
     pmc-smpp-receive-msg	(systemd:pmc-smpp-receive-msg):	Stopped
     postfix	(systemd:postfix):	Stopped
 Resource Group: kannel
     kannel-bearerbox	(systemd:kannel-bearerbox):	Started tauko
     kannel-smsbox	(systemd:kannel-smsbox):	Started tauko
     kannel-wapbox	(systemd:kannel-wapbox):	Started tauko

Failed Actions:
* pmc-routing_start_0 on tauko 'not running' (7): call=60, status=complete, exitreason='none',
    last-rc-change='Wed Jan 20 16:07:59 2016', queued=0ms, exec=2008ms


PCSD Status:
  tauko: Online
  tauti: Online
  teema: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

tauko /var/log/messages:
Jan 20 16:08:01 localhost crmd[1491]:  notice: Operation pmc-routing_start_0: not running (node=tauko, call=60, rc=7, cib-update=32, confirmed=true)

tauti /var/log/messages:
Jan 20 16:07:59 localhost crmd[2145]:  notice: Transition 6 (Complete=20, Pending=0, Fired=0, Skipped=3, Incomplete=25, Source=/var/lib/pacemaker/pengine/pe-error-162.bz2): Stopped
Jan 20 16:07:59 localhost pengine[2144]:  notice: Start   DMS-GW#011(tauko)
Jan 20 16:07:59 localhost pengine[2144]:  notice: Start   pmc-routing#011(tauko)
Jan 20 16:07:59 localhost pengine[2144]:  notice: Start   pmc-email-amqp-dispatcher#011(tauko)
Jan 20 16:07:59 localhost pengine[2144]:  notice: Start   pmc-email-main#011(tauko)
Jan 20 16:07:59 localhost pengine[2144]:  notice: Start   pmc-smpp-receive-json#011(tauko)
Jan 20 16:07:59 localhost pengine[2144]:  notice: Start   pmc-smpp-receive-dlr#011(tauko)
Jan 20 16:07:59 localhost pengine[2144]:  notice: Start   pmc-smpp-receive-msg#011(tauko)
Jan 20 16:07:59 localhost pengine[2144]:  notice: Start   postfix#011(tauko)
Jan 20 16:07:59 localhost pengine[2144]:  notice: Start   kannel-bearerbox#011(tauko)
Jan 20 16:07:59 localhost pengine[2144]:  notice: Start   kannel-smsbox#011(tauko)
Jan 20 16:07:59 localhost pengine[2144]:  notice: Start   kannel-wapbox#011(tauko)
Jan 20 16:07:59 localhost crmd[2145]:  notice: Initiating action 12: start DMS-GW_start_0 on tauko
Jan 20 16:07:59 localhost pengine[2144]:  notice: Calculated Transition 7: /var/lib/pacemaker/pengine/pe-input-268.bz2
Jan 20 16:07:59 localhost crmd[2145]:  notice: Initiating action 13: monitor DMS-GW_monitor_60000 on tauko
Jan 20 16:07:59 localhost crmd[2145]:  notice: Initiating action 18: start pmc-routing_start_0 on tauko
Jan 20 16:08:01 localhost crmd[2145]: warning: Action 18 (pmc-routing_start_0) on tauko failed (target: 0 vs. rc: 7): Error
Jan 20 16:08:01 localhost crmd[2145]:  notice: Transition aborted by pmc-routing_start_0 'modify' on tauko: Event failed (magic=0:7;18:7:0:da931aba-558d-4290-a05b-6f5971f308e0, cib=0.290.59, source=match_graph_event:381, 0)
Jan 20 16:08:01 localhost crmd[2145]: warning: Action 18 (pmc-routing_start_0) on tauko failed (target: 0 vs. rc: 7): Error
Jan 20 16:08:01 localhost crmd[2145]:  notice: Transition 7 (Complete=7, Pending=0, Fired=0, Skipped=1, Incomplete=21, Source=/var/lib/pacemaker/pengine/pe-input-268.bz2): Stopped
Jan 20 16:08:01 localhost pengine[2144]: warning: Processing failed op start for pmc-routing on tauko: not running (7)
Jan 20 16:08:01 localhost pengine[2144]: warning: Processing failed op start for pmc-routing on tauko: not running (7)
Jan 20 16:08:01 localhost pengine[2144]:  notice: Recover pmc-routing#011(Started tauko)
Jan 20 16:08:01 localhost pengine[2144]:  notice: Start   pmc-email-amqp-dispatcher#011(tauko)
Jan 20 16:08:01 localhost pengine[2144]:  notice: Start   pmc-email-main#011(tauko)
Jan 20 16:08:01 localhost pengine[2144]:  notice: Start   pmc-smpp-receive-json#011(tauko)
Jan 20 16:08:01 localhost pengine[2144]:  notice: Start   pmc-smpp-receive-dlr#011(tauko)
Jan 20 16:08:01 localhost pengine[2144]:  notice: Start   pmc-smpp-receive-msg#011(tauko)
Jan 20 16:08:01 localhost pengine[2144]:  notice: Start   postfix#011(tauko)
Jan 20 16:08:01 localhost pengine[2144]:  notice: Start   kannel-bearerbox#011(tauko)
Jan 20 16:08:01 localhost pengine[2144]:  notice: Start   kannel-smsbox#011(tauko)
Jan 20 16:08:01 localhost pengine[2144]:  notice: Start   kannel-wapbox#011(tauko)
Jan 20 16:08:01 localhost pengine[2144]:  notice: Calculated Transition 8: /var/lib/pacemaker/pengine/pe-input-269.bz2
Jan 20 16:08:01 localhost pengine[2144]: warning: Processing failed op start for pmc-routing on tauko: not running (7)
Jan 20 16:08:01 localhost pengine[2144]: warning: Processing failed op start for pmc-routing on tauko: not running (7)
Jan 20 16:08:01 localhost pengine[2144]: warning: Forcing pmc-routing away from tauko after 1000000 failures (max=1000000)
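
For reference, rc=7 in the crmd lines above is the standard OCF return code for "not running". The following is a minimal Python sketch, assuming the crmd log line format shown in this report, that pulls the resource, action, node, and return code out of such a line. The names OCF_RC_NAMES, OPERATION_RE, and parse_operation are illustrative only and are not part of Pacemaker.

```python
import re

# Standard OCF resource agent return codes (0 through 7).
OCF_RC_NAMES = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
}

# Assumed format, taken from the crmd line quoted in this report:
#   "... Operation pmc-routing_start_0: not running (node=tauko, call=60, rc=7, ...)"
OPERATION_RE = re.compile(
    r"Operation (?P<resource>\S+)_(?P<action>[a-z]+)_(?P<interval>\d+): "
    r".*?\(node=(?P<node>[^,]+), call=\d+, rc=(?P<rc>\d+)"
)


def parse_operation(line):
    """Return a dict describing the operation result, or None if no match."""
    match = OPERATION_RE.search(line)
    if match is None:
        return None
    rc = int(match.group("rc"))
    return {
        "resource": match.group("resource"),
        "action": match.group("action"),
        "node": match.group("node"),
        "rc": rc,
        "rc_name": OCF_RC_NAMES.get(rc, "unknown"),
    }


line = (
    "Jan 20 16:08:01 localhost crmd[1491]:  notice: Operation "
    "pmc-routing_start_0: not running (node=tauko, call=60, rc=7, "
    "cib-update=32, confirmed=true)"
)
print(parse_operation(line))
```

Run against the tauko log line, this reports a start action for pmc-routing on tauko that returned OCF_NOT_RUNNING, which matches the "Failed Actions" entry in the pcs status output.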
Comment 2 John Ruemker 2016-01-21 10:47:32 EST
(In reply to Matti Linnanvuori from comment #0)
> Description of problem:
> 
> Pacemaker failed to start a systemd resource: 'not running'
> 

Hello,
The problem you described requires further investigation to identify the true cause and any available means of resolving it. That is best done in a support case rather than here in Bugzilla, which is intended for reporting bugs and undesired behaviors in the product itself.

I would like to request that you please engage Red Hat Global Support Services through one of the methods described at:

  https://access.redhat.com/start/how-to-engage-red-hat-support

From there, we'll collect some additional information from you and take a closer look at the specifics of this incident to help you resolve the underlying problem.

In an attempt to give you some guidance in the meantime: the problem you described doesn't look like anything unexpected from pacemaker itself.  It attempted to start the pmc-routing systemd service and encountered a failure.  By default, the cluster property start-failure-is-fatal is set to true, which causes a single start failure to ban the local node from further attempts to start that resource, so the cluster tries another node instead.  Since both of the other nodes are in standby, there is nowhere else to start the resource, so the cluster gives up.
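
The placement reasoning above can be illustrated with a toy Python model. This is only a sketch under simplifying assumptions, not Pacemaker's actual scheduler: the function names, the dictionaries, and the way the threshold is compared are all hypothetical. The value 1000000 is taken from the "after 1000000 failures (max=1000000)" log line in the description.

```python
# Toy model of the placement decision; NOT Pacemaker's real scheduler.
# All names here are illustrative.

INFINITY = 1_000_000  # the log reports "1000000 failures (max=1000000)"


def fail_count_after_start_failure(start_failure_is_fatal, previous=0):
    """A fatal start failure pushes the fail count straight to INFINITY."""
    return INFINITY if start_failure_is_fatal else previous + 1


def eligible_nodes(node_states, fail_counts, migration_threshold=INFINITY):
    """Return nodes that are online and not banned by their fail count."""
    return sorted(
        node
        for node, state in node_states.items()
        if state == "online" and fail_counts.get(node, 0) < migration_threshold
    )


# Cluster state taken from the pcs status output in this report.
node_states = {"tauko": "online", "tauti": "standby", "teema": "standby"}
fail_counts = {"tauko": fail_count_after_start_failure(True)}

# tauko is banned and the other nodes are in standby: nowhere to run.
print(eligible_nodes(node_states, fail_counts))

# Hypothetically, if tauti were taken out of standby, it would be a candidate.
node_states["tauti"] = "online"
print(eligible_nodes(node_states, fail_counts))
```

In this model the first call yields an empty list, which corresponds to pmc-routing staying Stopped, and the second yields only tauti.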

The real problem here is that a systemd start of pmc-routing failed.  We'll need to look more closely at why that is and address it.  This is what we can help you with in a support case.

Since there doesn't appear to be anything in this data suggesting that there is a problem in need of a fix in pacemaker, I'm going to close this out.  If we discover in the course of our investigation through a support case that there is some unexpected behavior in one of our products, then Red Hat Global Support Services will coordinate with the developers of that component to look into it and pursue any necessary fixes. 

Regards,
John Ruemker, RHCA
Principal Software Maintenance Engineer
Red Hat Global Support Services
