Bug 1001987 - pacemaker tries to start a resource too often
Summary: pacemaker tries to start a resource too often
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: pacemaker
Version: 18
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Andrew Beekhof
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2013-08-28 09:20 UTC by lav
Modified: 2014-01-14 05:47 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-01-14 05:47:01 UTC
Type: Bug


Attachments
/var/log/messages (169.22 KB, text/plain)
2013-08-28 11:36 UTC, lav

Description lav 2013-08-28 09:20:08 UTC
Description of problem:
When a resource fails to start, pacemaker repeatedly tries to start it again without any delay. For a service: resource, systemd then refuses even to attempt the start because its start rate limit has been hit.


Version-Release number of selected component (if applicable):
pacemaker-1.1.9-0.1.70ad9fa.git.fc18.i686


How reproducible:
always

Steps to Reproduce:
1. Create a service: resource that fails to start (in my case it was service:named with a typo in its config) with op monitor interval=30s.
2. Try to manage it (pcs resource manage named).
3. Observe lots of messages in /var/log/messages; systemctl shows that it refuses to start named.service because of the rate limit.
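
The steps above can be sketched as shell commands. This is only an illustration: the `pcs resource create` syntax is an assumption (the report does not show how the resource was created), and `run` is a hypothetical dry-run helper that prints each command unless RUN=1 is set, so the sketch can be read without touching a cluster.

```shell
# Hypothetical dry-run helper: prints the command; set RUN=1 to execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Step 1 (assumed syntax): a service: resource with a 30s recurring monitor.
run pcs resource create named service:named op monitor interval=30s

# Step 2: hand the resource to the cluster, as in the report.
run pcs resource manage named

# Step 3: check what systemd says about the unit.
run systemctl status named.service
```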

Actual results:
repeated attempts to start the resource without any delay

Expected results:
I believe it should delay additional attempts to start a resource.

Additional info:

Comment 1 Andrew Beekhof 2013-08-28 09:44:09 UTC
At the very least we need logs. Even better would be a crm_report archive.

Comment 2 lav 2013-08-28 11:36:58 UTC
Created attachment 791337
/var/log/messages

I have fixed my problem by changing "monitor interval=30s" to "monitor interval=30s start-delay=10s timeout=20s".
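
A minimal sketch of how that monitor change might be applied with pcs. The report does not say which command was actually used, so the `pcs resource update` invocation is an assumption, as is the hypothetical `run` dry-run helper, which only prints the command unless RUN=1 is set.

```shell
# Hypothetical dry-run helper: prints the command; set RUN=1 to execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

RESOURCE=named   # resource id from this report

# Replace the plain 30s monitor with one that waits 10s before its first run.
run pcs resource update "$RESOURCE" op monitor interval=30s start-delay=10s timeout=20s
```

The start-delay postpones the first monitor after each start, which presumably spaces the recovery attempts far enough apart to stay under systemd's start rate limit.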

Anyway, here is the log.

Comment 3 Andrew Beekhof 2013-11-13 06:06:47 UTC
There's not a lot pacemaker can do here.

systemd claims that the start completed without error (clearly untrue). The recurring monitor then finds the resource stopped, so we try to recover it.

We do supply a migration-threshold option though. This would cause the cluster to give up after the indicated number of failures.
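
As a rough sketch, migration-threshold could be set as a resource meta attribute with pcs. The threshold value of 3 is an arbitrary assumption, and `run` is a hypothetical dry-run helper that prints each command unless RUN=1 is set.

```shell
# Hypothetical dry-run helper: prints the command; set RUN=1 to execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

RESOURCE=named   # resource id from this report
THRESHOLD=3      # assumed value; choose what suits the cluster

# Give up on a node once the resource has failed THRESHOLD times there.
run pcs resource meta "$RESOURCE" migration-threshold="$THRESHOLD"

# Inspect the recorded failures.
run pcs resource failcount show "$RESOURCE"

# After fixing the underlying problem, clear the failure history.
run pcs resource cleanup "$RESOURCE"
```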

Unless you object, I'll close this for now.

Comment 4 Fedora End Of Life 2013-12-21 14:31:54 UTC
This message is a reminder that Fedora 18 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 18. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '18'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 18's end of life.

Thank you for reporting this issue and we are sorry that we may not be 
able to fix it before Fedora 18 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version prior to Fedora 18's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

