| Summary: | Failcount and related info should be reset or removed when the resource is deleted | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Jaroslav Kortus <jkortus> |
| Component: | pacemaker | Assignee: | Andrew Beekhof <abeekhof> |
| Status: | CLOSED ERRATA | QA Contact: | Cluster QE <mspqa-list> |
| Severity: | low | Docs Contact: | |
| Priority: | low | ||
| Version: | 6.2 | CC: | cluster-maint, dvossel |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | pacemaker-1.1.7-6.el6 | Doc Type: | Bug Fix |
| Doc Text: |
Cause: Un-tested use case
Consequence: Records of previous failures of now-deleted resources were preserved.
Fix: Implement new feature
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2012-06-20 13:48:47 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
|
Description
Jaroslav Kortus
2012-02-10 17:00:43 UTC
A related patch has been committed upstream: https://github.com/beekhof/pacemaker/commit/dbf1a62
Medium: PE: Bug rhbz#789397 - Failcount and related info should be reset or removed when the resource is deleted
A related patch has been committed upstream: https://github.com/beekhof/pacemaker/commit/c26e624
Low: PE: Bug rhbz#789397 - Failcount and related info should be reset or removed when the resource is deleted (regression test)
# make sure httpd is not installed
# crm configure primitive webserver ocf:heartbeat:apache params configfile=/etc/httpd/conf/httpd.conf op monitor interval=30s
$ crm_mon -1 --inactive
============
Last updated: Thu Apr 5 10:01:30 2012
Last change: Thu Apr 5 10:01:24 2012 via cibadmin on m3c1-node01
Stack: cman
Current DC: m3c1-node01 - partition with quorum
Version: 1.1.7-5.el6-148fccfd5985c5590cc601123c6c16e966b85d14
3 Nodes configured, unknown expected votes
2 Resources configured.
============
Online: [ m3c1-node03 m3c1-node01 m3c1-node02 ]
Full list of resources:
virt-fencing (stonith:fence_xvm): Started m3c1-node01
webserver (ocf::heartbeat:apache): Stopped
Failed actions:
webserver_start_0 (node=m3c1-node03, call=4, rc=5, status=complete): not installed
webserver_start_0 (node=m3c1-node02, call=4, rc=5, status=complete): not installed
webserver_start_0 (node=m3c1-node01, call=5, rc=5, status=complete): not installed
# failed as expected
# for all nodes the same:
$ crm resource failcount webserver show m3c1-node02
scope=status name=fail-count-webserver value=0
# so far so good
# crm resource stop webserver; crm configure delete webserver
# install httpd (yum -y install httpd)
# crm configure primitive webserver ocf:heartbeat:apache params configfile=/etc/httpd/conf/httpd.conf op monitor interval=30s
The result is the same, even though the reason given in the logs ('not installed') is no longer true:
pengine[10316]: notice: unpack_rsc_op: Preventing webserver from re-starting on m3c1-node03: operation start failed 'not installed' (rc=5)
So there are probably still some traces of the previous failure.
crm resource cleanup webserver fixes it.
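As a rough aid for spotting leftover failure records like the ones above, the following Python sketch parses the "Failed actions:" section of a saved `crm_mon -1 --inactive` output and prints the matching `crm resource cleanup` command for each affected resource. The function names, the regular expression, and the assumption that entries follow the `<resource>_<op>_<interval> (node=..., call=..., rc=..., status=...): <reason>` layout seen in this report are illustrative only; this is not part of Pacemaker or crmsh.

```python
import re

# Matches entries laid out like the ones in this report, e.g.
#   webserver_start_0 (node=m3c1-node03, call=4, rc=5, status=complete): not installed
# Assumption: the operation name has no underscore (so "migrate_to" is not handled).
FAILED_ACTION = re.compile(
    r"^\s*(?P<resource>\S+)_(?P<op>[a-z]+)_(?P<interval>\d+)\s+"
    r"\(node=(?P<node>[^,]+), call=(?P<call>\d+), rc=(?P<rc>\d+), "
    r"status=(?P<status>[^)]+)\):\s*(?P<reason>.*)$"
)


def failed_actions(crm_mon_text):
    """Return one dict per entry found after the 'Failed actions:' heading."""
    entries = []
    in_section = False
    for line in crm_mon_text.splitlines():
        if line.strip() == "Failed actions:":
            in_section = True
            continue
        if in_section:
            match = FAILED_ACTION.match(line)
            if match:
                entries.append(match.groupdict())
    return entries


def cleanup_commands(crm_mon_text):
    """Build one 'crm resource cleanup <name>' command per resource with failures."""
    resources = sorted({entry["resource"] for entry in failed_actions(crm_mon_text)})
    return ["crm resource cleanup " + name for name in resources]


# Sample text copied from the crm_mon output quoted in this report.
SAMPLE = """\
Full list of resources:
virt-fencing (stonith:fence_xvm): Started m3c1-node01
webserver (ocf::heartbeat:apache): Stopped
Failed actions:
webserver_start_0 (node=m3c1-node03, call=4, rc=5, status=complete): not installed
webserver_start_0 (node=m3c1-node02, call=4, rc=5, status=complete): not installed
webserver_start_0 (node=m3c1-node01, call=5, rc=5, status=complete): not installed
"""

if __name__ == "__main__":
    for command in cleanup_commands(SAMPLE):
        print(command)
```

The script only prints the commands rather than running them, so the list can be reviewed before anything is cleaned up.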
Moving back to ASSIGNED.
Forgot to mention the version: pacemaker-1.1.7-5.el6.x86_64
There was an issue fixed upstream recently that will resolve what you are running into. The crm stores historical data of the last failure for a resource. That data was not being cleared correctly, so the failure appeared to have returned. As you noticed, before this patch the only way to completely remove that historical data was to issue a crm_resource -C command. This upstream patch will fix this: https://github.com/ClusterLabs/pacemaker/commit/2f970a1a2a7c50c25ff06f2a7f87d66438bb0afb
Works as expected with pacemaker-1.1.7-6.el6.x86_64, thank you :)
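The stale-history behaviour discussed in this thread can be pictured with a small toy model. The sketch below is purely illustrative: `ToyCluster` and its methods are hypothetical names, and nothing here reflects how Pacemaker actually stores or clears operation history. It only shows why a recreated resource with the same name stays blocked when the configuration entry is deleted but the failure history is not.

```python
class ToyCluster:
    """Hypothetical model: configuration and failure history are kept separately."""

    def __init__(self):
        self.configured = set()
        self.history = {}  # (resource, node) -> reason of the last failed start

    def add(self, resource):
        self.configured.add(resource)

    def record_failure(self, resource, node, reason):
        self.history[(resource, node)] = reason

    def delete(self, resource, purge_history):
        """Remove the resource; optionally drop its failure history as well."""
        self.configured.discard(resource)
        if purge_history:
            self.history = {
                key: reason
                for key, reason in self.history.items()
                if key[0] != resource
            }

    def may_start(self, resource, node):
        """A start is allowed only if no failure is on record for this node."""
        return resource in self.configured and (resource, node) not in self.history


def recreate_after_failure(purge_history):
    """Replay the scenario from this report and say whether a restart is allowed."""
    cluster = ToyCluster()
    cluster.add("webserver")
    cluster.record_failure("webserver", "m3c1-node03", "not installed")
    cluster.delete("webserver", purge_history=purge_history)
    cluster.add("webserver")  # same name, now with httpd installed
    return cluster.may_start("webserver", "m3c1-node03")


if __name__ == "__main__":
    print("history kept on delete:", recreate_after_failure(purge_history=False))
    print("history purged on delete:", recreate_after_failure(purge_history=True))
```

In this model, keeping the history on delete corresponds to the behaviour reported against 1.1.7-5, and purging it corresponds to what the reporter observed with 1.1.7-6.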
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
Cause: Un-tested use case
Consequence: Records of previous failures of now-deleted resources were preserved.
Fix: Implement new feature
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2012-0846.html |