Bug 1499217
Summary: Cleanup of bundle resource is incomplete

Product: Red Hat Enterprise Linux 7
Reporter: Damien Ciabrini <dciabrin>
Component: pacemaker
Assignee: Andrew Beekhof <abeekhof>
Status: CLOSED ERRATA
QA Contact: pkomarov
Severity: urgent
Priority: urgent
Version: 7.4
CC: abeekhof, aherr, ahrechan, chjones, cluster-maint, kgaillot, mkrcmari, ohochman, ushkalim
Target Milestone: rc
Keywords: Triaged, ZStream
Target Release: 7.5
Hardware: Unspecified
OS: Unspecified
Fixed In Version: pacemaker-1.1.18-4.el7
Doc Type: No Doc Update
Doc Text:
Previously, the "pcs resource cleanup" command ignored stopped child clone resources of a bundle. Consequently, it was not possible to erase the state of the resources. With this update, Pacemaker now recognizes stopped clone resources. As a result, the pcs tool now works correctly with bundles when cleaning up.
Clones: 1509874, 1514520 (view as bug list)
Last Closed: 2018-04-10 15:32:51 UTC
Type: Bug
Bug Blocks: 1494455, 1509874, 1514520
Description (Damien Ciabrini, 2017-10-06 11:48:57 UTC)
Created attachment 1335243 [details]
CIB before pcs cleanup resource galera
Created attachment 1335244 [details]
CIB after pcs resource cleanup galera
Created attachment 1335245 [details]
output of pcs resource cleanup galera
Created attachment 1335281 [details]
galera configuration
Ken Gaillot (comment #6):

As with clones, the upstream recommendation is to always operate on the bundle resource, never on its primitive. I think pcs automatically translates this for clones, and it might be a good idea to do the same for bundles, too. But I agree this is an odd outcome worth looking into.

(In reply to Ken Gaillot from comment #6)
> As with clones, the upstream recommendation is to always operate on the
> bundle resource, never its primitive.

I don't buy this. crm_resource/pcs already escalates the request from the primitive to the clone automatically. The only difference here is that it doesn't go all the way up to the bundle.

Fixed by the following commits:

https://github.com/beekhof/pacemaker/commit/a6466923875cb752cb68ad412cfc8296191e62ac
https://github.com/beekhof/pacemaker/commit/b0ca9a11581e3ec62429e41899f76fe3afc8b294
https://github.com/beekhof/pacemaker/commit/c3d4ec0377a5e742a7aca5b129139f1ad970e4f7

*** Bug 1505909 has been marked as a duplicate of this bug. ***

Damien Ciabrini (comment #12):

As noted in https://bugzilla.redhat.com/show_bug.cgi?id=1505909, comment #7, I tested a scratch build with the provided patch, and I can now clean errors by running "pcs resource cleanup galera-bundle". I can also reprobe the state of an unmanaged resource.

However, I now face another issue: when I run "pcs resource manage galera-bundle" after the cleanup, a restart operation is triggered. This is unexpected and breaks the idiomatic workflow of reprobing the current state of a resource before giving control back to Pacemaker.

Created attachment 1349106 [details]
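The discussion above can be illustrated with a short sketch. This is not a verbatim reproducer from the report; it assumes a cluster running the galera bundle from the attached configuration, with the resource names used throughout this bug.

```shell
# Cleaning up via the inner primitive is what exposed the bug: before the
# fix, stopped child clone instances of the bundle were skipped, so the
# failed state was never fully erased from the CIB.
pcs resource cleanup galera

# The upstream recommendation (and the post-fix escalation behavior, where
# crm_resource walks up past the clone to the bundle) is to operate on the
# bundle resource itself:
pcs resource cleanup galera-bundle

# Roughly equivalent low-level invocation:
crm_resource --cleanup --resource galera-bundle
```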
crm_report of the unexpected restart
Attached crm_report of the unexpected restart:
Nov 07 21:01:15 ra1 crmd[5111]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Nov 07 21:01:15 ra1 pengine[5110]: notice: * Restart galera:2 ( Master galera-bundle-2 )
(In reply to Damien Ciabrini from comment #12)
> However, I now face another issue, in that when I "pcs resource manage
> galera-bundle" after the cleanup, a restart operation is triggered

To clarify, the scratch build is for the z-stream Bug 1509874. Will comment there.

Artem Hrechanychenko (comment #15):

Moving to POST, because the latest puddle (http://download.lab.bos.redhat.com/rcm-guest/puddles/OpenStack/12.0-RHEL-7/2017-11-16.4/) ships pacemaker-1.1.16-12.el7_4.4.x86_64.

(In reply to Artem Hrechanychenko from comment #15)
> Move to POST because in latest puddle -
> http://download.lab.bos.redhat.com/rcm-guest/puddles/OpenStack/12.0-RHEL-7/2017-11-16.4/
>
> pacemaker-1.1.16-12.el7_4.4.x86_64

Switching back to ON_QA, as this is a RHEL BZ. I'm cloning this bug to be verified on OSP12, as it blocks the replace-controller scenario.
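The "reprobe before handing control back" idiom that comment #12 relies on can be sketched as follows. The resource name and the sequence are taken from this report; the maintenance step in the middle is a placeholder.

```shell
# Take the bundle out of Pacemaker's control for manual maintenance:
pcs resource unmanage galera-bundle

# ... perform manual work on the galera servers here ...

# Erase stale failure state and trigger a reprobe of the current state:
pcs resource cleanup galera-bundle

# Hand control back; Pacemaker should accept the reprobed state as-is.
pcs resource manage galera-bundle

# The regression reported in comment #12 was that this last step instead
# scheduled an unexpected restart of galera:2 on galera-bundle-2.
```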
Resolved: the cluster retains active control after the galera node resumes its active status. After the Description steps, the galera resource is active on all nodes:

Full list of resources:
 galera-bundle-0 (ocf::heartbeat:galera): Master controller-0
 galera-bundle-1 (ocf::heartbeat:galera): Master controller-1
 galera-bundle-2 (ocf::heartbeat:galera): Master controller-2

Daemon Status:
 corosync: active/enabled
 pacemaker: active/enabled
 pcsd: active/enabled

And as indicated by the logs:

process_lrm_event: Result of monitor operation for galera-bundle-docker-0 on controller-0: 0 (ok)
remote_node_up: Announcing pacemaker_remote node galera-bundle-0
erase_status_tag: Deleting lrm status entries for galera-bundle-0 | xpath=//node_state[@uname='galera-bundle-0']/lrm
erase_status_tag: Deleting transient_attributes status entries for galera-bundle-0 | xpath=//node_state[@uname='galera-bundle-0']/transient_attributes
crm_update_peer_state_iter: Node galera-bundle-0 state is now member | nodeid=0 previous=lost source=remote_node_up
peer_update_callback: Remote node galera-bundle-0 is now member (was lost)
send_remote_state_message: Notifying DC controller-2 of pacemaker_remote node galera-bundle-0 coming up

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0860
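A verification pass like the one above could be reproduced with commands along these lines (a sketch, not the exact commands the verifier ran; it assumes the galera bundle names from this report):

```shell
# One-shot cluster status; all three bundle replicas should report Master:
pcs status resources

# Detailed one-shot view, including any failed resource actions:
crm_mon -1

# After a successful cleanup, no failcount should remain for the resource:
pcs resource failcount show galera
```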