Bug 1652752: Master/Slave bundle resource does not failover Master state across replicas

Product: Red Hat Enterprise Linux 7
Component: pacemaker
Version: 7.6
Target Release: 7.7
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Keywords: Regression, ZStream
Reporter: Damien Ciabrini <dciabrin>
Assignee: Ken Gaillot <kgaillot>
QA Contact: pkomarov
CC: abeekhof, aherr, chjones, cluster-maint, mkrcmari, pkomarov, salmy
Fixed In Version: pacemaker-1.1.20-1.el7
Doc Type: Bug Fix
Doc Text: Previously, a clone notification scheduled for a Pacemaker Remote node or bundle that was disconnected sometimes blocked Pacemaker from all further cluster actions. With this update, notifications are scheduled correctly, and a notification on a disconnected remote connection does not prevent the cluster from taking further actions. As a result, the cluster continues to manage resources correctly.
Clones: 1654602 (view as bug list)
Bug Blocks: 1652613, 1654602, 1658631
Last Closed: 2019-08-06 12:53:44 UTC
Type: Bug
Description (Damien Ciabrini, 2018-11-22 21:57:15 UTC)

Created attachment 1508126 [details]: redis config
In both 7.5 and 7.6, notify actions are scheduled to run on a bundle node that has just been stopped, so the cluster node that held the bundle connection fakes the result of the notify. The regression in behavior is a side effect of unrelated bug fixes that improved fail-safe checking of faked results. In 7.5, the faked result was processed unconditionally in this situation. In 7.6, the cluster first checks whether the node has resource info for the result being faked; since the node was stopped, that info does not exist, and the faked result is not processed. The notify action is therefore lost, the transition is restarted, and the cluster loops doing the same thing again. Beekhof's patch addresses the underlying (and pre-existing) issue, so the result no longer needs to be faked.

Fixed in the upstream master branch by commit be5d23c1 (which will make it into RHEL 8), and backported to the upstream 1.1 branch as commit 32fac002 (which will make it into RHEL 7.7 as part of this bz).

Verified. Initial state:

```
(undercloud) [stack@undercloud-0 ~]$ ansible controller -m shell -b -a 'rpm -qa|grep pacemaker'
[WARNING]: Found both group and host with same name: undercloud
[WARNING]: Consider using the yum, dnf or zypper module rather than running rpm.
If you need to use command because yum, dnf or zypper is insufficient you can add
warn=False to this command task or set command_warnings=False in ansible.cfg to
get rid of this message.
controller-0 | SUCCESS | rc=0 >>
pacemaker-cli-1.1.20-1.el7.x86_64
pacemaker-remote-1.1.20-1.el7.x86_64
pacemaker-1.1.20-1.el7.x86_64
pacemaker-cluster-libs-1.1.20-1.el7.x86_64
puppet-pacemaker-0.7.2-0.20181008172520.9a4bc2d.el7ost.noarch
ansible-pacemaker-1.0.4-0.20180827141254.0e4d7c0.el7ost.noarch
pacemaker-libs-1.1.20-1.el7.x86_64

controller-2 | SUCCESS | rc=0 >>
pacemaker-cli-1.1.20-1.el7.x86_64
pacemaker-remote-1.1.20-1.el7.x86_64
pacemaker-1.1.20-1.el7.x86_64
pacemaker-cluster-libs-1.1.20-1.el7.x86_64
puppet-pacemaker-0.7.2-0.20181008172520.9a4bc2d.el7ost.noarch
ansible-pacemaker-1.0.4-0.20180827141254.0e4d7c0.el7ost.noarch
pacemaker-libs-1.1.20-1.el7.x86_64

controller-1 | SUCCESS | rc=0 >>
pacemaker-cli-1.1.20-1.el7.x86_64
pacemaker-remote-1.1.20-1.el7.x86_64
pacemaker-1.1.20-1.el7.x86_64
pacemaker-cluster-libs-1.1.20-1.el7.x86_64
puppet-pacemaker-0.7.2-0.20181008172520.9a4bc2d.el7ost.noarch
ansible-pacemaker-1.0.4-0.20180827141254.0e4d7c0.el7ost.noarch
pacemaker-libs-1.1.20-1.el7.x86_64
```

```
(undercloud) [stack@undercloud-0 ~]$ ansible controller-1 -m shell -b -a 'pcs status'
controller-1 | SUCCESS | rc=0 >>
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-2 (version 1.1.20-1.el7-1642a7f847) - partition with quorum
Last updated: Mon Feb 25 06:57:02 2019
Last change: Mon Feb 25 06:52:20 2019 by hacluster via crmd on controller-2

12 nodes configured
37 resources configured

Online: [ controller-0 controller-1 controller-2 ]
GuestOnline: [ galera-bundle-0@controller-2 galera-bundle-1@controller-0 galera-bundle-2@controller-1
 rabbitmq-bundle-0@controller-2 rabbitmq-bundle-1@controller-0 rabbitmq-bundle-2@controller-1
 redis-bundle-0@controller-2 redis-bundle-1@controller-0 redis-bundle-2@controller-1 ]

Full list of resources:

 Docker container set: rabbitmq-bundle [192.168.24.1:8787/rhosp14/openstack-rabbitmq:pcmklatest]
   rabbitmq-bundle-0  (ocf::heartbeat:rabbitmq-cluster):  Started controller-2
   rabbitmq-bundle-1  (ocf::heartbeat:rabbitmq-cluster):  Started controller-0
   rabbitmq-bundle-2  (ocf::heartbeat:rabbitmq-cluster):  Started controller-1
 Docker container set: galera-bundle [192.168.24.1:8787/rhosp14/openstack-mariadb:pcmklatest]
   galera-bundle-0    (ocf::heartbeat:galera):   Master controller-2
   galera-bundle-1    (ocf::heartbeat:galera):   Master controller-0
   galera-bundle-2    (ocf::heartbeat:galera):   Master controller-1
 Docker container set: redis-bundle [192.168.24.1:8787/rhosp14/openstack-redis:pcmklatest]
   redis-bundle-0     (ocf::heartbeat:redis):    Slave controller-2
   redis-bundle-1     (ocf::heartbeat:redis):    Master controller-0
   redis-bundle-2     (ocf::heartbeat:redis):    Slave controller-1
 ip-192.168.24.14     (ocf::heartbeat:IPaddr2):  Started controller-2
 ip-10.0.0.101        (ocf::heartbeat:IPaddr2):  Started controller-2
 ip-172.17.1.12       (ocf::heartbeat:IPaddr2):  Started controller-2
 ip-172.17.1.21       (ocf::heartbeat:IPaddr2):  Started controller-2
 ip-172.17.3.23       (ocf::heartbeat:IPaddr2):  Started controller-2
 ip-172.17.4.30       (ocf::heartbeat:IPaddr2):  Started controller-2
 Docker container set: haproxy-bundle [192.168.24.1:8787/rhosp14/openstack-haproxy:pcmklatest]
   haproxy-bundle-docker-0  (ocf::heartbeat:docker):  Started controller-2
   haproxy-bundle-docker-1  (ocf::heartbeat:docker):  Started controller-0
   haproxy-bundle-docker-2  (ocf::heartbeat:docker):  Started controller-1
 Docker container: openstack-cinder-volume [192.168.24.1:8787/rhosp14/openstack-cinder-volume:pcmklatest]
   openstack-cinder-volume-docker-0  (ocf::heartbeat:docker):  Started controller-2

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
```

```
(undercloud) [stack@undercloud-0 ~]$ ansible controller-1 -m shell -b -a 'cat /etc/*release*'
[WARNING]: Found both group and host with same name: undercloud
controller-1 | SUCCESS | rc=0 >>
NAME="Red Hat Enterprise Linux Server"
VERSION="7.6 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.6"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.6 (Maipo)"
```

Check Master -> Slave failover:

```
[root@controller-1 ~]# pcs status|grep redis
GuestOnline: [ galera-bundle-0@controller-2 galera-bundle-1@controller-0 galera-bundle-2@controller-1
 rabbitmq-bundle-0@controller-2 rabbitmq-bundle-1@controller-0 rabbitmq-bundle-2@controller-1
 redis-bundle-0@controller-2 redis-bundle-1@controller-0 redis-bundle-2@controller-1 ]
 Docker container set: redis-bundle [192.168.24.1:8787/rhosp14/openstack-redis:pcmklatest]
   redis-bundle-0  (ocf::heartbeat:redis):  Slave controller-2
   redis-bundle-1  (ocf::heartbeat:redis):  Master controller-0
   redis-bundle-2  (ocf::heartbeat:redis):  Slave controller-1

[root@controller-0 ~]# pcs resource ban redis-bundle controller-0;crm_mon
...
   redis-bundle-1  (ocf::heartbeat:redis):  Demoting controller-0
...
   redis-bundle-1  (ocf::heartbeat:redis):  Slave controller-0
...
   redis-bundle-1  (ocf::heartbeat:redis):  Stopped controller-0
...
   redis-bundle-1  (ocf::heartbeat:redis):  Stopped
...
   redis-bundle-0  (ocf::heartbeat:redis):  Master controller-2
```

After the ban, the Master role moved from the replica on controller-0 to redis-bundle-0 on controller-2, so Master state now fails over across replicas as expected.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2129
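The verification starts by confirming every controller runs at least the Fixed In Version (pacemaker-1.1.20-1.el7). A minimal sketch of automating that comparison, assuming GNU coreutils `sort -V` for version-aware ordering (the version strings below other than the fixed one are illustrative):

```shell
#!/bin/sh
# Sketch: check whether an installed pacemaker version is at least the
# fixed version from this bug (1.1.20-1.el7). Assumes GNU coreutils,
# whose `sort -V` orders version strings numerically component-wise.
version_at_least() {
  # Returns success (0) if $1 >= $2 in version order: the smaller of
  # the two sorts first, so $1 >= $2 exactly when $2 is the head.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

fixed="1.1.20-1.el7"
for have in 1.1.20-1.el7 1.1.19-8.el7; do   # 1.1.19-8.el7 is a hypothetical older build
  if version_at_least "$have" "$fixed"; then
    echo "$have: contains the fix"
  else
    echo "$have: predates the fix"
  fi
done
```

On a real node the `have` value would come from `rpm -q --qf '%{VERSION}-%{RELEASE}' pacemaker`, as in the ansible output above.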
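The failover check above reads the Master placement out of `pcs status` by eye. A small sketch of scripting the same check, assuming the `redis-bundle` replica naming used in this deployment; a sample of the status output captured above is embedded so the script is self-contained (on a live cluster you would pipe real `pcs status` output in instead):

```shell
#!/bin/sh
# Sketch: report which redis-bundle replica holds the Master role,
# parsing pcs status output. The sample below is copied from the
# verification transcript in this bug.
pcs_output() {
cat <<'EOF'
   redis-bundle-0  (ocf::heartbeat:redis):  Slave controller-2
   redis-bundle-1  (ocf::heartbeat:redis):  Master controller-0
   redis-bundle-2  (ocf::heartbeat:redis):  Slave controller-1
EOF
}

# Print "<replica> <node>" for the Master instance: first field is the
# replica name, last field is the hosting node.
pcs_output | awk '/redis-bundle-.*Master/ { print $1, $NF }'
```

Running the extraction before and after `pcs resource ban` would show the Master moving between replicas, matching the crm_mon transcript above.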