Bug 1638593

| Field | Value |
| --- | --- |
| Summary | [OSP] SBD cannot be used with bundles |
| Product | Red Hat Enterprise Linux 7 |
| Component | pacemaker |
| Version | 7.5 |
| Status | CLOSED ERRATA |
| Severity | urgent |
| Priority | urgent |
| Reporter | Andrew Beekhof <abeekhof> |
| Assignee | Klaus Wenninger <kwenning> |
| QA Contact | pkomarov |
| CC | abeekhof, aherr, cfeist, cluster-maint, ctowsley, kgaillot, michele, mkrcmari, pkomarov, sbradley |
| Keywords | Triaged, ZStream |
| Target Milestone | rc |
| Target Release | 7.7 |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | pacemaker-1.1.20-1.el7 |
| Doc Type | Bug Fix |
| Type | Bug |
| Last Closed | 2019-08-06 12:53:44 UTC |
| Bug Blocks | 1646871, 1646872, 1656731 |

Doc Text:

Cause: When SBD is configured on the cluster nodes, Pacemaker Remote nodes (including guest nodes and bundle nodes) will compare the local SBD configuration and abort if not compatible.

Consequence: Guest nodes and bundle nodes unnecessarily fail when SBD is used on the cluster nodes, since they use resource recovery rather than standard fencing mechanisms.

Fix: Pacemaker Remote skips the SBD compatibility check when run on a guest node or bundle node.

Result: Guest nodes and bundle nodes may be used in a cluster with SBD, without configuring SBD on the guest nodes or bundle nodes themselves.
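A minimal sketch of the configuration this fix allows, using only commands that appear in the verification later in this bug (watchdog-only SBD on the full cluster nodes; the timeout values mirror the verification run and are illustrative, not a recommendation):

    # On the cluster (controller) nodes only -- nothing is installed or configured inside the bundles
    yum install -y sbd
    modprobe softdog                                        # only if no hardware watchdog is available
    pcs stonith sbd enable --watchdog=/dev/watchdog SBD_WATCHDOG_TIMEOUT=60
    pcs cluster stop --all && pcs cluster start --all       # cluster restart required to apply
    pcs property set stonith-watchdog-timeout=120s          # must be larger than SBD_WATCHDOG_TIMEOUT

With pacemaker-1.1.20-1.el7, pacemaker_remoted inside the guest/bundle containers no longer aborts over the mismatch between this cluster-level SBD setup and the (absent) SBD configuration in the container.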
Description
Andrew Beekhof
2018-10-12 02:02:08 UTC
sbd is not currently supported in clusters with Pacemaker Remote nodes. We can use this bz as the RFE.

---

Klaus Wenninger (comment #3):

Actually the reason why SBD wouldn't work with remote-nodes is different.

In case of guest-nodes/containers/bundle-nodes there is no need for SBD as fencing is anyway done via stopping the resource.

Remote-nodes on the other hand would be fenced the classical way. As long as poison-pill-fencing is used and the name of the slot on the shared-disk is set manually to the remote-node-name instead of the host-name, sbd is actually working well together with remote-nodes.

Watchdog-fencing is gonna trigger on remote nodes whenever the connection to a cluster-node is lost, iirc. So some work on the observation sbd does on pacemaker-remote would be needed for that to work.

The approach followed with bz1443666 is more generic and enables us to enable watchdog-fencing just for selected nodes. This can be used both to prevent remote-nodes from using watchdog-fencing (because it would basically trigger too often) as well as for regular cluster-nodes that e.g. don't have proper watchdog-devices (e.g. nodes that are running on certain hypervisors, or exotic hardware not having a hardware-watchdog or this not being supported by the linux-kernel).

If we want to use this bz for a quick solution of the bundle-issue as outlined in Andrew's commit we might use a less misleading name for it. On the long run I would prefer a way to automatically disable the check instead of offering a switch that can be misused in cases where the cluster would then assume a node to self-fence when it doesn't. Can't we just prevent the property from being passed into the bundle-node? The cluster node should know when that is safe. Once we have the manual feature in, it is probably hard to get rid of it again and it is dangerous.
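For the plain remote-node case Klaus describes above (poison-pill fencing with the shared-disk slot named after the remote node rather than the host it runs on), a hedged sketch of how such a slot could be allocated with the stock sbd tooling; the device path and node name are placeholders:

    # on a cluster node that can see the shared SBD device
    sbd -d /dev/disk/by-id/shared-sbd-disk allocate remote-node-1   # slot carries the remote node's name, not the host's
    sbd -d /dev/disk/by-id/shared-sbd-disk list                     # confirm the slot exists

This only illustrates the remote-node point above; for the guest/bundle nodes this bug is about, no SBD setup is needed inside the containers at all.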
---

Andrew Beekhof (comment #4):

(In reply to Klaus Wenninger from comment #3)
> Actually the reason why SBD wouldn't work with remote-nodes is different.
>
> In case of guest-nodes/containers/bundle-nodes there is no need for SBD as fencing is anyway done via stopping the resource.
>
> Remote-nodes on the other hand would be fenced the classical way. As long as poison-pill-fencing is used and the name of the slot on the shared-disk is set manually to the remote-node-name instead of the host-name, sbd is actually working well together with remote-nodes.
>
> Watchdog-fencing is gonna trigger on remote nodes whenever the connection to a cluster-node is lost, iirc.

Not if shared storage is in use, which we can mandate for OSP, which is the only supported use for bundles.

I've no objection to a more generic or automated[1] solution if you think one exists, but I would ask that it not delay progress on this specific scenario and bug, as we have customers waiting on a solution.

[1] Automation is not really a priority for container workloads since the option can be baked into the image.

> So some work on the observation sbd does on pacemaker-remote would be needed for that to work.
>
> The approach followed with bz1443666 is more generic and enables us to enable watchdog-fencing just for selected nodes. This can be used both to prevent remote-nodes from using watchdog-fencing (because it would basically trigger too often) as well as for regular cluster-nodes that e.g. don't have proper watchdog-devices (e.g. nodes that are running on certain hypervisors, or exotic hardware not having a hardware-watchdog or this not being supported by the linux-kernel).
>
> If we want to use this bz for a quick solution of the bundle-issue as outlined in Andrew's commit we might use a less misleading name for it. On the long run I would prefer a way to automatically disable the check instead of offering a switch that can be misused in cases where the cluster would then assume a node to self-fence when it doesn't. Can't we just prevent the property from being passed into the bundle-node? The cluster node should know when that is safe. Once we have the manual feature in, it is probably hard to get rid of it again and it is dangerous.

---

Andrew Beekhof (comment #5):

Gah, hit save too early...

(In reply to Andrew Beekhof from comment #4)
> (In reply to Klaus Wenninger from comment #3)
> > Actually the reason why SBD wouldn't work with remote-nodes is different.
> > In case of guest-nodes/containers/bundle-nodes there is no need for SBD as fencing is anyway done via stopping the resource.
> > Remote-nodes on the other hand would be fenced the classical way. As long as poison-pill-fencing is used and the name of the slot on the shared-disk is set manually to the remote-node-name instead of the host-name, sbd is actually working well together with remote-nodes.
> > Watchdog-fencing is gonna trigger on remote nodes whenever the connection to a cluster-node is lost, iirc.
>
> Not if shared storage is in use, which we can mandate for OSP, which is the only supported use for bundles.
>
> I've no objection to a more generic or automated[1] solution if you think one exists, but I would ask that it not delay progress on this specific scenario and bug, as we have customers waiting on a solution.
>
> [1] Automation is not really a priority for container workloads since the option can be baked into the image.
>
> > So some work on the observation sbd does on pacemaker-remote would be needed for that to work.
> > The approach followed with bz1443666 is more generic and enables us to enable watchdog-fencing just for selected nodes. This can be used both to prevent remote-nodes from using watchdog-fencing (because it would basically trigger too often) as well as for regular cluster-nodes that e.g. don't have proper watchdog-devices (e.g. nodes that are running on certain hypervisors, or exotic hardware not having a hardware-watchdog or this not being supported by the linux-kernel).
> > If we want to use this bz for a quick solution of the bundle-issue as outlined in Andrew's commit

yes. that is all this bug should focus on.

> > we might use a less misleading name for it.

What is misleading about "--disable-sbd-check" ?

> > On the long run I would prefer a way to automatically disable the check instead of offering a switch that can be misused in cases where the cluster would then assume a node to self-fence when it doesn't.
> > Can't we just prevent the property from being passed into the bundle-node?

I attempted that before creating this patch but the relevant information isn't available at the right place/time.

> > The cluster node should know when that is safe.

The code that initiates the check only has an lrmd_t object, so it has no way to know when it might be safe.

> > Once we have the manual feature in, it is probably hard to get rid of it again

One of the bugs you've added to the dependencies dates back to 2016... I predict a long and useful life for these 10 lines while we wait for the perfect solution.

> > and it is dangerous.

Compared to pretty much all of cibadmin? Compared to the advice I saw recently where the customer was encouraged to use sbd without any watchdog at all?

If it's really that dangerous it can be hidden from the pacemaker-remoted help/man page.

---

Klaus Wenninger (comment #6):

(In reply to Andrew Beekhof from comment #5)
> > > outlined in Andrew's commit we might use a less misleading name for it.
>
> What is misleading about "--disable-sbd-check" ?

Meant the name of the bug, not the feature ;-) The new name btw. is fine with me ...

> I attempted that before creating this patch but the relevant information isn't available at the right place/time.

Well, not too encouraging that you didn't come up with an idea. But give me till end of the day to think over it though ...

> One of the bugs you've added to the dependencies dates back to 2016... I predict a long and useful life for these 10 lines while we wait for the perfect solution.

Taking the blame on me - partly ... Till now it was only me who thought this would be a nice thing and there was no customer-demand for it - which made it be pushed back, back, ...

> Compared to pretty much all of cibadmin? Compared to the advice I saw recently where the customer was encouraged to use sbd without any watchdog at all?

Well, unfortunately there always comes responsibility with power ... Adding this option to pacemaker-remote on remote-nodes just seemed too easy and innocent to me ;-) If you are referring to advice from me, I never encouraged using sbd without a watchdog. I just suggested disabling watchdog-fencing in pacemaker, and I pointed out that this is substantially different from running sbd without a watchdog-device referenced in /etc/sysconfig/sbd.

> If it's really that dangerous it can be hidden from the pacemaker-remoted help/man page.

---

(In reply to Klaus Wenninger from comment #6)
> > I attempted that before creating this patch but the relevant information isn't available at the right place/time.
>
> Well, not too encouraging that you didn't come up with an idea. But give me till end of the day to think over it though ...

Using a transition via pseudo-action to convey the information from pengine to remote-ra, as in https://github.com/ClusterLabs/pacemaker/commit/0113ff6fb6bb576356d201cf698b98455dbf5180, should definitely be a viable approach, although I was hoping to find something simpler that takes advantage of the fact that a remote instance being a guest or not isn't something that changes over time, unlike whether it is managed or not.

---

Fixed in upstream 1.1 branch by commit 4dae674

---

IIRC, we'd want a functioning OSP cluster and then (condensed into commands after this list):

- configure the sbd systemd service to start at boot
- start the sbd systemd service
- check SBD_WATCHDOG_TIMEOUT (in /etc/sysconfig/sbd) is less than 120
- run: pcs property set stonith-watchdog-timeout=120s
- check the containers don't all die
- pcs resource disable rabbitmq-bundle
- wait for rabbitmq to stop
- pcs resource enable rabbitmq-bundle
- check rabbitmq comes back healthy

(previously the containers would fail as soon as we tried to do anything with rabbit inside them)

The logs are pretty bare. One second the cib is logging, and the next everyone is apparently dead.
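A condensed sketch of those steps as commands on one of the controllers (the rabbitmq-bundle resource name and the 120s value come straight from the list above; the next comment recommends enabling sbd via pcs rather than starting it with systemctl):

    systemctl enable sbd && systemctl start sbd        # per the list; see the pcs-based alternative in the next comment
    grep SBD_WATCHDOG_TIMEOUT /etc/sysconfig/sbd       # must be well below 120
    pcs property set stonith-watchdog-timeout=120s
    pcs status                                         # the bundle containers must all stay up
    pcs resource disable rabbitmq-bundle               # wait for rabbitmq to stop
    pcs resource enable rabbitmq-bundle                # rabbitmq should come back healthy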
---

Pini, definitely try the 'pcs sbd enabled ...' command Ken suggested instead of:

ansible overcloud -mshell -b -a'systemctl start sbd'

---

Verified,

ansible controller -b -mshell -a'yum install -y sbd ;systemctl enable sbd'

(undercloud) [stack@undercloud-0 ~]$ ansible controller -b -mshell -a'pcs stonith sbd config'
[WARNING]: Found both group and host with same name: undercloud

controller-2 | SUCCESS | rc=0 >>
SBD_WATCHDOG_TIMEOUT=5
SBD_TIMEOUT_ACTION=flush,reboot
SBD_STARTMODE=always
SBD_DELAY_START=no
Watchdogs:
  controller-1: /dev/watchdog
  controller-0: /dev/watchdog
  controller-2: /dev/watchdog

controller-1 | SUCCESS | rc=0 >>
SBD_WATCHDOG_TIMEOUT=5
SBD_TIMEOUT_ACTION=flush,reboot
SBD_STARTMODE=always
SBD_DELAY_START=no
Watchdogs:
  controller-1: /dev/watchdog
  controller-0: /dev/watchdog
  controller-2: /dev/watchdog

controller-0 | SUCCESS | rc=0 >>
SBD_WATCHDOG_TIMEOUT=5
SBD_TIMEOUT_ACTION=flush,reboot
SBD_STARTMODE=always
SBD_DELAY_START=no
Watchdogs:
  controller-1: /dev/watchdog
  controller-2: /dev/watchdog
  controller-0: /dev/watchdog

ansible controller -b -mshell -a'modprobe softdog'

ansible controller -b -mshell -a'pcs stonith sbd enable --watchdog=/dev/watchdog SBD_WATCHDOG_TIMEOUT=60'

controller-1 | SUCCESS | rc=0 >>
Running SBD pre-enabling checks...
controller-1: SBD pre-enabling checks done
controller-0: SBD pre-enabling checks done
controller-2: SBD pre-enabling checks done
Distributing SBD config...
controller-1: SBD config saved
controller-2: SBD config saved
controller-0: SBD config saved
Enabling SBD service...
controller-1: sbd enabled
controller-2: sbd enabled
controller-0: sbd enabled
Warning: Cluster restart is required in order to apply these changes.

controller-2 | SUCCESS | rc=0 >>
Running SBD pre-enabling checks...
controller-1: SBD pre-enabling checks done
controller-0: SBD pre-enabling checks done
controller-2: SBD pre-enabling checks done
Distributing SBD config...
controller-1: SBD config saved
controller-2: SBD config saved
controller-0: SBD config saved
Enabling SBD service...
controller-1: sbd enabled
controller-2: sbd enabled
controller-0: sbd enabled
Warning: Cluster restart is required in order to apply these changes.

controller-0 | SUCCESS | rc=0 >>
Running SBD pre-enabling checks...
controller-1: SBD pre-enabling checks done
controller-0: SBD pre-enabling checks done
controller-2: SBD pre-enabling checks done
Distributing SBD config...
controller-1: SBD config saved
controller-2: SBD config saved
controller-0: SBD config saved
Enabling SBD service...
controller-2: sbd enabled
controller-1: sbd enabled
controller-0: sbd enabled
Warning: Cluster restart is required in order to apply these changes.

controller-1 | SUCCESS | rc=0 >>
Running SBD pre-enabling checks...
controller-1: SBD pre-enabling checks done
controller-0: SBD pre-enabling checks done
controller-2: SBD pre-enabling checks done
Distributing SBD config...
controller-1: SBD config saved
controller-2: SBD config saved
controller-0: SBD config saved
Enabling SBD service...
controller-1: sbd enabled
controller-2: sbd enabled
controller-0: sbd enabled
Warning: Cluster restart is required in order to apply these changes.

controller-2 | SUCCESS | rc=0 >>
Running SBD pre-enabling checks...
controller-1: SBD pre-enabling checks done
controller-0: SBD pre-enabling checks done
controller-2: SBD pre-enabling checks done
Distributing SBD config...
controller-1: SBD config saved
controller-2: SBD config saved
controller-0: SBD config saved
Enabling SBD service...
controller-1: sbd enabled controller-2: sbd enabled controller-0: sbd enabled Warning: Cluster restart is required in order to apply these changes. controller-0 | SUCCESS | rc=0 >> Running SBD pre-enabling checks... controller-1: SBD pre-enabling checks done controller-0: SBD pre-enabling checks done controller-2: SBD pre-enabling checks done Distributing SBD config... controller-1: SBD config saved controller-2: SBD config saved controller-0: SBD config saved Enabling SBD service... controller-2: sbd enabled controller-1: sbd enabled controller-0: sbd enabled Warning: Cluster restart is required in order to apply these changes. (undercloud) [stack@undercloud-0 ~]$ ansible controller -b -mshell -a'pcs stonith sbd config' [WARNING]: Found both group and host with same name: undercloud controller-1 | SUCCESS | rc=0 >> SBD_WATCHDOG_TIMEOUT=60 SBD_STARTMODE=always SBD_DELAY_START=no Watchdogs: controller-1: /dev/watchdog controller-2: /dev/watchdog controller-0: /dev/watchdog controller-2 | SUCCESS | rc=0 >> SBD_WATCHDOG_TIMEOUT=60 SBD_STARTMODE=always SBD_DELAY_START=no Watchdogs: controller-2: /dev/watchdog controller-0: /dev/watchdog controller-1: /dev/watchdog controller-0 | SUCCESS | rc=0 >> SBD_WATCHDOG_TIMEOUT=60 SBD_STARTMODE=always SBD_DELAY_START=no Watchdogs: controller-0: /dev/watchdog controller-2: /dev/watchdog controller-1: /dev/watchdog (undercloud) [stack@undercloud-0 ~]$ ansible controller-2 -b -mshell -a"pcs cluster stop --all&&pcs cluster start --all" [...] (undercloud) [stack@undercloud-0 ~]$ ansible controller-2 -b -mshell -a"pcs status" [WARNING]: Found both group and host with same name: undercloud controller-2 | SUCCESS | rc=0 >> Cluster name: tripleo_cluster Stack: corosync Current DC: controller-0 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum Last updated: Tue Mar 19 11:09:03 2019 Last change: Tue Mar 19 11:07:00 2019 by hacluster via crmd on controller-0 12 nodes configured 37 resources configured Online: [ controller-0 controller-1 controller-2 ] GuestOnline: [ galera-bundle-0@controller-0 galera-bundle-1@controller-1 galera-bundle-2@controller-2 rabbitmq-bundle-0@controller-0 rabbitmq-bundle-1@controller-1 rabbitmq-bundle-2@controller-2 redis-bundle-0@controller-0 redis-bundle-1@controller-1 redis-bundle-2@controller-2 ] Full list of resources: Docker container set: rabbitmq-bundle [192.168.24.1:8787/rhosp14/openstack-rabbitmq:pcmklatest] rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started controller-0 rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started controller-1 rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started controller-2 Docker container set: galera-bundle [192.168.24.1:8787/rhosp14/openstack-mariadb:pcmklatest] galera-bundle-0 (ocf::heartbeat:galera): Master controller-0 galera-bundle-1 (ocf::heartbeat:galera): Master controller-1 galera-bundle-2 (ocf::heartbeat:galera): Master controller-2 Docker container set: redis-bundle [192.168.24.1:8787/rhosp14/openstack-redis:pcmklatest] redis-bundle-0 (ocf::heartbeat:redis): Master controller-0 redis-bundle-1 (ocf::heartbeat:redis): Slave controller-1 redis-bundle-2 (ocf::heartbeat:redis): Slave controller-2 ip-192.168.24.6 (ocf::heartbeat:IPaddr2): Started controller-0 ip-10.0.0.101 (ocf::heartbeat:IPaddr2): Started controller-1 ip-172.17.1.12 (ocf::heartbeat:IPaddr2): Started controller-2 ip-172.17.1.22 (ocf::heartbeat:IPaddr2): Started controller-0 ip-172.17.3.22 (ocf::heartbeat:IPaddr2): Started controller-1 ip-172.17.4.30 (ocf::heartbeat:IPaddr2): Started 
controller-2 Docker container set: haproxy-bundle [192.168.24.1:8787/rhosp14/openstack-haproxy:pcmklatest] haproxy-bundle-docker-0 (ocf::heartbeat:docker): Started controller-0 haproxy-bundle-docker-1 (ocf::heartbeat:docker): Started controller-1 haproxy-bundle-docker-2 (ocf::heartbeat:docker): Started controller-2 Docker container: openstack-cinder-volume [192.168.24.1:8787/rhosp14/openstack-cinder-volume:pcmklatest] openstack-cinder-volume-docker-0 (ocf::heartbeat:docker): Started controller-0 Daemon Status: corosync: active/enabled pacemaker: active/enabled pcsd: active/enabled sbd: active/enabled (undercloud) [stack@undercloud-0 ~]$ ansible controller-2 -b -mshell -a"docker ps" [WARNING]: Found both group and host with same name: undercloud controller-2 | SUCCESS | rc=0 >> CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES c744d9a3f6a7 192.168.24.1:8787/rhosp14/openstack-haproxy:pcmklatest "/bin/bash /usr/lo..." 2 minutes ago Up 2 minutes haproxy-bundle-docker-2 e72d7031e769 192.168.24.1:8787/rhosp14/openstack-redis:pcmklatest "/bin/bash /usr/lo..." 2 minutes ago Up 2 minutes redis-bundle-docker-2 c31c2399bde7 192.168.24.1:8787/rhosp14/openstack-mariadb:pcmklatest "/bin/bash /usr/lo..." 2 minutes ago Up 2 minutes galera-bundle-docker-2 1bdace098ce3 192.168.24.1:8787/rhosp14/openstack-rabbitmq:pcmklatest "/bin/bash /usr/lo..." 2 minutes ago Up 2 minutes rabbitmq-bundle-docker-2 24724e54a569 192.168.24.1:8787/rhosp14/openstack-gnocchi-api:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) gnocchi_api 52d2a520691e 192.168.24.1:8787/rhosp14/openstack-gnocchi-metricd:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) gnocchi_metricd 100e62edda58 192.168.24.1:8787/rhosp14/openstack-gnocchi-statsd:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) gnocchi_statsd 2d6bb975803e 192.168.24.1:8787/rhosp14/openstack-neutron-openvswitch-agent:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) neutron_ovs_agent e2fb95fb1996 192.168.24.1:8787/rhosp14/openstack-neutron-l3-agent:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) neutron_l3_agent 3cffc7d772d8 192.168.24.1:8787/rhosp14/openstack-neutron-metadata-agent:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) neutron_metadata_agent afe30b1870d0 192.168.24.1:8787/rhosp14/openstack-neutron-dhcp-agent:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) neutron_dhcp f223fdb7de5c 192.168.24.1:8787/rhosp14/openstack-panko-api:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) panko_api e3d8d29dc784 192.168.24.1:8787/rhosp14/openstack-nova-api:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (unhealthy) nova_metadata 916cc7e90749 192.168.24.1:8787/rhosp14/openstack-nova-api:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) nova_api e15239487f96 192.168.24.1:8787/rhosp14/openstack-glance-api:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) glance_api d80e0e462652 192.168.24.1:8787/rhosp14/openstack-swift-proxy-server:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) swift_proxy 0f84d02f708a 192.168.24.1:8787/rhosp14/openstack-nova-placement-api:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) nova_placement 0a591dbb38cf 192.168.24.1:8787/rhosp14/openstack-heat-api-cfn:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) heat_api_cfn 2f7530d750d6 192.168.24.1:8787/rhosp14/openstack-neutron-server:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) neutron_api 8f5cffa9342a 
192.168.24.1:8787/rhosp14/openstack-aodh-listener:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) aodh_listener 0ae03bad9ecc 192.168.24.1:8787/rhosp14/openstack-swift-container:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours swift_container_auditor 4fc9ec08be8f 192.168.24.1:8787/rhosp14/openstack-heat-api:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours heat_api_cron 7672286a6e4d 192.168.24.1:8787/rhosp14/openstack-swift-proxy-server:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours swift_object_expirer 1db59ce4618f 192.168.24.1:8787/rhosp14/openstack-swift-object:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours swift_object_updater 38a8678f82b4 192.168.24.1:8787/rhosp14/openstack-swift-container:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours swift_container_replicator 38d2e6b12e66 192.168.24.1:8787/rhosp14/openstack-swift-account:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours swift_account_auditor c169910504fd 192.168.24.1:8787/rhosp14/openstack-cinder-api:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours cinder_api_cron b571fe544832 192.168.24.1:8787/rhosp14/openstack-cron:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours logrotate_crond 3b497f64f1c8 192.168.24.1:8787/rhosp14/openstack-swift-account:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) swift_account_server 9ad387cb8ee4 192.168.24.1:8787/rhosp14/openstack-nova-scheduler:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) nova_scheduler b0176d88fe9b 192.168.24.1:8787/rhosp14/openstack-cinder-scheduler:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) cinder_scheduler 07c0c427cce3 192.168.24.1:8787/rhosp14/openstack-swift-object:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours swift_object_replicator d0a53a8e63fa 192.168.24.1:8787/rhosp14/openstack-swift-container:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) swift_container_server 9e48b1c8fac8 192.168.24.1:8787/rhosp14/openstack-heat-engine:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) heat_engine 9d4a050dea9d 192.168.24.1:8787/rhosp14/openstack-aodh-api:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) aodh_api 47ed4ebc9ff2 192.168.24.1:8787/rhosp14/openstack-swift-object:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours swift_rsync cd84cbc18f15 192.168.24.1:8787/rhosp14/openstack-nova-novncproxy:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) nova_vnc_proxy 52014e82ebfd 192.168.24.1:8787/rhosp14/openstack-ceilometer-notification:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) ceilometer_agent_notification ff702eb6aaad 192.168.24.1:8787/rhosp14/openstack-swift-account:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours swift_account_reaper 3c5457fcac96 192.168.24.1:8787/rhosp14/openstack-nova-consoleauth:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) nova_consoleauth a250ded85a3f 192.168.24.1:8787/rhosp14/openstack-nova-api:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours nova_api_cron ab0d05c616b2 192.168.24.1:8787/rhosp14/openstack-swift-container:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours swift_container_updater aa0f6eb1356c 192.168.24.1:8787/rhosp14/openstack-aodh-notifier:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) aodh_notifier 7478c9143429 192.168.24.1:8787/rhosp14/openstack-swift-account:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours swift_account_replicator 02ef978da81f 192.168.24.1:8787/rhosp14/openstack-ceilometer-central:2019-03-04.1 "kolla_start" 23 hours ago Up 23 
hours (healthy) ceilometer_agent_central e29e79daa7d6 192.168.24.1:8787/rhosp14/openstack-swift-object:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours swift_object_auditor b5209ef8f23b 192.168.24.1:8787/rhosp14/openstack-heat-api:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) heat_api 6173d69346da 192.168.24.1:8787/rhosp14/openstack-cinder-api:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) cinder_api ed080c6f6318 192.168.24.1:8787/rhosp14/openstack-swift-object:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) swift_object_server 75bdb66cb4b2 192.168.24.1:8787/rhosp14/openstack-nova-conductor:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) nova_conductor 65bdd6de46eb 192.168.24.1:8787/rhosp14/openstack-aodh-evaluator:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) aodh_evaluator 793026810afc 192.168.24.1:8787/rhosp14/openstack-keystone:2019-03-04.1 "/bin/bash -c '/us..." 23 hours ago Up 23 hours keystone_cron 199261c9e05d 192.168.24.1:8787/rhosp14/openstack-keystone:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) keystone eeb8e2348b79 192.168.24.1:8787/rhosp14/openstack-iscsid:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) iscsid 2ad514845c7c 192.168.24.1:8787/rhosp14/openstack-horizon:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours horizon b18ada5486c5 192.168.24.1:8787/rhosp14/openstack-mariadb:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours clustercheck e876206701a3 192.168.24.1:8787/rhceph:3-18 "/entrypoint.sh" 23 hours ago Up 23 hours ceph-mgr-controller-2 5f28424c69e8 192.168.24.1:8787/rhceph:3-18 "/entrypoint.sh" 23 hours ago Up 23 hours ceph-mon-controller-2 e1acfedf8e4d 192.168.24.1:8787/rhosp14/openstack-memcached:2019-03-04.1 "/bin/bash -c 'sou..." 23 hours ago Up 23 hours (healthy) memcached ansible controller-2 -b -mshell -a"pcs resource disable rabbitmq-bundle" [...] rabbitmq log : =INFO REPORT==== 19-Mar-2019::11:11:53 === Successfully stopped RabbitMQ and its dependencies (undercloud) [stack@undercloud-0 ~]$ ansible controller-2 -b -mshell -a"pcs status"|grep rabbitmq-bundle [WARNING]: Found both group and host with same name: undercloud Docker container set: rabbitmq-bundle [192.168.24.1:8787/rhosp14/openstack-rabbitmq:pcmklatest] rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Stopped (disabled) rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Stopped (disabled) rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Stopped (disabled) ansible controller-2 -b -mshell -a"pcs resource enable rabbitmq-bundle" [...] rabbitnq log : =INFO REPORT==== 19-Mar-2019::11:14:16 === Starting RabbitMQ 3.6.16 on Erlang 18.3.4.11 Copyright (C) 2007-2018 Pivotal Software, Inc. Licensed under the MPL. See http://www.rabbitmq.com/ ... 
=INFO REPORT==== 19-Mar-2019::11:14:20 === connection <0.1064.0> (172.17.1.26:47656 -> 172.17.1.14:5672 - cinder-scheduler:1:9740a78e-e237-4e6a-ace9-e902dfca06f4): user 'guest' authenticated and granted access to vhost '/' =INFO REPORT==== 19-Mar-2019::11:14:20 === accepting AMQP connection <0.1071.0> (172.17.1.14:35788 -> 172.17.1.14:5672) =INFO REPORT==== 19-Mar-2019::11:14:20 === Connection <0.1071.0> (172.17.1.14:35788 -> 172.17.1.14:5672) has a client-provided name: neutron-server:39:fd55195a-115d-4fbe-997e-42c75fac3336 =INFO REPORT==== 19-Mar-2019::11:14:20 === connection <0.1071.0> (172.17.1.14:35788 -> 172.17.1.14:5672 - neutron-server:39:fd55195a-115d-4fbe-997e-42c75fac3336): user 'guest' authenticated and granted access to vhost '/' (undercloud) [stack@undercloud-0 ~]$ ansible controller-2 -b -mshell -a"pcs status"|grep rabbitmq-bundle [WARNING]: Found both group and host with same name: undercloud GuestOnline: [ galera-bundle-0@controller-0 galera-bundle-1@controller-1 galera-bundle-2@controller-2 rabbitmq-bundle-0@controller-0 rabbitmq-bundle-1@controller-1 rabbitmq-bundle-2@controller-2 redis-bundle-0@controller-0 redis-bundle-1@controller-1 redis-bundle-2@controller-2 ] Docker container set: rabbitmq-bundle [192.168.24.1:8787/rhosp14/openstack-rabbitmq:pcmklatest] rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started controller-0 rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started controller-1 rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started controller-2 (undercloud) [stack@undercloud-0 ~]$ ansible controller-2 -b -mshell -a"docker ps" [WARNING]: Found both group and host with same name: undercloud controller-2 | SUCCESS | rc=0 >> CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 9649966bae7e 192.168.24.1:8787/rhosp14/openstack-rabbitmq:pcmklatest "/bin/bash /usr/lo..." 3 minutes ago Up 3 minutes rabbitmq-bundle-docker-2 c744d9a3f6a7 192.168.24.1:8787/rhosp14/openstack-haproxy:pcmklatest "/bin/bash /usr/lo..." 9 minutes ago Up 9 minutes haproxy-bundle-docker-2 e72d7031e769 192.168.24.1:8787/rhosp14/openstack-redis:pcmklatest "/bin/bash /usr/lo..." 9 minutes ago Up 9 minutes redis-bundle-docker-2 c31c2399bde7 192.168.24.1:8787/rhosp14/openstack-mariadb:pcmklatest "/bin/bash /usr/lo..." 
9 minutes ago Up 9 minutes galera-bundle-docker-2 24724e54a569 192.168.24.1:8787/rhosp14/openstack-gnocchi-api:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) gnocchi_api 52d2a520691e 192.168.24.1:8787/rhosp14/openstack-gnocchi-metricd:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) gnocchi_metricd 100e62edda58 192.168.24.1:8787/rhosp14/openstack-gnocchi-statsd:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) gnocchi_statsd 2d6bb975803e 192.168.24.1:8787/rhosp14/openstack-neutron-openvswitch-agent:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) neutron_ovs_agent e2fb95fb1996 192.168.24.1:8787/rhosp14/openstack-neutron-l3-agent:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) neutron_l3_agent 3cffc7d772d8 192.168.24.1:8787/rhosp14/openstack-neutron-metadata-agent:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) neutron_metadata_agent afe30b1870d0 192.168.24.1:8787/rhosp14/openstack-neutron-dhcp-agent:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) neutron_dhcp f223fdb7de5c 192.168.24.1:8787/rhosp14/openstack-panko-api:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) panko_api e3d8d29dc784 192.168.24.1:8787/rhosp14/openstack-nova-api:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (unhealthy) nova_metadata 916cc7e90749 192.168.24.1:8787/rhosp14/openstack-nova-api:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) nova_api e15239487f96 192.168.24.1:8787/rhosp14/openstack-glance-api:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) glance_api d80e0e462652 192.168.24.1:8787/rhosp14/openstack-swift-proxy-server:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) swift_proxy 0f84d02f708a 192.168.24.1:8787/rhosp14/openstack-nova-placement-api:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) nova_placement 0a591dbb38cf 192.168.24.1:8787/rhosp14/openstack-heat-api-cfn:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) heat_api_cfn 2f7530d750d6 192.168.24.1:8787/rhosp14/openstack-neutron-server:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) neutron_api 8f5cffa9342a 192.168.24.1:8787/rhosp14/openstack-aodh-listener:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) aodh_listener 0ae03bad9ecc 192.168.24.1:8787/rhosp14/openstack-swift-container:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours swift_container_auditor 4fc9ec08be8f 192.168.24.1:8787/rhosp14/openstack-heat-api:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours heat_api_cron 7672286a6e4d 192.168.24.1:8787/rhosp14/openstack-swift-proxy-server:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours swift_object_expirer 1db59ce4618f 192.168.24.1:8787/rhosp14/openstack-swift-object:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours swift_object_updater 38a8678f82b4 192.168.24.1:8787/rhosp14/openstack-swift-container:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours swift_container_replicator 38d2e6b12e66 192.168.24.1:8787/rhosp14/openstack-swift-account:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours swift_account_auditor c169910504fd 192.168.24.1:8787/rhosp14/openstack-cinder-api:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours cinder_api_cron b571fe544832 192.168.24.1:8787/rhosp14/openstack-cron:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours logrotate_crond 3b497f64f1c8 192.168.24.1:8787/rhosp14/openstack-swift-account:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) swift_account_server 9ad387cb8ee4 
192.168.24.1:8787/rhosp14/openstack-nova-scheduler:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) nova_scheduler b0176d88fe9b 192.168.24.1:8787/rhosp14/openstack-cinder-scheduler:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) cinder_scheduler 07c0c427cce3 192.168.24.1:8787/rhosp14/openstack-swift-object:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours swift_object_replicator d0a53a8e63fa 192.168.24.1:8787/rhosp14/openstack-swift-container:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) swift_container_server 9e48b1c8fac8 192.168.24.1:8787/rhosp14/openstack-heat-engine:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) heat_engine 9d4a050dea9d 192.168.24.1:8787/rhosp14/openstack-aodh-api:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) aodh_api 47ed4ebc9ff2 192.168.24.1:8787/rhosp14/openstack-swift-object:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours swift_rsync cd84cbc18f15 192.168.24.1:8787/rhosp14/openstack-nova-novncproxy:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) nova_vnc_proxy 52014e82ebfd 192.168.24.1:8787/rhosp14/openstack-ceilometer-notification:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) ceilometer_agent_notification ff702eb6aaad 192.168.24.1:8787/rhosp14/openstack-swift-account:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours swift_account_reaper 3c5457fcac96 192.168.24.1:8787/rhosp14/openstack-nova-consoleauth:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) nova_consoleauth a250ded85a3f 192.168.24.1:8787/rhosp14/openstack-nova-api:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours nova_api_cron ab0d05c616b2 192.168.24.1:8787/rhosp14/openstack-swift-container:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours swift_container_updater aa0f6eb1356c 192.168.24.1:8787/rhosp14/openstack-aodh-notifier:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) aodh_notifier 7478c9143429 192.168.24.1:8787/rhosp14/openstack-swift-account:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours swift_account_replicator 02ef978da81f 192.168.24.1:8787/rhosp14/openstack-ceilometer-central:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) ceilometer_agent_central e29e79daa7d6 192.168.24.1:8787/rhosp14/openstack-swift-object:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours swift_object_auditor b5209ef8f23b 192.168.24.1:8787/rhosp14/openstack-heat-api:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) heat_api 6173d69346da 192.168.24.1:8787/rhosp14/openstack-cinder-api:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) cinder_api ed080c6f6318 192.168.24.1:8787/rhosp14/openstack-swift-object:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) swift_object_server 75bdb66cb4b2 192.168.24.1:8787/rhosp14/openstack-nova-conductor:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) nova_conductor 65bdd6de46eb 192.168.24.1:8787/rhosp14/openstack-aodh-evaluator:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) aodh_evaluator 793026810afc 192.168.24.1:8787/rhosp14/openstack-keystone:2019-03-04.1 "/bin/bash -c '/us..." 
23 hours ago Up 23 hours keystone_cron 199261c9e05d 192.168.24.1:8787/rhosp14/openstack-keystone:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) keystone eeb8e2348b79 192.168.24.1:8787/rhosp14/openstack-iscsid:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours (healthy) iscsid 2ad514845c7c 192.168.24.1:8787/rhosp14/openstack-horizon:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours horizon b18ada5486c5 192.168.24.1:8787/rhosp14/openstack-mariadb:2019-03-04.1 "kolla_start" 23 hours ago Up 23 hours clustercheck e876206701a3 192.168.24.1:8787/rhceph:3-18 "/entrypoint.sh" 23 hours ago Up 23 hours ceph-mgr-controller-2 5f28424c69e8 192.168.24.1:8787/rhceph:3-18 "/entrypoint.sh" 24 hours ago Up 24 hours ceph-mon-controller-2 e1acfedf8e4d 192.168.24.1:8787/rhosp14/openstack-memcached:2019-03-04.1 "/bin/bash -c 'sou..." 24 hours ago Up 24 hours (healthy) memcached (undercloud) [stack@undercloud-0 ~]$ ansible controller-2 -b -mshell -a"pcs status" [WARNING]: Found both group and host with same name: undercloud controller-2 | SUCCESS | rc=0 >> Cluster name: tripleo_cluster Stack: corosync Current DC: controller-0 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum Last updated: Tue Mar 19 11:16:50 2019 Last change: Tue Mar 19 11:13:08 2019 by root via cibadmin on controller-2 12 nodes configured 37 resources configured Online: [ controller-0 controller-1 controller-2 ] GuestOnline: [ galera-bundle-0@controller-0 galera-bundle-1@controller-1 galera-bundle-2@controller-2 rabbitmq-bundle-0@controller-0 rabbitmq-bundle-1@controller-1 rabbitmq-bundle-2@controller-2 redis-bundle-0@controller-0 redis-bundle-1@controller-1 redis-bundle-2@controller-2 ] Full list of resources: Docker container set: rabbitmq-bundle [192.168.24.1:8787/rhosp14/openstack-rabbitmq:pcmklatest] rabbitmq-bundle-0 (ocf::heartbeat:rabbitmq-cluster): Started controller-0 rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started controller-1 rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started controller-2 Docker container set: galera-bundle [192.168.24.1:8787/rhosp14/openstack-mariadb:pcmklatest] galera-bundle-0 (ocf::heartbeat:galera): Master controller-0 galera-bundle-1 (ocf::heartbeat:galera): Master controller-1 galera-bundle-2 (ocf::heartbeat:galera): Master controller-2 Docker container set: redis-bundle [192.168.24.1:8787/rhosp14/openstack-redis:pcmklatest] redis-bundle-0 (ocf::heartbeat:redis): Master controller-0 redis-bundle-1 (ocf::heartbeat:redis): Slave controller-1 redis-bundle-2 (ocf::heartbeat:redis): Slave controller-2 ip-192.168.24.6 (ocf::heartbeat:IPaddr2): Started controller-0 ip-10.0.0.101 (ocf::heartbeat:IPaddr2): Started controller-1 ip-172.17.1.12 (ocf::heartbeat:IPaddr2): Started controller-2 ip-172.17.1.22 (ocf::heartbeat:IPaddr2): Started controller-0 ip-172.17.3.22 (ocf::heartbeat:IPaddr2): Started controller-1 ip-172.17.4.30 (ocf::heartbeat:IPaddr2): Started controller-2 Docker container set: haproxy-bundle [192.168.24.1:8787/rhosp14/openstack-haproxy:pcmklatest] haproxy-bundle-docker-0 (ocf::heartbeat:docker): Started controller-0 haproxy-bundle-docker-1 (ocf::heartbeat:docker): Started controller-1 haproxy-bundle-docker-2 (ocf::heartbeat:docker): Started controller-2 Docker container: openstack-cinder-volume [192.168.24.1:8787/rhosp14/openstack-cinder-volume:pcmklatest] openstack-cinder-volume-docker-0 (ocf::heartbeat:docker): Started controller-0 Daemon Status: corosync: active/enabled pacemaker: active/enabled pcsd: active/enabled sbd: active/enabled Since the 
problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2129