Description of problem:
Deploying OSP14 with InstanceHA enabled fails with the following error:

2019-11-26 07:01:18,612 p=1150 u=mistral | fatal: [overcloud-novacomputeiha-0]: FAILED! => {
    "failed_when_result": true,
    "outputs.stdout_lines | default([]) | union(outputs.stderr_lines | default([]))": [
        "Notice: hiera(): Cannot load backend module_data: cannot load such file -- hiera/backend/module_data_backend",
        "Notice: Scope(Class[Tripleo::Firewall::Post]): At this stage, all network traffic is blocked.",
        "Notice: Compiled catalog for overcloud-novacomputeiha-0.localdomain in environment production in 2.10 seconds",
        "Notice: /Stage[main]/Main/Package_manifest[/var/lib/tripleo/installed-packages/overcloud_ComputeInstanceHA2]/ensure: created",
        "Notice: /Stage[main]/Firewall::Linux::Redhat/Service[firewalld]/ensure: ensure changed 'running' to 'stopped'",
        "Notice: /Stage[main]/Firewall::Linux::Redhat/Service[iptables]/ensure: ensure changed 'stopped' to 'running'",
        "Notice: /Stage[main]/Firewall::Linux::Redhat/Service[ip6tables]/ensure: ensure changed 'stopped' to 'running'",
        "Notice: /Stage[main]/Tripleo::Profile::Base::Kernel/Kmod::Load[nf_conntrack_proto_sctp]/Exec[modprobe nf_conntrack_proto_sctp]/returns: executed successfully",
        "Notice: Applied catalog in 3.96 seconds",
        "Changes:",
        " Total: 5",
        "Events:",
        " Failure: 1",
        " Success: 5",
        " Total: 6",
        "Resources:",
        " Failed: 1",
        " Total: 158",
        " Corrective change: 4",
        " Changed: 5",
        " Out of sync: 6",
        "Time:",
        " Filebucket: 0.00",
        " Concat fragment: 0.00",
        " Concat file: 0.00",
        " Schedule: 0.00",
        " Anchor: 0.00",
        " Package manifest: 0.00",
        " Sysctl: 0.01",
        " Sysctl runtime: 0.01",
        " Augeas: 0.02",
        " File: 0.05",
        " Firewall: 0.05",
        " Exec: 0.27",
        " Pcmk property: 0.41",
        " Package: 0.52",
        " Service: 1.71",
        " Last run: 1574769670",
        " Config retrieval: 2.59",
        " Total: 5.64",
        "Version:",
        " Config: 1574769664",
        " Puppet: 4.8.2",
        "Warning: Undefined variable '::deploy_config_name'; ",
        " (file & line not available)",
        "Warning: Undefined variable 'deploy_config_name'; ",
        "Warning: Unknown variable: '::deployment_type'. at /etc/puppet/modules/tripleo/manifests/profile/base/database/mysql/client.pp:85:31",
        "Warning: This method is deprecated, please use the stdlib validate_legacy function,",
        " with Stdlib::Compat::Bool. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 54]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 34]",
        " (at /etc/puppet/modules/stdlib/lib/puppet/functions/deprecation.rb:28:in `deprecation')",
        " with Stdlib::Compat::Absolute_Path. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 55]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 34]",
        " with Stdlib::Compat::String. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 56]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 34]",
        " with Stdlib::Compat::Array. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 66]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 34]",
        " with Pattern[]. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 68]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 34]",
        " with Stdlib::Compat::Numeric. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/ntp/manifests/init.pp\", 76]:[\"/etc/puppet/modules/tripleo/manifests/profile/base/time/ntp.pp\", 34]",
        "Warning: ModuleLoader: module 'pacemaker' has unresolved dependencies - it will only see those that are resolved. Use 'puppet module list --tree' to see information about modules",
        "Warning: tag is a metaparam; this value will inherit to all contained resources in the tripleo::firewall::rule definition",
        " with Stdlib::Compat::Hash. There is further documentation for validate_legacy function in the README. at [\"/etc/puppet/modules/tripleo/manifests/firewall/rule.pp\", 148]:",
        "Error: /Stage[main]/Tripleo::Profile::Pacemaker::Compute_instanceha/Pacemaker::Property[compute-instanceha-role-node-property]/Pcmk_property[property-overcloud-novacomputeiha-0-compute-instanceha-role]: Could not evaluate: backup_cib: Running: pcs cluster cib /var/lib/pacemaker/cib/puppet-cib-backup20191126-41449-14tc4hm failed with code: 1 -> Error: unable to get cib"
    ]
}

Version-Release number of selected component (if applicable):
[root@overcloud-novacomputeiha-0 heat-admin]# rpm -qa | grep -E 'openstack|rhosp|pacemaker|tripleo'
puppet-openstacklib-13.3.2-0.20190420090713.05a84dd.el7ost.noarch
python-openstackclient-lang-3.16.2-3.el7ost.noarch
pacemaker-cli-1.1.20-5.el7_7.1.x86_64
python2-openstacksdk-0.17.2-0.20180809182657.3ad9dab.el7ost.noarch
python2-openstackclient-3.16.2-3.el7ost.noarch
pacemaker-1.1.20-5.el7_7.1.x86_64
ansible-pacemaker-1.0.4-0.20180827141254.0e4d7c0.el7ost.noarch
openstack-selinux-0.8.18-1.el7ost.noarch
puppet-pacemaker-0.7.2-0.20181008172522.9a4bc2d.el7ost.noarch
openstack-heat-agents-1.7.1-0.20190420000616.41c7faf.el7ost.noarch
pacemaker-libs-1.1.20-5.el7_7.1.x86_64
pacemaker-remote-1.1.20-5.el7_7.1.x86_64
puppet-tripleo-9.4.1-0.20190508182410.el7ost.noarch
rhosp-openvswitch-2.11-0.6.el7ost.noarch
rhosp-release-14.0.4-1.el7ost.noarch
puppet-openstack_extras-13.3.2-0.20190420072608.d650bd8.el7ost.noarch
pacemaker-cluster-libs-1.1.20-5.el7_7.1.x86_64

(undercloud) [stack@director ~]$ rpm -qa | grep -i tripleo
openstack-tripleo-heat-templates-9.3.1-0.20190513171768.el7ost.noarch
ansible-role-tripleo-modify-image-1.0.1-0.20190419231031.f1dfdc6.el7ost.noarch
openstack-tripleo-common-containers-9.5.0-8.el7ost.noarch
ansible-tripleo-ipsec-9.1.0-2.el7ost.noarch
python2-tripleo-common-9.5.0-8.el7ost.noarch
python-tripleoclient-heat-installer-10.6.2-0.20190425150607.el7ost.noarch
openstack-tripleo-validations-9.3.2-0.20190420045628.361061f.el7ost.noarch
openstack-tripleo-common-9.5.0-8.el7ost.noarch
python-tripleoclient-10.6.2-0.20190425150607.el7ost.noarch
openstack-tripleo-puppet-elements-9.0.1-5.el7ost.noarch
openstack-tripleo-image-elements-9.0.1-0.20181102144447.9f1c800.el7ost.noarch
puppet-tripleo-9.4.1-0.20190508182410.el7ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy OSP14 with InstanceHA enabled, following the official docs.
2. Observe that the deployment fails.

Actual results:
Deployment fails.

Expected results:
Deployment finishes successfully.

Additional info:
Nov 27 10:35:32 overcloud-controller-0 ansible-systemd[38383]: Invoked with no_block=False force=None name=firewalld enabled=True daemon_reload=False state=started user=False masked=None
Nov 27 10:35:32 overcloud-controller-0 systemd[1]: Reloading.
Nov 27 10:35:32 overcloud-controller-0 systemd[1]: Stopping IPv6 firewall with ip6tables...
Nov 27 10:35:32 overcloud-controller-0 systemd[1]: Starting firewalld - dynamic firewall daemon...
Nov 27 10:35:32 overcloud-controller-0 ip6tables.init[38405]: ip6tables: Setting chains to policy ACCEPT: filter [ OK ]
Nov 27 10:35:32 overcloud-controller-0 ip6tables.init[38405]: ip6tables: Flushing firewall rules: [ OK ]
Nov 27 10:35:32 overcloud-controller-0 systemd[1]: Stopped IPv6 firewall with ip6tables.
Nov 27 10:35:32 overcloud-controller-0 systemd[1]: Stopping IPv4 firewall with iptables...
Nov 27 10:35:32 overcloud-controller-0 iptables.init[38423]: iptables: Setting chains to policy ACCEPT: filter [ OK ]
Nov 27 10:35:32 overcloud-controller-0 iptables.init[38423]: iptables: Flushing firewall rules: [ OK ]
Nov 27 10:35:32 overcloud-controller-0 systemd[1]: Stopped IPv4 firewall with iptables.
Nov 27 10:35:33 overcloud-controller-0 systemd[1]: Started firewalld - dynamic firewall daemon.
Nov 27 10:35:33 overcloud-controller-0 sudo[38378]: pam_unix(sudo:session): session closed for user root
Nov 27 10:35:33 overcloud-controller-0 kernel: Ebtables v2.0 registered
Nov 27 10:35:33 overcloud-controller-0 kernel: Netfilter messages via NETLINK v0.30.
Nov 27 10:35:33 overcloud-controller-0 kernel: ip_set: protocol 7
Nov 27 10:35:33 overcloud-controller-0 sudo[38503]: tripleo-admin : TTY=unknown ; PWD=/home/tripleo-admin ; USER=root ; COMMAND=/bin/sh -c echo BECOME-SUCCESS-epwijyfyortpwxkxzyksvtqxnolxumms; /usr/bin/python
Nov 27 10:35:33 overcloud-controller-0 sudo[38503]: pam_unix(sudo:session): session opened for user root by (uid=0)
Nov 27 10:35:34 overcloud-controller-0 ansible-firewalld[38517]: Invoked with service=ceph-mon zone=public masquerade=None immediate=True source=172.16.1.0/24 state=enabled permanent=True timeout=0 interface=None offline=None port=None rich_rule=None
Nov 27 10:35:34 overcloud-controller-0 sudo[38503]: pam_unix(sudo:session): session closed for user root
Nov 27 10:35:34 overcloud-controller-0 sudo[38572]: tripleo-admin : TTY=unknown ; PWD=/home/tripleo-admin ; USER=root ; COMMAND=/bin/sh -c echo BECOME-SUCCESS-ioghdksyqqggwciqeqibusmixazlscav; /usr/bin/python
Nov 27 10:35:34 overcloud-controller-0 sudo[38572]: pam_unix(sudo:session): session opened for user root by (uid=0)
Nov 27 10:35:34 overcloud-controller-0 ansible-firewalld[38576]: Invoked with service=ceph zone=public masquerade=None immediate=True source=172.16.1.0/24 state=enabled permanent=True timeout=0 interface=None offline=None port=None rich_rule=None
Nov 27 10:35:34 overcloud-controller-0 sudo[38572]: pam_unix(sudo:session): session closed for user root
Nov 27 10:35:35 overcloud-controller-0 sudo[38590]: tripleo-admin : TTY=unknown ; PWD=/home/tripleo-admin ; USER=root ; COMMAND=/bin/sh -c echo BECOME-SUCCESS-pduyziuhzinpowluoycnualeqyouxcwt; /usr/bin/python
Nov 27 10:35:35 overcloud-controller-0 sudo[38590]: pam_unix(sudo:session): session opened for user root by (uid=0)
Nov 27 10:35:35 overcloud-controller-0 ansible-firewalld[38594]: Invoked with zone=public service=ceph masquerade=None immediate=True source=172.16.1.0/24 state=enabled permanent=True timeout=0 interface=None offline=None port=None rich_rule=None
Nov 27 10:35:35 overcloud-controller-0 sudo[38590]: pam_unix(sudo:session): session closed for user root
Nov 27 10:35:41 overcloud-controller-0 sudo[38610]: tripleo-admin : TTY=unknown ; PWD=/home/tripleo-admin ; USER=root ; COMMAND=/bin/sh -c echo BECOME-SUCCESS-dxgtxmtnahrrataytigwpwxmzhwltmbg; /usr/bin/python
Nov 27 10:35:41 overcloud-controller-0 sudo[38610]: pam_unix(sudo:session): session opened for user root by (uid=0)
Nov 27 10:35:41 overcloud-controller-0 ansible-systemd[38614]: Invoked with no_block=False force=None name=firewalld enabled=True daemon_reload=False state=restarted user=False masked=None
Nov 27 10:35:41 overcloud-controller-0 systemd[1]: Stopping firewalld - dynamic firewall daemon...
Nov 27 10:35:42 overcloud-controller-0 kernel: Ebtables v2.0 unregistered
Nov 27 10:35:42 overcloud-controller-0 systemd[1]: Stopped firewalld - dynamic firewall daemon.
Nov 27 10:35:42 overcloud-controller-0 systemd[1]: Starting firewalld - dynamic firewall daemon...
Nov 27 10:35:42 overcloud-controller-0 systemd[1]: Started firewalld - dynamic firewall daemon.
Nov 27 10:35:42 overcloud-controller-0 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Nov 27 10:35:42 overcloud-controller-0 sudo[38610]: pam_unix(sudo:session): session closed for user root
Nov 27 10:35:42 overcloud-controller-0 kernel: ip6_tables: (C) 2000-2006 Netfilter Core Team
Nov 27 10:35:42 overcloud-controller-0 kernel: Ebtables v2.0 registered
Nov 27 10:35:44 overcloud-controller-0 sudo[38784]: tripleo-admin : TTY=unknown ; PWD=/home/tripleo-admin ; USER=root ; COMMAND=/bin/sh -c echo BECOME-SUCCESS-gerhftfmiwatqccnakopeaoeohoeuups; /usr/bin/python
Nov 27 10:35:44 overcloud-controller-0 sudo[38784]: pam_unix(sudo:session): session opened for user root by (uid=0)
Nov 27 10:35:44 overcloud-controller-0 ansible-command[38789]: Invoked with warn=True executable=None _uses_shell=False _raw_params=docker ps -q --filter='name=ceph-mon-overcloud-controller-0' removes=None argv=None creates=None chdir=None stdin=None
Nov 27 10:35:44 overcloud-controller-0 sudo[38784]: pam_unix(sudo:session): session closed for user root
Nov 27 10:35:46 overcloud-controller-0 corosync[25607]: [TOTEM ] A processor failed, forming new configuration.
Nov 27 10:35:46 overcloud-controller-0 sudo[38814]: tripleo-admin : TTY=unknown ; PWD=/home/tripleo-admin ; USER=root ; COMMAND=/bin/sh -c echo BECOME-SUCCESS-qitrlaolsjxdmlisuferqeyyvajrpxhp; /usr/bin/python
Nov 27 10:35:46 overcloud-controller-0 sudo[38814]: pam_unix(sudo:session): session opened for user root by (uid=0)
Nov 27 10:35:46 overcloud-controller-0 ansible-command[38819]: Invoked with warn=True executable=None _uses_shell=False _raw_params=docker ps -q --filter='name=ceph-mgr-overcloud-controller-0' removes=None argv=None creates=None chdir=None stdin=None
Nov 27 10:35:46 overcloud-controller-0 sudo[38814]: pam_unix(sudo:session): session closed for user root
Nov 27 10:35:58 overcloud-controller-0 corosync[25607]: [TOTEM ] A new membership (172.16.2.6:16) was formed. Members left: 2 3
Nov 27 10:35:58 overcloud-controller-0 corosync[25607]: [TOTEM ] Failed to receive the leave message. failed: 2 3
Nov 27 10:35:58 overcloud-controller-0 stonith-ng[25632]: notice: Node overcloud-controller-1 state is now lost
Nov 27 10:35:58 overcloud-controller-0 corosync[25607]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Nov 27 10:35:58 overcloud-controller-0 corosync[25607]: [QUORUM] Members[1]: 1
Nov 27 10:35:58 overcloud-controller-0 stonith-ng[25632]: notice: Purged 1 peer with id=2 and/or uname=overcloud-controller-1 from the membership cache
Nov 27 10:35:58 overcloud-controller-0 corosync[25607]: [MAIN ] Completed service synchronization, ready to provide service.
Nov 27 10:35:58 overcloud-controller-0 stonith-ng[25632]: notice: Node overcloud-controller-2 state is now lost
Nov 27 10:35:58 overcloud-controller-0 stonith-ng[25632]: notice: Purged 1 peer with id=3 and/or uname=overcloud-controller-2 from the membership cache
Nov 27 10:35:58 overcloud-controller-0 pacemakerd[25630]: warning: Quorum lost
Nov 27 10:35:58 overcloud-controller-0 crmd[25636]: warning: Quorum lost
Nov 27 10:35:58 overcloud-controller-0 pacemakerd[25630]: notice: Node overcloud-controller-1 state is now lost
Nov 27 10:35:58 overcloud-controller-0 crmd[25636]: notice: Node overcloud-controller-2 state is now lost
Nov 27 10:35:58 overcloud-controller-0 pacemakerd[25630]: notice: Node overcloud-controller-2 state is now lost
Nov 27 10:35:58 overcloud-controller-0 crmd[25636]: notice: Node overcloud-controller-1 state is now lost
Nov 27 10:35:58 overcloud-controller-0 crmd[25636]: warning: Our DC node (overcloud-controller-1) left the cluster
Nov 27 10:35:58 overcloud-controller-0 cib[25631]: notice: Node overcloud-controller-1 state is now lost
Nov 27 10:35:58 overcloud-controller-0 crmd[25636]: notice: State transition S_NOT_DC -> S_ELECTION
Nov 27 10:35:58 overcloud-controller-0 cib[25631]: notice: Purged 1 peer with id=2 and/or uname=overcloud-controller-1 from the membership cache
Nov 27 10:35:58 overcloud-controller-0 cib[25631]: notice: Node overcloud-controller-2 state is now lost
Nov 27 10:35:58 overcloud-controller-0 cib[25631]: notice: Purged 1 peer with id=3 and/or uname=overcloud-controller-2 from the membership cache
Nov 27 10:35:58 overcloud-controller-0 crmd[25636]: notice: State transition S_ELECTION -> S_INTEGRATION

controller-0 is isolated from about 10:35:40. Looking at the ceph-ansible logs:

2019-11-27 05:35:30,207 p=96066 u=mistral | TASK [ceph-infra : include_tasks configure_firewall.yml] ***********************
2019-11-27 05:35:31,048 p=96066 u=mistral | TASK [ceph-infra : check firewalld installation on redhat or suse] *************
2019-11-27 05:35:31,869 p=96066 u=mistral | TASK [ceph-infra : start firewalld] ********************************************
2019-11-27 05:35:33,303 p=96066 u=mistral | TASK [ceph-infra : open monitor and manager ports] *****************************
2019-11-27 05:35:35,050 p=96066 u=mistral | TASK [ceph-infra : open manager ports] *****************************************
2019-11-27 05:35:35,855 p=96066 u=mistral | TASK [ceph-infra : open osd ports] *********************************************
2019-11-27 05:35:37,547 p=96066 u=mistral | TASK [ceph-infra : open rgw ports] *********************************************
2019-11-27 05:35:38,019 p=96066 u=mistral | TASK [ceph-infra : open mds ports] *********************************************
2019-11-27 05:35:38,509 p=96066 u=mistral | TASK [ceph-infra : open nfs ports] *********************************************
2019-11-27 05:35:39,005 p=96066 u=mistral | TASK [ceph-infra : open nfs ports (portmapper)] ********************************
2019-11-27 05:35:39,516 p=96066 u=mistral | TASK [ceph-infra : open rbdmirror ports] ***************************************
2019-11-27 05:35:40,061 p=96066 u=mistral | TASK [ceph-infra : open iscsi target ports] ************************************
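2019-11-27 05:35:40,514 p=96066 u=mistral | TASK [ceph-infra : open iscsi api ports] ***************************************
2019-11-27 05:35:42,712 p=96066 u=mistral | TASK [ceph-infra : include_tasks setup_ntp.yml] ********************************

The timestamps seem to match, allowing for the 5-hour offset between the two logs: the "start firewalld" task at 05:35:31 corresponds to firewalld being started at 10:35:32 in the controller's syslog.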
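Lining the two logs up depends on a fixed clock offset. The ceph-ansible "start firewalld" task is logged at 05:35:31 and syslog records ansible-systemd starting firewalld at 10:35:32, which suggests (but does not prove) a 5-hour difference, presumably a time zone difference between the undercloud log and the overcloud node. A minimal bash sketch, assuming that 5-hour offset, that maps ceph-ansible task times onto the node clock; shift_ts is a hypothetical helper, not part of any TripleO or ceph-ansible tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: shift an "HH:MM:SS" or "HH:MM:SS,mmm" timestamp by a
# whole number of hours (default 5, the offset assumed above), wrapping at
# midnight. Milliseconds are dropped from the result.
shift_ts() {
  local ts=$1 offset_h=${2:-5}
  local clock=${ts%%,*}            # strip ",mmm" if present
  local hh mm ss
  IFS=: read -r hh mm ss <<<"$clock"
  # 10# forces base 10 so hours like "08" are not read as octal;
  # the "+ 24) % 24" keeps the result in 0..23 for negative offsets.
  local h=$(( ( (10#$hh + offset_h) % 24 + 24 ) % 24 ))
  printf '%02d:%s:%s\n' "$h" "$mm" "$ss"
}

# ceph-ansible task times taken from the log above.
for t in 05:35:31,869 05:35:33,303 05:35:40,514; do
  printf '%s -> %s\n' "$t" "$(shift_ts "$t")"
done
```

The first line printed is "05:35:31,869 -> 10:35:31", one second before the ansible-systemd firewalld entry at 10:35:32; the offset is a parameter so a different assumption can be tried.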
systemctl restart firewalld triggers the issue:

Node controller-1: UNCLEAN (offline)
Node controller-2: UNCLEAN (offline)
Online: [ controller-0 ]
RemoteOnline: [ compute-1 ]
RemoteOFFLINE: [ compute-0 ]
GuestOnline: [ galera-bundle-1@controller-0 rabbitmq-bundle-1@controller-0 redis-bundle-1@controller-0 ]
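When reproducing this by hand, the symptom to look for after restarting firewalld is peers reported as UNCLEAN (offline) in the cluster status, as in the excerpt above. A small bash sketch, assuming status text in the "Node <name>: UNCLEAN ..." form shown above arrives on stdin, that prints the affected node names; unclean_nodes is a hypothetical name, and other status layouts are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical filter: print the names of nodes reported as
# "Node <name>: UNCLEAN ...", one per line, reading status text from stdin.
unclean_nodes() {
  sed -n 's/^[[:space:]]*Node \([^:]*\): UNCLEAN.*/\1/p'
}

# Sample input copied from the status excerpt above.
unclean_nodes <<'EOF'
Node controller-1: UNCLEAN (offline)
Node controller-2: UNCLEAN (offline)
Online: [ controller-0 ]
RemoteOnline: [ compute-1 ]
RemoteOFFLINE: [ compute-0 ]
EOF
```

On a live controller this would be used as "pcs status | unclean_nodes"; an empty result means no peer is currently marked UNCLEAN.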
I wonder whether your THT includes this commit: https://opendev.org/openstack/tripleo-heat-templates/commit/781b1413c48c2e70ccce3a09e1dc3d66be49fa69
*** Bug 1777347 has been marked as a duplicate of this bug. ***
We confirmed that the patch was actually missing, so I am marking this as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1767160. Thanks, Luca

*** This bug has been marked as a duplicate of bug 1767160 ***