Description of problem:
OSP-D overcloud deploy fails on:
> Error: unable to get cib
> Error: /Stage[main]/Pacemaker::Stonith/Pacemaker::Property[Disable STONITH]/Pcmk_property[property--stonith-enabled]: Could not evaluate: backup_cib: Running: /usr/sbin/pcs cluster cib /var/lib/pacemaker/cib/puppet-cib-backup20170208-14072-1iiyj1e failed with code: 1 ->

Unlike a few other configurations (vxlan, ipv4, ...), this has so far happened only in the ipv6-vlan setup (3 controllers + 1 or 2 computes, virthost). When "pcs cluster cib" is run on controller-0 it cannot fetch data. Pacemaker does not appear to be running correctly, and the /etc/corosync/corosync.conf file is missing completely.

Version-Release number of selected component (if applicable):
> openstack-tripleo-0.0.8-0.2.4de13b3git.el7ost.noarch
> openstack-tripleo-puppet-elements-6.0.0-0.20170126053436.688584c.el7ost.noarch
> puppet-tripleo-6.1.0-0.20170127040716.d427c2a.el7ost.noarch
> python-tripleoclient-6.0.1-0.20170127055753.8ea289c.el7ost.noarch
> rhosp-director-images-11.0-20170201.1.el7ost.noarch
> openstack-tripleo-ui-2.0.1-0.20170126144317.f3bd97e.el7ost.noarch
> openstack-tripleo-validations-5.3.1-0.20170125194508.6b928f1.el7ost.noarch
> openstack-tripleo-heat-templates-6.0.0-0.20170127041112.ce54697.el7ost.1.noarch
> openstack-tripleo-image-elements-6.0.0-0.20170126135810.00b9869.el7ost.noarch
> rhosp-director-images-ipa-11.0-20170201.1.el7ost.noarch
> openstack-tripleo-common-5.7.1-0.20170126235054.c75d3c6.el7ost.noarch

How reproducible:
Always, executed like:
> openstack overcloud deploy --debug \
> --templates \
> --libvirt-type kvm \
> --ntp-server ntp.example.org \
> --control-scale 3 \
> --control-flavor controller \
> --compute-scale 1 \
> --compute-flavor compute \
> --environment-file /usr/share/openstack-tripleo-heat-templates/environments/services/sahara.yaml \
> --environment-file /usr/share/openstack-tripleo-heat-templates/environments/cinder-backup.yaml \
> -e /home/stack/virt/network/network-environment-v6.yaml \
> -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation-v6.yaml \
> -e /home/stack/virt/hostnames.yml \
> -e /home/stack/virt/debug.yaml \
> --log-file overcloud_deployment_37.log

mkrcmari is trying to get more detailed info (with ConfigDebug enabled) to pin down a more specific cause of the failure.
Created attachment 1248632 [details] /home/stack/virt/debug.yaml
Created attachment 1248633 [details] /home/stack/virt/hostnames.yml
Created attachment 1248635 [details] /home/stack/virt/network/network-environment-v6.yaml (original ipv6/ipv4 ranges replaced with dummy examples)
It seems this is caused by the failure of the command that sets up cluster authentication:

"/sbin/pcs cluster auth controller-0 controller-1 controller-2 -u hacluster -p ***** --force"

Pacemaker communication is blocked by iptables at the time the command runs. The rules that open it up for ipv6 are only added after the command has been executed. It is reproducible only on ipv6-based deployments because the ipv4 iptables rules are empty at the time of command execution:

[heat-admin@controller-0 ~]$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

because:

[heat-admin@controller-0 ~]$ sudo cat /etc/sysconfig/iptables
# empty ruleset created by tripleo-image-elements

But ip6tables already contains rules at the time of command execution:

[heat-admin@controller-0 ~]$ sudo ip6tables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     all      anywhere             anywhere             state RELATED,ESTABLISHED
ACCEPT     ipv6-icmp    anywhere             anywhere
ACCEPT     all      anywhere             anywhere
ACCEPT     tcp      anywhere             anywhere             state NEW tcp dpt:ssh
ACCEPT     udp      anywhere             fe80::/64            udp dpt:dhcpv6-client state NEW
REJECT     all      anywhere             anywhere             reject-with icmp6-adm-prohibited

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
REJECT     all      anywhere             anywhere             reject-with icmp6-adm-prohibited

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

And:

[heat-admin@controller-0 ~]$ sudo cat /etc/sysconfig/ip6tables
# sample configuration for ip6tables service
# you can edit this manually or use system-config-firewall
# please do not ask us to add additional ports/services to this default configuration
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p ipv6-icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -d fe80::/64 -p udp -m udp --dport 546 -m state --state NEW -j ACCEPT
-A INPUT -j REJECT --reject-with icmp6-adm-prohibited
-A FORWARD -j REJECT --reject-with icmp6-adm-prohibited
COMMIT
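The reasoning above (a stock ruleset that rejects everything except ssh and icmp) can be checked mechanically. The following is only a minimal sketch, assuming the ruleset is in the iptables-save style format shown in /etc/sysconfig/ip6tables and that pcsd is listening on its default TCP port 2224. The function name port_open_before_reject is hypothetical and not part of pcs or TripleO. It deliberately ignores source, interface, and state matches, so it is a rough indicator rather than a full firewall evaluation.

```shell
#!/bin/sh
# Hypothetical helper (not part of pcs or TripleO): succeed if an INPUT rule
# accepting TCP traffic to the given destination port appears before the first
# unconditional REJECT in an iptables-save style ruleset file.
# Usage: port_open_before_reject RULES_FILE PORT
port_open_before_reject() {
    awk -v port="$2" '
        # Only rules appended to the INPUT chain are relevant.
        $1 == "-A" && $2 == "INPUT" {
            tcp = 0; dport = 0; target = ""
            for (i = 3; i < NF; i++) {
                if ($i == "-p" && $(i + 1) == "tcp") tcp = 1
                if ($i == "--dport" && $(i + 1) == port) dport = 1
                if ($i == "-j") target = $(i + 1)
            }
            # A matching ACCEPT rule was reached first: the port is open.
            if (tcp && dport && target == "ACCEPT") { found = 1; exit }
            # "-A INPUT -j REJECT ..." has no match conditions, so no later
            # rule in this chain can be reached.
            if ($3 == "-j" && target == "REJECT") exit
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Example (assumed port number, adjust as needed):
#   port_open_before_reject /etc/sysconfig/ip6tables 2224 || echo "pcsd port blocked"
```

Run against the ip6tables file quoted above, this check would pass for port 22 and fail for port 2224, which is consistent with pcs cluster auth being rejected over ipv6.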
So from an initial look this is the downstream ipv6 manifestation of this bug: https://bugs.launchpad.net/tripleo/+bug/1657108/

The super short version is that if we start from an image that has prepopulated /etc/sysconfig/ip[6]tables rules (and the iptables package does ship such rules, which only allow ssh and icmp), pcs will be executed before the firewall module has kicked in to open up the pacemaker/pcs ports, and so it will fail.

To verify/disprove this theory, can you try the following on the undercloud:

echo '' > /tmp/iptables
echo '' > /tmp/ip6tables
virt-copy-in -a overcloud-full.qcow2 /tmp/iptables /etc/sysconfig/
virt-copy-in -a overcloud-full.qcow2 /tmp/ip6tables /etc/sysconfig/
openstack overcloud image upload --image-path . --update-existing

And then try and redeploy? Note that we already have fixes in order to empty these stock rules from the image building process. I assume that they have not yet hit downstream, because if that were the case we would not see the entries in ip[6]tables at comment 5.
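Before redeploying, it may be worth confirming that the image really carries empty rulesets. This is a minimal sketch, assuming "empty" means the file holds nothing but blank lines and "#" comments (as in the tripleo-image-elements file shown in comment 5). The function name ruleset_is_empty and the /tmp/ip6tables.check path are hypothetical. virt-cat comes from the same libguestfs tools as virt-copy-in.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a ruleset file contains only blank lines
# and "#" comments, i.e. no tables, chains, or rules.
# Usage: ruleset_is_empty RULES_FILE
ruleset_is_empty() {
    # Treat an unreadable file as an error rather than as "empty".
    [ -r "$1" ] || return 2
    # grep succeeds if any line has a first non-blank character other than "#".
    ! grep -Eq '^[[:space:]]*[^#[:space:]]' "$1"
}

# Example (paths are assumptions):
#   virt-cat -a overcloud-full.qcow2 /etc/sysconfig/ip6tables > /tmp/ip6tables.check
#   ruleset_is_empty /tmp/ip6tables.check && echo "ip6tables ruleset is empty"
```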
(In reply to Michele Baldessari from comment #7)
> And then try and redeploy? Note that we already have fixes in order to empty
> these stock rules from the image building process. I assume that they have
> not yet hit downstream, because if that were the case we would not see the
> entries in ip[6]tables at comment 5.

I can confirm Michele's assumption: the deployment was successful after placing empty iptables rules into the overcloud image and relabeling SELinux.
Mike, any idea when we will build images that have the following t-i-e patch?
https://review.openstack.org/#/c/426144/

thanks,
Michele
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1245