Red Hat Bugzilla – Bug 1308987
Overcloud nodes fail to boot
Last modified: 2017-01-31 06:56:18 EST
Created attachment 1127629
Snippet from /var/log/messages showing context of failing iptables command
I run: openstack overcloud deploy --templates --control-flavor control --compute-flavor compute --control-scale 1 --compute-scale 3 --neutron-tunnel-types vxlan --neutron-network-type vxlan
Nodes power up and get a DHCP lease successfully, but then fail to fetch their boot images over TFTP. Looking at tcpdump and iptables, it seems the TFTP requests are being rejected by iptables rules.
/var/log/messages shows the following iptables messages (possibly related?) from around the time of the deployment attempt:
Feb 16 13:53:07 calico-rh-director ironic-inspector: 2016-02-16 13:53:07.242 807 DEBUG ironic_inspector.firewall [-] ignoring failed iptables ('-D', 'INPUT', '-i', 'br-ctlplane', '-p', 'udp', '--dport', '67', '-j', 'ironic-inspector_temp'):
Feb 16 13:53:07 calico-rh-director ironic-inspector: iptables v1.4.21: Couldn't load target `ironic-inspector_temp':No such file or directory
Feb 16 13:53:07 calico-rh-director ironic-inspector: Try `iptables -h' or 'iptables --help' for more information.
Feb 16 13:53:07 calico-rh-director ironic-inspector: _iptables /usr/lib/python2.7/site-packages/ironic_inspector/firewall.py:45
Uploaded snippet of /var/log/messages provides a bit more context for this message.
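For anyone triaging similar output, the following is a minimal sketch of how the failed iptables arguments could be pulled out of such a log line to see which port the rule targets. It assumes the message format is exactly as in the excerpt above; the helper name `failed_iptables_args` is hypothetical and not part of ironic-inspector.

```python
import ast
import re

# Matches the tuple of iptables arguments as it appears in the log excerpt
# above. The pattern is an assumption based on that single excerpt.
FAILED_IPTABLES = re.compile(r"ignoring failed iptables (\(.*?\)):")


def failed_iptables_args(line):
    """Return the iptables argument list from a log line, or None.

    Hypothetical helper for illustration; not part of ironic-inspector.
    """
    match = FAILED_IPTABLES.search(line)
    if match is None:
        return None
    # The logged tuple is a plain Python literal, so literal_eval can parse it.
    return list(ast.literal_eval(match.group(1)))


# Log line copied from the report above.
sample = (
    "Feb 16 13:53:07 calico-rh-director ironic-inspector: "
    "2016-02-16 13:53:07.242 807 DEBUG ironic_inspector.firewall [-] "
    "ignoring failed iptables ('-D', 'INPUT', '-i', 'br-ctlplane', "
    "'-p', 'udp', '--dport', '67', '-j', 'ironic-inspector_temp'):"
)

args = failed_iptables_args(sample)
port = args[args.index("--dport") + 1]
print("action:", args[0], "chain:", args[1], "udp port:", port)
```

The rule in this message targets UDP port 67, which is DHCP; TFTP uses UDP port 69, so this particular message does not by itself describe a rule on the TFTP port.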
Name : openstack-ironic-inspector
Arch : noarch
Version : 2.2.2
Release : 1.el7ost
Size : 733 k
Repo : installed
From repo : RH7-RHOS-8.0-director
This follows an upgrade from OSP8 beta 4 to beta 6 (by running yum update) - overcloud deployment was working fine on this system with beta 4.
Deleting the stack and re-creating hits the same problem. Deleting the stack, rebooting and re-creating hits the same problem.
Just to clarify: are you using TFTP (i.e. PXE) or HTTP (i.e. iPXE)? The default is iPXE. Also, could you please check if you're affected by https://bugzilla.redhat.com/show_bug.cgi?id=1308611 ?
Deployment is handled by ironic, not inspector - changing project.
I was using TFTP (tcpdump could see the servers sending TFTP requests in - which were then being dropped by iptables), though I don't recall changing anything to do that (i.e. I didn't notice the default was HTTP, despite snooping the boot traffic). Has the default changed between Beta 4 and Beta 6?
As for whether I'm affected by 1308611 - I've no idea - the nodes were already introspected at Beta 4 - I was only doing the deployment with Beta 6.
Note that this system is now destroyed - I wiped it all and reinstalled from scratch using Beta 6 - and the system works fine now.
We are using iPXE by default, but we do have TFTP for bootstrapping the iPXE image. I'm asking because the iptables rules you mention are required for ironic-inspector to work and only affect DHCP. It could happen that ironic-inspector is doing something wrong with iptables, of course, but I don't see anything wrong in your snippets (no ERRORs, only DEBUGs).
So if you experience this problem again, could you please reopen this bug and provide the logs from 'sudo journalctl -u openstack-ironic-conductor -u openstack-ironic-inspector -u openstack-ironic-inspector-dnsmasq'?
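If the logs need to be collected more than once, the journalctl invocation above could be wrapped in a small script. This is only a sketch: the unit names are the ones given in the command above, while the helper names and the output file name `ironic-logs.txt` are assumptions made for illustration.

```python
import subprocess

# Unit names taken from the journalctl command quoted above.
UNITS = [
    "openstack-ironic-conductor",
    "openstack-ironic-inspector",
    "openstack-ironic-inspector-dnsmasq",
]


def journalctl_command(units):
    """Build the argument list: sudo journalctl -u <unit> -u <unit> ..."""
    cmd = ["sudo", "journalctl"]
    for unit in units:
        cmd += ["-u", unit]
    return cmd


def collect_logs(units, path="ironic-logs.txt"):
    """Write the combined journal for the given units to a file.

    Hypothetical helper; the default file name is an arbitrary choice.
    """
    with open(path, "w") as out:
        subprocess.run(journalctl_command(units), stdout=out, check=True)


# Print the command only; call collect_logs(UNITS) on the undercloud to run it.
print(" ".join(journalctl_command(UNITS)))
```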