Bug 1308987 - Overcloud nodes fail to boot
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ironic
Version: 8.0 (Liberty)
Hardware: x86_64 Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 8.0 (Liberty)
Assigned To: Dmitry Tantsur
QA Contact: Toure Dunnon
Keywords: Reopened
Reported: 2016-02-16 11:33 EST by lance
Modified: 2017-01-31 06:56 EST

Doc Type: Bug Fix
Last Closed: 2017-01-31 06:53:21 EST
Type: Bug

Attachments
Snippet from /var/log/messages showing context of failing iptables command (5.64 KB, text/plain)
2016-02-16 11:33 EST, lance

Description lance 2016-02-16 11:33:49 EST
Created attachment 1127629
Snippet from /var/log/messages showing context of failing iptables command

I run: openstack overcloud deploy --templates --control-flavor control --compute-flavor compute --control-scale 1 --compute-scale 3 --neutron-tunnel-types vxlan --neutron-network-type vxlan

Nodes power up and DHCP successfully, but then fail to tftp their boot images.  Looking at tcpdump and iptables, it seems the TFTP requests are being rejected by iptables rules.
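
One way to check whether a ruleset would let TFTP through is to walk a saved dump of the rules and find the first rule that applies to UDP port 69. The following is a minimal Python sketch, not the reporter's actual ruleset: the `SAMPLE_RULES` text is invented for illustration, the helper names are hypothetical, and the matching deliberately looks only at `-p` and `--dport`, ignoring interfaces, source addresses, and jumps to user-defined chains.

```python
import shlex

# Hypothetical iptables-save style excerpt, invented for illustration only.
# On a real system the text would come from the output of `iptables-save`.
SAMPLE_RULES = """\
*filter
:INPUT ACCEPT [0:0]
-A INPUT -i br-ctlplane -p udp -m udp --dport 67 -j ACCEPT
-A INPUT -p udp -m udp --dport 69 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
COMMIT
"""


def option(tokens, flag):
    """Return the token following `flag`, or None if the flag is absent."""
    if flag in tokens:
        idx = tokens.index(flag)
        if idx + 1 < len(tokens):
            return tokens[idx + 1]
    return None


def first_verdict(rules_text, chain, proto, dport):
    """Return the target of the first rule in `chain` that could match.

    Simplification: a rule is treated as matching when its -p and --dport
    options are either absent or equal to the requested values.
    """
    for line in rules_text.splitlines():
        tokens = shlex.split(line)
        if tokens[:2] != ["-A", chain]:
            continue  # table headers, chain policies, other chains
        if option(tokens, "-p") not in (None, proto):
            continue
        if option(tokens, "--dport") not in (None, dport):
            continue
        return option(tokens, "-j")
    return None


# TFTP uses UDP port 69.
print(first_verdict(SAMPLE_RULES, "INPUT", "udp", "69"))
```

If the port 69 rule is missing from the dump, the same call falls through to the catch-all rule and returns its target instead, which is the situation the tcpdump observation above suggests.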

/var/log/messages shows the following (possibly related?) iptables messages from around the time of the deployment attempt:

Feb 16 13:53:07 calico-rh-director ironic-inspector: 2016-02-16 13:53:07.242 807 DEBUG ironic_inspector.firewall [-] ignoring failed iptables ('-D', 'INPUT', '-i', 'br-ctlplane', '-p', 'udp', '--dport', '67', '-j', 'ironic-inspector_temp'):
Feb 16 13:53:07 calico-rh-director ironic-inspector: iptables v1.4.21: Couldn't load target `ironic-inspector_temp':No such file or directory
Feb 16 13:53:07 calico-rh-director ironic-inspector: Try `iptables -h' or 'iptables --help' for more information.
Feb 16 13:53:07 calico-rh-director ironic-inspector: _iptables /usr/lib/python2.7/site-packages/ironic_inspector/firewall.py:45

Uploaded snippet of /var/log/messages provides a bit more context for this message.
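
To see which traffic the failed iptables call refers to, the argument tuple in the log line can be parsed directly. The following is a minimal Python sketch; the helper names `parse_failed_iptables` and `option_value` are hypothetical, and the sample line is copied from the log excerpt above. It shows only that the logged rule deletion targets UDP port 67 (the DHCP server port), whereas TFTP uses UDP port 69.

```python
import ast
import re

# Sample line copied from the /var/log/messages excerpt in this report.
LOG_LINE = (
    "Feb 16 13:53:07 calico-rh-director ironic-inspector: "
    "2016-02-16 13:53:07.242 807 DEBUG ironic_inspector.firewall [-] "
    "ignoring failed iptables ('-D', 'INPUT', '-i', 'br-ctlplane', "
    "'-p', 'udp', '--dport', '67', '-j', 'ironic-inspector_temp'):"
)


def parse_failed_iptables(line):
    """Return the logged iptables argument tuple, or None if not present."""
    match = re.search(r"ignoring failed iptables (\(.*\)):?\s*$", line)
    if match is None:
        return None
    # The logged value is a Python tuple literal, so literal_eval is safe here.
    return ast.literal_eval(match.group(1))


def option_value(args, flag):
    """Return the value following `flag` in args, or None if absent."""
    if flag in args:
        idx = args.index(flag)
        if idx + 1 < len(args):
            return args[idx + 1]
    return None


args = parse_failed_iptables(LOG_LINE)
protocol = option_value(args, "-p")
port = option_value(args, "--dport")
print(args[0], protocol, port)
```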

I'm running:
Name        : openstack-ironic-inspector
Arch        : noarch
Version     : 2.2.2
Release     : 1.el7ost
Size        : 733 k
Repo        : installed
From repo   : RH7-RHOS-8.0-director

This follows an upgrade from OSP8 beta 4 to beta 6 (by running yum update) - overcloud deployment was working fine on this system with beta 4.

Deleting the stack and re-creating hits the same problem.  Deleting the stack, rebooting and re-creating hits the same problem.
Comment 2 Dmitry Tantsur 2016-02-19 06:45:58 EST
Just to clarify: are you using TFTP (i.e. PXE) or HTTP (i.e. iPXE)? The default is iPXE. Also, could you please check if you're affected by https://bugzilla.redhat.com/show_bug.cgi?id=1308611 ?

Deployment is handled by ironic, not inspector - changing project.
Comment 3 lance 2016-02-19 06:54:54 EST
I was using TFTP (tcpdump could see the servers sending TFTP requests in - which were then being dropped by iptables), though I don't recall changing anything to do that (i.e. I didn't notice the default was HTTP, despite snooping the boot traffic).  Has the default changed between Beta 4 and Beta 6?

As for whether I'm affected by 1308611 - I've no idea - the nodes were already introspected at Beta 4 - I was only doing the deployment with Beta 6.

Note that this system is now destroyed - I wiped it all and reinstalled from scratch using Beta 6 - and the system works fine now.
Comment 4 Dmitry Tantsur 2016-02-19 07:10:58 EST
We are using iPXE by default, but we do have TFTP for bootstrapping the iPXE image. I'm asking because the iptables rules you mention are required for ironic-inspector to work and only affect DHCP. It could happen that ironic-inspector is doing something wrong with iptables, of course, but I don't see anything wrong in your snippets (no ERRORs, only DEBUGs).

So if you experience this problem again, could you please reopen this bug and provide the logs from 'sudo journalctl -u openstack-ironic-conductor -u openstack-ironic-inspector -u openstack-ironic-inspector-dnsmasq'?
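
For anyone who wants to script the log collection, the journalctl invocation above could be wrapped roughly as follows. This is a minimal Python sketch under stated assumptions: the helper names `build_journalctl_command` and `collect_logs` are hypothetical, `--no-pager` is added so the output can be redirected to a file, and the unit names are the ones given in the comment above.

```python
import shlex
import subprocess

# Unit names taken from the journalctl command in the comment above.
UNITS = [
    "openstack-ironic-conductor",
    "openstack-ironic-inspector",
    "openstack-ironic-inspector-dnsmasq",
]


def build_journalctl_command(units, use_sudo=True):
    """Build the journalctl argument list, one -u option per unit."""
    cmd = ["sudo"] if use_sudo else []
    cmd += ["journalctl", "--no-pager"]
    for unit in units:
        cmd += ["-u", unit]
    return cmd


def collect_logs(path, units=UNITS):
    """Run journalctl and write its output to `path` (hypothetical helper)."""
    with open(path, "w") as out:
        subprocess.run(build_journalctl_command(units), stdout=out, check=True)


# Show the command that would be run; nothing is executed here.
print(shlex.join(build_journalctl_command(UNITS)))
```

Calling `collect_logs("ironic-logs.txt")` on the undercloud would then write the combined journal for the three units to a file (the file name is an arbitrary example) that can be attached to the bug.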
