Bug 1308987

Summary: Overcloud nodes fail to boot
Product: Red Hat OpenStack
Reporter: lance
Component: openstack-ironic
Assignee: Dmitry Tantsur <dtantsur>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Toure Dunnon <tdunnon>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 8.0 (Liberty)
CC: cshastri, dtantsur, lance, mburns, rhel-osp-director-maint, slinaber, yeylon
Target Milestone: ---
Keywords: Reopened
Target Release: 8.0 (Liberty)
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-01-31 11:53:21 UTC
Type: Bug
Attachments:
Description: Snippet from /var/log/messages showing context of failing iptables command
Flags: none

Description lance 2016-02-16 16:33:49 UTC
Created attachment 1127629
Snippet from /var/log/messages showing context of failing iptables command

I ran: openstack overcloud deploy --templates --control-flavor control --compute-flavor compute --control-scale 1 --compute-scale 3 --neutron-tunnel-types vxlan --neutron-network-type vxlan

Nodes power up and obtain DHCP leases successfully, but then fail to fetch their boot images over TFTP. Judging by tcpdump and the iptables rules, the TFTP requests appear to be rejected by iptables.

/var/log/messages shows the following (possibly related?) iptables messages from around the time of the deployment attempt:

Feb 16 13:53:07 calico-rh-director ironic-inspector: 2016-02-16 13:53:07.242 807 DEBUG ironic_inspector.firewall [-] ignoring failed iptables ('-D', 'INPUT', '-i', 'br-ctlplane', '-p', 'udp', '--dport', '67', '-j', 'ironic-inspector_temp'):
Feb 16 13:53:07 calico-rh-director ironic-inspector: iptables v1.4.21: Couldn't load target `ironic-inspector_temp':No such file or directory
Feb 16 13:53:07 calico-rh-director ironic-inspector: Try `iptables -h' or 'iptables --help' for more information.
Feb 16 13:53:07 calico-rh-director ironic-inspector: _iptables /usr/lib/python2.7/site-packages/ironic_inspector/firewall.py:45

The attached snippet of /var/log/messages provides a bit more context for this message.
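
For anyone triaging similar messages, a minimal Python sketch of how the first log line above can be split into its parts. The regular expression and the parse() helper are hypothetical, written only to fit the line quoted in this report; they are not part of ironic-inspector. The point it illustrates is that the message was logged at DEBUG level and concerns a rule deletion ('-D') on UDP port 67.

```python
import ast
import re

# Log line copied from the report above (syslog prefix included).
LINE = (
    "Feb 16 13:53:07 calico-rh-director ironic-inspector: "
    "2016-02-16 13:53:07.242 807 DEBUG ironic_inspector.firewall [-] "
    "ignoring failed iptables ('-D', 'INPUT', '-i', 'br-ctlplane', "
    "'-p', 'udp', '--dport', '67', '-j', 'ironic-inspector_temp'):"
)

# Hypothetical pattern: date, time, PID, level, logger name, then the message.
PATTERN = re.compile(
    r"ironic-inspector: \S+ \S+ (?P<pid>\d+) (?P<level>[A-Z]+) "
    r"(?P<logger>\S+) \[-\] (?P<message>.*)$"
)


def parse(line):
    """Return the fields of one log line, or None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The iptables arguments are logged as a Python tuple literal.
    args = re.search(r"\(.*\)", fields["message"])
    fields["iptables_args"] = ast.literal_eval(args.group(0)) if args else ()
    return fields


fields = parse(LINE)
print(fields["level"], fields["iptables_args"])
```

Continuation lines (the iptables error text on the following log lines) do not match the pattern, so parse() returns None for them.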

I'm running:
Name        : openstack-ironic-inspector
Arch        : noarch
Version     : 2.2.2
Release     : 1.el7ost
Size        : 733 k
Repo        : installed
From repo   : RH7-RHOS-8.0-director

This follows an upgrade from OSP8 beta 4 to beta 6 (by running yum update) - overcloud deployment was working fine on this system with beta 4.

Deleting the stack and re-creating hits the same problem.  Deleting the stack, rebooting and re-creating hits the same problem.

Comment 2 Dmitry Tantsur 2016-02-19 11:45:58 UTC
Just to clarify: are you using TFTP (i.e. PXE) or HTTP (i.e. iPXE)? The default is iPXE. Also, could you please check if you're affected by https://bugzilla.redhat.com/show_bug.cgi?id=1308611 ?

P.S.
Deployment is handled by ironic, not inspector - changing project.

Comment 3 lance 2016-02-19 11:54:54 UTC
I was using TFTP (tcpdump could see the servers sending TFTP requests in - which were then being dropped by iptables), though I don't recall changing anything to do that (i.e. I didn't notice the default was HTTP, despite snooping the boot traffic).  Has the default changed between Beta 4 and Beta 6?

As for whether I'm affected by 1308611 - I've no idea - the nodes were already introspected at Beta 4 - I was only doing the deployment with Beta 6.

Note that this system is now destroyed - I wiped it all and reinstalled from scratch using Beta 6 - and the system works fine now.

Comment 4 Dmitry Tantsur 2016-02-19 12:10:58 UTC
We use iPXE by default, but TFTP is still used to bootstrap the iPXE image. I'm asking because the iptables rules you mention are required for ironic-inspector to work and only affect DHCP. It is possible that ironic-inspector is doing something wrong with iptables, of course, but I don't see anything wrong in your snippets (no ERRORs, only DEBUGs).
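
To illustrate why the logged rule only concerns DHCP, here is a rough Python sketch. It is a toy model, not how iptables evaluates rules: the option() and rule_matches() helpers are hypothetical and look only at -i, -p and --dport. The rule arguments are the ones from the log above; 67 and 69 are the well-known UDP ports for DHCP servers and TFTP. It says nothing about other rules on the host that might drop TFTP.

```python
# Arguments of the rule ironic-inspector tried to delete, as logged in the report.
RULE = ('-D', 'INPUT', '-i', 'br-ctlplane', '-p', 'udp',
        '--dport', '67', '-j', 'ironic-inspector_temp')

DHCP_SERVER_PORT = 67  # well-known UDP port for DHCP/BOOTP servers
TFTP_PORT = 69         # well-known UDP port for TFTP


def option(rule, flag):
    """Return the value following `flag` in the argument tuple, or None."""
    if flag in rule:
        index = rule.index(flag)
        if index + 1 < len(rule):
            return rule[index + 1]
    return None


def rule_matches(rule, interface, protocol, dport):
    """Toy matcher: checks only -i, -p and --dport; an absent option matches anything."""
    checks = (("-i", interface), ("-p", protocol), ("--dport", str(dport)))
    return all(option(rule, flag) in (None, value) for flag, value in checks)


print(rule_matches(RULE, "br-ctlplane", "udp", DHCP_SERVER_PORT))
print(rule_matches(RULE, "br-ctlplane", "udp", TFTP_PORT))
```

Under this simplified model the rule applies to DHCP traffic on br-ctlplane but not to TFTP requests arriving on the same interface.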

So if you experience this problem again, could you please reopen this bug and provide the logs from 'sudo journalctl -u openstack-ironic-conductor -u openstack-ironic-inspector -u openstack-ironic-inspector-dnsmasq'?
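
If it is easier to attach a file, a small Python sketch along these lines could capture the same journalctl output. The journalctl_command() and collect_logs() helpers and the output path are hypothetical; the unit names are the ones listed above, and --no-pager is added only so the output can be redirected cleanly.

```python
import subprocess

# Unit names taken from the journalctl command requested above.
UNITS = (
    "openstack-ironic-conductor",
    "openstack-ironic-inspector",
    "openstack-ironic-inspector-dnsmasq",
)


def journalctl_command(units, use_sudo=True):
    """Build the journalctl argument list, one -u option per unit."""
    cmd = ["sudo"] if use_sudo else []
    cmd += ["journalctl", "--no-pager"]
    for unit in units:
        cmd += ["-u", unit]
    return cmd


def collect_logs(path, units=UNITS):
    """Run journalctl and write its output to `path` (hypothetical file name)."""
    with open(path, "wb") as out:
        subprocess.run(journalctl_command(units), stdout=out, check=True)
```

Calling collect_logs("ironic-journal.log") on the undercloud would then write the combined journal for the three services to that file, which can be attached to the bug.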