Bug 1302402

Summary: pxe boot time outs (PXE-E32) on ospd 7 during introspection and during deployment
Product: Red Hat OpenStack Reporter: Alex Krzos <akrzos>
Component: rhosp-directorAssignee: Hugh Brock <hbrock>
Status: CLOSED CURRENTRELEASE QA Contact: Shai Revivo <srevivo>
Severity: high Docs Contact:
Priority: high    
Version: 7.0 (Kilo)CC: akrzos, dtantsur, jcoufal, jtaleric, lmartins, mburns, rhel-osp-director-maint
Target Milestone: ---   
Target Release: 10.0 (Newton)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-10-14 16:20:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Alex Krzos 2016-01-27 18:33:05 UTC
Description of problem:
I am testing the bits from rhn (OSPD 7.2?) for an anticipated deployment with similar hardware and I can not get consistent behavior with introspection and can not get a working deployment.  I can not get past PXE booting nodes in a consistent manner.

Version-Release number of selected component (if applicable):
python-rdomanager-oscplugin-0.0.10-22.el7ost.noarch
openstack-ironic-common-2015.1.2-2.el7ost.noarch
python-ironicclient-0.5.1-12.el7ost.noarch
python-ironic-discoverd-1.1.0-8.el7ost.noarch
openstack-ironic-conductor-2015.1.2-2.el7ost.noarch
openstack-ironic-api-2015.1.2-2.el7ost.noarch
openstack-ironic-discoverd-1.1.0-8.el7ost.noarch

How reproducible:
Always

Steps to Reproduce:
1.  Follow deployment instructions on access.redhat.com and attempt introspection while viewing nodes consoles.
2.
3.

Actual results:
PXE-E32: TFTP open timeout on each nodes consoles

Expected results:
Nodes to boot (consistently) and run through discovery so then they can be provisioned.

Additional info:
The network is correctly setup as previous builds worked on this hardware.  last working version used on this hardware was (python-rdomanager-oscplugin-0.0.10-19.el7ost.noarch)  I have a lab network, provisioning network and single vlan-ed network for all other required networks.

Several attempts at deploying (this version) on this setup have results in occasionally successful discovery process.  However the following deployment has consistently failed everytime afterwards.  Something is also adding iptable rules to prevent pxe booting consistently:

Chain discovery (1 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere             MAC D4:85:64:79:EB:FC
DROP       all  --  anywhere             anywhere             MAC D4:85:64:79:AF:C0
ACCEPT     all  --  anywhere             anywhere

Even if I manually drop them, the rules reappear after 5-10seconds.  This behavior is not exhibited during introspection however I still see timeouts.

Comment 2 Jaromir Coufal 2016-01-27 22:06:45 UTC
Hi Lucas, on MikeO's behalf -- do you mind having a quick look into this one and provide some investigation what might be a cause?

Comment 4 Mike Burns 2016-04-07 21:07:13 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 6 Dmitry Tantsur 2016-10-14 15:41:12 UTC
Hi! Could you please check if the issue can be reproduced on OSP 10?

Comment 7 Alex Krzos 2016-10-14 15:51:32 UTC
(In reply to Dmitry Tantsur from comment #6)
> Hi! Could you please check if the issue can be reproduced on OSP 10?

I have not seen this(or reproduced it) on any of the OSP10 (Newton) deployments I have done via director.

Comment 8 Dmitry Tantsur 2016-10-14 16:20:37 UTC
Thanks! I suspect it was fixed after we updated the shipped firmware.