Bug 1243109

Summary: Discovery fails if multiple interfaces on provisioning network
Product: Red Hat OpenStack Reporter: Dan Sneddon <dsneddon>
Component: rhosp-director Assignee: Dan Sneddon <dsneddon>
Status: CLOSED DUPLICATE QA Contact: Shai Revivo <srevivo>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.0 (Kilo) CC: athomas, bfournie, dmacpher, dsneddon, dtantsur, hbrock, jmelvin, kimi.zhang, lmartins, mburns, rhel-osp-director-maint, tcrowe, ukalifon, vcojot
Target Milestone: --- Keywords: Triaged
Target Release: 11.0 (Ocata)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Known Issue
Doc Text:
Discovery fails if multiple network interfaces on a node are connected to the Provisioning network. Only one interface can connect to the Provisioning network. This interface cannot be part of a bond.
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-12-19 17:42:06 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Dan Sneddon 2015-07-14 19:18:56 UTC
Description of problem:
If multiple interfaces on a node are attached to the ctlplane, discovery times out, depending on which interface the node ends up using.

Version-Release number of selected component (if applicable):
All versions as of 2015-07-14

How reproducible:
100%

Steps to Reproduce:
1. Put both eth0 and eth1 on provisioning net
2. Add MAC address from eth0 to instackenv.json
3. Launch discovery

Actual results:
Both interfaces will request DHCP addresses, and the undercloud responds to both interfaces with an IP address. The interface that the system chooses to communicate with the ctlplane may not be the one we want (the one with the MAC address in instackenv.json). Since the traffic to the undercloud is coming from a different IP address than expected, discovery times out.
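The mismatch described above can be illustrated with a minimal Python sketch. It assumes instackenv.json uses a top-level "nodes" list whose entries carry a "mac" list. The helper names and the sample MAC addresses are hypothetical and are not part of the director tooling.

```python
import json

# Hypothetical sample: only eth0's MAC is registered, as in step 2 above.
INSTACKENV = json.loads("""
{
  "nodes": [
    {"mac": ["52:54:00:aa:00:01"], "pm_type": "pxe_ipmitool"}
  ]
}
""")


def registered_macs(instackenv):
    """Return the set of lower-cased MAC addresses listed for all nodes."""
    return {
        mac.lower()
        for node in instackenv.get("nodes", [])
        for mac in node.get("mac", [])
    }


def is_expected_interface(reporting_mac, instackenv):
    """True if the interface the node talks from is a registered one."""
    return reporting_mac.lower() in registered_macs(instackenv)


# eth0 is registered; eth1 also received a lease but is unknown.
print(is_expected_interface("52:54:00:AA:00:01", INSTACKENV))  # eth0
print(is_expected_interface("52:54:00:aa:00:02", INSTACKENV))  # eth1
```

The first call prints True and the second prints False. The second case corresponds to the node communicating over eth1, which is when discovery times out.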

Expected results:
Discovery should work even if multiple interfaces are attached to the provisioning network. This will allow situations where we provision on one interface, then place that interface into a bond. It will also help with virt testing, since we can't currently use multiple interfaces and bonding in virt environments.

Additional info:
If we could somehow modify the dnsmasq that ironic-discoverd uses to only respond to known MAC addresses (from instackenv.json), I think that would clear up this behavior. I'm not sure if that is feasible.
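As a rough sketch of that idea, the following Python generates a dnsmasq configuration fragment from the MAC addresses in instackenv.json. It relies on dnsmasq's dhcp-host option together with dhcp-ignore=tag:!known, which makes dnsmasq ignore clients that match no dhcp-host entry. The function name and sample data are hypothetical, the instackenv.json layout ("nodes" entries with a "mac" list) is assumed, and whether the dnsmasq instance run by ironic-discoverd can be configured this way is not established here.

```python
def dnsmasq_known_hosts_config(instackenv):
    """Build a dnsmasq fragment that only answers registered MACs.

    Assumption: instackenv has a "nodes" list, each with a "mac" list.
    """
    macs = sorted(
        {
            mac.lower()
            for node in instackenv.get("nodes", [])
            for mac in node.get("mac", [])
        }
    )
    # One dhcp-host line per registered MAC marks that client as "known".
    lines = [f"dhcp-host={mac}" for mac in macs]
    # Ignore DHCP requests from any client that is not "known".
    lines.append("dhcp-ignore=tag:!known")
    return "\n".join(lines) + "\n"


# Hypothetical sample with a single registered interface (eth0).
sample = {"nodes": [{"mac": ["52:54:00:AA:00:01"]}]}
print(dnsmasq_known_hosts_config(sample), end="")
```

With a fragment like this, a second interface on the provisioning network (eth1 in the steps above) would not be offered a lease, so the node could only reach the undercloud through the registered interface.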

Comment 4 Dan Sneddon 2015-07-20 18:35:48 UTC
Note that this bug affects virt environments (meaning that we can't test bonding in virt), but it also affects BM environments.

If you have a system with only two 10Gb NICs, you should be able to provision off of one interface, then bond both interfaces together and add the other networks as VLANs. Unfortunately, discovery fails in this configuration, because we can't guarantee the system will use the same interface that it booted from (and Ironic expects this).

So we need to fix this bug for baremetal as well as virt, because it widens the range of supported hardware in an HA environment (right now we require at least 3 NICs + IPMI interface).

Comment 5 Mike Burns 2015-07-21 11:58:41 UTC
*** Bug 1244906 has been marked as a duplicate of this bug. ***

Comment 6 Mike Burns 2016-04-07 20:43:53 UTC
This bug did not make the OSP 8.0 release.  It is being deferred to OSP 10.

Comment 10 Dmitry Tantsur 2016-10-03 08:55:43 UTC
Dan, can you still reproduce this bug? I assumed it was related to the old iPXE ROM. Now that we're shipping the new iPXE ROM from Jan 2016 in OSPd >= 8, this bug may be gone.

Comment 11 Dan Sneddon 2016-10-04 17:29:29 UTC
(In reply to Dmitry Tantsur from comment #10)
> Dan, can you still reproduce this bug? I assumed it was related to the old
> iPXE ROM. Now that we're shipping the new iPXE ROM from Jan 2016 in OSPd >=
> 8, this bug may be gone.

My environment has mitigations in place to ensure that I don't hit this bug. I'll remove those mitigations and test to make sure I am no longer seeing this behavior and will comment here with results.

Comment 12 Dmitry Tantsur 2016-10-17 09:27:56 UTC
If the issue is still there, it's unlikely to be an easy fix. And we have a workaround (meh) in place. So pushing it out of Newton, but let's keep track of it in Ocata for real.

Comment 16 Dmitry Tantsur 2016-10-20 10:02:15 UTC
I have a gut feeling that this is actually related: https://bugs.launchpad.net/puppet-ironic/+bug/1635191. Dan, what do you think?

Comment 17 Dan Sneddon 2017-04-06 01:29:33 UTC
@dtantsur I think it is likely that this bug is related to the Ironic bug. We need to retest, since it is possible that this issue has been cleared up.

Comment 22 Bob Fournier 2017-12-19 17:42:06 UTC

*** This bug has been marked as a duplicate of bug 1411696 ***