Description of problem:
If multiple interfaces are attached to the ctlplane, discovery times out randomly.

Version-Release number of selected component (if applicable):
All versions as of 2015-07-14

How reproducible:
100%

Steps to Reproduce:
1. Put both eth0 and eth1 on provisioning net
2. Add MAC address from eth0 to instackenv.json
3. Launch discovery

Actual results:
Both interfaces request DHCP addresses, and the undercloud responds to both with an IP address. The interface that the system chooses to communicate with the ctlplane may not be the one we want (the one with the MAC address in instackenv.json). Since the traffic to the undercloud is coming from a different IP address than expected, discovery times out.

Expected results:
Discovery should work even if multiple interfaces are attached to the provisioning network. This will allow situations where we provision on one interface, then place that interface into a bond. It will also help with virt testing, since we can't currently use multiple interfaces and bonding in virt environments.

Additional info:
If we could somehow modify the dnsmasq that ironic-discoverd uses to only respond to known MAC addresses (from instackenv.json), I think that would clear up this behavior. I'm not sure if that is feasible.
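To make the "only respond to known MAC addresses" idea concrete, here is a rough sketch of how such a dnsmasq allowlist could be generated from instackenv.json. It is an illustration only, not how ironic-discoverd actually configures dnsmasq. It assumes instackenv.json has the shape {"nodes": [{"mac": [...]}]}, the helper names known_macs and dnsmasq_allowlist are hypothetical, and it relies on the dnsmasq options dhcp-host and dhcp-ignore=tag:!known (hosts matching a dhcp-host entry get the "known" tag). Whether this fragment can be wired into the discovery dnsmasq is untested.

```python
import json


def known_macs(instackenv):
    """Return the sorted, lower-cased MACs listed under nodes[].mac.

    Assumes the instackenv.json layout {"nodes": [{"mac": ["..."]}]}.
    """
    macs = set()
    for node in instackenv.get("nodes", []):
        for mac in node.get("mac", []):
            macs.add(mac.strip().lower())
    return sorted(macs)


def dnsmasq_allowlist(instackenv):
    """Build a dnsmasq config fragment that ignores unknown MACs.

    dhcp-ignore=tag:!known tells dnsmasq not to answer clients that do
    not match any dhcp-host entry; each known MAC gets one such entry.
    """
    lines = ["dhcp-ignore=tag:!known"]
    lines += ["dhcp-host=" + mac for mac in known_macs(instackenv)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up sample data: only eth0's MAC is registered for the node.
    sample = json.loads('{"nodes": [{"mac": ["52:54:00:AA:BB:01"]}]}')
    print(dnsmasq_allowlist(sample), end="")
```

With a fragment like this in place, only eth0 (the registered MAC) would get a lease, so the second interface on the provisioning net would stay unconfigured during discovery.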
Note that this bug affects virt environments (meaning that we can't test bonding in virt), but it also affects baremetal (BM) environments. If you have a system with only two 10Gb NICs, you should be able to provision on one interface, then bond both interfaces together and add the other networks as VLANs. Unfortunately, discovery fails in this configuration, because we can't guarantee the system will use the same interface that it booted from (and Ironic expects this). So we need to fix this bug for baremetal as well as virt, because fixing it widens the range of supported hardware in an HA environment (right now we require at least 3 NICs + IPMI interface).
*** Bug 1244906 has been marked as a duplicate of this bug. ***
This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.
Dan, can you still reproduce this bug? I assumed it was related to the old iPXE ROM. Now that we're shipping the new iPXE ROM from Jan 2016 in OSPd >= 8, this bug may be gone.
(In reply to Dmitry Tantsur from comment #10)
> Dan, can you still reproduce this bug? I assumed it was related to the old
> iPXE ROM. Now that we're shipping the new iPXE ROM from Jan 2016 in OSPd >=
> 8, this bug may be gone.

My environment has mitigations in place to ensure that I don't hit this bug. I'll remove those mitigations and test to make sure I am no longer seeing this behavior and will comment here with results.
If the issue is still there, it's unlikely to be an easy fix. And we have a workaround (meh) in place. So pushing it out of Newton, but let's keep track of it in Ocata for real.
I have a gut feeling that this is actually related: https://bugs.launchpad.net/puppet-ironic/+bug/1635191. Dan, what do you think?
@dtantsur I think it is likely that this bug is related to the Ironic bug. I think we need to retest, since I think it is possible that this issue has been cleared up.
*** This bug has been marked as a duplicate of bug 1411696 ***