Bug 1318095

Summary: IPA uses wrong mac in posting collected data when host has multiple nics (1 + 5)
Product: Red Hat OpenStack
Reporter: Steve Baker <sbaker>
Component: rhosp-director
Assignee: Dmitry Tantsur <dtantsur>
Status: CLOSED NOTABUG
QA Contact: Arik Chernetsky <achernet>
Severity: unspecified
Priority: unspecified
Version: 8.0 (Liberty)
CC: dbecker, dtantsur, lmartins, mburns, mcornea, morazi, rhel-osp-director-maint, sbaker, slinaber, yeylon
Target Release: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2016-03-31 02:41:04 UTC
Type: Bug
Attachments:
  Screenshot of IPA log when inspector posting collected data is attempted (flags: none)
  IPA journal from /var/log/ironic-inspector/ramdisk (flags: none)

Description Steve Baker 2016-03-16 03:41:23 UTC
Created attachment 1136821
Screenshot of IPA log when inspector posting collected data is attempted

In this OVB (openstack-virtual-baremetal) environment, nodes have one NIC for the provisioning network plus NICs for full multi-NIC network isolation. The isolated networks are not running DHCP yet, so IPA doesn't configure IPs for these NICs.

The problem is that when IPA posts the collected data to inspector, the MAC it uses belongs to one of the unconfigured isolation NICs instead of the configured provisioning NIC.

See the attached screenshot, which shows the MAC of the unconfigured eth1 instead of the provisioning NIC eth0.

This is in an OVB environment on the rhos-central-ci cloud, so it can be reproduced on demand.

This bug will prevent using OVB environments like rhos-central-ci to do CI testing of network isolation.

Comment 1 Steve Baker 2016-03-16 03:43:00 UTC
Setting needinfo on mburns to evaluate this as a blocker.

Comment 2 Dmitry Tantsur 2016-03-16 12:31:34 UTC
One guess: did you try it with the latest iPXE ROM available in poodles? Last time I saw this, we were providing the wrong BOOTIF.

Also, in your case there may be a workaround: will it work if you set add_ports (https://github.com/openstack/ironic-inspector/blob/master/example.conf#L602) to "active"?

Comment 3 Steve Baker 2016-03-16 19:51:59 UTC
This is with 8.0 puddle 20160311.1. Is the iPXE ROM in the latest poodles even newer?

I'll try the workaround.

Comment 4 Steve Baker 2016-03-16 22:49:51 UTC
The following worked for me before doing introspection:

    openstack-config --set /etc/ironic-inspector/inspector.conf processing add_ports active
    systemctl restart openstack-ironic-inspector

I'm assigning this bug to instack-undercloud so they can evaluate whether add_ports should be set to active for all undercloud installs.
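Why "active" helps follows from what the three add_ports modes described in ironic-inspector's example.conf select: "all" (every MAC), "active" (MACs of NICs that have an IP address) and "pxe" (only the MAC of the NIC the node PXE booted from, falling back to "active" when the ramdisk reports none). The Python below is a simplified model of that selection, not inspector's actual code; the Nic type, the function name and the example MACs/IPs are hypothetical. Because the isolation NICs never get an address, "active" ignores them no matter what BOOTIF says.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Nic:
    name: str
    mac: str           # assumed already normalized to lowercase aa:bb:...
    ip: Optional[str]  # None when the NIC got no address (no DHCP)


def select_port_macs(nics: List[Nic], mode: str,
                     boot_mac: Optional[str]) -> List[str]:
    """Simplified model of the add_ports modes; not inspector's real code."""
    if mode == "all":
        return [nic.mac for nic in nics]
    active = [nic.mac for nic in nics if nic.ip]
    if mode == "active":
        return active
    if mode == "pxe":
        if boot_mac is None:
            # No boot interface reported by the ramdisk: behave like "active".
            return active
        # Trust whatever MAC the ramdisk reported as the boot NIC.
        return [nic.mac for nic in nics if nic.mac == boot_mac]
    raise ValueError(f"unknown add_ports mode: {mode!r}")


nics = [
    Nic("eth0", "52:54:00:aa:00:01", "192.0.2.100"),  # provisioning NIC
    Nic("eth1", "52:54:00:aa:00:02", None),           # isolation NIC, no DHCP
]
# A BOOTIF carrying eth1's MAC makes "pxe" pick the unconfigured NIC ...
print(select_port_macs(nics, "pxe", "52:54:00:aa:00:02"))     # ['52:54:00:aa:00:02']
# ... while "active" keeps only NICs that actually got an IP address.
print(select_port_macs(nics, "active", "52:54:00:aa:00:02"))  # ['52:54:00:aa:00:01']
```

The trade-off Dmitry raises next is visible in the model: "active" drops any NIC that happens to lack an address, which is only right when exactly the provisioning NIC is configured.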

Comment 6 Dmitry Tantsur 2016-03-17 10:18:07 UTC
I don't think it should be set to "active". It has high chances of breaking other use cases.

I still wonder why it doesn't work in your case. Could you please get ironic-inspector ramdisk logs for me? Please set https://github.com/openstack/ironic-inspector/blob/master/example.conf#L647 to true, restart ironic-inspector, restart introspection, and grab the tarball from /var/log/ironic-inspector/ramdisk.

Comment 7 Steve Baker 2016-03-22 02:57:41 UTC
Created attachment 1138862
IPA journal from /var/log/ironic-inspector/ramdisk

Comment 8 Steve Baker 2016-03-22 03:03:07 UTC
The attached journal file shows that the incorrect MAC is being passed in as the BOOTIF kernel parameter, so iPXE is the one specifying it.

My /httpboot/inspector.ipxe has BOOTIF=${mac}, but the only examples in the iPXE docs [1] explicitly specify the interface, such as ${net0/mac}.

Sure enough, setting inspector.ipxe BOOTIF=${net0/mac} fixed this problem for me.

Is the inspector interface ever anything other than net0? I've tried in the past to move it later in the interface order, and the node failed to boot. I'm hoping that BOOTIF=${net0/mac} can be proposed as a fix to ironic/drivers/modules/ipxe_config.template.

[1] http://ipxe.org/cfg/mac
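BOOTIF can reach the kernel command line in two spellings: PXELINUX's IPAPPEND style prefixes the ARP hardware type (01- for Ethernet) and uses hyphens, while iPXE's ${mac} expands to colon-separated hex. A minimal sketch of reading and normalizing it, with a hypothetical helper name (IPA's and inspector's own parsing may differ in detail):

```python
from typing import Optional


def bootif_mac(cmdline: str) -> Optional[str]:
    """Return the MAC passed as BOOTIF=..., normalized to aa:bb:cc:dd:ee:ff."""
    for token in cmdline.split():
        if not token.startswith("BOOTIF="):
            continue
        value = token[len("BOOTIF="):]
        if "-" in value:
            # PXELINUX style: drop the "01-" hardware-type prefix, then
            # turn the remaining hyphens into colons.
            value = value.split("-", 1)[1].replace("-", ":")
        return value.lower()
    return None  # no BOOTIF on the command line


print(bootif_mac("console=tty0 BOOTIF=01-52-54-00-AA-00-01"))  # 52:54:00:aa:00:01
print(bootif_mac("console=tty0 BOOTIF=52:54:00:AA:00:02"))     # 52:54:00:aa:00:02
```

The ramdisk can only faithfully report whatever iPXE substituted, so if ${mac} expands to eth1's address, eth1 is what gets reported as the boot NIC; the fix has to happen in the iPXE script or ROM.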

Comment 9 Steve Baker 2016-03-22 04:17:18 UTC
... and there is a vaguely related upstream bug whose root cause was iPXE ${mac} not corresponding to the boot mac https://bugs.launchpad.net/ironic/+bug/1504482

Comment 10 Dmitry Tantsur 2016-03-28 17:49:22 UTC
Folks, please stop changing projects randomly :( There is nothing in ironic-inspector itself related to this discussion right now; Puppet is managing the iPXE setting for us.

I was expecting the iPXE update to fix the issue, but that does not seem to be the case. We probably need to get our iPXE experts involved again, since always assuming the first NIC (which is what net0/mac does) is not the way to go either.

Comment 11 Steve Baker 2016-03-29 01:20:02 UTC
The version of iPXE I'm seeing during introspection boot is c4bce43, which I believe is still the old one. This is with 20160318.2 puddle and images from http://rhos-release.virt.bos.redhat.com/mburns/latest-8.0-images/

Comment 12 Dmitry Tantsur 2016-03-29 13:59:02 UTC
Steve,

what's the ipxe-bootimgs package version for you? I see 20160127-1.git6366fa7a.el7 with the latest poodle.

Comment 13 Steve Baker 2016-03-29 19:47:34 UTC
Dmitry, my undercloud has ipxe-bootimgs-20160127-1.git6366fa7a.el7.

However this is an openstack-virtual-baremetal environment and I suspect the iPXE being booted is the one that comes from ipxe-roms-qemu on the *host* cloud (in this case, the rhos-dev-ci cloud running 7.3)

I'd like to explore a couple of options for not having to upgrade ipxe-roms-qemu on the host cloud. Do you have any suggestions of how I might chain the first iPXE to boot the iPXE in undercloud /tftpboot?

Comment 14 Lucas Alvares Gomes 2016-03-30 10:41:13 UTC
(In reply to Steve Baker from comment #13)
> Dmitry, my undercloud has ipxe-bootimgs-20160127-1.git6366fa7a.el7.
> 
> However this is an openstack-virtual-baremetal environment and I suspect the
> iPXE being booted is the one that comes from ipxe-roms-qemu on the *host*
> cloud (in this case, the rhos-dev-ci cloud running 7.3)
> 
> I'd like to explore a couple of options for not having to upgrade
> ipxe-roms-qemu on the host cloud. Do you have any suggestions of how I might
> chain the first iPXE to boot the iPXE in undercloud /tftpboot?

Hi Steve,

Yes, this is a tricky one, because the VMs will not chainload the iPXE ROM from the /tftpboot directory since they are already booting from iPXE. This happens because the DHCP server has a simple conditional: if not booting from iPXE, chainload; if booting from iPXE, fetch the iPXE script and continue with the boot process. See: https://github.com/openstack/ironic/blob/69c33f7ed5004afd4fd1589f1aed0e498845a952/ironic/common/pxe_utils.py#L316-L321

Now, this is even trickier for inspector. As you rightly pointed out in comment #9, we did have this problem in Ironic, and the way we solved it generically was by iterating over all NICs and looking in the /httpboot dir for the iPXE configuration that matches each NIC's MAC address. That works for Ironic because Ironic has the node's MAC address registered in its database, but that is not the case for inspector.
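The per-NIC search described above can be modelled in a few lines of Python: try the NICs in order and boot from the first one whose MAC has a per-node config file. This models the idea only, not Ironic's actual boot script; the hyphenated file naming (as iPXE's ${mac:hexhyp} would produce) and the set standing in for the /httpboot directory are assumptions for illustration. It also shows the inspector problem: with no node registered, no per-MAC file exists and the search finds nothing.

```python
from typing import AbstractSet, Optional, Sequence


def hexhyp(mac: str) -> str:
    """aa:bb:cc:dd:ee:ff -> aa-bb-cc-dd-ee-ff (like iPXE's ${mac:hexhyp})."""
    return mac.lower().replace(":", "-")


def find_boot_config(nic_macs: Sequence[str],
                     config_files: AbstractSet[str]) -> Optional[str]:
    """Return the first per-MAC config file that exists, trying NICs in order."""
    for mac in nic_macs:
        name = hexhyp(mac)  # hypothetical per-node file name
        if name in config_files:
            return name
    return None  # no NIC has a registered config


nics = ["52:54:00:aa:00:01", "52:54:00:aa:00:02"]
# Ironic knows the node, so a file exists for its registered MAC (here the 2nd NIC).
print(find_boot_config(nics, {"52-54-00-aa-00-02"}))  # 52-54-00-aa-00-02
# Inspector has no registered MACs, so nothing matches.
print(find_boot_config(nics, set()))                  # None
```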

So there are two solutions here, but I don't think either of them should go upstream, because they make the inspector.ipxe script rigid; the right solution upstream is (unfortunately) to ask people to update their packages:

* Solution 1: Since these are VMs, you can edit the inspector.ipxe script and add the right NIC number to it, just like you did in comment #8. In VMs the NIC order is static, so you won't have a problem with net0 and net1 being swapped between boots.

* Solution 2: Force a chainload to a newer iPXE ROM. Besides chainloading it via the DHCP server options, we can do it directly in the inspector.ipxe script. For example, we could check which version of the iPXE ROM we are running and, if it does not match the one we expect, tell it to fetch the right one from the /tftpboot dir:

#!ipxe

set EXPECTED_VERSION 1.0.0+ (abcdef) 

# Check if version is set and if the version matches the one we expect, if not chainload
isset ${version} && iseq ${version} ${EXPECTED_VERSION} && goto boot_inspector ||
echo "Not the current version, upgrading"
chain tftp://{{next-server}}/undionly.kpxe

:boot_inspector
<original inspector.ipxe content here>


PS: I have not tested the script above yet.
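The (untested) script above boils down to one comparison. A Python model of it, with the function name and return values invented for illustration: boot the inspector only if ${version} is set and equals EXPECTED_VERSION, otherwise chainload the newer ROM.

```python
from typing import Optional


def next_step(version: Optional[str], expected: str) -> str:
    """Model of the inspector.ipxe check: "boot_inspector" or "chainload"."""
    # isset ${version} && iseq ${version} ${EXPECTED_VERSION} && goto boot_inspector
    if version is not None and version == expected:
        return "boot_inspector"
    # Otherwise fall through to: chain tftp://{{next-server}}/undionly.kpxe
    return "chainload"


print(next_step("1.0.0+ (abcdef)", "1.0.0+ (abcdef)"))   # boot_inspector
print(next_step("1.0.0+ (c4bce43)", "1.0.0+ (abcdef)"))  # chainload
print(next_step(None, "1.0.0+ (abcdef)"))                # chainload (old ROM without ${version})
```

The comparison is an exact string match, so EXPECTED_VERSION must reproduce the chainloaded ROM's version string verbatim. Presumably the newly loaded undionly.kpxe fetches the same script again, so if the strings differ, this logic would chainload in a loop.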

...

I also looked for a way to tell QEMU to use standard PXE instead of iPXE for network boot, so that the chainload would happen automatically, but I couldn't find one.

Hope that helps,
Lucas

Comment 15 Dmitry Tantsur 2016-03-30 10:44:41 UTC
Thanks Lucas. I have nothing to add unfortunately.

Comment 16 Steve Baker 2016-03-30 20:33:49 UTC
Thanks Lucas, solution 2 sounds worth trying. I was also thinking of a solution 3: changing the ironic-inspector dnsmasq.conf tag filtering, which is currently:

  dhcp-boot=tag:!ipxe,undionly.kpxe,localhost.localdomain,192.0.2.1
  dhcp-boot=tag:ipxe,http://192.0.2.1:8088/inspector.ipxe

I'm hoping the old QEMU iPXE and the one served by undionly.kpxe differ in some visible way, so that I can add a third dhcp-boot entry that boots undionly.kpxe. I'm not sure whether there is enough information to do this, or how to discover which tags are available to filter on.
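The two dhcp-boot lines above are mutually exclusive on the ipxe tag. A small Python model of that selection treats the client's tag set as an input (how dnsmasq assigns the ipxe tag is configured elsewhere and is not shown in this excerpt) and leaves out the server name/address fields. The table and function names are hypothetical.

```python
from typing import AbstractSet, List, Tuple

# (tag condition, boot target) pairs mirroring the two dhcp-boot lines;
# a leading "!" negates the tag, as in dnsmasq. Server fields are omitted.
DHCP_BOOT: List[Tuple[str, str]] = [
    ("!ipxe", "undionly.kpxe"),
    ("ipxe", "http://192.0.2.1:8088/inspector.ipxe"),
]


def tag_matches(condition: str, tags: AbstractSet[str]) -> bool:
    """True if a client carrying `tags` satisfies one tag condition."""
    if condition.startswith("!"):
        return condition[1:] not in tags
    return condition in tags


def boot_target(tags: AbstractSet[str]) -> str:
    """Boot target handed to a client carrying `tags`."""
    matches = [target for cond, target in DHCP_BOOT if tag_matches(cond, tags)]
    # "ipxe" and "!ipxe" are complementary, so exactly one entry matches.
    if len(matches) != 1:
        raise ValueError(f"ambiguous or missing boot target: {matches}")
    return matches[0]


print(boot_target(set()))     # undionly.kpxe (plain PXE client)
print(boot_target({"ipxe"}))  # http://192.0.2.1:8088/inspector.ipxe
```

Every iPXE client, the old QEMU ROM included, lands on the second entry, which is why it never reaches undionly.kpxe. A third entry would need some tag that only the old ROM carries, which is exactly the open question.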

Comment 17 Steve Baker 2016-03-31 02:41:04 UTC
I've confirmed that this isn't an issue when iPXE 6366fa7a is loaded, which I achieved using Lucas's solution 2. Below is my modified inspector.ipxe, which is populated with the appropriate git version by a handful of Ansible tasks.

I do wonder if something like this should be contributed upstream (well, puppet-ironic). How often will ancient iPXE be running on flashed hardware rather than being loaded via tftp?

#!ipxe

dhcp

set EXPECTED_VERSION 1.0.0+ (6366fa7a)

# Skip the check entirely if no expected version is configured; otherwise
# chainload unless ${version} is set and matches the expected one
isset ${EXPECTED_VERSION} || goto boot_inspector
echo Expected iPXE version ${EXPECTED_VERSION}
isset ${version} || goto boot_chained
iseq ${version} ${EXPECTED_VERSION} && goto boot_inspector

:boot_chained
echo Booting chained iPXE
chain undionly.kpxe

:boot_inspector
echo Booting inspector
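The Ansible tasks themselves aren't included in the comment. As a hedged sketch of the same step in Python, the helpers below pull the git hash out of the ipxe-bootimgs package name (the example string is the one from comment #13) and format the EXPECTED_VERSION line. The "1.0.0+ (<hash>)" layout is copied from the script above and assumed to match what the chainloaded ROM reports in ${version}; the function names are hypothetical.

```python
import re


def git_hash_from_nvr(package: str) -> str:
    """Extract the git hash from e.g. 'ipxe-bootimgs-20160127-1.git6366fa7a.el7'."""
    match = re.search(r"\.git([0-9a-f]+)", package)
    if match is None:
        raise ValueError(f"no .git<hash> in {package!r}")
    return match.group(1)


def expected_version_line(package: str) -> str:
    """Render the `set EXPECTED_VERSION ...` line for inspector.ipxe."""
    # Format copied from the script above: "1.0.0+ (<hash>)".
    return f"set EXPECTED_VERSION 1.0.0+ ({git_hash_from_nvr(package)})"


print(expected_version_line("ipxe-bootimgs-20160127-1.git6366fa7a.el7"))
# set EXPECTED_VERSION 1.0.0+ (6366fa7a)
```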