Bug 1234601 - Node introspection fails when servers have multiple NICs
Summary: Node introspection fails when servers have multiple NICs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ga
Target Release: 8.0 (Liberty)
Assignee: Mike Burns
QA Contact: Marius Cornea
URL:
Whiteboard:
Duplicates: 1249199 1326481
Depends On:
Blocks: 1273561
Reported: 2015-06-22 19:08 UTC by Marius Cornea
Modified: 2023-02-22 23:02 UTC (History)

Fixed In Version:
Doc Type: Known Issue
Doc Text:
The ramdisk and kernel images booted without specifying a particular interface. This meant the system booted from any network adapter, which caused problems when more than one interface was on the Provisioning network. In those cases it was necessary to specify which interface the system should use to boot. The interface specified should correspond to the interface that carries the MAC address listed in the instackenv.json file.

As a workaround, copy and paste the following block of text as the root user into the director's terminal. This creates a systemd startup script that sets these parameters on every boot. The script contains a sed command which includes "net0/mac". This sets the director to use the first Ethernet interface. Change this to "net1/mac" to use the second interface, and so on.

#####################################
cat << EOF > /usr/bin/bootif-fix
#!/usr/bin/env bash

while true;
        do find /httpboot/ -type f ! -iname "kernel" ! -iname "ramdisk" ! -iname "*.kernel" ! -iname "*.ramdisk" -exec sed -i 's|{mac|{net0/mac|g' {} +;
done
EOF

chmod a+x /usr/bin/bootif-fix

cat << EOF > /usr/lib/systemd/system/bootif-fix.service
[Unit]
Description=Automated fix for incorrect iPXE BOOTIF

[Service]
Type=simple
ExecStart=/usr/bin/bootif-fix

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable bootif-fix
systemctl start bootif-fix
#######################################

The bootif-fix script runs on every boot. This enables booting from a specified NIC when more than one NIC is on the Provisioning network. To disable the service and return to the previous behavior, run "systemctl disable bootif-fix" and reboot.
Clone Of:
Environment:
Last Closed: 2016-04-07 21:37:30 UTC
Target Upstream Version:
Embargoed:


Attachments
rdsosreport (198.76 KB, text/plain)
2015-06-22 19:08 UTC, Marius Cornea
output of /log file (39.66 KB, image/png)
2015-07-07 15:17 UTC, Ola Pavlenko
console (45.72 KB, image/png)
2015-07-07 15:17 UTC, Ola Pavlenko
console 1 (38.85 KB, image/png)
2015-07-07 15:18 UTC, Ola Pavlenko


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2016:0604 0 normal SHIPPED_LIVE Red Hat OpenStack Platform 8 director Enhancement Advisory 2016-04-08 01:03:56 UTC

Description Marius Cornea 2015-06-22 19:08:04 UTC
Created attachment 1041981 [details]
rdsosreport

Description of problem:
I'm trying to manually assign an additional NIC (in an isolated bridge) to the baremetal VM in a virtual environment. Node introspection fails, with the VMs ending up in the dracut shell. Apparently the IP address of the first NIC (from the 192.0.2.0 subnet) gets assigned to the 2nd NIC, which shouldn't have an IP address.
 
Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. On the physical host create an OVS bridge 
ovs-vsctl add-br isolated
2. Define new libvirt network which uses the bridge
cat > isolated-net << EOF
> <network>
>   <name>isolated</name>
>   <forward mode='bridge'/>
>   <bridge name='isolated'/>
>   <virtualport type='openvswitch'/>
> </network>
> EOF
virsh net-define isolated-net
virsh net-start isolated

3. Add an additional interface on the isolated network to the baremetal VMs:
virsh dumpxml baremetal_0
...
   <interface type='bridge'>
      <mac address='00:31:fb:87:97:54'/>
      <source network='brbm' bridge='brbm'/>
      <virtualport type='openvswitch'>
        <parameters interfaceid='a056b407-8319-46d3-b440-0fa3b7acea8c'/>
      </virtualport>
      <target dev='vnet2'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <interface type='bridge'>
      <mac address='52:54:00:86:ee:e2'/>
      <source network='isolated' bridge='isolated'/>
      <virtualport type='openvswitch'>
        <parameters interfaceid='33a8db53-9277-4f85-95e2-a6274ec97d7c'/>
      </virtualport>
      <target dev='vnet3'/>
      <model type='virtio'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </interface> 
...

4. On the undercloud run:
openstack baremetal import --json instackenv.json
openstack baremetal configure boot
openstack baremetal introspection bulk start

Actual results:
VMs end in the dracut shell.

Expected results:
VMs complete the discovery process.

Additional info:
rdsosreport.txt attached.

Comment 3 Dmitry Tantsur 2015-06-25 16:35:32 UTC
Note this insane thing:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:1b:97:12:95:47 brd ff:ff:ff:ff:ff:ff
    inet 192.0.2.118/24 brd 192.0.2.255 scope global dynamic eth0
       valid_lft 70sec preferred_lft 70sec
    inet6 fe80::21b:97ff:fe12:9547/64 scope link 
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:8d:23:4d brd ff:ff:ff:ff:ff:ff
    inet 192.0.2.118 peer 192.0.2.1/32 scope global eth1
       valid_lft forever preferred_lft forever

that does not look right - the same IP address.

It would be really helpful to get the output of $ sudo journalctl -u openstack-ironic-discoverd-dnsmasq, if that's still possible.
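
A quick way to spot this condition from a shell on the node is to look for an IPv4 address that is configured on more than one interface. The following is only a sketch and was not part of the original comment: the function name find_duplicate_ipv4 is hypothetical, and it assumes the one-line-per-address format printed by `ip -o -4 addr show`.

```shell
#!/bin/sh
# Hypothetical helper: report IPv4 addresses configured on more than one
# interface. Reads `ip -o -4 addr show` style lines on stdin, where field 2
# is the interface name and field 4 is the address (with or without /prefix).
find_duplicate_ipv4() {
    awk '
        $3 == "inet" {
            split($4, parts, "/")      # drop the /prefix if there is one
            addr = parts[1]
            count[addr]++
            ifaces[addr] = ifaces[addr] " " $2
        }
        END {
            for (addr in count)
                if (count[addr] > 1)
                    print addr ifaces[addr]
        }
    '
}

# Usage on a live system:
#   ip -o -4 addr show | find_duplicate_ipv4
```

For the listing above, this would print the shared address followed by eth0 and eth1.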

Comment 4 Dmitry Tantsur 2015-06-25 16:37:47 UTC
We've seen one more case of the same issue, where I was able to investigate further. It looked like the correct MAC did the PXE boot, but then the IP address somehow ended up assigned to the wrong MAC. The BOOTIF kernel parameter on the node also contained the MAC address of the wrong NIC. No surprise that the ramdisk could not reach discoverd. Magic...
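
To check which MAC a booted ramdisk was actually handed, the BOOTIF value can be pulled out of the kernel command line and compared with the interface addresses. This is a minimal sketch under assumptions, not something from the original investigation: the function name bootif_mac is hypothetical, and it accepts both the colon form and the hyphenated form with a leading "01-" hardware type that pxelinux uses.

```shell
#!/bin/sh
# Hypothetical helper: print the MAC carried by BOOTIF= on a kernel command
# line, normalized to lower-case colon-separated form.
#   $1: command line text; defaults to the running kernel's /proc/cmdline
bootif_mac() {
    cmdline=${1:-$(cat /proc/cmdline)}
    # Put each parameter on its own line and keep the first BOOTIF value.
    value=$(printf '%s\n' "$cmdline" | tr ' ' '\n' | sed -n 's/^BOOTIF=//p' | sed -n 1p)
    [ -n "$value" ] || return 1
    # Lower-case the hex digits and turn hyphens into colons.
    value=$(printf '%s' "$value" | tr 'A-F-' 'a-f:')
    # Seven groups means a leading hardware type ("01"); strip it.
    case $value in
        ??:??:??:??:??:??:??) value=${value#*:} ;;
    esac
    printf '%s\n' "$value"
}
```

On the node, the result can be compared with `cat /sys/class/net/eth0/address` (and the same file for the other interfaces) to see which NIC the ramdisk believes it booted from.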

Comment 5 Salman Khan 2015-06-29 18:05:24 UTC
Is there any workaround for this? We are hitting it too.

Comment 7 Ola Pavlenko 2015-07-07 15:17:27 UTC
Created attachment 1049413 [details]
output of /log file

Comment 8 Ola Pavlenko 2015-07-07 15:17:47 UTC
Created attachment 1049414 [details]
console

Comment 9 Ola Pavlenko 2015-07-07 15:18:35 UTC
(In reply to Ola Pavlenko from comment #8)
> Created attachment 1049414 [details]
> console

output of journalctl

Comment 10 Ola Pavlenko 2015-07-07 15:18:58 UTC
Created attachment 1049415 [details]
console 1

Comment 13 Rhys Oxenham 2015-08-04 11:46:01 UTC
Whilst it seems as though libvirt and iPXE consistently order the NICs based on their PCI bus ID, i.e. first nic being net0, second net1, etc, the use of BOOTIF=${mac} is problematic when you have multiple NICs.

In my testing, what does seem to be consistent is that 'net0' is always the one that gets assigned the BOOTIF MAC address, which is not what we want if net1 is the pxe/provisioning network.

If you're sure that net1 is your provisioning network you can do:

# sed -i 's/mac/net1\/mac/' /httpboot/discoverd.ipxe

To give you BOOTIF=${net1/mac}, but this assumes that you're using net1, and that you've already ensured that net1 is your boot priority.

I suspect that this is an iPXE issue where it doesn't correctly assign ${mac} as the booting interface.
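
Note that the sed expression above rewrites the first occurrence of "mac" on every line, whether or not it is part of ${mac}. A narrower substitution can be sketched as a filter, shown here on standard input so that it can be tried without touching /httpboot. The function name qualify_mac is hypothetical and the sample lines in the usage note are illustrative, not the real contents of discoverd.ipxe.

```shell
#!/bin/sh
# Hypothetical filter: rewrite iPXE ${mac...} references on stdin so that
# they name an explicit interface, e.g. ${mac} becomes ${net1/mac}.
#   $1: iPXE interface name (net0, net1, ...)
qualify_mac() {
    nic=$1
    case $nic in
        net[0-9]|net[0-9][0-9]) ;;
        *) echo "usage: qualify_mac netN" >&2; return 2 ;;
    esac
    # [$]{mac matches the literal text "${mac". References that are already
    # qualified, such as ${net0/mac}, do not contain that text and are kept.
    sed 's|[$]{mac|${'"$nic"'/mac|g'
}
```

For example, `printf '%s\n' 'BOOTIF=${mac}' | qualify_mac net1` prints `BOOTIF=${net1/mac}`, and a word such as "machine" elsewhere on the line is left alone.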

Comment 14 Dan Macpherson 2015-08-07 05:23:48 UTC
Confirming this issue. I've been trying out discovery with a set of multi-NIC libvirt VMs in my test environment. For some reason, discoverd.ipxe provides my system with the 2nd NIC's MAC address when it boots off the 1st NIC. This conflict causes the discovery image to fail and drop to the dracut shell. The workaround for me was setting BOOTIF=${net0/mac} in my environment.

However, I can't confirm whether this affects a bare metal/IPMI setup. Someone might need to test this on bare metal.

Comment 15 Jon Thomas 2015-08-11 16:29:30 UTC
*** Bug 1249199 has been marked as a duplicate of this bug. ***

Comment 16 Sadique Puthen 2015-08-18 14:45:14 UTC
(In reply to Rhys Oxenham from comment #13)
> Whilst it seems as though libvirt and iPXE consistently order the NICs based
> on their PCI bus ID, i.e. first nic being net0, second net1, etc, the use of
> BOOTIF=${mac} is problematic when you have multiple NICs.
> 
> In my testing, what does seem to be consistent is that 'net0' is always the
> one that gets assigned the BOOTIF MAC address, which is not what we want if
> net1 is the pxe/provisioning network.
> 
> If you're sure that net1 is your provisioning network you can do:
> 
> # sed -i 's/mac/net1\/mac/' /httpboot/discoverd.ipxe
> 
> To give you BOOTIF=${net1/mac}, but this assumes that you're using net1, and
> that you've already ensured that net1 is your boot priority.
> 
> I suspect that this is an iPXE issue where it doesn't correctly assign
> ${mac} as the booting interface.

One of my customers confirmed that this workaround helps him get past the discovery process, but he hits the bug again when the node is rebooted for overcloud deployment. Is there any workaround for this stage?

"Unfortunately the workaround works for the discovery image, but if I try to deploy the overcloud the PXE boot fails.
It looks like it starts booting from the 1st NIC, as I have set in the Boot Options, but then it tries to get pxelinux.cfg for the 2nd NIC?

I have attached a screenshot."

Comment 19 Rhys Oxenham 2015-08-18 21:32:07 UTC
Ack on Dan's comments above, pxe_ssh really needs to be able to set the iPXE BOOTIF flag based on the NIC order of the virtual machine, or we hard code the requirement for the provisioning NIC to ALWAYS be on eth0/net0 and forcefully ensure that we use, say, BOOTIF=${net0/mac}.

The same applies to deploy too; ${mac} also gets used there to uniquely identify the node, and deployments also fail. My hacky workaround is as follows; it has worked 100% of the time in my experience:


----
undercloud# cat << EOF > /usr/bin/bootif-fix
#!/usr/bin/env bash

while true;
        do find /httpboot/ -type f ! -iname "kernel" ! -iname "ramdisk" ! -iname "*.kernel" ! -iname "*.ramdisk" -exec sed -i 's|{mac|{net0/mac|g' {} +;
done
EOF

undercloud# chmod a+x /usr/bin/bootif-fix
undercloud# cat << EOF > /usr/lib/systemd/system/bootif-fix.service
[Unit]
Description=Automated fix for incorrect iPXE BOOTIF

[Service]
Type=simple
ExecStart=/usr/bin/bootif-fix

[Install]
WantedBy=multi-user.target
EOF

undercloud# systemctl daemon-reload
undercloud# systemctl enable bootif-fix
ln -s '/usr/lib/systemd/system/bootif-fix.service' '/etc/systemd/system/multi-user.target.wants/bootif-fix.service'
undercloud# systemctl start bootif-fix

------

Comment 22 Dan Sneddon 2015-08-25 16:04:42 UTC
Here is the hacky workaround that Rhys suggested, in ready-to-copy-and-paste form. Make sure you are logged in as root, then copy and paste this blob:

#####################################
cat << EOF > /usr/bin/bootif-fix
#!/usr/bin/env bash

while true;
        do find /httpboot/ -type f ! -iname "kernel" ! -iname "ramdisk" ! -iname "*.kernel" ! -iname "*.ramdisk" -exec sed -i 's|{mac|{net0/mac|g' {} +;
done
EOF

chmod a+x /usr/bin/bootif-fix

cat << EOF > /usr/lib/systemd/system/bootif-fix.service
[Unit]
Description=Automated fix for incorrect iPXE BOOTIF

[Service]
Type=simple
ExecStart=/usr/bin/bootif-fix

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable bootif-fix
systemctl start bootif-fix

#######################################

Comment 23 Dan Sneddon 2015-09-01 07:05:41 UTC
(In reply to Dan Sneddon from comment #22)

The above hacky workaround does seem to work in the virt case. I have not (yet) been able to confirm that it works on bare metal. In my test (on Juniper EX4200 switches configured for LACP bonds) it did not work.

Comment 26 David Juran 2015-09-04 10:03:28 UTC
Regarding the doc text: the script does not just run once at boot. Once systemd starts it, it runs in a tight loop...

Comment 27 Felipe Alfaro Solana 2015-10-02 12:04:30 UTC
Any progress on this?

Comment 28 Edu Alcaniz 2015-10-05 05:09:44 UTC
(In reply to Felipe Alfaro Solana from comment #27)
> Any progress on this?

Apply comment 22 as a workaround.

Comment 38 Felipe Alfaro Solana 2015-10-19 08:00:45 UTC
Any chance that manually patching ironic/drivers/modules/ipxe_config.template would help? I mean, instead of using the ugly "while true" loop?

Comment 39 Felipe Alfaro Solana 2015-10-19 08:06:05 UTC
(In reply to Edu Alcaniz from comment #28)
> (In reply to Felipe Alfaro Solana from comment #27)
> > Any progress on this?
> 
> Apply Comment 22 like work around

Yeah, I mean something that doesn't peg the CPU all the time :-)
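
One way to keep the approach from comment 22 without pegging the CPU is to pause between passes and to rewrite only the files that still need it. The following is a minimal sketch under assumptions, not a tested replacement for the service: the function names patch_once and patch_loop and the INTERVAL variable are hypothetical, while the find filters and the substitution are the ones from comment 22.

```shell
#!/usr/bin/env bash
# Hypothetical variant of bootif-fix that sleeps between passes.

# Rewrite unqualified {mac references under a directory, touching only the
# files that still contain one.
#   $1: directory to scan (the workaround uses /httpboot/)
#   $2: iPXE interface to pin, e.g. net0
patch_once() {
    find "$1" -type f \
        ! -iname "kernel" ! -iname "ramdisk" \
        ! -iname "*.kernel" ! -iname "*.ramdisk" \
        -exec sh -c '
            nic=$1; shift
            for f do
                # Skip files with nothing to change so they are not rewritten.
                if grep -q -F "{mac" "$f"; then
                    sed -i "s|{mac|{$nic/mac|g" "$f"
                fi
            done
        ' sh "$2" {} +
}

# Run forever, pausing INTERVAL seconds (assumed default: 5) between passes.
patch_loop() {
    dir=${1:-/httpboot/}
    nic=${2:-net0}
    while true; do
        patch_once "$dir" "$nic"
        sleep "${INTERVAL:-5}"
    done
}

# As the body of /usr/bin/bootif-fix this would end with:
#   patch_loop /httpboot/ net0
```

The trade-off is a window of up to INTERVAL seconds in which a freshly generated file is still unpatched, so a node that boots inside that window could still receive the unqualified ${mac}.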

Comment 40 Rhys Oxenham 2015-10-19 10:18:02 UTC
(In reply to Felipe Alfaro Solana from comment #39)
> (In reply to Edu Alcaniz from comment #28)
> > (In reply to Felipe Alfaro Solana from comment #27)
> > > Any progress on this?
> > 
> > Apply Comment 22 like work around
> 
> Yeah, I mean something that doesn't peg the CPU all the time :-)

Patching the ipxe template should work just fine too.

Comment 41 Mark Chappell 2015-10-19 11:51:03 UTC
I'm using the following:

    sed -i 's|${mac}|${net0/mac}|g' \
         /usr/share/instack-undercloud/ironic-discoverd/os-apply-config/httpboot/discoverd.ipxe \
         /usr/lib/python2.7/site-packages/ironic/drivers/modules/boot.ipxe \
         /usr/lib/python2.7/site-packages/ironic/drivers/modules/ipxe_config.template \
         /httpboot/*.ipxe
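
Because some of these files belong to packages and others are regenerated, it can help to check which of them still carry an unqualified reference, for example after an update. A small sketch, assuming only that the files are plain text; the function name list_unpatched is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: print the files among the arguments that still contain
# an unqualified ${mac reference. Arguments that are not regular files are
# skipped silently.
list_unpatched() {
    for f in "$@"; do
        [ -f "$f" ] || continue
        if grep -q -F '${mac' "$f"; then
            printf '%s\n' "$f"
        fi
    done
    return 0
}

# Usage with the paths from the command above:
#   list_unpatched /httpboot/*.ipxe \
#       /usr/lib/python2.7/site-packages/ironic/drivers/modules/boot.ipxe
```

An empty result means every listed file has already been rewritten.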

Comment 42 Dan Sneddon 2015-10-19 17:39:05 UTC
When we fix this bug (no matter how it gets fixed), we will need an option for choosing which NIC gets used. Most of the time, this will be net0, but it could just as easily be another NIC that should be used.

Comment 43 Lucas Alvares Gomes 2015-11-24 17:57:24 UTC
Hi,

If I understand correctly, the problem is that the macro ${mac} in the iPXE template is pointing to the wrong MAC address, right? I've seen it and talked to the iPXE developers about it; this problem has been fixed in iPXE since 2013 [1].

This has been fixed in Ironic as well [2], but the fix won't apply to discoverd since it uses a default iPXE configuration for booting all the machines.

I believe the right fix for this is just to use a newer iPXE ROM; we should update the ipxe-bootimgs package (which is on the way, I believe).

In the meantime, can you guys try fixing this by just replacing the undionly.kpxe in the /tftpboot directory with a newer version?

You can get a pre-built one from http://boot.ipxe.org/undionly.kpxe

[1] http://git.ipxe.org/ipxe.git/commitdiff/66ea458
[2] https://bugs.launchpad.net/ironic/+bug/1504482
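
When trying the replacement, it is worth keeping the packaged ROM so the change can be undone. The following is a sketch only: the function name swap_ipxe_rom and the ".orig" backup suffix are hypothetical, and the new ROM is assumed to have been downloaded or built separately.

```shell
#!/bin/sh
# Hypothetical helper: install a newer undionly.kpxe while keeping the
# original one next to it.
#   $1: path to the new ROM, downloaded or built separately
#   $2: TFTP root (defaults to /tftpboot, as in the comment above)
swap_ipxe_rom() {
    new_rom=$1
    tftp_dir=${2:-/tftpboot}
    target=$tftp_dir/undionly.kpxe
    [ -s "$new_rom" ] || { echo "missing or empty: $new_rom" >&2; return 1; }
    # Back up only once, so repeated runs never overwrite the original copy.
    if [ -f "$target" ] && [ ! -e "$target.orig" ]; then
        cp -p "$target" "$target.orig" || return 1
    fi
    cp "$new_rom" "$target"
}
```

Copying undionly.kpxe.orig back over undionly.kpxe restores the previous state; later comments in this bug report cases where the newer ROM did not help.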

Comment 44 chris alfonso 2015-11-24 18:27:46 UTC
Rhys, Are you interested in trying this out?

Comment 46 chris alfonso 2015-12-07 17:36:12 UTC
Lucas, is this going to land for our next release based on liberty?

Comment 47 Lucas Alvares Gomes 2015-12-07 19:40:41 UTC
Hi @Chris,

The fix for this problem is in the ipxe-bootimgs package for RHEL/CentOS, which contains an old and buggy iPXE ROM. The BZ where the package rebase is being tracked is this one [0]. Once that's fixed, this bug will go away.

The workarounds we currently have are to get a newer pre-built iPXE ROM from the Fedora package or directly from the iPXE project [1], or to build one locally [2].

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1267030
[1] http://boot.ipxe.org/undionly.kpxe
[2] http://ipxe.org/download#source_code

Comment 48 Robin Cernin 2015-12-17 10:49:32 UTC
Hi Lucas,

It was reported that the latest pre-built undionly.kpxe from boot.ipxe.org falls back to booting from the local disk on a Dell PowerEdge R720.

Comment 49 Dan Prince 2016-01-07 18:04:55 UTC
Perhaps I misunderstand something, but would the mappings/orderings here work better if we used biosdevname (which should give predictable NIC ordering) in both the discovery ramdisk and the overcloud images? So long as the NICs are ordered the same, this should make things happier, right?

Comment 50 Jason Montleon 2016-01-27 16:08:50 UTC
This is not fixed in virt environments using the ipxe-bootimgs-20150821-1.git4e03af8e.el7.noarch.rpm package from https://bugzilla.redhat.com/show_bug.cgi?id=1267030

It does not work with http://boot.ipxe.org/undionly.kpxe either.

The only way I've gotten it to work is with a slightly modified command from c41:
sed -i 's|${mac|${net0/mac|g' \
  /usr/share/instack-undercloud/ironic-discoverd/os-apply-config/httpboot/discoverd.ipxe \
  /usr/lib/python2.7/site-packages/ironic/drivers/modules/boot.ipxe \
  /usr/lib/python2.7/site-packages/ironic/drivers/modules/ipxe_config.template \
  /httpboot/*.ipxe

With the trailing }, the line in boot.ipxe (${mac:hexhyp}) didn't get updated, and once past discovery the installation fails because the VMs can't deploy.
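
The difference between the two patterns can be reproduced on standard input, without touching any installed file. This is only an illustration: the function names patch_closed and patch_open are hypothetical wrappers around the sed expressions from comment 41 and from this comment.

```shell
#!/bin/sh
# Expression from comment 41: only matches the exact text ${mac}
patch_closed() { sed 's|${mac}|${net0/mac}|g'; }

# Expression from this comment: matches ${mac whatever follows it
patch_open() { sed 's|${mac|${net0/mac|g'; }

# Try both on the two kinds of reference mentioned in this bug:
#   printf '%s\n' 'BOOTIF=${mac}' '${mac:hexhyp}' | patch_closed
#   printf '%s\n' 'BOOTIF=${mac}' '${mac:hexhyp}' | patch_open
```

patch_closed leaves ${mac:hexhyp} untouched, while patch_open turns it into ${net0/mac:hexhyp}; both rewrite the plain ${mac}.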

Comment 51 Dmitry Tantsur 2016-02-03 14:34:49 UTC
Jason, could you please provide us an environment for testing it? I'd ask the iPXE guy to take a look.

Comment 52 Lucas Alvares Gomes 2016-02-03 14:52:04 UTC
Hi Jason
> This is not fixed in virt environments using the
> ipxe-bootimgs-20150821-1.git4e03af8e.el7.noarch.rpm package from
> https://bugzilla.redhat.com/show_bug.cgi?id=1267030
> 
> It does not work with http://boot.ipxe.org/undionly.kpxe either.
> 

Yes, that's expected for virtual environments. QEMU already has iPXE embedded in the NICs (installed by the "ipxe-roms-qemu" package), so updating ipxe-bootimgs or copying the one from http://boot.ipxe.org to the TFTP directory will not work*.

* It doesn't work because we chainload to the ROM image from the TFTP directory *only* if the node is booting from a standard PXE image. If it already boots from an iPXE image, as happens in virtual environments, there's no chainloading. The flow for *non-virtual* environments goes like this:

1. Node boots with standard PXE and issues a DHCPREQUEST
2. DHCP server detects it and instructs PXE to fetch the iPXE image from the TFTP server and load it
3. Node boots with iPXE and issues a DHCPREQUEST
4. DHCP server now sees that the request is coming from iPXE and returns the iPXE script to boot the kernel/ramdisk and proceed with the deployment.

For virtual environments, since the node is already booting from iPXE, the boot starts at step 3. So the newer iPXE will never be fetched from the TFTP directory.
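
The two-stage decision in steps 1 to 4 is commonly expressed in dnsmasq by tagging requests that carry DHCP option 175, which iPXE sends. The sketch below prints such a fragment; it illustrates the mechanism and is not necessarily the configuration the director generates. The function name ipxe_chainload_conf is hypothetical and the script URL is an assumed argument.

```shell
#!/bin/sh
# Hypothetical generator for a dnsmasq fragment implementing the flow above.
#   $1: URL of the iPXE script served over HTTP (assumed value)
ipxe_chainload_conf() {
    script_url=$1
    cat <<EOF
# iPXE clients send DHCP option 175; tag those requests.
dhcp-match=set:ipxe,175
# Steps 1-2: plain PXE firmware is told to fetch the iPXE image over TFTP.
dhcp-boot=tag:!ipxe,undionly.kpxe
# Steps 3-4: a client already running iPXE gets the boot script instead.
dhcp-boot=tag:ipxe,$script_url
EOF
}

# Usage (the URL is illustrative):
#   ipxe_chainload_conf http://192.0.2.1/boot.ipxe
```

A QEMU guest whose NIC ROM is already iPXE matches the tag on its first request, which is why the undionly.kpxe line never applies to it.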

> The only way I've gotten it to work is with a slightly modified command from
> c41:
> sed -i 's|${mac|${net0/mac|g' \
>   /usr/share/instack-undercloud/ironic-discoverd/os-apply-config/httpboot/discoverd.ipxe \
>   /usr/lib/python2.7/site-packages/ironic/drivers/modules/boot.ipxe \
>   /usr/lib/python2.7/site-packages/ironic/drivers/modules/ipxe_config.template \
>   /httpboot/*.ipxe
> 
> With the trailing } the line didn't get update in boot.ipxe (${mac:hexhyp}),
> and once passed discovery the installation fails because the vm's can't
> deploy.

Right, yeah, by doing that you are hardcoding the use of the first NIC (net0). That seems a valid way of working around this problem while we don't yet have a new "ipxe-roms-qemu" package to solve it for virtual environments.

One thing though... The Ironic services (not discoverd) allow the user to set a custom script via their configuration file, so you don't have to change the default ones, which will be overwritten again if there's an update to the package (because realistically, that's something people will have to do to adapt their environment).

To do that, you can change two configuration options under /etc/ironic/ironic.conf

[pxe]
ipxe_boot_script=<path to your custom boot.ipxe script>
pxe_config_template=<path to your custom ipxe_config.template>

(restart the openstack-ironic-conductor service after the changes)

Unfortunately, for discoverd you will need to customize the files in the /httpboot directory directly.
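
For the Ironic side, the custom copies and the matching configuration lines could be prepared along these lines. This is a sketch under assumptions: the function name make_custom_ipxe and the destination directory are hypothetical, and the substitution is the one quoted above.

```shell
#!/bin/sh
# Hypothetical helper: copy the stock iPXE templates to a private directory,
# pin the copies to one interface, and print the matching ironic.conf stanza.
#   $1: directory holding boot.ipxe and ipxe_config.template
#   $2: destination directory for the customized copies (assumed location)
#   $3: iPXE interface to pin, e.g. net0
make_custom_ipxe() {
    src=$1
    dest=$2
    nic=$3
    mkdir -p "$dest" || return 1
    for name in boot.ipxe ipxe_config.template; do
        # The stock file is only read; the rewritten copy goes to $dest.
        sed 's|[$]{mac|${'"$nic"'/mac|g' "$src/$name" > "$dest/$name" || return 1
    done
    printf '[pxe]\nipxe_boot_script=%s\npxe_config_template=%s\n' \
        "$dest/boot.ipxe" "$dest/ipxe_config.template"
}

# Usage (the destination path is illustrative):
#   make_custom_ipxe /usr/lib/python2.7/site-packages/ironic/drivers/modules \
#       /etc/ironic/ipxe net0
```

The printed lines go under [pxe] in /etc/ironic/ironic.conf, followed by a restart of openstack-ironic-conductor as described above. Since the packaged files stay unmodified, a package update does not undo the change.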

Hope that helps,
Lucas

Comment 53 Hugh Brock 2016-02-03 15:49:07 UTC
Moving this to MODIFIED per Lucas -- the problem should be fixed on bare metal by the iPXE RPM we are now shipping with RHEL-OSP 8.

Comment 54 Jason Montleon 2016-02-09 21:48:11 UTC
Would an updated ipxe-roms-qemu package fix the problem with virt environments? 

I am more than happy to file bugs against RHEL and Fedora in order to get them updated sooner rather than later.

Comment 57 Marius Cornea 2016-02-25 14:55:30 UTC
Introspection completed ok on VMs with 2 nics.

Comment 58 Donny Davis 2016-02-27 06:51:11 UTC
I'm running into the same issue on HW. One set of hardware runs through introspection fine, while the other does not.

I have 3 Supermicro servers with 6 NICs
I have 10 Dell M610s with 4 NICs

The Supermicro servers work fine.
The Dell servers boot from iPXE fine; however, it tries to bring up the first NIC, which is not the NIC that is connected to the PXE network. I cannot complete introspection on the Dell machines because they keep saying the network cannot be found.

Comment 59 Marius Cornea 2016-02-27 10:09:27 UTC
Hi Donny,

What OSP-d version are you running? The initial report for this ticket was for a virtual environment and I wasn't able to reproduce it in version 8 with VMs with multiple nics. 

Can you also check BZ#1301694 and see if it matches the issue you're seeing? This was another introspection-timeout-related issue that came up only on HW during the 7.3 testing.

Thanks

Comment 62 errata-xmlrpc 2016-04-07 21:37:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0604.html

Comment 65 Alexandre Maumené 2016-07-25 07:52:50 UTC
Hi,

I'm actually hitting this issue with virtualized controllers with 2 virtio NICs. I'm using the latest OSP-d 8. Let me know what kind of information you need to help debug (but please be aware that it's a customer deployment, so at some point I will no longer be able to connect to this environment!)

By the way, I thought the find/sed loop given was using too many resources and was going through unrelated files, so I switched it to:

#!/usr/bin/env bash

while true;
        do find /httpboot/ -type f \( -name "boot.ipxe" -o -name "inspector.pxe" -o -name "config" \) -exec sed -i 's|{mac|{net0/mac|g' {} +;
done

In my current deployment the undercloud is hosted on a very slow server (customer's choice, not mine obviously) and with the previous loop the undercloud VM was getting stuck at some point.

Comment 66 Dmitry Tantsur 2016-08-19 08:17:24 UTC
*** Bug 1326481 has been marked as a duplicate of this bug. ***

