Red Hat Bugzilla – Bug 1293422
Overcloud provisioning network interface created with incorrect MAC address, suspect (old) FW issue
Last modified: 2016-11-01 20:36:44 EDT
Description of problem:
During overcloud deployment, the overcloud nodes are created with a broken provisioning (ControlPlane) network. It appears that a new interface was created with a new, incorrect MAC address for the provisioning network.
Version-Release number of selected component (if applicable):
Beta2 on IBM x3550 M5 servers
Steps to Reproduce:
1. Deploy an overcloud
2. Use IBM x3550 M5 servers with FW: DSA 10.0, IMM2 1.02, UEFI 1.03, Bootcode 1.38, Broadcom GigE 16.8.0
Actual results:
heat overcloud-create completes after a long time, but the overcloud nodes are unusable because they have a broken control plane network.
Expected results:
A completed, usable overcloud deployment.
Additional info:
During testing, I found that deploying the Beta 2 overcloud succeeded on one group of servers and failed on another, using the same undercloud and network YAML definition. The servers all have the same vendor, model, and network configuration. The only difference I am aware of is the FW level: the failing servers have older firmware from ~12 months ago.
Servers: IBM x3550 M5
Server FW versions which worked:
FW: DSA 10.1, IMM2 1.72, UEFI 1.10, Bootcode NA, Broadcom GigE 220.127.116.11a
Server FW versions which didn’t work:
FW: DSA 10.0, IMM2 1.02, UEFI 1.03, Bootcode 1.38, Broadcom GigE 16.8.0
For the servers which failed, the problem was that a new interface had been created with a new MAC address for the provisioning network. It looks like the initial PXE load, OS load, and some customization completed, but in the end the deployment created a broken configuration for the provisioning network. In addition, I noticed that another NIC, which I had configured in my YAML files to be ignored, had been given a new interface name and was configured. I tried a solution to NIC enumeration issues that I had seen on rhos-tech, which modifies the overcloud image, but it did not correct the situation:
virt-customize -a overcloud-full.qcow2 --run-command "sed -i 's/net.ifnames=0 //g' /etc/default/grub"
virt-customize -a overcloud-full.qcow2 --run-command "grub2-mkconfig -o /boot/grub2/grub.cfg"
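To make clear what the sed expression in the first command changes, here is a minimal sketch that applies the same expression to sample lines rather than to the real /etc/default/grub. The sample GRUB_CMDLINE_LINUX contents and the helper name strip_ifnames are assumptions for illustration only; the actual line in the overcloud image may differ.

```shell
# Hypothetical helper wrapping the same sed expression used above.
strip_ifnames() {
    sed 's/net.ifnames=0 //g'
}

# Assumed sample line: the option is followed by a space, so it is removed.
printf '%s\n' 'GRUB_CMDLINE_LINUX="console=tty0 net.ifnames=0 crashkernel=auto"' | strip_ifnames
# prints: GRUB_CMDLINE_LINUX="console=tty0 crashkernel=auto"

# Assumed sample line: the option is the last token (no trailing space),
# so the pattern does not match and the line is left unchanged.
printf '%s\n' 'GRUB_CMDLINE_LINUX="console=tty0 net.ifnames=0"' | strip_ifnames
# prints: GRUB_CMDLINE_LINUX="console=tty0 net.ifnames=0"
```

Because the pattern includes the trailing space, it is worth checking the resulting file inside the image to confirm that the option was actually removed.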
I’m not surprised that the solution can be FW sensitive in terms of PXE/NICs. I have put in a request to the lab to have all the servers brought up to the latest available FW.
My recommendation is that the documentation or release notes should advise that (A) the director solution can be sensitive to server FW versions and (B) as a best practice, you should update your servers to the latest FW before starting an install.
I couldn’t find anything like that in the beta or 7.0 documentation.
Upgrading the equipment to the latest FW appears to eliminate the incorrect MAC issue.
Deployments of non-HA and HA overcloud configurations now succeed.
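For anyone re-checking this after a firmware upgrade, the following is a rough sketch of how the MAC address seen on a deployed node could be compared with the one expected for the provisioning NIC. The helper names, the example MAC addresses, and the interface name eth0 are all hypothetical; only the /sys/class/net/<interface>/address path is standard on Linux.

```shell
# Hypothetical helper: normalize a MAC address to lowercase, colon-separated form.
normalize_mac() {
    printf '%s' "$1" | tr 'A-F-' 'a-f:'
}

# Hypothetical helper: succeeds when two MAC addresses are equal after normalization.
mac_matches() {
    [ "$(normalize_mac "$1")" = "$(normalize_mac "$2")" ]
}

# Example usage on a node (expected MAC and interface name are placeholders):
# mac_matches "52:54:00:AA:BB:CC" "$(cat /sys/class/net/eth0/address)" \
#     || echo "provisioning NIC MAC does not match the expected address"
```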
Could this be moved to a doc defect? That is, we need a warning/recommendation to make sure current FW is in use when using OSP-D.
This bug did not make the OSP 8.0 release. It is being deferred to OSP 10.
Doc text was updated to indicate that a firmware upgrade is required for these particular servers.