Description of problem:
A simple installation of overcloud ironic with the ctlplane network used for provisioning baremetal nodes. When the nodes are cleaned, they boot into http://ipxe.org/err/040ee1 and remain there indefinitely in "clean wait":

(overcloud) [stack@undercloud-0 ~]$ openstack baremetal node list
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name     | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+
| 6d0f3803-d1e5-4565-a8bc-4bf92a5cb1db | ironic-1 | None          | power on    | clean wait         | False       |
| d5471d52-bb98-4919-9729-aa6337c10aca | ironic-0 | None          | power on    | clean wait         | False       |
+--------------------------------------+----------+---------------+-------------+--------------------+-------------+

On the controller:

[root@controller-0 ~]# tcpdump -n port 67 and port 68
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
19:02:00.970424 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:64:88:48, length 396
19:02:04.910861 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:64:88:48, length 396
19:02:12.819908 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:64:88:48, length 396
19:02:28.638163 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:64:88:48, length 396
19:02:55.939528 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:ca:34:c4, length 396
19:02:59.882079 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:ca:34:c4, length 396
19:03:07.791158 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:ca:34:c4, length 396
19:03:23.609301 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:ca:34:c4, length 396

Version-Release number of selected component (if applicable):
python-ironic-lib-2.9.0-0.20170821163713.dcc5a47.el7ost.noarch
openstack-ironic-common-9.1.1-0.20170824135903.d783dff.el7ost.noarch
openstack-ironic-api-9.1.1-0.20170824135903.d783dff.el7ost.noarch
openstack-ironic-inspector-6.0.1-0.20170824132804.0e72dcb.el7ost.noarch
puppet-ironic-11.3.1-0.20170825175845.407b7d8.el7ost.noarch
python-ironicclient-1.16.0-0.20170821151022.835c5d4.el7ost.noarch
python-ironic-inspector-client-2.0.0-0.20170814165407.0ccc767.el7ost.noarch
openstack-ironic-conductor-9.1.1-0.20170824135903.d783dff.el7ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy with overcloud ironic enabled.
2. Try to clean a node.

Actual results:
Cleaning fails as described above.

Expected results:
Cleaning should complete.

Additional info:
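Since the DHCP requests reach the controller's eth0 but are never answered, a useful check is whether the neutron DHCP agent serving the baremetal network ever sees them. A minimal sketch of that check, run on the controller; the network UUID and tap interface name are placeholders to be looked up, not known values:

[root@controller-0 ~]# ip netns list                                      # find the qdhcp-<network-uuid> namespace
[root@controller-0 ~]# ip netns exec qdhcp-<network-uuid> ip link         # identify the tap interface inside it
[root@controller-0 ~]# ip netns exec qdhcp-<network-uuid> tcpdump -n -i <tap-if> port 67 or port 68

If no requests show up inside the namespace while they are visible on eth0, the traffic is arriving on the wrong network, which is what the analysis below confirms.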
This looks like it's not a bug, just a configuration issue with the virtual network that the baremetal node is using. We see DHCP requests received at the controller's eth0 interface, which is attached to the ctlplane. We do not see DHCP requests received at the qdhcp namespace, which is on the baremetal network.

From the virt-host we can see the following. This is the interface the BM host is using; it only uses the "data" network, which corresponds to the ctlplane:

[root@sealusa5 ~]# virsh domiflist ironic-0
Interface  Type     Source      Model   MAC
-------------------------------------------------------
vnet12     network  data        virtio  52:54:00:64:88:48

From the controller and undercloud-0, we can see the "data" network maps to eth0, which is on the ctlplane:

[root@sealusa5 ~]# virsh domiflist controller-0
Interface  Type     Source      Model   MAC
-------------------------------------------------------
vnet6      network  data        virtio  52:54:00:ab:01:d3  <== eth0
vnet8      network  management  virtio  52:54:00:47:e6:2e  <== eth1 & br-isolated
vnet10     network  external    virtio  52:54:00:a9:fc:7a  <== eth2 & br-ex

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:ab:01:d3 brd ff:ff:ff:ff:ff:ff
    inet 192.168.24.10/24 brd 192.168.24.255 scope global eth0

[root@sealusa5 ~]# virsh domiflist undercloud-0
Interface  Type     Source      Model   MAC
-------------------------------------------------------
vnet0      network  data        virtio  52:54:00:0f:4d:2b  <== eth0 & br-ctlplane
vnet1      network  management  virtio  52:54:00:a5:0f:66  <== eth1
vnet2      network  external    virtio  52:54:00:f0:02:5a  <== eth2

8: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
    link/ether 52:54:00:0f:4d:2b brd ff:ff:ff:ff:ff:ff
    inet 192.168.24.1/24 brd 192.168.24.255 scope global br-ctlplane

So in order for ironic-0 to access the baremetal network, it should be using eth1 and the management network. That will allow it to reach tap48ce1bef-b0 on br-int and should therefore get a DHCP response back:

    Bridge br-int
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port "tap48ce1bef-b0"
            tag: 1
            Interface "tap48ce1bef-b0"
                type: internal
        Port br-int
            Interface br-int
                type: internal
        Port int-br-isolated
            Interface int-br-isolated
                type: patch
                options: {peer=phy-br-isolated}
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port int-br-ex
            Interface int-br-ex
                type: patch
                options: {peer=phy-br-ex}
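For anyone hitting the same layout, one possible way to rewire the VM is to move its NIC from the "data" network to the "management" network with virsh. This is a sketch only, not a verified fix: it assumes the domain can be shut down, and it reattaches with the original MAC from the domiflist output above so ironic's port record stays valid:

[root@sealusa5 ~]# virsh shutdown ironic-0
[root@sealusa5 ~]# virsh detach-interface ironic-0 network --mac 52:54:00:64:88:48 --config
[root@sealusa5 ~]# virsh attach-interface ironic-0 network management --mac 52:54:00:64:88:48 --model virtio --config
[root@sealusa5 ~]# virsh start ironic-0

With --config the change only takes effect on the next boot, which is why the domain is restarted; whether infrared regenerates these domains and undoes the change is a separate question.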
I'm closing this for now, as the reason the BM node is not getting a DHCP response and logging the ipxe code 040ee119 is that it's sending DHCP requests on the ctlplane, not the baremetal network. This is due to the infrared virtual network setup. I'd like to understand more about how the infrared virsh setup configures the virt networks so we can get cleaning to complete, but I don't see evidence of this being an Ironic bug. We can revisit this once the virt networks are changed.

BTW, including for reference the infrared virtual networking setup Dan had sent: http://infrared.readthedocs.io/en/latest/virsh.html#network-layout
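For reference when revisiting: the libvirt network definitions infrared created can be dumped directly on the virt-host to confirm which bridge and subnet each named network maps to. A minimal sketch, assuming the network names match the domiflist output above:

[root@sealusa5 ~]# virsh net-list --all
[root@sealusa5 ~]# for net in data management external; do virsh net-dumpxml $net; done

Comparing those definitions against the network layout in the infrared docs linked above should show whether the BM nodes were simply attached to the wrong named network.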