Description of problem:
Unable to finish cleaning nodes for BMaaS (Ironic) in the overcloud (OSP12 only). The node being cleaned receives an IP address from the Neutron DHCP server, but the TFTP transfer then fails. tcpdump on the controller interface used for the bare metal cleaning network shows:

15:09:57.055682 IP 172.31.10.12.ah-esp-encap > chrisj-controller-0.external.home.lab.tftp: 30 RRQ "undionly.kpxe" octet tsize 0
15:09:57.055725 IP chrisj-controller-0.external.home.lab > 172.31.10.12: ICMP chrisj-controller-0.external.home.lab udp port tftp unreachable, length 66

For some reason the cleaning process is trying to use TFTP rather than HTTP. Also, UDP port 69 does not appear to be open in either the ironic-conductor container or on the controller itself:

[root@chrisj-controller-0 ~]# docker exec f141f3dc3800 netstat -ulnp | grep 69
[root@chrisj-controller-0 ~]# netstat -ulnp | grep 69
[root@chrisj-controller-0 ~]# docker ps | grep ironic-conductor
f141f3dc3800 172.31.0.10:8787/rhosp12/openstack-ironic-conductor:12.0-20180124.1 "kolla_start" 21 hours ago Up About an hour (healthy) ironic_conductor

Ironic has been deployed with the following parameters:

NovaSchedulerDefaultFilters:
  - RetryFilter
  - AggregateInstanceExtraSpecsFilter
  - AggregateMultiTenancyIsolation
  - AvailabilityZoneFilter
  - RamFilter
  - DiskFilter
  - ComputeFilter
  - ComputeCapabilitiesFilter
  - ImagePropertiesFilter
IronicCleaningDiskErase: metadata
IronicIPXEEnabled: true
ServiceNetMap:
  IronicApiNetwork: external
  IronicNetwork: external

The cleaning network is a provider network defined in Neutron and is the native VLAN on the bare metal nodes. DHCP seems to be working fine. This exact same setup worked on OSP11 and OSP10.

Version-Release number of selected component (if applicable):
OSP12

How reproducible:
Every time

Steps to Reproduce:
1. Deploy the overcloud with Ironic support.
2. Add bare metal nodes and update the cleaning network parameter.
3. Attempt to clean the nodes with:
   openstack baremetal node manage baremetal2
   openstack baremetal node provide baremetal2

Actual results:
Cleaning fails; the node is not able to PXE boot.

Expected results:
Cleaning completes.

Additional info:
Attaching an sosreport from the controller node (running this with a single controller).
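The capture above shows the node's read request for undionly.kpxe (typically the TFTP stage of an iPXE chainload) being answered with an ICMP port unreachable. The sketch below is not part of Ironic or TripleO; it is a minimal, hypothetical probe, assuming Python 3 on a Linux host attached to the cleaning network, for rechecking that port after a redeploy. It builds the same RFC 1350 read request with the RFC 2349 tsize option (30 bytes, matching the length tcpdump reports for the RRQ). It only distinguishes "refused" from "not refused": a real TFTP server answers from a new ephemeral port, which a socket connected to port 69 will not accept. The function names and the 192.0.2.10 address are placeholders.

```python
import socket
from typing import Dict, Optional


def build_rrq(filename: str, mode: str = "octet",
              options: Optional[Dict[str, object]] = None) -> bytes:
    """Build a TFTP read request (RFC 1350) with optional RFC 2347 options."""
    packet = b"\x00\x01"  # opcode 1 = RRQ, big-endian
    packet += filename.encode("ascii") + b"\x00"
    packet += mode.encode("ascii") + b"\x00"
    for name, value in (options or {}).items():
        # Each option is "name\0value\0", e.g. tsize=0 asks for the file size.
        packet += name.encode("ascii") + b"\x00" + str(value).encode("ascii") + b"\x00"
    return packet


def probe_tftp_port(host: str, filename: str = "undionly.kpxe",
                    port: int = 69, timeout: float = 3.0) -> str:
    """Send one RRQ and report whether an ICMP port unreachable came back.

    On Linux, a connected UDP socket reports a received ICMP port unreachable
    as ConnectionRefusedError on the next recv(). A timeout only means the
    request was not refused; it does not prove a transfer would succeed.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect((host, port))
        sock.send(build_rrq(filename, options={"tsize": 0}))
        try:
            sock.recv(516)
        except ConnectionRefusedError:
            return "refused"  # same symptom as 'udp port tftp unreachable'
        except socket.timeout:
            return "not refused"
        return "reply"  # only if something answered from the probed port itself


if __name__ == "__main__":
    # Hypothetical controller address on the cleaning network.
    print(probe_tftp_port("192.0.2.10"))
```

Against the controller in this report, a result of "refused" would correspond to the ICMP reply in the capture.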
Created attachment 1403104: screenshot of the failed PXE attempt
So I was able to fix this issue by updating my overcloud with -e /usr/share/openstack-tripleo-heat-templates/environments/services-docker/ironic.yaml rather than -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic.yaml, which I originally used.

There are really 2 bugs:
1. Update the docs (https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/12/html/bare_metal_provisioning/sect-deploy, Section 3.1) to indicate the correct file to pass to overcloud deploy.
2. Remove /usr/share/openstack-tripleo-heat-templates/environments/services/ironic.yaml from the default templates (since it doesn't work).

Thanks
Thanks Chris, this issue was just recently found and is being tracked here: https://bugzilla.redhat.com/show_bug.cgi?id=1549770
The fix is being tracked in that BZ, so I will mark this one as a duplicate.

*** This bug has been marked as a duplicate of bug 1549770 ***