Bug 1551075

Summary: Cleaning of BM nodes fails with - udp port tftp unreachable in - pxe boot fails - OSP12
Product: Red Hat OpenStack
Reporter: Chris Janiszewski <cjanisze>
Component: openstack-ironic
Assignee: RHOS Maint <rhos-maint>
Status: CLOSED DUPLICATE
QA Contact: mlammon
Severity: unspecified
Priority: unspecified
Version: 12.0 (Pike)
CC: bfournie, david.costakos, mburns, rhel-osp-director-maint, srevivo
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2018-03-02 18:02:28 UTC
Type: Bug
Attachments:
screenshot of the failed pxe attempt (flags: none)

Description Chris Janiszewski 2018-03-02 17:02:29 UTC
Description of problem:
Unable to finish cleaning up nodes for BMaaS (ironic) in Overcloud (OSP12 only).

The node being cleaned receives an IP address from the Neutron DHCP server, but the TFTP transfer then fails. tcpdump on the controller interface used for the baremetal cleaning network shows:
15:09:57.055682 IP 172.31.10.12.ah-esp-encap > chrisj-controller-0.external.home.lab.tftp:  30 RRQ "undionly.kpxe" octet tsize 0
15:09:57.055725 IP chrisj-controller-0.external.home.lab > 172.31.10.12: ICMP chrisj-controller-0.external.home.lab udp port tftp unreachable, length 66

For some reason the cleaning process is trying to use TFTP rather than HTTP. In addition, UDP port 69 does not appear to be open in either the ironic-conductor container or on the controller itself:
[root@chrisj-controller-0 ~]# docker exec f141f3dc3800 netstat -ulnp | grep 69
[root@chrisj-controller-0 ~]# netstat -ulnp | grep 69
[root@chrisj-controller-0 ~]# docker ps | grep ironic-conductor
f141f3dc3800        172.31.0.10:8787/rhosp12/openstack-ironic-conductor:12.0-20180124.1          "kolla_start"            21 hours ago        Up About an hour (healthy)                       ironic_conductor
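
Note that `grep 69` matches any line containing the digits 69 (a PID or port 6969, for example), so a non-empty result would not prove much. A stricter check could look like the sketch below. It assumes the usual `netstat -uln` layout with the local address in the fourth column, and the helper name `has_udp_listener` is made up for illustration:

```shell
# Hypothetical helper: succeeds if netstat-style output on stdin shows
# a UDP socket bound to the given local port.
# Assumes the `netstat -uln` layout: Proto Recv-Q Send-Q Local-Address ...
has_udp_listener() {
  awk -v port="$1" '
    $1 ~ /^udp/ {
      n = split($4, parts, ":")      # handles 0.0.0.0:69 and :::69
      if (parts[n] == port) found = 1
    }
    END { exit found ? 0 : 1 }
  '
}

# Usage on the controller and inside the container (ID from `docker ps`):
#   netstat -uln | has_udp_listener 69 && echo "tftp listening on host"
#   docker exec f141f3dc3800 netstat -uln | has_udp_listener 69 && echo "tftp listening in container"
```

In this environment both checks would be expected to fail, matching the empty `grep` output above.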

Ironic has been deployed with the following parameters:
  NovaSchedulerDefaultFilters:
    - RetryFilter
    - AggregateInstanceExtraSpecsFilter
    - AggregateMultiTenancyIsolation
    - AvailabilityZoneFilter
    - RamFilter
    - DiskFilter
    - ComputeFilter
    - ComputeCapabilitiesFilter
    - ImagePropertiesFilter

  IronicCleaningDiskErase: metadata   
  IronicIPXEEnabled: true   

  ServiceNetMap:
    IronicApiNetwork: external
    IronicNetwork: external

The cleaning network is a provider network defined in Neutron and is the native VLAN on the BM nodes. DHCP seems to be working fine.

This exact same setup worked on OSP11 and OSP10.


Version-Release number of selected component (if applicable):
OSP12

How reproducible:
Every time

Steps to Reproduce:
1. Deploy the overcloud with ironic support
2. Add BM nodes and update the cleaning network parameter
3. Attempt to clean up the nodes with:
openstack baremetal node manage baremetal2
openstack baremetal node provide baremetal2


Actual results:
Cleaning fails; the node is not able to PXE boot.

Expected results:
Cleaning completes.

Additional info:
Attaching a sosreport from the controller node (this deployment runs a single controller).

Comment 1 Chris Janiszewski 2018-03-02 17:06:02 UTC
Created attachment 1403104 [details]
screenshot of the failed pxe attempt

Comment 2 Chris Janiszewski 2018-03-02 17:55:23 UTC
I was able to fix this issue by updating my overcloud with -e /usr/share/openstack-tripleo-heat-templates/environments/services-docker/ironic.yaml rather than the -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic.yaml that I originally used.
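
A small pre-flight check along these lines could catch the wrong environment file before deploying. This is only a sketch: the function name `check_ironic_env` and the `THT` variable are hypothetical, and the only paths it knows about are the two mentioned in this comment.

```shell
# Hypothetical pre-flight check: scan the arguments intended for
# `openstack overcloud deploy` and fail if the non-containerized ironic
# environment file is among them.
THT=/usr/share/openstack-tripleo-heat-templates

check_ironic_env() {
  for arg in "$@"; do
    if [ "$arg" = "$THT/environments/services/ironic.yaml" ]; then
      echo "use $THT/environments/services-docker/ironic.yaml instead" >&2
      return 1
    fi
  done
  return 0
}

# Usage (other environment files elided):
#   check_ironic_env --templates -e "$THT/environments/services-docker/ironic.yaml" \
#     && openstack overcloud deploy --templates -e "$THT/environments/services-docker/ironic.yaml"
```

The check only does an exact string comparison, so a relative path or a copied template would slip through.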

There are really 2 bugs:
1. Update the docs (https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/12/html/bare_metal_provisioning/sect-deploy, Section 3.1) to indicate the correct file to be invoked with overcloud deploy.

2. Remove /usr/share/openstack-tripleo-heat-templates/environments/services/ironic.yaml from the default templates (since it doesn't work).

Thanks

Comment 3 Bob Fournier 2018-03-02 18:02:28 UTC
Thanks Chris, this issue was just recently found and is being tracked here: https://bugzilla.redhat.com/show_bug.cgi?id=1549770

The fix is being tracked via that BZ, so I will mark this one as a duplicate.

*** This bug has been marked as a duplicate of bug 1549770 ***