Bug 1551075 - Cleaning of BM nodes fails with - udp port tftp unreachable in - pxe boot fails - OSP12
Keywords:
Status: CLOSED DUPLICATE of bug 1549770
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ironic
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: RHOS Maint
QA Contact: mlammon
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-03-02 17:02 UTC by Chris Janiszewski
Modified: 2018-03-02 18:02 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-02 18:02:28 UTC
Target Upstream Version:
Embargoed:


Attachments
screenshot of the failed pxe attempt (22.82 KB, image/png)
2018-03-02 17:06 UTC, Chris Janiszewski

Description Chris Janiszewski 2018-03-02 17:02:29 UTC
Description of problem:
Unable to finish cleaning nodes for BMaaS (ironic) in the overcloud (OSP12 only).

The node being cleaned receives an IP address from the Neutron DHCP server, but the TFTP transfer then fails. tcpdump on the controller interface used for the baremetal cleaning network shows:
15:09:57.055682 IP 172.31.10.12.ah-esp-encap > chrisj-controller-0.external.home.lab.tftp:  30 RRQ "undionly.kpxe" octet tsize 0
15:09:57.055725 IP chrisj-controller-0.external.home.lab > 172.31.10.12: ICMP chrisj-controller-0.external.home.lab udp port tftp unreachable, length 66

For some reason the cleaning process is trying to use TFTP rather than HTTP, and UDP port 69 does not appear to be open in either the ironic-conductor container or on the controller itself:
[root@chrisj-controller-0 ~]# docker exec f141f3dc3800 netstat -ulnp | grep 69
[root@chrisj-controller-0 ~]# netstat -ulnp | grep 69
[root@chrisj-controller-0 ~]# docker ps | grep ironic-conductor
f141f3dc3800        172.31.0.10:8787/rhosp12/openstack-ironic-conductor:12.0-20180124.1          "kolla_start"            21 hours ago        Up About an hour (healthy)                       ironic_conductor
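
Note that a plain grep for 69 also matches PIDs, addresses, and other ports that merely contain "69". A minimal sketch of a stricter filter, assuming the usual netstat -ulnp column layout (protocol in column 1, local address in column 4); the function name tftp_listeners is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read netstat -ulnp style output on stdin and print
# only UDP sockets whose local address ends in port 69 (TFTP).
# Assumes column 1 is the protocol and column 4 is the local address.
tftp_listeners() {
  awk '$1 ~ /^udp/ && $4 ~ /:69$/'
}

# Example usage (same checks as above, container ID taken from this report):
#   netstat -ulnp | tftp_listeners
#   docker exec f141f3dc3800 netstat -ulnp | tftp_listeners
```

Empty output from the filter would mean the same thing as in the checks above: nothing is listening on UDP port 69.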

Ironic has been deployed with following parameters:
  NovaSchedulerDefaultFilters:
    - RetryFilter
    - AggregateInstanceExtraSpecsFilter
    - AggregateMultiTenancyIsolation
    - AvailabilityZoneFilter
    - RamFilter
    - DiskFilter
    - ComputeFilter
    - ComputeCapabilitiesFilter
    - ImagePropertiesFilter

  IronicCleaningDiskErase: metadata   
  IronicIPXEEnabled: true   

  ServiceNetMap:
    IronicApiNetwork: external
    IronicNetwork: external

The cleaning network is a provider network defined in Neutron and is the native VLAN on the BM nodes. DHCP seems to be working fine.

This exact same setup worked on OSP11 and OSP10.


Version-Release number of selected component (if applicable):
OSP12

How reproducible:
Every time

Steps to Reproduce:
1. Deploy the overcloud with ironic support
2. Add BM nodes and update the cleaning network parameter
3. Attempt to clean the nodes with:
openstack baremetal node manage baremetal2
openstack baremetal node provide baremetal2
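
The two commands above return before cleaning finishes, so the result has to be read from the node's provision state. A rough sketch of a polling helper, assuming the openstack baremetal CLI is on PATH and that a node ends up in the "available" state after successful cleaning and in "clean failed" otherwise; the function name wait_for_clean and its retry defaults are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: poll a node's provision_state until cleaning
# either succeeds ("available") or fails ("clean failed").
# Arguments: node name, max attempts (default 60), delay in seconds (default 10).
wait_for_clean() {
  node="$1"
  tries="${2:-60}"
  delay="${3:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    state="$(openstack baremetal node show "$node" -f value -c provision_state)"
    case "$state" in
      available)      echo "$state"; return 0 ;;
      "clean failed") echo "$state"; return 1 ;;
    esac
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timeout"
  return 2
}

# Example usage:
#   wait_for_clean baremetal2
```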


Actual results:
Cleaning fails; the node is not able to PXE boot.

Expected results:
Cleaning complete

Additional info:
Attaching an sosreport from the controller node (this deployment runs a single controller).

Comment 1 Chris Janiszewski 2018-03-02 17:06:02 UTC
Created attachment 1403104
screenshot of the failed pxe attempt

Comment 2 Chris Janiszewski 2018-03-02 17:55:23 UTC
I was able to fix this issue by updating my overcloud with -e /usr/share/openstack-tripleo-heat-templates/environments/services-docker/ironic.yaml rather than the -e /usr/share/openstack-tripleo-heat-templates/environments/services/ironic.yaml that I originally used.
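
To catch this before deploying, the argument list could be checked for the non-containerized file. This is only a sketch: the function name check_ironic_env is hypothetical, and it assumes the environment files are passed by full path as in the two paths above.

```shell
#!/bin/sh
# Hypothetical pre-deploy check: scan deploy arguments for the
# non-containerized ironic environment file that caused this failure.
# Returns 1 and prints the offending argument if it is present.
check_ironic_env() {
  for arg in "$@"; do
    case "$arg" in
      */environments/services/ironic.yaml)
        echo "non-containerized ironic environment: $arg"
        return 1
        ;;
    esac
  done
  return 0
}

# Example usage:
#   check_ironic_env -e /usr/share/openstack-tripleo-heat-templates/environments/services-docker/ironic.yaml
```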

There are really 2 bugs:
1. Update the docs (Section 3.1) to indicate the correct file to pass to overcloud deploy:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/12/html/bare_metal_provisioning/sect-deploy

2. Remove /usr/share/openstack-tripleo-heat-templates/environments/services/ironic.yaml from default templates (since it doesn't work)

Thanks

Comment 3 Bob Fournier 2018-03-02 18:02:28 UTC
Thanks Chris, this issue was found just recently and is being tracked here: https://bugzilla.redhat.com/show_bug.cgi?id=1549770

The fix is being tracked via that BZ, so I will mark this one as a duplicate.

*** This bug has been marked as a duplicate of bug 1549770 ***

