Bug 1574672 - Cleaning nodes in overcloud boots the discovery image.
Summary: Cleaning nodes in overcloud boots the discovery image.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ironic-inspector
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 13.0 (Queens)
Assignee: Harald Jensås
QA Contact: mlammon
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-05-03 19:56 UTC by Alexander Chuzhoy
Modified: 2018-06-27 13:56 UTC (History)
7 users

Fixed In Version: openstack-ironic-inspector-7.2.1-0.20180409163360.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-27 13:55:29 UTC
Target Upstream Version:


Attachments


Links
System ID Priority Status Summary Last Updated
OpenStack gerrit 566682 None MERGED PXE Filter dnsmasq: blacklist unknown host 2019-12-06 06:10:46 UTC
Red Hat Product Errata RHEA-2018:2086 None None None 2018-06-27 13:56:49 UTC

Description Alexander Chuzhoy 2018-05-03 19:56:53 UTC
Cleaning nodes in overcloud boots the discovery image.


Environment:
python2-ironicclient-2.2.0-1.el7ost.noarch
openstack-neutron-openvswitch-12.0.2-0.20180421011358.0ec54fd.el7ost.noarch
python-ironic-lib-2.12.1-1.el7ost.noarch
puppet-neutron-12.4.1-0.20180412211913.el7ost.noarch
puppet-ironic-12.4.0-0.20180329034302.8285d85.el7ost.noarch
openstack-neutron-ml2-12.0.2-0.20180421011358.0ec54fd.el7ost.noarch
openstack-ironic-common-10.1.2-3.el7ost.noarch
openstack-ironic-staging-drivers-0.9.0-4.el7ost.noarch
python-ironic-inspector-client-3.1.1-1.el7ost.noarch
instack-undercloud-8.4.1-3.el7ost.noarch
python-neutron-12.0.2-0.20180421011358.0ec54fd.el7ost.noarch
openstack-neutron-12.0.2-0.20180421011358.0ec54fd.el7ost.noarch
openstack-ironic-api-10.1.2-3.el7ost.noarch
openstack-ironic-inspector-7.2.1-0.20180409163359.2435d97.el7ost.noarch
openstack-neutron-common-12.0.2-0.20180421011358.0ec54fd.el7ost.noarch
python2-ironic-neutron-agent-1.0.0-1.el7ost.noarch
openstack-ironic-conductor-10.1.2-3.el7ost.noarch
python2-neutronclient-6.7.0-1.el7ost.noarch
python2-neutron-lib-1.13.0-1.el7ost.noarch


Steps to reproduce:
1. Deploy OC with ironic.
2. Attempt to clean BM instances in overcloud.


Result:
The instances boot the discovery image instead of the proper cleaning image.

If you reboot the node right away, it picks up the right image.

Comment 1 Alexander Chuzhoy 2018-05-03 19:58:48 UTC
Note: 
ctlplane is 192.168.24.0
the network used for cleaning is also 192.168.24.0
The nic used for cleaning network is bridged to provisioning network.

Comment 2 Alexander Chuzhoy 2018-05-03 20:15:33 UTC
[stack@undercloud-0 ~]$ sudo iptables -S|grep ironic
-A INPUT -p tcp -m multiport --dports 6385,13385 -m state --state NEW -m comment --comment "135 ironic ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 5050 -m state --state NEW -m comment --comment "137 ironic-inspector ipv4" -j ACCEPT

Comment 3 Alexander Chuzhoy 2018-05-03 20:18:52 UTC
[root@undercloud-0 ~]# ls /var/lib/ironic-inspector/dhcp-hostsdir/
52:54:00:2a:67:76  52:54:00:5e:c7:b5  52:54:00:74:27:c2  52:54:00:7f:f8:b8  52:54:00:8f:af:fe  52:54:00:be:59:d7  52:54:00:cb:c5:52  52:54:00:cc:57:1b  52:54:00:e1:04:a1  52:54:00:f6:1f:4f
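Each file in this directory is a dnsmasq `dhcp-host` entry (in the value format of the `dhcp-host=` option) that the dnsmasq PXE filter maintains per MAC. As an illustrative sketch only (the MACs are taken from the listing above, but the exact file contents on a given system may differ), a known node's file can contain just its MAC, while the fix tracked in gerrit 566682 blacklists unknown hosts by writing entries with dnsmasq's `ignore` keyword:

```
# /var/lib/ironic-inspector/dhcp-hostsdir/52:54:00:cb:c5:52
# MAC alone: dnsmasq may answer DHCP requests from this known host
52:54:00:cb:c5:52

# /var/lib/ironic-inspector/dhcp-hostsdir/52:54:00:2a:67:76
# MAC plus "ignore": dnsmasq drops DHCP requests from this host
52:54:00:2a:67:76,ignore
```

dnsmasq watches the `dhcp-hostsdir` directory with inotify, so the filter can add or flip entries without restarting the DHCP server.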



(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node list
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| abc92400-280e-42e4-a28a-effd26a302e1 | ceph-0       | d928abbe-7621-4aac-9fc3-5faadfc1d784 | power on    | active             | False       |
| 1e8ad13f-3a27-4bd6-bfdd-508a85ce1515 | ceph-1       | 1c57e49d-07d7-4b54-bc59-71e29a6aa702 | power on    | active             | False       |
| ecaa81b8-425b-487c-b4ab-ff2adad791a6 | ceph-2       | 2804207e-b746-4b32-ab20-525c1e114d21 | power on    | active             | False       |
| d4fd3e45-b9b2-4658-8aec-bc5f0d59c1f7 | compute-0    | 18e868b4-c5c0-4d94-ae90-5b968b307adc | power on    | active             | False       |
| df51d847-dbd2-40ba-a72e-76456de2ac93 | compute-1    | 3d8bc413-ecda-416d-abc8-ad0ede16774c | power on    | active             | False       |
| 02c044b9-1c10-46ab-b402-9a54f01661a0 | controller-0 | bd7dd5a3-38b8-4016-a952-77f1ebc45c39 | power on    | active             | False       |
| 00ffb38e-3901-4560-a8e3-4d45e6abc547 | controller-1 | ea92ebd2-5aab-4bb2-8c0f-57dd3bb7b596 | power on    | active             | False       |
| d701d7a1-8b74-4701-a1a2-31698f5f8afc | controller-2 | 20e8d6a5-2f65-4636-9c82-e677502c9c70 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+


sources stackrc:

for node in `openstack baremetal node list -f value -c Name`; do echo $node; openstack baremetal port list --node $node -f value -c Address; done
ceph-0
52:54:00:cb:c5:52
ceph-1
52:54:00:be:59:d7
ceph-2
52:54:00:cc:57:1b
compute-0
52:54:00:5e:c7:b5
compute-1
52:54:00:e1:04:a1
controller-0
52:54:00:8f:af:fe
controller-1
52:54:00:7f:f8:b8
controller-2
52:54:00:f6:1f:4f

sourced overcloudrc:

(overcloud) [stack@undercloud-0 ~]$ for node in `openstack baremetal node list -f value -c Name`; do echo $node; openstack baremetal port list --node $node -f value -c Address; done
ironic-0
52:54:00:2a:67:76
ironic-1
52:54:00:74:27:c2
(overcloud) [stack@undercloud-0 ~]$

Comment 5 Harald Jensås 2018-05-03 23:31:56 UTC
(In reply to Alexander Chuzhoy from comment #1)
> Note: 
> ctlplane is 192.168.24.0
> the network used for cleaning is also 192.168.24.0
> The nic used for cleaning network is bridged to provisioning network.

So what we are seeing is that the overcloud's baremetal nodes are booting an image from the inspector service on the undercloud?

The interfaces used for Ironic in the overcloud cannot be on the same L2 network as the undercloud.

It is possible this got worse with the dnsmasq driver, but even with the iptables driver this would not be without issues. 

With the iptables driver:
* Undercloud cloud operator initiates introspection.
* Overcloud tenant initiates Ironic operation while undercloud introspection is running.

Result: We have a race. If the undercloud DHCP server responds first, the overcloud tenant operation will fail.

With the dnsmasq driver:
* We no longer filter dhcp requests by default.
* This happens whether or not the cloud operator has initiated inspection.
* Overcloud tenant initiates Ironic operation while undercloud introspection is running.

Result: We have a race. If the undercloud DHCP server responds first, the overcloud tenant operation will fail.
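The linked fix (gerrit 566682, "PXE Filter dnsmasq: blacklist unknown host") addresses this by having the filter write an ignore entry for any MAC it does not manage, so the undercloud dnsmasq stays silent for overcloud-owned nodes. A minimal shell sketch of that mechanism (the directory and MAC below are illustrative stand-ins, not taken from the real driver):

```shell
# Illustrative only: mimic the dnsmasq PXE filter writing a per-MAC
# "ignore" entry into a dnsmasq dhcp-hostsdir.
hostsdir=$(mktemp -d)      # stand-in for /var/lib/ironic-inspector/dhcp-hostsdir
mac="52:54:00:2a:67:76"    # an overcloud-managed MAC, unknown to the undercloud

# A "<MAC>,ignore" entry tells dnsmasq to drop DHCP requests from that MAC.
printf '%s,ignore\n' "$mac" > "$hostsdir/$mac"

cat "$hostsdir/$mac"
```

With such an entry in place, the undercloud DHCP server never enters the race for that node, and the overcloud Ironic service alone answers its cleaning PXE request.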

Comment 10 Alexander Chuzhoy 2018-05-11 21:21:55 UTC
Verified:
Version: openstack-ironic-inspector-7.2.1-0.20180409163360.el7ost.noarch

Was able to clean the nodes in overcloud.

Comment 12 errata-xmlrpc 2018-06-27 13:55:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086

