
Bug 1574672

Summary: Cleaning nodes in the overcloud boots the discovery image.
Product: Red Hat OpenStack
Reporter: Alexander Chuzhoy <sasha>
Component: openstack-ironic-inspector
Assignee: Harald Jensås <hjensas>
Status: CLOSED ERRATA
QA Contact: mlammon
Severity: high
Priority: high
Version: 13.0 (Queens)
CC: bfournie, hjensas, jschluet, mburns, racedoro, slinaber, srevivo
Target Milestone: rc
Keywords: Triaged
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-ironic-inspector-7.2.1-0.20180409163360.el7ost
Last Closed: 2018-06-27 13:55:29 UTC
Type: Bug

Description Alexander Chuzhoy 2018-05-03 19:56:53 UTC
Cleaning nodes in the overcloud boots the discovery image.


Environment:
python2-ironicclient-2.2.0-1.el7ost.noarch
openstack-neutron-openvswitch-12.0.2-0.20180421011358.0ec54fd.el7ost.noarch
python-ironic-lib-2.12.1-1.el7ost.noarch
puppet-neutron-12.4.1-0.20180412211913.el7ost.noarch
puppet-ironic-12.4.0-0.20180329034302.8285d85.el7ost.noarch
openstack-neutron-ml2-12.0.2-0.20180421011358.0ec54fd.el7ost.noarch
openstack-ironic-common-10.1.2-3.el7ost.noarch
openstack-ironic-staging-drivers-0.9.0-4.el7ost.noarch
python-ironic-inspector-client-3.1.1-1.el7ost.noarch
instack-undercloud-8.4.1-3.el7ost.noarch
python-neutron-12.0.2-0.20180421011358.0ec54fd.el7ost.noarch
openstack-neutron-12.0.2-0.20180421011358.0ec54fd.el7ost.noarch
openstack-ironic-api-10.1.2-3.el7ost.noarch
openstack-ironic-inspector-7.2.1-0.20180409163359.2435d97.el7ost.noarch
openstack-neutron-common-12.0.2-0.20180421011358.0ec54fd.el7ost.noarch
python2-ironic-neutron-agent-1.0.0-1.el7ost.noarch
openstack-ironic-conductor-10.1.2-3.el7ost.noarch
python2-neutronclient-6.7.0-1.el7ost.noarch
python2-neutron-lib-1.13.0-1.el7ost.noarch


Steps to reproduce:
1. Deploy the overcloud with Ironic.
2. Attempt to clean bare metal instances in the overcloud (see the sketch below).
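For illustration, one typical way to trigger cleaning on an overcloud Ironic node, assuming automated cleaning is enabled and using the node name ironic-0 from comment 2:

(overcloud) [stack@undercloud-0 ~]$ openstack baremetal node manage ironic-0
(overcloud) [stack@undercloud-0 ~]$ openstack baremetal node provide ironic-0
# "provide" runs automated cleaning before the node returns to available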


Result:
The instances boot the discovery image instead of the proper cleaning image.

If you reboot the node right away, it picks up the right image.
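
A sketch of that workaround (the node name is illustrative):

(overcloud) [stack@undercloud-0 ~]$ openstack baremetal node reboot ironic-0
# forces a fresh PXE attempt, which then loads the correct cleaning image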

Comment 1 Alexander Chuzhoy 2018-05-03 19:58:48 UTC
Note: 
ctlplane is 192.168.24.0
the network used for cleaning is also 192.168.24.0
The NIC used for the cleaning network is bridged to the provisioning network.
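
One way to confirm the overlap described in this note (subnet names and output vary per deployment; this is a sketch):

(undercloud) [stack@undercloud-0 ~]$ openstack subnet list -c Name -c Subnet
# expected to show the ctlplane subnet on 192.168.24.0/24
(overcloud) [stack@undercloud-0 ~]$ openstack subnet list -c Name -c Subnet
# the subnet used as the cleaning network should show the same 192.168.24.0/24 range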

Comment 2 Alexander Chuzhoy 2018-05-03 20:15:33 UTC
[stack@undercloud-0 ~]$ sudo iptables -S|grep ironic
-A INPUT -p tcp -m multiport --dports 6385,13385 -m state --state NEW -m comment --comment "135 ironic ipv4" -j ACCEPT
-A INPUT -p tcp -m multiport --dports 5050 -m state --state NEW -m comment --comment "137 ironic-inspector ipv4" -j ACCEPT

Comment 3 Alexander Chuzhoy 2018-05-03 20:18:52 UTC
[root@undercloud-0 ~]# ls /var/lib/ironic-inspector/dhcp-hostsdir/
52:54:00:2a:67:76  52:54:00:5e:c7:b5  52:54:00:74:27:c2  52:54:00:7f:f8:b8  52:54:00:8f:af:fe  52:54:00:be:59:d7  52:54:00:cb:c5:52  52:54:00:cc:57:1b  52:54:00:e1:04:a1  52:54:00:f6:1f:4f
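
These per-MAC files belong to the inspector dnsmasq PXE filter. For a MAC that should be filtered out, the file is expected to contain an ignore directive; exact contents may vary by inspector version. A spot check using a MAC from the listing above:

[root@undercloud-0 ~]# cat /var/lib/ironic-inspector/dhcp-hostsdir/52:54:00:2a:67:76
# expected content for a filtered MAC: 52:54:00:2a:67:76,ignore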



(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node list
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+
| abc92400-280e-42e4-a28a-effd26a302e1 | ceph-0       | d928abbe-7621-4aac-9fc3-5faadfc1d784 | power on    | active             | False       |
| 1e8ad13f-3a27-4bd6-bfdd-508a85ce1515 | ceph-1       | 1c57e49d-07d7-4b54-bc59-71e29a6aa702 | power on    | active             | False       |
| ecaa81b8-425b-487c-b4ab-ff2adad791a6 | ceph-2       | 2804207e-b746-4b32-ab20-525c1e114d21 | power on    | active             | False       |
| d4fd3e45-b9b2-4658-8aec-bc5f0d59c1f7 | compute-0    | 18e868b4-c5c0-4d94-ae90-5b968b307adc | power on    | active             | False       |
| df51d847-dbd2-40ba-a72e-76456de2ac93 | compute-1    | 3d8bc413-ecda-416d-abc8-ad0ede16774c | power on    | active             | False       |
| 02c044b9-1c10-46ab-b402-9a54f01661a0 | controller-0 | bd7dd5a3-38b8-4016-a952-77f1ebc45c39 | power on    | active             | False       |
| 00ffb38e-3901-4560-a8e3-4d45e6abc547 | controller-1 | ea92ebd2-5aab-4bb2-8c0f-57dd3bb7b596 | power on    | active             | False       |
| d701d7a1-8b74-4701-a1a2-31698f5f8afc | controller-2 | 20e8d6a5-2f65-4636-9c82-e677502c9c70 | power on    | active             | False       |
+--------------------------------------+--------------+--------------------------------------+-------------+--------------------+-------------+


sourced stackrc:

for node in `openstack baremetal node list -f value -c Name`; do echo $node; openstack baremetal port list --node $node -f value -c Address; done
ceph-0
52:54:00:cb:c5:52
ceph-1
52:54:00:be:59:d7
ceph-2
52:54:00:cc:57:1b
compute-0
52:54:00:5e:c7:b5
compute-1
52:54:00:e1:04:a1
controller-0
52:54:00:8f:af:fe
controller-1
52:54:00:7f:f8:b8
controller-2
52:54:00:f6:1f:4f

sourced overcloudrc:

(overcloud) [stack@undercloud-0 ~]$ for node in `openstack baremetal node list -f value -c Name`; do echo $node; openstack baremetal port list --node $node -f value -c Address; done
ironic-0
52:54:00:2a:67:76
ironic-1
52:54:00:74:27:c2
(overcloud) [stack@undercloud-0 ~]$
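
Note that both overcloud node MACs above (52:54:00:2a:67:76 and 52:54:00:74:27:c2) also appear in the undercloud inspector's dhcp-hostsdir listing in comment 3. A sketch of a cross-check, run on the undercloud host with overcloudrc sourced:

(overcloud) [stack@undercloud-0 ~]$ for mac in $(openstack baremetal port list -f value -c Address); do sudo ls /var/lib/ironic-inspector/dhcp-hostsdir/"$mac" 2>/dev/null; done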

Comment 5 Harald Jensås 2018-05-03 23:31:56 UTC
(In reply to Alexander Chuzhoy from comment #1)
> Note: 
> ctlplane is 192.168.24.0
> the network used for cleaning is also 192.168.24.0
> The NIC used for the cleaning network is bridged to the provisioning network.

So what we are seeing is that the overcloud's bare metal nodes are booting an image from the inspector service on the undercloud?

The interfaces used for Ironic in the overcloud cannot be on the same L2 network as the undercloud.

It is possible this got worse with the dnsmasq driver, but even with the iptables driver this would not be without issues. 

With the iptables driver:
* Undercloud cloud operator initiates introspection.
* Overcloud tenant initiates an Ironic operation while undercloud introspection is running.

Result: We have a race. If the undercloud DHCP server responds first, the overcloud tenant operation will fail.

With the dnsmasq driver:
* We no longer filter DHCP requests by default.
* This holds no matter whether the cloud operator has initiated introspection or not.
* Overcloud tenant initiates an Ironic operation while undercloud introspection is running.

Result: We have a race. If the undercloud DHCP server responds first, the overcloud tenant operation will fail.
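
For reference, the PXE filter driver being compared here is selected in /etc/ironic-inspector/inspector.conf on the undercloud. A minimal sketch of the relevant option:

[pxe_filter]
# iptables: legacy firewall-based DHCP filtering
# dnsmasq:  per-MAC host files under dhcp-hostsdir (see comment 3)
driver = dnsmasq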

Comment 10 Alexander Chuzhoy 2018-05-11 21:21:55 UTC
Verified:
Version: openstack-ironic-inspector-7.2.1-0.20180409163360.el7ost.noarch

Was able to clean the nodes in the overcloud.

Comment 12 errata-xmlrpc 2018-06-27 13:55:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086