Bug 1575782

Summary: deleted node kept booting the introspection image
Product: Red Hat OpenStack
Reporter: Alexander Chuzhoy <sasha>
Component: openstack-ironic
Assignee: Harald Jensås <hjensas>
Status: CLOSED ERRATA
QA Contact: mlammon
Severity: medium
Priority: medium
Version: 13.0 (Queens)
CC: bfournie, hjensas, joflynn, jschluet, mburns, slinaber, srevivo
Target Milestone: z3
Keywords: Rebase, Triaged, ZStream
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-ironic-10.1.6-1.el7ost
Clones: 1633746
Last Closed: 2018-11-13 22:14:54 UTC
Type: Bug
Bug Blocks: 1633746    

Description Alexander Chuzhoy 2018-05-07 23:45:21 UTC
Deleting introspected nodes from ironic does not clean entries under /var/lib/ironic-inspector/dhcp-hostsdir/

Environment:
python2-ironicclient-2.2.0-1.el7ost.noarch
python-ironic-lib-2.12.1-1.el7ost.noarch
puppet-ironic-12.4.0-0.20180329034302.8285d85.el7ost.noarch
openstack-ironic-common-10.1.2-3.el7ost.noarch
openstack-ironic-staging-drivers-0.9.0-4.el7ost.noarch
python-ironic-inspector-client-3.1.1-1.el7ost.noarch
instack-undercloud-8.4.1-4.el7ost.noarch
openstack-ironic-api-10.1.2-3.el7ost.noarch
python2-ironic-neutron-agent-1.0.0-1.el7ost.noarch
openstack-ironic-conductor-10.1.2-3.el7ost.noarch
openstack-ironic-inspector-7.2.1-0.20180409163359.2435d97.el7ost.noarch

Steps to reproduce:
1. Import nodes into the undercloud and introspect them.
2. Delete one or more nodes from ironic.
3. List files under /var/lib/ironic-inspector/dhcp-hostsdir/
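Step 3 can be scripted. A minimal Python sketch, assuming each file in the hostsdir is named after a MAC address (as in the listings in this report) and that you already have the set of MACs still registered as ironic ports, for example from openstack baremetal port list. The function name stale_hostsdir_entries is hypothetical.

```python
import os
import re

# dhcp-hostsdir files are named after the MAC they filter, e.g. 52:54:00:23:23:a5
MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def stale_hostsdir_entries(hostsdir, registered_macs):
    """Return MAC-named files in hostsdir whose MAC is no longer registered in ironic.

    Files not named after a MAC (such as unknown_hosts_filter) are skipped.
    """
    registered = {mac.lower() for mac in registered_macs}
    stale = []
    for name in sorted(os.listdir(hostsdir)):
        mac = name.lower()
        if MAC_RE.match(mac) and mac not in registered:
            stale.append(name)
    return stale
```

After deleting every node, stale_hostsdir_entries("/var/lib/ironic-inspector/dhcp-hostsdir", set()) would list every MAC file in the directory.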

Result:
The files named after the MACs of the removed nodes are still there.

Expected result:
The files named after the MACs of the removed nodes should NOT be there.

Comment 1 Harald Jensås 2018-05-09 18:17:19 UTC
I have abandoned the proposed upstream fix, as it does not work.

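*facepalm* Host records are only added dynamically: dnsmasq reads new or changed files in the hostsdir, but it does not drop a record when its file is removed. We would have to send SIGHUP to the dnsmasq process to capture the removal of an entry.

Since ironic-inspector does not spawn the dnsmasq service, sending it SIGHUP to reload does not seem feasible.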


I think we have to close this with WONTFIX or CANTFIX?
The workaround is to purge the dhcp-hostsdir and reload the dnsmasq service.
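A minimal sketch of that workaround in Python, assuming the dnsmasq instance runs as the openstack-ironic-inspector-dnsmasq.service systemd unit (the unit name that appears in the rootwrap error later in this report) and that the script runs with enough privileges to restart it. purge_hostsdir, restart_dnsmasq and apply_workaround are hypothetical helper names. The restart is the step that makes dnsmasq forget the deleted records.

```python
import os
import subprocess

HOSTSDIR = "/var/lib/ironic-inspector/dhcp-hostsdir"
DNSMASQ_UNIT = "openstack-ironic-inspector-dnsmasq.service"  # assumed unit name


def purge_hostsdir(hostsdir=HOSTSDIR):
    """Delete every regular file in hostsdir and return the removed names."""
    removed = []
    for name in sorted(os.listdir(hostsdir)):
        path = os.path.join(hostsdir, name)
        if os.path.isfile(path):
            os.remove(path)
            removed.append(name)
    return removed


def restart_dnsmasq(unit=DNSMASQ_UNIT, run=subprocess.check_call):
    """Restart dnsmasq so it drops host records whose files were deleted.

    `run` is injectable so the command can be checked without root.
    """
    cmd = ["systemctl", "restart", unit]
    run(cmd)
    return cmd


def apply_workaround(hostsdir=HOSTSDIR, run=subprocess.check_call):
    """Purge the hostsdir first, then restart dnsmasq."""
    removed = purge_hostsdir(hostsdir)
    restart_dnsmasq(run=run)
    return removed
```

After the restart, entries for nodes that are still enrolled have to be written again; whether ironic-inspector does that by itself depends on its filter sync behaviour in your version, so check the directory afterwards.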

Comment 3 Harald Jensås 2018-05-09 18:47:54 UTC
So the scenario:

- We add a node to ironic and introspect it.
- The baremetal port (MAC) is created for the node.
- The MAC is added to the blacklist.
- We remove the node from ironic.

ISSUE: We cannot simply delete or truncate the filter file, because dnsmasq only adds records dynamically and does not keep any state about which record came from which file. Truncating the file would therefore leave the filter in dnsmasq until dnsmasq is reloaded or restarted.
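The dnsmasq man page entry for --dhcp-hostsdir describes this: new or changed files are read automatically, but a record from a file that is deleted after being read remains until dnsmasq receives SIGHUP or is restarted. The toy Python model below is not dnsmasq code (the class and method names are made up) and only covers file creation, deletion and reload. It shows why deleting a filter file does not un-blacklist a MAC.

```python
class ToyDnsmasq:
    """Toy model of dnsmasq --dhcp-hostsdir record handling (not real dnsmasq)."""

    def __init__(self):
        self.files = {}    # what is on disk: file name -> content
        self.records = {}  # what dnsmasq has loaded: MAC -> host entry

    def write_file(self, name, content):
        # New files are picked up automatically, so the record is added at once.
        self.files[name] = content
        self._load(content)

    def delete_file(self, name):
        # Removing the file does NOT remove the record that was already loaded.
        del self.files[name]

    def sighup(self):
        # Only a reload (SIGHUP or restart) rebuilds the records from the directory.
        self.records = {}
        for content in self.files.values():
            self._load(content)

    def _load(self, content):
        for line in content.splitlines():
            if line.strip():
                mac = line.split(",", 1)[0]
                self.records[mac] = line

    def is_ignored(self, mac):
        return self.records.get(mac, "").endswith(",ignore")
```

Deleting the file for a blacklisted MAC leaves is_ignored() returning True until sighup() is called.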

Options:

 a) We keep the nodes blacklisted.

Pro: Removed nodes would not boot the introspection image.

Con: The MAC address would have to be added again when re-enrolling the node in ironic. Enrolling a node without its MAC address, and having the MAC discovered during introspection instead, is supported.

 b) We whitelist nodes that are no longer in ironic. (*Current behaviour*)

Pro: The node can be re-enrolled and introspected.
Con: Any node that was enrolled and later removed from ironic will boot the inspection image.


Neither a) nor b) is a good solution, so we need to come up with an option c) that actually works.
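Both options come down to what is written into each per-MAC file. In a dnsmasq dhcp-host entry, the ignore keyword tells dnsmasq never to offer a lease to that MAC. Assuming a blacklisted MAC is written as "<mac>,ignore" and a whitelisted one as a bare "<mac>" (the exact strings ironic-inspector writes are an assumption here), a sketch with hypothetical function names:

```python
def host_entry(mac, blacklisted):
    """Return the dhcp-hostsdir file content for one MAC.

    ",ignore" makes dnsmasq refuse DHCP to that MAC, so the node cannot
    PXE boot the introspection image; a bare MAC lets it through.
    """
    mac = mac.lower()
    return "%s,ignore\n" % mac if blacklisted else "%s\n" % mac


def entry_for_removed_node(mac, option):
    """Option a) keeps removed nodes blacklisted; option b) whitelists them."""
    if option == "a":
        return host_entry(mac, blacklisted=True)
    if option == "b":
        return host_entry(mac, blacklisted=False)
    raise ValueError("unknown option: %r" % option)
```

A blacklisted entry for a MAC such as 52:54:00:23:23:a5 is 25 bytes including the newline.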

Comment 4 Harald Jensås 2018-05-09 23:44:54 UTC
I have restored the proposed patch upstream and re-worked it.

 * Blacklist all MACs no longer in Ironic when introspection is not active
 * Whitelist all MACs no longer in Ironic when introspection is active
 * Whitelist all MACs no longer in Ironic when node_not_found_hook is set
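The three rules can be written as a small decision function. This sketch follows the bullet list above, not the upstream patch itself; the function and parameter names are hypothetical, and node_not_found_hook stands for ironic-inspector's discovery hook setting ([processing] node_not_found_hook). MACs of nodes still enrolled in ironic are out of scope here.

```python
def filter_state_for_unknown_mac(introspection_active, node_not_found_hook):
    """Decide how to filter a MAC that is no longer in ironic.

    Returns "blacklist" or "whitelist".
    """
    if node_not_found_hook:
        # Discovery is enabled: the node must be able to PXE boot
        # so it can be discovered and enrolled again.
        return "whitelist"
    if introspection_active:
        # A node being introspected may have been enrolled without its MAC,
        # so unknown MACs have to be let through while introspection runs.
        return "whitelist"
    # Nothing is being inspected: keep removed nodes off the inspection image.
    return "blacklist"


def plan_filter(hostsdir_macs, ironic_macs, introspection_active, node_not_found_hook):
    """Map every MAC present in the hostsdir but not in ironic to a filter state."""
    state = filter_state_for_unknown_mac(introspection_active, node_not_found_hook)
    return {mac: state for mac in sorted(set(hostsdir_macs) - set(ironic_macs))}
```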

Comment 11 mlammon 2018-08-08 14:29:17 UTC
Installed the latest OSP 13 (2018-08-07.4)

1. Import nodes into the undercloud and introspect them
2. Delete one or more nodes from ironic.
3. List files under /var/lib/ironic-inspector/dhcp-hostsdir/

openstack baremetal introspection list
+--------------------------------------+---------------------+---------------------+-------+
| UUID                                 | Started at          | Finished at         | Error |
+--------------------------------------+---------------------+---------------------+-------+
| 766d0d26-ddf7-4285-be37-e066fee4f019 | 2018-08-07T23:32:56 | 2018-08-07T23:35:03 | None  |
| c3ded92e-2b31-4f89-aeb3-1c9c65835a33 | 2018-08-07T23:32:55 | 2018-08-07T23:34:59 | None  |
| 89bd7034-8e63-4391-abad-cad736741417 | 2018-08-07T23:32:55 | 2018-08-07T23:34:54 | None  |
| b3c98711-a4f3-4c43-b5a2-c11556296d25 | 2018-08-07T23:32:54 | 2018-08-07T23:34:49 | None  |
| f9348f4d-126f-4246-a076-367f301ab56f | 2018-08-07T23:32:53 | 2018-08-07T23:34:44 | None  |
| 68457ac3-c502-4539-8ae4-5180d3696ae0 | 2018-08-07T23:32:52 | 2018-08-07T23:34:38 | None  |
+--------------------------------------+---------------------+---------------------+-------+

Deleted all nodes using: openstack baremetal node delete <uuid>

I still see all the MAC files, as reported. FailedQA.
ll /var/lib/ironic-inspector/dhcp-hostsdir/
total 36
-rw-r--r--. 1 ironic-inspector ironic-inspector 25 Aug  8 10:11 00:25:b5:02:a1:2f
-rw-r--r--. 1 ironic-inspector ironic-inspector 25 Aug  8 10:11 00:25:b5:02:a1:4f
-rw-r--r--. 1 ironic-inspector ironic-inspector 25 Aug  7 19:34 52:54:00:23:23:a5
-rw-r--r--. 1 ironic-inspector ironic-inspector 25 Aug  7 19:35 52:54:00:38:79:1b
-rw-r--r--. 1 ironic-inspector ironic-inspector 25 Aug  7 19:34 52:54:00:83:43:2d
-rw-r--r--. 1 ironic-inspector ironic-inspector 25 Aug  7 19:34 52:54:00:89:f1:e8
-rw-r--r--. 1 ironic-inspector ironic-inspector 25 Aug  7 19:35 52:54:00:d6:6b:1f
-rw-r--r--. 1 ironic-inspector ironic-inspector 25 Aug  7 19:34 52:54:00:f4:52:a0
-rw-r--r--. 1 ironic-inspector ironic-inspector 19 Aug  7 19:35 unknown_hosts_filter
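Whether these files block or allow a MAC depends on their content, not on their presence. Assuming the "<mac>,ignore" form for a blacklisted MAC, every 25-byte MAC file above fits that entry plus a newline, and the 19-byte unknown_hosts_filter fits "*:*:*:*:*:*,ignore" plus a newline. This is an inference from file sizes only. A hypothetical helper that reports the state of each file:

```python
import os


def hostsdir_states(hostsdir):
    """Map each file in hostsdir to "blacklisted" or "whitelisted".

    An entry ending in ",ignore" blocks DHCP for that MAC (or, for the
    wildcard entry in unknown_hosts_filter, for all unknown MACs).
    """
    states = {}
    for name in sorted(os.listdir(hostsdir)):
        with open(os.path.join(hostsdir, name)) as f:
            entry = f.read().strip()
        states[name] = "blacklisted" if entry.endswith(",ignore") else "whitelisted"
    return states
```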

Comment 12 Bob Fournier 2018-08-08 14:45:21 UTC
Note: I believe the fix for this bug is not to remove the entries under /var/lib/ironic-inspector/dhcp-hostsdir/ (they will stay), but to properly handle nodes that are removed and re-added. I will let Harald comment, but I believe this should be retested.

Comment 13 Harald Jensås 2018-08-13 10:13:01 UTC
Bob is correct. We do not delete the files from dhcp-hostsdir. (Deleting the files would not cause the actual dhcp configuration in dnsmasq to change.)

What we do is blacklist the MACs of deleted nodes, unless introspection is active or discovery is enabled.

I believe the correct steps to test this are:

1. Import nodes into the undercloud and introspect them
2. Delete one or more nodes from ironic.
3. Boot one or more of the deleted nodes, and ensure they do not boot the inspection image.

Additionally:

4. Enable discovery
5. Ensure one of the deleted nodes can be discovered

and:

6. Re-import one or more of the deleted nodes
7. Ensure the node is successfully inspected again.

Comment 14 Bob Fournier 2018-08-13 11:26:43 UTC
Moving this back to ON_QA so it can be retested per Harald's Comment 13.

Comment 15 Joanne O'Flynn 2018-08-15 13:49:53 UTC
This bug is marked for inclusion in the errata but does not currently contain draft documentation text. To ensure the timely release of this advisory please provide draft documentation text for this bug as soon as possible.

If you do not think this bug requires errata documentation, set the requires_doc_text flag to "-".


To add draft documentation text:

* Select the documentation type from the "Doc Type" drop down field.

* A template will be provided in the "Doc Text" field based on the "Doc Type" value selected. Enter draft text in the "Doc Text" field.

Comment 16 Alexander Chuzhoy 2018-08-16 13:47:57 UTC
FailedQA
Environment:
openstack-ironic-inspector-7.2.1-2.el7ost.noarch


After adding 'enable_node_discovery = true' to undercloud.conf and re-running 'openstack undercloud install', the inspector's dnsmasq service was not restarted.

Auto-discovery doesn't work.

Comment 17 Harald Jensås 2018-08-16 13:52:35 UTC
Essentially we hit:
  https://storyboard.openstack.org/#!/story/2002818

By default, the dnsmasq start/stop commands are not configured,
but the option to purge the directory on start/stop of ironic-inspector is enabled.

Workarounds:

Option A: Set the start/stop commands
Option B: Set the purge option to False


We should change the defaults deployed for both undercloud and overcloud.



Note: Upstream story https://storyboard.openstack.org/#!/story/2002819 is also related. (If we had that, we could purge/delete the files immediately when deleting nodes.)
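Both workarounds are edits to ironic-inspector's configuration. A sketch using Python's configparser, assuming the options live in the [dnsmasq_pxe_filter] group (the traceback in the next comment references CONF.dnsmasq_pxe_filter.dnsmasq_start_command) and that the stop and purge options are named dnsmasq_stop_command and purge_dhcp_hostsdir; verify the names against your inspector.conf. set_dnsmasq_workaround is hypothetical, and rewriting a file with configparser drops its comments, so treat this as an illustration.

```python
import configparser
import io

SECTION = "dnsmasq_pxe_filter"
UNIT = "openstack-ironic-inspector-dnsmasq.service"


def set_dnsmasq_workaround(conf_text, option):
    """Return conf_text with workaround A or B applied.

    A: let ironic-inspector start/stop dnsmasq itself.
    B: stop ironic-inspector from purging the hostsdir.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section(SECTION):
        parser.add_section(SECTION)
    if option == "A":
        parser.set(SECTION, "dnsmasq_start_command", "systemctl start " + UNIT)
        parser.set(SECTION, "dnsmasq_stop_command", "systemctl stop " + UNIT)
    elif option == "B":
        parser.set(SECTION, "purge_dhcp_hostsdir", "False")
    else:
        raise ValueError("option must be 'A' or 'B'")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

With option A, ironic-inspector runs these commands through its rootwrap helper, so they must also be allowed by a rootwrap filter.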

Comment 18 Harald Jensås 2018-08-16 14:40:44 UTC
Looks like we also need to set some sudo rules ...

Stderr: u'/usr/bin/ironic-inspector-rootwrap: Unauthorized command: systemctl start openstack-ironic-inspector-dnsmasq.service (no filter matched)\n'
2018-08-16 10:39:27.204 25148 ERROR ironic_inspector Traceback (most recent call last):
2018-08-16 10:39:27.204 25148 ERROR ironic_inspector   File "/usr/bin/ironic-inspector", line 10, in <module>
2018-08-16 10:39:27.204 25148 ERROR ironic_inspector     sys.exit(main())
2018-08-16 10:39:27.204 25148 ERROR ironic_inspector   File "/usr/lib/python2.7/site-packages/ironic_inspector/cmd/all.py", line 26, in main
2018-08-16 10:39:27.204 25148 ERROR ironic_inspector     server.run()
2018-08-16 10:39:27.204 25148 ERROR ironic_inspector   File "/usr/lib/python2.7/site-packages/ironic_inspector/wsgi_service.py", line 185, in run
2018-08-16 10:39:27.204 25148 ERROR ironic_inspector     self._init_host()
2018-08-16 10:39:27.204 25148 ERROR ironic_inspector   File "/usr/lib/python2.7/site-packages/ironic_inspector/wsgi_service.py", line 120, in _init_host
2018-08-16 10:39:27.204 25148 ERROR ironic_inspector     driver.init_filter()
2018-08-16 10:39:27.204 25148 ERROR ironic_inspector   File "/usr/lib/python2.7/site-packages/ironic_inspector/pxe_filter/base.py", line 81, in inner
2018-08-16 10:39:27.204 25148 ERROR ironic_inspector     return method(self, *args, **kwargs)
2018-08-16 10:39:27.204 25148 ERROR ironic_inspector   File "/usr/lib/python2.7/site-packages/ironic_inspector/pxe_filter/dnsmasq.py", line 141, in init_filter
2018-08-16 10:39:27.204 25148 ERROR ironic_inspector     _execute(CONF.dnsmasq_pxe_filter.dnsmasq_start_command)
2018-08-16 10:39:27.204 25148 ERROR ironic_inspector   File "/usr/lib/python2.7/site-packages/ironic_inspector/pxe_filter/dnsmasq.py", line 313, in _execute
2018-08-16 10:39:27.204 25148 ERROR ironic_inspector     check_exit_code=not ignore_errors)
2018-08-16 10:39:27.204 25148 ERROR ironic_inspector   File "/usr/lib/python2.7/site-packages/oslo_concurrency/processutils.py", line 424, in execute
2018-08-16 10:39:27.204 25148 ERROR ironic_inspector     cmd=sanitized_cmd)
2018-08-16 10:39:27.204 25148 ERROR ironic_inspector ProcessExecutionError: Unexpected error while running command.
2018-08-16 10:39:27.204 25148 ERROR ironic_inspector Command: sudo ironic-inspector-rootwrap /etc/ironic-inspector/rootwrap.conf systemctl start openstack-ironic-inspector-dnsmasq.service
2018-08-16 10:39:27.204 25148 ERROR ironic_inspector Exit code: 99
2018-08-16 10:39:27.204 25148 ERROR ironic_inspector Stdout: u''
2018-08-16 10:39:27.204 25148 ERROR ironic_inspector Stderr: u'/usr/bin/ironic-inspector-rootwrap: Unauthorized command: systemctl start openstack-ironic-inspector-dnsmasq.service (no filter matched)\n'
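The "no filter matched" error comes from oslo.rootwrap: the command runs through ironic-inspector-rootwrap, which only executes commands that match an entry in its filter files. The sketch below is a simplified Python model of a RegExpFilter-style check, where each argument, including the command name, must fully match its pattern. It skips rootwrap's executable path resolution. The FILTER value is a hypothetical example, not the rule that was eventually merged.

```python
import re


def parse_regexp_filter(value):
    """Split a filter value 'RegExpFilter, exe, user, pat0, pat1, ...'."""
    parts = [p.strip() for p in value.split(",")]
    if parts[0] != "RegExpFilter":
        raise ValueError("only RegExpFilter is modelled here")
    exe, run_as, patterns = parts[1], parts[2], parts[3:]
    return exe, run_as, patterns


def regexp_filter_matches(patterns, argv):
    """Simplified check: one pattern per argument, each must match fully."""
    if not patterns or len(patterns) != len(argv):
        return False
    return all(re.fullmatch(p, arg) for p, arg in zip(patterns, argv))


# Hypothetical filter value allowing only start/stop of the inspector dnsmasq unit.
FILTER = ("RegExpFilter, systemctl, root, systemctl, (start|stop), "
          r"openstack-ironic-inspector-dnsmasq\.service")
```

In an oslo.rootwrap filter file, such a value sits under a [Filters] section as a name: value pair. Without a matching entry, the start command from the log above is rejected.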

Comment 20 Bob Fournier 2018-09-27 15:32:39 UTC
Looks like the second set of patches has merged to stable/queens for both instack-undercloud and ironic-inspector; moving this back to POST.

Comment 27 mlammon 2018-11-05 17:34:04 UTC
(undercloud) [stack@undercloud-0 ~]$ cat /etc/yum.repos.d/latest-installed
13   -p 2018-10-24.1

Performed all steps in comment 13 and can successfully verify the bug. Did not see the introspection image booted after deletion, in any of the variations.

Environment:
python-ironic-lib-2.12.1-2.el7ost.noarch
openstack-ironic-api-10.1.6-1.el7ost.noarch
openstack-ironic-inspector-7.2.1-4.el7ost.noarch
python2-ironicclient-2.2.1-1.el7ost.noarch
puppet-ironic-12.4.0-3.el7ost.noarch
openstack-ironic-common-10.1.6-1.el7ost.noarch
openstack-ironic-staging-drivers-0.9.1-1.el7ost.noarch
python-ironic-inspector-client-3.1.1-1.el7ost.noarch
python2-ironic-neutron-agent-1.0.0-1.el7ost.noarch
openstack-ironic-conductor-10.1.6-1.el7ost.noarch

Comment 31 errata-xmlrpc 2018-11-13 22:14:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3605