Bug 1633287
| Summary: | node cleaning in undercloud occasionally fails for a subset of nodes. | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Alexander Chuzhoy <sasha> | ||||
| Component: | openstack-ironic | Assignee: | RHOS Maint <rhos-maint> | ||||
| Status: | CLOSED WONTFIX | QA Contact: | mlammon | ||||
| Severity: | unspecified | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 14.0 (Rocky) | CC: | bfournie, derekh, dtantsur, hjensas, mburns | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2018-10-16 22:57:06 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
Created attachment 1487305 [details]
sosreports
It looks like node 1c5f5638-851d-4a6f-8017-620a15c46c7c was successfully introspected but then when cleaning was attempted we get these errors: ronic-conductor.log.1:2018-09-25 23:26:46.057 1 DEBUG ironic.conductor.manager [req-64a12ece-1b82-4ec8-8441-a845170a93ab a066e8e0c0044a37917bb2b7605a67ad 0d268b757aff4effbbeff6d5182a882e - default default] Starting manual cleaning for node 1c5f5638-851d-4a6f-8017-620a15c46c7c _do_node_clean /usr/lib/python2.7/site-packages/ironic/conductor/manager.py:1201 ironic-conductor.log.1:2018-09-25 23:26:50.764 1 DEBUG ironic.common.neutron [req-64a12ece-1b82-4ec8-8441-a845170a93ab a066e8e0c0044a37917bb2b7605a67ad 0d268b757aff4effbbeff6d5182a882e - default default] Removing ports on network b68ab7d3-2518-470b-a0b8-7f541e8dbaa2 on node 1c5f5638-851d-4a6f-8017-620a15c46c7c. remove_ports_from_network /usr/lib/python2.7/site-packages/ironic/common/neutron.py:305 ironic-conductor.log.1:2018-09-25 23:26:52.502 1 DEBUG ironic.common.neutron [req-64a12ece-1b82-4ec8-8441-a845170a93ab a066e8e0c0044a37917bb2b7605a67ad 0d268b757aff4effbbeff6d5182a882e - default default] No ports to remove for node 1c5f5638-851d-4a6f-8017-620a15c46c7c remove_neutron_ports /usr/lib/python2.7/site-packages/ironic/common/neutron.py:331 ironic-conductor.log.1:2018-09-25 23:26:52.502 1 INFO ironic.drivers.modules.network.flat [req-64a12ece-1b82-4ec8-8441-a845170a93ab a066e8e0c0044a37917bb2b7605a67ad 0d268b757aff4effbbeff6d5182a882e - default default] Adding cleaning network to node 1c5f5638-851d-4a6f-8017-620a15c46c7c ironic-conductor.log.1:2018-09-25 23:26:56.737 1 DEBUG ironic.common.neutron [req-64a12ece-1b82-4ec8-8441-a845170a93ab a066e8e0c0044a37917bb2b7605a67ad 0d268b757aff4effbbeff6d5182a882e - default default] For node 1c5f5638-851d-4a6f-8017-620a15c46c7c, creating neutron ports on network b68ab7d3-2518-470b-a0b8-7f541e8dbaa2 using FlatNetwork network interface. add_ports_to_network /usr/lib/python2.7/site-packages/ironic/common/neutron.py:217 ironic-conductor.log.1:2018-09-25 23:26:58.454 1 WARNING ironic.common.neutron [req-64a12ece-1b82-4ec8-8441-a845170a93ab a066e8e0c0044a37917bb2b7605a67ad 0d268b757aff4effbbeff6d5182a882e - default default] Could not create neutron port for node's 1c5f5638-851d-4a6f-8017-620a15c46c7c port 6840f692-1803-4930-a4cc-f3fa40075206 on the neutron network b68ab7d3-2518-470b-a0b8-7f541e8dbaa2. Host 1c5f5638-851d-4a6f-8017-620a15c46c7c is not connected to any segments on routed provider network 'b68ab7d3-2518-470b-a0b8-7f541e8dbaa2'. It should be connected to one. ironic-conductor.log.1:Neutron server returns request_ids: ['req-5e859a3f-8b88-47a2-9ea6-576cefc618bd']: Conflict: Host 1c5f5638-851d-4a6f-8017-620a15c46c7c is not connected to any segments on routed provider network 'b68ab7d3-2518-470b-a0b8-7f541e8dbaa2'. It should be connected to one. ironic-conductor.log. Followed by: 2018-09-25 23:27:00.127 1 DEBUG ironic.common.neutron [req-64a12ece-1b82-4ec8-8441-a845170a93ab a066e8e0c0044a37917bb2b7605a67ad 0d268b757aff4effbbeff6d5182a882e - default default] No ports to remove for node 1c5f5638-851d-4a6f-8017-620a15c46c7c remove_neutron_ports /usr/lib/python2.7/site-packages/ironic/common/neutron.py:331 2018-09-25 23:27:00.127 1 ERROR ironic.conductor.manager [req-64a12ece-1b82-4ec8-8441-a845170a93ab a066e8e0c0044a37917bb2b7605a67ad 0d268b757aff4effbbeff6d5182a882e - default default] Failed to prepare node 1c5f5638-851d-4a6f-8017-620a15c46c7c for cleaning: Failed to create neutron ports for any PXE enabled port on node 1c5f5638-851d-4a6f-8017-620a15c46c7c.: NetworkError: Failed to create neutron ports for any PXE enabled port on node 1c5f5638-851d-4a6f-8017-620a15c46c7c. 2018-09-25 23:27:00.127 1 ERROR ironic.conductor.manager Traceback (most recent call last): 2018-09-25 23:27:00.127 1 ERROR ironic.conductor.manager File "/usr/lib/python2.7/site-packages/ironic/conductor/manager.py", line 1244, in _do_node_clean 2018-09-25 23:27:00.127 1 ERROR ironic.conductor.manager prepare_result = task.driver.deploy.prepare_cleaning(task) 2018-09-25 23:27:00.127 1 ERROR ironic.conductor.manager File "/usr/lib/python2.7/site-packages/ironic_lib/metrics.py", line 60, in wrapped 2018-09-25 23:27:00.127 1 ERROR ironic.conductor.manager result = f(*args, **kwargs) 2018-09-25 23:27:00.127 1 ERROR ironic.conductor.manager File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/iscsi_deploy.py", line 614, in prepare_cleaning 2018-09-25 23:27:00.127 1 ERROR ironic.conductor.manager task, manage_boot=True) 2018-09-25 23:27:00.127 1 ERROR ironic.conductor.manager File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/deploy_utils.py", line 887, in prepare_inband_cleaning 2018-09-25 23:27:00.127 1 ERROR ironic.conductor.manager task.driver.network.add_cleaning_network(task) 2018-09-25 23:27:00.127 1 ERROR ironic.conductor.manager File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/network/flat.py", line 119, in add_cleaning_network 2018-09-25 23:27:00.127 1 ERROR ironic.conductor.manager task, self.get_cleaning_network_uuid(task)) 2018-09-25 23:27:00.127 1 ERROR ironic.conductor.manager File "/usr/lib/python2.7/site-packages/ironic/common/neutron.py", line 277, in add_ports_to_network 2018-09-25 23:27:00.127 1 ERROR ironic.conductor.manager "on node %s.") % node.uuid) 2018-09-25 23:27:00.127 1 ERROR ironic.conductor.manager NetworkError: Failed to create neutron ports for any PXE enabled port on node 1c5f5638-851d-4a6f-8017-620a15c46c7c. 2018-09-25 23:27:00.127 1 ERROR ironic.conductor.manager Note that there are no neutron logs in the sosreport: ovpn-125-245:containers bfournie$ pwd /Users/bfournie/bug1633287/sosreport-undercloud.localdomain-20180926112915/var/log/containers ovpn-125-245:containers bfournie$ ls heat ironic mistral httpd ironic-inspector zaqar Potentially will be helped by adding delay after setting port's physical network. https://code.engineering.redhat.com/gerrit/#/c/151962/ The issue doesn't reproduce with a 30 sec delay added. Not sure if we want to document that or it's expected to have delays on real setups (non-automated deployment). WDYT re comment #6? WDYT re comment #6? Filed a doc bug to wait 30sec after setting ports: https://bugzilla.redhat.com/show_bug.cgi?id=1639924 No need for more info. Closing per comment #9. A 30 sec wait was needed. |
node cleaning in undercloud occasionally fails for a subset of nodes. Environment: python2-ironicclient-2.5.0-0.20180810135843.fb94fb8.el7ost.noarch python2-ironic-inspector-client-3.3.0-0.20180810080932.53bf4e8.el7ost.noarch instack-undercloud-9.3.1-0.20180831000258.e464799.el7ost.noarch puppet-ironic-13.3.1-0.20180831191239.61387eb.el7ost.noarch Steps to reproduce: 1. Introspect all nodes. 2. Try to clean all nodes with: openstack overcloud node clean --all-manageable Result: openstack overcloud node clean --all-manageable Waiting for messages on queue 'tripleo' with no timeout. Cleaned 7 node(s) openstack baremetal node provide ceph1-leaf0 The requested action "provide" can not be performed on node "1c5f5638-851d-4a6f-8017-620a15c46c7c" while it is in state "clean failed". (HTTP 400) (undercloud) [stack@undercloud ~]$ openstack baremetal node list +--------------------------------------+--------------------+---------------+-------------+--------------------+-------------+ | UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance | +--------------------------------------+--------------------+---------------+-------------+--------------------+-------------+ | 09bc03a0-e9e4-4b4d-b200-3270883c4812 | ceph0-leaf0 | None | power off | available | False | | 1c5f5638-851d-4a6f-8017-620a15c46c7c | ceph1-leaf0 | None | power off | clean failed | True | | 924f4d83-f199-47e1-aa06-9f9755c68b5b | compute-0-leaf0 | None | power off | clean failed | True | | 4b9c0106-9c79-48bb-972a-b55814632914 | ceph2-leaf1 | None | power off | manageable | False | | 660756af-1035-4556-9d3a-db72393b3692 | controller-0-leaf1 | None | power off | manageable | False | | 6ca2f3d9-b08d-4abe-8189-4ef82f93717a | controller-1-leaf1 | None | power off | manageable | False | | 56863c26-5791-48f6-a807-80795587b32d | controller-2-leaf1 | None | power off | manageable | False | +--------------------------------------+--------------------+---------------+-------------+--------------------+-------------+ The issue isn't persistent and sometimes all nodes are cleaned and sometimes only one is in failed state.