Created attachment 1277914: sos report

Description of problem:
After deleting a compute node from the overcloud, the vif_port_id remains in the node's port details.

Version-Release number of selected component (if applicable):

How reproducible:
1. Install a Newton undercloud and overcloud with a standalone networker (composable role).
2. Upgrade the undercloud to Ocata.
3. Scale down: delete a compute node from the overcloud.
4. Scale up: add a compute node to the overcloud.

Steps to Reproduce:
1.
2.
3.

Actual results:
The scale-up fails with the error "No valid host was found. There are not enough hosts available". If you check `openstack baremetal node list`, the node used for compute is available (sos report is attached).

Expected results:

Additional info:
The workaround to reuse the node is to manually delete the VIF port from the Ironic node.
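A minimal sketch of what that manual cleanup might look like, written as a Python helper that only builds the command lines and does not run them. It assumes the OpenStack CLI with the python-ironicclient plugin, and that the client is recent enough to offer the `baremetal node vif list` and `baremetal node vif detach` subcommands. The helper name `vif_cleanup_commands` and its arguments are hypothetical. Whether the stale ID lives in the port's `extra` field on this deployment is also an assumption, so the last command is a fallback rather than a confirmed fix.

```python
def vif_cleanup_commands(node_uuid, vif_id, port_uuid):
    """Return candidate CLI invocations (argv lists) for clearing a stale VIF.

    Nothing is executed here; review the commands before running them.
    """
    return [
        # Inspect which VIFs Ironic still considers attached to the node.
        ["openstack", "baremetal", "node", "vif", "list", node_uuid],
        # Ask Ironic to detach the leftover VIF from the node.
        ["openstack", "baremetal", "node", "vif", "detach", node_uuid, vif_id],
        # Fallback (assumption): drop the vif_port_id key from the port's extra field.
        ["openstack", "baremetal", "port", "unset", "--extra", "vif_port_id", port_uuid],
    ]


if __name__ == "__main__":
    # Placeholders only; substitute the real UUIDs from your environment.
    for argv in vif_cleanup_commands("<node-uuid>", "<vif-id>", "<port-uuid>"):
        print(" ".join(argv))
```

Each returned list can be passed to `subprocess.run` once the UUIDs have been checked against `openstack baremetal node list`.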
I suspect the upgrade is the key to this problem. We probably fail to clean up what was created by an older version.
I've noticed that Nova logs are missing from the sosreport. Could you please fetch them too?
Created attachment 1279348: new ironic and nova logs

New logs are attached. In this case the node in question is compute-1:
UUID 0a68cec2-09ca-4c8d-aa89-7ebddccc8a7f
Instance UUID c1209a8b-6172-4095-872d-889845402066
Created attachment 1279350: new nova log
From the neutron logs it looks like we're getting a crash in openvswitch.

neutron/openvswitch-agent.log:

2017-05-11 08:37:49.124 21629 ERROR neutron.agent.linux.async_process [-] Process [ovsdb-client monitor Interface name,ofport,external_ids --format=json] dies due to the error: None
2017-05-11 08:37:49.131 21629 ERROR ryu.lib.hub [-] hub: uncaught exception: Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 54, in _launch
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/ryu/base/app_manager.py", line 545, in close
    self.uninstantiate(app_name)
  File "/usr/lib/python2.7/site-packages/ryu/base/app_manager.py", line 528, in uninstantiate
    app = self.applications.pop(name)
KeyError: 'ofctl_service'

Following that we're getting these timeout errors when processing the VIF ports, which I assume is why the vif_port_id is not removed:

2017-05-11 08:45:28.734 14051 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-8b0f24b8-f69f-4c3f-8f7d-69b5fc94f41f - - - - -] Error while processing VIF ports
2017-05-11 08:47:33.752 14051 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-8b0f24b8-f69f-4c3f-8f7d-69b5fc94f41f - - - - -] Error while processing VIF ports
2017-05-11 08:51:40.724 14051 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-8b0f24b8-f69f-4c3f-8f7d-69b5fc94f41f - - - - -] Error while processing VIF ports

And finally:

2017-05-11 08:51:40.740 14051 ERROR neutron.agent.linux.async_process [-] Error received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: None
2017-05-11 08:51:40.741 14051 ERROR neutron.agent.linux.async_process [-] Process [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json] dies due to the error: None
2017-05-11 08:51:40.827 14051 INFO oslo_rootwrap.client [-] Stopping rootwrap daemon process with pid=14120
Because the VIF has not been detached, Ironic ends up failing the attach:

2017-05-11 08:51:18.615 23418 DEBUG wsme.api [req-4a10c1ec-26cc-4a95-a5c3-ee04ac1c1a90 37f6c36867574db78b3f104aa70ab1ff 0727b18a0b6e48bba49fa657877187c9 - - -] Client-side error: Unable to attach VIF because VIF d54874a8-1eb7-4609-b940-e39d4c7ad5c7 is already attached to Ironic Port 60bbaa52-4ad9-48c7-b466-1a8cd67972dc
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 218, in inner
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/ironic/conductor/manager.py", line 2546, in vif_attach
    task.driver.network.vif_attach(task, vif_info)
  File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/network/common.py", line 289, in vif_attach
    port_like_obj = get_free_port_like_object(task, vif_id)
  File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/network/common.py", line 124, in get_free_port_like_object
    free_portgroups, free_ports = _get_free_portgroups_and_ports(task, vif_id)
  File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/network/common.py", line 81, in _get_free_portgroups_and_ports
    if _vif_attached(p, vif_id):
  File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/network/common.py", line 52, in _vif_attached
    vif=vif_id, object_uuid=port_like_obj.uuid)
VifAlreadyAttached: Unable to attach VIF because VIF d54874a8-1eb7-4609-b940-e39d4c7ad5c7 is already attached to Ironic Port 60bbaa52-4ad9-48c7-b466-1a8cd67972dc
 format_exception /usr/lib/python2.7/site-packages/wsme/api.py:222

I'd like to see if the Neutron team can take a look at the neutron errors in Comment 6. It looks like Ironic attempted to detach the port but the detach failed in Neutron. This may be an issue that has been resolved, although I could not find the exact signature in the bug list.
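For anyone triaging a similar failure, a small sketch of how the stale VIF and port IDs could be pulled out of an Ironic log. It relies only on the wording of the "already attached" message quoted above; the function name `find_stale_vifs` is hypothetical, and the message text may differ in other Ironic releases.

```python
import re

# Matches a canonical lowercase UUID such as d54874a8-1eb7-4609-b940-e39d4c7ad5c7.
_UUID = r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"

# Wording taken from the VifAlreadyAttached message in the log above.
_ALREADY_ATTACHED = re.compile(
    r"VIF (?P<vif>" + _UUID + r") is already attached to "
    r"Ironic Port (?P<port>" + _UUID + r")"
)


def find_stale_vifs(log_text):
    """Return sorted, de-duplicated (vif_id, port_uuid) pairs found in log_text."""
    pairs = {
        (match.group("vif"), match.group("port"))
        for match in _ALREADY_ATTACHED.finditer(log_text)
    }
    return sorted(pairs)


if __name__ == "__main__":
    line = (
        "VifAlreadyAttached: Unable to attach VIF because VIF "
        "d54874a8-1eb7-4609-b940-e39d4c7ad5c7 is already attached to "
        "Ironic Port 60bbaa52-4ad9-48c7-b466-1a8cd67972dc"
    )
    for vif_id, port_uuid in find_stale_vifs(line):
        print("stale VIF", vif_id, "on port", port_uuid)
```

The message appears twice per failed attach in the log (once as the client-side error, once as the exception), which is why the pairs are collected in a set before sorting.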
Hi Raviv, The sosreport shows 0 bytes for var/log/neutron/server.log. We need either a complete sosreport with all Neutron logs or access to a reproducing machine.
As we weren't provided any logs and we have nobody to ask for them, I'm closing this bug. If somebody is still testing this scenario, please feel free to re-open.