ironic node doesn't get scheduled: ERROR ironic.common.neutron MacAddressInUseClient: Unable to complete operation for network <UUID>. The mac address <MAC> is in use.

Environment:
puppet-nova-12.4.1-0.20180413152104.a054a18.el7ost.noarch
openstack-ironic-common-10.1.2-2.el7ost.noarch
python-nova-17.0.3-0.20180417162852.991c292.el7ost.noarch
openstack-nova-compute-17.0.3-0.20180417162852.991c292.el7ost.noarch
python2-ironic-neutron-agent-1.0.0-1.el7ost.noarch
puppet-ironic-12.4.0-0.20180329034302.8285d85.el7ost.noarch
openstack-ironic-conductor-10.1.2-2.el7ost.noarch
python-ironic-inspector-client-3.1.1-1.el7ost.noarch
instack-undercloud-8.4.0-4.el7ost.noarch
python2-ironicclient-2.2.0-1.el7ost.noarch
openstack-nova-scheduler-17.0.3-0.20180417162852.991c292.el7ost.noarch
python-ironic-lib-2.12.0-1.el7ost.noarch
openstack-nova-api-17.0.3-0.20180417162852.991c292.el7ost.noarch
openstack-nova-conductor-17.0.3-0.20180417162852.991c292.el7ost.noarch
openstack-ironic-staging-drivers-0.9.0-3.el7ost.noarch
python-novaclient-9.1.1-1.el7ost.noarch
openstack-ironic-api-10.1.2-2.el7ost.noarch
openstack-ironic-inspector-7.2.1-0.20180409163359.2435d97.el7ost.noarch
openstack-nova-common-17.0.3-0.20180417162852.991c292.el7ost.noarch
openstack-nova-placement-api-17.0.3-0.20180417162852.991c292.el7ost.noarch

Steps to reproduce:
1. Import and introspect all nodes with the pxe_ipmitool driver.
2. Remove all nodes.
3. Change the driver in instackenv.json to ipmi.
4. Re-import the nodes and introspect them.
5. Configure properties on the nodes and the flavors (as shown below).
6. Attempt an overcloud deployment.

Result:
Some nodes didn't get scheduled. In the ironic-conductor log (attached), I see the following:

2018-04-24 11:24:51.401 21530 ERROR ironic.common.neutron Traceback (most recent call last):
2018-04-24 11:24:51.401 21530 ERROR ironic.common.neutron   File "/usr/lib/python2.7/site-packages/ironic/common/neutron.py", line 151, in update_port_address
2018-04-24 11:24:51.401 21530 ERROR ironic.common.neutron     client.update_port(port_id, port_req_body)
2018-04-24 11:24:51.401 21530 ERROR ironic.common.neutron   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 799, in update_port
2018-04-24 11:24:51.401 21530 ERROR ironic.common.neutron     revision_number=revision_number)
2018-04-24 11:24:51.401 21530 ERROR ironic.common.neutron   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 2375, in _update_resource
2018-04-24 11:24:51.401 21530 ERROR ironic.common.neutron     return self.put(path, **kwargs)
2018-04-24 11:24:51.401 21530 ERROR ironic.common.neutron   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 363, in put
2018-04-24 11:24:51.401 21530 ERROR ironic.common.neutron     headers=headers, params=params)
2018-04-24 11:24:51.401 21530 ERROR ironic.common.neutron   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 331, in retry_request
2018-04-24 11:24:51.401 21530 ERROR ironic.common.neutron     headers=headers, params=params)
2018-04-24 11:24:51.401 21530 ERROR ironic.common.neutron   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 294, in do_request
2018-04-24 11:24:51.401 21530 ERROR ironic.common.neutron     self._handle_fault_response(status_code, replybody, resp)
2018-04-24 11:24:51.401 21530 ERROR ironic.common.neutron   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 269, in _handle_fault_response
2018-04-24 11:24:51.401 21530 ERROR ironic.common.neutron     exception_handler_v20(status_code, error_body)
2018-04-24 11:24:51.401 21530 ERROR ironic.common.neutron   File "/usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 93, in exception_handler_v20
2018-04-24 11:24:51.401 21530 ERROR ironic.common.neutron     request_ids=request_ids)
2018-04-24 11:24:51.401 21530 ERROR ironic.common.neutron MacAddressInUseClient: Unable to complete operation for network 44bd180a-5344-47ad-8459-a47b5ad382d9. The mac address 52:54:00:1c:50:81 is in use.
2018-04-24 11:24:51.401 21530 ERROR ironic.common.neutron Neutron server returns request_ids: ['req-002a446b-ab3d-43c1-9331-404c0f491fa9']
2018-04-24 11:24:51.401 21530 ERROR ironic.common.neutron

######################################################

(undercloud) [root@undercloud-0 ~]# for node in `openstack baremetal node list -f value -c Name`; do echo $node; openstack baremetal node show $node|grep profile; done
ceph-0
| properties | {u'memory_mb': u'4096', u'cpu_arch': u'x86_64', u'local_gb': u'19', u'cpus': u'2', u'capabilities': u'profile:ceph,boot_option:local'} |
ceph-1
| instance_info | {u'root_gb': u'17', u'display_name': u'overcloud-cephstorage-0', u'image_source': u'2e3b1d74-c2af-4f61-8890-95731de9c7d8', u'capabilities': u'{"profile": "ceph", "boot_option": "local"}', u'memory_mb': u'4096', u'vcpus': u'1', u'local_gb': u'19', u'configdrive': u'******', u'swap_mb': u'0', u'nova_host_id': u'undercloud-0.redhat.local'} |
| properties | {u'memory_mb': u'4096', u'cpu_arch': u'x86_64', u'local_gb': u'19', u'cpus': u'2', u'capabilities': u'profile:ceph,boot_option:local'} |
ceph-2
| instance_info | {u'root_gb': u'17', u'display_name': u'overcloud-cephstorage-1', u'image_source': u'2e3b1d74-c2af-4f61-8890-95731de9c7d8', u'capabilities': u'{"profile": "ceph", "boot_option": "local"}', u'memory_mb': u'4096', u'vcpus': u'1', u'local_gb': u'19', u'configdrive': u'******', u'swap_mb': u'0', u'nova_host_id': u'undercloud-0.redhat.local'} |
| properties | {u'memory_mb': u'4096', u'cpu_arch': u'x86_64', u'local_gb': u'19', u'cpus': u'2', u'capabilities': u'profile:ceph,boot_option:local'} |
compute-0
| properties | {u'memory_mb': u'6144', u'cpu_arch': u'x86_64', u'local_gb': u'49', u'cpus': u'4', u'capabilities': u'profile:compute,boot_option:local'} |
compute-1
| instance_info | {u'root_gb': u'47', u'display_name': u'overcloud-compute-0', u'image_source': u'2e3b1d74-c2af-4f61-8890-95731de9c7d8', u'capabilities': u'{"profile": "compute", "boot_option": "local"}', u'memory_mb': u'6144', u'vcpus': u'3', u'local_gb': u'49', u'configdrive': u'******', u'swap_mb': u'0', u'nova_host_id': u'undercloud-0.redhat.local'} |
| properties | {u'memory_mb': u'6144', u'cpu_arch': u'x86_64', u'local_gb': u'49', u'cpus': u'4', u'capabilities': u'profile:compute,boot_option:local'} |
controller-0
| instance_info | {u'root_gb': u'37', u'display_name': u'overcloud-controller-0', u'image_source': u'2e3b1d74-c2af-4f61-8890-95731de9c7d8', u'capabilities': u'{"profile": "controller", "boot_option": "local"}', u'memory_mb': u'32768', u'vcpus': u'7', u'local_gb': u'39', u'configdrive': u'******', u'swap_mb': u'0', u'nova_host_id': u'undercloud-0.redhat.local'} |
| properties | {u'memory_mb': u'32768', u'cpu_arch': u'x86_64', u'local_gb': u'39', u'cpus': u'8', u'capabilities': u'profile:controller,boot_option:local'} |
controller-1
| instance_info | {u'root_gb': u'37', u'display_name': u'overcloud-controller-1', u'image_source': u'2e3b1d74-c2af-4f61-8890-95731de9c7d8', u'capabilities': u'{"profile": "controller", "boot_option": "local"}', u'memory_mb': u'32768', u'vcpus': u'7', u'local_gb': u'39', u'configdrive': u'******', u'swap_mb': u'0', u'nova_host_id': u'undercloud-0.redhat.local'} |
| properties | {u'memory_mb': u'32768', u'cpu_arch': u'x86_64', u'local_gb': u'39', u'cpus': u'8', u'capabilities': u'profile:controller,boot_option:local'} |
controller-2
| instance_info | {u'root_gb': u'37', u'display_name': u'overcloud-controller-2', u'image_source': u'2e3b1d74-c2af-4f61-8890-95731de9c7d8', u'capabilities': u'{"profile": "controller", "boot_option": "local"}', u'memory_mb': u'32768', u'vcpus': u'7', u'local_gb': u'39', u'configdrive': u'******', u'swap_mb': u'0', u'nova_host_id': u'undercloud-0.redhat.local'} |
| properties | {u'memory_mb': u'32768', u'cpu_arch': u'x86_64', u'local_gb': u'39', u'cpus': u'8', u'capabilities': u'profile:controller,boot_option:local'} |

(undercloud) [root@undercloud-0 ~]# for flavor in ceph compute controller; do openstack flavor show $flavor; done
+----------------------------+--------------------------------------+
| Field                      | Value                                |
+----------------------------+--------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                |
| OS-FLV-EXT-DATA:ephemeral  | 0                                    |
| access_project_ids         | None                                 |
| disk                       | 17                                   |
| id                         | cd7a8c6b-4e04-46c2-99ef-7233d3decec9 |
| name                       | ceph                                 |
| os-flavor-access:is_public | True                                 |
| properties                 | capabilities:boot_option='local', capabilities:profile='ceph', cpu_arch='x86_64', resources:CUSTOM_BAREMETAL='1', resources:DISK_GB='0', resources:MEMORY_MB='0', resources:VCPU='0' |
| ram                        | 4096                                 |
| rxtx_factor                | 1.0                                  |
| swap                       |                                      |
| vcpus                      | 1                                    |
+----------------------------+--------------------------------------+
+----------------------------+--------------------------------------+
| Field                      | Value                                |
+----------------------------+--------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                |
| OS-FLV-EXT-DATA:ephemeral  | 0                                    |
| access_project_ids         | None                                 |
| disk                       | 47                                   |
| id                         | 61bd3d2a-1457-4279-a7f9-6f3467bc7158 |
| name                       | compute                              |
| os-flavor-access:is_public | True                                 |
| properties                 | capabilities:boot_option='local', capabilities:profile='compute', cpu_arch='x86_64', resources:CUSTOM_BAREMETAL='1', resources:DISK_GB='0', resources:MEMORY_MB='0', resources:VCPU='0' |
| ram                        | 6144                                 |
| rxtx_factor                | 1.0                                  |
| swap                       |                                      |
| vcpus                      | 3                                    |
+----------------------------+--------------------------------------+
+----------------------------+--------------------------------------+
| Field                      | Value                                |
+----------------------------+--------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                |
| OS-FLV-EXT-DATA:ephemeral  | 0                                    |
| access_project_ids         | None                                 |
| disk                       | 37                                   |
| id                         | b9aa5aba-55a2-4e0c-8294-de94a3f0c879 |
| name                       | controller                           |
| os-flavor-access:is_public | True                                 |
| properties                 | capabilities:boot_option='local', capabilities:profile='controller', cpu_arch='x86_64', resources:CUSTOM_BAREMETAL='1', resources:DISK_GB='0', resources:MEMORY_MB='0', resources:VCPU='0' |
| ram                        | 32768                                |
| rxtx_factor                | 1.0                                  |
| swap                       |                                      |
| vcpus                      | 7                                    |
+----------------------------+--------------------------------------+
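Given the MacAddressInUseClient error above, a useful first check is which neutron port currently owns the MAC that ironic is trying to claim. A minimal diagnostic sketch using openstacksdk; the cloud name 'undercloud' is an assumption about the local clouds.yaml, and the MAC value is the one from the traceback:

    # Sketch only: list the neutron port(s) holding the conflicting MAC so their
    # owner (device_owner/device_id) can be inspected before any cleanup.
    import openstack

    MAC = "52:54:00:1c:50:81"  # MAC reported as "in use" by neutron above

    conn = openstack.connect(cloud="undercloud")  # assumed cloud name
    for port in conn.network.ports(mac_address=MAC):
        print("port=%s network=%s status=%s owner=%s device=%s"
              % (port.id, port.network_id, port.status,
                 port.device_owner, port.device_id))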
Created attachment 1426156 [details] nova and ironic logs
The initial instance deploy failed due to a power failure, right after the VIF was attached to the port in neutron. Investigating whether we're failing to clean up, or whether we have some sort of race where we mark resources free before cleanup finishes. If a power failure occurs, the deployment can likely be left in an odd state.

2018-04-24 11:23:56.986 25823 ERROR nova.compute.manager [req-baf82681-285f-4b6b-b4d4-547368f338d5 4534df2deffe444bb1a8935133ce60c3 379c5d8a6de74116a83dbc1fe8fa3f99 - default default] [instance: 5ad9782a-2484-40cf-9019-af6a5b76d648] Instance failed to spawn: InstanceDeployFailure: Failed to provision instance 5ad9782a-2484-40cf-9019-af6a5b76d648: Failed to change power state to 'power on' by 'power on'. Error: IPMI call failed: power on.
2018-04-24 11:23:56.986 25823 ERROR nova.compute.manager [instance: 5ad9782a-2484-40cf-9019-af6a5b76d648] Traceback (most recent call last):
2018-04-24 11:23:56.986 25823 ERROR nova.compute.manager [instance: 5ad9782a-2484-40cf-9019-af6a5b76d648]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2248, in _build_resources
2018-04-24 11:23:56.986 25823 ERROR nova.compute.manager [instance: 5ad9782a-2484-40cf-9019-af6a5b76d648]     yield resources
2018-04-24 11:23:56.986 25823 ERROR nova.compute.manager [instance: 5ad9782a-2484-40cf-9019-af6a5b76d648]   File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2031, in _build_and_run_instance
2018-04-24 11:23:56.986 25823 ERROR nova.compute.manager [instance: 5ad9782a-2484-40cf-9019-af6a5b76d648]     block_device_info=block_device_info)
2018-04-24 11:23:56.986 25823 ERROR nova.compute.manager [instance: 5ad9782a-2484-40cf-9019-af6a5b76d648]   File "/usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py", line 1131, in spawn
2018-04-24 11:23:56.986 25823 ERROR nova.compute.manager [instance: 5ad9782a-2484-40cf-9019-af6a5b76d648]     'node': node_uuid})
2018-04-24 11:23:56.986 25823 ERROR nova.compute.manager [instance: 5ad9782a-2484-40cf-9019-af6a5b76d648]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-04-24 11:23:56.986 25823 ERROR nova.compute.manager [instance: 5ad9782a-2484-40cf-9019-af6a5b76d648]     self.force_reraise()
2018-04-24 11:23:56.986 25823 ERROR nova.compute.manager [instance: 5ad9782a-2484-40cf-9019-af6a5b76d648]   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-04-24 11:23:56.986 25823 ERROR nova.compute.manager [instance: 5ad9782a-2484-40cf-9019-af6a5b76d648]     six.reraise(self.type_, self.value, self.tb)
2018-04-24 11:23:56.986 25823 ERROR nova.compute.manager [instance: 5ad9782a-2484-40cf-9019-af6a5b76d648]   File "/usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py", line 1123, in spawn
2018-04-24 11:23:56.986 25823 ERROR nova.compute.manager [instance: 5ad9782a-2484-40cf-9019-af6a5b76d648]     timer.start(interval=CONF.ironic.api_retry_interval).wait()
2018-04-24 11:23:56.986 25823 ERROR nova.compute.manager [instance: 5ad9782a-2484-40cf-9019-af6a5b76d648]   File "/usr/lib/python2.7/site-packages/eventlet/event.py", line 121, in wait
2018-04-24 11:23:56.986 25823 ERROR nova.compute.manager [instance: 5ad9782a-2484-40cf-9019-af6a5b76d648]     return hubs.get_hub().switch()
2018-04-24 11:23:56.986 25823 ERROR nova.compute.manager [instance: 5ad9782a-2484-40cf-9019-af6a5b76d648]   File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 294, in switch
2018-04-24 11:23:56.986 25823 ERROR nova.compute.manager [instance: 5ad9782a-2484-40cf-9019-af6a5b76d648]     return self.greenlet.switch()
2018-04-24 11:23:56.986 25823 ERROR nova.compute.manager [instance: 5ad9782a-2484-40cf-9019-af6a5b76d648]   File "/usr/lib/python2.7/site-packages/oslo_service/loopingcall.py", line 137, in _run_loop
2018-04-24 11:23:56.986 25823 ERROR nova.compute.manager [instance: 5ad9782a-2484-40cf-9019-af6a5b76d648]     result = func(*self.args, **self.kw)
2018-04-24 11:23:56.986 25823 ERROR nova.compute.manager [instance: 5ad9782a-2484-40cf-9019-af6a5b76d648]   File "/usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py", line 505, in _wait_for_active
2018-04-24 11:23:56.986 25823 ERROR nova.compute.manager [instance: 5ad9782a-2484-40cf-9019-af6a5b76d648]     raise exception.InstanceDeployFailure(msg)
2018-04-24 11:23:56.986 25823 ERROR nova.compute.manager [instance: 5ad9782a-2484-40cf-9019-af6a5b76d648] InstanceDeployFailure: Failed to provision instance 5ad9782a-2484-40cf-9019-af6a5b76d648: Failed to change power state to 'power on' by 'power on'. Error: IPMI call failed: power on.
2018-04-24 11:23:56.986 25823 ERROR nova.compute.manager [instance: 5ad9782a-2484-40cf-9019-af6a5b76d648]
2018-04-24 11:23:56.993 25823 INFO nova.compute.manager [req-baf82681-285f-4b6b-b4d4-547368f338d5 4534df2deffe444bb1a8935133ce60c3 379c5d8a6de74116a83dbc1fe8fa3f99 - default default] [instance: 5ad9782a-2484-40cf-9019-af6a5b76d648] Terminating instance
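To confirm whether the "IPMI call failed: power on" comes from the BMC path itself (vbmc/libvirt) rather than from ironic, the failing call can be reproduced by hand. This is only an illustrative sketch: the host, port, and credentials below are placeholders and must be replaced with the node's actual driver_info values (ipmi_address, ipmi_port, ipmi_username, ipmi_password):

    # Sketch: re-run the same kind of ipmitool power-on call that ironic issues,
    # and inspect the return code / output directly.
    import subprocess

    cmd = [
        "ipmitool", "-I", "lanplus",
        "-H", "192.168.122.1",   # placeholder: node's ipmi_address
        "-p", "6230",            # placeholder: node's ipmi_port (vbmc listener)
        "-U", "admin",           # placeholder: ipmi_username
        "-P", "password",        # placeholder: ipmi_password
        "power", "on",
    ]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    print("return code: %s" % proc.returncode)
    print(out or err)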
So the root cause is actually that the old port is never deleted by heat. I've summarized this at https://bugs.launchpad.net/heat/+bug/1766763
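Until the heat fix lands, the leftover port can be cleaned up manually. A minimal sketch (openstacksdk; the cloud name 'undercloud' and the MAC from the traceback are assumptions for this environment); double-check device_owner/device_id before deleting anything:

    # Sketch: remove the stale neutron port that still holds the old MAC,
    # but only if it is not bound to any device anymore.
    import openstack

    MAC = "52:54:00:1c:50:81"  # MAC reported as "in use" in the traceback

    conn = openstack.connect(cloud="undercloud")  # assumed cloud name
    for port in conn.network.ports(mac_address=MAC):
        if not port.device_id:
            print("deleting stale port %s on network %s" % (port.id, port.network_id))
            conn.network.delete_port(port, ignore_missing=True)
        else:
            print("port %s still bound to device %s, leaving it alone"
                  % (port.id, port.device_id))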
I posted https://review.openstack.org/564104 which might address this issue.
*** Bug 1572165 has been marked as a duplicate of this bug. ***
Removed heat patch (https://review.openstack.org/564104) and added virtualbmc return code patch to allow for ipmitool retry.
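For reference, the retry-on-return-code idea being enabled here is conceptually along these lines. This is an illustrative sketch only, not the actual virtualbmc or ironic patch; the command list, attempt count, and delay are placeholders:

    # Sketch: retry an ipmitool power-on command a bounded number of times when
    # it exits non-zero, instead of failing the deploy on the first error.
    import subprocess
    import time

    def ipmi_power_on_with_retry(cmd, attempts=3, delay=5):
        """Run 'cmd' (a full ipmitool argument list), retrying on failure."""
        for attempt in range(1, attempts + 1):
            proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            out, err = proc.communicate()
            if proc.returncode == 0:
                return out
            print("attempt %d/%d failed (rc=%d): %s"
                  % (attempt, attempts, proc.returncode,
                     err.decode("utf-8", "replace").strip()))
            time.sleep(delay)
        raise RuntimeError("IPMI call failed: power on")

Note that, as discussed below, unbounded retries just mask the underlying power-management problem and push the failure into timeouts further up the stack.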
*** Bug 1571092 has been marked as a duplicate of this bug. ***
I'm seeing a lot of these failures too. I raised https://bugzilla.redhat.com/show_bug.cgi?id=1576073, which will be closed if the proposed fix sorts it out - I'm going to test it.
Hmm, it feels like this is a regression in libvirt, not something we can/should fix in vbmc. vbmc doesn't really do anything fancy; it just starts and stops VMs. We can add retries, but that will hide the problem, not fix it. The next problem will be that power operations start taking minutes, leading to various timeouts.
Here is the bug tracking the libvirt issue: https://bugzilla.redhat.com/show_bug.cgi?id=1576464
*** Bug 1576073 has been marked as a duplicate of this bug. ***
*** Bug 1568578 has been marked as a duplicate of this bug. ***
*** Bug 1549571 has been marked as a duplicate of this bug. ***
Can we close this as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1576464 ?
It's not my area of expertise, but I'd say 1576464 fixes a particular type of issue in libvirt, while this bug (1571384) is about making vbmc handle issues like 1576464 (or any other) by at least retrying a few times. If that's not what you want/plan to do, then yes, this (1571384) looks like a duplicate of 1576464.
> Can we close this as a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1576464

Yes, that's the plan (just waiting on final confirmation from Sasha + Mike).

Waldemar - the retries will now be handled farther up the stack in ironic, see https://bugzilla.redhat.com/show_bug.cgi?id=1564918.
We've had multiple test cases in which the fix for the RHEL libvirt bz https://bugzilla.redhat.com/show_bug.cgi?id=1576464 has resolved the issue. Marking this as a duplicate.

*** This bug has been marked as a duplicate of bug 1576464 ***
The 7.5 fix for this bug is tracked here: https://bugzilla.redhat.com/show_bug.cgi?id=1581364

*** This bug has been marked as a duplicate of bug 1581364 ***