Bug 1795402 - Nova list --all-tenants | ERROR (ClientException): Unexpected API Error. | <type 'exceptions.TypeError'> (HTTP 500)
Summary: Nova list --all-tenants | ERROR (ClientException): Unexpected API Error. | <t...
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Artom Lifshitz
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-01-27 21:34 UTC by chrisbro@redhat.com
Modified: 2023-03-24 16:51 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-02-06 20:43:26 UTC
Target Upstream Version:
Embargoed:



Description chrisbro@redhat.com 2020-01-27 21:34:03 UTC
Description of problem:
Nova list --all-tenants | ERROR (ClientException): Unexpected API Error. | <type 'exceptions.TypeError'> (HTTP 500)

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Running any OpenStack command for the project in question leads to this error [1].

We found the cause of the error: in the `instance_info_caches` table, `network_info` contained `null` in the database for some reason, which triggered the error below [1].
~~~
[1]
nova list --all-tenants
ERROR (ClientException): Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<type 'exceptions.TypeError'> (HTTP 500) (Request-ID: req-6e846215-3a14-45f9-ace3-ef1381712c03)
~~~

We fixed this by updating the database:
  `update instance_info_caches set network_info='[]' where instance_uuid='xxx'`
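
Before running an update like the one above, it can help to identify which cached rows are actually affected. The following is a minimal sketch, not part of Nova: it assumes the rows have already been fetched as `(instance_uuid, network_info)` pairs, with `network_info` being the raw JSON text from `instance_info_caches`. The function name `find_suspect_caches` and the sample rows are hypothetical.

```python
import json


def find_suspect_caches(rows):
    """Return instance UUIDs whose cached network_info looks corrupt.

    `rows` is an iterable of (instance_uuid, network_info) pairs, where
    network_info is the raw JSON text from the database (None for SQL NULL).
    """
    suspects = []
    for instance_uuid, raw in rows:
        if raw is None:
            # SQL NULL in the column.
            suspects.append(instance_uuid)
            continue
        try:
            vifs = json.loads(raw)
        except ValueError:
            # Text that is not valid JSON at all.
            suspects.append(instance_uuid)
            continue
        if not isinstance(vifs, list):
            # Covers a JSON `null` stored as the whole value.
            suspects.append(instance_uuid)
            continue
        if any(not isinstance(vif, dict) or vif.get("network") is None
               for vif in vifs):
            # At least one VIF has no usable "network" entry.
            suspects.append(instance_uuid)
    return suspects


# Hypothetical sample data for illustration only.
sample_rows = [
    ("uuid-ok", '[{"id": "a", "network": {"label": "private"}}]'),
    ("uuid-empty", "[]"),
    ("uuid-null-network", '[{"id": "b", "network": null}]'),
    ("uuid-sql-null", None),
    ("uuid-json-null", "null"),
]
print(find_suspect_caches(sample_rows))
```

An empty list (`'[]'`) is treated as healthy here, since that is the value the workaround writes.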

After patching https://github.com/openstack/nova/blob/stable/queens/nova/api/openstack/common.py#L292, we were able to see what was going on:
~~~
def get_networks_for_instance_from_nw_info(nw_info):
    networks = collections.OrderedDict()
    for vif in nw_info:
        ips = vif.fixed_ips()
        floaters = vif.floating_ips()
        LOG.error("RHVIF: %(vif_output)s", {'vif_output': vif})  # <<--- we added this line of code
        label = vif['network']['label']
        if label not in networks:
            networks[label] = {'ips': [], 'floating_ips': []}
        for ip in itertools.chain(ips, floaters):
            ip['mac_address'] = vif['address']
        networks[label]['ips'].extend(ips)
        networks[label]['floating_ips'].extend(floaters)
    return networks
~~~
The log output we got after patching `nova/api/openstack/common.py` is shown below.
  As you can see, the VIF has `"network": null`.

2020-01-24 05:38:16.288 28 ERROR nova.api.openstack.common [req-e5136a25-f874-4295-8bba-534b3239f716 c199d45777144446a5e6e94feb02a0a4 c9cd63468afb4ebfadedae10ea65eca7 - default default] RHVIF: {"profile": {}, "ovs_interfaceid": "4db8712e-fca1-42d3-b36c-9f3169ff19f3", "preserve_on_delete": false, "network": null, "devname": "tap4db8712e-fc", "vnic_type": "normal", "qbh_params": null, "meta": {}, "details": {"port_filter": true, "datapath_type": "system", "ovs_hybrid_plug": true}, "address": "fa:16:3e:e3:6e:2d", "active": false, "type": "ovs", "id": "4db8712e-fca1-42d3-b36c-9f3169ff19f3", "qbg_params": null}
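
The logged VIF explains the `TypeError`: with `"network": null`, the lookup `vif['network']['label']` tries to subscript `None`. The sketch below reproduces that with a plain dict trimmed from the log line above. It does not use Nova's real VIF model class, and the helper name `label_of` is hypothetical.

```python
import json

# Trimmed copy of the VIF from the log line; only the relevant keys are kept.
logged_vif = json.loads(
    '{"id": "4db8712e-fca1-42d3-b36c-9f3169ff19f3", '
    '"address": "fa:16:3e:e3:6e:2d", "network": null}'
)


def label_of(vif):
    """Mimic the lookup done in get_networks_for_instance_from_nw_info."""
    return vif["network"]["label"]


try:
    label_of(logged_vif)
    failed = False
except TypeError as exc:
    # Subscripting None raises TypeError, matching the HTTP 500 in the report.
    failed = True
    print("TypeError:", exc)
```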



Actual results:
The command fails with the HTTP 500 `TypeError` shown in [1]; this should not happen.

Expected results:
OpenStack commands should run without this error.

Additional info:

Comment 2 Artom Lifshitz 2020-01-30 18:08:31 UTC
While I can't speak to the root cause of instance_info_caches getting that null value, I do know of a similar issue in bz 1703225. In that BZ, we implemented a fix to the periodic heal info cache job that allows it to properly recover from a corrupt or null network_info by querying Neutron. Assuming Neutron has the correct information (i.e., ports are still attached to their instances), the heal info cache job will rebuild instance_info_cache with the correct network_info.

Could you double check the following:

1. That openstack-nova-17.0.10-2.el7ost or later is in use in the environment; if not, please upgrade.
2. The value of [DEFAULT]/heal_instance_info_cache_interval, and whether that job has had a chance to run and repair the info cache.
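
One way to check the second item is to read the option from the nova.conf in use. This is only a sketch using Python's standard `configparser`, not Nova's own configuration loader. The sample text, the helper name `heal_interval`, and the fallback of 60 seconds (assumed here to match the upstream default) are assumptions and should be verified against the deployed release.

```python
import configparser

# Hypothetical nova.conf excerpt; a real check would read the deployed file.
SAMPLE_NOVA_CONF = """
[DEFAULT]
heal_instance_info_cache_interval = 60
"""


def heal_interval(conf_text, default=60):
    """Return heal_instance_info_cache_interval from nova.conf-style text.

    `default` is an assumed fallback used when the option is not set.
    """
    # Interpolation is disabled so that '%' characters in other options
    # do not cause parsing errors.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.getint(
        "DEFAULT", "heal_instance_info_cache_interval", fallback=default
    )


print(heal_interval(SAMPLE_NOVA_CONF))
```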

Thanks!

