Bug 1795402

Summary: Nova list --all-tenants | ERROR (ClientException): Unexpected API Error. | <type 'exceptions.TypeError'> (HTTP 500)
Product: Red Hat OpenStack
Reporter: chrisbro <chrisbro>
Component: openstack-nova
Assignee: Artom Lifshitz <alifshit>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 13.0 (Queens)
CC: alifshit, dasmith, eglynn, jhakimra, kchamart, sbauza, sgordon, vromanso
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-02-06 20:43:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description chrisbro@redhat.com 2020-01-27 21:34:03 UTC
Description of problem:
Nova list --all-tenants | ERROR (ClientException): Unexpected API Error. | <type 'exceptions.TypeError'> (HTTP 500)

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Run any openstack command (for example `nova list --all-tenants`) against the project in question; it fails with the error shown in [1].

We found the cause of the error: the `network_info` value in `instance_info_caches` had been set to `null` in the database for some reason, which triggered the error below [1].
~~~
[1]
nova list --all-tenants
ERROR (ClientException): Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
<type 'exceptions.TypeError'> (HTTP 500) (Request-ID: req-6e846215-3a14-45f9-ace3-ef1381712c03)
~~~

We fixed this by updating the database:
  `update instance_info_caches set network_info='[]' where instance_uuid='xxx'`
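
Before running an update like the one above, it may help to list which rows are affected. The following is only a minimal sketch: an in-memory SQLite table stands in for the real Nova database, and it models just the two columns named in this report. The report does not say whether the stored value was SQL NULL or the JSON text `null`, so the sketch assumes either is possible and checks for both. The sample UUIDs are made up.

```python
import sqlite3

# In-memory SQLite stands in for the real Nova database (assumption);
# only the two columns mentioned in the report are modelled.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE instance_info_caches '
             '(instance_uuid TEXT, network_info TEXT)')
conn.executemany(
    'INSERT INTO instance_info_caches VALUES (?, ?)',
    [('uuid-good', '[]'),
     ('uuid-sql-null', None),       # SQL NULL
     ('uuid-json-null', 'null')])   # JSON literal stored as text

# List candidates first: SQL NULL or the JSON literal 'null'.
affected = [row[0] for row in conn.execute(
    "SELECT instance_uuid FROM instance_info_caches "
    "WHERE network_info IS NULL OR network_info = 'null' "
    "ORDER BY instance_uuid")]

# Apply the same reset as in the report, one instance at a time.
for uuid in affected:
    conn.execute(
        "UPDATE instance_info_caches SET network_info = '[]' "
        "WHERE instance_uuid = ?", (uuid,))
conn.commit()

# Count rows that still match the "bad" condition after the reset.
remaining = conn.execute(
    "SELECT COUNT(*) FROM instance_info_caches "
    "WHERE network_info IS NULL OR network_info = 'null'").fetchone()[0]
print(affected, remaining)
```

Listing the rows first keeps the update scoped to known instances instead of rewriting every cache entry.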

After patching https://github.com/openstack/nova/blob/stable/queens/nova/api/openstack/common.py#L292 with an extra `LOG.error` call, we were able to see what was going on:
~~~
def get_networks_for_instance_from_nw_info(nw_info):
    networks = collections.OrderedDict()
    for vif in nw_info:
        ips = vif.fixed_ips()
        floaters = vif.floating_ips()
        LOG.error("RHVIF: %(vif_output)s", {'vif_output': vif} )     # <<--- we added this line of code
        label = vif['network']['label']
        if label not in networks:
            networks[label] = {'ips': [], 'floating_ips': []}
        for ip in itertools.chain(ips, floaters):
            ip['mac_address'] = vif['address']
        networks[label]['ips'].extend(ips)
        networks[label]['floating_ips'].extend(floaters)
    return networks
~~~
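
To illustrate where the `TypeError` most likely comes from, here is a minimal sketch of the same loop that skips any VIF whose `network` is `None` instead of indexing into it. This is not the upstream fix and not Nova code: `group_networks_skipping_null` is a hypothetical name, VIFs are plain dicts rather than Nova's VIF model objects, and the `subnets`/`ips`/`floating_ips` layout is a simplified assumption.

```python
import collections
import itertools


def group_networks_skipping_null(nw_info):
    """Group IPs by network label, skipping VIFs whose 'network' is None.

    Hypothetical stand-in for the function above, using plain dicts.
    """
    networks = collections.OrderedDict()
    skipped = []
    for vif in nw_info:
        network = vif.get('network')
        if network is None:
            # The case from this report: vif['network']['label'] would
            # raise TypeError because None is not subscriptable.
            skipped.append(vif.get('id'))
            continue
        ips = [ip for subnet in network.get('subnets', [])
               for ip in subnet.get('ips', [])]
        floaters = [f for ip in ips for f in ip.get('floating_ips', [])]
        label = network['label']
        if label not in networks:
            networks[label] = {'ips': [], 'floating_ips': []}
        for ip in itertools.chain(ips, floaters):
            ip['mac_address'] = vif['address']
        networks[label]['ips'].extend(ips)
        networks[label]['floating_ips'].extend(floaters)
    return networks, skipped


# One made-up healthy VIF plus the broken VIF reported in this bug.
sample_nw_info = [
    {'id': 'vif-ok', 'address': 'fa:16:3e:00:00:01',
     'network': {'label': 'private',
                 'subnets': [{'ips': [{'address': '10.0.0.5',
                                       'floating_ips': []}]}]}},
    {'id': '4db8712e-fca1-42d3-b36c-9f3169ff19f3',
     'address': 'fa:16:3e:e3:6e:2d', 'network': None},
]
networks, skipped = group_networks_skipping_null(sample_nw_info)
print(list(networks), skipped)
```

Returning the skipped VIF ids alongside the result makes the corrupt entries visible to the caller rather than hiding them.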
The log output we got after patching `nova/api/openstack/common.py` is below.
  As you can see, the VIF has `"network": null`.

2020-01-24 05:38:16.288 28 ERROR nova.api.openstack.common [req-e5136a25-f874-4295-8bba-534b3239f716 c199d45777144446a5e6e94feb02a0a4 c9cd63468afb4ebfadedae10ea65eca7 - default default] RHVIF: {"profile": {}, "ovs_interfaceid": "4db8712e-fca1-42d3-b36c-9f3169ff19f3", "preserve_on_delete": false, "network": null, "devname": "tap4db8712e-fc", "vnic_type": "normal", "qbh_params": null, "meta": {}, "details": {"port_filter": true, "datapath_type": "system", "ovs_hybrid_plug": true}, "address": "fa:16:3e:e3:6e:2d", "active": false, "type": "ovs", "id": "4db8712e-fca1-42d3-b36c-9f3169ff19f3", "qbg_params": null}
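
A cached `network_info` value can be checked for this condition without patching Nova. The sketch below assumes the value is stored as a JSON list of VIF dicts shaped like the one logged above; `vifs_missing_network` is a hypothetical helper name, and the sample is a trimmed copy of the logged VIF.

```python
import json


def vifs_missing_network(network_info_json):
    """Return the ids of VIFs whose 'network' key is missing or null."""
    # A stored JSON 'null' decodes to None; treat it as an empty list.
    vifs = json.loads(network_info_json) or []
    return [vif.get('id') for vif in vifs if vif.get('network') is None]


# Trimmed copy of the VIF logged above, wrapped in a list the way
# network_info is assumed to be stored.
cached = json.dumps([{
    "id": "4db8712e-fca1-42d3-b36c-9f3169ff19f3",
    "address": "fa:16:3e:e3:6e:2d",
    "devname": "tap4db8712e-fc",
    "network": None,
    "active": False,
}])
bad = vifs_missing_network(cached)
print(bad)
```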



Actual results:
OpenStack commands for the affected project fail with the HTTP 500 `TypeError` shown in [1]. This should not happen.

Expected results:
OpenStack commands should run without this error.

Additional info:

Comment 2 Artom Lifshitz 2020-01-30 18:08:31 UTC
While I can't speak to the root cause of instance_info_caches getting that Null value, I do know of a similar issue in bz 1703225. In that BZ, we implemented a fix to the periodic heal info cache job that allows it to properly recover from a corrupt/Null network_info by querying Neutron. Assuming Neutron has the correct information (i.e., ports are still attached to their instances), the heal info cache job will rebuild instance_info_cache with the correct network_info.

Could you double check the following:

1. Whether openstack-nova-17.0.10-2.el7ost or later is in use in the environment - if not, please upgrade.
2. What [DEFAULT]/heal_instance_info_cache_interval is set to, and whether that job has had a chance to run and repair the info cache.

Thanks!
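
For the second check, the configured interval can be read straight from nova.conf. This is a minimal sketch using Python's standard `configparser`; the sample text and the value 60 are illustrative only, and the location of the real nova.conf depends on the deployment.

```python
import configparser

# Hypothetical nova.conf excerpt; the value 60 is only an example.
sample_conf = """
[DEFAULT]
heal_instance_info_cache_interval = 60
"""

# interpolation=None avoids errors on values containing '%'.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(sample_conf)

# fallback=None means "option not set in this file".
interval = parser.getint(
    'DEFAULT', 'heal_instance_info_cache_interval', fallback=None)
print(interval)
```

Against a real deployment, `parser.read(path)` would replace `read_string`, with `path` pointing at the nova.conf actually used by the nova-compute service.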