Bug 1887866

Summary: OVN: agent list command fails after OSP16.1 update
Product: Red Hat OpenStack
Reporter: Eduardo Olivares <eolivare>
Component: python-networking-ovn
Assignee: Jakub Libosvar <jlibosva>
Status: CLOSED DUPLICATE
QA Contact: Eran Kuris <ekuris>
Severity: high
Priority: medium
Version: 16.1 (Train)
CC: apevec, averi, bretm, ffernand, jlibosva, lhh, majopela, scohen
Target Milestone: z4
Keywords: Triaged
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: Unspecified
OS: Unspecified
Last Closed: 2021-12-22 19:29:03 UTC
Type: Bug

Description Eduardo Olivares 2020-10-13 13:47:56 UTC
Description of problem:
After an OSP16.1 update (from RHOS-16.1-RHEL-8-20200903.n.0 to RHOS-16.1-RHEL-8-20201007.n.0), controller-1 acts as the master haproxy and controller-0 hosts the master OVN DBs. The nodes have not been rebooted, which differs from the scenario in BZ1885592.

Creation of network resources does not fail (this also differs from BZ1885592). Networks were created and deleted successfully multiple times. However, the 'openstack network agent list' command fails 1 out of 3 times. This is the output shown when it fails:
(overcloud) [stack@undercloud-0 ~]$ openstack network agent list
+--------------------------------------+------------------------------+---------------------------+-------------------+-------+-------+----------------+
| ID                                   | Agent Type                   | Host                      | Availability Zone | Alive | State | Binary         |
+--------------------------------------+------------------------------+---------------------------+-------------------+-------+-------+----------------+
| 0866d9c3-2e61-4a49-8585-cd25ca7e59bf | OVN Controller Gateway agent | controller-1.redhat.local |                   | XXX   | UP    | ovn-controller |
| 310c4d8e-7e70-42dd-b7e6-09c40f9b5c1c | OVN Controller Gateway agent | controller-2.redhat.local |                   | :-)   | UP    | ovn-controller |
| 7aab32fb-5001-49be-9a50-5c5e1856855d | OVN Controller agent         | compute-1.redhat.local    |                   | :-)   | UP    | ovn-controller |
| 1280a5b6-c18e-4cf9-a415-003effafca1c | OVN Controller Gateway agent | controller-0.redhat.local |                   | XXX   | UP    | ovn-controller |
| ccf0ef5f-8258-4f1f-82ab-b1497522a816 | OVN Controller agent         | compute-0.redhat.local    |                   | :-)   | UP    | ovn-controller |
+--------------------------------------+------------------------------+---------------------------+-------------------+-------+-------+----------------+



The logs show that the errors occur when the neutron-api on controller-0 handles the agent list request.
The following messages are logged only on controller-0:
2020-10-08 16:41:15.172 34 WARNING neutron.db.agents_db [req-05155849-657f-4cd5-b9f5-2db4061cc246 - - - - -] Agent healthcheck: found 2 dead agents out of 5:
                Type       Last heartbeat host
OVN Controller Gateway agent 2020-10-08 16:41:15.167866 controller-1.redhat.local
OVN Controller Gateway agent 2020-10-08 16:41:15.170482 controller-0.redhat.local


pcs status shows no issues with the OVN DBs.
The containers' status is up.
According to the neutron-api logs, all processes connected successfully to the Neutron DBs after the OSP update.





Version-Release number of selected component (if applicable):
The update was performed from RHOS-16.1-RHEL-8-20200903.n.0 to RHOS-16.1-RHEL-8-20201007.n.0.

How reproducible:
Tried only once, and it failed.


Steps to Reproduce:
1. Perform an OSP16.1 update. After the overcloud update is completed, run 'openstack network agent list' several times (it failed 1 out of 3 times).
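
To quantify the intermittent failure, the repeated runs could be scripted. The following is only a sketch, not part of the original report: the helper names (dead_agents, run_agent_list, count_failures) are hypothetical, it assumes the CLI's default table output in the format shown in the description, and it assumes a run counts as failed when any agent is reported with XXX in the Alive column. It also assumes the overcloud credentials are already sourced in the environment.

```python
import subprocess


def dead_agents(table_text):
    """Return the Host of every row whose Alive column is 'XXX'.

    Assumes the default table layout of 'openstack network agent list'.
    """
    dead = []
    header = None
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, shell prompts and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' row holds the column names
            continue
        row = dict(zip(header, cells))
        if row.get("Alive") == "XXX":
            dead.append(row.get("Host"))
    return dead


def run_agent_list():
    """Run the CLI once and return its stdout (needs sourced credentials)."""
    result = subprocess.run(
        ["openstack", "network", "agent", "list"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def count_failures(attempts=9, runner=run_agent_list):
    """Count how many of 'attempts' runs report at least one dead agent."""
    return sum(1 for _ in range(attempts) if dead_agents(runner()))


if __name__ == "__main__":
    print("runs reporting dead agents:", count_failures())
```

The runner argument exists only so that the parsing can be exercised against saved output without a live overcloud.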

Comment 8 Jakub Libosvar 2021-12-22 19:29:03 UTC

*** This bug has been marked as a duplicate of bug 1788336 ***