Bug 1887866 - OVN: agent list command fails after OSP16.1 update
Keywords:
Status: CLOSED DUPLICATE of bug 1788336
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-networking-ovn
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: z4
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Jakub Libosvar
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-10-13 13:47 UTC by Eduardo Olivares
Modified: 2024-06-13 23:12 UTC
CC: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-22 19:29:03 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1901527 0 None None None 2020-10-26 11:12:12 UTC
Red Hat Issue Tracker OSP-498 0 None None None 2021-11-18 14:28:34 UTC

Description Eduardo Olivares 2020-10-13 13:47:56 UTC
Description of problem:
After an OSP16.1 update (the update was performed from RHOS-16.1-RHEL-8-20200903.n.0 to RHOS-16.1-RHEL-8-20201007.n.0), controller-1 acts as master haproxy and controller-0 hosts master OVN DBs. Nodes have not been rebooted (this is different from the scenario from BZ1885592).

Creation of network resources does not fail (this also differs from BZ1885592); network creation and deletion were performed multiple times successfully. However, the openstack network agent list command fails roughly one out of three times, reporting agents as dead. This is the failing output:
(overcloud) [stack@undercloud-0 ~]$ openstack network agent list
+--------------------------------------+------------------------------+---------------------------+-------------------+-------+-------+----------------+
| ID                                   | Agent Type                   | Host                      | Availability Zone | Alive | State | Binary         |
+--------------------------------------+------------------------------+---------------------------+-------------------+-------+-------+----------------+
| 0866d9c3-2e61-4a49-8585-cd25ca7e59bf | OVN Controller Gateway agent | controller-1.redhat.local |                   | XXX   | UP    | ovn-controller |
| 310c4d8e-7e70-42dd-b7e6-09c40f9b5c1c | OVN Controller Gateway agent | controller-2.redhat.local |                   | :-)   | UP    | ovn-controller |
| 7aab32fb-5001-49be-9a50-5c5e1856855d | OVN Controller agent         | compute-1.redhat.local    |                   | :-)   | UP    | ovn-controller |
| 1280a5b6-c18e-4cf9-a415-003effafca1c | OVN Controller Gateway agent | controller-0.redhat.local |                   | XXX   | UP    | ovn-controller |
| ccf0ef5f-8258-4f1f-82ab-b1497522a816 | OVN Controller agent         | compute-0.redhat.local    |                   | :-)   | UP    | ovn-controller |
+--------------------------------------+------------------------------+---------------------------+-------------------+-------+-------+----------------+
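A run counts as failed when any agent shows 'XXX' in the Alive column, as in the output above. A minimal sketch of a check for the intermittent failure (the helper only parses column values, so it can be fed output captured with, e.g., `openstack network agent list -f value -c Alive`; the sample data below is illustrative, not from this deployment):

```python
def runs_with_dead_agents(alive_columns):
    """Count CLI runs whose Alive column reports at least one dead agent.

    `alive_columns` is a list of strings, one per invocation of the agent
    list command, each holding the Alive values (':-)' for live agents,
    'XXX' for dead ones).
    """
    return sum(1 for run in alive_columns if "XXX" in run)

# Illustrative samples mirroring the report: one run with two dead agents,
# two clean runs -> one failing run out of three.
samples = [
    "XXX\n:-)\n:-)\nXXX\n:-)",
    ":-)\n:-)\n:-)\n:-)\n:-)",
    ":-)\n:-)\n:-)\n:-)\n:-)",
]
print(runs_with_dead_agents(samples))  # -> 1
```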



Logs show the errors occur when controller-0's neutron-api handles the agent list request.
The following errors appear only on controller-0:
2020-10-08 16:41:15.172 34 WARNING neutron.db.agents_db [req-05155849-657f-4cd5-b9f5-2db4061cc246 - - - - -] Agent healthcheck: found 2 dead agents out of 5:
                Type       Last heartbeat host
OVN Controller Gateway agent 2020-10-08 16:41:15.167866 controller-1.redhat.local
OVN Controller Gateway agent 2020-10-08 16:41:15.170482 controller-0.redhat.local


pcs status shows no issues with the OVN DBs.
Container statuses are up.
According to the neutron-api logs, all processes successfully connected to the Neutron DBs after the OSP update.
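For context, the "Agent healthcheck: found N dead agents" warning comes from neutron's periodic agent healthcheck, which treats an agent as dead when the time since its last recorded heartbeat exceeds the configured agent_down_time (75 seconds by default). A minimal sketch of that liveness test, with illustrative timestamps rather than values from this deployment:

```python
from datetime import datetime, timedelta

# neutron's default agent_down_time is 75 seconds (configurable in neutron.conf).
AGENT_DOWN_TIME = timedelta(seconds=75)

def is_agent_alive(heartbeat, now, down_time=AGENT_DOWN_TIME):
    """An agent counts as alive if its last heartbeat is recent enough."""
    return (now - heartbeat) <= down_time

# Illustrative timestamps: a 10-second-old heartbeat is alive; a
# 2-minute-old one would be reported as dead (XXX in the CLI output).
now = datetime(2020, 10, 8, 16, 41, 15)
print(is_agent_alive(now - timedelta(seconds=10), now))  # -> True
print(is_agent_alive(now - timedelta(minutes=2), now))   # -> False
```

Notably, the heartbeats in the warning above appear nearly current yet the agents are still flagged dead, which points at the timestamp comparison itself rather than missing heartbeats.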





Version-Release number of selected component (if applicable):
the update was performed from RHOS-16.1-RHEL-8-20200903.n.0 to RHOS-16.1-RHEL-8-20201007.n.0

How reproducible:
Only tried once, and it failed.


Steps to Reproduce:
1. Perform an OSP16.1 update. After the overcloud update completes, run 'openstack network agent list' several times (it failed roughly one out of three times).

Comment 8 Jakub Libosvar 2021-12-22 19:29:03 UTC

*** This bug has been marked as a duplicate of bug 1788336 ***

