Bug 1560872 - [Netvirt] ODL L2 Agent is dead after restarting a compute node
Summary: [Netvirt] ODL L2 Agent is dead after restarting a compute node
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: opendaylight
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 13.0 (Queens)
Assignee: Josh Hershberg
QA Contact: Itzik Brown
URL:
Whiteboard: odl_netvirt
Depends On:
Blocks:
Reported: 2018-03-27 07:45 UTC by Itzik Brown
Modified: 2018-10-18 07:23 UTC (History)

Fixed In Version: opendaylight-8.0.0-11.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
N/A
Last Closed: 2018-06-27 13:48:49 UTC
Target Upstream Version:
Embargoed:


Attachments
Karaf log (2.78 MB, text/plain), 2018-03-27 10:49 UTC, Itzik Brown
Karaf log with OVSDB Trace (2.32 MB, text/plain), 2018-04-11 08:01 UTC, Itzik Brown


Links
System ID Private Priority Status Summary Last Updated
OpenDaylight Bug NETVIRT-1178 0 None None None 2018-03-27 07:55:45 UTC
OpenDaylight gerrit 71203 0 None None None 2018-04-23 11:09:43 UTC
OpenDaylight gerrit 72188 0 None None None 2018-05-24 08:13:25 UTC
Red Hat Product Errata RHEA-2018:2086 0 None None None 2018-06-27 13:49:42 UTC

Description Itzik Brown 2018-03-27 07:45:33 UTC
Description of problem:
After rebooting a compute node, OVS is connected to all the controllers, but the pseudo agent is down.

In Neutron log:
2018-03-27 07:31:09.202 34 WARNING neutron.db.agents_db [req-86b57593-85d6-4c20-bba1-d408151e94ef - - - - -] Agent healthcheck: found 1 dead agents out of 11:
                Type       Last heartbeat host
              ODL L2  2018-03-27 07:05:06 compute-0.localdomain

Version-Release number of selected component (if applicable):
opendaylight-8.0.0-3.el7ost.noarch

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Itzik Brown 2018-03-27 10:49:44 UTC
Created attachment 1413668
Karaf log

Comment 2 Josh Hershberg 2018-04-02 13:37:12 UTC
Please add these to the karaf logging configuration and post the resultant karaf.log.

log4j2.logger.itzik.name = org.opendaylight.neutron.hostconfig.ovs.NeutronHostconfigOvsListener
log4j2.logger.itzik.level = DEBUG

Comment 4 Josh Hershberg 2018-04-11 08:00:10 UTC
Itzik and I looked at this together today. We saw that the rebooted host is indeed missing from /operational/neutron:neutron/hostconfigs. We also saw that the node is missing from /operational/network-topology:network-topology/, which seems to indicate that the OVSDB plugin is failing to write that node to the operational datastore. This requires some additional digging.
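
A minimal sketch of the check described above, assuming the operational datastore is read over RESTCONF and that the JSON payloads use the usual hostconfigs/hostconfig and network-topology/topology/node layout. The function names, base URL, port and credentials are hypothetical and are not taken from this bug; adjust them to the deployment.

```python
import base64
import json
from urllib.request import Request, urlopen


def fetch_json(url, user, password):
    """GET a RESTCONF resource with basic auth and return the parsed JSON."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = Request(url, headers={
        "Authorization": f"Basic {token}",
        "Accept": "application/json",
    })
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)


def hostconfig_ids(payload):
    """Return the set of host-id values in a hostconfigs payload (assumed layout)."""
    entries = payload.get("hostconfigs", {}).get("hostconfig", [])
    return {e["host-id"] for e in entries if "host-id" in e}


def topology_node_ids(payload):
    """Return the set of node-id values across all topologies (assumed layout)."""
    ids = set()
    for topo in payload.get("network-topology", {}).get("topology", []):
        for node in topo.get("node", []):
            if "node-id" in node:
                ids.add(node["node-id"])
    return ids


def missing_hosts(expected, payload):
    """Return the expected hosts that have no hostconfig entry, sorted."""
    return sorted(set(expected) - hostconfig_ids(payload))
```

With a reachable controller, fetch_json() could be pointed at a hypothetical base such as http://<controller>:<port>/restconf/operational followed by /neutron:neutron/hostconfigs or /network-topology:network-topology, and the result passed to missing_hosts() together with the expected compute host names. A host that shows up in neither result matches the symptom seen here.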

Comment 5 Itzik Brown 2018-04-11 08:01:52 UTC
Created attachment 1420206
Karaf log with OVSDB Trace

Comment 6 Itzik Brown 2018-04-11 09:01:31 UTC
Restarting OVS on the compute node: no problem.
Powering down the compute node, waiting 10 minutes, and powering it back on: no problem.

Comment 7 Josh Hershberg 2018-04-23 11:09:12 UTC
Please see the upstream bug for details on the root cause:

https://jira.opendaylight.org/browse/NETVIRT-1178

Patch here: https://git.opendaylight.org/gerrit/#/c/71203/

Comment 10 Mike Kolesnik 2018-05-21 08:42:26 UTC
Moving non-blocker OSP 13 bugs to z1.

Comment 12 Josh Hershberg 2018-05-24 08:14:20 UTC
Attached the link to the upstream stable/oxygen patch above:

https://git.opendaylight.org/gerrit/#/c/72188/

Comment 17 Itzik Brown 2018-05-31 13:00:31 UTC
Checked with:
opendaylight-8.0.0-11.el7ost.noarch

Comment 19 errata-xmlrpc 2018-06-27 13:48:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086

