Bug 1560872

Summary: [Netvirt] ODL L2 Agent is dead after restarting a compute node
Product: Red Hat OpenStack
Reporter: Itzik Brown <itbrown>
Component: opendaylight
Assignee: Josh Hershberg <jhershbe>
Status: CLOSED ERRATA
QA Contact: Itzik Brown <itbrown>
Severity: high
Priority: high
Version: 13.0 (Queens)
CC: aadam, itbrown, mkolesni, nyechiel
Target Milestone: rc
Keywords: Triaged
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Whiteboard: odl_netvirt
Fixed In Version: opendaylight-8.0.0-11.el7ost
Last Closed: 2018-06-27 13:48:49 UTC
Type: Bug
Attachments:
Karaf log
Karaf log with OVSDB Trace

Description Itzik Brown 2018-03-27 07:45:33 UTC
Description of problem:
After rebooting a compute node, OVS is connected to all the controllers, but the ODL L2 pseudo agent is reported as down.

In Neutron log:
2018-03-27 07:31:09.202 34 WARNING neutron.db.agents_db [req-86b57593-85d6-4c20-bba1-d408151e94ef - - - - -] Agent healthcheck: found 1 dead agents out of 11:
                Type       Last heartbeat host
              ODL L2  2018-03-27 07:05:06 compute-0.localdomain

Version-Release number of selected component (if applicable):
opendaylight-8.0.0-3.el7ost.noarch

Comment 1 Itzik Brown 2018-03-27 10:49:44 UTC
Created attachment 1413668 [details]
Karaf log

Comment 2 Josh Hershberg 2018-04-02 13:37:12 UTC
Please add these to the karaf logging configuration and post the resultant karaf.log.

log4j2.logger.itzik.name = org.opendaylight.neutron.hostconfig.ovs.NeutronHostconfigOvsListener
log4j2.logger.itzik.level = DEBUG

Comment 4 Josh Hershberg 2018-04-11 08:00:10 UTC
Itzik and I looked at this together today. We confirmed that the rebooted host is indeed missing from /operational/neutron:neutron/hostconfigs. The node is also missing from /operational/network-topology:network-topology/, which seems to indicate that the OVSDB plugin is failing to write that node to the operational datastore. This requires some additional digging.
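
For anyone repeating this check, here is a rough sketch of how the two operational paths could be queried over RESTCONF and inspected. It is not what was run during the debugging session. The base URL, port, and credentials are placeholders, and the JSON layout (a "hostconfig" list keyed by "host-id", and an "ovsdb:1" topology holding a "node" list) is an assumption about the Oxygen-era models that should be verified against the actual controller output.

```python
import base64
import json
import urllib.request


def operational_url(base, path):
    """Build a RESTCONF operational-datastore URL (assumes the /restconf/operational prefix)."""
    return f"{base.rstrip('/')}/restconf/operational/{path.strip('/')}"


def fetch_json(url, user, password):
    """GET a RESTCONF resource with HTTP basic auth and decode the JSON body."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def hostconfig_hosts(doc):
    """Return the host-id values found in a hostconfigs document (assumed layout)."""
    entries = doc.get("hostconfigs", {}).get("hostconfig", [])
    return {e["host-id"] for e in entries if "host-id" in e}


def ovsdb_node_ids(doc):
    """Return node-id values from the ovsdb:1 topology (assumed layout)."""
    topologies = doc.get("network-topology", {}).get("topology", [])
    return {
        n["node-id"]
        for t in topologies
        if t.get("topology-id") == "ovsdb:1"
        for n in t.get("node", [])
        if "node-id" in n
    }


if __name__ == "__main__":
    # Placeholder controller address and credentials; adjust for the deployment.
    base = "http://odl-controller.example:8181"
    hosts = hostconfig_hosts(
        fetch_json(operational_url(base, "neutron:neutron/hostconfigs"), "admin", "admin")
    )
    nodes = ovsdb_node_ids(
        fetch_json(operational_url(base, "network-topology:network-topology"), "admin", "admin")
    )
    print("hostconfig present for compute-0:", "compute-0.localdomain" in hosts)
    print("OVSDB nodes in operational:", sorted(nodes))
```

If the rebooted host's name is absent from the first set while OVS on that host shows a connection to the controllers, that matches the symptom described here.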

Comment 5 Itzik Brown 2018-04-11 08:01:52 UTC
Created attachment 1420206 [details]
Karaf log with OVSDB Trace

Comment 6 Itzik Brown 2018-04-11 09:01:31 UTC
Restarting OVS on the compute node: no problem.
Powering down the compute node, waiting 10 minutes, and powering it back on: no problem.

Comment 7 Josh Hershberg 2018-04-23 11:09:12 UTC
Please see the upstream bug for details on the root cause:

https://jira.opendaylight.org/browse/NETVIRT-1178

Patch here: https://git.opendaylight.org/gerrit/#/c/71203/

Comment 10 Mike Kolesnik 2018-05-21 08:42:26 UTC
Moving non-blocker OSP 13 bugs to z1.

Comment 12 Josh Hershberg 2018-05-24 08:14:20 UTC
Attached a link above to the patch on upstream stable/oxygen:

https://git.opendaylight.org/gerrit/#/c/72188/

Comment 17 Itzik Brown 2018-05-31 13:00:31 UTC
Checked with:
opendaylight-8.0.0-11.el7ost.noarch

Comment 19 errata-xmlrpc 2018-06-27 13:48:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086