Bug 1560872
| Summary: | [Netvirt] ODL L2 Agent is dead after restarting a compute node | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Itzik Brown <itbrown> |
| Component: | opendaylight | Assignee: | Josh Hershberg <jhershbe> |
| Status: | CLOSED ERRATA | QA Contact: | Itzik Brown <itbrown> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 13.0 (Queens) | CC: | aadam, itbrown, mkolesni, nyechiel |
| Target Milestone: | rc | Keywords: | Triaged |
| Target Release: | 13.0 (Queens) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | odl_netvirt | | |
| Fixed In Version: | opendaylight-8.0.0-11.el7ost | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | N/A |
| Last Closed: | 2018-06-27 13:48:49 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Created attachment 1413668 [details]
Karaf log
Please add these to the karaf logging configuration and post the resultant karaf.log.

log4j2.logger.itzik.name = org.opendaylight.neutron.hostconfig.ovs.NeutronHostconfigOvsListener
log4j2.logger.itzik.level = DEBUG

Itzik and I sat on this today. What we saw was that indeed, the rebooted host is missing from /operational/neutron:neutron/hostconfigs. We also saw that the node was missing from /operational/network-topology:network-topology/ which seems to indicate that ovsdb plugin is failing to write that node to operational. This requires some additional digging.

Created attachment 1420206 [details]
Karaf log with OVSDB Trace
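
The datastore check described above (looking for the rebooted host under /operational/neutron:neutron/hostconfigs) could be scripted roughly as follows. This is only a sketch: the JSON layout (`hostconfigs` / `hostconfig` / `host-id`) is an assumption about how RESTCONF renders that subtree, and the sample payload and host names other than compute-0.localdomain are made up for illustration. Fetching the document from the controller is left out, since the address, port and credentials depend on the deployment.

```python
import json


def hosts_in_hostconfigs(payload):
    """Return the set of host IDs found in a hostconfigs document.

    Assumes the layout {"hostconfigs": {"hostconfig": [{"host-id": ...}, ...]}},
    which is an assumption and not taken from the bug report.
    """
    entries = payload.get("hostconfigs", {}).get("hostconfig", [])
    return {entry["host-id"] for entry in entries if "host-id" in entry}


def missing_hosts(payload, expected_hosts):
    """Return the expected hosts that have no hostconfig entry, sorted."""
    return sorted(set(expected_hosts) - hosts_in_hostconfigs(payload))


# Hypothetical reply in which the rebooted compute node is absent.
sample = json.loads(
    '{"hostconfigs": {"hostconfig": ['
    '{"host-id": "compute-1.localdomain", "host-type": "ODL L2"}]}}'
)

print(missing_hosts(sample, ["compute-0.localdomain", "compute-1.localdomain"]))
# prints ['compute-0.localdomain']
```

The same comparison could be applied to the node list under /operational/network-topology:network-topology/, with the key names adjusted to that subtree.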
Restarting the OVS on the compute node - no problem.
Power down the compute, waiting for 10 minutes and powering it on - no problem.

Please see the upstream bug for details on the root cause:
https://jira.opendaylight.org/browse/NETVIRT-1178
Patch here: https://git.opendaylight.org/gerrit/#/c/71203/

Moving non blocker OSP 13 bugs to z1

Attached link to patch on u/s stable/oxygen above
https://git.opendaylight.org/gerrit/#/c/72188/

Checked with:
opendaylight-8.0.0-11.el7ost.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2018:2086
Description of problem:
After rebooting a compute node, the OVS is connected to all the controllers but the pseudo agent is down.

In Neutron log:
2018-03-27 07:31:09.202 34 WARNING neutron.db.agents_db [req-86b57593-85d6-4c20-bba1-d408151e94ef - - - - -] Agent healthcheck: found 1 dead agents out of 11:
Type    Last heartbeat       host
ODL L2  2018-03-27 07:05:06  compute-0.localdomain

Version-Release number of selected component (if applicable):
opendaylight-8.0.0-3.el7ost.noarch

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
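
The agent healthcheck warning quoted in the description can be picked out of a Neutron log with a short script. The sketch below assumes the message wording stays exactly as in the line quoted in this report; the function name `parse_healthcheck` is hypothetical.

```python
import re

# Matches the summary part of the warning quoted in this report.
DEAD_AGENTS_RE = re.compile(
    r"Agent healthcheck: found (\d+) dead agents out of (\d+)"
)


def parse_healthcheck(line):
    """Return (dead, total) from a healthcheck warning, or None if absent."""
    match = DEAD_AGENTS_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


log_line = (
    "2018-03-27 07:31:09.202 34 WARNING neutron.db.agents_db "
    "[req-86b57593-85d6-4c20-bba1-d408151e94ef - - - - -] "
    "Agent healthcheck: found 1 dead agents out of 11:"
)

print(parse_healthcheck(log_line))  # prints (1, 11)
```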