Bug 1568976 - [Deployment] One ODL controller is not started correctly returning 404 on every REST call leading to failed OSP+ODL deploy
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: opendaylight
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z1
Target Release: 13.0 (Queens)
Assigned To: Stephen Kitt
QA Contact: Waldemar Znoinski
Deployment,
Keywords: Triaged, ZStream
Depends On: 1570848
Blocks:
Reported: 2018-04-18 09:14 EDT by Sai Sindhur Malleni
Modified: 2018-10-18 03:23 EDT

See Also:
Fixed In Version: opendaylight-8.3.0-1.el7ost
Doc Type: Known Issue
Doc Text:
During deployment, one or more OpenDaylight instances may fail to start correctly because of a feature-loading bug. This may lead to a deployment failure or a functional failure. A deployment can succeed with only two of the three OpenDaylight instances functional, so even when a deployment passes, the third OpenDaylight instance may have started incorrectly. Check the health status of each container with the `docker ps` command. If a container is unhealthy, restart it with `docker restart opendaylight_api`. When a deployment fails, the only option is to restart the deployment. For TLS-based deployments, all OpenDaylight instances must boot correctly or the deployment will fail.
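The health check described in the Doc Text can be scripted. The following is a minimal sketch, not an official tool. It assumes the container is named `opendaylight_api` (the name used in the restart command above) and that `docker ps` reports health in its Status column, for example `Up 2 hours (unhealthy)`. The helper names `unhealthy_containers`, `restart_commands` and `check_and_restart` are hypothetical.

```python
import subprocess

# Docker CLI format string: one "name<TAB>status" line per running container.
DOCKER_PS = ["docker", "ps", "--format", "{{.Names}}\t{{.Status}}"]


def unhealthy_containers(ps_output, name_prefix="opendaylight"):
    """Return names of matching containers whose status contains '(unhealthy)'."""
    names = []
    for line in ps_output.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        name, status = line.split("\t", 1)
        if name.startswith(name_prefix) and "(unhealthy)" in status:
            names.append(name)
    return names


def restart_commands(names):
    """Build one `docker restart` argument list per unhealthy container."""
    return [["docker", "restart", name] for name in names]


def check_and_restart():
    """Run on a controller node; needs access to the Docker daemon."""
    out = subprocess.run(
        DOCKER_PS, capture_output=True, text=True, check=True
    ).stdout
    for cmd in restart_commands(unhealthy_containers(out)):
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `check_and_restart()` on each controller would restart only the containers that Docker itself reports as unhealthy. It does not help when the deployment has already failed, in which case the Doc Text says the deployment must be restarted.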
Last Closed: 2018-07-19 09:53:05 EDT
Type: Bug

External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
OpenDaylight gerrit | 70979 | None | None | None | 2018-04-18 09:15 EDT
Red Hat Product Errata | RHBA-2018:2215 | None | None | None | 2018-07-19 09:53 EDT

Description Sai Sindhur Malleni 2018-04-18 09:14:55 EDT
Description of problem:
When deploying 3 controllers and 6 compute nodes, with ODL in HA colocated with the OSP controllers, the deployment occasionally fails with the following error:

curl -k -o /dev/null --fail --silent --head -u admin:admin http://172.16.0.13:8081/restconf/operational/network-topology:network-topology/topology/netvirt:1 returned 22 instead of one of [0]"

172.16.0.13 is the VIP for ODL.
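
For context: curl's `--fail` option makes curl exit with code 22 when the server answers with an HTTP status of 400 or higher, so "returned 22" means the VIP was reachable but answered with an error status. A rough Python equivalent of that probe is sketched below. The URL and the `admin:admin` credentials are taken from the log line above; the function names are hypothetical.

```python
import base64
import urllib.error
import urllib.request

URL = ("http://172.16.0.13:8081/restconf/operational/"
       "network-topology:network-topology/topology/netvirt:1")


def curl_fail_exit_code(http_status):
    """Mimic `curl --fail`: exit code 22 for HTTP status >= 400, else 0."""
    return 22 if http_status >= 400 else 0


def head_status(url, user="admin", password="admin", timeout=10):
    """Send a HEAD request with basic auth and return the HTTP status code."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        url, method="HEAD", headers={"Authorization": f"Basic {token}"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises on 4xx/5xx; the code is the status


def probe(url=URL):
    """Return the exit code `curl --fail` would give for this URL."""
    return curl_fail_exit_code(head_status(url))
```

Connection-level failures (host unreachable, timeout) are not handled here and would raise `urllib.error.URLError`; curl reports those with other exit codes, which is one way to tell a networking problem apart from the HTTP 404 seen in this bug.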

On further investigation by Tim Rozet, we found out that there were no networking issues. The curl would work from all 3 controllers and some of the computes but not all computes. 

The issue seems to be that ODL features are sometimes loaded in the wrong order, leaving ODL on one of the controllers non-functional (started, but returning HTTP 404). This looks like an initialization race condition in which Jersey needs to finish initializing before ODL starts. More details are in the commit message here: https://git.opendaylight.org/gerrit/#/c/70979/
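
One way to identify the affected controller is to send the same request to each controller's ODL endpoint directly instead of through the VIP. The sketch below assumes that each controller exposes ODL on port 8081 at its own address (the port comes from the failing curl command; the per-controller addresses are placeholders you would have to supply) and that `admin:admin` is valid. `broken_controllers` and `http_status` are hypothetical names.

```python
import base64
import urllib.error
import urllib.request

PATH = ("/restconf/operational/"
        "network-topology:network-topology/topology/netvirt:1")


def http_status(url, user="admin", password="admin", timeout=10):
    """Return the HTTP status code of an authenticated HEAD request."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        url, method="HEAD", headers={"Authorization": f"Basic {token}"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def broken_controllers(hosts, fetch_status=http_status, port=8081):
    """Return {host: status} for every controller that does not answer 200."""
    results = {}
    for host in hosts:
        status = fetch_status(f"http://{host}:{port}{PATH}")
        if status != 200:
            results[host] = status
    return results
```

With the three controller addresses passed in as `hosts`, a controller hit by this bug would be expected to show up in the result with status 404. The `fetch_status` parameter exists only so the logic can be exercised without a live deployment.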

java.lang.RuntimeException: Error obtaining AAAShiroProvider 


               

Version-Release number of selected component (if applicable):
OSP 13

How reproducible:

Very occasionally, mostly during scale deployments

Steps to Reproduce:
1. Deploy OSP + ODL setup with a lot of computes (6 in our case)

Actual results:
Deploy failed

Expected results:
Deploy should succeed every time

Additional info:
Comment 15 Sai Sindhur Malleni 2018-04-30 09:42:14 EDT
Mike,

This is happening pretty consistently in my environment. So the solution is that, if the deploy fails, you manually restart the ODL controllers and run a stack update? Is this OK? Shouldn't we have documentation in place that covers this? I believe a failed overcloud stack isn't a great sign.
Comment 16 Itzik Brown 2018-05-01 04:14:23 EDT
*** Bug 1573224 has been marked as a duplicate of this bug. ***
Comment 22 Mike Kolesnik 2018-05-21 08:53:08 EDT
This should be available once we rebase to stable/oxygen, moving to POST
Comment 32 Janki 2018-07-17 00:33:13 EDT
I have been doing successful deployments with this rpm for quite some time now and have not encountered this error.
Comment 34 errata-xmlrpc 2018-07-19 09:53:05 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2215
