Description of problem:
When deploying 3 controllers + 6 compute nodes with ODL in HA collocated with the OSP controllers, the deploy occasionally fails with the following error:

curl -k -o /dev/null --fail --silent --head -u admin:admin http://172.16.0.13:8081/restconf/operational/network-topology:network-topology/topology/netvirt:1 returned 22 instead of one of [0]

172.16.0.13 is the VIP for ODL. Further investigation by Tim Rozet showed that there were no networking issues. The curl worked from all 3 controllers and from some, but not all, of the computes.

The issue seems to be that ODL features are sometimes not loaded in the correct order, leaving a non-functional ODL (started but returning HTTP 404) on one of the controllers. So this seems to be an initialization race condition where Jersey needs to finish initialization before ODL starts. More details are in the commit message here: https://git.opendaylight.org/gerrit/#/c/70979/

java.lang.RuntimeException: Error obtaining AAAShiroProvider

Version-Release number of selected component (if applicable):
OSP 13

How reproducible:
Very occasionally, mostly during scale deploys

Steps to Reproduce:
1. Deploy an OSP + ODL setup with a lot of computes (6 in our case)

Actual results:
Deploy failed

Expected results:
Deploy should succeed every time

Additional info:
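Since the failing check goes through the VIP, it can help to probe each controller's ODL instance directly to see which one is started but returning 404. With --fail, curl exits with 22 when the server answers with an HTTP error (400 or above), which matches the check above. The following is only a rough sketch of such a probe, not part of the fix: it assumes each ODL instance answers on port 8081 on its own controller address, and the function names and the controller addresses in the usage note are hypothetical.

```python
import base64
import urllib.error
import urllib.request

# Path taken from the failing check in the report.
TOPOLOGY_PATH = (
    "/restconf/operational/network-topology:network-topology"
    "/topology/netvirt:1"
)


def head_status(host, port=8081, user="admin", password="admin", timeout=5):
    """Send an authenticated HEAD request; return the HTTP status code,
    or None if the host could not be reached at all."""
    url = f"http://{host}:{port}{TOPOLOGY_PATH}"
    req = urllib.request.Request(url, method="HEAD")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except OSError:
        # Connection refused, timeout, DNS failure, and similar.
        return None


def find_unhealthy(hosts, probe=head_status):
    """Return {host: status} for every host whose probe did not return 200."""
    results = {host: probe(host) for host in hosts}
    return {host: status for host, status in results.items() if status != 200}
```

For example, find_unhealthy(["192.0.2.11", "192.0.2.12", "192.0.2.13"]) with the real per-controller addresses substituted would return an empty dict when all three instances answer 200, and something like {"192.0.2.12": 404} when one of them lost the race. The probe argument is injectable so the logic can be exercised without a live deployment.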
Mike, this is happening pretty consistently in my environment. So the solution is that, if the deploy fails, you manually restart the ODL controllers and run a stack update? Is this OK? Shouldn't we have documentation in place that covers this? I believe a failed overcloud stack isn't a great sign.
*** Bug 1573224 has been marked as a duplicate of this bug. ***
This should be available once we rebase to stable/oxygen, moving to POST
I have been doing successful deployments with this rpm for quite some time now and have not encountered this error.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2215