Bug 1486917

Summary: Unable to boot a VM in an ODL Clustered setup.
Product: Red Hat OpenStack
Component: opendaylight
Version: 12.0 (Pike)
Target Milestone: beta
Target Release: 12.0 (Pike)
Status: CLOSED DUPLICATE
Severity: high
Priority: urgent
Keywords: Triaged
Reporter: Sridhar Gaddam <sgaddam>
Assignee: Stephen Kitt <skitt>
QA Contact: Toni Freger <tfreger>
CC: jhershbe, mkolesni, nyechiel, sgaddam, smalleni, trozet
Hardware: Unspecified
OS: Unspecified
Environment: N/A
Last Closed: 2017-09-20 12:59:39 UTC
Type: Bug

Description Sridhar Gaddam 2017-08-30 19:11:22 UTC
Description of problem:
In a multi-node setup with ODL running in clustered mode on three controllers and three computes, spawning a VM always leaves the VM in the ERROR state.

Version-Release number of selected component (if applicable):
RHOSP12 image.
ODL rpm used: opendaylight-6.2.0-0.1.20170829rel1948.el7.noarch
networking-odl: python-networking-odl-11.0.0-0.20170821130625.c7d90bc.el7ost.noarch
neutron: python-neutron-11.0.0-0.20170807223712.el7ost.noarch

Steps to Reproduce:
1. Create a neutron router, network, subnet. 
2. Associate the subnet to the neutron router.
3. Spawn a VM on the network.
4. The VM stays in the "spawning" state for close to 5 minutes, after which it moves to the ERROR state.
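The timeout behaviour in step 4 can be sketched with a small polling helper. This is an illustration only; `wait_for_status` and the sample status sequence are hypothetical, standing in for repeated `openstack server show` calls:

```python
import time

def wait_for_status(get_status, done=("ACTIVE", "ERROR"),
                    timeout=300, interval=10,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns a terminal state or timeout (seconds)."""
    deadline = clock() + timeout
    status = get_status()
    while status not in done and clock() < deadline:
        sleep(interval)
        status = get_status()
    return status

# Stand-in for the observed behaviour: the VM reports BUILD ("spawning")
# for a while, then flips to ERROR.
states = iter(["BUILD", "BUILD", "ERROR"])
print(wait_for_status(lambda: next(states), interval=0, sleep=lambda s: None))
# -> ERROR
```

In the failing setup, the 300-second timeout corresponds to the roughly 5 minutes Nova waits for the port-status notification before failing the boot.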

Comment 1 Sridhar Gaddam 2017-08-30 19:22:07 UTC
After some debugging, we found that Netvirt (ELANService) updates the port status correctly in the operational datastore, but the notification never appears on the websocket interface between ODL and networking-odl.
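For context, the notification path involved is the RESTCONF data-change-event subscription that a client such as networking-odl registers before reading the resulting event stream over the websocket port. A minimal sketch of how such a subscription request is built; the hostname, ports, and exact datastore path here are illustrative assumptions, not values taken from this setup:

```python
import json

ODL_HOST = "controller-0"   # hypothetical controller hostname
RESTCONF_PORT = 8081        # ODL north-bound RESTCONF port
WEBSOCKET_PORT = 8185       # port the returned event stream is read on

def subscription_rpc_url(host, port):
    """RESTCONF RPC that registers a data-change-event subscription."""
    return ("http://{}:{}/restconf/operations/"
            "sal-remote:create-data-change-event-subscription").format(host, port)

def subscription_payload(path="neutron:neutron/neutron:ports"):
    """JSON body naming the operational subtree to watch (illustrative path)."""
    return json.dumps({
        "input": {
            "path": "/" + path,
            "sal-remote-augment:datastore": "OPERATIONAL",
            "sal-remote-augment:scope": "SUBTREE",
        }
    })

print(subscription_rpc_url(ODL_HOST, RESTCONF_PORT))
```

If the websocket port is blocked by the firewall, the subscription itself succeeds over RESTCONF but the port-status events are never delivered, which matches the symptom above.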

When we manually added the following iptables rule [#] and restarted the opendaylight and neutron-server processes on all the controllers, we were able to boot VMs normally.

[#] sudo iptables -I INPUT 15 -p tcp -m multiport --dports 8081,8185 -m state --state NEW -j ACCEPT 

However, after a fresh deployment with the file [*] updated to open port 8185, we still see the same issue even though iptables allows the port (8185).

[*] tripleo-heat-templates/blob/master/puppet/services/opendaylight-api.yaml

The change in tripleo-heat-templates is indeed required, but it does not appear to be sufficient on its own, so we still need to identify all the changes required to fix the issue.
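For illustration, the kind of change needed in opendaylight-api.yaml is an extra entry in the service's firewall rules. This is a hedged sketch only; the rule name and surrounding keys are assumptions based on the usual tripleo-heat-templates service-template layout, not the exact file contents:

```yaml
# Sketch: open the ODL websocket notification port alongside the
# REST API port in puppet/services/opendaylight-api.yaml.
firewall_rules:
  '137 opendaylight api':
    dport:
      - {get_param: OpenDaylightPort}   # REST API, 8081 by default
      - 8185                            # websocket notifications
```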

Comment 4 Tim Rozet 2017-09-06 19:55:57 UTC
Please identify the next issue after opening the port and report back.

Comment 5 Sridhar Gaddam 2017-09-06 20:03:47 UTC
(In reply to Tim Rozet from comment #4)
> Please identify the next issue after opening the port and report back.

Yes the issue is identified and Stephen is working on it. Following is the upstream bug - https://bugs.opendaylight.org//show_bug.cgi?id=9092

Comment 6 Stephen Kitt 2017-09-07 15:04:59 UTC
(In reply to Sridhar Gaddam from comment #5)
> (In reply to Tim Rozet from comment #4)
> > Please identify the next issue after opening the port and report back.
> 
> Yes the issue is identified and Stephen is working on it. Following is the
> upstream bug - https://bugs.opendaylight.org//show_bug.cgi?id=9092

Except that that upstream bug is Nitrogen-only; the affected code doesn't exist in Carbon (or rather, Carbon uses org.json, which doesn't have the problem reported upstream).

Comment 9 Stephen Kitt 2017-09-20 12:25:25 UTC
The real issue is related to the websockets used to send notifications to networking-odl and/or Neutron; see https://lists.opendaylight.org/pipermail/netvirt-dev/2017-September/005440.html for details (and https://bugs.opendaylight.org//show_bug.cgi?id=9147 to track upstream).

Comment 10 Tim Rozet 2017-09-20 12:59:39 UTC

*** This bug has been marked as a duplicate of bug 1491327 ***