
Bug 1588186

Summary: [Netvirt] OVS not able to connect to opendaylight openflowplugin
Product: Red Hat OpenStack
Reporter: jamo luhrsen <jluhrsen>
Component: opendaylight
Assignee: Victor Pickard <vpickard>
Status: CLOSED ERRATA
QA Contact: Tomas Jamrisko <tjamrisk>
Severity: high
Docs Contact:
Priority: high
Version: 13.0 (Queens)
CC: aadam, dcain, dfarrell, dmacpher, mkolesni, nyechiel, tjamrisk
Target Milestone: z1
Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Whiteboard: Netvirt
Fixed In Version: opendaylight-8.3.0-1.el7ost
Doc Type: Known Issue
Doc Text: A race condition causes Open vSwitch to not connect to the Opendaylight openflowplugin. A fix is currently being implemented for a 13.z release of this product.
Story Points: ---
Clone Of:
Environment: N/A
Last Closed: 2018-07-19 13:53:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description jamo luhrsen 2018-06-06 21:22:17 UTC
Description of problem:

OVS openflow connections are not being established to OpenDaylight's
openflowplugin.

On the OVS side, the symptom is an OpenFlow controller entry that does not
show is_connected: true (below, Controller "tcp:10.30.170.146:6653"):

d37923a0-97e7-4ffc-9ece-750a05deb63f
    Manager "tcp:10.30.170.148:6640"
        is_connected: true
    Manager "tcp:10.30.170.138:6640"
        is_connected: true
    Manager "tcp:10.30.170.146:6640"
        is_connected: true
    Bridge br-int
        Controller "tcp:10.30.170.148:6653" 
            is_connected: true
        Controller "tcp:10.30.170.146:6653"
        Controller "tcp:10.30.170.138:6653"
            is_connected: true

<snip>
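The check described above can be automated. The following is a minimal sketch, not part of the original report: it assumes the listing has the layout shown above (each connected Controller entry is immediately followed by an is_connected: true line), and the function name unconnected_controllers is hypothetical.

```python
import re
from typing import List

# Matches a line such as: Controller "tcp:10.30.170.146:6653"
CONTROLLER_RE = re.compile(r'^Controller "([^"]+)"$')


def unconnected_controllers(show_output: str) -> List[str]:
    """Return controller targets that are not followed by 'is_connected: true'.

    Assumption: the text has the layout shown in this report, where a
    connected controller is immediately followed by an is_connected line.
    """
    # Drop blank lines and indentation so only the line order matters.
    lines = [ln.strip() for ln in show_output.splitlines() if ln.strip()]
    missing = []
    for i, line in enumerate(lines):
        match = CONTROLLER_RE.match(line)
        if not match:
            continue  # Manager, Bridge, and other lines are ignored
        following = lines[i + 1] if i + 1 < len(lines) else ""
        if following != "is_connected: true":
            missing.append(match.group(1))
    return missing


# Excerpt of the listing from this report.
SAMPLE = '''
    Manager "tcp:10.30.170.148:6640"
        is_connected: true
    Bridge br-int
        Controller "tcp:10.30.170.148:6653"
            is_connected: true
        Controller "tcp:10.30.170.146:6653"
        Controller "tcp:10.30.170.138:6653"
            is_connected: true
'''

if __name__ == "__main__":
    print(unconnected_controllers(SAMPLE))
```

A non-empty result means at least one OpenFlow controller connection is down at the moment the listing was captured; it does not by itself prove this bug, since a controller that is simply offline looks the same.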

Another symptom we've seen is a message like this in the ovs-vswitchd.log
file:

    rconn|WARN|br-int<->tcp:172.17.1.12:6653: connection dropped (Connection refused)


or messages like these in karaf.log:

 Jun 06 10:50:42 overcloud-controller-0.opnfvlf.org dockerd-current[20953]: 2018-06-06T10:50:42,167 | INFO  | epollEventLoopGroup-9-5 | ConnectionAdapterImpl            | 392 - org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl - 0.6.2.SNAPSHOT | Hello received
Jun 06 10:50:42 overcloud-controller-0.opnfvlf.org dockerd-current[20953]: 2018-06-06T10:50:42,169 | INFO  | epollEventLoopGroup-9-5 | ContextChainHolderImpl           | 383 - org.opendaylight.openflowplugin.impl - 0.6.2.SNAPSHOT | Device openflow:5356928255129 connected.
Jun 06 10:50:42 overcloud-controller-0.opnfvlf.org dockerd-current[20953]: 2018-06-06T10:50:42,169 | WARN  | epollEventLoopGroup-9-5 | ContextChainHolderImpl           | 383 - org.opendaylight.openflowplugin.impl - 0.6.2.SNAPSHOT | Device openflow:5356928255129 is already trying to connect, wait until succeeded or disc
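Both log symptoms can be counted with a small scanner. This is a sketch under assumptions: the two patterns are derived only from the sample lines quoted in this report, and the symptom names and the function count_symptoms are hypothetical. Other OVS or OpenDaylight versions may word these messages differently.

```python
import re
from collections import Counter
from typing import Iterable

# Patterns taken from the log excerpts in this report (assumed wording).
SYMPTOMS = {
    # ovs-vswitchd.log: the switch cannot reach the controller port
    "ovs_connection_refused": re.compile(
        r"rconn\|WARN\|.*connection dropped \(Connection refused\)"
    ),
    # karaf.log: openflowplugin rejects the device as already connecting
    "odl_already_connecting": re.compile(
        r"Device openflow:\d+ is already trying to connect"
    ),
}


def count_symptoms(lines: Iterable[str]) -> Counter:
    """Count how many lines match each known symptom pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in SYMPTOMS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python scan_logs.py < karaf.log
    for name, hits in count_symptoms(sys.stdin).items():
        print(f"{name}: {hits}")
```

A single "already trying to connect" warning can also occur during a normal reconnect, so repeated hits for the same device are the more telling signal.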



Version-Release number of selected component (if applicable):


How reproducible:

Infrequent. So far it has only been found by manual inspection. We could
add a check for this to our automation to understand how common it is.

Steps to Reproduce:
1. Deploy with TripleO in a 3-node HA setup and repeat until the problem appears

Actual results:

OVS does not connect to the ODL openflowplugin

Expected results:

OVS should connect to the ODL openflowplugin


Additional info:

Comment 2 Victor Pickard 2018-06-08 20:02:47 UTC
I've been able to reproduce this in my devstack setup, stable/queens with stable/oxygen. I wrote a small test script, test.sh, that basically does a del-controller, set-controller, and checks to see if the connection is established.

I've opened an upstream (u/s) bug for openflowplugin:

https://jira.opendaylight.org/browse/OPNFLWPLUG-1018

From what I can see so far, this appears to be a small timing window in openflowplugin that causes the connection context to get stuck in CLOSED state, whereby all new connections are immediately closed.

More details can be found in the u/s bug above, including logs and test script for reproducing.
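For readers without access to the upstream attachment, the reproduction loop described in this comment (del-controller, set-controller, then check the connection) can be sketched as follows. This is not the test.sh from the upstream bug. The function names, the attempt count, and the settle time are assumptions, and the connection check assumes the ovs-vsctl show layout quoted in the description. The command runner is injectable so the loop can be exercised without a real switch.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def run_cmd(argv: Sequence[str]) -> str:
    """Run a command and return its stdout (raises on a non-zero exit)."""
    result = subprocess.run(list(argv), check=True, capture_output=True, text=True)
    return result.stdout


def is_connected(show_output: str, target: str) -> bool:
    """True if the Controller entry for target is followed by is_connected: true."""
    lines = [ln.strip() for ln in show_output.splitlines() if ln.strip()]
    for i, line in enumerate(lines):
        if line == f'Controller "{target}"':
            return i + 1 < len(lines) and lines[i + 1] == "is_connected: true"
    return False


def bounce_controller(
    bridge: str,
    target: str,
    attempts: int = 50,            # assumed value, tune as needed
    settle_seconds: float = 5.0,   # assumed wait for the connection to come up
    run: Callable[[Sequence[str]], str] = run_cmd,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Repeatedly remove and re-add the controller on a bridge.

    Returns the 1-based attempt on which the connection failed to come up,
    or None if every attempt connected.
    """
    for attempt in range(1, attempts + 1):
        run(["ovs-vsctl", "del-controller", bridge])
        run(["ovs-vsctl", "set-controller", bridge, target])
        sleep(settle_seconds)
        if not is_connected(run(["ovs-vsctl", "show"]), target):
            return attempt
    return None


if __name__ == "__main__":
    # Example values only; substitute the bridge and controller of the setup.
    failed_at = bounce_controller("br-int", "tcp:127.0.0.1:6653")
    print("stuck on attempt", failed_at if failed_at else "none")
```

Because the race is a small timing window, a short settle time may report false failures; a longer wait or a retry of the check before declaring the connection stuck is a reasonable refinement.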

Comment 3 Victor Pickard 2018-06-11 19:19:21 UTC
I applied the two openflowplugin patches to my local setup, and am no longer able to reproduce this issue on my local devstack setup using the updated test2.sh script (attached to u/s 1018 jira).

Comment 4 Ariel Adam 2018-06-12 04:57:20 UTC
Vic, if we have a fixed patch for this problem let's move the bug to POST and Mike will collect the fix when he rebases to the Oxygen

Comment 5 Mike Kolesnik 2018-06-12 05:08:33 UTC
(In reply to Ariel Adam from comment #4)
> Vic, if we have a fixed patch for this problem let's move the bug to POST
> and Mike will collect the fix when he rebases to the Oxygen

Be sure to move to POST only when the fix is merged to the stable branch.

Comment 6 Victor Pickard 2018-06-12 13:06:18 UTC
Yes, I was waiting for the patches to be merged u/s before moving to POST. Thanks for the reminder.

Comment 19 errata-xmlrpc 2018-07-19 13:53:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2215