Bug 1588186 - [Netvirt] OVS not able to connect to opendaylight openflowplugin
Summary: [Netvirt] OVS not able to connect to opendaylight openflowplugin
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: opendaylight
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z1
Target Release: 13.0 (Queens)
Assignee: Victor Pickard
QA Contact: Tomas Jamrisko
URL:
Whiteboard: Netvirt
Depends On:
Blocks:
Reported: 2018-06-06 21:22 UTC by jamo luhrsen
Modified: 2018-10-18 07:18 UTC (History)

Fixed In Version: opendaylight-8.3.0-1.el7ost
Doc Type: Known Issue
Doc Text:
A race condition causes Open vSwitch to not connect to the OpenDaylight openflowplugin. A fix is currently being implemented for a 13.z release of this product.
Clone Of:
Environment:
N/A
Last Closed: 2018-07-19 13:53:43 UTC
Target Upstream Version:
Embargoed:




Links
System                  ID               Private  Priority  Status  Summary  Last Updated
OpenDaylight Bug        OPNFLWPLUG-1018  0        None      None    None     2018-06-08 19:58:21 UTC
OpenDaylight gerrit     72842            0        None      None    None     2018-06-11 19:16:01 UTC
OpenDaylight gerrit     72848            0        None      None    None     2018-06-11 19:15:34 UTC
Red Hat Product Errata  RHBA-2018:2215   0        None      None    None     2018-07-19 13:54:14 UTC

Internal Links: 1594277

Description jamo luhrsen 2018-06-06 21:22:17 UTC
Description of problem:

OVS openflow connections are not being established to OpenDaylight's
openflowplugin.

On the OVS side, this is indicated by an OpenFlow controller entry that does
not show is_connected: true (tcp:10.30.170.146:6653 in the output below):

d37923a0-97e7-4ffc-9ece-750a05deb63f
    Manager "tcp:10.30.170.148:6640"
        is_connected: true
    Manager "tcp:10.30.170.138:6640"
        is_connected: true
    Manager "tcp:10.30.170.146:6640"
        is_connected: true
    Bridge br-int
        Controller "tcp:10.30.170.148:6653" 
            is_connected: true
        Controller "tcp:10.30.170.146:6653"
        Controller "tcp:10.30.170.138:6653"
            is_connected: true

<snip>
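
A check like the one above could be automated. The following is a minimal sketch, not part of the original report: it assumes the text comes from `ovs-vsctl show` in the layout pasted above, and the function name `disconnected_controllers` is made up for illustration. It reports every Controller entry that is not immediately followed by `is_connected: true`.

```python
import re

# Abbreviated copy of the `ovs-vsctl show` output from this report.
SAMPLE = '''
    Bridge br-int
        Controller "tcp:10.30.170.148:6653"
            is_connected: true
        Controller "tcp:10.30.170.146:6653"
        Controller "tcp:10.30.170.138:6653"
            is_connected: true
'''


def disconnected_controllers(show_output):
    """Return controller targets that lack an `is_connected: true` line.

    Hypothetical helper: it only looks at the line directly after each
    Controller entry, which matches the layout shown in this report.
    """
    lines = [l.strip() for l in show_output.splitlines() if l.strip()]
    missing = []
    for i, line in enumerate(lines):
        match = re.match(r'Controller "([^"]+)"$', line)
        if not match:
            continue
        following = lines[i + 1] if i + 1 < len(lines) else ""
        if following != "is_connected: true":
            missing.append(match.group(1))
    return missing


if __name__ == "__main__":
    print(disconnected_controllers(SAMPLE))
```

Run against the abbreviated sample, this flags only the 10.30.170.146 controller. In automation, the input would presumably be captured from the switch after deployment rather than hard-coded.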

Another symptom we've seen is a message like this in the ovs-vswitchd.log
file:

    rconn|WARN|br-int<->tcp:172.17.1.12:6653: connection dropped (Connection refused)


or, messages like this in a karaf.log:

Jun 06 10:50:42 overcloud-controller-0.opnfvlf.org dockerd-current[20953]: 2018-06-06T10:50:42,167 | INFO  | epollEventLoopGroup-9-5 | ConnectionAdapterImpl            | 392 - org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl - 0.6.2.SNAPSHOT | Hello received
Jun 06 10:50:42 overcloud-controller-0.opnfvlf.org dockerd-current[20953]: 2018-06-06T10:50:42,169 | INFO  | epollEventLoopGroup-9-5 | ContextChainHolderImpl           | 383 - org.opendaylight.openflowplugin.impl - 0.6.2.SNAPSHOT | Device openflow:5356928255129 connected.
Jun 06 10:50:42 overcloud-controller-0.opnfvlf.org dockerd-current[20953]: 2018-06-06T10:50:42,169 | WARN  | epollEventLoopGroup-9-5 | ContextChainHolderImpl           | 383 - org.opendaylight.openflowplugin.impl - 0.6.2.SNAPSHOT | Device openflow:5356928255129 is already trying to connect, wait until succeeded or disc
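
To see how often the WARN message above shows up, the log could be scanned per device. This is only a sketch under assumptions: the message wording is taken from the lines quoted in this report, the function name `already_connecting_devices` is hypothetical, and a hit is a symptom worth investigating, not proof that the connection is stuck.

```python
import re
from collections import Counter

# Message text as it appears in the karaf.log excerpt in this report.
PATTERN = re.compile(r"Device (openflow:\d+) is already trying to connect")


def already_connecting_devices(log_lines):
    """Count 'already trying to connect' warnings per OpenFlow device id."""
    counts = Counter()
    for line in log_lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical path; adjust to wherever karaf.log lives in the deployment.
    with open("karaf.log", encoding="utf-8", errors="replace") as handle:
        for device, hits in already_connecting_devices(handle).most_common():
            print(device, hits)
```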



Version-Release number of selected component (if applicable):


How reproducible:

Noticed infrequently, but at the moment it is only found by manual
inspection. We may be able to add a check for this to our automation
to understand how common it is.

Steps to Reproduce:
1. Deploy with TripleO in a 3-node HA setup and repeat until found

Actual results:

ovs doesn't connect to odl openflow plugin

Expected results:

ovs should connect to odl openflow plugin


Additional info:

Comment 2 Victor Pickard 2018-06-08 20:02:47 UTC
I've been able to reproduce this in my devstack setup, stable/queens with stable/oxygen. I wrote a small test script, test.sh, that basically does a del-controller, set-controller, and checks to see if the connection is established.

I've opened an upstream (u/s) bug for openflowplugin:

https://jira.opendaylight.org/browse/OPNFLWPLUG-1018

From what I can see so far, this appears to be a small timing window in openflowplugin that causes the connection context to get stuck in CLOSED state, whereby all new connections are immediately closed.

More details can be found in the u/s bug above, including logs and test script for reproducing.
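
For readers without access to the upstream ticket, the del-controller / set-controller loop described above might look roughly like the sketch below. This is not the actual test.sh: the function names, the attempt count, and the wait time are assumptions, and only the `ovs-vsctl` subcommands `del-controller`, `set-controller`, and `show` are used.

```python
import subprocess
import time


def ovs_vsctl(*args):
    """Run ovs-vsctl with the given arguments and return its stdout."""
    result = subprocess.run(["ovs-vsctl", *args], check=True,
                            capture_output=True, text=True)
    return result.stdout


def is_connected(show_output, target):
    """True if the Controller entry for `target` is followed by is_connected: true."""
    lines = [l.strip() for l in show_output.splitlines() if l.strip()]
    for i, line in enumerate(lines):
        if line == f'Controller "{target}"':
            return i + 1 < len(lines) and lines[i + 1] == "is_connected: true"
    return False


def flap_controller(target, bridge="br-int", attempts=10, wait=5.0,
                    run=ovs_vsctl, sleep=time.sleep):
    """Remove and re-add the controller repeatedly.

    Returns the 1-based attempt on which the connection failed to come
    up, or None if every attempt connected. `wait` is an assumed settle
    time, not a value taken from the report.
    """
    for attempt in range(1, attempts + 1):
        run("del-controller", bridge)
        run("set-controller", bridge, target)
        sleep(wait)
        if not is_connected(run("show"), target):
            return attempt
    return None
```

The `run` and `sleep` parameters are injectable so the loop can be exercised without a real switch; on a live node, something like `flap_controller("tcp:172.17.1.12:6653")` would be the likely call, with the address replaced by the local ODL controller.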

Comment 3 Victor Pickard 2018-06-11 19:19:21 UTC
I applied the two openflowplugin patches to my local setup, and am no longer able to reproduce this issue on my local devstack setup using the updated test2.sh script (attached to u/s 1018 jira).

Comment 4 Ariel Adam 2018-06-12 04:57:20 UTC
Vic, if we have a fixed patch for this problem let's move the bug to POST and Mike will collect the fix when he rebases to the Oxygen

Comment 5 Mike Kolesnik 2018-06-12 05:08:33 UTC
(In reply to Ariel Adam from comment #4)
> Vic, if we have a fixed patch for this problem let's move the bug to POST
> and Mike will collect the fix when he rebases to the Oxygen

Be sure to move to POST only when the fix is merged to the stable branch.

Comment 6 Victor Pickard 2018-06-12 13:06:18 UTC
Yes, I was waiting for the patches to be merged u/s before moving to POST. Thanks for the reminder.

Comment 19 errata-xmlrpc 2018-07-19 13:53:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2215

