Bug 1588186 - [Netvirt] OVS not able to connect to opendaylight openflowplugin
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: opendaylight
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z1
Target Release: 13.0 (Queens)
Assigned To: Victor Pickard
QA Contact: Tomas Jamrisko
Whiteboard: Netvirt
Keywords: Triaged, ZStream
Depends On:
Blocks:
Reported: 2018-06-06 17:22 EDT by jamo luhrsen
Modified: 2018-10-18 03:18 EDT (History)

See Also:
Fixed In Version: opendaylight-8.3.0-1.el7ost
Doc Type: Known Issue
Doc Text:
A race condition prevents Open vSwitch from connecting to the OpenDaylight openflowplugin. A fix is currently being implemented for a 13.z release of this product.
Story Points: ---
Clone Of:
Environment:
N/A
Last Closed: 2018-07-19 09:53:43 EDT
Type: Bug

External Trackers
Tracker                 ID               Priority  Status  Summary  Last Updated
OpenDaylight Bug        OPNFLWPLUG-1018  None      None    None     2018-06-08 15:58 EDT
OpenDaylight gerrit     72842            None      None    None     2018-06-11 15:16 EDT
OpenDaylight gerrit     72848            None      None    None     2018-06-11 15:15 EDT
Red Hat Product Errata  RHBA-2018:2215   None      None    None     2018-07-19 09:54 EDT

Description jamo luhrsen 2018-06-06 17:22:17 EDT
Description of problem:

OVS openflow connections are not being established to OpenDaylight's
openflowplugin.

On the OVS side, the symptom is an OpenFlow controller entry that does not
show is_connected: true in the `ovs-vsctl show` output (here,
tcp:10.30.170.146:6653):

d37923a0-97e7-4ffc-9ece-750a05deb63f
    Manager "tcp:10.30.170.148:6640"
        is_connected: true
    Manager "tcp:10.30.170.138:6640"
        is_connected: true
    Manager "tcp:10.30.170.146:6640"
        is_connected: true
    Bridge br-int
        Controller "tcp:10.30.170.148:6653" 
            is_connected: true
        Controller "tcp:10.30.170.146:6653"
        Controller "tcp:10.30.170.138:6653"
            is_connected: true

<snip>

Another symptom we've seen is a message like this in the ovs-vswitchd.log
file:

    rconn|WARN|br-int<->tcp:172.17.1.12:6653: connection dropped (Connection refused)


Or messages like these in karaf.log:

 Jun 06 10:50:42 overcloud-controller-0.opnfvlf.org dockerd-current[20953]: 2018-06-06T10:50:42,167 | INFO  | epollEventLoopGroup-9-5 | ConnectionAdapterImpl            | 392 - org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl - 0.6.2.SNAPSHOT | Hello received
Jun 06 10:50:42 overcloud-controller-0.opnfvlf.org dockerd-current[20953]: 2018-06-06T10:50:42,169 | INFO  | epollEventLoopGroup-9-5 | ContextChainHolderImpl           | 383 - org.opendaylight.openflowplugin.impl - 0.6.2.SNAPSHOT | Device openflow:5356928255129 connected.
Jun 06 10:50:42 overcloud-controller-0.opnfvlf.org dockerd-current[20953]: 2018-06-06T10:50:42,169 | WARN  | epollEventLoopGroup-9-5 | ContextChainHolderImpl           | 383 - org.opendaylight.openflowplugin.impl - 0.6.2.SNAPSHOT | Device openflow:5356928255129 is already trying to connect, wait until succeeded or disc
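
A rough sketch of how the two log symptoms above could be detected automatically. The patterns are copied from the excerpts in this report; the exact wording may differ between OVS and OpenDaylight versions, and `scan_logs` is a hypothetical helper name, not part of either project.

```python
import re

# Patterns derived from the log excerpts in this report (assumption: the
# message wording is stable across the versions being checked).
OVS_DROPPED = re.compile(
    r"rconn\|WARN\|(?P<bridge>[^<\s]+)<->(?P<target>tcp:[\d.]+:\d+).*"
    r"connection dropped \(Connection refused\)"
)
ODL_STUCK = re.compile(
    r"Device (?P<device>openflow:\d+) is already trying to connect"
)


def scan_logs(lines):
    """Return (kind, detail) tuples for every line matching a known symptom."""
    hits = []
    for line in lines:
        m = OVS_DROPPED.search(line)
        if m:
            hits.append(("ovs-dropped", f"{m['bridge']} -> {m['target']}"))
            continue
        m = ODL_STUCK.search(line)
        if m:
            hits.append(("odl-stuck", m["device"]))
    return hits
```

Feeding it the lines of ovs-vswitchd.log and karaf.log together gives one list of hits; an empty list only means these two specific messages were not seen.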



Version-Release number of selected component (if applicable):


How reproducible:

Noticed infrequently, but so far it has only been found by manual
inspection. We could add a check for this to our automation to understand
how common it is.
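
One possible shape for such an automated check, as a minimal sketch: it takes the text of `ovs-vsctl show` and reports every Controller entry that is not immediately followed by `is_connected: true`. The function name `disconnected_controllers` is hypothetical, and the parsing assumes the output layout shown in the description.

```python
import re

CONTROLLER = re.compile(r'^\s*Controller\s+"(?P<target>[^"]+)"\s*$')
CONNECTED = re.compile(r"^\s*is_connected:\s*true\s*$")


def disconnected_controllers(show_output):
    """Return controller targets not followed by 'is_connected: true'."""
    missing = []
    pending = None  # last Controller line still waiting for is_connected
    for line in show_output.splitlines():
        m = CONTROLLER.match(line)
        if m:
            if pending is not None:
                missing.append(pending)
            pending = m["target"]
        elif CONNECTED.match(line):
            pending = None
        elif line.strip():
            # Any other non-empty line ends the current controller entry.
            if pending is not None:
                missing.append(pending)
            pending = None
    if pending is not None:
        missing.append(pending)
    return missing
```

Run against the output pasted in the description, this would flag only tcp:10.30.170.146:6653. A controller can be briefly disconnected during normal startup, so a real check would probably need to retry before reporting a failure.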

Steps to Reproduce:
1. Deploy with TripleO in a 3-node HA setup and repeat until the problem appears

Actual results:

ovs doesn't connect to odl openflow plugin

Expected results:

ovs should connect to odl openflow plugin


Additional info:
Comment 2 Victor Pickard 2018-06-08 16:02:47 EDT
I've been able to reproduce this in my devstack setup, stable/queens with stable/oxygen. I wrote a small test script, test.sh, that basically does a del-controller, set-controller, and checks to see if the connection is established.

I've opened an upstream (u/s) bug for openflowplugin:

https://jira.opendaylight.org/browse/OPNFLWPLUG-1018

From what I can see so far, this appears to be a small timing window in openflowplugin that causes the connection context to get stuck in CLOSED state, whereby all new connections are immediately closed.

More details can be found in the u/s bug above, including logs and test script for reproducing.
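
For readers without access to the u/s bug, the del-controller / set-controller / check cycle described in this comment can be approximated as below. This is not the test.sh attached upstream, only a sketch under assumptions: `reconnect_once` and its parameters are hypothetical names, the wait and poll intervals are arbitrary, and it assumes the only Controller records on the host belong to the bridge under test.

```python
import subprocess
import time


def ovs_vsctl(*args):
    """Run ovs-vsctl with the given arguments and return its stdout."""
    result = subprocess.run(
        ["ovs-vsctl", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def reconnect_once(bridge, targets, run=ovs_vsctl, wait=5.0, poll=0.5,
                   sleep=time.sleep):
    """Drop and re-add the controllers; return True if all of them reconnect."""
    run("del-controller", bridge)
    run("set-controller", bridge, *targets)
    waited = 0.0
    while True:
        # Prints one "is_connected : <bool>" line per Controller record.
        out = run("--columns=is_connected", "list", "Controller")
        states = [
            line.split(":", 1)[1].strip()
            for line in out.splitlines()
            if line.strip().startswith("is_connected")
        ]
        if len(states) == len(targets) and all(s == "true" for s in states):
            return True
        if waited >= wait:
            return False
        sleep(poll)
        waited += poll
```

Calling `reconnect_once("br-int", [...])` in a loop and stopping at the first `False` would mimic the reproduction loop; the `run` and `sleep` parameters exist only so the logic can be exercised without a live OVS.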
Comment 3 Victor Pickard 2018-06-11 15:19:21 EDT
I applied the two openflowplugin patches to my local setup, and am no longer able to reproduce this issue on my local devstack setup using the updated test2.sh script (attached to u/s 1018 jira).
Comment 4 Ariel Adam 2018-06-12 00:57:20 EDT
Vic, if we have a fixed patch for this problem let's move the bug to POST and Mike will collect the fix when he rebases to the Oxygen
Comment 5 Mike Kolesnik 2018-06-12 01:08:33 EDT
(In reply to Ariel Adam from comment #4)
> Vic, if we have a fixed patch for this problem let's move the bug to POST
> and Mike will collect the fix when he rebases to the Oxygen

Be sure to move to POST only when the fix is merged to the stable branch.
Comment 6 Victor Pickard 2018-06-12 09:06:18 EDT
Yes, I was waiting for the patches to be merged u/s before moving to POST. Thanks for the reminder.
Comment 19 errata-xmlrpc 2018-07-19 09:53:43 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2215
