Description of problem:
OVS OpenFlow connections are not being established to OpenDaylight's openflowplugin. On the OVS side, this shows up as a Controller entry that is missing is_connected: true (tcp:10.30.170.146:6653 below):

    d37923a0-97e7-4ffc-9ece-750a05deb63f
        Manager "tcp:10.30.170.148:6640"
            is_connected: true
        Manager "tcp:10.30.170.138:6640"
            is_connected: true
        Manager "tcp:10.30.170.146:6640"
            is_connected: true
        Bridge br-int
            Controller "tcp:10.30.170.148:6653"
                is_connected: true
            Controller "tcp:10.30.170.146:6653"
            Controller "tcp:10.30.170.138:6653"
                is_connected: true
    <snip>

Other symptoms we've seen include a message like this in the ovs-vswitchd.log file:

    rconn|WARN|br-int<->tcp:172.17.1.12:6653: connection dropped (Connection refused)

or messages like these in karaf.log:

    Jun 06 10:50:42 overcloud-controller-0.opnfvlf.org dockerd-current[20953]: 2018-06-06T10:50:42,167 | INFO | epollEventLoopGroup-9-5 | ConnectionAdapterImpl | 392 - org.opendaylight.openflowplugin.openflowjava.openflow-protocol-impl - 0.6.2.SNAPSHOT | Hello received
    Jun 06 10:50:42 overcloud-controller-0.opnfvlf.org dockerd-current[20953]: 2018-06-06T10:50:42,169 | INFO | epollEventLoopGroup-9-5 | ContextChainHolderImpl | 383 - org.opendaylight.openflowplugin.impl - 0.6.2.SNAPSHOT | Device openflow:5356928255129 connected.
    Jun 06 10:50:42 overcloud-controller-0.opnfvlf.org dockerd-current[20953]: 2018-06-06T10:50:42,169 | WARN | epollEventLoopGroup-9-5 | ContextChainHolderImpl | 383 - org.opendaylight.openflowplugin.impl - 0.6.2.SNAPSHOT | Device openflow:5356928255129 is already trying to connect, wait until succeeded or disc

Version-Release number of selected component (if applicable):

How reproducible:
Infrequently noticed, but so far it has only been found by manual inspection. We could add a check for this to our automation to understand how common it is.

Steps to Reproduce:
1. Deploy with TripleO in a 3-node HA setup and repeat until the problem appears

Actual results:
OVS doesn't connect to the ODL openflowplugin

Expected results:
OVS should connect to the ODL openflowplugin

Additional info:
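The automation check mentioned under "How reproducible" could look roughly like the sketch below. It assumes the listing in the description is `ovs-vsctl show` output and that a connected controller is always followed by an `is_connected: true` line before the next Controller, Manager, Bridge, or Port entry. The function names `unconnected_controllers` and `check_host` are hypothetical and not part of any existing test suite.

```python
import subprocess


def unconnected_controllers(show_output):
    """Return controller targets that have no 'is_connected: true' line.

    Assumes the text layout of 'ovs-vsctl show' as quoted in this report.
    """
    missing = []
    current = None  # controller target still waiting for is_connected
    for raw in show_output.splitlines():
        line = raw.strip()
        if line.startswith("Controller "):
            if current is not None:
                missing.append(current)
            current = line.split('"')[1]
        elif line == "is_connected: true":
            current = None
        elif line.startswith(("Manager ", "Bridge ", "Port ")):
            # A new section started before the controller reported connected.
            if current is not None:
                missing.append(current)
                current = None
    if current is not None:
        missing.append(current)
    return missing


def check_host():
    """Run 'ovs-vsctl show' locally and report unconnected controllers."""
    out = subprocess.run(
        ["ovs-vsctl", "show"], capture_output=True, text=True, check=True
    ).stdout
    return unconnected_controllers(out)
```

Calling `check_host()` on each compute or controller node after a deployment and failing the job when the returned list is non-empty would give a rough measure of how often the problem occurs.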
I've been able to reproduce this in my devstack setup, stable/queens with stable/oxygen. I wrote a small test script, test.sh, that basically does a del-controller, then a set-controller, and checks whether the connection is established. I've opened a u/s bug for openflowplugin: https://jira.opendaylight.org/browse/OPNFLWPLUG-1018

From what I can see so far, this appears to be a small timing window in openflowplugin that leaves the connection context stuck in the CLOSED state, after which all new connections are immediately closed. More details can be found in the u/s bug above, including logs and the test script for reproducing.
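For readers without access to the attachment, the reproduction loop described above can be approximated as follows. This is not the test.sh from the u/s bug; it is a minimal sketch that assumes `ovs-vsctl del-controller`, `ovs-vsctl set-controller`, and `ovs-vsctl show` are available, and that a connected controller has `is_connected: true` on the line right after its Controller entry. The names `run_ovs`, `controller_connected`, and `reconnect_once`, along with the timeout values, are hypothetical.

```python
import subprocess
import time


def run_ovs(args):
    """Run ovs-vsctl with the given arguments and return its stdout."""
    return subprocess.run(
        ["ovs-vsctl", *args], capture_output=True, text=True, check=True
    ).stdout


def controller_connected(show_output, target):
    """True if the Controller entry for target is followed by is_connected: true."""
    lines = [l.strip() for l in show_output.splitlines()]
    for i, line in enumerate(lines):
        if line == f'Controller "{target}"':
            return i + 1 < len(lines) and lines[i + 1] == "is_connected: true"
    return False


def reconnect_once(bridge, target, run=run_ovs, timeout=30.0, poll=1.0,
                   sleep=time.sleep):
    """Drop and re-add the controller, then wait for it to connect."""
    run(["del-controller", bridge])
    run(["set-controller", bridge, target])
    waited = 0.0
    while waited <= timeout:
        if controller_connected(run(["show"]), target):
            return True
        sleep(poll)
        waited += poll
    return False  # connection never came up within the timeout
```

Calling `reconnect_once("br-int", "tcp:<odl-ip>:6653")` in a loop and stopping at the first `False` result mimics the del-controller/set-controller cycle; the `run` and `sleep` parameters exist only so the logic can be exercised without a real switch.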
I applied the two openflowplugin patches to my local setup, and am no longer able to reproduce this issue on my local devstack setup using the updated test2.sh script (attached to u/s 1018 jira).
Vic, if we have a patch that fixes this problem, let's move the bug to POST, and Mike will collect the fix when he rebases to Oxygen.
(In reply to Ariel Adam from comment #4)
> Vic, if we have a fixed patch for this problem let's move the bug to POST
> and Mike will collect the fix when he rebases to the Oxygen

Be sure to move to POST only when the fix is merged to the stable branch.
Yes, I was waiting for the patches to be merged u/s before moving to POST. Thanks for the reminder.
The patches to fix this were picked up in the last rebase.

https://code.engineering.redhat.com/gerrit/gitweb?p=odl-openflowplugin.git;a=commit;h=cba74ef455c876541160afbdfe5e7246f3bc2edd
https://code.engineering.redhat.com/gerrit/gitweb?p=odl-openflowplugin.git;a=commit;h=1ae94145e2f54c89c8677caea6cdf9ff424c5e20
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2215