Bug 1588116

Summary: [BasicFunctioniality] OpenFlow plugin misses pushing some tables when OVS connects
Product: Red Hat OpenStack Reporter: Tim Rozet <trozet>
Component: opendaylightAssignee: Vishal Thapar <vthapar>
Status: CLOSED ERRATA QA Contact: Tomas Jamrisko <tjamrisk>
Severity: urgent Docs Contact:
Priority: medium    
Version: 13.0 (Queens)CC: aadam, asuryana, itbrown, jchhatba, jjoyce, jluhrsen, jschluet, lmarsh, mkolesni, nyechiel, oblaut, sgaddam, skitt, slinaber, smalleni, tjamrisk, trozet, tvignaud, vthapar, wznoinsk
Target Milestone: z1Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: BasicFunctioniality
Fixed In Version: opendaylight-8.3.0-1.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1568989 Environment:
N/A
Last Closed: 2018-07-19 13:53:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
ovs info (flows, uuid) and karaf logs for all 5 nodes none

Comment 2 Tim Rozet 2018-06-06 16:56:36 UTC
This bug is to track fixing this issue in opendaylight

Comment 3 jamo luhrsen 2018-06-06 16:58:52 UTC
Created attachment 1448387 [details]
ovs info (flows, uuid) and karaf logs for all 5 nodes

one controller (controller-2) rolled it's logs over very fast (TRACE
logging was enabled for openflowplugin) so I'm not sure everything
is captured on that ODL node. The other two controllers had not rolled
over.

the node affected was controller-0. It did not have table=48 for sure,
and possibly other tables.

Comment 4 Vishal Thapar 2018-06-11 15:08:22 UTC
This looks like some misconfiguration or infra issue. 

We're getting repeated handshakes with dpnid 277592196268432 from two IPs: 172.17.1.11 and 192.168.24.15. 192.168.24.x IP only shows up for this device, none other. All other have incoming connections on 172.17.1.x only.

172.17.1.10
172.17.1.14
172.17.1.15
172.17.1.20

We should look at configuarion/setup to determine if something gone wrong in networking.


Relevant log entries:
---------------------
2018-06-06T15:41:47,113 | DEBUG | nioEventLoopGroup-9-12 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | hello successfully sent, xid=21, addr=/172.17.1.11:36710
2018-06-06T15:41:47,114 | TRACE | nioEventLoopGroup-9-12 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | features are back
2018-06-06T15:41:47,115 | DEBUG | nioEventLoopGroup-9-12 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | obtained features: datapathId=277592196268432
2018-06-06T15:41:47,115 | DEBUG | nioEventLoopGroup-9-12 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | obtained features: auxiliaryId=0
2018-06-06T15:41:47,115 | TRACE | nioEventLoopGroup-9-12 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | handshake SETTLED: version=4, datapathId=277592196268432, auxiliaryId=0
2018-06-06T15:41:47,115 | DEBUG | nioEventLoopGroup-9-12 | HandshakeListenerImpl            | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | handshake succeeded: /172.17.1.11:36710
2018-06-06T15:41:47,116 | DEBUG | nioEventLoopGroup-9-12 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | postHandshake DONE
2018-06-06T15:41:47,117 | DEBUG | nioEventLoopGroup-9-12 | HandshakeListenerImpl            | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | succeeded by getting sweep barrier after post-handshake for device openflow:277592196268432
2018-06-06T15:41:47,117 | INFO  | nioEventLoopGroup-9-12 | ContextChainHolderImpl           | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | Device openflow:277592196268432 connected.
2018-06-06T15:41:47,117 | WARN  | nioEventLoopGroup-9-12 | ContextChainHolderImpl           | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | Device openflow:277592196268432 already connected. Closing previous connection
2018-06-06T15:41:47,118 | INFO  | nioEventLoopGroup-9-12 | ReconciliationManagerImpl        | 377 - org.opendaylight.openflowplugin.applications.reconciliation-framework - 0.6.0 | Stopping reconciliation for node Uri [_value=openflow:277592196268432]
2018-06-06T15:41:47,119 | INFO  | nioEventLoopGroup-9-12 | ContextChainImpl                 | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | Closed clustering services registration for node openflow:277592196268432


2018-06-06T15:41:46,123 | DEBUG | nioEventLoopGroup-9-11 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | hello successfully sent, xid=8, addr=/192.168.24.15:53612
2018-06-06T15:41:46,123 | TRACE | nioEventLoopGroup-9-11 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | ret - DONE - versionBitmap
2018-06-06T15:41:46,125 | DEBUG | nioEventLoopGroup-9-11 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | version set: 4
2018-06-06T15:41:46,125 | DEBUG | nioEventLoopGroup-9-11 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | sending feature request for version=4 and xid=9
2018-06-06T15:41:46,127 | INFO  | ForkJoinPool-1-worker-7 | ElanTunnelInterfaceStateListener | 347 - org.opendaylight.netvirt.elanmanager-impl - 0.6.0 | Handling tunnel state event for srcDpId 277592196268432 and dstDpId 132658838792603
2018-06-06T15:41:46,125 | DEBUG | nioEventLoopGroup-9-11 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | future features [9] hooked ..
2018-06-06T15:41:46,131 | TRACE | nioEventLoopGroup-9-11 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | features are back
2018-06-06T15:41:46,131 | DEBUG | nioEventLoopGroup-9-11 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | obtained features: datapathId=277592196268432
2018-06-06T15:41:46,131 | DEBUG | nioEventLoopGroup-9-11 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | obtained features: auxiliaryId=0
2018-06-06T15:41:46,132 | TRACE | nioEventLoopGroup-9-11 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | handshake SETTLED: version=4, datapathId=277592196268432, auxiliaryId=0
2018-06-06T15:41:46,132 | DEBUG | nioEventLoopGroup-9-11 | HandshakeListenerImpl            | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | handshake succeeded: /192.168.24.15:53612
2018-06-06T15:41:46,132 | DEBUG | nioEventLoopGroup-9-11 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | postHandshake DONE
2018-06-06T15:41:46,134 | DEBUG | nioEventLoopGroup-9-11 | HandshakeListenerImpl            | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | succeeded by getting sweep barrier after post-handshake for device openflow:277592196268432
2018-06-06T15:41:46,134 | INFO  | nioEventLoopGroup-9-11 | ContextChainHolderImpl           | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | Device openflow:277592196268432 connected.
2018-06-06T15:41:46,136 | WARN  | nioEventLoopGroup-9-11 | ContextChainHolderImpl           | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | Device openflow:277592196268432 already connected. Closing previous connection
2018-06-06T15:41:46,138 | INFO  | nioEventLoopGroup-9-11 | ReconciliationManagerImpl        | 377 - org.opendaylight.openflowplugin.applications.reconciliation-framework - 0.6.0 | Stopping reconciliation for node Uri [_value=openflow:277592196268432]
2018-06-06T15:41:46,142 | INFO  | nioEventLoopGroup-9-11 | ContextChainImpl                 | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | Closed clustering services registration for node openflow:277592196268432

Comment 5 Tim Rozet 2018-06-11 15:27:54 UTC
This is a bug that has been around for a while in ODL.  The OF controller will be set to multiple networks for the same node by ODL.  Really OVSDB should only set the OF controller to the same IP used to connect to it via the client.

Comment 6 Vishal Thapar 2018-06-11 15:52:43 UTC
As per discussion with trozet, issue is two controller IPs, public and private, configured on ODL. Two connections from same switch result in cancelling reconciliation and end up in a loop of reconnecting the two IPs alternately. This can be fixed by configuring ODL to listen on internal IP only.

Reassigning to trozet as puppet bug to make the change in OF configuration to only listen on specific IP.

Comment 7 Tim Rozet 2018-06-11 18:59:03 UTC
After discussing more with Vishal, there is a deeper bug in how the OF controller is being set.  In ELAN netvirt code it calls southboundutils in ovsdb to set the openflow controller.  There is a check there to see what IP/protocol is used by the ovsdb node used to connect in ovsdb.  However, if that information is unavailable, the code assumes to use TCP with the addresses that ODL is currently listening on.  We need to fix that with this bug (otherwise TLS deployments could also fail by defaulting to TCP), but we should also provide configuration via puppet-odl to bind OF/OVSDB IP addresses when we start ODL.

Comment 21 errata-xmlrpc 2018-07-19 13:53:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2215