This bug is to track fixing this issue in opendaylight
Created attachment 1448387 [details] ovs info (flows, uuid) and karaf logs for all 5 nodes one controller (controller-2) rolled it's logs over very fast (TRACE logging was enabled for openflowplugin) so I'm not sure everything is captured on that ODL node. The other two controllers had not rolled over. the node affected was controller-0. It did not have table=48 for sure, and possibly other tables.
This looks like some misconfiguration or infra issue. We're getting repeated handshakes with dpnid 277592196268432 from two IPs: 172.17.1.11 and 192.168.24.15. 192.168.24.x IP only shows up for this device, none other. All other have incoming connections on 172.17.1.x only. 172.17.1.10 172.17.1.14 172.17.1.15 172.17.1.20 We should look at configuarion/setup to determine if something gone wrong in networking. Relevant log entries: --------------------- 2018-06-06T15:41:47,113 | DEBUG | nioEventLoopGroup-9-12 | HandshakeManagerImpl | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | hello successfully sent, xid=21, addr=/172.17.1.11:36710 2018-06-06T15:41:47,114 | TRACE | nioEventLoopGroup-9-12 | HandshakeManagerImpl | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | features are back 2018-06-06T15:41:47,115 | DEBUG | nioEventLoopGroup-9-12 | HandshakeManagerImpl | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | obtained features: datapathId=277592196268432 2018-06-06T15:41:47,115 | DEBUG | nioEventLoopGroup-9-12 | HandshakeManagerImpl | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | obtained features: auxiliaryId=0 2018-06-06T15:41:47,115 | TRACE | nioEventLoopGroup-9-12 | HandshakeManagerImpl | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | handshake SETTLED: version=4, datapathId=277592196268432, auxiliaryId=0 2018-06-06T15:41:47,115 | DEBUG | nioEventLoopGroup-9-12 | HandshakeListenerImpl | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | handshake succeeded: /172.17.1.11:36710 2018-06-06T15:41:47,116 | DEBUG | nioEventLoopGroup-9-12 | HandshakeManagerImpl | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | postHandshake DONE 2018-06-06T15:41:47,117 | DEBUG | nioEventLoopGroup-9-12 | HandshakeListenerImpl | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | succeeded by getting sweep barrier after post-handshake for device openflow:277592196268432 2018-06-06T15:41:47,117 | INFO | nioEventLoopGroup-9-12 | ContextChainHolderImpl | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | Device openflow:277592196268432 connected. 2018-06-06T15:41:47,117 | WARN | nioEventLoopGroup-9-12 | ContextChainHolderImpl | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | Device openflow:277592196268432 already connected. Closing previous connection 2018-06-06T15:41:47,118 | INFO | nioEventLoopGroup-9-12 | ReconciliationManagerImpl | 377 - org.opendaylight.openflowplugin.applications.reconciliation-framework - 0.6.0 | Stopping reconciliation for node Uri [_value=openflow:277592196268432] 2018-06-06T15:41:47,119 | INFO | nioEventLoopGroup-9-12 | ContextChainImpl | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | Closed clustering services registration for node openflow:277592196268432 2018-06-06T15:41:46,123 | DEBUG | nioEventLoopGroup-9-11 | HandshakeManagerImpl | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | hello successfully sent, xid=8, addr=/192.168.24.15:53612 2018-06-06T15:41:46,123 | TRACE | nioEventLoopGroup-9-11 | HandshakeManagerImpl | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | ret - DONE - versionBitmap 2018-06-06T15:41:46,125 | DEBUG | nioEventLoopGroup-9-11 | HandshakeManagerImpl | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | version set: 4 2018-06-06T15:41:46,125 | DEBUG | nioEventLoopGroup-9-11 | HandshakeManagerImpl | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | sending feature request for version=4 and xid=9 2018-06-06T15:41:46,127 | INFO | ForkJoinPool-1-worker-7 | ElanTunnelInterfaceStateListener | 347 - org.opendaylight.netvirt.elanmanager-impl - 0.6.0 | Handling tunnel state event for srcDpId 277592196268432 and dstDpId 132658838792603 2018-06-06T15:41:46,125 | DEBUG | nioEventLoopGroup-9-11 | HandshakeManagerImpl | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | future features [9] hooked .. 2018-06-06T15:41:46,131 | TRACE | nioEventLoopGroup-9-11 | HandshakeManagerImpl | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | features are back 2018-06-06T15:41:46,131 | DEBUG | nioEventLoopGroup-9-11 | HandshakeManagerImpl | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | obtained features: datapathId=277592196268432 2018-06-06T15:41:46,131 | DEBUG | nioEventLoopGroup-9-11 | HandshakeManagerImpl | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | obtained features: auxiliaryId=0 2018-06-06T15:41:46,132 | TRACE | nioEventLoopGroup-9-11 | HandshakeManagerImpl | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | handshake SETTLED: version=4, datapathId=277592196268432, auxiliaryId=0 2018-06-06T15:41:46,132 | DEBUG | nioEventLoopGroup-9-11 | HandshakeListenerImpl | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | handshake succeeded: /192.168.24.15:53612 2018-06-06T15:41:46,132 | DEBUG | nioEventLoopGroup-9-11 | HandshakeManagerImpl | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | postHandshake DONE 2018-06-06T15:41:46,134 | DEBUG | nioEventLoopGroup-9-11 | HandshakeListenerImpl | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | succeeded by getting sweep barrier after post-handshake for device openflow:277592196268432 2018-06-06T15:41:46,134 | INFO | nioEventLoopGroup-9-11 | ContextChainHolderImpl | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | Device openflow:277592196268432 connected. 2018-06-06T15:41:46,136 | WARN | nioEventLoopGroup-9-11 | ContextChainHolderImpl | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | Device openflow:277592196268432 already connected. Closing previous connection 2018-06-06T15:41:46,138 | INFO | nioEventLoopGroup-9-11 | ReconciliationManagerImpl | 377 - org.opendaylight.openflowplugin.applications.reconciliation-framework - 0.6.0 | Stopping reconciliation for node Uri [_value=openflow:277592196268432] 2018-06-06T15:41:46,142 | INFO | nioEventLoopGroup-9-11 | ContextChainImpl | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | Closed clustering services registration for node openflow:277592196268432
This is a bug that has been around for a while in ODL. The OF controller will be set to multiple networks for the same node by ODL. Really OVSDB should only set the OF controller to the same IP used to connect to it via the client.
As per discussion with trozet, issue is two controller IPs, public and private, configured on ODL. Two connections from same switch result in cancelling reconciliation and end up in a loop of reconnecting the two IPs alternately. This can be fixed by configuring ODL to listen on internal IP only. Reassigning to trozet as puppet bug to make the change in OF configuration to only listen on specific IP.
After discussing more with Vishal, there is a deeper bug in how the OF controller is being set. In ELAN netvirt code it calls southboundutils in ovsdb to set the openflow controller. There is a check there to see what IP/protocol is used by the ovsdb node used to connect in ovsdb. However, if that information is unavailable, the code assumes to use TCP with the addresses that ODL is currently listening on. We need to fix that with this bug (otherwise TLS deployments could also fail by defaulting to TCP), but we should also provide configuration via puppet-odl to bind OF/OVSDB IP addresses when we start ODL.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:2215