Bug 1588116 - [BasicFunctioniality] OpenFlow plugin misses pushing some tables when OVS connects
Summary: [BasicFunctioniality] OpenFlow plugin misses pushing some tables when OVS con...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: opendaylight
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
medium
urgent
Target Milestone: z1
: 13.0 (Queens)
Assignee: Vishal Thapar
QA Contact: Tomas Jamrisko
URL:
Whiteboard: BasicFunctioniality
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-06-06 16:54 UTC by Tim Rozet
Modified: 2018-10-18 07:18 UTC (History)
20 users (show)

Fixed In Version: opendaylight-8.3.0-1.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1568989
Environment:
N/A
Last Closed: 2018-07-19 13:53:05 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
ovs info (flows, uuid) and karaf logs for all 5 nodes (2.30 MB, application/x-xz)
2018-06-06 16:58 UTC, jamo luhrsen
no flags Details


Links
System ID Private Priority Status Summary Last Updated
OpenDaylight gerrit 73339 0 None None None 2018-06-25 18:50:40 UTC
Red Hat Product Errata RHBA-2018:2215 0 None None None 2018-07-19 13:53:47 UTC

Comment 2 Tim Rozet 2018-06-06 16:56:36 UTC
This bug is to track fixing this issue in opendaylight

Comment 3 jamo luhrsen 2018-06-06 16:58:52 UTC
Created attachment 1448387 [details]
ovs info (flows, uuid) and karaf logs for all 5 nodes

one controller (controller-2) rolled it's logs over very fast (TRACE
logging was enabled for openflowplugin) so I'm not sure everything
is captured on that ODL node. The other two controllers had not rolled
over.

the node affected was controller-0. It did not have table=48 for sure,
and possibly other tables.

Comment 4 Vishal Thapar 2018-06-11 15:08:22 UTC
This looks like some misconfiguration or infra issue. 

We're getting repeated handshakes with dpnid 277592196268432 from two IPs: 172.17.1.11 and 192.168.24.15. 192.168.24.x IP only shows up for this device, none other. All other have incoming connections on 172.17.1.x only.

172.17.1.10
172.17.1.14
172.17.1.15
172.17.1.20

We should look at configuarion/setup to determine if something gone wrong in networking.


Relevant log entries:
---------------------
2018-06-06T15:41:47,113 | DEBUG | nioEventLoopGroup-9-12 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | hello successfully sent, xid=21, addr=/172.17.1.11:36710
2018-06-06T15:41:47,114 | TRACE | nioEventLoopGroup-9-12 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | features are back
2018-06-06T15:41:47,115 | DEBUG | nioEventLoopGroup-9-12 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | obtained features: datapathId=277592196268432
2018-06-06T15:41:47,115 | DEBUG | nioEventLoopGroup-9-12 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | obtained features: auxiliaryId=0
2018-06-06T15:41:47,115 | TRACE | nioEventLoopGroup-9-12 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | handshake SETTLED: version=4, datapathId=277592196268432, auxiliaryId=0
2018-06-06T15:41:47,115 | DEBUG | nioEventLoopGroup-9-12 | HandshakeListenerImpl            | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | handshake succeeded: /172.17.1.11:36710
2018-06-06T15:41:47,116 | DEBUG | nioEventLoopGroup-9-12 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | postHandshake DONE
2018-06-06T15:41:47,117 | DEBUG | nioEventLoopGroup-9-12 | HandshakeListenerImpl            | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | succeeded by getting sweep barrier after post-handshake for device openflow:277592196268432
2018-06-06T15:41:47,117 | INFO  | nioEventLoopGroup-9-12 | ContextChainHolderImpl           | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | Device openflow:277592196268432 connected.
2018-06-06T15:41:47,117 | WARN  | nioEventLoopGroup-9-12 | ContextChainHolderImpl           | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | Device openflow:277592196268432 already connected. Closing previous connection
2018-06-06T15:41:47,118 | INFO  | nioEventLoopGroup-9-12 | ReconciliationManagerImpl        | 377 - org.opendaylight.openflowplugin.applications.reconciliation-framework - 0.6.0 | Stopping reconciliation for node Uri [_value=openflow:277592196268432]
2018-06-06T15:41:47,119 | INFO  | nioEventLoopGroup-9-12 | ContextChainImpl                 | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | Closed clustering services registration for node openflow:277592196268432


2018-06-06T15:41:46,123 | DEBUG | nioEventLoopGroup-9-11 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | hello successfully sent, xid=8, addr=/192.168.24.15:53612
2018-06-06T15:41:46,123 | TRACE | nioEventLoopGroup-9-11 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | ret - DONE - versionBitmap
2018-06-06T15:41:46,125 | DEBUG | nioEventLoopGroup-9-11 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | version set: 4
2018-06-06T15:41:46,125 | DEBUG | nioEventLoopGroup-9-11 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | sending feature request for version=4 and xid=9
2018-06-06T15:41:46,127 | INFO  | ForkJoinPool-1-worker-7 | ElanTunnelInterfaceStateListener | 347 - org.opendaylight.netvirt.elanmanager-impl - 0.6.0 | Handling tunnel state event for srcDpId 277592196268432 and dstDpId 132658838792603
2018-06-06T15:41:46,125 | DEBUG | nioEventLoopGroup-9-11 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | future features [9] hooked ..
2018-06-06T15:41:46,131 | TRACE | nioEventLoopGroup-9-11 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | features are back
2018-06-06T15:41:46,131 | DEBUG | nioEventLoopGroup-9-11 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | obtained features: datapathId=277592196268432
2018-06-06T15:41:46,131 | DEBUG | nioEventLoopGroup-9-11 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | obtained features: auxiliaryId=0
2018-06-06T15:41:46,132 | TRACE | nioEventLoopGroup-9-11 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | handshake SETTLED: version=4, datapathId=277592196268432, auxiliaryId=0
2018-06-06T15:41:46,132 | DEBUG | nioEventLoopGroup-9-11 | HandshakeListenerImpl            | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | handshake succeeded: /192.168.24.15:53612
2018-06-06T15:41:46,132 | DEBUG | nioEventLoopGroup-9-11 | HandshakeManagerImpl             | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | postHandshake DONE
2018-06-06T15:41:46,134 | DEBUG | nioEventLoopGroup-9-11 | HandshakeListenerImpl            | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | succeeded by getting sweep barrier after post-handshake for device openflow:277592196268432
2018-06-06T15:41:46,134 | INFO  | nioEventLoopGroup-9-11 | ContextChainHolderImpl           | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | Device openflow:277592196268432 connected.
2018-06-06T15:41:46,136 | WARN  | nioEventLoopGroup-9-11 | ContextChainHolderImpl           | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | Device openflow:277592196268432 already connected. Closing previous connection
2018-06-06T15:41:46,138 | INFO  | nioEventLoopGroup-9-11 | ReconciliationManagerImpl        | 377 - org.opendaylight.openflowplugin.applications.reconciliation-framework - 0.6.0 | Stopping reconciliation for node Uri [_value=openflow:277592196268432]
2018-06-06T15:41:46,142 | INFO  | nioEventLoopGroup-9-11 | ContextChainImpl                 | 385 - org.opendaylight.openflowplugin.impl - 0.6.0 | Closed clustering services registration for node openflow:277592196268432

Comment 5 Tim Rozet 2018-06-11 15:27:54 UTC
This is a bug that has been around for a while in ODL.  The OF controller will be set to multiple networks for the same node by ODL.  Really OVSDB should only set the OF controller to the same IP used to connect to it via the client.

Comment 6 Vishal Thapar 2018-06-11 15:52:43 UTC
As per discussion with trozet, issue is two controller IPs, public and private, configured on ODL. Two connections from same switch result in cancelling reconciliation and end up in a loop of reconnecting the two IPs alternately. This can be fixed by configuring ODL to listen on internal IP only.

Reassigning to trozet as puppet bug to make the change in OF configuration to only listen on specific IP.

Comment 7 Tim Rozet 2018-06-11 18:59:03 UTC
After discussing more with Vishal, there is a deeper bug in how the OF controller is being set.  In ELAN netvirt code it calls southboundutils in ovsdb to set the openflow controller.  There is a check there to see what IP/protocol is used by the ovsdb node used to connect in ovsdb.  However, if that information is unavailable, the code assumes to use TCP with the addresses that ODL is currently listening on.  We need to fix that with this bug (otherwise TLS deployments could also fail by defaulting to TCP), but we should also provide configuration via puppet-odl to bind OF/OVSDB IP addresses when we start ODL.

Comment 21 errata-xmlrpc 2018-07-19 13:53:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2215


Note You need to log in before you can comment on or make changes to this bug.