1615437 – [IPv6] OVSDB controller not set properly

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	opendaylight
Sub Component:
Version:	13.0 (Queens)
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	high
Target Milestone:	z4
Target Release:	13.0 (Queens)
Assignee:	Vishal Thapar
QA Contact:	Tomas Jamrisko
Docs Contact:
URL:
Whiteboard:	IPv6
Depends On:
Blocks:	1488821

Reported:	2018-08-13 15:12 UTC by Janki
Modified:	2019-01-16 17:57 UTC
CC List:	7 users
Fixed In Version:	opendaylight-8.3.0-5.el7ost
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-01-16 17:56:58 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
OpenDaylight Bug	OVSDB-466	None	None	None	2018-08-17 09:37:48 UTC
OpenDaylight gerrit	75157	None	None	None	2018-08-17 09:42:12 UTC
Red Hat Product Errata	RHBA-2019:0093	None	None	None	2019-01-16 17:57:08 UTC

Description Janki 2018-08-13 15:12:54 UTC

Description of problem:
OVSDB controller is not connected properly when ODL is deployed with IPv6.

Version-Release number of selected component (if applicable):
OSP13

How reproducible:
Always

Steps to Reproduce:
1. Deploy ODL + OpenStack on IPv6 underlay networks

Actual results:
ODL does not add brackets around the IPv6 address before setting the controller

Expected results:
ODL should itself set the controller with [ ] around the IPv6 address

Additional info:
# ovs-vsctl show
    Bridge br-int
        Controller "tcp:[fd00:fd00:fd00:2000::17]:6653"
            is_connected: true
        Controller "tcp:[fd00:fd00:fd00:2000::15]:6653"
            is_connected: true
        Controller "tcp:[fd00:fd00:fd00:2000::1c]:6653"
            is_connected: true

The connections are good and the cloud is functional, but I am seeing the lines below in ovs-vswitchd.log:

2018-08-09T08:36:10.072Z|00041|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2018-08-09T08:36:10.073Z|00042|connmgr|INFO|br-int: added primary controller "tcp:fd00:fd00:fd00:2000:0:0:0:17:6653"
2018-08-09T08:36:10.073Z|00043|rconn|INFO|br-int<->tcp:fd00:fd00:fd00:2000:0:0:0:17:6653: connecting...
2018-08-09T08:36:10.073Z|00044|socket_util|ERR|fd00:fd00:fd00:2000:0:0:0:17:6653: bad port number "fd00"
2018-08-09T08:36:10.073Z|00045|stream_tcp|ERR|tcp:fd00:fd00:fd00:2000:0:0:0:17:6653: connect: Address family not supported by protocol

<snippet>

2018-08-09T08:36:10.114Z|00052|rconn|INFO|br-int<->tcp:fd00:fd00:fd00:2000:0:0:0:17:6653: waiting 2 seconds before reconnect
2018-08-09T08:36:10.164Z|00053|bridge|INFO|bridge br-int: added interface br-ex-patch on port 1
2018-08-09T08:36:10.202Z|00054|bridge|INFO|bridge br-ex: added interface br-ex-int-patch on port 2
2018-08-09T08:36:11.195Z|00055|bridge|INFO|bridge br-int: added interface tun9265abb9a21 on port 2
2018-08-09T08:36:11.195Z|00056|bridge|INFO|bridge br-int: added interface tun3ec30529e26 on port 3
2018-08-09T08:36:11.195Z|00057|bfd|INFO|tun9265abb9a21: BFD state change: admin_down->down "No Diagnostic"->"No Diagnostic".

<snippet>

2018-08-09T08:38:14.345Z|00115|connmgr|INFO|br-int: removed primary controller "tcp:fd00:fd00:fd00:2000:0:0:0:17:6653"   <---- Note: only 1 controller (not connected) is removed. No logs about the other 2.
2018-08-09T08:38:14.404Z|00116|connmgr|INFO|br-int: added primary controller "tcp:[fd00:fd00:fd00:2000::17]:6653"
2018-08-09T08:38:14.404Z|00117|rconn|INFO|br-int<->tcp:[fd00:fd00:fd00:2000::17]:6653: connecting...
2018-08-09T08:38:14.404Z|00118|connmgr|INFO|br-int: added primary controller "tcp:[fd00:fd00:fd00:2000::15]:6653"
2018-08-09T08:38:14.404Z|00119|rconn|INFO|br-int<->tcp:[fd00:fd00:fd00:2000::15]:6653: connecting...
2018-08-09T08:38:14.404Z|00120|connmgr|INFO|br-int: added primary controller "tcp:[fd00:fd00:fd00:2000::1c]:6653"     <-----  Now all 3 ODL controllers are added with [ ] and connect. These brackets are added by TripleO.
2018-08-09T08:38:14.404Z|00121|rconn|INFO|br-int<->tcp:[fd00:fd00:fd00:2000::1c]:6653: connecting...
2018-08-09T08:38:14.423Z|00122|rconn|INFO|br-int<->tcp:[fd00:fd00:fd00:2000::17]:6653: connected
2018-08-09T08:38:14.423Z|00123|rconn|INFO|br-int<->tcp:[fd00:fd00:fd00:2000::15]:6653: connected
2018-08-09T08:38:14.423Z|00124|rconn|INFO|br-int<->tcp:[fd00:fd00:fd00:2000::1c]:6653: connected
2018-08-09T08:38:25.608Z|00125|connmgr|INFO|br-int<->tcp:[fd00:fd00:fd00:2000::17]:6653: 89 flow_mods 10 s ago (89 adds)
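
The difference between the two target forms in the log above can be checked mechanically. The following is a minimal Python sketch, not part of OVS or ODL; the function name is hypothetical, and it assumes targets of the form tcp:HOST:PORT with an IP literal as HOST (hostnames and targets without a port are rejected for simplicity).

```python
import ipaddress


def is_well_formed_target(target: str) -> bool:
    """Return True if a 'tcp:HOST:PORT' target has a usable host and port.

    Assumption of this sketch: IPv6 literals must be wrapped in [ ],
    IPv4 literals must not be, and the port must be given explicitly.
    """
    scheme, sep, rest = target.partition(":")
    if scheme != "tcp" or not sep:
        return False
    # The port is whatever follows the last ':'.
    host, sep, port = rest.rpartition(":")
    if not sep or not port.isdigit() or not 0 < int(port) < 65536:
        return False
    if host.startswith("[") and host.endswith("]"):
        expected_version = 6
        host = host[1:-1]
    else:
        expected_version = 4
    try:
        return ipaddress.ip_address(host).version == expected_version
    except ValueError:
        return False
```

With this check, the target that ODL wrote (tcp:fd00:fd00:fd00:2000:0:0:0:17:6653) is rejected because an unbracketed host is expected to be IPv4, while the bracketed targets set later are accepted.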

From the ovsdb-tool output, the sequence of events is:

1. TripleO sets manager ([ ]  are added by TripleO)
record 32: 2018-08-09 08:36:09.898 "ovs-vsctl (invoked by /usr/bin/ruby): ovs-vsctl set-manager ptcp:6639:[::1] tcp:[fd00:fd00:fd00:2000::17]:6640 tcp:[fd00:fd00:fd00:2000::15]:6640 tcp:[fd00:fd00:fd00:2000::1c]:6640"
2. OVS configs such as local_ip and provider_mappings are applied via TripleO
3. ODL adds JUST 1 controller
record 38: 2018-08-09 08:36:10.058
        table Port insert row "br-int" (84392a0d):
                name=br-int
                interfaces=[6aff6970-b0b8-4695-a2eb-17057ebcef29]
        table Controller insert row 26e7edb4:
                target="tcp:fd00:fd00:fd00:2000:0:0:0:17:6653"
        table Interface insert row "br-int" (6aff6970):
                name=br-int
                type=internal
        table Bridge insert row "br-int" (47add927):
                name=br-int
                ports=[84392a0d-3ccf-4446-8b59-288243c1fde7]
                fail_mode=secure
                controller=[26e7edb4-c2ba-48b9-b1ae-51f470d8cf23]
                other_config={disable-in-band="true", hwaddr="1c:b6:83:9a:32:6a"}
                external_ids={opendaylight-iid="/network-topology:network-topology/network-topology:topology[network-topology:topology-id='ovsdb:1']/network-topology:node[network-topology:node-id='ovsdb://uuid/0cbe973b-0bc4-459c-8142-8fe1612d1928/bridge/br-int']"}
                protocols=["OpenFlow13"]
        table Open_vSwitch row 0cbe973b (0cbe973b):
                bridges=[059dab30-59d5-41e3-b382-53efc10ade46, 47add927-ebe1-4707-8467-83e8553d3991, e946b18a-8ee0-40b8-97cc-024e8e4e38a3]
4. Tunnel ports are added
5. TripleO checks for the OF pipeline and finds that flows are missing because no controller is connected yet, so it tries to sync:
    5.1 deletes controllers (in this case 1)
           record 70: 2018-08-09 08:38:14.344 "ovs-vsctl (invoked by sh): ovs-vsctl del-controller br-int"
    5.2 sets it properly  ([ ]  are added by code in TripleO)
          record 72: 2018-08-09 08:38:14.403 "ovs-vsctl (invoked by sh): ovs-vsctl set-controller br-int tcp:[fd00:fd00:fd00:2000::17]:6653 tcp:[fd00:fd00:fd00:2000::15]:6653 tcp:[fd00:fd00:fd00:2000::1c]:6653"
    5.3 and then resets the manager as well
          record 74: 2018-08-09 08:38:32.331 "ovs-vsctl (invoked by /usr/bin/ruby): ovs-vsctl set-manager ptcp:6639:[::1] tcp:[fd00:fd00:fd00:2000::17]:6640 tcp:[fd00:fd00:fd00:2000::15]:6640 tcp:[fd00:fd00:fd00:2000::1c]:6640"
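
Steps 5.1 and 5.2 can be sketched as follows. This is a hedged Python illustration only, not TripleO's actual code: the helper names are hypothetical, and the only things taken from the records above are the ovs-vsctl del-controller and set-controller invocations and the port 6653.

```python
import ipaddress


def bracketed(ip: str) -> str:
    """Wrap IPv6 literals in [ ]; leave IPv4 literals unchanged."""
    addr = ipaddress.ip_address(ip)
    return f"[{addr}]" if addr.version == 6 else str(addr)


def resync_commands(bridge: str, odl_ips: list[str], of_port: int = 6653) -> list[list[str]]:
    """Build the argv lists for deleting and re-setting the controllers."""
    targets = [f"tcp:{bracketed(ip)}:{of_port}" for ip in odl_ips]
    return [
        ["ovs-vsctl", "del-controller", bridge],
        ["ovs-vsctl", "set-controller", bridge, *targets],
    ]


if __name__ == "__main__":
    ips = ["fd00:fd00:fd00:2000::17", "fd00:fd00:fd00:2000::15", "fd00:fd00:fd00:2000::1c"]
    for argv in resync_commands("br-int", ips):
        print(" ".join(argv))
```

The sketch only builds the command lines; it does not run them, so it can be tried without an OVS installation.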

Now that the controllers are set with [ ], OVSDB connects properly.
We can verify the sequence from the puppet logs as well:

# journalctl | grep 08:38:14
Aug 09 08:38:14 controller-0 ovs-vsctl[124547]: ovs|00001|vsctl|INFO|Called as ovs-vsctl del-controller br-int
Aug 09 08:38:14 controller-0 ovs-vsctl[124557]: ovs|00001|vsctl|INFO|Called as ovs-vsctl set-controller br-int tcp:[fd00:fd00:fd00:2000::17]:6653 tcp:[fd00:fd00:fd00:2000::15]:6653 tcp:[fd00:fd00:fd00:2000::1c]:6653

My point is that TripleO is adding [ ] around the v6 address and ONLY THEN does OVSDB connect properly (as evident from the annotated log lines above), and only because the flow sync function kicks in when the flows are found missing. If we deploy without the sync function, I am pretty sure it will fail.

Comment 1 Vishal Thapar 2018-08-17 09:41:26 UTC

The OVSDB util API called by netvirt to get controller IPs does a split on ':' to extract the IP address from the manager configuration. This doesn't work for IPv6, which has ':' as part of the address, so the controller is never configured properly.

The fix is to pick up whatever is between the first and last occurrences of ':'. There is no need to explicitly add []. If [] are present in the manager, they will show up in the controller too.
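
The parsing problem and the fix can be illustrated with a minimal Python sketch. The function names are hypothetical and this is not the OVSDB util code itself; it only assumes manager targets of the form protocol:address:port, as in the set-manager records in the description, and shows one way a plain split on ':' can go wrong.

```python
def controller_ip_naive(manager_target: str) -> str:
    """Take the second ':'-separated field as the address (breaks on IPv6)."""
    return manager_target.split(":")[1]


def controller_ip_fixed(manager_target: str) -> str:
    """Take everything between the first and last ':' as the address."""
    first = manager_target.index(":")
    last = manager_target.rindex(":")
    if first == last:
        raise ValueError(f"expected protocol:address:port, got {manager_target!r}")
    return manager_target[first + 1:last]


def controller_target(manager_target: str, of_port: int = 6653) -> str:
    """Build an OpenFlow controller target from a manager target."""
    return f"tcp:{controller_ip_fixed(manager_target)}:{of_port}"


if __name__ == "__main__":
    manager = "tcp:[fd00:fd00:fd00:2000::17]:6640"
    print(controller_ip_naive(manager))   # only a fragment of the address
    print(controller_ip_fixed(manager))   # full address, brackets preserved
    print(controller_target(manager))
```

Because the brackets sit between the first and last ':', they are carried over from the manager target unchanged, which matches the note that [] need not be added explicitly.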

Comment 18 Noam Manos 2018-12-31 13:42:37 UTC

Since IPv6 is not a supported RFE in ODL OSP13 (it will be OSP15 RFE), changing priority to medium.

Comment 19 Tomas Jamrisko 2019-01-04 10:01:53 UTC

Looks like I'm still seeing this:

http://pastebin.test.redhat.com/690824

the puddle: 2018-12-13.4

Comment 20 Vishal Thapar 2019-01-04 10:20:31 UTC

(In reply to Tomas Jamrisko from comment #19)
> Looks like I'm still seeing this:
> 
> http://pastebin.test.redhat.com/690824
> 
> the puddle: 2018-12-13.4

It is configured correctly: http://pastebin.test.redhat.com/690837

Note the entry at line 14. An extraneous entry is being added, which is causing all the logs. The actual controller connections are already correctly configured. This is a different, lower-priority issue. Please add karaf logs from the 3 controllers to troubleshoot it.

Comment 23 errata-xmlrpc 2019-01-16 17:56:58 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0093
