Bug 1615437

Summary: [IPv6] OVSDB controller not set properly
Product: Red Hat OpenStack Reporter: Janki <jchhatba>
Component: opendaylight Assignee: Vishal Thapar <vthapar>
Status: CLOSED ERRATA QA Contact: Tomas Jamrisko <tjamrisk>
Severity: high Docs Contact:
Priority: medium    
Version: 13.0 (Queens) CC: aadam, mariel, mkolesni, nmanos, nyechiel, tjamrisk, vthapar
Target Milestone: z4 Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: IPv6
Fixed In Version: opendaylight-8.3.0-5.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-01-16 17:56:58 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1488821    

Description Janki 2018-08-13 15:12:54 UTC
Description of problem:
OVSDB controller is not connected properly when ODL is deployed with IPv6.

Version-Release number of selected component (if applicable):
OSP13

How reproducible:
Always

Steps to Reproduce:
1. Deploy ODL + OS on v6 underlay networks

Actual results:
ODL does not add brackets around the IPv6 address before setting the controller.

Expected results:
ODL should itself set the controller with the IPv6 address enclosed in [ ].

Additional info:
# ovs-vsctl show
    Bridge br-int
        Controller "tcp:[fd00:fd00:fd00:2000::17]:6653"
            is_connected: true
        Controller "tcp:[fd00:fd00:fd00:2000::15]:6653"
            is_connected: true
        Controller "tcp:[fd00:fd00:fd00:2000::1c]:6653"
            is_connected: true

The connections are good and the cloud is functional, but I am seeing the lines below in ovs-vswitchd.log:

2018-08-09T08:36:10.072Z|00041|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2018-08-09T08:36:10.073Z|00042|connmgr|INFO|br-int: added primary controller "tcp:fd00:fd00:fd00:2000:0:0:0:17:6653"
2018-08-09T08:36:10.073Z|00043|rconn|INFO|br-int<->tcp:fd00:fd00:fd00:2000:0:0:0:17:6653: connecting...
2018-08-09T08:36:10.073Z|00044|socket_util|ERR|fd00:fd00:fd00:2000:0:0:0:17:6653: bad port number "fd00"
2018-08-09T08:36:10.073Z|00045|stream_tcp|ERR|tcp:fd00:fd00:fd00:2000:0:0:0:17:6653: connect: Address family not supported by protocol

<snippet>

2018-08-09T08:36:10.114Z|00052|rconn|INFO|br-int<->tcp:fd00:fd00:fd00:2000:0:0:0:17:6653: waiting 2 seconds before reconnect
2018-08-09T08:36:10.164Z|00053|bridge|INFO|bridge br-int: added interface br-ex-patch on port 1
2018-08-09T08:36:10.202Z|00054|bridge|INFO|bridge br-ex: added interface br-ex-int-patch on port 2
2018-08-09T08:36:11.195Z|00055|bridge|INFO|bridge br-int: added interface tun9265abb9a21 on port 2
2018-08-09T08:36:11.195Z|00056|bridge|INFO|bridge br-int: added interface tun3ec30529e26 on port 3
2018-08-09T08:36:11.195Z|00057|bfd|INFO|tun9265abb9a21: BFD state change: admin_down->down "No Diagnostic"->"No Diagnostic".

<snippet>

2018-08-09T08:38:14.345Z|00115|connmgr|INFO|br-int: removed primary controller "tcp:fd00:fd00:fd00:2000:0:0:0:17:6653"   <---- Note: only 1 controller had been added, and it never connected. There are no logs about the other 2.
2018-08-09T08:38:14.404Z|00116|connmgr|INFO|br-int: added primary controller "tcp:[fd00:fd00:fd00:2000::17]:6653"
2018-08-09T08:38:14.404Z|00117|rconn|INFO|br-int<->tcp:[fd00:fd00:fd00:2000::17]:6653: connecting...
2018-08-09T08:38:14.404Z|00118|connmgr|INFO|br-int: added primary controller "tcp:[fd00:fd00:fd00:2000::15]:6653"
2018-08-09T08:38:14.404Z|00119|rconn|INFO|br-int<->tcp:[fd00:fd00:fd00:2000::15]:6653: connecting...
2018-08-09T08:38:14.404Z|00120|connmgr|INFO|br-int: added primary controller "tcp:[fd00:fd00:fd00:2000::1c]:6653"     <-----  Now all 3 ODL controllers are added with [ ]. These brackets are added by TripleO*
2018-08-09T08:38:14.404Z|00121|rconn|INFO|br-int<->tcp:[fd00:fd00:fd00:2000::1c]:6653: connecting...
2018-08-09T08:38:14.423Z|00122|rconn|INFO|br-int<->tcp:[fd00:fd00:fd00:2000::17]:6653: connected
2018-08-09T08:38:14.423Z|00123|rconn|INFO|br-int<->tcp:[fd00:fd00:fd00:2000::15]:6653: connected
2018-08-09T08:38:14.423Z|00124|rconn|INFO|br-int<->tcp:[fd00:fd00:fd00:2000::1c]:6653: connected
2018-08-09T08:38:25.608Z|00125|connmgr|INFO|br-int<->tcp:[fd00:fd00:fd00:2000::17]:6653: 89 flow_mods 10 s ago (89 adds)
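
To illustrate why the unbracketed target is ambiguous, here is a minimal sketch. It is not OVS's actual parser; `split_host_port` and `is_valid_port` are hypothetical helpers that tokenize a `host:port` string the way a simple parser might: a bracketed host is read up to `]`, an unbracketed host up to the first `:`. With the unbracketed target from the log, the token that lands in the port position is `fd00`, which has the same shape as the `bad port number` error above.

```python
def split_host_port(target):
    """Split 'host:port' or '[v6host]:port' into (host, port_token).

    Hypothetical helper for illustration only; it is not OVS code.
    """
    if target.startswith("["):
        # Bracketed host: everything up to the closing bracket is the host.
        end = target.index("]")
        host = target[1:end]
        rest = target[end + 1:]
        port_token = rest[1:] if rest.startswith(":") else ""
    else:
        # Unbracketed host: the first ':' is taken as the host/port separator.
        host, _, rest = target.partition(":")
        # The port token runs up to the next ':' (if any).
        port_token = rest.split(":", 1)[0]
    return host, port_token


def is_valid_port(token):
    """Return True if token is a decimal port number in 1..65535."""
    return token.isascii() and token.isdigit() and 0 < int(token) <= 65535


if __name__ == "__main__":
    for t in ("fd00:fd00:fd00:2000:0:0:0:17:6653",
              "[fd00:fd00:fd00:2000::17]:6653"):
        host, port = split_host_port(t)
        print(t, "->", host, port, is_valid_port(port))
```

Only the bracketed form yields the full address as the host and `6653` as the port.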

From the ovsdb-tool output, the sequence of events is:

1. TripleO sets manager ([ ]  are added by TripleO)
record 32: 2018-08-09 08:36:09.898 "ovs-vsctl (invoked by /usr/bin/ruby): ovs-vsctl set-manager ptcp:6639:[::1] tcp:[fd00:fd00:fd00:2000::17]:6640 tcp:[fd00:fd00:fd00:2000::15]:6640 tcp:[fd00:fd00:fd00:2000::1c]:6640"
2. OVS configs such as local_ip and provider_mappings are applied via TripleO
3. ODL adds JUST 1 controller
record 38: 2018-08-09 08:36:10.058
        table Port insert row "br-int" (84392a0d):
                name=br-int
                interfaces=[6aff6970-b0b8-4695-a2eb-17057ebcef29]
        table Controller insert row 26e7edb4:
                target="tcp:fd00:fd00:fd00:2000:0:0:0:17:6653"
        table Interface insert row "br-int" (6aff6970):
                name=br-int
                type=internal
        table Bridge insert row "br-int" (47add927):
                name=br-int
                ports=[84392a0d-3ccf-4446-8b59-288243c1fde7]
                fail_mode=secure
                controller=[26e7edb4-c2ba-48b9-b1ae-51f470d8cf23]
                other_config={disable-in-band="true", hwaddr="1c:b6:83:9a:32:6a"}
                external_ids={opendaylight-iid="/network-topology:network-topology/network-topology:topology[network-topology:topology-id='ovsdb:1']/network-topology:node[network-topology:node-id='ovsdb://uuid/0cbe973b-0bc4-459c-8142-8fe1612d1928/bridge/br-int']"}
                protocols=["OpenFlow13"]
        table Open_vSwitch row 0cbe973b (0cbe973b):
                bridges=[059dab30-59d5-41e3-b382-53efc10ade46, 47add927-ebe1-4707-8467-83e8553d3991, e946b18a-8ee0-40b8-97cc-024e8e4e38a3]
4. Tunnel ports are added
5. TripleO checks for the OF pipeline and finds that flows are missing because no controller is connected yet, so it tries to sync it:
    5.1 deletes controllers (in this case 1)
           record 70: 2018-08-09 08:38:14.344 "ovs-vsctl (invoked by sh): ovs-vsctl del-controller br-int"
    5.2 sets it properly  ([ ]  are added by code in TripleO)
          record 72: 2018-08-09 08:38:14.403 "ovs-vsctl (invoked by sh): ovs-vsctl set-controller br-int tcp:[fd00:fd00:fd00:2000::17]:6653 tcp:[fd00:fd00:fd00:2000::15]:6653 tcp:[fd00:fd00:fd00:2000::1c]:6653"
    5.3 and then resets the manager as well
          record 74: 2018-08-09 08:38:32.331 "ovs-vsctl (invoked by /usr/bin/ruby): ovs-vsctl set-manager ptcp:6639:[::1] tcp:[fd00:fd00:fd00:2000::17]:6640 tcp:[fd00:fd00:fd00:2000::15]:6640 tcp:[fd00:fd00:fd00:2000::1c]:6640"

Now that the controllers are set with [ ], OVSDB connects properly.
We can verify the sequence from the puppet logs as well:

# journalctl | grep 08:38:14
Aug 09 08:38:14 controller-0 ovs-vsctl[124547]: ovs|00001|vsctl|INFO|Called as ovs-vsctl del-controller br-int
Aug 09 08:38:14 controller-0 ovs-vsctl[124557]: ovs|00001|vsctl|INFO|Called as ovs-vsctl set-controller br-int tcp:[fd00:fd00:fd00:2000::17]:6653 tcp:[fd00:fd00:fd00:2000::15]:6653 tcp:[fd00:fd00:fd00:2000::1c]:6653

My point is that TripleO adds [ ] around the v6 address and ONLY THEN does OVSDB connect properly (as evident from the logs, orange lines), and that happens only because the flow check fails and triggers the sync function. If we deploy without the sync function, I am pretty sure it will fail.
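
The bracketing that makes the second `set-controller` call work can be sketched as follows. This is not TripleO's code; `format_target` is a hypothetical helper that uses Python's standard `ipaddress` module to decide whether a host needs [ ] before it goes into a target string like the ones passed to `ovs-vsctl set-controller` above.

```python
import ipaddress


def format_target(host, port, proto="tcp"):
    """Build a 'proto:host:port' target string, bracketing IPv6 literals.

    Hypothetical helper for illustration; not taken from TripleO or ODL.
    """
    bare = host.strip("[]")  # tolerate hosts that are already bracketed
    try:
        is_v6 = ipaddress.ip_address(bare).version == 6
    except ValueError:
        is_v6 = False  # hostname or other non-literal: leave unbracketed
    if is_v6:
        return f"{proto}:[{bare}]:{port}"
    return f"{proto}:{bare}:{port}"


if __name__ == "__main__":
    print(format_target("fd00:fd00:fd00:2000::17", 6653))
    print(format_target("192.0.2.10", 6653))
```

The helper accepts both the compressed (`::17`) and the expanded (`0:0:0:17`) spelling of an address, since both parse as IPv6 literals.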

Comment 1 Vishal Thapar 2018-08-17 09:41:26 UTC
The OVSDB util API that netvirt calls to get the controller IPs splits the manager configuration on ':' to extract the IP address. This does not work for IPv6, where ':' is part of the address, so the controller is never configured.

The fix is to pick up whatever lies between the first and last occurrences of ':'. There is no need to add [] explicitly: if [] are present in the manager entry, they will show up in the controller entry too.
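
A minimal sketch of the difference, written in Python rather than ODL's Java. All function names are hypothetical, and `host_by_split` assumes the old code took the second ':'-separated field as the address; the actual OVSDB util code is not shown in this bug. The port 6653 is the controller port seen in the logs above.

```python
def host_by_split(manager):
    """Old approach as described: split on ':' and take one field.

    Assumption for illustration: the address is the second field.
    """
    return manager.split(":")[1]


def host_between_colons(manager):
    """Described fix: take everything between the first and last ':'."""
    first = manager.index(":")
    last = manager.rindex(":")
    if first == last:
        raise ValueError(f"expected proto:host:port, got {manager!r}")
    return manager[first + 1:last]


def controller_target(manager, of_port=6653):
    """Derive a controller target from a manager entry (hypothetical)."""
    return f"tcp:{host_between_colons(manager)}:{of_port}"


if __name__ == "__main__":
    mgr = "tcp:[fd00:fd00:fd00:2000::17]:6640"
    print(host_by_split(mgr))        # truncated at the first ':' of the address
    print(host_between_colons(mgr))  # full bracketed address
    print(controller_target(mgr))
```

Because the brackets sit between the first and last ':', they are carried over from the manager entry to the controller target without any IPv6-specific handling.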

Comment 18 Noam Manos 2018-12-31 13:42:37 UTC
Since IPv6 is not a supported RFE in ODL OSP13 (it will be OSP15 RFE), changing priority to medium.

Comment 19 Tomas Jamrisko 2019-01-04 10:01:53 UTC
Looks like I'm still seeing this:

http://pastebin.test.redhat.com/690824

the puddle: 2018-12-13.4

Comment 20 Vishal Thapar 2019-01-04 10:20:31 UTC
(In reply to Tomas Jamrisko from comment #19)
> Looks like i'm still seeing this:
> 
> http://pastebin.test.redhat.com/690824
> 
> the puddle: 2018-12-13.4

It is configured correctly: http://pastebin.test.redhat.com/690837

Note entry at line 14. An extraneous entry is being added which is causing all the logs. Actual controller connections are already correctly configured. This is a different and lower priority issue. Add karaf logs from 3 controllers to troubleshoot this.

Comment 23 errata-xmlrpc 2019-01-16 17:56:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0093