Description of problem:
The OVSDB controller is not connected properly when ODL is deployed with IPv6.

Version-Release number of selected component (if applicable):
OSP13

How reproducible:
Always

Steps to Reproduce:
1. Deploy ODL + OS on v6 underlay networks

Actual results:
ODL does not add brackets around the IPv6 address before setting the controller.

Expected results:
ODL should itself connect with [ ] around the IPv6 address.

Additional info:
# ovs-vsctl show
    Bridge br-int
        Controller "tcp:[fd00:fd00:fd00:2000::17]:6653"
            is_connected: true
        Controller "tcp:[fd00:fd00:fd00:2000::15]:6653"
            is_connected: true
        Controller "tcp:[fd00:fd00:fd00:2000::1c]:6653"
            is_connected: true

The connections are good and the cloud is functional, but I am seeing the lines below in ovs-vswitchd.log:

2018-08-09T08:36:10.072Z|00041|connmgr|INFO|br-int: added service controller "punix:/var/run/openvswitch/br-int.mgmt"
2018-08-09T08:36:10.073Z|00042|connmgr|INFO|br-int: added primary controller "tcp:fd00:fd00:fd00:2000:0:0:0:17:6653"
2018-08-09T08:36:10.073Z|00043|rconn|INFO|br-int<->tcp:fd00:fd00:fd00:2000:0:0:0:17:6653: connecting...
2018-08-09T08:36:10.073Z|00044|socket_util|ERR|fd00:fd00:fd00:2000:0:0:0:17:6653: bad port number "fd00"
2018-08-09T08:36:10.073Z|00045|stream_tcp|ERR|tcp:fd00:fd00:fd00:2000:0:0:0:17:6653: connect: Address family not supported by protocol
<snippet>
2018-08-09T08:36:10.114Z|00052|rconn|INFO|br-int<->tcp:fd00:fd00:fd00:2000:0:0:0:17:6653: waiting 2 seconds before reconnect
2018-08-09T08:36:10.164Z|00053|bridge|INFO|bridge br-int: added interface br-ex-patch on port 1
2018-08-09T08:36:10.202Z|00054|bridge|INFO|bridge br-ex: added interface br-ex-int-patch on port 2
2018-08-09T08:36:11.195Z|00055|bridge|INFO|bridge br-int: added interface tun9265abb9a21 on port 2
2018-08-09T08:36:11.195Z|00056|bridge|INFO|bridge br-int: added interface tun3ec30529e26 on port 3
2018-08-09T08:36:11.195Z|00057|bfd|INFO|tun9265abb9a21: BFD state change: admin_down->down "No Diagnostic"->"No Diagnostic".
<snippet>
2018-08-09T08:38:14.345Z|00115|connmgr|INFO|br-int: removed primary controller "tcp:fd00:fd00:fd00:2000:0:0:0:17:6653" <---- Note only 1 controller was not connected. No logs about the other 2.
2018-08-09T08:38:14.404Z|00116|connmgr|INFO|br-int: added primary controller "tcp:[fd00:fd00:fd00:2000::17]:6653"
2018-08-09T08:38:14.404Z|00117|rconn|INFO|br-int<->tcp:[fd00:fd00:fd00:2000::17]:6653: connecting...
2018-08-09T08:38:14.404Z|00118|connmgr|INFO|br-int: added primary controller "tcp:[fd00:fd00:fd00:2000::15]:6653"
2018-08-09T08:38:14.404Z|00119|rconn|INFO|br-int<->tcp:[fd00:fd00:fd00:2000::15]:6653: connecting...
2018-08-09T08:38:14.404Z|00120|connmgr|INFO|br-int: added primary controller "tcp:[fd00:fd00:fd00:2000::1c]:6653" <----- Now all 3 ODL controllers with [ ] are connected. These brackets are added by TripleO.
2018-08-09T08:38:14.404Z|00121|rconn|INFO|br-int<->tcp:[fd00:fd00:fd00:2000::1c]:6653: connecting...
2018-08-09T08:38:14.423Z|00122|rconn|INFO|br-int<->tcp:[fd00:fd00:fd00:2000::17]:6653: connected
2018-08-09T08:38:14.423Z|00123|rconn|INFO|br-int<->tcp:[fd00:fd00:fd00:2000::15]:6653: connected
2018-08-09T08:38:14.423Z|00124|rconn|INFO|br-int<->tcp:[fd00:fd00:fd00:2000::1c]:6653: connected
2018-08-09T08:38:25.608Z|00125|connmgr|INFO|br-int<->tcp:[fd00:fd00:fd00:2000::17]:6653: 89 flow_mods 10 s ago (89 adds)

From the ovsdb-tool output, the sequence of events is:

1. TripleO sets the manager ([ ] are added by TripleO)

record 32: 2018-08-09 08:36:09.898 "ovs-vsctl (invoked by /usr/bin/ruby): ovs-vsctl set-manager ptcp:6639:[::1] tcp:[fd00:fd00:fd00:2000::17]:6640 tcp:[fd00:fd00:fd00:2000::15]:6640 tcp:[fd00:fd00:fd00:2000::1c]:6640"

2. OVS configs, like local_ip and provider_mappings, are applied via TripleO

3. ODL adds JUST 1 controller

record 38: 2018-08-09 08:36:10.058
  table Port insert row "br-int" (84392a0d):
    name=br-int
    interfaces=[6aff6970-b0b8-4695-a2eb-17057ebcef29]
  table Controller insert row 26e7edb4:
    target="tcp:fd00:fd00:fd00:2000:0:0:0:17:6653"
  table Interface insert row "br-int" (6aff6970):
    name=br-int
    type=internal
  table Bridge insert row "br-int" (47add927):
    name=br-int
    ports=[84392a0d-3ccf-4446-8b59-288243c1fde7]
    fail_mode=secure
    controller=[26e7edb4-c2ba-48b9-b1ae-51f470d8cf23]
    other_config={disable-in-band="true", hwaddr="1c:b6:83:9a:32:6a"}
    external_ids={opendaylight-iid="/network-topology:network-topology/network-topology:topology[network-topology:topology-id='ovsdb:1']/network-topology:node[network-topology:node-id='ovsdb://uuid/0cbe973b-0bc4-459c-8142-8fe1612d1928/bridge/br-int']"}
    protocols=["OpenFlow13"]
  table Open_vSwitch row 0cbe973b (0cbe973b):
    bridges=[059dab30-59d5-41e3-b382-53efc10ade46, 47add927-ebe1-4707-8467-83e8553d3991, e946b18a-8ee0-40b8-97cc-024e8e4e38a3]

4. Tunnel ports are added

5. TripleO checks for the OF pipeline and finds that flows are missing because there is no controller connected yet, so it tries to sync it. The code:

5.1 deletes the controllers (in this case 1)

record 70: 2018-08-09 08:38:14.344 "ovs-vsctl (invoked by sh): ovs-vsctl del-controller br-int"

5.2 sets them properly ([ ] are added by code in TripleO)

record 72: 2018-08-09 08:38:14.403 "ovs-vsctl (invoked by sh): ovs-vsctl set-controller br-int tcp:[fd00:fd00:fd00:2000::17]:6653 tcp:[fd00:fd00:fd00:2000::15]:6653 tcp:[fd00:fd00:fd00:2000::1c]:6653"

5.3 and then resets the manager as well

record 74: 2018-08-09 08:38:32.331 "ovs-vsctl (invoked by /usr/bin/ruby): ovs-vsctl set-manager ptcp:6639:[::1] tcp:[fd00:fd00:fd00:2000::17]:6640 tcp:[fd00:fd00:fd00:2000::15]:6640 tcp:[fd00:fd00:fd00:2000::1c]:6640"

Now that the controllers are set with [ ], OVSDB connects properly.
We can verify the sequence from the puppet logs as well:

# journalctl | grep 08:38:14
Aug 09 08:38:14 controller-0 ovs-vsctl[124547]: ovs|00001|vsctl|INFO|Called as ovs-vsctl del-controller br-int
Aug 09 08:38:14 controller-0 ovs-vsctl[124557]: ovs|00001|vsctl|INFO|Called as ovs-vsctl set-controller br-int tcp:[fd00:fd00:fd00:2000::17]:6653 tcp:[fd00:fd00:fd00:2000::15]:6653 tcp:[fd00:fd00:fd00:2000::1c]:6653

My point is that TripleO is adding [ ] around the v6 address, and ONLY THEN does OVSDB connect properly (as evident from the logs, orange lines), and only because the flow sync function is failing. If we deploy with the sync function removed, I am pretty sure it will fail.
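For illustration, the bracket handling that the deployment currently relies on can be sketched as below. This is only a sketch under assumptions: it is not the TripleO code, the function name `openflow_target` is hypothetical, and 6653 is simply the OpenFlow port seen in the logs above.

```python
import ipaddress


def openflow_target(host: str, port: int = 6653) -> str:
    """Build an OVS controller target string (hypothetical helper).

    IPv6 literals are wrapped in [ ] so the trailing ":<port>" cannot be
    confused with the colons inside the address itself.
    """
    # Accept both bracketed and bare input; ip_address() rejects brackets.
    addr = ipaddress.ip_address(host.strip("[]"))
    if addr.version == 6:
        return f"tcp:[{addr}]:{port}"
    return f"tcp:{addr}:{port}"


if __name__ == "__main__":
    # The expanded address that ODL wrote without brackets in record 38.
    print(openflow_target("fd00:fd00:fd00:2000:0:0:0:17"))
    print(openflow_target("192.0.2.10"))
```

With the unbracketed form there is no reliable way to tell where the address ends and the port begins, which matches the `bad port number "fd00"` error in ovs-vswitchd.log.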
The OVSDB util API that netvirt calls to get the controller IPs does a split on ':' to extract the IP address from the manager configuration. This doesn't work for IPv6, which has ':' as part of the address, so the controller is never configured. The fix is to pick up whatever is between the first and last occurrences of ':'. There is no need to explicitly add []. If [] are present in the manager entry, they will show up in the controller entry too.
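A minimal sketch of the difference between the two parsing approaches, written in Python for brevity. The actual ODL utility is Java and is not reproduced here; the function names are hypothetical, and the controller port 6653 is taken from the logs in this report.

```python
def controller_ip_naive(manager_target: str) -> str:
    """Old behaviour as described: split on every ':' and take field 1."""
    return manager_target.split(":")[1]


def controller_ip_fixed(manager_target: str) -> str:
    """Described fix: keep everything between the first and last ':'."""
    first = manager_target.index(":")
    last = manager_target.rindex(":")
    if first == last:
        # Only one ':' means there is no "<proto>:<host>:<port>" shape.
        raise ValueError(f"unexpected manager target: {manager_target!r}")
    return manager_target[first + 1:last]


def controller_target(manager_target: str, of_port: int = 6653) -> str:
    """Derive the OpenFlow controller target from a manager target."""
    return f"tcp:{controller_ip_fixed(manager_target)}:{of_port}"


if __name__ == "__main__":
    mgr = "tcp:[fd00:fd00:fd00:2000::17]:6640"
    print(controller_ip_naive(mgr))   # truncated at the first inner ':'
    print(controller_ip_fixed(mgr))   # brackets carried over unchanged
    print(controller_target(mgr))
```

Because the brackets are part of the substring between the first and last ':', they carry over from the manager entry to the controller entry without any explicit bracket handling.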
Since IPv6 is not a supported RFE in ODL OSP13 (it will be OSP15 RFE), changing priority to medium.
Looks like I'm still seeing this:

http://pastebin.test.redhat.com/690824

The puddle: 2018-12-13.4
(In reply to Tomas Jamrisko from comment #19)
> Looks like i'm still seeing this:
>
> http://pastebin.test.redhat.com/690824
>
> the puddle: 2018-12-13.4

It is configured correctly:

http://pastebin.test.redhat.com/690837

Note the entry at line 14. An extraneous entry is being added, and that is what causes all the logs. The actual controller connections are already correctly configured. This is a different, lower-priority issue. Please add the karaf logs from the 3 controllers to troubleshoot it.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0093