Bug 1465161

Summary: Seeing IPv6 duplicate address, causing network issues
Product: Red Hat OpenStack
Reporter: rlopez
Component: openstack-neutron
Assignee: Daniel Alvarez Sanchez <dalvarez>
Status: CLOSED ERRATA
QA Contact: Toni Freger <tfreger>
Severity: high
Priority: high
Version: 10.0 (Newton)
CC: amuller, chrisw, dalvarez, nyechiel, rlopez, srevivo, tfreger
Target Milestone: z6
Keywords: TestOnly, Triaged, ZStream
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Linux
Fixed In Version: openstack-neutron-9.3.1-9.el7ost
Last Closed: 2017-11-15 13:53:31 UTC
Type: Bug

Description rlopez 2017-06-26 20:37:07 UTC
I have an OSP10 environment running on the latest bits: 3 controllers, 3 computes, and 3 existing Ceph storage nodes.

Currently working on creating OCP 3.4 heat templates for OSP10. When running the heat templates, /var/log/messages reports the following:


Jun 26 19:21:54 overcloud-controller-0 kernel: IPv6: qg-d5aa7c20-41: IPv6 duplicate address 2620:52:0:1372:f816:3eff:fe97:560 detected!

When this happens there is usually a freeze in accessing the instances, and at times it has caused our heat stack that installs OCP to fail. The OCP stack does not use IPv6, and neither does my OSP environment (to my knowledge), so I'm not sure why I'm seeing this message.
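A quick way to watch for these events live is to filter the kernel log for that exact message format (a minimal sketch; `dad_events` is a helper name introduced here, and the `journalctl` invocation assumes journald is collecting kernel messages):

```shell
# dad_events: keep only IPv6 duplicate-address-detection reports, matching
# the kernel message format shown in /var/log/messages above.
dad_events() {
    grep -E 'IPv6: [^ ]+: IPv6 duplicate address .+ detected!'
}

# Live monitoring (tail -F /var/log/messages | dad_events also works):
#   journalctl -k -f | dad_events
```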


Info from controller:

# cat /etc/sysctl.conf | grep ipv6
net.ipv6.conf.default.autoconf=0
net.ipv6.conf.default.accept_ra=0
net.ipv6.conf.all.autoconf=0
net.ipv6.conf.all.accept_ra=0



# ip netns exec qrouter-43fc42e1-f6ee-40bf-9657-1f463ab5f901 ip a


1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
58: ha-5b4d3b49-e2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1446 qdisc noqueue state UNKNOWN qlen 1000
    link/ether fa:16:3e:b6:c4:a1 brd ff:ff:ff:ff:ff:ff
    inet 169.254.192.9/18 brd 169.254.255.255 scope global ha-5b4d3b49-e2
       valid_lft forever preferred_lft forever
    inet 169.254.0.2/24 scope global ha-5b4d3b49-e2
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:feb6:c4a1/64 scope link 
       valid_lft forever preferred_lft forever
59: qg-d5aa7c20-41: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1496 qdisc noqueue state UNKNOWN qlen 1000
    link/ether fa:16:3e:97:05:60 brd ff:ff:ff:ff:ff:ff
    inet 10.19.114.187/23 scope global qg-d5aa7c20-41
       valid_lft forever preferred_lft forever
    inet 10.19.114.198/32 scope global qg-d5aa7c20-41
       valid_lft forever preferred_lft forever
    inet 10.19.114.199/32 scope global qg-d5aa7c20-41
       valid_lft forever preferred_lft forever
    inet 10.19.114.200/32 scope global qg-d5aa7c20-41
       valid_lft forever preferred_lft forever
    inet 10.19.114.207/32 scope global qg-d5aa7c20-41
       valid_lft forever preferred_lft forever
    inet 10.19.114.211/32 scope global qg-d5aa7c20-41
       valid_lft forever preferred_lft forever
    inet 10.19.114.213/32 scope global qg-d5aa7c20-41
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe97:560/64 scope link nodad 
       valid_lft forever preferred_lft forever
60: qr-e0eb81e9-5d: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1446 qdisc noqueue state UNKNOWN qlen 1000
    link/ether fa:16:3e:bc:41:15 brd ff:ff:ff:ff:ff:ff
    inet 172.22.10.1/24 scope global qr-e0eb81e9-5d
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:febc:4115/64 scope link nodad 
       valid_lft forever preferred_lft forever


Via director node using overcloudrc, I see the router that uses that ID. 


(just a snippet)
[stack@osp10-pit-director ~]$ neutron router-list
+--------------------------------------+-----------------------------------+---------------------------------------------------------------+-------------+------+
| id                                   | name                              | external_gateway_info                                         | distributed | ha   |
+--------------------------------------+-----------------------------------+---------------------------------------------------------------+-------------+------+
| 43fc42e1-f6ee-40bf-9657-1f463ab5f901 | test-external_router-w2puzwdzxmwp | {"network_id": "084884f9-d9d2-477a-bae7-26dbb4ff1873",        | False       | True |
|                                      |                                   | "enable_snat": true, "external_fixed_ips": [{"subnet_id":     |             |      |
|                                      |                                   | "732844a3-7196-4fac-a75a-cdfca872462e", "ip_address":         |             |      |
|                                      |                                   | "10.19.114.187"}]} 



From controller:

# ovs-vsctl show
5150acbd-78b0-4daa-868d-71e44f6dd898
    Manager "ptcp:6640:127.0.0.1"
        is_connected: true
    Bridge br-int
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port br-int
            Interface br-int
                type: internal
        Port "ha-5b4d3b49-e2"
            tag: 19
            Interface "ha-5b4d3b49-e2"
                type: internal
        Port "ha-3767b6fe-e0"
            tag: 1
            Interface "ha-3767b6fe-e0"
                type: internal
        Port "tap7e11f374-25"
            tag: 4
            Interface "tap7e11f374-25"
                type: internal
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port "qr-607a1668-c9"
            tag: 7
            Interface "qr-607a1668-c9"
                type: internal
        Port "ha-90061261-bf"
            tag: 9
            Interface "ha-90061261-bf"
                type: internal
        Port "ha-5801926f-08"
            tag: 3
            Interface "ha-5801926f-08"
                type: internal
        Port int-br-ex
            Interface int-br-ex
                type: patch
                options: {peer=phy-br-ex}
        Port "qr-2dff3443-6a"
            tag: 2
            Interface "qr-2dff3443-6a"
                type: internal
        Port "qr-72a5264b-47"
            tag: 20
            Interface "qr-72a5264b-47"
                type: internal
        Port "tap1964f1af-24"
            tag: 2
            Interface "tap1964f1af-24"
                type: internal
        Port "qr-d468c597-ab"
            tag: 8
            Interface "qr-d468c597-ab"
                type: internal
        Port "qr-cbde6da8-d4"
            tag: 6
            Interface "qr-cbde6da8-d4"
                type: internal
        Port "qr-e0eb81e9-5d"
            tag: 21
            Interface "qr-e0eb81e9-5d"
                type: internal
        Port "qr-e6cd86b9-8d"
            tag: 10
            Interface "qr-e6cd86b9-8d"
                type: internal
        Port "qr-f21200a3-1f"
            tag: 4
            Interface "qr-f21200a3-1f"
                type: internal
        Port "ha-98fb8c11-81"
            tag: 19
            Interface "ha-98fb8c11-81"
                type: internal
        Port "tap1c3e217e-16"
            tag: 6
            Interface "tap1c3e217e-16"
                type: internal
    Bridge br-tun
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port "vxlan-ac100416"
            Interface "vxlan-ac100416"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.4.16", out_key=flow, remote_ip="172.16.4.22"}
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port "vxlan-ac100411"
            Interface "vxlan-ac100411"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.4.16", out_key=flow, remote_ip="172.16.4.17"}
        Port "vxlan-ac10040f"
            Interface "vxlan-ac10040f"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.4.16", out_key=flow, remote_ip="172.16.4.15"}
        Port "vxlan-ac100414"
            Interface "vxlan-ac100414"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.4.16", out_key=flow, remote_ip="172.16.4.20"}
        Port "vxlan-ac10040b"
            Interface "vxlan-ac10040b"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.16.4.16", out_key=flow, remote_ip="172.16.4.11"}
        Port br-tun
            Interface br-tun
                type: internal
    Bridge br-tenant
        fail_mode: standalone
        Port "p3p1"
            Interface "p3p1"
        Port br-tenant
            Interface br-tenant
                type: internal
    Bridge br-ex
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port "qg-5af7d801-3c"
            Interface "qg-5af7d801-3c"
                type: internal
        Port "qg-a2db91ba-78"
            Interface "qg-a2db91ba-78"
                type: internal
        Port "qg-d5aa7c20-41"
            Interface "qg-d5aa7c20-41"
                type: internal
        Port "qg-d13b53fc-b3"
            Interface "qg-d13b53fc-b3"
                type: internal
        Port "qg-2b45ac4d-a1"
            Interface "qg-2b45ac4d-a1"
                type: internal
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}
        Port br-ex
            Interface br-ex
                type: internal
        Port "em3"
            Interface "em3"
    ovs_version: "2.5.0"


Let me know if you need anything else; this is stopping us from shipping the OCP 3.4 heat template RPM.

Comment 1 rlopez 2017-06-27 14:33:55 UTC
Not sure if I'm running into something like: https://bugs.launchpad.net/neutron/+bug/1459856


Also found this (older): https://bugs.launchpad.net/nova/+bug/1011134/comments/2

FYI: I've gone into every OSP instance in this stack, disabled IPv6 within sysctl.conf as follows, and rebooted.

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

I STILL see the dups :-/

Comment 2 rlopez 2017-06-27 15:13:24 UTC
The duplicate message and the interface it is complaining about:

Jun 27 14:33:56 overcloud-controller-0 kernel: IPv6: qg-d5aa7c20-41: IPv6 duplicate address 2620:52:0:1372:f816:3eff:fe97:560 detected!

Looking at the netns of the router hosting that qg-d5aa7c20-41:

59: qg-d5aa7c20-41: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1496 qdisc noqueue state UNKNOWN qlen 1000
    link/ether fa:16:3e:97:05:60 brd ff:ff:ff:ff:ff:ff
    inet 10.19.114.187/23 scope global qg-d5aa7c20-41
       valid_lft forever preferred_lft forever
    inet 10.19.114.198/32 scope global qg-d5aa7c20-41
       valid_lft forever preferred_lft forever
    inet 10.19.114.199/32 scope global qg-d5aa7c20-41
       valid_lft forever preferred_lft forever
    inet 10.19.114.200/32 scope global qg-d5aa7c20-41
       valid_lft forever preferred_lft forever
    inet 10.19.114.207/32 scope global qg-d5aa7c20-41
       valid_lft forever preferred_lft forever
    inet 10.19.114.211/32 scope global qg-d5aa7c20-41
       valid_lft forever preferred_lft forever
    inet 10.19.114.213/32 scope global qg-d5aa7c20-41
       valid_lft forever preferred_lft forever
    inet 10.19.114.188/32 scope global qg-d5aa7c20-41
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe97:560/64 scope link nodad 
       valid_lft forever preferred_lft forever
60: qr-e0eb81e9-5d: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1446 qdisc noqueue state UNKNOWN qlen 1000
    link/ether fa:16:3e:bc:41:15 brd ff:ff:ff:ff:ff:ff
    inet 172.22.10.1/24 scope global qr-e0eb81e9-5d
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:febc:4115/64 scope link nodad 
       valid_lft forever preferred_lft forever


The only one that has an inet6 address is the instance labeled test-devs, which has IPv6 disabled on the system itself. For some reason the inet6 address exists on the qg interface, but within the instance itself there are no inet6 addresses:

[cloud-user@test-devs ~]$ sudo -i
[root@test-devs ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1446 qdisc pfifo_fast state UP qlen 1000
    link/ether fa:16:3e:5c:e5:5e brd ff:ff:ff:ff:ff:ff
    inet 172.22.10.10/24 brd 172.22.10.255 scope global dynamic eth0
       valid_lft 86288sec preferred_lft 86288sec

[root@test-devs ~]# cat /etc/sysctl.conf 
# sysctl settings are defined through files in
# /usr/lib/sysctl.d/, /run/sysctl.d/, and /etc/sysctl.d/.
#
# Vendors settings live in /usr/lib/sysctl.d/.
# To override a whole file, create a new file with the same name in
# /etc/sysctl.d/ and put new settings there. To override
# only specific settings, add a file with a lexically later
# name in /etc/sysctl.d/ and put new settings there.
#
# For more information, see sysctl.conf(5) and sysctl.d(5).
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
[root@test-devs ~]# cat /proc/sys/net/ipv6/conf/default/disable_ipv6
1

When I reboot this system, or shut it off, the duplicate address messages go quiet. As soon as it's back up, the dup messages come back.

Comment 3 rlopez 2017-06-27 15:28:39 UTC
Disabling IPv6 via the netns removes the duplicate:

 ip netns exec qrouter-43fc42e1-f6ee-40bf-9657-1f463ab5f901 sysctl -w net.ipv6.conf.qr-e0eb81e9-5d.disable_ipv6=1

I went ahead and did it for all interfaces:

 ip netns exec qrouter-43fc42e1-f6ee-40bf-9657-1f463ab5f901 sysctl -w net.ipv6.conf.all.disable_ipv6=1

However, the above seems like a total hack. Why does an inet6 address get attached if it's not being used to begin with?
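For environments with many routers, the per-namespace workaround above can be scripted. A hedged sketch (the `qrouter-` prefix is Neutron's l3-agent naming convention; `list_qrouter_ns` and `disable_ipv6_in_routers` are helper names introduced here, and applying the sysctl requires root):

```shell
# list_qrouter_ns: filter `ip netns list` output down to router namespaces.
# Newer iproute2 appends "(id: N)", so keep only the first field.
list_qrouter_ns() {
    awk '/^qrouter-/ { print $1 }'
}

# Apply the workaround in every Neutron router namespace (requires root).
disable_ipv6_in_routers() {
    ip netns list | list_qrouter_ns | while read -r ns; do
        ip netns exec "$ns" sysctl -w net.ipv6.conf.all.disable_ipv6=1
    done
}
```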

Comment 4 rlopez 2017-06-27 15:34:07 UTC
More info that might be useful: https://bugs.launchpad.net/mos/+bug/1596846

Comment 6 rlopez 2017-07-11 14:35:21 UTC
Can I get a response please?

Comment 7 Assaf Muller 2017-08-07 13:41:01 UTC
Assigned to Daniel for triage.

Comment 8 Daniel Alvarez Sanchez 2017-08-07 15:10:01 UTC
Could this be a duplicate of [0]?
If so, it would be fixed in openstack-neutron-9.3.1-9.el7ost

We could confirm by capturing traffic on the controllers. However, if it's easy to reproduce, I would try that version since it looks very similar, and disabling IPv6 forwarding on the backup instance could possibly solve it.
Or at least:

ip netns exec qrouter-43fc42e1-f6ee-40bf-9657-1f463ab5f901 sysctl -w net.ipv6.conf.all.forwarding=0

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1426735

Comment 9 rlopez 2017-08-18 15:54:28 UTC
Hi Daniel,

Thanks for your reply. 

Question:

Can I easily implement this by just upgrading my controllers with that specific RPM package and restarting neutron?

Would I always require having to disable forwarding for every router that is created?

Any idea why ipv4 and ipv6 are not happy with each other when both enabled? Especially since I'm not even using the ipv6...

Comment 10 Daniel Alvarez Sanchez 2017-08-20 17:39:48 UTC
(In reply to rlopez from comment #9)
> Hi Daniel,
> 
> Thanks for your reply. 
> 
> Question:
> 
> Can I easily implement this by just upgrading my controllers with that
> specific RPM package and restarting neutron?

Yes, that should be fine.
> 
> Would I always require having to disable forwarding for every router that is
> created?

The RPM package includes the patch that does it automatically every time
a failover occurs.

> 
> Any idea why ipv4 and ipv6 are not happy with each other when both enabled?
> Especially since I'm not even using the ipv6...

In this case, since the interface has IPv6 forwarding enabled, it automatically
subscribes to several multicast groups. Therefore, when multicast traffic
is received, the backup node responds, the ToR switch learns the backup node's MAC address on its port, and traffic to the master is disrupted.
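Among those groups is the solicited-node multicast group that duplicate address detection itself relies on: per RFC 4291, a neighbor solicitation for an address is sent to ff02::1:ff followed by the address's low 24 bits. A minimal sketch computing that group (`solicited_node` is a helper written for this note, not a standard tool; it assumes the last two colon-groups of the address are written out, with no trailing `::` compression):

```shell
# solicited_node ADDR: print the solicited-node multicast group for ADDR,
# i.e. ff02::1:ff followed by the low 24 bits of the address (RFC 4291).
solicited_node() {
    # Pad the last two colon-separated groups to 4 hex digits each.
    g7=$(printf '%s' "$1" | awk -F: '{ printf "%4s", $(NF-1) }' | tr ' ' 0)
    g8=$(printf '%s' "$1" | awk -F: '{ printf "%4s", $NF }' | tr ' ' 0)
    # Keep the low byte of the second-to-last group plus the whole last group.
    printf 'ff02::1:ff%s:%s\n' "${g7#??}" "$g8"
}
```

The duplicated global address 2620:52:0:1372:f816:3eff:fe97:560 and the qg interface's link-local fe80::f816:3eff:fe97:560 share the same low 24 bits, so both map to the same group, which is how an identical address held by another subscribed node gets detected as a duplicate.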

Please note that I'll be away for the next two weeks and won't be able to look into this case until I'm back. Sorry for the inconvenience.

Comment 11 Lon Hohberger 2017-10-10 18:09:21 UTC
According to our records, this should be resolved by openstack-neutron-9.4.1-1.el7ost.  This build is available now.

Comment 16 Toni Freger 2017-10-17 16:45:44 UTC
Thanks Daniel, we can consider it verified on latest osp10 with openstack-neutron-9.4.1-2.el7ost.noarch

Comment 19 errata-xmlrpc 2017-11-15 13:53:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3234