Description of problem: ----------------------- After FFU upgrade from RHOS-10 to RHOS-13 next tempest test fails: tempest.scenario.test_network_v6.TestGettingAddress.test_dualnet_dhcp6_stateless_from_os[compute,id-76f26acd-9688-42b4-bc3e-cd134c4cb09e,network,slow]: ... 2018-09-15 05:55:15,198 12884 INFO [tempest.lib.common.rest_client] Request (TestGettingAddress:_run_cleanups): 202 DELETE https://[2620:52:0:13b8:5054:ff:fe3e:1]:13774/v2.1/os-keypairs/tempest-TestGettingAddress-1200369630 0.092s 2018-09-15 05:55:15,198 12884 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'} Body: None Response - Headers: {'status': '202', u'content-length': '0', 'content-location': 'https://[2620:52:0:13b8:5054:ff:fe3e:1]:13774/v2.1/os-keypairs/tempest-TestGettingAddress-1200369630', u'x-compute-request-id': 'req-3b18f64e-69d8-45a3-928c-ebf2850ced49', u'vary': 'OpenStack-API-Version,X-OpenStack-Nova-API-Version', u'server': 'Apache', u'openstack-api-version': 'compute 2.1', u'connection': 'close', u'x-openstack-nova-api-version': '2.1', u'date': 'Sat, 15 Sep 2018 09:55:15 GMT', u'content-type': 'application/json', u'x-openstack-request-id': 'req-3b18f64e-69d8-45a3-928c-ebf2850ced49'} Body: }}} Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/tempest/common/utils/__init__.py", line 88, in wrapper return f(*func_args, **func_kwargs) File "/usr/lib/python2.7/site-packages/tempest/scenario/test_network_v6.py", line 245, in test_dualnet_dhcp6_stateless_from_os self._prepare_and_test(address6_mode='dhcpv6-stateless', dualnet=True) File "/usr/lib/python2.7/site-packages/tempest/scenario/test_network_v6.py", line 196, in _prepare_and_test (ip, srv['id'], ssh.exec_command("ip address"))) File "/usr/lib/python2.7/site-packages/unittest2/case.py", line 666, in fail raise self.failureException(msg) AssertionError: Address 2003::f816:3eff:fed3:6c93 not configured for instance d8f62279-2d5a-4386-8eaa-155d2e0c92e7, ip address output is 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast qlen 1000 link/ether fa:16:3e:c6:9b:8c brd ff:ff:ff:ff:ff:ff inet 10.100.0.5/28 brd 10.100.0.15 scope global eth0 inet6 fe80::f816:3eff:fec6:9b8c/64 scope link valid_lft forever preferred_lft forever 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000 link/ether fa:16:3e:d3:6c:93 brd ff:ff:ff:ff:ff:ff inet6 fe80::f816:3eff:fed3:6c93/64 scope link valid_lft forever preferred_lft forever We'd like to understand if this is a valid failure or this test has to be skipped. Version-Release number of selected component (if applicable): ------------------------------------------------------------- openstack-neutron-12.0.3-5.el7ost.noarch openstack-neutron-bigswitch-agent-12.0.3-0.20180419202353.a59c13d.el7ost.noarch openstack-neutron-bigswitch-lldpr-12.0.3-0.20180419202353.a59c13d.el7ost.noarch openstack-neutron-common-12.0.3-5.el7ost.noarch openstack-neutron-lbaas-12.0.1-0.20180424200349.cdbf25c.el7ost.noarch openstack-neutron-metering-agent-12.0.3-5.el7ost.noarch openstack-neutron-ml2-12.0.3-5.el7ost.noarch openstack-neutron-openvswitch-12.0.3-5.el7ost.noarch openstack-neutron-sriov-nic-agent.noarch openstack-tripleo-heat-templates-8.0.4-20.el7ost.noarch Steps to Reproduce: ------------------- 1. Run test after FFWD upgrade from RHOS-10 to RHOS-13 Additional info: ---------------- Virtual environment: 3controllers + 2computes + 3ceph, uc/oc with ssl, ipv6
Seems related to an upstream bug that was filed recently as well. As a note, this is failing on both OSP 10, 13 and 14 and upstream master, so isn't specific to 14.
I reproduced this issue on my dev env today. What I found is that in case when everything works there is 2 radvd processes for network: ps aux | grep radvd 42435 90456 0.0 0.0 6556 516 ? Ss 12:25 0:00 radvd -C /var/lib/neutron/ra/1aa8bd9b-5934-48dd-a812-e5ef99769b6e.radvd.conf -p /var/lib/neutron/external/pids/1aa8bd9b-5934-48dd-a812-e5ef99769b6e.pid.radvd -u neutron -m syslog 42435 90457 0.0 0.0 6556 332 ? S 12:25 0:00 radvd -C /var/lib/neutron/ra/1aa8bd9b-5934-48dd-a812-e5ef99769b6e.radvd.conf -p /var/lib/neutron/external/pids/1aa8bd9b-5934-48dd-a812-e5ef99769b6e.pid.radvd -u neutron -m syslog But when there was this issue with one test, I found in L3 agent log error like below from radvd. And also there was one of radvd processes on host in DEFUNCT state. Kill of radvd processes and restart l3 agent container make it working for this network in my case. 2019-01-23 11:40:14.667 788919 ERROR neutron.agent.linux.utils [-] Exit code: -6; Stdin: ; Stdout: ; Stderr: *** Error in `radvd': double free or corruption (!prev): 0x000055b3508a24c0 *** ======= Backtrace: ========= /lib64/libc.so.6(+0x81489)[0x7fd114c26489] radvd(+0x6502)[0x55b34f074502] radvd(+0x3904)[0x55b34f071904] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fd114bc73d5] radvd(+0x3c9c)[0x55b34f071c9c] ======= Memory map: ======== 55b34f06e000-55b34f086000 r-xp 00000000 fc:02 25323013 /usr/sbin/radvd 55b34f285000-55b34f286000 r--p 00017000 fc:02 25323013 /usr/sbin/radvd 55b34f286000-55b34f287000 rw-p 00018000 fc:02 25323013 /usr/sbin/radvd 55b3508a1000-55b3508c2000 rw-p 00000000 00:00 0 [heap] 7fd110000000-7fd110021000 rw-p 00000000 00:00 0 7fd110021000-7fd114000000 ---p 00000000 00:00 0 7fd11477c000-7fd114791000 r-xp 00000000 fc:02 92730563 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 7fd114791000-7fd114990000 ---p 00015000 fc:02 92730563 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 7fd114990000-7fd114991000 r--p 00014000 fc:02 92730563 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 7fd114991000-7fd114992000 rw-p 00015000 fc:02 92730563 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 7fd114992000-7fd11499e000 r-xp 00000000 fc:02 92776532 /usr/lib64/libnss_files-2.17.so 7fd11499e000-7fd114b9d000 ---p 0000c000 fc:02 92776532 /usr/lib64/libnss_files-2.17.so 7fd114b9d000-7fd114b9e000 r--p 0000b000 fc:02 92776532 /usr/lib64/libnss_files-2.17.so 7fd114b9e000-7fd114b9f000 rw-p 0000c000 fc:02 92776532 /usr/lib64/libnss_files-2.17.so 7fd114b9f000-7fd114ba5000 rw-p 00000000 00:00 0 7fd114ba5000-7fd114d67000 r-xp 00000000 fc:02 92730590 /usr/lib64/libc-2.17.so 7fd114d67000-7fd114f67000 ---p 001c2000 fc:02 92730590 /usr/lib64/libc-2.17.so 7fd114f67000-7fd114f6b000 r--p 001c2000 fc:02 92730590 /usr/lib64/libc-2.17.so 7fd114f6b000-7fd114f6d000 rw-p 001c6000 fc:02 92730590 /usr/lib64/libc-2.17.so 7fd114f6d000-7fd114f72000 rw-p 00000000 00:00 0 7fd114f72000-7fd114f94000 r-xp 00000000 fc:02 92730575 /usr/lib64/ld-2.17.so 7fd11518a000-7fd11518d000 rw-p 00000000 00:00 0 7fd115191000-7fd115193000 rw-p 00000000 00:00 0 7fd115193000-7fd115194000 r--p 00021000 fc:02 92730575 /usr/lib64/ld-2.17.so 7fd115194000-7fd115195000 rw-p 00022000 fc:02 92730575 /usr/lib64/ld-2.17.so 7fd115195000-7fd115196000 rw-p 00000000 00:00 0 7ffeb89e3000-7ffeb8a04000 rw-p 00000000 00:00 0 [stack] 7ffeb8bba000-7ffeb8bbc000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] 2019-01-23 11:40:14.667 788919 DEBUG oslo_concurrency.lockutils [-] Lock "enable_radvd" released by "neutron.agent.l3.ha_router.enable_radvd" :: held 0.024s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:285 2019-01-23 11:40:14.668 788919 INFO eventlet.wsgi.server [-] Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/eventlet/wsgi.py", line 490, in handle_one_response result = self.application(self.environ, start_response) File "/usr/lib/python2.7/site-packages/webob/dec.py", line 131, in __call__ resp = self.call_func(req, *args, **self.kwargs) File "/usr/lib/python2.7/site-packages/webob/dec.py", line 196, in call_func return self.func(req, *args, **kwargs) File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha.py", line 44, in __call__ self.enqueue(router_id, state) File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha.py", line 50, in enqueue self.agent.enqueue_state_change(router_id, state) File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 158, in wrapper result = f(*args, **kwargs) File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha.py", line 127, in enqueue_state_change self._update_radvd_daemon(ri, state) File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha.py", line 167, in _update_radvd_daemon ri.enable_radvd() File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 274, in inner return f(*args, **kwargs) File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 467, in enable_radvd super(HaRouter, self).enable_radvd(internal_ports) File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 518, in enable_radvd self.radvd.enable(internal_ports) File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ra.py", line 173, in enable self._spawn_radvd(radvd_conf) File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ra.py", line 161, in _spawn_radvd pm.enable(reload_cfg=True) File "/usr/lib/python2.7/site-packages/neutron/agent/linux/external_process.py", line 88, in enable run_as_root=self.run_as_root) File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 903, in execute log_fail_as_error=log_fail_as_error, **kwargs) File "/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 151, in execute raise ProcessExecutionError(msg, returncode=returncode) ProcessExecutionError: Exit code: -6; Stdin: ; Stdout: ; Stderr: *** Error in `radvd': double free or corruption (!prev): 0x000055b3508a24c0 *** ======= Backtrace: ========= /lib64/libc.so.6(+0x81489)[0x7fd114c26489] radvd(+0x6502)[0x55b34f074502] radvd(+0x3904)[0x55b34f071904] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fd114bc73d5] radvd(+0x3c9c)[0x55b34f071c9c] ======= Memory map: ======== 55b34f06e000-55b34f086000 r-xp 00000000 fc:02 25323013 /usr/sbin/radvd 55b34f285000-55b34f286000 r--p 00017000 fc:02 25323013 /usr/sbin/radvd 55b34f286000-55b34f287000 rw-p 00018000 fc:02 25323013 /usr/sbin/radvd 55b3508a1000-55b3508c2000 rw-p 00000000 00:00 0 [heap] 7fd110000000-7fd110021000 rw-p 00000000 00:00 0 7fd110021000-7fd114000000 ---p 00000000 00:00 0 7fd11477c000-7fd114791000 r-xp 00000000 fc:02 92730563 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 7fd114791000-7fd114990000 ---p 00015000 fc:02 92730563 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 7fd114990000-7fd114991000 r--p 00014000 fc:02 92730563 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 7fd114991000-7fd114992000 rw-p 00015000 fc:02 92730563 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 7fd114992000-7fd11499e000 r-xp 00000000 fc:02 92776532 /usr/lib64/libnss_files-2.17.so 7fd11499e000-7fd114b9d000 ---p 0000c000 fc:02 92776532 /usr/lib64/libnss_files-2.17.so 7fd114b9d000-7fd114b9e000 r--p 0000b000 fc:02 92776532 /usr/lib64/libnss_files-2.17.so 7fd114b9e000-7fd114b9f000 rw-p 0000c000 fc:02 92776532 /usr/lib64/libnss_files-2.17.so 7fd114b9f000-7fd114ba5000 rw-p 00000000 00:00 0 7fd114ba5000-7fd114d67000 r-xp 00000000 fc:02 92730590 /usr/lib64/libc-2.17.so 7fd114d67000-7fd114f67000 ---p 001c2000 fc:02 92730590 /usr/lib64/libc-2.17.so 7fd114f67000-7fd114f6b000 r--p 001c2000 fc:02 92730590 /usr/lib64/libc-2.17.so 7fd114f6b000-7fd114f6d000 rw-p 001c6000 fc:02 92730590 /usr/lib64/libc-2.17.so 7fd114f6d000-7fd114f72000 rw-p 00000000 00:00 0 7fd114f72000-7fd114f94000 r-xp 00000000 fc:02 92730575 /usr/lib64/ld-2.17.so 7fd11518a000-7fd11518d000 rw-p 00000000 00:00 0 7fd115191000-7fd115193000 rw-p 00000000 00:00 0 7fd115193000-7fd115194000 r--p 00021000 fc:02 92730575 /usr/lib64/ld-2.17.so 7fd115194000-7fd115195000 rw-p 00022000 fc:02 92730575 /usr/lib64/ld-2.17.so 7fd115195000-7fd115196000 rw-p 00000000 00:00 0 7ffeb89e3000-7ffeb8a04000 rw-p 00000000 00:00 0 [stack] 7ffeb8bba000-7ffeb8bbc000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] I will investigate why such error happens.
In L3 agent log I found some interesting thing related to radvd config for this router. It looks that it prepare it "twice": 2019-01-23 11:40:14.646 788919 DEBUG neutron.agent.linux.ra [-] Enable IPv6 RA for router 1aa8bd9b-5934-48dd-a812-e5ef99769b6e enable /usr/lib/python2.7/site-packages/neutron/agent/linux/ra.py:171 2019-01-23 11:40:14.646 788919 DEBUG neutron.agent.linux.ra [-] radvd config = interface qr-29c030a8-26 { AdvSendAdvert on; MinRtrAdvInterval 30; MaxRtrAdvInterval 100; AdvLinkMTU 1500; AdvOtherConfigFlag on; prefix 2003:0:0:1::/64 { AdvOnLink on; AdvAutonomous on; }; prefix 2003::/64 { AdvOnLink on; AdvAutonomous on; }; };interface qr-29c030a8-26 { AdvSendAdvert on; MinRtrAdvInterval 30; MaxRtrAdvInterval 100; AdvLinkMTU 1500; AdvOtherConfigFlag on; prefix 2003::/64 { AdvOnLink on; AdvAutonomous on; }; }; _generate_radvd_conf /usr/lib/python2.7/site-packages/neutron/agent/linux/ra.py:130
I did same test with spawning radvd with similar config manually. On RHEL with radvd 2.17 it failed as below: [root@controller-0 neutron]# ip netns add test [root@controller-0 neutron]# ip link add veth0 type veth peer name veth1 [root@controller-0 neutron]# ip link set veth1 netns test [root@controller-0 neutron]# ip netns exec test ip link set up dev veth1 [root@controller-0 neutron]# ip link set up dev veth0 [root@controller-0 neutron]# vi /tmp/radvd.conf [root@controller-0 neutron]# ip netns exec test radvd -C /tmp/radvd.conf -p /tmp/radvd.pid --debug radvd: option '--debug' requires an argument [root@controller-0 neutron]# ip netns exec test radvd -C /tmp/radvd.conf -p /tmp/radvd.pid --debug=10 [Jan 23 15:13:16] radvd (387166): version 2.17 started [Jan 23 15:13:16] radvd (387166): veth1 interface definition ok [Jan 23 15:13:16] radvd (387166): duplicate interface definition for veth1 [Jan 23 15:13:16] radvd (387166): Freeing Interfaces [Jan 23 15:13:16] radvd (387166): freeing interface veth1 [Jan 23 15:13:16] radvd (387166): Freeing Interfaces [Jan 23 15:13:16] radvd (387166): IPv6 forwarding setting is: 0, should be 1 or 2 [Jan 23 15:13:16] radvd (387166): IPv6 forwarding seems to be disabled, but continuing anyway [Jan 23 15:13:16] radvd (387166): radvd startup PID is 387166 [Jan 23 15:13:16] radvd (387166): opened pid file /tmp/radvd.pid [Jan 23 15:13:16] radvd (387166): locked pid file /tmp/radvd.pid [Jan 23 15:13:16] radvd (387167): opened pid file /tmp/radvd.pid [Jan 23 15:13:16] radvd (387167): radvd PID is 387167 [Jan 23 15:13:16] radvd (387167): wrote pid 387167 to pid file: /tmp/radvd.pid [Jan 23 15:13:16] radvd (387166): child signaled pid file written: 387167 [Jan 23 15:13:16] radvd (387166): Freeing Interfaces [Jan 23 15:13:16] radvd (387166): freeing interface *** Error in `radvd': double free or corruption (out): 0x00007fbae8c07580 *** ======= Backtrace: ========= /lib64/libc.so.6(+0x81489)[0x7fbae8c03489] radvd(+0x642c)[0x55558303642c] radvd(+0x3904)[0x555583033904] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fbae8ba43d5] radvd(+0x3c9c)[0x555583033c9c] ======= Memory map: ======== 555583030000-555583048000 r-xp 00000000 fc:02 13864839 /usr/sbin/radvd 555583247000-555583248000 r--p 00017000 fc:02 13864839 /usr/sbin/radvd 555583248000-555583249000 rw-p 00018000 fc:02 13864839 /usr/sbin/radvd 555584bc7000-555584be8000 rw-p 00000000 00:00 0 [heap] 7fbae4000000-7fbae4021000 rw-p 00000000 00:00 0 7fbae4021000-7fbae8000000 ---p 00000000 00:00 0 7fbae8759000-7fbae876e000 r-xp 00000000 fc:02 2550645 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 7fbae876e000-7fbae896d000 ---p 00015000 fc:02 2550645 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 7fbae896d000-7fbae896e000 r--p 00014000 fc:02 2550645 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 7fbae896e000-7fbae896f000 rw-p 00015000 fc:02 2550645 /usr/lib64/libgcc_s-4.8.5-20150702.so.1 7fbae896f000-7fbae897b000 r-xp 00000000 fc:02 1725904 /usr/lib64/libnss_files-2.17.so 7fbae897b000-7fbae8b7a000 ---p 0000c000 fc:02 1725904 /usr/lib64/libnss_files-2.17.so 7fbae8b7a000-7fbae8b7b000 r--p 0000b000 fc:02 1725904 /usr/lib64/libnss_files-2.17.so 7fbae8b7b000-7fbae8b7c000 rw-p 0000c000 fc:02 1725904 /usr/lib64/libnss_files-2.17.so 7fbae8b7c000-7fbae8b82000 rw-p 00000000 00:00 0 7fbae8b82000-7fbae8d44000 r-xp 00000000 fc:02 1837011 /usr/lib64/libc-2.17.so 7fbae8d44000-7fbae8f44000 ---p 001c2000 fc:02 1837011 /usr/lib64/libc-2.17.so 7fbae8f44000-7fbae8f48000 r--p 001c2000 fc:02 1837011 /usr/lib64/libc-2.17.so 7fbae8f48000-7fbae8f4a000 rw-p 001c6000 fc:02 1837011 /usr/lib64/libc-2.17.so 7fbae8f4a000-7fbae8f4f000 rw-p 00000000 00:00 0 7fbae8f4f000-7fbae8f71000 r-xp 00000000 fc:02 443 /usr/lib64/ld-2.17.so 7fbae915f000-7fbae9162000 rw-p 00000000 00:00 0 7fbae916e000-7fbae9170000 rw-p 00000000 00:00 0 7fbae9170000-7fbae9171000 r--p 00021000 fc:02 443 /usr/lib64/ld-2.17.so 7fbae9171000-7fbae9172000 rw-p 00022000 fc:02 443 /usr/lib64/ld-2.17.so 7fbae9172000-7fbae9173000 rw-p 00000000 00:00 0 7ffdc0601000-7ffdc0622000 rw-p 00000000 00:00 0 [stack] 7ffdc0696000-7ffdc0698000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] Przerwane
Bug against radvd in RHEL reported: https://bugzilla.redhat.com/show_bug.cgi?id=1668812 But still we should find a way how to avoid such bad configs in Neutron.
What I found in neutron so far is that issue is somewhere in neutron.agent.l3.router_info.RouterInfo._process_internal_ports() method. In such bad run problematic port is added twice to internal_ports list. We can check this list and delete from it duplicates with lower revision number and that should solve this issue but I want to find out why it happens like that. I also think that the same issue may happen in upstream master branch as there are no changes between master and queens in this method.
I proposed patch in U/S neutron: https://review.openstack.org/633236 I'm also testing it in my dev environment. So far it passed tests 8 times which is good :)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:0935