Description of problem: Running TrafficOperationsScenarioTest tempest scenarios fails if ACTIVE_STANDBY topology is enabled. Version-Release number of selected component (if applicable): 16 (RHOS_TRUNK-16.0-RHEL-8-20200113.n.0) How reproducible: 100% Steps to Reproduce: 1. enable ACTIVE_STANDBY topology in octavia controllers 2. launch tempest TrafficOperationsScenarioTest 3. Actual results: Tests fail because LB doesn't process the traffic: {0} octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest.test_basic_traffic [75.602990s] ... ok {0} octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest.test_healthmonitor_traffic [86.241716s] ... ok {0} octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest.test_l7policies_and_l7rules [125.917772s] ... FAILED [..] Traceback (most recent call last): File "/home/stack/devel/octavia-tempest-plugin/octavia_tempest_plugin/tests/scenario/v2/test_traffic_ops.py", line 618, in test_l7policies_and_l7rules url_for_member1) File "/home/stack/devel/octavia-tempest-plugin/octavia_tempest_plugin/tests/test_base.py", line 973, in assertConsistentResponse timeout=timeout, **kwargs) File "/home/stack/devel/tempest/.tox/tempest/lib/python3.6/site-packages/requests/sessions.py", line 533, in request resp = self.send(prep, **send_kwargs) File "/home/stack/devel/tempest/.tox/tempest/lib/python3.6/site-packages/requests/sessions.py", line 646, in send r = adapter.send(request, **kwargs) File "/home/stack/devel/tempest/.tox/tempest/lib/python3.6/site-packages/requests/adapters.py", line 504, in send raise ConnectTimeout(e, request=request) requests.exceptions.ConnectTimeout: HTTPConnectionPool(host='10.0.0.217', port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f8e69ee8048>, 'Connection to 10.0.0.217 timed out. (connect timeout=2)')) Expected results: Tests should pass Additional info: Bug is also reproducible with centos8 amphora. Reading amphorae logs, it seems that keepalived (2.0.10) goes into a FAULT STATE during the execution of the test: In MASTER amphora: Jan 15 16:46:33 amphora-fe00831a-b328-4587-87aa-11063687deb8 gunicorn.gunicorn.error: [898] GET // Jan 15 16:46:33 amphora-fe00831a-b328-4587-87aa-11063687deb8 gunicorn.gunicorn.access: [898] ::ffff:192.168.0.156 - - [15/Jan/2020:16:46:33 +0000] "GET // HTTP/1.1" 200 21 "-" "Octavia HaProxy Rest Client/0.5 (https://wiki.openstack.org/wi ki/Octavia)" Jan 15 16:46:33 amphora-fe00831a-b328-4587-87aa-11063687deb8 gunicorn.gunicorn.error: [898] POST /1.0/plug/network Jan 15 16:46:33 amphora-fe00831a-b328-4587-87aa-11063687deb8 amphora-agent[898]: 2020-01-15 16:46:33.810 898 INFO octavia.amphorae.backends.agent.api_server.plug [-] Plugged interface ens8 will become eth2 in the namespace amphora-haproxy Jan 15 16:46:33 amphora-fe00831a-b328-4587-87aa-11063687deb8 Keepalived_vrrp[1312]: Interface name has changed from ens8 to eth2 Jan 15 16:46:33 amphora-fe00831a-b328-4587-87aa-11063687deb8 Keepalived_vrrp[1312]: (baa7f2985b7f42ce8901194bc4c482bc) Entering FAULT STATE Jan 15 16:46:33 amphora-fe00831a-b328-4587-87aa-11063687deb8 Keepalived_vrrp[1312]: (baa7f2985b7f42ce8901194bc4c482bc) sent 0 priority In BACKUP amphora: Jan 15 16:46:34 amphora-29f925aa-0c9c-4e04-99c4-ba359ce0e785 Keepalived_vrrp[1301]: (baa7f2985b7f42ce8901194bc4c482bc) Entering MASTER STATE [...] Jan 15 16:46:53 amphora-29f925aa-0c9c-4e04-99c4-ba359ce0e785 gunicorn.gunicorn.error: [885] GET // Jan 15 16:46:53 amphora-29f925aa-0c9c-4e04-99c4-ba359ce0e785 gunicorn.gunicorn.access: [885] ::ffff:192.168.0.156 - - [15/Jan/2020:16:46:53 +0000] "GET // HTTP/1.1" 200 21 "-" "Octavia HaProxy Rest Client/0.5 (https://wiki.openstack.org/wiki/Octavia)" Jan 15 16:46:53 amphora-29f925aa-0c9c-4e04-99c4-ba359ce0e785 gunicorn.gunicorn.error: [885] POST /1.0/plug/network Jan 15 16:46:53 amphora-29f925aa-0c9c-4e04-99c4-ba359ce0e785 amphora-agent[885]: 2020-01-15 16:46:53.256 885 INFO octavia.amphorae.backends.agent.api_server.plug [-] Plugged interface ens9 will become eth3 in the namespace amphora-haproxy Jan 15 16:46:53 amphora-29f925aa-0c9c-4e04-99c4-ba359ce0e785 Keepalived_vrrp[1301]: Interface name has changed from ens9 to eth3 Jan 15 16:46:53 amphora-29f925aa-0c9c-4e04-99c4-ba359ce0e785 Keepalived_vrrp[1301]: (baa7f2985b7f42ce8901194bc4c482bc) Entering FAULT STATE Then no amphora has a VRRP VIP address on eth1, so the LB is not reachable Simple reproducer I reproduced the issue using one LB in ACTIVE_STANDBY, one listener and some iproute commands that simulate what amphora-agent does, it shows that the issue might be in keepalived code. Connect into the MASTER amphora, check VRRP ip address and VRRP state in the logs bash-4.4# ip -n amphora-haproxy a 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel state UP group default qlen 1000 link/ether fa:16:3e:1f:f5:b5 brd ff:ff:ff:ff:ff:ff inet 10.0.0.189/26 brd 10.0.0.191 scope global eth1 valid_lft forever preferred_lft forever inet 10.0.0.161/32 scope global eth1 valid_lft forever preferred_lft forever bash-4.4# journalctl -le | grep Keep [..] Jan 15 14:56:25 amphora-f3af9d57-790c-4471-ae1f-82ee48ec2f68.novalocal Keepalived_vrrp[1346]: (2877477fd523485ebb7edbb2bf0967e4) Entering BACKUP STATE Jan 15 14:56:28 amphora-f3af9d57-790c-4471-ae1f-82ee48ec2f68.novalocal Keepalived_vrrp[1346]: (2877477fd523485ebb7edbb2bf0967e4) received lower priority (90) advert from 10.0.0.173 - discarding Jan 15 14:56:29 amphora-f3af9d57-790c-4471-ae1f-82ee48ec2f68.novalocal Keepalived_vrrp[1346]: (2877477fd523485ebb7edbb2bf0967e4) Entering MASTER STATE Create a dummy interface, move it to amphora-haproxy ns, check VRRP ip address bash-4.4# ip link add veth0 type veth peer veth1 bash-4.4# ip link set veth1 up bash-4.4# ip link set veth1 netns amphora-haproxy bash-4.4# ip -n amphora-haproxy a 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel state UP group default qlen 1000 link/ether fa:16:3e:1f:f5:b5 brd ff:ff:ff:ff:ff:ff inet 10.0.0.189/26 brd 10.0.0.191 scope global eth1 valid_lft forever preferred_lft forever inet 10.0.0.161/32 scope global eth1 valid_lft forever preferred_lft forever 4: veth1@if5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether fa:0f:5b:55:44:f5 brd ff:ff:ff:ff:ff:ff link-netnsid 0 Rename the dummy interface, check VRRP ip address and the logs -> address has been removed, keepalived is in FAULT STATE bash-4.4# ip -n amphora-haproxy link set veth1 name foo0 bash-4.4# ip -n amphora-haproxy a 1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel state UP group default qlen 1000 link/ether fa:16:3e:1f:f5:b5 brd ff:ff:ff:ff:ff:ff inet 10.0.0.189/26 brd 10.0.0.191 scope global eth1 valid_lft forever preferred_lft forever 4: foo0@if5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000 link/ether fa:0f:5b:55:44:f5 brd ff:ff:ff:ff:ff:ff link-netnsid 0 bash-4.4# journalctl -le | grep Keep [..] Jan 15 16:31:27 amphora-f3af9d57-790c-4471-ae1f-82ee48ec2f68.novalocal Keepalived_vrrp[1346]: Interface name has changed from veth1 to foo0 Jan 15 16:31:27 amphora-f3af9d57-790c-4471-ae1f-82ee48ec2f68.novalocal Keepalived_vrrp[1346]: (2877477fd523485ebb7edbb2bf0967e4) Entering FAULT STATE Jan 15 16:31:27 amphora-f3af9d57-790c-4471-ae1f-82ee48ec2f68.novalocal Keepalived_vrrp[1346]: (2877477fd523485ebb7edbb2bf0967e4) sent 0 priority
The bug is not present when using more recent keepalived releases (compiled from source). Using git bisect I found out that https://github.com/acassen/keepalived/commit/30eeb48b1a0737dc7443fd421fd6613e0d55fd17 fixes the issue, this commit is included in keepalived>=2.0.11 releases. After cherrypicking that commit in keepalived 2.0.10, and updating the keepalived binary in the amphora, tests passed successfully: all run-test-pre: PYTHONHASHSEED='567330168' all run-test: commands[0] | find . -type f -name '*.pyc' -delete all run-test: commands[1] | tempest run --regex test_traffic_ops {0} octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest.test_basic_traffic [80.633249s] ... ok {0} octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest.test_healthmonitor_traffic [86.448448s] ... ok {0} octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest.test_l7policies_and_l7rules [139.114542s] ... ok {0} octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest.test_mixed_ipv4_ipv6_members_traffic [49.931187s] ... ok