Changing to Networking as main DFG since the issue is specifically network related, as it reproduces:

# running boot_oc_workload.sh on the openstackclient breaks on:
+ openstack router set --external-gateway public internal_net_6804d6651b_router
HttpException: 504: Server Error for url: https://overcloud.osptest.test.metalkube.org:13696/v2.0/routers?name=internal_net_6804d6651b_router, 504 Gateway Time-out: The server didn't respond in time.

From the neutron logs I couldn't find the root cause yet:

[cloud-admin@openstackclient ~]$ ansible controller -b -mshell -a 'grep -ri internal_net_6804d6651b_router /var/log'

controller-1 | CHANGED | rc=0 >>
Binary file /var/log/journal/e3929330acc880a10e2702906537133c/system.journal matches
/var/log/containers/neutron/server.log:2022-08-15 15:00:59.995 15 DEBUG neutron.api.v2.base [req-91d4669f-26af-4722-8a54-2abeecf3575c 63b79e343f904c12a331da5e56e088e3 084edfb8d4e9430f90ab05f9f840373f - default default] Request body: {'router': {'admin_state_up': True, 'name': 'internal_net_6804d6651b_router'}} prepare_request_body /usr/lib/python3.6/site-packages/neutron/api/v2/base.py:719
/var/log/containers/neutron/server.log:2022-08-15 15:01:00.567 15 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): AddLRouterCommand(name=neutron-db9f1c9d-3611-43b8-93ee-903449e30f11, columns={'external_ids': {'neutron:router_name': 'internal_net_6804d6651b_router', 'neutron:gw_port_id': '', 'neutron:revision_number': '1', 'neutron:availability_zone_hints': ''}, 'enabled': True, 'options': {'always_learn_from_arp_request': 'false', 'dynamic_neigh_routers': 'true'}}, may_exist=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:89
/var/log/messages:Aug 15 15:26:58 controller-1 platform-python[581006]: ansible-command Invoked with _raw_params=grep -ri internal_net_6804d6651b_router /var/log _uses_shell=True warn=True stdin_add_newline=True strip_empty_ends=True argv=None chdir=None executable=None creates=None removes=None stdin=None

controller-2 | CHANGED | rc=0 >>
Binary file /var/log/journal/e3929330acc880a10e2702906537133c/system.journal matches
/var/log/containers/neutron/server.log:2022-08-15 15:01:16.833 15 INFO neutron.wsgi [req-129eff0e-51f4-40e1-97d5-cc82dd722828 63b79e343f904c12a331da5e56e088e3 084edfb8d4e9430f90ab05f9f840373f - default default] 172.17.0.20 "GET /v2.0/routers/internal_net_6804d6651b_router HTTP/1.1" status: 404 len: 311 time: 0.1050549
/var/log/containers/neutron/server.log:2022-08-15 15:01:39.799 17 INFO neutron.wsgi [req-45c683be-665a-45ea-975e-0cf01f6a5cef 63b79e343f904c12a331da5e56e088e3 084edfb8d4e9430f90ab05f9f840373f - default default] 172.17.0.20 "GET /v2.0/routers/internal_net_6804d6651b_router HTTP/1.1" status: 404 len: 311 time: 0.2596943
/var/log/messages:Aug 15 15:26:58 controller-2 platform-python[974210]: ansible-command Invoked with _raw_params=grep -ri internal_net_6804d6651b_router /var/log _uses_shell=True warn=True stdin_add_newline=True strip_empty_ends=True argv=None chdir=None executable=None creates=None removes=None stdin=None

controller-0 | CHANGED | rc=0 >>
Binary file /var/log/journal/e3929330acc880a10e2702906537133c/system.journal matches
/var/log/containers/haproxy/haproxy.log:Aug 15 15:01:16 controller-0 haproxy[12]: 10.0.0.251:46100 [15/Aug/2022:15:01:16.727] neutron~ neutron/controller-2.internalapi.osptest.test.metalkube.org 0/0/1/108/109 404 311 - - ---- 3/1/0/1/0 0/0 "GET /v2.0/routers/internal_net_6804d6651b_router HTTP/1.1"
/var/log/containers/haproxy/haproxy.log:Aug 15 15:01:28 controller-0 haproxy[12]: 10.0.0.251:46100 [15/Aug/2022:15:01:16.842] neutron~ neutron/controller-0.internalapi.osptest.test.metalkube.org 0/0/0/11224/11224 200 652 - - ---- 1/1/0/1/0 0/0 "GET /v2.0/routers?name=internal_net_6804d6651b_router HTTP/1.1"
/var/log/containers/haproxy/haproxy.log:Aug 15 15:01:39 controller-0 haproxy[12]: 10.0.0.251:47478 [15/Aug/2022:15:01:39.538] neutron~ neutron/controller-2.internalapi.osptest.test.metalkube.org 0/0/2/261/263 404 311 - - ---- 3/1/0/1/0 0/0 "GET /v2.0/routers/internal_net_6804d6651b_router HTTP/1.1"
/var/log/containers/haproxy/haproxy.log:Aug 15 15:03:39 controller-0 haproxy[12]: 10.0.0.251:47478 [15/Aug/2022:15:01:39.827] neutron~ neutron/controller-0.internalapi.osptest.test.metalkube.org 0/0/2/-1/120004 504 194 - - sH-- 1/1/0/0/0 0/0 "GET /v2.0/routers?name=internal_net_6804d6651b_router HTTP/1.1"
/var/log/containers/neutron/server.log:2022-08-15 15:01:28.066 16 INFO neutron.wsgi [req-607d6aab-289d-4596-9e2c-46ba539870c3 63b79e343f904c12a331da5e56e088e3 084edfb8d4e9430f90ab05f9f840373f - default default] 172.17.0.20 "GET /v2.0/routers?name=internal_net_6804d6651b_router HTTP/1.1" status: 200 len: 652 time: 11.2204776
/var/log/containers/neutron/server.log:2022-08-15 15:04:28.133 16 INFO neutron.wsgi [req-983eedc0-0851-43fd-beba-0c75cc3a4eb3 63b79e343f904c12a331da5e56e088e3 084edfb8d4e9430f90ab05f9f840373f - default default] 172.17.0.20 "GET /v2.0/routers?name=internal_net_6804d6651b_router HTTP/1.1" status: 200 len: 0 time: 168.3015454
/var/log/containers/neutron/server.log:2022-08-15 15:04:48.795 22 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): CheckRevisionNumberCommand(name=neutron-db9f1c9d-3611-43b8-93ee-903449e30f11, resource={'id': 'db9f1c9d-3611-43b8-93ee-903449e30f11', 'name': 'internal_net_6804d6651b_router', 'tenant_id': '084edfb8d4e9430f90ab05f9f840373f', 'admin_state_up': True, 'status': 'ACTIVE', 'external_gateway_info': None, 'gw_port_id': None, 'description': '', 'availability_zones': [], 'distributed': False, 'ha': False, 'ha_vr_id': 0, 'availability_zone_hints': [], 'routes': [], 'tags': [], 'created_at': '2022-08-15T15:01:00Z', 'updated_at': '2022-08-15T15:01:31Z', 'revision_number': 2, 'project_id': '084edfb8d4e9430f90ab05f9f840373f'}, resource_type=routers, if_exists=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:89
/var/log/containers/neutron/server.log:2022-08-15 15:04:48.796 22 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=1): UpdateLRouterCommand(name=neutron-db9f1c9d-3611-43b8-93ee-903449e30f11, columns={'external_ids': {'neutron:router_name': 'internal_net_6804d6651b_router', 'neutron:gw_port_id': '', 'neutron:revision_number': '2', 'neutron:availability_zone_hints': ''}, 'enabled': True}, if_exists=True) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:89
/var/log/messages:Aug 15 15:26:58 controller-0 platform-python[932056]: ansible-command Invoked with _raw_params=grep -ri internal_net_6804d6651b_router /var/log _uses_shell=True warn=True stdin_add_newline=True strip_empty_ends=True argv=None chdir=None executable=None creates=None removes=None stdin=None
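A note on reading the haproxy entries in this output: the failing request is logged with timers 0/0/2/-1/120004 and termination state sH--. In HAProxy's HTTP log format the fourth timer is the time spent waiting for the server's response headers (-1 when none arrived), the fifth is the total time in milliseconds, and "sH" means the server-side timeout expired while haproxy was waiting for response headers. That matches neutron-server only finishing the same query after 168 s. The sketch below pulls those fields out of such a line so that slow or timed-out requests can be filtered from haproxy.log. The regex and the function name are illustrative assumptions written against the lines pasted here, not part of any tooling used in this bug.

```python
import re

# Illustrative pattern (an assumption based on the lines in this comment):
# "<backend>/<server> t1/t2/t3/t4/t5 <status> <bytes> <cookie> <cookie> <flags>"
HAPROXY_HTTP_RE = re.compile(
    r"(?P<backend>\S+)/(?P<server>\S+) "
    r"(?P<timers>-?\d+/-?\d+/-?\d+/-?\d+/\+?\d+) "
    r"(?P<status>\d{3}) \+?\d+ \S+ \S+ (?P<term>\S{4}) "
)


def parse_haproxy_http_line(line):
    """Return timers, status and termination state, or None if no match."""
    match = HAPROXY_HTTP_RE.search(line)
    if match is None:
        return None
    timers = [int(t) for t in match.group("timers").split("/")]
    term = match.group("term")
    return {
        "server": match.group("server"),
        "status": int(match.group("status")),
        # Fourth timer: wait for server response headers, -1 if none came.
        "response_ms": timers[3],
        # Fifth timer: total time of the request as seen by haproxy.
        "total_ms": timers[4],
        "termination": term,
        # "s" = server-side timeout expired, "H" = waiting for headers.
        "server_timeout_on_headers": term.startswith("sH"),
    }


if __name__ == "__main__":
    sample = (
        "Aug 15 15:03:39 controller-0 haproxy[12]: 10.0.0.251:47478 "
        "[15/Aug/2022:15:01:39.827] neutron~ "
        "neutron/controller-0.internalapi.osptest.test.metalkube.org "
        "0/0/2/-1/120004 504 194 - - sH-- 1/1/0/0/0 0/0 "
        '"GET /v2.0/routers?name=internal_net_6804d6651b_router HTTP/1.1"'
    )
    print(parse_haproxy_http_line(sample))
```

The total of 120004 ms suggests a server timeout of about two minutes on the neutron backend. That is an inference from the timer, not something read from the haproxy configuration.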
Closing, as this issue doesn't reproduce on a more powerful BM host (96 vs. 60 cores, and 250 GB of memory).
Re-opening this one as we see the same error in HA testing. Only after a reboot of the OVN master node/controller does the cluster regain activity. IMO, not related to HW.

Creating network internal_net_3244a4dee9
+ openstack network create internal_net_3244a4dee9
Error while executing command: HttpException: 504, The server didn't respond in time.: 504 Gateway Time-out
+ echo 'Creating subnet internal_net_3244a4dee9_subnet'
Creating subnet internal_net_3244a4dee9_subnet
+ openstack subnet create --subnet-range 192.168.0.0/24 --allocation-pool start=192.168.0.10,end=192.168.0.100 --gateway 192.168.0.254 --dns-nameserver 172.22.0.1 --network internal_net_3244a4dee9 internal_net_3244a4dee9_subnet
HttpException: 504: Server Error for url: https://overcloud.osptest.test.metalkube.org:13696/v2.0/subnets, 504 Gateway Time-out: The server didn't respond in time.
+ echo 'Add subnet internal_net_3244a4dee9_subnet to router internal_net_3244a4dee9_router'
Add subnet internal_net_3244a4dee9_subnet to router internal_net_3244a4dee9_router
+ openstack router add subnet internal_net_3244a4dee9_router internal_net_3244a4dee9_subnet
No Router found for internal_net_3244a4dee9_router
+ echo 'Set external-gateway for internal_net_3244a4dee9_router'
Set external-gateway for internal_net_3244a4dee9_router
+ openstack router set --external-gateway public internal_net_3244a4dee9_router
No Router found for internal_net_3244a4dee9_router
## create security group
openstack security group list | grep ${SECGROUP_NAME}
+ grep allow-icmp-ssh-3244a4dee9
+ openstack security group list
if [ $? -ne 0 ]; then
    echo "Creating security group ${SECGROUP_NAME}"
    openstack security group create ${SECGROUP_NAME}
    echo "Creating rules for ports 22,80,443 in security group ${SECGROUP_NAME}"
    openstack security group rule create --proto icmp ${SECGROUP_NAME}
    openstack security group rule create --proto tcp --dst-port 22 ${SECGROUP_NAME}
    openstack security group rule create --proto tcp --dst-port 80 ${SECGROUP_NAME}
    openstack security group rule create --proto tcp --dst-port 443 ${SECGROUP_NAME}
fi
+ '[' 1 -ne 0 ']'
+ echo 'Creating security group allow-icmp-ssh-3244a4dee9'
Creating security group allow-icmp-ssh-3244a4dee9
+ openstack security group create allow-icmp-ssh-3244a4dee9
Error while executing command: HttpException: 504, 504 Gateway Time-out: The server didn't respond in time.
+ echo 'Creating rules for ports 22,80,443 in security group allow-icmp-ssh-3244a4dee9'
Creating rules for ports 22,80,443 in security group allow-icmp-ssh-3244a4dee9
+ openstack security group rule create --proto icmp allow-icmp-ssh-3244a4dee9
Error while executing command: HttpException: 504, 504 Gateway Time-out: The server didn't respond in time.
+ openstack security group rule create --proto tcp --dst-port 22 allow-icmp-ssh-3244a4dee9
Error while executing command: HttpException: 504, The server didn't respond in time.: 504 Gateway Time-out
+ openstack security group rule create --proto tcp --dst-port 80 allow-icmp-ssh-3244a4dee9
Error while executing command: HttpException: 504, The server didn't respond in time.: 504 Gateway Time-out
+ openstack security group rule create --proto tcp --dst-port 443 allow-icmp-ssh-3244a4dee9
While trying to set up a reproducer environment, it seems that the problem is fixed: 5/5 HA runs are passing.