Description of problem:

Issue reproduced by a tobiko job, more specifically by the test test_controllers_shutdown.

Link to tobiko results:
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/gate-sanity-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve-tobiko_faults/467/infrared/.workspaces/workspace_2022-02-09_20-08-17/tobiko_gate/tobiko_gate_05_faults_faults.html

Link to openstack logs:
http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/gate-sanity-16.1_director-rhel-virthost-3cont_2comp-ipv4-geneve-tobiko_faults/467/

Test steps:

1) Power off and power on two controllers:

2022-02-09 22:18:31.142 235662 INFO tobiko.tests.faults.ha.cloud_disruptions - Power off 2 random controller nodes: ['controller-0', 'controller-2']
2022-02-09 22:18:31.184 235662 INFO tobiko.openstack.ironic._node - Power off baremetal node '3f165d79-de5e-4c54-8f0e-9d4e02590a36' (power state = 'power on').
2022-02-09 22:18:46.381 235662 INFO tobiko.openstack.ironic._node - Power off baremetal node 'f6fcba0c-08da-4dfd-b678-7822e43c7340' (power state = 'power on').
2022-02-09 22:19:09.641 235662 INFO tobiko.tests.faults.ha.cloud_disruptions - Power on controller nodes: ['controller-0', 'controller-2']
2022-02-09 22:19:09.684 235662 INFO tobiko.openstack.ironic._node - Power on baremetal node '3f165d79-de5e-4c54-8f0e-9d4e02590a36' (power_state='power off').
2022-02-09 22:19:24.875 235662 INFO tobiko.openstack.ironic._node - Power on baremetal node 'f6fcba0c-08da-4dfd-b678-7822e43c7340' (power_state='power off').

2) Wait until all the pacemaker resources are healthy:

2022-02-09 22:23:13.672 235662 INFO tobiko.tripleo.pacemaker - pcs status checks: all resources are in healthy state

3) Create a heat stack including a port, a VM instance, etc.:

2022-02-09 22:31:33.398 235662 WARNING tobiko.openstack.heat._stack - Stack 'tobiko.openstack.tests._nova.TestServerCreationStack-235662-0' reached unexpected status: 'CREATE_FAILED'
2022-02-09 22:31:33.399 235662 INFO tobiko.openstack.heat._stack - Stack 'tobiko.openstack.tests._nova.TestServerCreationStack-235662-0' status is 'CREATE_FAILED'. Reason: Resource CREATE failed: NeutronClientException: resources.port: <html><body><h1>504 Gateway Time-out</h1> The server didn't respond in time. </body></html>

This failure affects the rest of the job and makes many other tests fail: for the remainder of the job, neutron requests keep being rejected with error code 504.
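For reference, the failing request can also be sent manually outside tobiko. The commands below are only an illustrative sketch: the 'private' network and the 'repro-port' name are placeholders, not values taken from the job, and they assume overcloudrc credentials are available on the undercloud. With the bug present, no neutron worker answers and the client eventually gets the same 504 Gateway Time-out that haproxy reports in the log excerpts below:

$ source ~/overcloudrc
$ # port creation goes through the same POST /v2.0/ports path that failed in the heat stack above
$ openstack port create --network private repro-port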
62 504 errors occurred during the execution of the job:

$ zgrep ' 504 ' controller-0/var/log/containers/haproxy/haproxy.log.1.gz | grep -c neutron
62

First 504 error at 22:31:30:

$ zgrep ' 504 ' controller-0/var/log/containers/haproxy/haproxy.log.1.gz | grep neutron | head -1
Feb 9 22:31:30 controller-0 haproxy[14]: 10.0.0.122:58088 [09/Feb/2022:22:29:30.051] neutron neutron/controller-1.internalapi.redhat.local 0/0/0/-1/120001 504 194 - - sH-- 305/1/0/0/0 0/0 "POST /v2.0/ports HTTP/1.1"

Last 504 error at 23:38:03:

$ zgrep ' 504 ' controller-0/var/log/containers/haproxy/haproxy.log.1.gz | grep neutron | tail -1
Feb 9 23:38:03 controller-0 haproxy[14]: 172.17.1.44:54154 [09/Feb/2022:23:36:03.867] neutron neutron/controller-0.internalapi.redhat.local 0/0/0/-1/120008 504 194 - - sH-- 544/6/0/0/0 0/0 "GET /v2.0/ports?network_id=425e4e84-1c49-4e94-8dfd-f966abb7d27e&device_owner=network%3Adhcp HTTP/1.1"

Some information from the neutron server logs (see comment #1 for the link to these logs):

1) Many pymysql exceptions on controller-1 between 22:18:56 and 22:21:29, while controller-0 and controller-2 are rebooting.

2) Many logs like the following on controller-1 between 22:23:19 and 22:45:46:

2022-02-09 22:23:19.606 30 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): CheckLivenessCommand() do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2022-02-09 22:23:19.606 30 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] OVSDB transaction returned TRY_AGAIN, retrying do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:93
...
2022-02-09 22:45:46.802 29 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=244067 command(idx=0): CheckLivenessCommand() do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
2022-02-09 22:45:46.802 29 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] OVSDB transaction returned TRY_AGAIN, retrying do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:93

3) The neutron server process restarts at that moment on controller-1 (22:48:23).

4) Exceptions like the following are raised on controller-1 at 22:49:19 (a minimal sketch of this failure mode is included after the steps to reproduce below):

2022-02-09 22:49:19.532 25 ERROR ovsdbapp.event Traceback (most recent call last):
2022-02-09 22:49:19.532 25 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/ovsdbapp/event.py", line 143, in notify_loop
2022-02-09 22:49:19.532 25 ERROR ovsdbapp.event     match.run(event, row, updates)
2022-02-09 22:49:19.532 25 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/networking_ovn/ovsdb/ovsdb_monitor.py", line 448, in run
2022-02-09 22:49:19.532 25 ERROR ovsdbapp.event     self.driver.delete_mac_binding_entries(row.external_ip)
2022-02-09 22:49:19.532 25 ERROR ovsdbapp.event   File "/usr/lib/python3.6/site-packages/networking_ovn/ml2/mech_driver.py", line 954, in delete_mac_binding_entries
2022-02-09 22:49:19.532 25 ERROR ovsdbapp.event     mac_binds = self._sb_ovn.db_find_rows(
2022-02-09 22:49:19.532 25 ERROR ovsdbapp.event AttributeError: 'NoneType' object has no attribute 'db_find_rows'

Version-Release number of selected component (if applicable):
RHOS-16.1-RHEL-8-20220203.n.1
python3-networking-ovn-7.3.1-1.20220113183502.el8ost.noarch.rpm

How reproducible:
Not often - I have only seen it once, and this job is executed daily.

Steps to Reproduce:
1. Reboot two controller nodes (in an environment with 3 controllers).
2. Wait until the pacemaker status is healthy.
3. Try to send any networking request.
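Regarding the AttributeError in item 4 above: the traceback suggests the MAC_Binding delete event is processed while the mech driver's southbound OVSDB connection has not yet been re-established after the restart, so self._sb_ovn is still None. The snippet below is a minimal standalone sketch of that failure mode; FakeMechDriver, the table name and the match condition are illustrative placeholders, not the actual networking_ovn code:

# Hypothetical stand-in used only to show the dereference pattern from the traceback.
class FakeMechDriver:
    def __init__(self):
        # In the real driver, _sb_ovn is populated once the southbound OVSDB
        # connection is up; right after the neutron_api restart it can still
        # be None when MAC_Binding delete events start arriving.
        self._sb_ovn = None

    def delete_mac_binding_entries(self, external_ip):
        # Same dereference as networking_ovn/ml2/mech_driver.py:954 in the traceback.
        return self._sb_ovn.db_find_rows('MAC_Binding', ('ip', '=', external_ip))

driver = FakeMechDriver()
try:
    driver.delete_mac_binding_entries('10.0.0.1')
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'db_find_rows'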
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.9 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8795