Bug 2209452

Summary: [17.1] After a disruptive action, a request for server creation ends up with "504 Gateway Time-out".
Product: Red Hat OpenStack
Reporter: Julia Marciano <jmarcian>
Component: openstack-heat
Assignee: OSP Team <rhos-maint>
Status: CLOSED NOTABUG
QA Contact: David Rosenfeld <drosenfe>
Severity: medium
Docs Contact:
Priority: medium
Version: 17.1 (Wallaby)
CC: apevec, eolivare, lmiccini, mburns, rhos-maint, tkajinam, zbitter
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Flags: ifrangs: needinfo? (rhos-maint)
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-06-05 21:16:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Julia Marciano 2023-05-23 23:29:44 UTC
Description of problem:

From:
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/Phase3/view/OSP%2017.1/view/PidOne/job/DFG-pidone-sanity-17.1_director-rhel-virthost-3cont_2comp-ipv6-geneve-tobiko_faults-sanity/20/artifact/infrared/.workspaces/active/tobiko_faults_1/tobiko_faults_1_03_faults_faults.html
tobiko/tests/faults/ha/test_cloud_recovery.py::DisruptTripleoNodesTest::test_network_disruptor_main_vip:

File "/home/stack/src/x/tobiko/tobiko/openstack/heat/_stack.py", line 192, in setup_stack
stack = self.create_stack()
File "/home/stack/src/x/tobiko/tobiko/openstack/heat/_stack.py", line 208, in create_stack
stack = self.try_create_stack()
File "/home/stack/src/x/tobiko/tobiko/openstack/heat/_stack.py", line 281, in try_create_stack
stack_id: str = self.setup_client().stacks.create(
File "/home/stack/src/x/tobiko/.tox/py3/lib/python3.9/site-packages/heatclient/v1/stacks.py", line 170, in create
resp = self.client.post('/stacks',
File "/home/stack/src/x/tobiko/.tox/py3/lib/python3.9/site-packages/keystoneauth1/adapter.py", line 401, in post
return self.request(url, 'POST', **kwargs)
File "/home/stack/src/x/tobiko/.tox/py3/lib/python3.9/site-packages/heatclient/common/http.py", line 323, in request
raise exc.from_response(resp)
heatclient.exc.HTTPException: ERROR: b"<html><body><h1>504 Gateway Time-out</h1>\nThe server didn't respond in time.\n</body></html>\n"

and:
2023-05-12 22:52:12.106 180674 DEBUG tobiko.openstack.heat._stack - Stack 'tobiko.openstack.tests._nova.TestServerCreationStack-180674-0' not found
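
For context, this is roughly the client-side call that times out, sketched here with plain heatclient/keystoneauth outside of tobiko (the auth values, endpoint and stack name are illustrative only, not taken from the job):

from keystoneauth1 import session
from keystoneauth1.identity import v3
from heatclient import client as heat_client

# Illustrative credentials/endpoint only; tobiko builds its own session
# from the cloud credentials it is configured with.
auth = v3.Password(auth_url='http://keystone.example:5000/v3',
                   username='admin', password='secret', project_name='admin',
                   user_domain_name='Default', project_domain_name='Default')
heat = heat_client.Client('1', session=session.Session(auth=auth))

# stacks.create() issues POST /v1/<tenant_id>/stacks; if heat-api does not
# answer before haproxy's server timeout, the client gets the 504 above.
heat.stacks.create(stack_name='TestServerCreationStack',
                   template={'heat_template_version': '2018-08-31',
                             'resources': {}})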


When looking at haproxy.log [1], we can see that the POST request was sent:
May 12 22:52:11 controller-0 haproxy[7]: 2620:52:0:13b8::fe:a8:55294 [12/May/2023:22:42:11.289] heat_api heat_api/controller-0.internalapi.redhat.local 0/0/0/-1/600002 504 198 - - sH-- 1/1/0/0/0 0/0 "POST /v1/`/stacks HTTP/1.1" 

The '-1/600002' timing fields and the 'sH--' termination state indicate that the backend never returned response headers and that haproxy gave up after its ~600-second server timeout.

And that request was received at controller-0's heat-api (please see [2]):
2023-05-12 22:42:11.296 20 INFO heat.common.wsgi [req-92e71a2c-1ff7-437b-83a6-ab2110ec57d1 admin admin - default default] Processing request: POST /v1/f47c400d739b49a1bc1969d317f1258a/stacks

And then we can see that the following error occurred:
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi [req-92e71a2c-1ff7-437b-83a6-ab2110ec57d1 admin admin - default default] Unexpected error occurred serving API: Timed out waiting for a reply to message ID f5db602e46e84438afee6523f0beadb9: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID f5db602e46e84438afee6523f0beadb9
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi Traceback (most recent call last):
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi   File "/usr/lib/python3.9/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 433, in get
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi     return self._queues[msg_id].get(block=True, timeout=timeout)
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi   File "/usr/lib64/python3.9/queue.py", line 179, in get
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi     raise Empty
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi _queue.Empty
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi 
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi During handling of the above exception, another exception occurred:
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi 
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi Traceback (most recent call last):
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi   File "/usr/lib/python3.9/site-packages/heat/common/wsgi.py", line 890, in __call__
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi     action_result = self.dispatch(self.controller, action,
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi   File "/usr/lib/python3.9/site-packages/heat/common/wsgi.py", line 964, in dispatch
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi     return method(*args, **kwargs)
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi   File "/usr/lib/python3.9/site-packages/heat/api/openstack/v1/util.py", line 46, in handle_stack_method
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi     return handler(controller, req, **kwargs)
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi   File "/usr/lib/python3.9/site-packages/heat/api/openstack/v1/stacks.py", line 414, in create
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi     result = self.rpc_client.create_stack(
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi   File "/usr/lib/python3.9/site-packages/heat/rpc/client.py", line 274, in create_stack
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi     return self._create_stack(ctxt, stack_name, template, params, files,
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi   File "/usr/lib/python3.9/site-packages/heat/rpc/client.py", line 295, in _create_stack
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi     return self.call(
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi   File "/usr/lib/python3.9/site-packages/heat/rpc/client.py", line 89, in call
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi     return client.call(ctxt, method, **kwargs)
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi   File "/usr/lib/python3.9/site-packages/oslo_messaging/rpc/client.py", line 175, in call
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi     self.transport._send(self.target, msg_ctxt, msg,
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi   File "/usr/lib/python3.9/site-packages/oslo_messaging/transport.py", line 123, in _send
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi     return self._driver.send(target, ctxt, message,
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi   File "/usr/lib/python3.9/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 681, in send
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi     return self._send(target, ctxt, message, wait_for_reply, timeout,
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi   File "/usr/lib/python3.9/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 670, in _send
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi     result = self._waiter.wait(msg_id, timeout,
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi   File "/usr/lib/python3.9/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in wait
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi     message = self.waiters.get(msg_id, timeout=timeout)
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi   File "/usr/lib/python3.9/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 435, in get
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi     raise oslo_messaging.MessagingTimeout(
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID f5db602e46e84438afee6523f0beadb9
2023-05-12 22:52:11.314 20 ERROR heat.common.wsgi 
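
In other words, heat-api made a synchronous RPC call to heat-engine over RabbitMQ and never got a reply within the timeout (roughly 600 s, judging by the 22:42:11 -> 22:52:11 timestamps). A minimal sketch of that call pattern with a plain oslo.messaging RPC client follows; the transport URL, topic and timeout are illustrative and not taken from the deployment:

import oslo_messaging
from oslo_config import cfg

# Illustrative transport URL; the real one points at the RabbitMQ cluster
# reachable through the internal API network.
transport = oslo_messaging.get_rpc_transport(
    cfg.CONF, url='rabbit://guest:guest@controller-0:5672/')
target = oslo_messaging.Target(topic='engine')
client = oslo_messaging.RPCClient(transport, target, timeout=600)

# call() blocks until heat-engine replies or the timeout expires. If no
# reply arrives (for example because the engine's AMQP connection was broken
# by the disruption), oslo_messaging.MessagingTimeout is raised, which is
# exactly what heat-api logs before haproxy returns the 504 to the client.
result = client.call({}, 'create_stack', stack_name='demo',
                     template={}, params={}, files={}, args={})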

[1] http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-pidone-sanity-17.1_director-rhel-virthost-3cont_2comp-ipv6-geneve-tobiko_faults-sanity/20/controller-0/var/log/containers/haproxy/haproxy.log.1.gz
[2] http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/rcj/DFG-pidone-sanity-17.1_director-rhel-virthost-3cont_2comp-ipv6-geneve-tobiko_faults-sanity/20/controller-0/var/log/containers/heat/heat_api.log.1.gz

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Run the following job:
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/Phase3/view/OSP%2017.1/view/PidOne/job/DFG-pidone-sanity-17.1_director-rhel-virthost-3cont_2comp-ipv6-geneve-tobiko_faults-sanity


Actual results:
The faults tests fail to create an instance.

Expected results:
The tests pass.

Additional info:

Comment 8 Julia Marciano 2023-06-05 21:16:53 UTC
Hi Takashi. 

It seems that test_network_disruptor_main_vip does not fully apply the disruption on IPv6 deployments - traffic to the controller node is not dropped (we are working on fixing that). So for now I'm closing the BZ as NOTABUG.

Thank you,
Julia.