Bug 1647566 - overcloud deployment received 504 Gateway Time-out

| Field | Value |
|---|---|
| Product | Red Hat OpenStack |
| Component | openstack-tripleo-common |
| Version | 14.0 (Rocky) |
| Target Release | 14.0 (Rocky) |
| Hardware / OS | x86_64 / Linux |
| Status | CLOSED ERRATA |
| Severity / Priority | high / high |
| Keywords | TestOnly, Triaged, ZStream |
| Type | Bug |
| Reporter | Benjamin Schmaus <bschmaus> |
| Assignee | Adriano Petrich <apetrich> |
| QA Contact | Artem Hrechanychenko <ahrechan> |
| CC | bschmaus, ccopello, emacchi, jjoyce, jschluet, mburns, mcornea, slinaber, tvignaud |
| Fixed In Version | openstack-tripleo-common-9.4.1-0.20181012010880.el7ost |
| Last Closed | 2019-03-18 13:03:13 UTC |
Description
Benjamin Schmaus
2018-11-07 19:39:16 UTC
From mistral/engine.log (every line of this traceback carries the prefix `2018-11-07 19:24:55.321 1 ERROR oslo_messaging.rpc.server`; it is shown once here for readability):

```
mistral/engine.log:2018-11-07 19:24:55.321 1 ERROR oslo_messaging.rpc.server [req-a860645d-f26e-478d-924a-5e16d7351164 711cecebd3364fff84095d0d4e851397 b9de7c02071b4437879b5a00e8608329 - default default] Exception during message handling: MessagingTimeout: Timed out waiting for a reply to message ID c9a09aba5df04446851c10277f6a7c38
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
    res = self.dispatcher.dispatch(message)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 265, in dispatch
    return self._do_dispatch(endpoint, method, ctxt, args)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
    result = func(ctxt, **new_args)
  File "/usr/lib/python2.7/site-packages/mistral/engine/engine_server.py", line 137, in start_action
    **params
  File "/usr/lib/python2.7/site-packages/mistral/db/utils.py", line 88, in decorate
    return retry.call(_with_auth_context, auth_ctx, func, *args, **kw)
  File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 330, in call
    start_time=start_time)
  File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 279, in iter
    return fut.result()
  File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 422, in result
    return self.__get_result()
  File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 333, in call
    result = fn(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/mistral/db/utils.py", line 45, in _with_auth_context
    return func(*args, **kw)
  File "/usr/lib/python2.7/site-packages/mistral/engine/action_queue.py", line 94, in decorate
    res = func(*args, **kw)
  File "/usr/lib/python2.7/site-packages/mistral/engine/default_engine.py", line 104, in start_action
    timeout=timeout)
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 159, in wrapper
    result = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/mistral/engine/actions.py", line 301, in run
    timeout=timeout
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 159, in wrapper
    result = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/mistral/rpc/clients.py", line 384, in run_action
    return rpc_client_method(auth_ctx.ctx(), 'run_action', **rpc_kwargs)
  File "/usr/lib/python2.7/site-packages/mistral/rpc/oslo/oslo_client.py", line 38, in sync_call
    **kwargs
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 179, in call
    retry=self.retry)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 133, in _send
    retry=retry)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 584, in send
    call_monitor_timeout, retry=retry)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 573, in _send
    call_monitor_timeout)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 459, in wait
    message = self.waiters.get(msg_id, timeout=timeout)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 336, in get
    'to message ID %s' % msg_id)
MessagingTimeout: Timed out waiting for a reply to message ID c9a09aba5df04446851c10277f6a7c38
```

From mistral/api.log (every line of this traceback carries the prefix `2018-11-07 19:24:55.236 26 ERROR mistral.rpc.base`; again shown once):

```
mistral/api.log:2018-11-07 19:24:55.236 26 ERROR mistral.rpc.base [req-a860645d-f26e-478d-924a-5e16d7351164 711cecebd3364fff84095d0d4e851397 b9de7c02071b4437879b5a00e8608329 - default default] Caught a messaging remote error. See details of the original exception.: MessagingTimeout: Timed out waiting for a reply to message ID 166a68eb61cf41efbfbdf9cc13de5438
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/mistral/rpc/base.py", line 99, in decorator
    return method(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/mistral/rpc/clients.py", line 163, in start_action
    params=params
  File "/usr/lib/python2.7/site-packages/mistral/rpc/oslo/oslo_client.py", line 38, in sync_call
    **kwargs
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 179, in call
    retry=self.retry)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 133, in _send
    retry=retry)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 584, in send
    call_monitor_timeout, retry=retry)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 573, in _send
    call_monitor_timeout)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 459, in wait
    message = self.waiters.get(msg_id, timeout=timeout)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 336, in get
    'to message ID %s' % msg_id)
MessagingTimeout: Timed out waiting for a reply to message ID 166a68eb61cf41efbfbdf9cc13de5438

mistral/api.log:2018-11-07 19:24:55.245 26 ERROR wsme.api [req-a860645d-f26e-478d-924a-5e16d7351164 711cecebd3364fff84095d0d4e851397 b9de7c02071b4437879b5a00e8608329 - default default] Server-side error: "MessagingTimeout: Timed out waiting for a reply to message ID 166a68eb61cf41efbfbdf9cc13de5438". Detail:
```

Created attachment 1503104 [details]
Deploy Logs
Created attachment 1503105 [details]
Templates used
Created attachment 1503106 [details]
Mistral Logs
The issue is with the tripleo.parameters.update action. It's run as a sync action, so it's subject to Mistral's default RPC timeout (120s), but in this case it takes a little longer (126s, to be precise, in the logs we see). It's hard to know why it takes so long; the vast majority of the time seems to be spent in the action itself and in the Swift call. We can see a log line, "Template: overcloud.yaml", and then nothing until 75s later, which means we spend this time in plan_utils.build_env_paths, which sounds suspicious. We need to either investigate the time and cut it down, or stop calling the action synchronously.

Just a note: the undercloud in this case has 14 cores and 32 GB of memory. I'll be testing puddle RH7-RHOS-14.0 2018-11-07.2 this morning and see if I get any different results. Also, is there any way I can work around this in the meantime?

As a short-term workaround you can raise rpc_response_timeout in mistral.conf to 240. Would it be possible to access your env to debug it further?

Alright, I've pushed a bunch of fixes for that bug. The hammer is https://review.openstack.org/616556, which turns the action into a workflow, so we won't care as much about the time. https://review.openstack.org/616679 and https://review.openstack.org/616683 are two performance enhancements which ought to improve things. I wasn't able to test on the environment (it needs enough services enabled for the problem to appear).

I attempted to set the rpc_response_timeout in /usr/share/openstack-tripleo-heat-templates/puppet/services/mistral-base.yaml:

```
sed -i -e 's/mistral::rpc_response_timeout: 120/mistral::rpc_response_timeout: 720/g' \
    /usr/share/openstack-tripleo-heat-templates/puppet/services/mistral-base.yaml
```

Then I initiated the undercloud install, moved on to the overcloud, and still ended up with the gateway timeout. It appears to be set:

```
[root@ks-oseosp2-ospd ~]# docker ps | grep mistral
5729c89ed5c6  10.19.139.251:5000/rhosp14/openstack-mistral-api:latest           "kolla_start"  About an hour ago  Up About an hour (healthy)  mistral_api
f72d59309e99  10.19.139.251:5000/rhosp14/openstack-mistral-engine:latest        "kolla_start"  About an hour ago  Up About an hour (healthy)  mistral_engine
deab24685fd6  10.19.139.251:5000/rhosp14/openstack-mistral-event-engine:latest  "kolla_start"  About an hour ago  Up About an hour (healthy)  mistral_event_engine
266454b91577  10.19.139.251:5000/rhosp14/openstack-mistral-executor:latest      "kolla_start"  About an hour ago  Up About an hour (healthy)  mistral_executor

[root@ks-oseosp2-ospd ~]# docker exec -ti f72d59309e99 cat /etc/mistral/mistral.conf | grep rpc
rpc_response_timeout=720
[root@ks-oseosp2-ospd ~]# docker exec -ti deab24685fd6 cat /etc/mistral/mistral.conf | grep rpc
rpc_response_timeout=720
[root@ks-oseosp2-ospd ~]# docker exec -ti 266454b91577 cat /etc/mistral/mistral.conf | grep rpc
rpc_response_timeout=720
[root@ks-oseosp2-ospd ~]# docker exec -ti 5729c89ed5c6 cat /etc/mistral/mistral.conf | grep rpc
rpc_response_timeout=720
```

OK, I don't know then. I'd need to look with the change applied.

Thomas, the patch you applied on the undercloud seemed to resolve the issue. The deploy failed for another reason, but that is outside of Mistral :) Can you provide the code bits you updated?

Hi Benjamin, can you provide a link to how you configured TLS for the undercloud? I know how to configure TLS everywhere for overcloud nodes, but I don't know anything about TLS for the undercloud. Thanks!

I wasn't able to reproduce this issue with the latest OSP14 build (openstack-tripleo-common-9.4.1-0.20181012010888.el7ost.noarch).
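For reference, a minimal sketch of the short-term workaround suggested above (raising rpc_response_timeout to 240) on a containerized OSP 14 undercloud. The config path, the use of crudini, and the container names are assumptions based on a standard kolla layout, not commands taken from this report:

```
# Assumed location of the puppet-generated Mistral config on a
# containerized undercloud; verify the path on your own system.
CONF=/var/lib/config-data/puppet-generated/mistral/etc/mistral/mistral.conf

# Raise the RPC reply timeout from the 120s default to 240s.
sudo crudini --set "$CONF" DEFAULT rpc_response_timeout 240

# Restart the Mistral containers so they pick up the new value.
for c in mistral_api mistral_engine mistral_event_engine mistral_executor; do
    sudo docker restart "$c"
done

# Verify inside a container, mirroring the check shown earlier in this report.
sudo docker exec mistral_engine grep rpc_response_timeout /etc/mistral/mistral.conf
```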
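Note also that the sed edit shown above modifies a packaged template and would be lost on the next openstack-tripleo-heat-templates update. A more durable sketch, assuming the standard hieradata_override hook in undercloud.conf (not something used in this report; the file name is hypothetical):

```
# /home/stack/undercloud_hiera.yaml  (hypothetical file name)
# Overrides the same hiera key that mistral-base.yaml sets to 120.
mistral::rpc_response_timeout: 720
```

```
# undercloud.conf
[DEFAULT]
hieradata_override = /home/stack/undercloud_hiera.yaml
```

Re-running `openstack undercloud install` would then regenerate mistral.conf with the raised timeout instead of relying on an edited template.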
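As for the "hard to know why it's so long" question above, one way to see where the time goes is to list Mistral action executions and compare their timestamps. A sketch using python-mistralclient's OpenStackClient plugin; the exact columns shown vary by release:

```
# On the undercloud, as the stack user:
source ~/stackrc

# List recent action executions; a synchronous action such as
# tripleo.parameters.update that blocks for ~126s should show a large
# gap between its created and updated timestamps.
openstack action execution list

# Inspect a single execution for its state, inputs, and timing.
openstack action execution show <execution-id>
```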
Please re-open if you hit this issue again.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0446