Description of problem:
Deleting a plan from the GUI takes up to 3 times longer than it used to. I am running on a bare metal system with 32GB RAM and 12 cores (2.4 GHz). Sometimes deleting a plan fails with this server error, indicating a timeout:

Internal Server Error
{"debuginfo": null, "faultcode": "Server", "faultstring": "MessagingTimeout: Timed out waiting for a reply to message ID 085fdca34cb84fe7954c75fa560d7406"}

Version-Release number of selected component (if applicable):
openstack-tripleo-ui-3.1.0-9.el7ost.noarch
openstack-mistral-engine-4.0.0-4.el7ost.noarch

How reproducible:
Randomly

Steps to Reproduce:
1. Upload a plan
2. Delete the plan

Actual results:
It takes much longer than expected for something as simple as deleting a plan.

Additional info:
After you get the error in the GUI saying that the deletion failed with a timeout, you can press F5 and see that the deletion actually succeeded. This bug may be related to the websockets bug https://bugzilla.redhat.com/show_bug.cgi?id=1459926
Created attachment 1287003 [details]
Screenshot

Attached a screenshot of the error. You can see that the plan "BasicWorkflowPlan" is still displayed in the list after it failed to get deleted.

From the browser console you see:

https://puma01.scl.lab.tlv.redhat.com/mistral/v2/action_executions Failed to load resource: the server responded with a status of 500 (Internal Server Error)
logger.js:58 Error deleting plan MistralApiService.runAction XMLHttpRequest
error @ logger.js:58
(anonymous function) @ logger.js:149
dispatch @ logger.js:142
_this3.(anonymous function) @ logger.js:109
(anonymous function) @ PlansActions.js:382
tryCatchReject @ makePromise.js:845
runContinuation1 @ makePromise.js:804
Rejected.when @ makePromise.js:625
Pending.run @ makePromise.js:483
Scheduler._drain @ Scheduler.js:62
Scheduler.drain @ Scheduler.js:27
run @ env.js:63
We believe that this is not an issue with the UI but rather the Mistral service.
I'm not seeing this happen. I did a bunch of plan creates and deletes (in master) and I didn't get anything over 20s for a delete:

(undercloud) [stack@undercloud foo]$ time openstack overcloud plan delete foo
Deleting plan foo...

real 0m13.661s
user 0m1.336s
sys 0m0.088s

(undercloud) [stack@undercloud foo]$ time openstack overcloud plan create foo
Started Mistral Workflow tripleo.plan_management.v1.create_deployment_plan. Execution ID: 9cb4ae57-60c8-42cd-9e7f-d66ec1227658
Plan created.

real 0m25.724s
user 0m1.349s
sys 0m0.109s

(undercloud) [stack@undercloud foo]$ time openstack overcloud plan delete foo
Deleting plan foo...

real 0m13.449s
user 0m1.341s
sys 0m0.086s

(undercloud) [stack@undercloud foo]$ time openstack overcloud plan create foo
Started Mistral Workflow tripleo.plan_management.v1.create_deployment_plan. Execution ID: 94a3f5ce-35a8-4ae3-bb49-08478db30cab
Plan created.

real 0m25.357s
user 0m1.327s
sys 0m0.108s

(undercloud) [stack@undercloud foo]$ time openstack overcloud plan delete foo
Deleting plan foo...

real 0m13.630s
user 0m1.342s
sys 0m0.089s

Is this still an issue?
There is an improvement; on my system it takes about 30 seconds now:

]$ time openstack overcloud plan delete test-plan2
Deleting plan test-plan2...

real 0m32.040s
user 0m0.946s
sys 0m0.128s

However, I still think that deleting a plan should not take this long, so the bug is still an issue (but of much lower severity).
There are a number of aspects related to this work:

The error happened because the action hit the maximum execution time for direct action calls. This is why we are transitioning the UI to only calling workflows (which then run the actions without a time limit). This limit was increased, so it shouldn't be an issue [1].

There is also work against master to improve the performance, so it won't be an issue in the future [2].

[1]: https://review.openstack.org/#/c/509811/
[2]: https://review.openstack.org/#/c/553616/

I believe backporting these changes isn't realistic, as they would be non-trivial and the bug is marked as low severity, so I don't think it would be wise at this point.

I am closing this bug for these reasons. If you believe that is in error, please re-open and correct me.
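
To illustrate the distinction drawn above between a direct action call with a reply deadline and a workflow that runs without one, here is a minimal Python simulation. It does not use Mistral or its client at all: `slow_delete`, `direct_call`, and `workflow_call` are hypothetical names, and a thread pool stands in for the server. It is only a sketch of why a timed-out direct call can still end in a successful deletion, as seen after pressing F5.

```python
import concurrent.futures
import time


def slow_delete(plan_name, duration):
    """Hypothetical stand-in for a slow server-side action such as plan deletion."""
    time.sleep(duration)
    return f"deleted {plan_name}"


def direct_call(executor, plan_name, duration, timeout):
    """Direct style: the caller waits at most `timeout` seconds for a reply."""
    future = executor.submit(slow_delete, plan_name, duration)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # Only the wait for the reply gave up; the work itself keeps running
        # in the background and may still complete successfully.
        return "timed out waiting for reply"


def workflow_call(executor, plan_name, duration, poll_interval=0.05):
    """Workflow style: start the work, then poll its state with no reply deadline."""
    future = executor.submit(slow_delete, plan_name, duration)
    while not future.done():
        time.sleep(poll_interval)
    return future.result()


if __name__ == "__main__":
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        print(direct_call(pool, "foo", duration=0.3, timeout=0.1))
        print(workflow_call(pool, "foo", duration=0.3))
```

In this simulation the direct call reports a timeout even though the underlying work finishes shortly afterwards, while the polling variant simply waits until the work is done.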