Bug 2228385
| Summary: | [RHOSP16.2]: Message collection size is too large for Zaqar when importing large no. of nodes | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Shravan Kumar Tiwari <shtiwari> |
| Component: | openstack-tripleo-common | Assignee: | Nobody <nobody> |
| Status: | NEW --- | QA Contact: | David Rosenfeld <drosenfe> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | ||
| Version: | 16.2 (Train) | CC: | apevec, hjensas, jkreger, jschluet, lhh, mburns, rrasouli, slinaber, tkajinam |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | Flags: | jkreger: needinfo? (shtiwari) |
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
This is not a bug in Zaqar. The problem is caused by an oversized message sent by the Mistral workflows that tripleo-common provides. I'm reassigning this to the correct component.
Description of problem:
The customer has deployed a new cluster with RHOSP 16.2.4 (with an Ansible-deployed external Ceph cluster). They have close to 300 nodes, including compute and Ceph nodes. During node provisioning, the node import step, i.e. running `openstack overcloud node import <baremetal.json-file>`, fails with the following error, which indicates that Zaqar cannot handle a message of that size:

zaqarclient.transport.errors.MalformedRequest: Error response from Zaqar. Code: 400. Title: Invalid API request. Description: Message collection size is too large. Max size 2097152.
: zaqarclient.transport.errors.MalformedRequest: Error response from Zaqar. Code: 400. Title: Invalid API request. Description: Message collection size is too large. Max size 2097152.
2023-08-01 10:17:31.014 7 WARNING mistral.executors.default_executor [req-4d983a6f-c9fe-434d-8bb1-d2b94512bff8 e16c4cfb4aca4ce6a8b71ebda330e5c6 24860aa96975472096468319e7943ccd - default default] The action raised an exception [action_ex_id=a78915d3-28d4-43a2-8881-319970ebb32b, msg='ZaqarAction.queue_post failed: Error response from Zaqar. Code: 400. Title: Invalid API request. Description: Message collection size is too large. Max size 2097152.', action_cls='<class 'mistral.actions.action_factory.ZaqarAction'>', attributes='{'client_method_name': 'queue_post'}', params='{'queue_name': 'tripleo', 'messages': {'

Version-Release number of selected component (if applicable):
RHOSP16.2.4

Actual results:
- Node import fails with "Error response from Zaqar. Code: 400. Title: Invalid API request. Description: Message collection size is too large. Max size 2097152"

Expected results:
- There is a workaround: increase the limit by setting the max_messages_post_size parameter to an appropriate value. However, this leaves the customer in a trial-and-error situation, because it is not known which size is adequate for how many nodes, nor whether the increased size can have an impact elsewhere in the workflow.
Additional info:
There was an earlier bug [1] whose fix was backported to RHOSP13 and appears to be available in the RHOSP16.x releases. However, that fix seems to cover only the workflow that runs during overcloud deploy. The current scenario, where the customer hits the error at node import time, needs further analysis and a corresponding fix.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1712278
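
To reduce the trial and error around choosing a max_messages_post_size value, it may help to measure how large the node definitions are once serialized. The following is only a rough sketch, not part of tripleo-common: the helper names (`estimate_payload_size`, `load_nodes`, `report`) are hypothetical, and it assumes the import file is either a bare JSON list of nodes or a dict with a top-level "nodes" key. The message that the workflow actually posts to Zaqar carries additional data, so the number below should be read as a lower bound rather than the real message size.

```python
import json

# Limit quoted in the Zaqar error message in this report (bytes).
ZAQAR_MAX_POST_SIZE = 2097152


def estimate_payload_size(nodes):
    """Return the size in bytes of the node list serialized as compact JSON."""
    return len(json.dumps(nodes, separators=(",", ":")).encode("utf-8"))


def load_nodes(path):
    """Load node definitions from an import file.

    Assumption: the file holds either a bare list of nodes or a dict
    with a top-level "nodes" key.
    """
    with open(path) as handle:
        data = json.load(handle)
    return data["nodes"] if isinstance(data, dict) else data


def report(nodes, limit=ZAQAR_MAX_POST_SIZE):
    """Summarize the serialized size relative to the configured limit."""
    size = estimate_payload_size(nodes)
    return {
        "nodes": len(nodes),
        "bytes": size,
        "bytes_per_node": size / len(nodes) if nodes else 0,
        "fraction_of_limit": size / limit,
    }
```

Calling `report(load_nodes("baremetal.json"))` gives a per-node byte count that can be extrapolated to the planned number of nodes. Because the real message is larger than this estimate, any limit derived from it should include generous headroom.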