Bug 2228385

Summary: [RHOSP16.2]: Message collection size is too large for Zaqar when importing large no. of nodes
Product: Red Hat OpenStack Reporter: Shravan Kumar Tiwari <shtiwari>
Component: openstack-tripleo-common Assignee: Nobody <nobody>
Status: NEW --- QA Contact: David Rosenfeld <drosenfe>
Severity: high Docs Contact:
Priority: medium    
Version: 16.2 (Train) CC: apevec, hjensas, jkreger, jschluet, lhh, mburns, rrasouli, slinaber, tkajinam
Target Milestone: --- Keywords: Triaged
Target Release: --- Flags: jkreger: needinfo? (shtiwari)
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Shravan Kumar Tiwari 2023-08-02 10:00:23 UTC
Description of problem:

The customer has deployed a new cluster with RHOSP16.2.4 (with an Ansible-deployed external Ceph cluster).

They have close to 300 nodes (including compute and Ceph nodes).

During node provisioning, the node import step, i.e. running `openstack overcloud node import <baremetal.json-file>`, fails with the following error, which indicates that Zaqar cannot handle a message of that size:

zaqarclient.transport.errors.MalformedRequest: Error response from Zaqar. Code: 400. Title: Invalid API request. Description: Message collection size is too large. Max size 2097152.
: zaqarclient.transport.errors.MalformedRequest: Error response from Zaqar. Code: 400. Title: Invalid API request. Description: Message collection size is too large. Max size 2097152.
2023-08-01 10:17:31.014 7 WARNING mistral.executors.default_executor [req-4d983a6f-c9fe-434d-8bb1-d2b94512bff8 e16c4cfb4aca4ce6a8b71ebda330e5c6 24860aa96975472096468319e7943ccd - default default] The action raised an exception [action_ex_id=a78915d3-28d4-43a2-8881-319970ebb32b, msg='ZaqarAction.queue_post failed: Error response from Zaqar. Code: 400. Title: Invalid API request. Description: Message collection size is too large. Max size 2097152.', action_cls='<class 'mistral.actions.action_factory.ZaqarAction'>', attributes='{'client_method_name': 'queue_post'}', params='{'queue_name': 'tripleo', 'messages': {'

Version-Release number of selected component (if applicable):

RHOSP16.2.4

Actual results:

- node import fails with "Error response from Zaqar. Code: 400. Title: Invalid API request. Description: Message collection size is too large. Max size 2097152"

Expected results:

- A workaround exists: increase the limit by setting the max_messages_post_size parameter to an appropriate value. However, this leaves the customer in a trial-and-error situation, because it is not known which size is suitable for how many nodes, or whether the increased size could have an impact elsewhere in the workflow.
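
To reduce the guesswork around sizing, the following is a rough sketch (not part of tripleo-common or Zaqar) that measures the JSON-serialized size of a node list and compares it to the 2097152-byte limit from the error above. The function names and the sample node fields are hypothetical, and the result is only a lower bound: the message actually posted by the Mistral workflow wraps the node data in additional fields and may be considerably larger.

```python
import json

# Limit reported in the Zaqar error message above, in bytes.
ZAQAR_MAX_POST_SIZE = 2097152


def serialized_size(payload):
    """Return the size in bytes of payload encoded as compact UTF-8 JSON."""
    return len(json.dumps(payload, separators=(",", ":")).encode("utf-8"))


def estimate_post_size(nodes, limit=ZAQAR_MAX_POST_SIZE):
    """Roughly estimate whether a node list would fit in a single post.

    This is a lower bound only: the real workflow message contains
    more than the bare node list.
    """
    total = serialized_size(nodes)
    per_node = total / len(nodes) if nodes else 0.0
    return {
        "total_bytes": total,
        "avg_bytes_per_node": per_node,
        "fits": total <= limit,
    }


if __name__ == "__main__":
    # Hypothetical node entries, for illustration only; real entries
    # would be loaded from the baremetal JSON file.
    sample = [
        {
            "name": "node-%d" % i,
            "pm_addr": "192.0.2.%d" % (i % 250),
            "ports": [{"address": "52:54:00:00:00:%02x" % (i % 256)}],
        }
        for i in range(300)
    ]
    print(estimate_post_size(sample))
```

Comparing the average bytes per node against the configured limit gives a first approximation of how many nodes a single message could carry, but it does not answer whether a larger limit has side effects elsewhere in the workflow.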

Additional info:

There was a bug [1] in the past whose fix was backported to RHOSP13 and seems to be available in RHOSP16.x releases. However, that fix appears to cover only the scenario/workflow during overcloud deploy.
So the current scenario, where the customer hits the error at node import time, has to be analyzed further and addressed accordingly.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1712278

Comment 2 Takashi Kajinami 2023-08-02 10:52:54 UTC
This is not a bug in Zaqar but a problem caused by an overly large message sent by the Mistral workflows provided by tripleo-common.

I'm reassigning this to the correct component.