Run update on all nodes:

(undercloud) [stack@undercloud-0 ~ (undercloud-12-TLV)]$ openstack overcloud update stack
Started Mistral Workflow tripleo.package_update.v1.update_nodes. Execution ID: 0b455e17-611d-4cb3-91a2-714f13a3a30e
Waiting for messages on queue '04b0b808-da54-4d55-b01a-6bb13194ad71' with no timeout.

Update finished but it gets stuck waiting for mistral execution. It was waiting for message:

fig', u'type': u'direct'}}, u'name': u'update_nodes', u'tags': [u'tripleo-common-managed'], u'version': u'2.0', u'input': [{u'node_user': u'heat-admin'}, u'nodes', u'playbook', u'inventory_file', {u'queue_name': u'tripleo'}], u'description': u'Take a container and perform an update nodes by nodes'}}}}}}']
ZaqarAction.queue_post failed: Error response from Zaqar. Code: 400. Title: Invalid API request. Description: Message collection size is too large. Max size 1048576. ] (execution_id=0b455e17-611d-4cb3-91a2-714f13a3a30e)
2017-10-05 10:05:53.812 17734 DEBUG mistral.services.triggers [req-eac2a476-cb08-4d63-8ddb-c2dc788d3f6d 81c58aff164344a9811bddd511c739ea 9ed039ecceba4438b980339efe25a93a - default default] No JSON object could be decoded on_workflow_complete /usr/lib/python2.7/site-packages/mistral/services/triggers.py:239
Can you provide the Mistral logs for this? I'm having trouble tracking down the issue. It looks like the workflow is attempting to send a message to Zaqar that is larger than the allowed limit. From reading the tripleo.package_update.v1 workflow and the custom action, I can't figure out where that would come from. I'm hoping that a traceback in the logs will provide more details.
The message size is already set by instack. The message posted here is the result of the ansible/puppet upgrade run; it is about 1.2M, more than the 1M allowed. I suggest limiting the message size in tripleo-common, something like this: http://paste.openstack.org/show/625389/ That said, it's bad to have that much data transit through ansible/mistral. Long term, it'd be nice to either produce fewer logs or push them to swift directly. There is also an unhealthy amount of warnings produced by the puppet run.
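To illustrate the kind of limiting suggested here, a minimal sketch follows. It is not the content of the paste or of the eventual tripleo-common patch. The function name truncate_lines and the headroom value are hypothetical; only the 1048576 limit comes from the Zaqar error above. It assumes the message body is a list of log lines and keeps the newest lines that fit, so the PLAY RECAP at the end of the run survives.

```python
import json

# Limit quoted in the Zaqar error in this report.
ZAQAR_MAX_BYTES = 1048576


def truncate_lines(lines, max_bytes=ZAQAR_MAX_BYTES, headroom=4096):
    """Return the newest lines whose JSON encoding fits in the budget.

    `headroom` (an assumed value) leaves room for the rest of the
    message envelope, which this sketch does not model.
    """
    budget = max_bytes - headroom
    kept = []
    size = 2  # the enclosing "[" and "]"
    # Walk from the end so the tail of the log is preserved.
    for line in reversed(lines):
        # json.dumps escapes non-ASCII by default, so len() is the byte size.
        cost = len(json.dumps(line)) + 2  # plus the ", " separator
        if size + cost > budget:
            break
        kept.append(line)
        size += cost
    kept.reverse()
    return kept
```

With roughly 2 MB of input lines, the result serializes to under the limit and is a suffix of the input; input that already fits is returned unchanged.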
o/ thanks Thomas - yeah agree the truncate is not ideal and have been holding off on posting the review to tripleo-common this morning hoping someone would come up with a better way. I haven't heard one so I'll post it in a moment anyway and we can take it from there.
This is merged to pike, so moving to POST. Note that, thankfully, a better fix is being tracked in https://bugzilla.redhat.com/show_bug.cgi?id=1505926 which will prevent these huge messages in the first place.
openstack-tripleo-common-7.6.3-4.el7ost
Verified with openstack-tripleo-common-7.6.3-8.el7ost.noarch

tail oc-update-*log

==> oc-update-00-Controller.log <==
u'TASK [debug] *******************************************************************', u'skipping: [192.168.24.20]', u'', u'PLAY RECAP *********************************************************************', u'192.168.24.15 : ok=112 changed=56 unreachable=0 failed=0 ', u'192.168.24.17 : ok=114 changed=56 unreachable=0 failed=0 ', u'192.168.24.20 : ok=112 changed=56 unreachable=0 failed=0 ', u'']
('Response is not a JSON object.', ValueError('No JSON object could be decoded',))
Success

==> oc-update-CephStorage.log <==
u'TASK [debug] *******************************************************************', u'skipping: [192.168.24.18]', u'', u'PLAY RECAP *********************************************************************', u'192.168.24.14 : ok=56 changed=13 unreachable=0 failed=0 ', u'192.168.24.18 : ok=56 changed=13 unreachable=0 failed=0 ', u'192.168.24.9 : ok=56 changed=13 unreachable=0 failed=0 ', u'']
('Response is not a JSON object.', ValueError('No JSON object could be decoded',))
Success

==> oc-update-Compute.log <==
u'', u'TASK [debug] *******************************************************************', u'skipping: [192.168.24.10]', u'', u'PLAY RECAP *********************************************************************', u'192.168.24.10 : ok=58 changed=13 unreachable=0 failed=0 ', u'192.168.24.12 : ok=58 changed=13 unreachable=0 failed=0 ', u'']
('Response is not a JSON object.', ValueError('No JSON object could be decoded',))
Success
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:3462