Description of problem:
-----------------------
During a minor update of RHOS-12, the following error was hit:

openstack overcloud update stack --nodes Networker
...
u'Error response from Zaqar. Code: 503. Title: Service temporarily unavailable. Description: Claim could not be created. Please try again in a few seconds.',
u'TASK [Set host puppet debugging fact string] ***********************************',
u'skipping: [192.168.24.8]',
u'',
u'TASK [Write the config_step hieradata] *****************************************',
u'changed: [192.168.24.8]',
u'',
u'TASK [Run puppet host configuration for step 4] ********************************',
u'changed: [192.168.24.8]']

and this causes the playbook to fail:

...
MSG: non-zero return code
changed: [undercloud-0] => (item=Messaging)
msg: All items completed
	to retry, use: --limit @/root/IR2/IR-SEALUSA-7/plugins/tripleo-upgrade/infrared_plugin/main.retry

PLAY RECAP *********************************************************************
undercloud-0               : ok=17   changed=2    unreachable=0    failed=1

ERROR Playbook "/root/IR2/IR-SEALUSA-7/plugins/tripleo-upgrade/infrared_plugin/main.yml" failed!

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-zaqar-5.0.0-3.el7ost.noarch
python-zaqarclient-1.7.0-1.el7ost.noarch
puppet-zaqar-11.3.0-3.el7ost.noarch
openstack-tripleo-puppet-elements-7.0.1-1.el7ost.noarch
openstack-tripleo-common-containers-7.6.3-4.el7ost.noarch
python-tripleoclient-7.3.3-5.el7ost.noarch
puppet-tripleo-7.4.3-9.el7ost.noarch
openstack-tripleo-common-7.6.3-4.el7ost.noarch
openstack-tripleo-ui-7.4.3-4.el7ost.noarch
openstack-tripleo-validations-7.4.2-1.el7ost.noarch
openstack-tripleo-heat-templates-7.0.3-13.el7ost.noarch
openstack-tripleo-image-elements-7.0.1-1.el7ost.noarch

Steps to Reproduce:
-------------------
1. Run an update of a composable deployment (~15 nodes)
2. Unfortunately, this is not always reproducible

Actual results:
---------------
Update fails and has to be re-run

Expected results:
-----------------
Such events/tracebacks are handled and retried

Additional info:
----------------
Virtual setup: 3 controllers + 3 messaging + 3 database + 2 networkers + 2 computes + 3 ceph
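The "Expected results" above ask for these transient errors to be retried rather than failing the update. For illustration only, a minimal client-side sketch of such a retry against the documented Zaqar v2 claims endpoint; the endpoint URL, token, retry count, and delay below are placeholders/assumptions, not values from this report (the queue name "update" is the one from the log):

    # Hypothetical sketch: retry a Zaqar v2 claim POST on transient 503s
    # ("Claim could not be created") instead of failing the whole run.
    import time
    import uuid

    import requests

    ZAQAR_URL = "http://192.168.24.1:8888"  # placeholder undercloud Zaqar endpoint
    QUEUE = "update"                        # queue named in the error above
    HEADERS = {
        "Client-ID": str(uuid.uuid4()),     # required by the Zaqar v2 API
        "X-Auth-Token": "TOKEN",            # placeholder keystone token
    }

    def claim_with_retry(retries=5, delay=2.0):
        """POST a claim, retrying on 503 as the error message suggests."""
        url = "%s/v2/queues/%s/claims" % (ZAQAR_URL, QUEUE)
        for attempt in range(1, retries + 1):
            resp = requests.post(url, json={"ttl": 300, "grace": 60},
                                 headers=HEADERS)
            if resp.status_code != 503:
                return resp                 # 201 (messages) or 204 (queue empty)
            time.sleep(delay * attempt)     # back off before the next attempt
        resp.raise_for_status()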
From zaqar.log:
...
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims [(None,) 664ef39f4cff49ec8109f901af05eff8 8793d5e72bf74354b8b8194940c56daa - - -] Queue update does not exist for project 8793d5e72bf74354b8b8194940c56daa: QueueDoesNotExist: Queue update does not exist for project 8793d5e72bf74354b8b8194940c56daa
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims Traceback (most recent call last):
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims   File "/usr/lib/python2.7/site-packages/zaqar/transport/wsgi/v2_0/claims.py", line 85, in on_post
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims     **claim_options)
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims   File "/usr/lib/python2.7/site-packages/zaqar/common/pipeline.py", line 97, in consumer
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims     tmp = target(*args, **kwargs)
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims   File "/usr/lib/python2.7/site-packages/zaqar/storage/swift/claims.py", line 107, in create
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims     include_claimed=False)
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims   File "/usr/lib/python2.7/site-packages/zaqar/storage/swift/messages.py", line 102, in _list
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims     raise errors.QueueDoesNotExist(queue, project)
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims QueueDoesNotExist: Queue update does not exist for project 8793d5e72bf74354b8b8194940c56daa
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims
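The traceback shows the swift storage driver raising QueueDoesNotExist while listing messages during claim creation, which the WSGI layer then surfaces to the client as the 503 above. That reads like a race: a claim is attempted against the "update" queue before it exists (or after it was deleted). A hypothetical illustration of the defensive pattern of treating a missing queue as having nothing to claim -- the class and method names below are simplified stand-ins, not Zaqar's actual code:

    # Stand-in illustration of the race seen in the traceback: claim creation
    # lists messages for a queue that does not (yet) exist, and the missing
    # queue bubbles up as an error. One defensive pattern is to translate a
    # missing queue into an empty claim instead of a 5xx.
    class QueueDoesNotExist(Exception):
        pass

    class ClaimController(object):
        def __init__(self, message_store):
            self.messages = message_store

        def create(self, queue, project):
            try:
                msgs = self.messages.list(queue, project,
                                          include_claimed=False)
            except QueueDoesNotExist:
                return None, []     # nothing to claim; do not surface a 503
            claim_id = "claim-1"    # placeholder; real code generates an ID
            return claim_id, list(msgs)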
Can you elaborate on how frequently this occurs and whether modifying timeouts would be likely to resolve the issue? I'm leaning towards saying this is not a blocker, but it would help to understand the frequency and impact this bug actually has before making that statement.
And the other question is: will a re-run of the update reliably fix this?
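Per the "Actual results" above, a re-run has been the workaround so far. A rough sketch of automating that re-run until a retry fix lands -- the retry count and sleep are arbitrary placeholders, and the command is the one from the report:

    # Workaround sketch: re-run the failed update step a few times before
    # giving up, since the Zaqar 503 is transient. Python 3 (subprocess.run);
    # retry count and sleep are arbitrary, not recommendations from this bug.
    import subprocess
    import time

    CMD = ["openstack", "overcloud", "update", "stack", "--nodes", "Networker"]

    for attempt in range(3):
        result = subprocess.run(CMD)
        if result.returncode == 0:
            break
        time.sleep(30)              # give Zaqar time to recover
    else:
        raise SystemExit("update still failing after retries")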
Link to spec change on pike-rdo: https://review.rdoproject.org/r/#/c/10741/
Verified with python-tripleoclient-7.3.3-7.el7ost.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462