Bug 1518221

Summary: [UPDATES] Error response from Zaqar. Code: 503. Title: Service temporarily unavailable
Product: Red Hat OpenStack
Reporter: Yurii Prokulevych <yprokule>
Component: python-tripleoclient
Assignee: mathieu bultel <mbultel>
Status: CLOSED ERRATA
QA Contact: Yurii Prokulevych <yprokule>
Severity: high
Priority: urgent
Version: 12.0 (Pike)
CC: apevec, augol, dbecker, hbrock, jpichon, jschluet, jslagle, lbezdick, lhh, mbracho, mbultel, mburns, morazi, rhel-osp-director-maint, rrasouli, sclewis, tvignaud, yprokule
Target Milestone: ga
Keywords: Triaged
Target Release: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: python-tripleoclient-7.3.3-7.el7ost
Last Closed: 2017-12-13 22:23:28 UTC
Type: Bug

Description Yurii Prokulevych 2017-11-28 12:55:20 UTC
Description of problem:
-----------------------
During a minor update of RHOS-12, the following error occurred:
openstack overcloud update stack --nodes Networker
...
 u'TError response from Zaqar. Code: 503. Title: Service temporarily unavailable. Description: Claim could not be created. Please try again in a few seconds..
ASK [Set host puppet debugging fact string] ***********************************',
 u'skipping: [192.168.24.8]',
 u'',
 u'TASK [Write the config_step hieradata] *****************************************',
 u'changed: [192.168.24.8]',
 u'',
 u'TASK [Run puppet host configuration for step 4] ********************************',
 u'changed: [192.168.24.8]']

and this causes the playbook to fail:
...

MSG:

non-zero return code

changed: [undercloud-0] => (item=Messaging)                                                                                                     

msg: All items completed
        to retry, use: --limit @/root/IR2/IR-SEALUSA-7/plugins/tripleo-upgrade/infrared_plugin/main.retry

PLAY RECAP ********************************************************************************************************************************************************************************************************
undercloud-0               : ok=17   changed=2    unreachable=0    failed=1   

ERROR   Playbook "/root/IR2/IR-SEALUSA-7/plugins/tripleo-upgrade/infrared_plugin/main.yml" failed!



Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-zaqar-5.0.0-3.el7ost.noarch
python-zaqarclient-1.7.0-1.el7ost.noarch
puppet-zaqar-11.3.0-3.el7ost.noarch
openstack-tripleo-puppet-elements-7.0.1-1.el7ost.noarch
openstack-tripleo-common-containers-7.6.3-4.el7ost.noarch
python-tripleoclient-7.3.3-5.el7ost.noarch
puppet-tripleo-7.4.3-9.el7ost.noarch
openstack-tripleo-common-7.6.3-4.el7ost.noarch
openstack-tripleo-ui-7.4.3-4.el7ost.noarch
openstack-tripleo-validations-7.4.2-1.el7ost.noarch
openstack-tripleo-heat-templates-7.0.3-13.el7ost.noarch
openstack-tripleo-image-elements-7.0.1-1.el7ost.noarch

Steps to Reproduce:
-------------------
1. Run an update of a composable deployment (~15 nodes)
2. Unfortunately, this is not always reproducible


Actual results:
---------------
Update fails and has to be re-run


Expected results:
-----------------
Such events/tracebacks are handled and retried
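
Illustrative sketch of the expected behaviour, in Python. This is not the fix shipped in python-tripleoclient and it does not use the zaqarclient API. `TransientServiceError` and `call_with_retry` are hypothetical names introduced only for this example, and the attempt count and delays are assumed values. The idea is that a 503 such as "Claim could not be created. Please try again in a few seconds." is treated as transient and the call is retried with a growing delay, while any other error is raised immediately.

```python
import time


class TransientServiceError(Exception):
    """Hypothetical stand-in for a 503-style error raised by a messaging client."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def call_with_retry(operation, attempts=5, delay=2.0, backoff=2.0, sleep=time.sleep):
    """Call operation(), retrying while it fails with a 503-style error.

    Errors with any other code, and a 503 on the final attempt,
    propagate to the caller unchanged.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientServiceError as exc:
            if exc.code != 503 or attempt == attempts:
                raise
            sleep(delay)       # wait before the next attempt
            delay *= backoff   # lengthen the wait each time


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_claim():
        # Fails twice with the error text from this report, then succeeds.
        calls["count"] += 1
        if calls["count"] < 3:
            raise TransientServiceError(
                503, "Claim could not be created. Please try again in a few seconds."
            )
        return ["message"]

    # The sleep function is replaced with a no-op so the demo runs instantly.
    print(call_with_retry(flaky_claim, sleep=lambda seconds: None))
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. A real client would also need to decide which operations are safe to repeat.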


Additional info:
----------------
Virtual setup: 3 controllers + 3 messaging + 3 database + 2 networker + 2 computes + 3 ceph

Comment 2 Yurii Prokulevych 2017-11-28 13:06:52 UTC
From zaqar.log:
...
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims [(None,) 664ef39f4cff49ec8109f901af05eff8 8793d5e72bf74354b8b8194940c56daa - - -] Queue update does not exist for project 8793d5e72bf74354b8b8194940c56daa: QueueDoesNotExist: Queue update does not exist for project 8793d5e72bf74354b8b8194940c56daa
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims Traceback (most recent call last):
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims   File "/usr/lib/python2.7/site-packages/zaqar/transport/wsgi/v2_0/claims.py", line 85, in on_post
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims     **claim_options)
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims   File "/usr/lib/python2.7/site-packages/zaqar/common/pipeline.py", line 97, in consumer
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims     tmp = target(*args, **kwargs)
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims   File "/usr/lib/python2.7/site-packages/zaqar/storage/swift/claims.py", line 107, in create
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims     include_claimed=False)
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims   File "/usr/lib/python2.7/site-packages/zaqar/storage/swift/messages.py", line 102, in _list
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims     raise errors.QueueDoesNotExist(queue, project)
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims QueueDoesNotExist: Queue update does not exist for project 8793d5e72bf74354b8b8194940c56daa
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims

Comment 3 Mike Orazi 2017-11-28 20:58:44 UTC
Can you elaborate on how frequently this occurs and whether or not modifying timeouts would likely resolve the issue?

I'm leaning towards saying this is not a blocker, but it would help to understand the frequency and impact this bug actually has before making that statement.

Comment 4 Mike Orazi 2017-11-29 05:18:06 UTC
And the other question is -- will a re-run of the update reliably fix this?

Comment 9 Julie Pichon 2017-11-30 16:48:21 UTC
Link to spec change on pike-rdo: https://review.rdoproject.org/r/#/c/10741/

Comment 12 Yurii Prokulevych 2017-12-13 15:25:44 UTC
Verified with python-tripleoclient-7.3.3-7.el7ost.noarch

Comment 14 errata-xmlrpc 2017-12-13 22:23:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462