Bug 1518221 - [UPDATES] Error response from Zaqar. Code: 503. Title: Service temporarily unavailable
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-tripleoclient
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ga
Target Release: 12.0 (Pike)
Assignee: mathieu bultel
QA Contact: Yurii Prokulevych
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-11-28 12:55 UTC by Yurii Prokulevych
Modified: 2018-02-05 19:18 UTC (History)
CC List: 18 users

Fixed In Version: python-tripleoclient-7.3.3-7.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-13 22:23:28 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1734957 | 0 | None | None | None | 2017-11-28 19:04:59 UTC
OpenStack gerrit | 523500 | 0 | None | master: MERGED | python-tripleoclient: Catch zaqar exception when no message to claim (I802ffd553c54e4a4f9998420645aa078f508b9f0) | 2017-11-30 16:52:14 UTC
OpenStack gerrit | 524128 | 0 | None | stable/pike: NEW | python-tripleoclient: Catch zaqar exception when no message to claim (I802ffd553c54e4a4f9998420645aa078f508b9f0) | 2017-11-30 16:52:07 UTC
RDO | 10741 | 0 | None | pike-rdo: MERGED | openstack/tripleoclient-distgit: Add python-zaqarclient as requires for tripleoclient (I481f86ea251ae27e259e9a6a0224fe82... | 2017-11-30 16:52:27 UTC
Red Hat Product Errata | RHEA-2017:3462 | 0 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 12.0 Enhancement Advisory | 2018-02-16 01:43:25 UTC

Description Yurii Prokulevych 2017-11-28 12:55:20 UTC
Description of problem:
-----------------------
During a minor update of RHOS-12, the following error occurred:
openstack overcloud update stack --nodes Networker
...
 u'TError response from Zaqar. Code: 503. Title: Service temporarily unavailable. Description: Claim could not be created. Please try again in a few seconds..
ASK [Set host puppet debugging fact string] ***********************************',
 u'skipping: [192.168.24.8]',
 u'',
 u'TASK [Write the config_step hieradata] *****************************************',
 u'changed: [192.168.24.8]',
 u'',
 u'TASK [Run puppet host configuration for step 4] ********************************',
 u'changed: [192.168.24.8]']

and this causes the playbook to fail:
...

SG:

non-zero return code

changed: [undercloud-0] => (item=Messaging)                                                                                                     

msg: All items completed
        to retry, use: --limit @/root/IR2/IR-SEALUSA-7/plugins/tripleo-upgrade/infrared_plugin/main.retry

PLAY RECAP ********************************************************************************************************************************************************************************************************
undercloud-0               : ok=17   changed=2    unreachable=0    failed=1   

ERROR   Playbook "/root/IR2/IR-SEALUSA-7/plugins/tripleo-upgrade/infrared_plugin/main.yml" failed!



Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-zaqar-5.0.0-3.el7ost.noarch
python-zaqarclient-1.7.0-1.el7ost.noarch
puppet-zaqar-11.3.0-3.el7ost.noarch
openstack-tripleo-puppet-elements-7.0.1-1.el7ost.noarch
openstack-tripleo-common-containers-7.6.3-4.el7ost.noarch
python-tripleoclient-7.3.3-5.el7ost.noarch
puppet-tripleo-7.4.3-9.el7ost.noarch
openstack-tripleo-common-7.6.3-4.el7ost.noarch
openstack-tripleo-ui-7.4.3-4.el7ost.noarch
openstack-tripleo-validations-7.4.2-1.el7ost.noarch
openstack-tripleo-heat-templates-7.0.3-13.el7ost.noarch
openstack-tripleo-image-elements-7.0.1-1.el7ost.noarch

Steps to Reproduce:
-------------------
1. Run an update of a composable deployment (~15 nodes)
2. Unfortunately, this is not always reproducible


Actual results:
---------------
Update fails and has to be re-run


Expected results:
-----------------
Such events/tracebacks are handled and retried
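
One way to picture the expected behaviour is a bounded retry around the claim call. The sketch below is illustrative only and is not the fix that shipped for this bug: `TransientServiceError`, `retry_on_transient`, and `make_flaky_claim` are hypothetical names, and no real zaqarclient API is used.

```python
import time


class TransientServiceError(Exception):
    """Hypothetical stand-in for a 503-style error from the messaging service."""


def retry_on_transient(func, attempts=3, delay=1.0, sleep=time.sleep):
    """Call func(); on TransientServiceError wait and retry, up to `attempts` calls."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except TransientServiceError:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            sleep(delay * attempt)  # linear backoff between attempts


def make_flaky_claim(failures):
    """Hypothetical claim call that fails `failures` times, then returns messages."""
    state = {"calls": 0}

    def claim():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientServiceError("Claim could not be created")
        return ["message"]

    return claim, state
```

With `claim, state = make_flaky_claim(2)`, calling `retry_on_transient(claim, attempts=3)` returns the messages on the third call; with fewer attempts than failures, the last error is re-raised so the caller still sees the failure.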


Additional info:
----------------
Virtual setup: 3 controllers + 3 messaging + 3 database + 2 networker + 2 computes + 3 ceph

Comment 2 Yurii Prokulevych 2017-11-28 13:06:52 UTC
From zaqar.log:
...
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims [(None,) 664ef39f4cff49ec8109f901af05eff8 8793d5e72bf74354b8b8194940c56daa - - -] Queue update does not exist for project 8793d5e72bf74354b8b8194940c56daa: QueueDoesNotExist: Queue update does not exist for project 8793d5e72bf74354b8b8194940c56daa
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims Traceback (most recent call last):
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims   File "/usr/lib/python2.7/site-packages/zaqar/transport/wsgi/v2_0/claims.py", line 85, in on_post
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims     **claim_options)
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims   File "/usr/lib/python2.7/site-packages/zaqar/common/pipeline.py", line 97, in consumer
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims     tmp = target(*args, **kwargs)
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims   File "/usr/lib/python2.7/site-packages/zaqar/storage/swift/claims.py", line 107, in create
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims     include_claimed=False)
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims   File "/usr/lib/python2.7/site-packages/zaqar/storage/swift/messages.py", line 102, in _list
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims     raise errors.QueueDoesNotExist(queue, project)
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims QueueDoesNotExist: Queue update does not exist for project 8793d5e72bf74354b8b8194940c56daa
2017-11-28 07:29:33.736 1564 ERROR zaqar.transport.wsgi.v2_0.claims
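
To locate the underlying exception in a log like the excerpt above, the closing summary line of each traceback can be picked out. This is a minimal sketch, assuming the line layout seen in the excerpt (timestamp, PID, level, logger name, message); `find_exceptions` is a hypothetical helper, and the patterns are guesses from this excerpt rather than an official log-format definition.

```python
import re

# Assumed layout: "<date> <time> <pid> <LEVEL> <logger> <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+) ?(?P<msg>.*)$"
)
# An exception summary looks like "SomeError: text" at the start of the message.
EXC_RE = re.compile(r"^(?P<name>[A-Za-z_][A-Za-z0-9_.]*): (?P<text>.+)$")


def find_exceptions(lines):
    """Return (timestamp, exception name, text) for ERROR lines that are exception summaries."""
    found = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m or m.group("level") != "ERROR":
            continue
        exc = EXC_RE.match(m.group("msg"))
        if exc:
            found.append((m.group("ts"), exc.group("name"), exc.group("text")))
    return found
```

Traceback frames and the request-context line are skipped because their message part does not start with an exception name followed by a colon.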

Comment 3 Mike Orazi 2017-11-28 20:58:44 UTC
Can you elaborate on how frequently this occurs and whether or not modifying timeouts would likely resolve the issue?

I'm leaning towards saying this is not a blocker, but it would help to understand the frequency and impact this bug actually has before making that statement.

Comment 4 Mike Orazi 2017-11-29 05:18:06 UTC
And the other question is -- will a re-run of update reliably fix this?

Comment 9 Julie Pichon 2017-11-30 16:48:21 UTC
Link to spec change on pike-rdo: https://review.rdoproject.org/r/#/c/10741/

Comment 12 Yurii Prokulevych 2017-12-13 15:25:44 UTC
Verified with python-tripleoclient-7.3.3-7.el7ost.noarch

Comment 14 errata-xmlrpc 2017-12-13 22:23:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462

