Bug 1417675

Summary: [UPDATES] zaqar enters failed state during undercloud update
Product: Red Hat OpenStack
Reporter: Yurii Prokulevych <yprokule>
Component: openstack-zaqar
Assignee: Thomas Hervé <therve>
Status: CLOSED ERRATA
QA Contact: Ola Pavlenko <opavlenk>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 11.0 (Ocata)
CC: apevec, augol, emacchi, jcoufal, jpichon, jschluet, lbezdick, lhh, mandreou, mburns, mcornea, opavlenk, rhel-osp-director-maint, sathlang, yprokule
Target Milestone: rc
Keywords: Triaged
Target Release: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-zaqar-4.0.0-1.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-05-17 19:43:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1394025

Description Yurii Prokulevych 2017-01-30 15:59:23 UTC
Description of problem:
-----------------------
Sometimes the openstack-zaqar service fails during an undercloud update, causing dependent operations to fail.

E.g. while updating the deployment plan:

export THT=/usr/share/openstack-tripleo-heat-templates/
export MYDIR=/home/stack/yprokule_notes/ComposableTelemetryRole/

time openstack overcloud deploy --templates \
-r ${THT}/roles_data.yaml \
--ntp-server clock1.rdu2.redhat.com \
-e ${THT}/environments/network-isolation.yaml \
-e ${THT}/environments/storage-environment.yaml \
-e ${THT}/environments/tls-endpoints-public-ip.yaml \
-e ${THT}/environments/puppet-pacemaker.yaml \
-e ${MYDIR}/telemetry-nodes.yaml \
-e ${MYDIR}/network/network-environment-v4.yaml \
-e ${MYDIR}/ceph-disk-layout.yaml \
-e ${MYDIR}/enable_panko.yaml \
-e ${MYDIR}/public_vip_4.yaml \
-e ${MYDIR}/enable-tls.yaml \
-e ${MYDIR}/inject-trust-anchor.yaml \
-e ${MYDIR}/neutron_config.yaml --update-plan-only
...
Removing the current plan files
Uploading new plan files
Started Mistral Workflow tripleo.plan_management.v1.update_deployment_plan. Execution ID: 4700f609-c936-4015-a003-495aec8566e3
('The read operation timed out',)


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
python-zaqarclient-1.3.0-0.20170116162225.cb635f5.el7ost.noarch
openstack-zaqar-4.0.0-0.20170116183040.f8e0d80.el7ost.noarch
puppet-zaqar-10.1.0-0.20170113002530.aaaa38c.el7ost.noarch

openstack-heat-templates-0.0.1-0.20170109231310.01b1768.el7ost.noarch
openstack-tripleo-heat-templates-6.0.0-0.20170117170440.371f086.el7ost.noarch


Steps to Reproduce:
-------------------
1. Set up the latest RHOS-11 repos
2. Run:
    openstack undercloud upgrade


Additional info:
----------------
This doesn't reproduce 100% of the time.
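
Since the failure is intermittent, it may help to check the service state right after each upgrade run. The following is only a minimal sketch: it assumes the service runs as a systemd unit named openstack-zaqar (inferred from the package name, not confirmed in this report), and both the zaqar_unit_failed helper and the SYSTEMCTL override are hypothetical names used for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a systemd unit is in the "failed" state.
# SYSTEMCTL may be overridden (e.g. for testing); it defaults to systemctl.
zaqar_unit_failed() {
    local unit="${1:-openstack-zaqar}"   # assumed unit name, not taken from the report
    local state
    # `systemctl is-failed` prints the unit state and exits 0 only when it is "failed".
    state="$("${SYSTEMCTL:-systemctl}" is-failed "$unit" 2>/dev/null || true)"
    if [ "$state" = "failed" ]; then
        echo "$unit: failed"
        return 0
    fi
    echo "$unit: ${state:-unknown}"
    return 1
}
```

For example, running `zaqar_unit_failed && journalctl -u openstack-zaqar --no-pager | tail -n 50` after `openstack undercloud upgrade` would print the most recent unit log lines only when the unit has failed.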

Comment 2 Ola Pavlenko 2017-02-06 11:19:24 UTC
Hi Yurii,

Was there an overcloud deployed before performing undercloud update?

Comment 3 Yurii Prokulevych 2017-02-06 11:30:15 UTC
Hi Ola,

Yep, the overcloud was successfully deployed.

Looks like this also affects upgrades:
https://bugs.launchpad.net/tripleo/+bug/1661227

Comment 4 Thomas Hervé 2017-03-02 11:09:30 UTC
The issue ought to be worked around in packaging. Can you confirm that it works for you? Thanks.
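
One way to confirm that a system carries the packaging workaround is to compare the installed version-release against the Fixed In Version field (openstack-zaqar-4.0.0-1.el7ost). This is a rough sketch only: it assumes GNU `sort -V` ordering is close enough to RPM's own comparison for these version strings (it is not a full RPM version comparison), and version_at_least is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when the installed version-release sorts at or
# after the fixed one under GNU version sort (an approximation of RPM ordering).
version_at_least() {
    local installed="$1" fixed="$2"
    # If the fixed version sorts first (or both are equal), installed >= fixed.
    [ "$(printf '%s\n%s\n' "$fixed" "$installed" | sort -V | head -n 1)" = "$fixed" ]
}
```

A possible use on the undercloud: `version_at_least "$(rpm -q --qf '%{VERSION}-%{RELEASE}' openstack-zaqar)" "4.0.0-1.el7ost" && echo "fixed package installed"`.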

Comment 8 errata-xmlrpc 2017-05-17 19:43:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1245