Bug 1730435

Summary: [RHOSP10]"openstack overcloud update" command fails because of timeout. The same messages are looped in heat-engine.log
Product: Red Hat OpenStack
Reporter: Alex Stupnikov <astupnik>
Component: openstack-heat
Assignee: Alex Schultz <aschultz>
Status: CLOSED DUPLICATE
QA Contact: Victor Voronkov <vvoronko>
Severity: medium
Priority: medium
Version: 10.0 (Newton)
CC: aschultz, mburns, ojanas, ramishra, rsunog, sbaker, shardy
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: ---
Hardware: All
OS: All
Last Closed: 2019-07-19 11:05:14 UTC
Type: Bug

Description Alex Stupnikov 2019-07-16 17:49:27 UTC
Description of problem:

The customer is unable to complete the minor update procedure because "openstack overcloud update stack -i overcloud" fails after a timeout without producing any useful output.

Observations:

- Originally, the customer issued the overcloud update command against an environment without an active subscription. We observed yum-related errors and fixed them by allocating proper subscriptions and enabling the proper repos;
- The customer then successfully ran "openstack overcloud deploy" to restore the overcloud stack's health;
- Currently, the "openstack overcloud update stack -i overcloud" command fails because of a timeout.

In heat-engine.log we see the same messages repeating in a loop, for example:

2019-07-12 16:46:19.179 4410 DEBUG heat.engine.scheduler [req-037781ea-281c-401c-b710-ad2029b71a45 - - - - -] Task update_task from Stack "overcloud-Compute-jyjpqgykup3r-0-wlgyupxughgk" [30f3cb61-8a59-4c24-bbcb-63693a03e84f] running step /usr/lib/python2.7/site-packages/heat/engine/scheduler.py:216
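
To see which nested stack keeps repeating, the "running step" lines can be counted per stack. The following is a minimal Python sketch, assuming the log line format shown in the example above; the function name count_running_steps and the regular expression are illustrative and are not part of Heat:

```python
import re
from collections import Counter

# Matches scheduler DEBUG lines of the assumed form:
#   ... Task <task> from Stack "<name>" [<uuid>] running step ...
STEP_RE = re.compile(
    r'Task (?P<task>\S+) from Stack "(?P<stack>[^"]+)" '
    r'\[(?P<stack_id>[0-9a-f-]+)\] running step'
)


def count_running_steps(lines):
    """Count 'running step' messages per (task, stack name, stack id)."""
    counts = Counter()
    for line in lines:
        match = STEP_RE.search(line)
        if match:
            key = (match.group("task"), match.group("stack"), match.group("stack_id"))
            counts[key] += 1
    return counts
```

Passing an open heat-engine.log file object to count_running_steps and printing counts.most_common(10) should put the stack that never finishes at the top of the list.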

The latest failed command was issued at 2019-07-12 12:42 (local time). Log files and a sosreport will be provided privately.

At this point we need help identifying the root cause.

Comment 2 Alex Schultz 2019-07-16 20:17:16 UTC
Typically, when it's looping like that, a deployment task never completed on one of the systems. Was that compute node down at the time of deployment, or was os-collect-config not running? Please provide a sosreport from the compute node.
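
A quick way to answer the os-collect-config question on the node itself is to ask systemd for the unit state. This is a minimal Python sketch, assuming os-collect-config runs as a systemd unit of that name on the compute node; the helper unit_is_active and its run parameter are hypothetical names introduced here for illustration:

```python
import subprocess


def unit_is_active(unit="os-collect-config", run=subprocess.run):
    """Return True if `systemctl is-active <unit>` prints 'active'.

    The unit name is an assumption; `run` is injectable so the check
    can be exercised without systemd being present.
    """
    result = run(
        ["systemctl", "is-active", unit],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    return result.stdout.strip() == "active"
```

Calling unit_is_active() on the compute node only reports the current state. Whether the service was running at the time of the failed update still has to be read from the sosreport.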

Comment 6 Alex Schultz 2019-07-17 17:10:24 UTC
They are likely hitting Bug 1678225, which was resolved in openstack-heat-7.0.6-6.el7ost, but according to the sosreports they have openstack-heat-7.0.6-4.el7ost.noarch installed.
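
The comparison of the installed build against the fixed build can be sketched in Python. This is a simplified ordering that compares only the integer runs in the version and release strings. It is not RPM's full version comparison algorithm, and the helper names parse_nvr, numeric_key and has_fix are illustrative:

```python
import re

# Build named in the comment above as containing the fix.
FIXED = "openstack-heat-7.0.6-6.el7ost"


def parse_nvr(nvr):
    """Split 'name-version-release[.arch]' into (name, version, release)."""
    nvr = re.sub(r"\.(noarch|x86_64)$", "", nvr)
    name, version, release = nvr.rsplit("-", 2)
    return name, version, release


def numeric_key(version, release):
    """Simplified ordering key: every integer run in version and release."""
    return tuple(int(n) for n in re.findall(r"\d+", version + "." + release))


def has_fix(installed, fixed=FIXED):
    """Return True if the installed build is at or above the fixed build."""
    name_i, ver_i, rel_i = parse_nvr(installed)
    name_f, ver_f, rel_f = parse_nvr(fixed)
    if name_i != name_f:
        raise ValueError("cannot compare different packages")
    return numeric_key(ver_i, rel_i) >= numeric_key(ver_f, rel_f)
```

With the package from the sosreports, has_fix("openstack-heat-7.0.6-4.el7ost.noarch") returns False, which matches the conclusion that the undercloud predates the fix.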

Comment 7 Alex Schultz 2019-07-17 17:12:37 UTC
Please have them update the undercloud and try again. I will leave this bug open for a bit longer, but it likely needs to be marked as a duplicate of Bug 1678225.

Comment 8 Alex Stupnikov 2019-07-19 09:32:37 UTC
Alex, thank you for your help, and sorry for the ambiguous report. The customer confirmed your conclusion and closed the case. I believe this bug can also be closed.

Regards, Alex S.

Comment 9 Alex Schultz 2019-07-19 22:05:39 UTC

*** This bug has been marked as a duplicate of bug 1678225 ***