Description of problem:
Customer is unable to complete minor update procedure because "openstack overcloud update stack -i overcloud" fails after timeout without reasonable output.
- originally customer issued overcloud update command against environment without active subscription. We have observed yum-related errors and fixed them by allocating proper subscriptions and enabling proper repos;
- "openstack overcloud deploy" command was successfully issued by customer to restore overcloud stack's health;
- Currently "openstack overcloud update stack -i overcloud" command fails because of timeout.
In heat-engine.log we see a loop of common messages, for example:
2019-07-12 16:46:19.179 4410 DEBUG heat.engine.scheduler [req-037781ea-281c-401c-b710-ad2029b71a45 - - - - -] Task update_task from Stack "overcloud-Compute-jyjpqgykup3r-0-wlgyupxughgk" [30f3cb61-8a59-4c24-bbcb-63693a03e84f] running step /usr/lib/python2.7/site-packages/heat/engine/scheduler.py:216
Latest failed command was issued at 2019-07-12 12:42 (local time). Log files and sosreport will be provided privately.
At this point we need a help to identify a root cause...
Traditionally when it's looping like that, a deployment task never completed on one of the systems. Was that compute down at time of deployment or os-collect-config not running? Please provide a sosreport from the compute node
They are likely hitting Bug 1678225 which was resolved in openstack-heat-7.0.6-6.el7ost but according to the sosreports they have openstack-heat-7.0.6-4.el7ost.noarch
Please have them update the undercloud and try again. I will leave this bug open for a bit longer, but it likely needs to be marked as a duplicate of Bug 1678225
Alex, thank you for you help and sorry for ambiguous report. Customer confirmed your conclusion and closed the case. I believe that this bug can also be closed.
Regards, Alex S.
*** This bug has been marked as a duplicate of bug 1678225 ***