Description of problem:
os-collect-config should retry signaling the success of a resource if it receives a 500 error response from heat.
In a customer environment, we can observe the following behavior:
* Director services put the undercloud under heavy load
* Communication between heat and keystone breaks temporarily because of that load
* heat returns a 500 code to the compute node's os-collect-config when the latter signals the success of resource ComputeSshKnownHostsDeployment
* os-collect-config never tries to signal the success of that resource again
os-collect-config should retry
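To illustrate the requested behavior, here is a minimal sketch of retrying a signal when the server answers with a 5xx code. It is not the os-collect-config or heat-agents code; `signal_with_retry`, its parameters, and the `send` callable (which stands in for whatever posts the signal to heat and returns the HTTP status code) are hypothetical names used only for this example.

```python
import time


def signal_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-5xx status or attempts run out.

    send: hypothetical callable that posts the signal and returns the
          HTTP status code as an int.
    Returns the last status code seen.
    """
    status = None
    for attempt in range(1, max_attempts + 1):
        status = send()
        if status < 500:
            # Success or a client error: retrying would not help.
            return status
        if attempt < max_attempts:
            # Exponential backoff so a loaded undercloud gets time to recover.
            sleep(base_delay * 2 ** (attempt - 1))
    return status
```

With a wrapper like this, a temporary 500 from heat would lead to another attempt after a short delay instead of the signal being dropped. The attempt count and backoff values are arbitrary and would need tuning for a real deployment.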
Turns out I fixed this back in 2017 (https://review.opendev.org/#/c/519417/), so the fix is in openstack-heat-agents >= 1.5.1, which shipped with OSP13. I'll see about backporting this to OSP10.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.