Description of problem:
When executing 'openstack overcloud ffwd-upgrade prepare' on an OSP10 environment that uses RHELRegistration, Heat attempts to delete the RHELRegistration resource, which triggers a deployment of RHELUnregister.
This deployment successfully executes on the nodes, but the completion callback is never registered by Heat.
Version-Release number of selected component (if applicable):
Deploy an OSP10 overcloud registered with Satellite using the included rhel-registration Heat templates and attempt to upgrade to OSP 13.
Steps to Reproduce:
1. Begin with a deployed OSP10 overcloud registered to Satellite using the rhel-registration Heat template.
2. Proceed with the FFU process to the point of running 'ffwd-upgrade prepare' on the upgraded OSP13 undercloud.
3. Observe that the 'ffwd-upgrade prepare' hangs when appearing to update NodeExtraConfig on the overcloud resources.
4. Check 'openstack software deployment list | grep -v COMPL' and observe an in-progress deployment for each overcloud node.
5. Check the config for each of the in-progress deployments and observe that they are all RHELUnregister.
6. Observe that the overcloud nodes are no longer registered to Satellite.
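The check in step 4 can also be scripted. Below is a minimal sketch, assuming the deployment list has been captured as JSON (for example with 'openstack software deployment list -f json') and that each entry carries `id`, `config_id`, `server_id` and `status` keys. Those key names and all sample IDs are assumptions for illustration, not taken from this environment.

```python
import json


def unfinished_deployments(list_json):
    """Return the deployments whose status is anything other than COMPLETE."""
    deployments = json.loads(list_json)
    return [d for d in deployments if d.get("status") != "COMPLETE"]


# Made-up sample standing in for real CLI output (IDs are hypothetical).
sample = json.dumps([
    {"id": "dep-1", "config_id": "cfg-a", "server_id": "srv-1", "status": "IN_PROGRESS"},
    {"id": "dep-2", "config_id": "cfg-b", "server_id": "srv-1", "status": "COMPLETE"},
])

for dep in unfinished_deployments(sample):
    # The config_id is what you would then inspect to confirm it is RHELUnregister.
    print(dep["id"], dep["config_id"], dep["status"])
```

Filtering on the status field rather than grepping for "COMPL" avoids accidentally hiding rows whose IDs happen to contain that string.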
Actual results:
Nodes are unregistered from Satellite and the ffwd-upgrade prepare process is stuck.
Expected results:
Nodes remain registered to Satellite so they can eventually update to OSP13.
When attempting to manually trigger the signal with curl, the following error is returned by Heat:
<ErrorResponse><Error><Message>A bad or out-of-range value was supplied:signal is not supported for resource.
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/heat/common/context.py", line 409, in wrapped
return func(self, ctx, *args, **kwargs)
File "/usr/lib/python2.7/site-packages/heat/engine/service.py", line 1824, in resource_signal
_resource_signal(stack, rsrc, details, False)
File "/usr/lib/python2.7/site-packages/heat/engine/service.py", line 1789, in _resource_signal
needs_metadata_updates = rsrc.signal(details, need_check)
File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 2508, in signal
File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 2453, in _handle_signal
ResourceActionNotSupported: signal is not supported for resource.
We hit this while testing FFWD. It's a blocker, as the overcloud Heat stack effectively got into a state which was (AFAIU) unrecoverable without manual editing of the DB.
I think we managed to work around the issue by manually applying patches on the OSP 10 templates before going through the FFWD procedure. I think the patches were probably:
but will confirm with Andrew / Randy.
^ If the above is correct, we may already have a patched OSP 10.z ready, through bug 1547091.
At another look, it seems that we'll also need this one:
You'll also need https://review.openstack.org/#/c/558541/
For me, changing the status in Heat's software_deployment table from IN-PROGRESS to COMPLETE got the 'ffwd-upgrade prepare' unstuck (verified 3 times now). I also had to re-register all overcloud nodes before 'ffwd-upgrade run'.
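For anyone needing the same workaround, here is a rough sketch of the kind of update involved. It runs against an in-memory SQLite stand-in, not the real Heat database, and the two-column table is a hypothetical subset of the schema. The status spelling `IN_PROGRESS` and the deployment IDs are assumptions; check the actual values in your own DB and take a backup before touching it.

```python
import sqlite3


def mark_deployments_complete(conn, deployment_ids):
    """Flip the listed deployments from IN_PROGRESS to COMPLETE.

    Rows in any other state are left alone. Returns the number of rows changed.
    """
    changed = 0
    for dep_id in deployment_ids:
        cur = conn.execute(
            "UPDATE software_deployment SET status = 'COMPLETE' "
            "WHERE id = ? AND status = 'IN_PROGRESS'",
            (dep_id,),
        )
        changed += cur.rowcount
    conn.commit()
    return changed


# Stand-in database with a hypothetical, reduced software_deployment table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE software_deployment (id TEXT PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO software_deployment VALUES (?, ?)",
    [("dep-1", "IN_PROGRESS"), ("dep-2", "COMPLETE"), ("dep-3", "IN_PROGRESS")],
)

# Only the explicitly listed stuck deployment is touched.
print(mark_deployments_complete(conn, ["dep-1"]))
```

Limiting the update to explicitly listed IDs, rather than every in-progress row, keeps unrelated deployments from being marked complete by mistake.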
It would be nice if we could have 'DeleteOnRHELUnregistration: True' or something like that in OSP10, based on https://review.openstack.org/#/c/492970
With #1574610 we should just document the issue in FFU now.
The 10 backport is MODIFIED, so I'll move this at least to POST.
I am thinking you are saying this bug is really meant as a queue to retest once all the bits have landed in OSP 10. Are there any merges that we need to track in this bug against 13 or should this be TestOnly or something similar?
Yes, exactly. We're just waiting for the dependent BZ to land in an OSP 10 puddle so that we can retest; I was unsure how to properly reflect that in the state of this BZ.
I'm adding TestOnly keyword. Should this stay in POST or move to some other state?
Thanks, TestOnly is the right flag to add. Once we have a build in a puddle that you can test with, update this BZ to ON_QA.
Bug 1574610 is now ON_QA, moving this bug to ON_QA too.