Red Hat Bugzilla – Bug 1267318
[Docs] [Director] Overcloud Updates failed due new resource type
Last modified: 2015-12-17 20:28:58 EST
Description of problem:
100% reproducible. When upgrading from GA to the latest puddle, the overcloud update fails after the undercloud has been upgraded:
openstack overcloud update stack -i overcloud --template /usr/share/openstack-tripleo-heat-templates
starting package update on stack overcloud
ERROR: openstack ERROR: Unknown resource Type : OS::TripleO::Network::Ports::StorageVipPort
The problem is that by upgrading the undercloud, we have also updated the tripleo-heat-templates. This requires changes to the environment files (e.g. to support new resource types, as we see here). The "openstack overcloud update" command, however, is designed to not pass any environment files but just retain the existing environment in Heat. This does not work if the templates have been modified.
A solution may be to simply require the user to do an update explicitly passing all of the environment files again (including explicitly specifying the default environment files) after an undercloud update. We may be able to improve on this by adding a command-line option to include the default env files so that the user doesn't have to figure out the correct paths to them. (Even better would be to have a confirmation step when this option is specified, to make sure that users remember to include all of their extra environment files too.)
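A minimal sketch of what "explicitly passing all of the environment files again" could look like, written as a helper that only prints the command rather than running it. The command form and the default environment file path are the ones tested in a later comment on this bug; the function name build_update_cmd and the extra environment file in the usage note are hypothetical placeholders, not part of any existing tool.

```shell
# Sketch only: assemble an overcloud update command that re-passes every
# environment file explicitly, with the default resource registry first.
# ASSUMPTION: DEFAULT_ENV is the default env file path discussed in this bug;
# adjust it if the installed templates live elsewhere.
DEFAULT_ENV="/usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml"

# build_update_cmd STACK [EXTRA_ENV_FILE...]
# Prints the command line; it does not execute anything.
build_update_cmd() {
    buc_stack="$1"
    shift
    buc_cmd="openstack overcloud update stack ${buc_stack} -i --templates -e ${DEFAULT_ENV}"
    # Append each user-supplied environment file after the default one,
    # so later files can override the default registry entries.
    for buc_env in "$@"; do
        buc_cmd="${buc_cmd} -e ${buc_env}"
    done
    printf '%s\n' "${buc_cmd}"
}
```

For example, `build_update_cmd overcloud /home/stack/network-environment.yaml` (a made-up path) prints the full command so it can be reviewed before being run by hand.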
Assigning to the rhel-osp-director component for now. This may end up as a docs-only bug, or it may require changes to python-rdomanager-oscplugin.
So I chatted to Jan about this, wrt how we might validate this before attempting the stack update.
Unfortunately, despite us having a preview_update_stack interface which could probably validate the PATCH update, I missed updating that interface recently when I implemented PATCH updates for update_stack proper.
I raised this upstream bug to track it. The two interfaces should behave consistently, and we may be able to backport the fix to preview_update_stack since, AFAICT, it's only a refactoring change inside service.py.
Because the same situation (old environment with new template) may happen for all CLI commands (scaling down, pkg updates, re-deploy) it seems to me that it's best to instruct users to update the existing overcloud with new environment files right after undercloud machine upgrade.
A doc patch with this instruction is here:
An alternative to adding an extra parameter for the default env files might be the pre-update validation mentioned by Steven. If the preview_update_stack patch can be backported soon, I would lean towards adding a check to the CLI commands that calls preview_update_stack before running stack-update; if the preview fails, we could warn the user.
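The "preview first, warn on failure" flow described above can be sketched generically as follows. This is only an illustration of the control flow: the function guarded_update is hypothetical, and both commands are supplied by the caller, so nothing here implies a particular preview command exists in the CLI.

```shell
# Sketch only: run a caller-supplied preview command and only proceed to the
# real update when the preview succeeds.
# ASSUMPTION: the preview and update commands are passed in as strings;
# this does not name or depend on any real preview CLI.

# guarded_update PREVIEW_CMD UPDATE_CMD
guarded_update() {
    gu_preview_cmd="$1"
    gu_update_cmd="$2"
    # Discard preview output; only its exit status matters here.
    if ! sh -c "${gu_preview_cmd}" >/dev/null 2>&1; then
        echo "WARNING: update preview failed; re-run the update with all environment files" >&2
        return 1
    fi
    # Preview passed, so run the actual update.
    sh -c "${gu_update_cmd}"
}
```

The point of the design is that a failed preview stops the update before any stack changes are attempted and leaves the user with a hint about the likely cause.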
Upstream bug for this issue:
(In reply to Jan Provaznik from comment #4)
> Because the same situation (old environment with new template) may happen
> for all CLI commands (scaling down, pkg updates, re-deploy) it seems to me
> that it's best to instruct users to update the existing overcloud with new
> environment files right after undercloud machine upgrade.
I'm inclined to agree. My only reservation is that this could result in the user ending up with new templates but old packages until they then go and run a package update. Off the top of my head I can't think of any circumstances where that would cause a problem though.
> A doc patch with this instruction is here:
Dan, please check the docs patch and let us know if you need more info for the product doc change.
Assigning to Dan for review.
Added Overcloud stack upgrade procedure:
Zane, Steve, Jan - How does this look? Any further changes required?
Thanks Dan. Unfortunately it seems that this fix is not sufficient, because Zane's concern from comment 5 is probably already happening. The step at https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html/Director_Installation_and_Usage/sect-Updating_Overcloud_Stack.html
failed for Gael Lambert because node resources failed with the error:
Error: Could not find class tripleo::packages for strg00-prv.localdomain on node strg00-prv.localdomain
The tripleo::packages puppet class is defined in openstack-puppet-modules-2015.1.8-19.el7ost.noarch, but I guess it was not yet in the 7.0 RPMs.
So a potential fix would be to run the package update directly and explicitly pass it all the env files again (comment 2).
I think the solution here is pretty simple, actually. This error (tripleo::packages not defined) is because of an old openstack-puppet-modules on the deployed machine. This can be solved in 2 ways:
1. user manually runs yum update openstack-puppet-modules on every machine
2. we change update stack to do update of openstack-puppet-modules first, then proceed with the rest.
#1 can be a good short term solution
#2 is likely the right long term solution. OPM is a safe update in general, as far as openstack services go. It's also going to hit us that THT requires something that is not in the old OPM at some point in the future, especially on major version upgrades.
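As a rough illustration of short-term option #1, the helper below prints one command per overcloud node instead of running anything. The SSH user heat-admin and the node address in the usage note are assumptions for the sketch, and print_opm_update_cmds is a hypothetical name.

```shell
# Sketch only: print (not run) the per-node command for manually updating
# openstack-puppet-modules.
# ASSUMPTION: nodes are reachable over SSH as heat-admin with sudo rights;
# override SSH_USER if the deployment uses a different account.
SSH_USER="${SSH_USER:-heat-admin}"

# print_opm_update_cmds NODE [NODE...]
print_opm_update_cmds() {
    for opm_node in "$@"; do
        printf 'ssh %s@%s sudo yum -y update openstack-puppet-modules\n' "${SSH_USER}" "${opm_node}"
    done
}
```

Calling `print_opm_update_cmds 192.0.2.10` (a documentation-range address) emits a single ssh line that can be reviewed and then run by hand.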
We are testing whether running the following works:
openstack overcloud update stack overcloud -i --templates -e /usr/share/openstack-tripleo-heat-templates/overcloud-resource-registry-puppet.yaml
This basically skips https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html/Director_Installation_and_Usage/sect-Updating_Overcloud_Stack.html and goes directly to https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/7/html/Director_Installation_and_Usage/sect-Updating_Overcloud_Packages.html, explicitly setting the default environment file.
If this works, it should just be a matter of updating the docs.
(In reply to Mike Burns from comment #11)
> I think the solution here is pretty simple, actually. This error
> (tripleo::packages not defined) is because of an old
> openstack-puppet-modules on the deployed machine. This can be solved in 2
> ways:
> 1. user manually runs yum update openstack-puppet-modules on every machine
> 2. we change update stack to do update of openstack-puppet-modules first,
> then proceed with the rest.
> #1 can be a good short term solution
> #2 is likely the right long term solution. OPM is a safe update in general,
> as far as openstack services go. It's also going to hit us that THT
> requires something that is not in the old OPM at some point in the future,
> especially on major version upgrades.
That's what we did, and it solved the problem well enough to run an openstack deploy, but we have hit another issue, cf. https://bugzilla.redhat.com/show_bug.cgi?id=1272347
Unfortunately a fix for this can't be tested properly until https://bugzilla.redhat.com/show_bug.cgi?id=1272357 is fixed (this one makes update fail always).
So far it seems that running "openstack overcloud update stack" directly (comment 12) solves the issue with the missing tripleo::packages class on OC nodes: openstack-puppet-modules is updated by yum update before puppet runs (at least for the 7.0->7.1 upgrade). So no pre-patching of OC nodes should be required for this BZ (only a doc update).
The update process is still failing, but in a much later phase (probably related to 1272347). I'll send a doc patch for this BZ once the update finishes successfully.
I can confirm that running package update directly solves this particular issue for 7.0->7.1 upgrades.
An upstream doc patch:
I think this BZ is obsolete due to this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1286798
Can anyone confirm this?
Hi Dan, yes, the update process tracked by #1286798 covers this particular issue too.
In that case, I'll close this BZ since the other bug is more relevant. However, if we need this BZ open for whatever reason, please feel free to reopen.
*** This bug has been marked as a duplicate of bug 1286798 ***