Bug 1387245

Summary: openstack overcloud update stack does not update plan with changed THT
Product: Red Hat OpenStack
Reporter: Lukas Bezdicka <lbezdick>
Component: openstack-tripleo-common
Assignee: Ryan Brady <rbrady>
Status: CLOSED ERRATA
QA Contact: Alexander Chuzhoy <sasha>
Severity: unspecified
Priority: urgent
Version: 10.0 (Newton)
CC: brad, dbecker, jcoufal, jschluet, jslagle, lbezdick, mandreou, mburns, morazi, ohochman, rbrady, rhel-osp-director-maint, sasha, slinaber
Target Milestone: rc
Keywords: Triaged
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-common-5.3.0-5.el7ost
Last Closed: 2016-12-14 16:23:19 UTC
Type: Bug

Description Lukas Bezdicka 2016-10-20 12:35:10 UTC
Description of problem:
Updating an older OSP10 deployment fails. puppet-ceph and puppet-tripleo changed, and tripleo-heat-templates (THT) changed along with them, but the update does not refresh the deployment plan with the changed templates, so it fails.

Steps to Reproduce:
1. Template on the undercloud, /usr/share/openstack-tripleo-heat-templates/puppet/services/ceph-base.yaml:
        ceph::profile::params::fsid: {get_param: CephClusterFSID}
        # NOTE: bind IP is found in Heat replacing the network name with the local node IP
        # for the given network; replacement examples (eg. for internal_api):
        # internal_api -> IP
        # internal_api_uri -> [IP]
        # internal_api_subnet - > IP/CIDR
        ceph::profile::params::cluster_network:
2. The same template in the plan (swift download overcloud puppet/services/ceph-base.yaml):
        ceph::profile::params::fsid: {get_param: CephClusterFSID}
        # NOTE: bind IP is found in Heat replacing the network name with the local node IP
        # for the given network; replacement examples (eg. for internal_api):
        # internal_api -> IP
        # internal_api_uri -> [IP]
        # internal_api_subnet - > IP/CIDR
        ceph::profile::params::cluster_network:
3. After the overcloud upgrade, /usr/share/openstack-tripleo-heat-templates/puppet/services/ceph-base.yaml contains the new package list:
        ceph::profile::params::fsid: {get_param: CephClusterFSID}
        # FIXME(gfidente): we should not have to list the packages explicitly in the templates,
        # but this has to stay until https://bugs.launchpad.net/puppet-ceph/+bug/1629933 is fixed
        ceph::params::packages:
          - ceph-base
          - ceph-mon
          - ceph-osd
        # NOTE: bind IP is found in Heat replacing the network name with the local node IP
4. After the overcloud update, the plan's puppet/services/ceph-base.yaml is still the old version:
        ceph::profile::params::fsid: {get_param: CephClusterFSID}
        # NOTE: bind IP is found in Heat replacing the network name with the local node IP
        # for the given network; replacement examples (eg. for internal_api):
        # internal_api -> IP
        # internal_api_uri -> [IP]
        # internal_api_subnet - > IP/CIDR
        ceph::profile::params::cluster_network:

We need to update the plan from the changed templates; otherwise we'll hit more issues like this.
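The comparison in steps 1 to 4 can be automated. The following is a minimal sketch, assuming the plan's copy of a template has already been fetched locally (for example with `swift download overcloud puppet/services/ceph-base.yaml`, which by default saves the object under its own name relative to the current directory). The function name `template_drift` is hypothetical and not part of tripleo-common; it simply diffs the two files with Python's standard `difflib` module.

```python
import difflib
from pathlib import Path


def template_drift(local_path, plan_copy_path):
    """Diff the plan's copy of a template against the undercloud's copy.

    Hypothetical helper. Returns unified-diff lines: "+" lines exist only
    on the undercloud, "-" lines exist only in the plan. An empty list
    means the plan matches the template on disk.
    """
    local_lines = Path(local_path).read_text().splitlines(keepends=True)
    plan_lines = Path(plan_copy_path).read_text().splitlines(keepends=True)
    return list(
        difflib.unified_diff(
            plan_lines,   # "from": what the plan (Swift container) holds
            local_lines,  # "to": what the updated THT package installed
            fromfile=f"plan: {plan_copy_path}",
            tofile=f"undercloud: {local_path}",
        )
    )
```

In the scenario above, calling `template_drift("/usr/share/openstack-tripleo-heat-templates/puppet/services/ceph-base.yaml", "puppet/services/ceph-base.yaml")` after step 4 would return a non-empty diff whose "+" lines include the new `ceph::params::packages` entries, showing that the plan was not refreshed.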

Comment 1 Jaromir Coufal 2016-10-20 14:37:31 UTC
If new templates are used, please make sure this is properly documented and that any user changes are carried over into the new templates before applying them.

Comment 2 Jaromir Coufal 2016-10-20 14:42:28 UTC
This blocks the update procedure and its testing; increasing priority to urgent.

Comment 3 Ryan Brady 2016-10-20 15:03:49 UTC
With the recent change[1] merged upstream to the update command, the templates argument is no longer used, and the user is expected to update the templates in the deployment plan before running the command. The CLI doesn't expose a command to update a deployment plan yet, but the workflow to update a plan after the Swift container has been updated does exist. I've created a critical bug to track this upstream, and we will update the client as needed.

Comment 4 Ryan Brady 2016-10-20 15:23:42 UTC
After additional research it looks like the correct steps would be to:
1) update tripleo-heat-templates on undercloud
2) do a deploy (which would update the plan)
3) run the update.

If the deploy doesn't automatically update the plan, then delete the plan, deploy, and update packages.

Can you test this?

Comment 5 Lukas Bezdicka 2016-10-21 11:58:12 UTC
(In reply to Ryan Brady from comment #4)
> After additional research it looks like the correct steps would be to:
> 1) update tripleo-heat-templates on undercloud
> 2) do a deploy (which would update the plan)
This updates the plan, but it also reruns the whole deploy, which means it runs Puppet against the older packages and will fail, again because of inconsistencies between THT and the Puppet packages. Even if it does not fail, waiting this long is unacceptable.
> 3) run the update.
> 
> If the deploy doesn't automatically update the plan, then delete plan,
> deploy, update packages.
> 
> Can you test this?

2016-10-21 11:54:07Z [overcloud-AllNodesDeploySteps-iir5iikjkema.ControllerDeployment_Step2]: UPDATE_FAILED  UPDATE aborted

As expected, the difference between THT and the old puppet-tripleo package is too big.

We really need to update the plan without running it. As a workaround, I could kill Zaqar, run the deploy, wait for it to fail after Mistral times out on Zaqar, and then run the update.

Comment 6 Alexander Chuzhoy 2016-10-21 14:23:43 UTC
The error I see while debugging Heat during the update is:
Error: Could not find data item gnocchi_redis_password in any Hiera data file and no default supplied at /etc/puppet/modules/tripleo/manifests/profile/base/gnocchi/api.pp:56 on node controller-1.localdomain
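This error is the same template/package skew described above: the newer puppet-tripleo manifest looks up a key that the hieradata rendered from the old plan does not provide, and the lookup has no default. The toy model below illustrates only that failure mode; it is not how Hiera is implemented (real Hiera has hierarchies, backends and interpolation), and `hiera_lookup`, `HieraLookupError` and the sample data are hypothetical.

```python
# Toy model only: real Hiera resolves keys through a hierarchy of backends.
_NO_DEFAULT = object()


class HieraLookupError(LookupError):
    """Raised when a key is missing and no default was supplied."""


def hiera_lookup(hieradata, key, default=_NO_DEFAULT):
    """Return hieradata[key], else the default, else raise."""
    if key in hieradata:
        return hieradata[key]
    if default is not _NO_DEFAULT:
        return default
    raise HieraLookupError(
        f"Could not find data item {key} in any Hiera data file "
        "and no default supplied"
    )


# Hypothetical data rendered from the OLD plan: it predates the new key.
old_plan_hieradata = {"gnocchi_api_enabled": True}

# A newer manifest that requires the key fails against the old data.
try:
    hiera_lookup(old_plan_hieradata, "gnocchi_redis_password")
except HieraLookupError as err:
    print(err)
```

Supplying the missing key requires updated templates in the plan, not just newer packages, which is why the plan has to be refreshed before the update runs Puppet.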

Comment 7 Brad P. Crochet 2016-10-21 16:51:01 UTC
Can someone try updating the packages on the overcloud first, then updating the templates and deploying?

Comment 8 Ryan Brady 2016-10-24 13:47:02 UTC
Sasha/Lukas,

Please try Brad's patch (linked in the bug) and see if it fixes the issue for now. I believe that after applying the patch you'd run the deploy command with --update-plan-only to update the templates before running the update command.

Comment 9 Alexander Chuzhoy 2016-10-25 15:02:15 UTC
Hi Ryan,
Trying the patch resulted in:
https://bugzilla.redhat.com/show_bug.cgi?id=1388203

Thanks.

Comment 10 Brad P. Crochet 2016-11-07 15:12:15 UTC
Do we have confirmation of whether updating packages on the overcloud and then updating the templates (via deploy) works? In my testing, it did.

Comment 11 Lukas Bezdicka 2016-11-07 22:44:10 UTC
Updating packages on the overcloud and then updating the plan defeats the point of updates.
We are now using https://review.openstack.org/#/c/389830/ and updating the plan without a deploy, and that wouldn't work either. Updates work if we update the plan with the new THT and then start a regular update.

Comment 12 Brad P. Crochet 2016-11-07 22:56:22 UTC
It does not defeat the point. The templates may not even be updated. This is meant for minor updates only; it could be security updates only. So updating the templates before applying patches does not make sense to me. In my testing, I have not had any failures when updating without updating the plan.

Comment 13 Lukas Bezdicka 2016-11-08 11:05:15 UTC
If, for example, a Puppet module is updated without a matching THT update in the plan, it will fail because it lacks the configuration it needs, or it will misconfigure the system. You were just lucky you didn't hit such a change.

Comment 14 Brad P. Crochet 2016-11-08 12:25:08 UTC
That should not be occurring for minor updates within a series. If it does, we're doing it wrong.

Comment 15 Lukas Bezdicka 2016-11-08 12:52:47 UTC
We are not doing it wrong. An example is the Nova VNC connection misconfiguration, where we didn't add any access limitation: as a security update, THT got a new variable for which subnets can connect, and Puppet got an update for VNC allowed hosts. This is a common use case, and updating packages alone wouldn't help much; it's a security update in the form of a config change.

Comment 21 Alexander Chuzhoy 2016-11-16 02:21:09 UTC
Verified:

Environment:
openstack-tripleo-common-5.3.0-6.el7ost.noarch


Successfully completed minor update without any patches.

Comment 23 errata-xmlrpc 2016-12-14 16:23:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html