Bug 1439121

Summary: Check for legacy hiera data fails, preventing the upgrade from proceeding
Product: Red Hat OpenStack
Reporter: Marios Andreou <mandreou>
Component: openstack-heat-agents
Assignee: Marios Andreou <mandreou>
Status: CLOSED ERRATA
QA Contact: Marius Cornea <mcornea>
Severity: urgent
Priority: urgent
Version: 11.0 (Ocata)
CC: augol, dprince, jcoufal, jschluet, mburns, rhel-osp-director-maint, rhos-flags, sathlang, sbaker, sclewis, slinaber
Target Milestone: rc
Keywords: Triaged
Target Release: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-heat-agents-1.0.1-1.el7ost
Last Closed: 2017-05-17 20:17:59 UTC
Type: Bug
Attachments:
  os-apply-config --key hiera.datafiles --type raw --key-default empty (flags: none)

Description Marios Andreou 2017-04-05 09:35:29 UTC
Created attachment 1268904
os-apply-config --key hiera.datafiles --type raw --key-default empty

Description of problem:
With the latest downstream OSP11 puddle packages, the check for legacy hiera data fails, preventing the upgrade from proceeding. The legacy hiera data check is at [1], and the latest heat-agents package I got was openstack-heat-agents-1.0.0-3.el7ost.noarch (possibly this check only just landed in the package, which would explain why we didn't hit it earlier).

We have [2], which removes the old hook and data as part of the upgrade, but it seems something more is needed here.

On the compute node itself, running the check command (os-apply-config --key hiera.datafiles --type raw --key-default empty) does indeed return the os-collect-config (occ) data (see the attached file).
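To reproduce the check by hand on a node, the shell sketch below wraps the same os-apply-config query and treats any output other than the "empty" default as legacy data. It is an assumption that this mirrors the intent of the hook check in [1], not its actual code; the function name has_legacy_hieradata is hypothetical, and the query output is passed in as an argument so the logic can be tried without a node.

```shell
# Hypothetical helper: decide whether the os-apply-config query still
# returns legacy hiera.datafiles data. It assumes the query used in this bug:
#   os-apply-config --key hiera.datafiles --type raw --key-default empty
# prints the literal string "empty" once the legacy data is gone.
has_legacy_hieradata() {
    # $1: captured output of the os-apply-config query
    [ "$1" != "empty" ]
}

# Possible use on a node (not executed here):
#   out=$(os-apply-config --key hiera.datafiles --type raw --key-default empty)
#   if has_legacy_hieradata "$out"; then echo "legacy hiera data present"; fi
```

Capturing with $(...) strips the trailing newline, so a clean node compares equal to "empty".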


[1] https://review.openstack.org/#/c/426241/2/heat-config-hiera/install.d/hook-hiera.py
[2] https://github.com/openstack/tripleo-heat-templates/blob/master/environments/major-upgrade-composable-steps.yaml#L13-L15



=================================
The stack failure looks like this:

 openstack overcloud deploy --templates /usr/share/openstack-tripleo-heat-templates \
   -e /usr/share/openstack-tripleo-heat-templates/environments/puppet-pacemaker.yaml \
   -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
   -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
   -e enable_swap.yaml \
   -e network_env.yaml \
   -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps.yaml \
   -e init-repo.yaml


2017-04-04 16:26:56Z [overcloud-Controller-jluehgbmi53c-0-gh3qbywyso7g.NovaComputeDeployment]: UPDATE_IN_PROGRESS  state changed
2017-04-04 16:33:09Z [overcloud-Compute-b2aaihlp2xru-0-gh3qbywyso7g.NovaComputeDeployment]: SIGNAL_IN_PROGRESS  Signal: deployment 9a007bd3-51d8-4837-b65e-6312cf234450 failed (1)
2017-04-04 16:33:10Z [overcloud-Compute-b2aaihlp2xru-0-gh3qbywyso7g.NovaComputeDeployment]: UPDATE_FAILED  Error: resources.NovaComputeDeployment: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
2017-04-04 16:33:10Z [overcloud-Compute-b2aaihlp2xru-0-gh3qbywyso7g]: UPDATE_FAILED  Error: resources.NovaComputeDeployment: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
2017-04-04 16:33:10Z [overcloud-Compute-b2aaihlp2xru.0]: UPDATE_FAILED  resources[0]: Error: resources.NovaComputeDeployment: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
2017-04-04 16:33:10Z [overcloud-Compute-b2aaihlp2xru]: UPDATE_FAILED  resources[0]: Error: resources.NovaComputeDeployment: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
2017-04-04 16:33:11Z [Compute]: UPDATE_FAILED  resources.Compute: resources[0]: Error: resources.NovaComputeDeployment: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
2017-04-04 16:33:11Z [Controller]: UPDATE_FAILED  UPDATE aborted
2017-04-04 16:33:11Z [overcloud]: UPDATE_FAILED  resources.Compute: resources[0]: Error: resources.NovaComputeDeployment: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 1
2017-04-04 16:33:12Z [overcloud-Controller-jluehgbmi53c.0]: UPDATE_FAILED  UPDATE aborted
2017-04-04 16:33:12Z [overcloud-Controller-jluehgbmi53c-0-mfw6jjt3dfnx.UpdateDeployment]: UPDATE_FAILED  UPDATE aborted
2017-04-04 16:33:12Z [overcloud-Controller-jluehgbmi53c.2]: UPDATE_FAILED  UPDATE aborted
2017-04-04 16:33:12Z [overcloud-Controller-jluehgbmi53c.1]: UPDATE_FAILED  UPDATE aborted
2017-04-04 16:33:12Z [overcloud-Controller-jluehgbmi53c]: UPDATE_FAILED  Operation cancelled
2017-04-04 16:33:12Z [overcloud-Controller-jluehgbmi53c-2-xcytjafgewws.UpdateDeployment]: UPDATE_FAILED  UPDATE aborted
2017-04-04 16:33:13Z [overcloud-Controller-jluehgbmi53c-1-qlkfeovu4k7l.UpdateDeployment]: UPDATE_FAILED  UPDATE aborted

 Stack overcloud UPDATE_FAILED

Comment 1 Marios Andreou 2017-04-05 11:17:14 UTC
Update: I was able to proceed by manually disabling the check in /usr/libexec/heat-config/hooks/hiera on all controllers and then running the Ansible upgrade steps. After the upgrade completed, the 'old' data is completely gone:

[root@overcloud-compute-0 os-collect-config]# os-apply-config --key hiera.datafiles --type raw --key-default empty
empty

[root@overcloud-controller-1 ~]# os-apply-config --key hiera.datafiles --type raw --key-default empty
empty

Comment 2 Sofer Athlan-Guyot 2017-04-05 17:07:43 UTC
A first attempt to overcome this problem, using a more deterministic approach to the check. It is hard to test, and the patch must be applied on all running nodes:

On each node before the composable upgrade:


     curl https://review.openstack.org/changes/453726/revisions/current/patch?download | base64 -d | sed -e 's,heat-config-hiera/install.d/hook-hiera.py,hiera,' | sudo patch  -d /usr/libexec/heat-config/hooks/ -p1
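How the pipeline above works: the Gerrit "patch" endpoint returns the change as base64-encoded text, base64 -d decodes it, and the sed expression rewrites the repository path heat-config-hiera/install.d/hook-hiera.py to the installed file name hiera, so that patch -p1 run in /usr/libexec/heat-config/hooks/ (which strips the leading a/ or b/) targets the installed hook. The sketch below isolates the sed step on a sample diff header so the rewrite can be checked before touching a node; rewrite_hook_path is a hypothetical name.

```shell
# Hypothetical helper isolating the path rewrite from the pipeline above:
# map the in-repo path of the hook to its installed name, "hiera".
rewrite_hook_path() {
    sed -e 's,heat-config-hiera/install.d/hook-hiera.py,hiera,'
}

# Example diff header as it would appear in the decoded upstream patch.
printf '%s\n' \
    '--- a/heat-config-hiera/install.d/hook-hiera.py' \
    '+++ b/heat-config-hiera/install.d/hook-hiera.py' \
    | rewrite_hook_path
# prints:
# --- a/hiera
# +++ b/hiera

# Suggested (not executed here): save the decoded, rewritten patch to a file
# and try it first with
#   sudo patch --dry-run -d /usr/libexec/heat-config/hooks/ -p1 < hook.diff
```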

Comment 3 Marios Andreou 2017-04-06 06:59:07 UTC
(In reply to Sofer Athlan-Guyot from comment #2)
> First try to overcome this problem by using a more deterministic approach to
> the check.  Hard to test.  Must be deployed and patch on all running nodes:
> 
> On each node before the composable upgrade:
> 
> 
>      curl
> https://review.openstack.org/changes/453726/revisions/current/patch?download
> | base64 -d | sed -e 's,heat-config-hiera/install.d/hook-hiera.py,hiera,' |
> sudo patch  -d /usr/libexec/heat-config/hooks/ -p1

I'm also running this on my second environment (mostly to get back to the upgraded state so we can check it). FWIW, you can also apply it as part of the upgrade init by adding this line to the end of your /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps.yaml:

    curl "https://review.openstack.org/gitweb?p=openstack/heat-agents.git;a=blob_plain;f=heat-config-hiera/install.d/hook-hiera.py;hb=55508e3fb05cf99f003b8309119d6861e85ad082" -o /usr/libexec/heat-config/hooks/hiera
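The curl command above writes straight over the installed hook. A slightly more cautious variant, sketched below, downloads to a temporary file first, keeps a backup of the current hook and only then installs the new copy. It assumes the hook only needs to remain an executable file at the same path; install_hook_with_backup, the .bak suffix and the HOOK_URL placeholder are hypothetical, and the function takes an already downloaded file so it can be tried locally.

```shell
# Hypothetical helper: replace a heat-config hook with a new copy while
# keeping the previous version next to it as DEST.bak.
# Usage: install_hook_with_backup NEW_FILE DEST
install_hook_with_backup() {
    # Refuse to install an empty file, e.g. the result of a failed download.
    if [ ! -s "$1" ]; then
        echo "refusing to install empty file: $1" >&2
        return 1
    fi
    # Keep the current hook, preserving its mode and timestamps.
    if [ -e "$2" ]; then
        cp -p "$2" "$2.bak" || return 1
    fi
    # Install the new copy, assuming the hook must stay executable.
    install -m 0755 "$1" "$2"
}

# Possible use from a root shell (not executed here); HOOK_URL is a
# placeholder for the gitweb URL in the comment above:
#   tmp=$(mktemp)
#   curl -fsS -o "$tmp" "$HOOK_URL" &&
#       install_hook_with_backup "$tmp" /usr/libexec/heat-config/hooks/hiera
```

Restoring the previous hook is then a matter of moving hiera.bak back into place.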

Comment 4 Marios Andreou 2017-04-06 13:19:22 UTC
Adding DFG:DF for visibility, to see if anyone has a better suggestion than https://review.openstack.org/#/c/453726/3/heat-config-hiera/install.d/hook-hiera.py

Comment 5 Marios Andreou 2017-04-06 13:54:20 UTC
Also adding DFG:CloudApp (the Heat folks), who are the ones who can actually land the proposed change anyway.

Comment 6 Dan Prince 2017-04-06 19:09:54 UTC
One option might be to package up this:

http://git.openstack.org/cgit/openstack/tripleo-puppet-elements/tree/elements/hiera/10-hiera-disable

I think installing this file into the /usr/libexec/os-refresh-config location would remove the old hook files before they cause problems, and effectively disable the old Hiera data as well. Because this runs before the 50- scripts, the existing Heat checks could stay in place, right?

I suppose we'd need a way to deploy this file, though. Perhaps we could inline it into another project so it gets installed via an RPM. Alternatively, we could use the 'DeploymentArtifacts' mechanism to deploy it as a tarball or RPM; the DeploymentArtifacts mechanism runs very early on.

Dan
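Dan's point relies on the scripts in an os-refresh-config phase directory running in lexical order of their names, so a 10- script runs before any 50- script. The sketch below imitates that ordering with a temporary directory and two stand-in scripts that only print their names; it assumes run-parts style, C-locale ordering and is not os-refresh-config's actual implementation. The name 50-example-heat-check is hypothetical.

```shell
# Stand-in for run-parts style ordering: run every executable file in a
# directory, sorted by name in the C locale.
run_in_order() {
    for _script in $(cd "$1" && ls | LC_ALL=C sort); do
        if [ -x "$1/$_script" ]; then
            # Invoke via sh so the demo also works on noexec temp dirs.
            sh "$1/$_script"
        fi
    done
}

# Demo: stand-in scripts that only echo their own names.
_demo=$(mktemp -d)
printf 'echo 50-example-heat-check\n' > "$_demo/50-example-heat-check"
printf 'echo 10-hiera-disable\n' > "$_demo/10-hiera-disable"
chmod +x "$_demo/50-example-heat-check" "$_demo/10-hiera-disable"
run_in_order "$_demo"
# prints:
# 10-hiera-disable
# 50-example-heat-check
rm -rf "$_demo"
```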

Comment 7 Steve Baker 2017-04-06 21:10:05 UTC
I see major-upgrade-composable-steps.yaml is doing the same as 10-hiera-disable, which is good.

It looks like Marios is correct: https://review.openstack.org/#/c/453726/ should be proposed to stable/ocata only. I'll keep an eye on it and propose an upstream heat-agents-1.0.1 release as soon as it lands.

Comment 8 Sofer Athlan-Guyot 2017-04-07 08:04:17 UTC
Patch proposed as a stable/ocata only review.

Comment 10 Marios Andreou 2017-04-11 15:23:30 UTC
Adding a note here on request: the fix for this, by Sofer, is https://review.openstack.org/#/c/454556/ in openstack/heat-agents (see the trackers above). To make the fix available we need a heat-agents release, and sbaker kindly sorted that out with heat-agents-1.0.1, https://review.openstack.org/#/c/454912 (added to the trackers).

This isn't appearing in the downstream puddle yet; at least, it wasn't during my upgrade run this morning (openstack-heat-agents-1.0.0-3.el7ost.noarch). Until it does appear in OSP 11, you can apply the fix as a workaround by "patching" the UpgradeInitCommand with the hook from the upstream review (proceed cautiously; the first line backs up the templates):

    sudo cp -r /usr/share/openstack-tripleo-heat-templates /usr/share/openstack-tripleo-heat-templates.ORIG_BZ_1439121
    # https://review.openstack.org/#/c/454556/ Simplify legacy hieradata check to avoid false positive.
    echo '    curl "https://review.openstack.org/gitweb?p=openstack/heat-agents.git;a=blob_plain;f=heat-config-hiera/install.d/hook-hiera.py;h=db69f69c517e2693222080862db52abfaa3fbdc5;hb=769d0de4d9acb07563202a2ef3d3a0aa1b9542d1" -o /usr/libexec/heat-config/hooks/hiera' \
          | sudo tee -a /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps.yaml > /dev/null
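Running the echo/tee line above twice appends the curl line twice. A minimal idempotent variant, sketched below, only appends when the exact line is not already present, using grep -qxF. append_once and the HOOK_URL placeholder are hypothetical; like the original command, it assumes the environment file ends inside the UpgradeInitCommand block, so a four-space-indented line lands in that script.

```shell
# Hypothetical helper: append LINE to FILE only if FILE does not already
# contain exactly that line, so re-running the workaround is harmless.
# Usage: append_once LINE FILE
append_once() {
    # -x: whole-line match, -F: fixed string, -q: no output.
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Possible use from a root shell after the backup cp (not executed here);
# HOOK_URL is a placeholder for the gitweb URL in the command above:
#   line="    curl \"$HOOK_URL\" -o /usr/libexec/heat-config/hooks/hiera"
#   append_once "$line" /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-composable-steps.yaml
```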

So, +1 to morazi moving this to POST.

Comment 13 errata-xmlrpc 2017-05-17 20:17:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1245