Bug 1492590 - OSP11 -> OSP12 upgrade: rerunning major-upgrade-composable-steps-docker.yaml for a second time fails with: ERROR: The specified reference "WorkflowTasks_Step1_Execution" (in NetworkerDeployment_Step1) is incorrect.
Summary: OSP11 -> OSP12 upgrade: rerunning major-upgrade-composable-steps-docker.yaml ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat
Version: 12.0 (Pike)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 12.0 (Pike)
Assignee: Zane Bitter
QA Contact: Ronnie Rasouli
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-09-18 09:26 UTC by Marius Cornea
Modified: 2023-02-22 23:02 UTC (History)

Fixed In Version: openstack-heat-9.0.1-0.20171004002955.633da7f.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-13 22:08:57 UTC
Target Upstream Version:
Embargoed:


Attachments
stack home (196.57 KB, application/x-gzip)
2017-09-18 09:33 UTC, Marius Cornea


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1701677 0 None None None 2017-09-20 14:41:10 UTC
OpenStack gerrit 505668 0 None None None 2017-09-20 14:40:44 UTC
Red Hat Product Errata RHEA-2017:3462 0 normal SHIPPED_LIVE Red Hat OpenStack Platform 12.0 Enhancement Advisory 2018-02-16 01:43:25 UTC

Description Marius Cornea 2017-09-18 09:26:34 UTC
Description of problem:
OSP11 -> OSP12 upgrade: rerunning major-upgrade-composable-steps-docker.yaml for a second time fails with: ERROR: The specified reference "WorkflowTasks_Step1_Execution" (in NetworkerDeployment_Step1) is incorrect.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-7.0.0-0.20170913050523.0rc2.el7ost.noarch

How reproducible:
1/1

Steps to Reproduce:
1. Deploy OSP11 with standalone nodes including Ceph storage nodes

2. Set the DockerCephDaemonImage parameter to a nonexistent location

3. Run major-upgrade-composable-steps-docker.yaml

4. Wait for the deployment to fail because of the missing image:

[root@undercloud-0 stack]# tail /var/log/mistral/ceph-install-workflow.log 
2017-09-17 19:59:39,292 p=11305 u=mistral |  TASK [ceph-docker-common : pull ceph/rhceph-2-rhel7 image] *********************
2017-09-17 19:59:39,675 p=11305 u=mistral |  fatal: [192.168.24.19]: FAILED! => {"changed": false, "cmd": ["docker", "pull", "192.168.24.1:8787/ceph/rhceph-2-rhel7:latest"], "delta": "0:00:00.031890", "end": "2017-09-17 23:59:40.694261", "failed": true, "rc": 1, "start": "2017-09-17 23:59:40.662371", "stderr": "Error: image ceph/rhceph-2-rhel7:latest not found", "stderr_lines": ["Error: image ceph/rhceph-2-rhel7:latest not found"], "stdout": "Trying to pull repository 192.168.24.1:8787/ceph/rhceph-2-rhel7 ... \nPulling repository 192.168.24.1:8787/ceph/rhceph-2-rhel7", "stdout_lines": ["Trying to pull repository 192.168.24.1:8787/ceph/rhceph-2-rhel7 ... ", "Pulling repository 192.168.24.1:8787/ceph/rhceph-2-rhel7"]}

5. Fix the issue by uploading the image to the location specified in DockerCephDaemonImage

6. Rerun the major-upgrade-composable-steps-docker.yaml
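
For anyone triaging a similar failure, the ceph-ansible error in step 4 can be pulled out of the Mistral log programmatically. The following is only a minimal sketch: the helper name parse_ansible_fatal is hypothetical, the sample line is a shortened copy of the log line above, and it assumes the text after "FAILED! => " is a single JSON object, as it is in this log.

```python
import json


def parse_ansible_fatal(line):
    """Return (host, result_dict) for an Ansible 'fatal:' log line, else None.

    Hypothetical helper: assumes the payload after 'FAILED! => ' is valid JSON.
    """
    marker = "FAILED! => "
    if "fatal: [" not in line or marker not in line:
        return None
    # The host is the text between 'fatal: [' and the next ']'.
    host = line.split("fatal: [", 1)[1].split("]", 1)[0]
    payload = line.split(marker, 1)[1]
    return host, json.loads(payload)


# Shortened copy of the log line from step 4 (several fields trimmed).
sample = (
    '2017-09-17 19:59:39,675 p=11305 u=mistral |  fatal: [192.168.24.19]: FAILED! => '
    '{"changed": false, "cmd": ["docker", "pull", '
    '"192.168.24.1:8787/ceph/rhceph-2-rhel7:latest"], "rc": 1, '
    '"stderr": "Error: image ceph/rhceph-2-rhel7:latest not found"}'
)

host, result = parse_ansible_fatal(sample)
# Print the failing host, the image that could not be pulled, and the error.
print(host, result["cmd"][-1], result["stderr"])
```

The last element of "cmd" is the image reference that has to exist in the registry before the rerun in step 6.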

Actual results:

Fails right away with:
ERROR: The specified reference "WorkflowTasks_Step1_Execution" (in NetworkerDeployment_Step1) is incorrect.

Expected results:
Rerunning major-upgrade-composable-steps-docker.yaml is possible.

Additional info:
Attaching sosreport and deploy script/environment files used.

Comment 1 Marius Cornea 2017-09-18 09:33:19 UTC
Created attachment 1327281
stack home

Comment 3 Marios Andreou 2017-09-19 14:46:34 UTC
Spent some more time looking at this to try to triage it, as we discussed on scrum yesterday. AFAICS it is indeed related to ceph-ansible - the workflow tasks are here https://github.com/openstack/tripleo-heat-templates/blob/ab682ed638a63b435037d5b2a34df7770e2c4d5a/common/deploy-steps.j2#L98-L151

Those steps are included after the upgrade_tasks, here https://github.com/openstack/tripleo-heat-templates/blob/ab682ed638a63b435037d5b2a34df7770e2c4d5a/common/major_upgrade_steps.j2.yaml#L179 and here https://github.com/openstack/tripleo-heat-templates/blob/ab682ed638a63b435037d5b2a34df7770e2c4d5a/common/post-upgrade.j2.yaml

There may be an issue with the way the workflow tasks are defined, or a recent change in deploy-steps may have broken it. From the attached https://bugzilla.redhat.com/attachment.cgi?id=1327281 stack-home and the overcloud_composable_upgrade.log, the trace looks like this:

        2017-09-17 23:57:38Z [overcloud-AllNodesDeploySteps-gr5fyevs3224.AllNodesPostUpgradeSteps.WorkflowTasks_Step2]: CREATE_COMPLETE  state changed
        2017-09-17 23:57:38Z [overcloud-AllNodesDeploySteps-gr5fyevs3224.AllNodesPostUpgradeSteps.WorkflowTasks_Step2_Execution]: CREATE_IN_PROGRESS  state changed
        Heat Stack update failed.
        Heat Stack update failed.
        2017-09-17 23:59:43Z [overcloud-AllNodesDeploySteps-gr5fyevs3224.AllNodesPostUpgradeSteps.WorkflowTasks_Step2_Execution]: CREATE_FAILED  resources.WorkflowTasks_Step2_Execution: ERROR
        2017-09-17 23:59:44Z [overcloud-AllNodesDeploySteps-gr5fyevs3224.AllNodesPostUpgradeSteps]: CREATE_FAILED  Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
        2017-09-17 23:59:45Z [overcloud-AllNodesDeploySteps-gr5fyevs3224.AllNodesPostUpgradeSteps]: CREATE_FAILED  resources.AllNodesPostUpgradeSteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
        2017-09-17 23:59:45Z [overcloud-AllNodesDeploySteps-gr5fyevs3224]: UPDATE_FAILED  resources.AllNodesPostUpgradeSteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
        2017-09-17 23:59:45Z [AllNodesDeploySteps]: UPDATE_FAILED  resources.AllNodesDeploySteps: resources.AllNodesPostUpgradeSteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
        2017-09-17 23:59:46Z [overcloud]: UPDATE_FAILED  resources.AllNodesDeploySteps: resources.AllNodesPostUpgradeSteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR

         Stack overcloud UPDATE_FAILED 

        overcloud.AllNodesDeploySteps.AllNodesPostUpgradeSteps.WorkflowTasks_Step2_Execution:
          resource_type: OS::Mistral::ExternalResource
          physical_resource_id: a9ef9aed-ec71-4abe-b762-888373d49a3e
          status: CREATE_FAILED
          status_reason: |
            resources.WorkflowTasks_Step2_Execution: ERROR
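
A trace like the one above can be narrowed down to its failed events with a few lines of Python. This is a rough sketch, not part of any TripleO tooling: the function name failed_events and the regular expression are assumptions based only on the event line layout shown in this log.

```python
import re

# Assumed layout: "<date> <time>Z [<resource path>]: <STATUS>  <reason>"
EVENT_RE = re.compile(
    r"^\s*(?P<time>\S+ \S+Z) \[(?P<resource>[^\]]+)\]: "
    r"(?P<status>[A-Z_]+)\s+(?P<reason>.*)$"
)


def failed_events(log_text):
    """Return (time, resource, status) for every *_FAILED event, in log order."""
    events = []
    for line in log_text.splitlines():
        match = EVENT_RE.match(line)
        if match and match.group("status").endswith("_FAILED"):
            events.append(
                (match.group("time"), match.group("resource"), match.group("status"))
            )
    return events


# Three event lines copied from the trace above.
sample = """\
2017-09-17 23:57:38Z [overcloud-AllNodesDeploySteps-gr5fyevs3224.AllNodesPostUpgradeSteps.WorkflowTasks_Step2]: CREATE_COMPLETE  state changed
2017-09-17 23:59:43Z [overcloud-AllNodesDeploySteps-gr5fyevs3224.AllNodesPostUpgradeSteps.WorkflowTasks_Step2_Execution]: CREATE_FAILED  resources.WorkflowTasks_Step2_Execution: ERROR
2017-09-17 23:59:46Z [overcloud]: UPDATE_FAILED  resources.AllNodesDeploySteps: resources.AllNodesPostUpgradeSteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
"""

for time, resource, status in failed_events(sample):
    print(time, status, resource)
```

In this trace the earliest failed event names the innermost resource (WorkflowTasks_Step2_Execution); the later ones are the same failure propagating up through the parent stacks.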

I am holding off on marking this triaged for now. I think we should reach out to the Ceph DFG for help, since the workflow tasks in question are ceph-ansible related. I'll try to ping Jeff on IRC now - DFG:Ceph, can we please get some help triaging this ceph-ansible related issue?

Comment 4 Giulio Fidente 2017-09-19 14:51:24 UTC
the error from ceph-ansible in comment #0 looks like an expected failure from the initial run, since the image URL pointed at an image that did not exist

the real blocker seems to be instead that NetworkerDeployment_Step1 has a dependency on a resource which doesn't exist
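
To illustrate the kind of check that produces this error, here is a minimal sketch of detecting a dependency on a resource that is not defined. It is not Heat's actual validation code (Heat also resolves get_resource and get_attr references); the function name dangling_references and the simplified resources dict are assumptions made for illustration.

```python
def dangling_references(resources):
    """Return (resource, missing_dependency) pairs for unknown depends_on targets.

    'resources' is a simplified stand-in for the resources section of a
    template: a dict mapping resource names to their definitions.
    """
    missing = []
    for name, body in resources.items():
        deps = body.get("depends_on", [])
        # depends_on may be a single name or a list of names.
        if isinstance(deps, str):
            deps = [deps]
        for dep in deps:
            if dep not in resources:
                missing.append((name, dep))
    return missing


# WorkflowTasks_Step1_Execution is deliberately absent, mirroring the error text.
resources = {
    "NetworkerDeployment_Step1": {
        "depends_on": ["WorkflowTasks_Step1_Execution"],
    },
}

for resource, dep in dangling_references(resources):
    print('The specified reference "%s" (in %s) is incorrect.' % (dep, resource))
```

The point is that the failure on rerun is a template-level reference problem, independent of whether the Ceph image is now available.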

Comment 9 Zane Bitter 2017-09-20 14:40:45 UTC
This sounds a lot like https://bugs.launchpad.net/heat/+bug/1701677

The patch for that appears to have merged on master just after Pike branched, so it's not present in OSP12. I've proposed a backport upstream.

Comment 16 errata-xmlrpc 2017-12-13 22:08:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3462

