Bug 1665664 - OSP10 -> OSP13 FFU upgrade : Ceph upgrade is failing while running after including rolling_update and switch-from-non-containerized-to-containerized-ceph-daemons yamls
Keywords:
Status: CLOSED DUPLICATE of bug 1663026
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-heat-templates
Version: 13.0 (Queens)
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: zstream
Target Release: 13.0 (Queens)
Assignee: Lukas Bezdicka
QA Contact: Arik Chernetsky
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-01-12 09:45 UTC by MD Sufiyan
Modified: 2019-01-20 21:14 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-20 21:14:28 UTC
Target Upstream Version:
Embargoed:



Description MD Sufiyan 2019-01-12 09:45:59 UTC
Description of problem:

The Ceph upgrade fails when it is run with the rolling_update and switch-from-non-containerized-to-containerized-ceph-daemons playbooks included.


~~~
check_and_complete /usr/lib/python2.7/site-packages/mistral/engine/workflows.py:377
2019-01-10 00:28:19.649 4439 INFO workflow_trace [req-cea75d78-740b-4658-9f8d-260420296aab 028064b6348242088c4616cf00689bf4 75ca20e1101a4999bf77d5d8b0252f88 - - -] Workflow 'tripleo.overcloud.workflow_tasks.step2' [RUNNING -> ERROR, msg=Failure caused by error in tasks: ceph_base_ansible_workflow
 
  ceph_base_ansible_workflow [task_ex_id=06d8f84f-facd-4720-adb5-0c13d8072578] -> Failure caused by error in tasks: ceph_install
 
  ceph_install [task_ex_id=587db2e6-73ac-497a-95ea-81774a8c690a] -> One or more actions had failed..
.
.
.
\nXSX2QQKBgB0DNfCxexmm79o9cSb2U6Orr1DU6cXxbinXxNd7C9TUZQa53SEZPmUr\nNHijADbAZ5cVkFg1wtinMhkkax4UO7wcBhltKuIqx7aSSogJQjEtnVSLWwI38Hga\nokBzWcFZT1Ot86/LsdvljmBob2IOfOoUIfLRgCKAqDOv9olCgS0i\n-----END RSA PRIVATE KEY-----'}']
 Unexpected error while running command.
Command: ansible-playbook -vv /usr/share/ceph-ansible/infrastructure-playbooks/rolling_update.yml --user tripleo-admin --become --become-user root --extra-vars {"ireallymeanit": "yes"} --inventory-file /tmp/ansible-mistral-actiontC9XNy/inventory.yaml --private-key /tmp/ansible-mistral-actiontC9XNy/ssh_private_key --skip*** package-install,with_pkg
Exit code: 2


http://pastebin.test.redhat.com/694681

~~~
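
For anyone triaging a similar trace, the nested failure chain above (step2 -> ceph_base_ansible_workflow -> ceph_install) can be pulled out of a saved Mistral log with a small filter. This is only a sketch: `failed_tasks` is a hypothetical helper name, and it assumes the log keeps the `<task> [task_ex_id=<uuid>] -> ...` layout shown in the excerpt.

```shell
#!/bin/bash
# Hypothetical helper (not part of TripleO or Mistral): print "<task name> <task_ex_id>"
# for every line shaped like "  name [task_ex_id=uuid] -> ...".
# Reads the files given as arguments, or stdin when none are given.
failed_tasks() {
    sed -n 's/^[[:space:]]*\([^[:space:]]*\) \[task_ex_id=\([0-9a-f-]*\)\].*/\1 \2/p' "$@"
}
```

The printed IDs are task execution IDs. Assuming the Mistral client plugin is installed on the undercloud, they can probably be inspected further with `openstack task execution show <id>`.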



Version-Release number of selected component (if applicable):
FFU RHOSP10->13

How reproducible:
Tested once



Steps to Reproduce:
1. Perform the FFU upgrade from OSP10 to OSP13 up to and including the Compute node upgrade: "https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/fast_forward_upgrades/#upgrading_all_compute_nodes"
2. Perform the upgrade for Ceph:
a) source stackrc
b) openstack overcloud upgrade run --nodes CephStorage --skip-tags validation (wait until the node upgrade completes; this step completed successfully)
c) Run the Ceph Storage upgrade command:
   
     sh 03-ceph-update.sh 

~~~
cat 03-ceph-update.sh 
#!/bin/bash

nohup openstack overcloud ceph-upgrade run \
--timeout 120 \
--templates /usr/share/openstack-tripleo-heat-templates \
--stack overcloud \
--libvirt-type kvm \
-r /home/stack/virt/roles_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /home/stack/virt/internal.yaml \
-e /home/stack/virt/custom_repositories_script.yaml \
-e /home/stack/virt/overcloud_images.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/hostnames.yml \
-e /home/stack/virt/debug.yaml \
--ceph-ansible-playbook '/usr/share/ceph-ansible/infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml,/usr/share/ceph-ansible/infrastructure-playbooks/rolling_update.yml' \
--log-file overcloud_deployment_30.log &
~~~

3. The stack update fails:

~~~~
Heat Stack update failed.
StorageArtifactsConfig]: UPDATE_COMPLETE  state changed
2019-01-11 22:58:15Z [overcloud-AllNodesDeploySteps-7u4orfe3o4cm.Role1ComputeArtifactsDeploy]: UPDATE_COMPLETE  state changed
2019-01-11 22:58:16Z [overcloud-AllNodesDeploySteps-7u4orfe3o4cm.ControllerArtifactsDeploy]: UPDATE_COMPLETE  state changed
2019-01-11 22:58:16Z [overcloud-AllNodesDeploySteps-7u4orfe3o4cm.ComputeArtifactsDeploy]: UPDATE_COMPLETE  state changed
2019-01-11 22:58:16Z [overcloud-AllNodesDeploySteps-7u4orfe3o4cm-CephStorageHostPrepDeployment-c7igpolmpaqn]: UPDATE_COMPLETE  Stack UPDATE completed successfully
2019-01-11 22:58:16Z [overcloud-AllNodesDeploySteps-7u4orfe3o4cm-ControllerHostPrepDeployment-vf6pwq5kazy4]: UPDATE_IN_PROGRESS  Stack UPDATE started
2019-01-11 22:58:17Z [overcloud-AllNodesDeploySteps-7u4orfe3o4cm.CephStorageHostPrepDeployment]: UPDATE_COMPLETE  state changed
2019-01-11 22:58:17Z [overcloud-AllNodesDeploySteps-7u4orfe3o4cm.BlockStorageArtifactsDeploy]: UPDATE_IN_PROGRESS  state changed
2019-01-11 22:58:19Z [overcloud-AllNodesDeploySteps-7u4orfe3o4cm.BlockStorageArtifactsDeploy]: UPDATE_COMPLETE  state changed
2019-01-11 22:58:19Z [overcloud-AllNodesDeploySteps-7u4orfe3o4cm-ControllerHostPrepDeployment-vf6pwq5kazy4]: UPDATE_COMPLETE  Stack UPDATE completed successfully
2019-01-11 22:58:20Z [overcloud-AllNodesDeploySteps-7u4orfe3o4cm.ControllerHostPrepDeployment]: UPDATE_COMPLETE  state changed
2019-01-11 22:58:23Z [overcloud-AllNodesDeploySteps-7u4orfe3o4cm.WorkflowTasks_Step2_Execution]: UPDATE_IN_PROGRESS  state changed
2019-01-11 23:13:09Z [overcloud-AllNodesDeploySteps-7u4orfe3o4cm.WorkflowTasks_Step2_Execution]: UPDATE_FAILED  resources.WorkflowTasks_Step2_Execution: ERROR
2019-01-11 23:13:09Z [overcloud-AllNodesDeploySteps-7u4orfe3o4cm]: UPDATE_FAILED  Resource UPDATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
2019-01-11 23:13:09Z [AllNodesDeploySteps]: UPDATE_FAILED  resources.AllNodesDeploySteps: Resource UPDATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
2019-01-11 23:13:10Z [overcloud]: UPDATE_FAILED  Resource UPDATE failed: resources.AllNodesDeploySteps: Resource UPDATE failed: resources.WorkflowTasks_Step2_Execution: ERROR

 Stack overcloud UPDATE_FAILED

overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::TripleO::WorkflowSteps
  physical_resource_id: 29ad4398-a7ac-4b44-810e-3a90854e2b43
  status: UPDATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: ERROR
~~~

~~~
(undercloud) [stack@undercloud-0 ~]$ openstack stack failures list overcloud --long
overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
  resource_type: OS::TripleO::WorkflowSteps
  physical_resource_id: 29ad4398-a7ac-4b44-810e-3a90854e2b43
  status: UPDATE_FAILED
  status_reason: |
    resources.WorkflowTasks_Step2_Execution: ERROR
~~~
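
The Heat event stream above is long, and only the last few lines matter. A minimal sketch of a filter that keeps only the UPDATE_FAILED events is shown below. `failed_events` is a hypothetical helper name, and it assumes each event line has the `<date> <time> [<resource>]: <STATUS> ...` layout seen in the output.

```shell
#!/bin/bash
# Hypothetical helper: print "<date>T<time> <resource>" for every Heat event line
# whose status field is UPDATE_FAILED. Summary lines such as "Stack overcloud
# UPDATE_FAILED" are skipped because the status is not in the fourth field.
# Reads the files given as arguments, or stdin when none are given.
failed_events() {
    awk '$4 == "UPDATE_FAILED" { gsub(/^\[|\]:$/, "", $3); print $1 "T" $2, $3 }' "$@"
}
```

For example, `failed_events nohup.out` would list the failing resources in order, assuming the command output was captured in nohup.out; which file holds the event stream depends on how the script was launched.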

Actual results:
Ceph upgrade failed


Expected results:
Upgrade should pass

Additional info:
I will add the templates and sosreports from the undercloud, ceph-0, ceph-1, ceph-2, and the controller nodes in the next update.

Comment 2 Lukas Bezdicka 2019-01-14 15:58:01 UTC
Sounds like a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1663026

Comment 6 Giulio Fidente 2019-01-20 21:14:28 UTC

*** This bug has been marked as a duplicate of bug 1663026 ***

