Bug 1489353 - OSP11 -> OSP12 upgrade: unable to rerun a failed major-upgrade-composable-steps-docker with ceph-ansible enabled because 'collect running osds' task fails
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: 3.0
Assignee: Sébastien Han
QA Contact: Marius Cornea
Reported: 2017-09-07 09:13 UTC by Marius Cornea
Modified: 2018-06-26 23:46 UTC (History)

Fixed In Version: RHEL: ceph-ansible-3.0.0-0.1.rc7.el7cp Ubuntu: ceph-ansible_3.0.0~rc7-2redhat1
Last Closed: 2017-12-05 23:42:05 UTC
Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | ceph ceph-ansible pull 1873 | 0 | None | None | None | 2017-09-08 09:32:47 UTC
Red Hat Product Errata | RHBA-2017:3387 | 0 | normal | SHIPPED_LIVE | Red Hat Ceph Storage 3.0 bug fix and enhancement update | 2017-12-06 03:03:45 UTC

Description Marius Cornea 2017-09-07 09:13:03 UTC
Description of problem:

OSP11 -> OSP12 upgrade: unable to rerun a failed major-upgrade-composable-steps-docker with ceph-ansible enabled because 'collect running osds' task fails.

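The command systemctl list-units | grep "loaded active" | grep -Eo 'ceph-osd@[0-9]{1,2}.service' fails on the rerun because the task had already run in the previous attempt and the ceph-osd service was already disabled. With no matching unit, grep prints nothing and exits with rc 1, which Ansible treats as a task failure.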

The task should be idempotent so that the playbook can be run multiple times, allowing the user to rerun the OSP upgrade process after a failure.

2017-09-07 04:59:31,753 p=24012 u=mistral |  PLAY [switching from non-containerized to containerized ceph mgr] **************
2017-09-07 04:59:31,753 p=24012 u=mistral |  skipping: no hosts matched
2017-09-07 04:59:31,764 p=24012 u=mistral |  PLAY [switching from non-containerized to containerized ceph osd] **************
2017-09-07 04:59:31,821 p=24012 u=mistral |  TASK [Gathering Facts] *********************************************************
2017-09-07 04:59:35,159 p=24012 u=mistral |  ok: [192.168.24.20]
2017-09-07 04:59:35,165 p=24012 u=mistral |  TASK [collect running osds] ****************************************************
2017-09-07 04:59:35,511 p=24012 u=mistral |  fatal: [192.168.24.20]: FAILED! => {"changed": false, "cmd": "systemctl list-units | grep \"loaded active\" | grep -Eo 'ceph-osd@[0-9]{1,2}.service'", "delta": "0:00:00.013085", "end": "2017-09-07 08:59:35.029177", "failed": true, "rc": 1, "start": "2017-09-07 08:59:35.016092", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
2017-09-07 04:59:35,512 p=24012 u=mistral |  PLAY RECAP *********************************************************************
2017-09-07 04:59:35,512 p=24012 u=mistral |  192.168.24.10              : ok=50   changed=5    unreachable=0    failed=0   
2017-09-07 04:59:35,512 p=24012 u=mistral |  192.168.24.11              : ok=52   changed=7    unreachable=0    failed=0   
2017-09-07 04:59:35,512 p=24012 u=mistral |  192.168.24.13              : ok=4    changed=0    unreachable=0    failed=0   
2017-09-07 04:59:35,512 p=24012 u=mistral |  192.168.24.20              : ok=5    changed=0    unreachable=0    failed=1   
2017-09-07 04:59:35,513 p=24012 u=mistral |  192.168.24.8               : ok=4    changed=0    unreachable=0    failed=0   
2017-09-07 04:59:35,513 p=24012 u=mistral |  192.168.24.9               : ok=57   changed=6    unreachable=0    failed=0   
2017-09-07 04:59:35,513 p=24012 u=mistral |  localhost                  : ok=0    changed=0    unreachable=0    failed=0   
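
For illustration only, here is a minimal shell sketch of why the pipeline returns rc 1 on a rerun and one way to tolerate an empty result. It is not the merged upstream patch. The function name collect_running_osds and the sample unit lines are hypothetical, and the function reads unit listings from stdin instead of calling systemctl so that it can be tried on any host.

```shell
#!/bin/sh
# Hypothetical helper: reads `systemctl list-units`-style output on stdin and
# prints the active ceph-osd units, using the same grep pipeline as the task.
# The trailing `|| true` keeps the exit status at 0 when nothing matches,
# which is the rerun case reported in this bug.
collect_running_osds() {
    grep "loaded active" | grep -Eo 'ceph-osd@[0-9]{1,2}.service' || true
}

# First run: one OSD unit is still active (sample line, not real output).
first_run=$(printf '%s\n' \
    'ceph-osd@0.service loaded active running Ceph object storage daemon' \
    | collect_running_osds)

# Rerun: the OSD unit was already disabled, so no line matches.
rerun=$(printf '%s\n' \
    'sshd.service loaded active running OpenSSH server daemon' \
    | collect_running_osds)
rerun_rc=$?

echo "first run: ${first_run}"
echo "rerun: '${rerun}' (rc=${rerun_rc})"
```

Without the `|| true`, the second call would exit with the status of the last grep, which is 1 when no line matches. An Ansible task could similarly be told not to treat that return code as a failure, but the exact change made upstream is not shown in this report.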


Version-Release number of selected component (if applicable):
ceph-ansible-3.0.0-0.rc6.4.g0d9489f.el7.noarch

Comment 2 seb 2017-09-08 11:36:15 UTC
The upstream patch has been merged.

Comment 5 Harish NV Rao 2017-09-12 07:03:36 UTC
@Marius, if you are verifying this bug fix, could you please provide the qa_ack?

Comment 9 errata-xmlrpc 2017-12-05 23:42:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3387

