Bug 1829985

Summary: [OSP13->OSP16.1] ceph's systemd step not idempotent
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Jose Luis Franco <jfrancoa>
Component: Ceph-Ansible    Assignee: Dimitri Savineau <dsavinea>
Status: CLOSED ERRATA QA Contact: Yogev Rabl <yrabl>
Severity: high Docs Contact:
Priority: high    
Version: 4.1CC: aschoen, ceph-eng-bugs, dsavinea, ealcaniz, gabrioux, gfidente, gmeno, hyelloji, jthomas, mbracho, morazi, nthomas, pgrist, sathlang, tserlin, vereddy, ykaul, yrabl
Target Milestone: z1   
Target Release: 4.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ceph-ansible-4.0.24-1.el8cp, ceph-ansible-4.0.24-1.el7cp Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-07-20 14:21:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1760354    
Attachments:
Description Flags
ceph-ansible logs none

Description Jose Luis Franco 2020-04-30 16:59:21 UTC
Description of problem:

The OSP13 to OSP16.1 upgrade procedure requires running a step on each Ceph-enabled node that rewrites the systemd units to use podman instead of docker. The step is:

openstack overcloud external-upgrade run \
        --skip-tags validation,opendev-validation-ceph,opendev-validation \
        --stack qe-Cloud-0 \
        --tags ceph_systemd \
        -e ceph_ansible_limit=controller-0

However, the step isn't idempotent: if it is re-run on a node that has already been upgraded to RHEL 8, it fails with:

2020-04-30 12:41:58 | TASK [tripleo-ceph-run-ansible : run ceph-ansible] *****************************
2020-04-30 12:41:58 | Thursday 30 April 2020  12:41:19 -0400 (0:00:00.311)       0:00:43.619 ********
2020-04-30 12:41:58 | changed: [undercloud] => (item=None) => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": true}
2020-04-30 12:41:59 |
2020-04-30 12:41:59 | changed: [undercloud] => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": true}
2020-04-30 12:41:59 |
2020-04-30 12:41:59 | TASK [tripleo-ceph-run-ansible : search output of ceph-ansible run(s) non-zero return codes] ***
2020-04-30 12:41:59 | Thursday 30 April 2020  12:41:58 -0400 (0:00:39.129)       0:01:22.748 ********
2020-04-30 12:41:59 | ok: [undercloud] => (item=None) => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2020-04-30 12:41:59 | ok: [undercloud] => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2020-04-30 12:41:59 |
2020-04-30 12:41:59 | TASK [tripleo-ceph-run-ansible : print ceph-ansible output in case of failure] ***
2020-04-30 12:41:59 | Thursday 30 April 2020  12:41:58 -0400 (0:00:00.403)       0:01:23.152 ********
2020-04-30 12:41:59 | fatal: [undercloud]: FAILED! => {
2020-04-30 12:41:59 |     "ceph_ansible_std_out_err": [
2020-04-30 12:41:59 |         "ansible-playbook 2.8.11",
2020-04-30 12:41:59 |         "  config file = /usr/share/ceph-ansible/ansible.cfg",
2020-04-30 12:41:59 |         "  configured module search path = ['/usr/share/ceph-ansible/library']",
2020-04-30 12:41:59 |         "  ansible python module location = /usr/lib/python3.6/site-packages/ansible",
2020-04-30 12:41:59 |         "  executable location = /bin/ansible-playbook",
2020-04-30 12:41:59 |         "  python version = 3.6.8 (default, Dec  5 2019, 15:45:45) [GCC 8.3.1 20191121 (Red Hat 8.3.1-5)]",
2020-04-30 12:41:59 |         "Using /usr/share/ceph-ansible/ansible.cfg as config file",
....
2020-04-30 12:41:59 |         "TASK [ceph-handler : set_fact handler_osd_status] ******************************",
2020-04-30 12:41:59 |         "task path: /usr/share/ceph-ansible/roles/ceph-handler/tasks/main.yml:11",
2020-04-30 12:41:59 |         "Thursday 30 April 2020  12:41:56 -0400 (0:00:00.116)       0:00:36.166 ******** ",
2020-04-30 12:41:59 |         "TASK [ceph-handler : set_fact handler_mds_status] ******************************",
2020-04-30 12:41:59 |         "task path: /usr/share/ceph-ansible/roles/ceph-handler/tasks/main.yml:16",
2020-04-30 12:41:59 |         "Thursday 30 April 2020  12:41:56 -0400 (0:00:00.112)       0:00:36.278 ******** ",
2020-04-30 12:41:59 |         "TASK [ceph-handler : set_fact handler_rgw_status] ******************************",
2020-04-30 12:41:59 |         "task path: /usr/share/ceph-ansible/roles/ceph-handler/tasks/main.yml:21",
2020-04-30 12:41:59 |         "Thursday 30 April 2020  12:41:57 -0400 (0:00:00.107)       0:00:36.385 ******** ",
2020-04-30 12:41:59 |         "TASK [ceph-handler : set_fact handler_nfs_status] ******************************",
2020-04-30 12:41:59 |         "task path: /usr/share/ceph-ansible/roles/ceph-handler/tasks/main.yml:26",
2020-04-30 12:41:59 |         "Thursday 30 April 2020  12:41:57 -0400 (0:00:00.107)       0:00:36.493 ******** ",
2020-04-30 12:41:59 |         "TASK [ceph-handler : set_fact handler_rbd_status] ******************************",
2020-04-30 12:41:59 |         "task path: /usr/share/ceph-ansible/roles/ceph-handler/tasks/main.yml:31",
2020-04-30 12:41:59 |         "Thursday 30 April 2020  12:41:57 -0400 (0:00:00.107)       0:00:36.601 ******** ",
2020-04-30 12:41:59 |         "TASK [ceph-handler : set_fact handler_mgr_status] ******************************",
2020-04-30 12:41:59 |         "task path: /usr/share/ceph-ansible/roles/ceph-handler/tasks/main.yml:36",
2020-04-30 12:41:59 |         "Thursday 30 April 2020  12:41:57 -0400 (0:00:00.107)       0:00:36.709 ******** ",
2020-04-30 12:41:59 |         "    handler_mgr_status: false",
2020-04-30 12:41:59 |         "TASK [get docker version] ******************************************************",
2020-04-30 12:41:59 |         "task path: /usr/share/ceph-ansible/infrastructure-playbooks/docker-to-podman.yml:58",
2020-04-30 12:41:59 |         "Thursday 30 April 2020  12:41:57 -0400 (0:00:00.118)       0:00:36.827 ******** ",
2020-04-30 12:41:59 |         "fatal: [controller-0]: FAILED! => changed=false ",
2020-04-30 12:41:59 |         "  cmd: docker --version",
2020-04-30 12:41:59 |         "  msg: '[Errno 2] No such file or directory: ''docker'': ''docker'''",
2020-04-30 12:41:59 |         "PLAY RECAP *********************************************************************",
2020-04-30 12:41:59 |         "controller-0               : ok=37   changed=0    unreachable=0    failed=1    skipped=57   rescued=0    ignored=0   ",
2020-04-30 12:41:59 |         "Thursday 30 April 2020  12:41:57 -0400 (0:00:00.256)       0:00:37.084 ******** ",
2020-04-30 12:41:59 |         "=============================================================================== ",
2020-04-30 12:41:59 |         "gather and delegate facts ---------------------------------------------- 17.78s",
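The failure comes from the "get docker version" task in docker-to-podman.yml, which shells out to docker unconditionally. A minimal sketch of one way to make that task tolerate a missing docker binary (this is an illustration, not necessarily the actual ceph-ansible fix; the task names and /usr/bin/docker path are assumptions):

```yaml
# Hypothetical idempotency guard: only query the docker version when the
# binary is still present, so re-runs on already-migrated hosts don't fail.
- name: check whether the docker binary is present
  stat:
    path: /usr/bin/docker
  register: docker_binary

- name: get docker version
  command: docker --version
  register: ceph_docker_version
  changed_when: false
  when: docker_binary.stat.exists
```

With a guard like this, a second run on an upgraded node simply skips the docker query instead of aborting the play.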

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Run the FFWD2 procedure on one of the controllers.
2. Re-run the ceph_systemd step against the same, already upgraded controller.

Actual results:


Expected results:


Additional info:

Comment 1 Giulio Fidente 2020-04-30 18:10:12 UTC
It looks like the docker command wasn't found on the second run. Is this because the hosting node was also upgraded to RHEL 8, with docker replaced by podman?
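That hypothesis is easy to confirm on the upgraded node; a minimal check (not part of the original report):

```shell
# After the RHEL 8 upgrade docker is removed in favor of podman.
# This mirrors what the failing "get docker version" task runs into.
if command -v docker >/dev/null 2>&1; then
    docker --version
else
    echo "docker: command not found (host already migrated to podman)"
fi
```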

Comment 3 RHEL Program Management 2020-05-04 13:29:04 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 6 Jose Luis Franco 2020-05-04 14:35:13 UTC
Created attachment 1684870 [details]
ceph-ansible logs

Comment 19 Yogev Rabl 2020-07-13 19:07:36 UTC
verified

Comment 21 errata-xmlrpc 2020-07-20 14:21:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3003