Bug 1829985 - [OSP13->OSP16.1] ceph's systemd step not idempotent
Summary: [OSP13->OSP16.1] ceph's systemd step not idempotent
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 4.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z1
Target Release: 4.1
Assignee: Dimitri Savineau
QA Contact: Yogev Rabl
URL:
Whiteboard:
Depends On:
Blocks: 1760354
 
Reported: 2020-04-30 16:59 UTC by Jose Luis Franco
Modified: 2020-07-20 14:21 UTC
CC: 18 users

Fixed In Version: ceph-ansible-4.0.24-1.el8cp, ceph-ansible-4.0.24-1.el7cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-20 14:21:03 UTC
Embargoed:


Attachments
ceph-ansible logs (13.71 MB, text/plain)
2020-05-04 14:35 UTC, Jose Luis Franco


Links
Github ceph ceph-ansible pull 5354 (closed): docker-to-podman: conditional docker commands (last updated 2021-01-01 22:23:45 UTC)
Red Hat Product Errata RHSA-2020:3003 (last updated 2020-07-20 14:21:28 UTC)

Description Jose Luis Franco 2020-04-30 16:59:21 UTC
Description of problem:

The OSP13 to OSP16.1 upgrade procedure requires running a step on each Ceph-enabled node that rewrites the Ceph systemd units from docker to podman. The step is:

openstack overcloud external-upgrade run \
        --skip-tags validation,opendev-validation-ceph,opendev-validation \
        --stack qe-Cloud-0 \
        --tags ceph_systemd \
        -e ceph_ansible_limit=controller-0

However, the step is not idempotent: if it is re-run on a node that has already been upgraded to RHEL 8, it fails with:

2020-04-30 12:41:58 | TASK [tripleo-ceph-run-ansible : run ceph-ansible] *****************************
2020-04-30 12:41:58 | Thursday 30 April 2020  12:41:19 -0400 (0:00:00.311)       0:00:43.619 ********
2020-04-30 12:41:58 | changed: [undercloud] => (item=None) => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": true}
2020-04-30 12:41:59 |
2020-04-30 12:41:59 | changed: [undercloud] => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": true}
2020-04-30 12:41:59 |
2020-04-30 12:41:59 | TASK [tripleo-ceph-run-ansible : search output of ceph-ansible run(s) non-zero return codes] ***
2020-04-30 12:41:59 | Thursday 30 April 2020  12:41:58 -0400 (0:00:39.129)       0:01:22.748 ********
2020-04-30 12:41:59 | ok: [undercloud] => (item=None) => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2020-04-30 12:41:59 | ok: [undercloud] => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result", "changed": false}
2020-04-30 12:41:59 |
2020-04-30 12:41:59 | TASK [tripleo-ceph-run-ansible : print ceph-ansible output in case of failure] ***
2020-04-30 12:41:59 | Thursday 30 April 2020  12:41:58 -0400 (0:00:00.403)       0:01:23.152 ********
2020-04-30 12:41:59 | fatal: [undercloud]: FAILED! => {
2020-04-30 12:41:59 |     "ceph_ansible_std_out_err": [
2020-04-30 12:41:59 |         "ansible-playbook 2.8.11",
2020-04-30 12:41:59 |         "  config file = /usr/share/ceph-ansible/ansible.cfg",
2020-04-30 12:41:59 |         "  configured module search path = ['/usr/share/ceph-ansible/library']",
2020-04-30 12:41:59 |         "  ansible python module location = /usr/lib/python3.6/site-packages/ansible",
2020-04-30 12:41:59 |         "  executable location = /bin/ansible-playbook",
2020-04-30 12:41:59 |         "  python version = 3.6.8 (default, Dec  5 2019, 15:45:45) [GCC 8.3.1 20191121 (Red Hat 8.3.1-5)]",
2020-04-30 12:41:59 |         "Using /usr/share/ceph-ansible/ansible.cfg as config file",
....
2020-04-30 12:41:59 |         "TASK [ceph-handler : set_fact handler_osd_status] ******************************",
2020-04-30 12:41:59 |         "task path: /usr/share/ceph-ansible/roles/ceph-handler/tasks/main.yml:11",
2020-04-30 12:41:59 |         "Thursday 30 April 2020  12:41:56 -0400 (0:00:00.116)       0:00:36.166 ******** ",
2020-04-30 12:41:59 |         "TASK [ceph-handler : set_fact handler_mds_status] ******************************",
2020-04-30 12:41:59 |         "task path: /usr/share/ceph-ansible/roles/ceph-handler/tasks/main.yml:16",
2020-04-30 12:41:59 |         "Thursday 30 April 2020  12:41:56 -0400 (0:00:00.112)       0:00:36.278 ******** ",
2020-04-30 12:41:59 |         "TASK [ceph-handler : set_fact handler_rgw_status] ******************************",
2020-04-30 12:41:59 |         "task path: /usr/share/ceph-ansible/roles/ceph-handler/tasks/main.yml:21",
2020-04-30 12:41:59 |         "Thursday 30 April 2020  12:41:57 -0400 (0:00:00.107)       0:00:36.385 ******** ",
2020-04-30 12:41:59 |         "TASK [ceph-handler : set_fact handler_nfs_status] ******************************",
2020-04-30 12:41:59 |         "task path: /usr/share/ceph-ansible/roles/ceph-handler/tasks/main.yml:26",
2020-04-30 12:41:59 |         "Thursday 30 April 2020  12:41:57 -0400 (0:00:00.107)       0:00:36.493 ******** ",
2020-04-30 12:41:59 |         "TASK [ceph-handler : set_fact handler_rbd_status] ******************************",
2020-04-30 12:41:59 |         "task path: /usr/share/ceph-ansible/roles/ceph-handler/tasks/main.yml:31",
2020-04-30 12:41:59 |         "Thursday 30 April 2020  12:41:57 -0400 (0:00:00.107)       0:00:36.601 ******** ",
2020-04-30 12:41:59 |         "TASK [ceph-handler : set_fact handler_mgr_status] ******************************",
2020-04-30 12:41:59 |         "task path: /usr/share/ceph-ansible/roles/ceph-handler/tasks/main.yml:36",
2020-04-30 12:41:59 |         "Thursday 30 April 2020  12:41:57 -0400 (0:00:00.107)       0:00:36.709 ******** ",
2020-04-30 12:41:59 |         "    handler_mgr_status: false",
2020-04-30 12:41:59 |         "TASK [get docker version] ******************************************************",
2020-04-30 12:41:59 |         "task path: /usr/share/ceph-ansible/infrastructure-playbooks/docker-to-podman.yml:58",
2020-04-30 12:41:59 |         "Thursday 30 April 2020  12:41:57 -0400 (0:00:00.118)       0:00:36.827 ******** ",
2020-04-30 12:41:59 |         "fatal: [controller-0]: FAILED! => changed=false ",
2020-04-30 12:41:59 |         "  cmd: docker --version",
2020-04-30 12:41:59 |         "  msg: '[Errno 2] No such file or directory: ''docker'': ''docker'''",
2020-04-30 12:41:59 |         "PLAY RECAP *********************************************************************",
2020-04-30 12:41:59 |         "controller-0               : ok=37   changed=0    unreachable=0    failed=1    skipped=57   rescued=0    ignored=0   ",
2020-04-30 12:41:59 |         "Thursday 30 April 2020  12:41:57 -0400 (0:00:00.256)       0:00:37.084 ******** ",
2020-04-30 12:41:59 |         "=============================================================================== ",
2020-04-30 12:41:59 |         "gather and delegate facts ---------------------------------------------- 17.78s",
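
The failure above comes from the unconditional "get docker version" task in ceph-ansible's infrastructure-playbooks/docker-to-podman.yml: once the node has been upgraded to RHEL 8 the docker binary no longer exists, so the command fails. The linked pull request (ceph-ansible pull 5354, "docker-to-podman: conditional docker commands") addresses this by making the docker commands conditional. A minimal sketch of that kind of guard is shown below; the task and variable names here (for example docker_binary) are illustrative and may differ from the actual upstream playbook:

    # Sketch: only query docker when the binary is still present, so that
    # re-running the play on an already-upgraded (podman-only) node is a no-op.
    - name: check whether the docker binary is present
      stat:
        path: /usr/bin/docker
      register: docker_binary

    - name: get docker version
      command: docker --version
      register: ceph_docker_version
      changed_when: false
      when: docker_binary.stat.exists

With a guard like this, re-running the ceph_systemd step against a controller that has already been upgraded skips the docker-specific tasks instead of failing on the missing binary.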

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Run the FFWD2 procedure on one of the controllers.
2. Re-run the ceph_systemd step on the same controller, which has already been upgraded.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Giulio Fidente 2020-04-30 18:10:12 UTC
It looks like the docker command was not found on the second run. Is this because the host node has also been upgraded to RHEL 8, where docker is replaced by podman?

Comment 3 RHEL Program Management 2020-05-04 13:29:04 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 6 Jose Luis Franco 2020-05-04 14:35:13 UTC
Created attachment 1684870 [details]
ceph-ansible logs

Comment 19 Yogev Rabl 2020-07-13 19:07:36 UTC
verified

Comment 21 errata-xmlrpc 2020-07-20 14:21:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3003

