Bug 1820142 - ceph validations not working downstream
Summary: ceph validations not working downstream
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-validations
Version: 16.0 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: beta
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Francesco Pantano
QA Contact: Yogev Rabl
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-04-02 11:02 UTC by Jose Luis Franco
Modified: 2020-07-29 07:51 UTC
CC: 7 users

Fixed In Version: openstack-tripleo-validations-11.3.2-0.20200611115251.08f469d.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-29 07:51:07 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1871380 0 None None None 2020-04-07 12:19:58 UTC
OpenStack gerrit 718011 0 None MERGED Introducing tripleo_delegate_to on ceph health validation 2020-09-10 13:16:08 UTC
OpenStack gerrit 718012 0 None MERGED Add tripleo_delegate_to var for ceph health validation 2020-09-10 13:16:08 UTC
OpenStack gerrit 725303 0 None MERGED Introducing tripleo_delegate_to on ceph health validation 2020-09-10 13:16:08 UTC
OpenStack gerrit 725304 0 None MERGED Add tripleo_delegate_to var for ceph health validation 2020-09-10 13:16:08 UTC
Red Hat Product Errata RHBA-2020:3148 0 None None None 2020-07-29 07:51:27 UTC

Description Jose Luis Franco 2020-04-02 11:02:11 UTC
Description of problem:

The ceph validations included in tripleo-heat-templates do not work when upgrading downstream. The validation assumes that the ceph-ansible package is always retrieved from "centos-ceph-nautilus" (https://github.com/openstack/tripleo-validations/blob/4899441f68a53ce4c547ca1d40e4b42609906b1a/playbooks/ceph-ansible-installed.yaml#L12), which is not the case in a RHOSP deployment.

A better approach would be to retrieve the registry value from the containers-prepare-parameters.yaml file; that way we make sure the package is updated and retrieved from the right registry. A parametrized check is sketched below.
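
For illustration, a minimal sketch of a parametrized check (this is not the actual tripleo-validations code; the "ceph_ansible_repo" variable name and its default are assumptions, and downstream the value would be fed from the CephAnsibleRepo parameter / containers-prepare-parameters.yaml):

- name: Check that ceph-ansible comes from the expected repo
  hosts: undercloud
  vars:
    # assumed default; override downstream with the RHOSP repo name
    ceph_ansible_repo: centos-ceph-nautilus
  tasks:
    - name: Query which repo provided the installed ceph-ansible package
      command: dnf info installed ceph-ansible
      register: ceph_ansible_info
      changed_when: false

    - name: Fail when ceph-ansible was not installed from the expected repo
      fail:
        msg: "ceph-ansible was not installed from {{ ceph_ansible_repo }}"
      when: ceph_ansible_repo not in ceph_ansible_info.stdout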

The workaround to avoid getting blocked by this validation is to skip the "opendev-validation" tags. However, these validations are useful for catching real problems, so we would like to enable them again once this is fixed; the skip command is shown below.
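
The skip looks roughly like this, using the --skip-tags option of the upgrade command (exact tag names may vary per release):

openstack overcloud external-upgrade run \
        --stack overcloud \
        --tags ceph_systemd \
        --skip-tags opendev-validation \
        -e ceph_ansible_limit=controller-0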

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Deploy OSP16 from a CI job with ceph.
2. Run openstack overcloud external-upgrade run \
        --stack overcloud \
        --tags ceph_systemd \
        -e ceph_ansible_limit=controller-0 
3.

Actual results:


Expected results:


Additional info:

Comment 3 Jose Luis Franco 2020-04-06 10:58:32 UTC
Thanks a lot for the comment, Francesco. I will give that a try and report back in the bugzilla.

Comment 4 Jose Luis Franco 2020-04-06 20:30:34 UTC
Hello Francesco,

Reopening this bug: after making use of the CephAnsibleRepo variable, a new issue appears in a different validation, ceph-health.yaml:

Monday 06 April 2020  16:24:16 -0400 (0:00:00.441)       0:02:31.646 ********** 
ok: [undercloud -> 192.168.24.10] => {
    "inventory_hostname": "undercloud"
}

TASK [ceph : Set container_cli fact from the inventory] *********************************************************************************************************************
task path: /usr/share/openstack-tripleo-validations/roles/ceph/tasks/ceph-health.yaml:13
Monday 06 April 2020  16:24:17 -0400 (0:00:00.400)       0:02:32.046 ********** 
ok: [undercloud -> 192.168.24.10] => {"ansible_facts": {"container_cli": "podman"}, "changed": false}

TASK [ceph : Set container filter format] ***********************************************************************************************************************************
task path: /usr/share/openstack-tripleo-validations/roles/ceph/tasks/ceph-health.yaml:17
Monday 06 April 2020  16:24:18 -0400 (0:00:00.675)       0:02:32.722 ********** 
ok: [undercloud -> 192.168.24.10] => {"ansible_facts": {"container_filter_format": "--format '{{ .Names }}'"}, "changed": false}

TASK [ceph : Set ceph_mon_container name] ***********************************************************************************************************************************
task path: /usr/share/openstack-tripleo-validations/roles/ceph/tasks/ceph-health.yaml:21
Monday 06 April 2020  16:24:18 -0400 (0:00:00.414)       0:02:33.136 ********** 
fatal: [undercloud -> 192.168.24.10]: FAILED! => {"changed": false, "cmd": "podman ps --format '{{ .Names }}' | grep ceph-mon", "delta": "0:00:00.006923", "end": "2020-04-06 20:24:18.699377", "msg": "non-zero return code", "rc": 1, "start": "2020-04-06 20:24:18.692454", "stderr": "/bin/sh: podman: command not found", "stderr_lines": ["/bin/sh: podman: command not found"], "stdout": "", "stdout_lines": []}


I added the first debug task to check the content of inventory_hostname. As this task is run from ceph-base.yaml using delegate_to, the task executes on a ceph node but inventory_hostname still points to the undercloud. This shouldn't be a problem in a deployment scenario, but here we are upgrading from OSP13 to OSP16, so the undercloud is on RHEL8 (container_cli=podman) while the ceph nodes are on RHEL7 (container_cli=docker).

Digging a little into the Ansible behavior, this seems to be expected: variables and facts resolve against the play host, not the delegated one. A solution would be to parametrize the target host in the ceph-health validation, or to find some other way to get the inventory hostname of the delegated target; see the sketch below.
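
As a sketch of that parametrized approach (the "tripleo_delegate_to" variable name matches the linked gerrit reviews, but the exact implementation there may differ; reading container_cli per host from the inventory is an assumption):

# Run the check on each delegated host, resolving container_cli from the
# delegated host's inventory vars instead of the play host's (the undercloud).
- name: Check for a running ceph-mon container on the delegated host(s)
  shell: "{{ hostvars[item]['container_cli'] | default('podman') }} ps | grep ceph-mon"
  become: true
  delegate_to: "{{ item }}"
  loop: "{{ tripleo_delegate_to | default([inventory_hostname]) }}"
  changed_when: false

This way, on an OSP13 to OSP16 upgrade a RHEL7 ceph node with container_cli=docker in the inventory runs "docker ps", while a RHEL8 host would use podman.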

Comment 11 errata-xmlrpc 2020-07-29 07:51:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3148

