Description of problem:
The ceph validations included in tripleo-heat-templates do not work when upgrading downstream. The validation assumes that the ceph-ansible package is always retrieved from "centos-ceph-nautilus" (https://github.com/openstack/tripleo-validations/blob/4899441f68a53ce4c547ca1d40e4b42609906b1a/playbooks/ceph-ansible-installed.yaml#L12), which is not the case in a RHOSP deployment. A better approach would be to retrieve the value of the registry from the containers-prepare-parameters.yaml file; this way we make sure that the package is updated and retrieved from the right registry. The workaround to avoid being blocked by this validation is to skip the "opendev-validation" tags; however, these validations are useful and valid for avoiding further trouble, so we would like to enable them again once this gets solved.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Deploy OSP16 from a CI job with ceph.
2. Run:
   openstack overcloud external-upgrade run \
     --stack overcloud \
     --tags ceph_systemd \
     -e ceph_ansible_limit=controller-0
3.

Actual results:

Expected results:

Additional info:
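As a sketch of the suggested approach (this is illustrative only, not the actual tripleo-validations playbook; the variable name and the dnf invocation are assumptions), the hardcoded repo name could be replaced by a variable that downstream deployments can override, for example from the CephAnsibleRepo heat parameter:

```yaml
# Illustrative sketch only -- not the actual ceph-ansible-installed.yaml.
# The repo is taken from an overridable variable instead of being
# hardcoded to "centos-ceph-nautilus", so a RHOSP deployment can pass
# its own repo (e.g. via the CephAnsibleRepo heat parameter).
- name: Check that ceph-ansible is available from the expected repo
  hosts: undercloud
  gather_facts: false
  vars:
    ceph_ansible_repo: centos-ceph-nautilus  # overridden downstream
  tasks:
    - name: Verify ceph-ansible is provided by the configured repo
      command: dnf --repo "{{ ceph_ansible_repo }}" info ceph-ansible
      changed_when: false
```

With this shape, the downstream upgrade workflow would only need to set ceph_ansible_repo (e.g. as an extra var derived from containers-prepare-parameters.yaml) instead of skipping the whole "opendev-validation" tag.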
Thanks a lot for the comment Francesco. I will give that a try and comment back in the bugzilla.
Hello Francesco,

Reopening this bug: after making use of the CephAnsibleRepo variable, a new issue appears in a different validation, ceph-health.yaml:

Monday 06 April 2020  16:24:16 -0400 (0:00:00.441)       0:02:31.646 **********
ok: [undercloud -> 192.168.24.10] => {
    "inventory_hostname": "undercloud"
}

TASK [ceph : Set container_cli fact from the inventory] *********************************************************************************************************************
task path: /usr/share/openstack-tripleo-validations/roles/ceph/tasks/ceph-health.yaml:13
Monday 06 April 2020  16:24:17 -0400 (0:00:00.400)       0:02:32.046 **********
ok: [undercloud -> 192.168.24.10] => {"ansible_facts": {"container_cli": "podman"}, "changed": false}

TASK [ceph : Set container filter format] ***********************************************************************************************************************************
task path: /usr/share/openstack-tripleo-validations/roles/ceph/tasks/ceph-health.yaml:17
Monday 06 April 2020  16:24:18 -0400 (0:00:00.675)       0:02:32.722 **********
ok: [undercloud -> 192.168.24.10] => {"ansible_facts": {"container_filter_format": "--format '{{ .Names }}'"}, "changed": false}

TASK [ceph : Set ceph_mon_container name] ***********************************************************************************************************************************
task path: /usr/share/openstack-tripleo-validations/roles/ceph/tasks/ceph-health.yaml:21
Monday 06 April 2020  16:24:18 -0400 (0:00:00.414)       0:02:33.136 **********
fatal: [undercloud -> 192.168.24.10]: FAILED! => {"changed": false, "cmd": "podman ps --format '{{ .Names }}' | grep ceph-mon", "delta": "0:00:00.006923", "end": "2020-04-06 20:24:18.699377", "msg": "non-zero return code", "rc": 1, "start": "2020-04-06 20:24:18.692454", "stderr": "/bin/sh: podman: command not found", "stderr_lines": ["/bin/sh: podman: command not found"], "stdout": "", "stdout_lines": []}

The first debug task was added by me to check the content of inventory_hostname. Although this task is run from ceph-base.yaml using delegate_to, so it actually executes on a ceph node, inventory_hostname still points to the undercloud. This wouldn't be a problem in a fresh deployment, but here we are upgrading from OSP13 to OSP16, so the undercloud is on RHEL 8 (container_cli=podman) while the ceph nodes are still on RHEL 7 (container_cli=docker). Digging into Ansible's behavior a bit, this seems to be how delegate_to is meant to work. A solution would be to parametrize the target host in the ceph-health validation, or to find some other way to resolve the inventory hostname of the delegated target.
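The delegation pitfall can be illustrated with a minimal playbook (host and variable names here are illustrative, not taken from tripleo-validations): under delegate_to, inventory_hostname keeps the value of the delegating host, so facts derived from it (like container_cli) describe the undercloud rather than the ceph node the task runs on.

```yaml
# Minimal illustration of the delegate_to pitfall (hosts are illustrative).
- hosts: undercloud
  gather_facts: false
  tasks:
    # inventory_hostname still expands to "undercloud" here, even though
    # the task is delegated to (and runs on) the ceph node.
    - name: Show inventory_hostname under delegation
      debug:
        msg: "inventory_hostname is {{ inventory_hostname }}"
      delegate_to: controller-0

    # One workaround: look the fact up via hostvars of the delegated host,
    # so a RHEL 7 ceph node resolves to docker instead of podman.
    - name: Use the delegated host's container_cli instead
      debug:
        msg: "container_cli is {{ hostvars['controller-0']['container_cli'] | default('docker') }}"
      delegate_to: controller-0
```

Parametrizing the play's target host (so the validation runs directly against the ceph nodes instead of delegating from the undercloud) would avoid the problem entirely.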
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3148