Bug 1968177

Summary: switch-to-containerized fails and leaves cluster in degraded state
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Heðin <hej>
Component: Ceph-Ansible
Assignee: Dimitri Savineau <dsavinea>
Status: CLOSED ERRATA
QA Contact: Ameena Suhani S H <amsyedha>
Severity: high
Priority: unspecified
Version: 4.2
CC: aschoen, ceph-eng-bugs, gabrioux, gmeno, nthomas, tserlin, vereddy, ykaul
Target Milestone: ---
Target Release: 4.2z3
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ceph-ansible-4.0.61-1.el8cp, ceph-ansible-4.0.61-1.el7cp
Doc Type: If docs needed, set a value
Last Closed: 2021-09-27 18:26:24 UTC
Type: Bug
Attachments:
See 2021-06-06 12:11:47,842

Description Heðin 2021-06-06 14:08:41 UTC
Created attachment 1789117 [details]
See 2021-06-06 12:11:47,842

Description of problem:
switch-to-containerized fails on RHCS-4 when the following variables are not set in all.yml:
ceph_docker_registry: "registry.redhat.io"
ceph_docker_registry_auth: true
ceph_docker_registry_username:
ceph_docker_registry_password:

However, the playbook does not fail until after the non-containerized mon service has been removed. This leaves the cluster missing a monitor, and subsequent playbook runs fail because the removed mon service can no longer be found.
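For reference, a complete registry section in group_vars/all.yml might look like the following sketch; the credential values are placeholders for illustration, not values taken from this report:

```yaml
# group_vars/all.yml -- registry settings required when pulling from
# registry.redhat.io (username/password values below are placeholders)
ceph_docker_registry: "registry.redhat.io"
ceph_docker_registry_auth: true
ceph_docker_registry_username: "<service-account-user>"
ceph_docker_registry_password: "<service-account-token>"
```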

Version-Release number of selected component (if applicable):
ceph-ansible.noarch                  4.0.41-1.el7cp          @rhel-7-server-rhceph-4-tools-rpms


How reproducible:
Deployed RHCS-3 non-containerized, upgraded to RHCS-4 non-containerized, then ran infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml without adding the above-mentioned ceph_docker_registry variables.

Steps to Reproduce:
1. Install RHCS-3 non-containerized
2. Upgrade to RHCS-4
3. Convert to containerized by running infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml with --limit rhceph01 (rhceph01 is the first monitor)

Actual results:
The mon on rhceph01 is removed and the cluster is left with two functioning mons and HEALTH_WARN.

Expected results:
The playbook should fail early, with a message pointing out that registry.redhat.io requires these variables to be set, while keeping the cluster in HEALTH_OK.
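The requested early failure could be implemented with a pre-flight assertion along these lines; this is a hypothetical sketch of the idea, not the actual fix that shipped in ceph-ansible 4.0.61:

```yaml
# Hypothetical pre-flight check (sketch only, not the shipped ceph-ansible
# fix): abort before any daemon is touched when registry auth is enabled
# but credentials are missing.
- name: fail early when registry credentials are unset
  fail:
    msg: >-
      ceph_docker_registry_auth is true but ceph_docker_registry_username
      and/or ceph_docker_registry_password are not set; aborting before
      any mon service is removed.
  when:
    - ceph_docker_registry_auth | bool
    - (ceph_docker_registry_username is not defined) or (ceph_docker_registry_password is not defined)
```

Running such a task at the top of the switch playbook would satisfy the expected result above: the run stops with an explanatory message while the cluster remains HEALTH_OK.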

Additional info:
See the log entry at 2021-06-06 12:11:47,842 in the attached ansible.log.

Comment 1 Heðin 2021-06-06 14:10:33 UTC
Setting priority to high because the cluster is left in a degraded state.

Comment 2 Guillaume Abrioux 2021-07-02 12:09:52 UTC
v4.0.59 available upstream

Comment 7 Ameena Suhani S H 2021-08-04 06:10:36 UTC
Verified using 

ansible-2.9.24-1.el8ae.noarch
ceph-ansible-4.0.62-1.el8cp.noarch

Comment 9 errata-xmlrpc 2021-09-27 18:26:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 4.2 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3670