Bug 1968177 - switch-to-containerized fails and leaves cluster in degraded state
Summary: switch-to-containerized fails and leaves cluster in degraded state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 4.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.2z3
Assignee: Dimitri Savineau
QA Contact: Ameena Suhani S H
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-06-06 14:08 UTC by Heðin
Modified: 2021-09-27 18:26 UTC
CC List: 8 users

Fixed In Version: ceph-ansible-4.0.61-1.el8cp, ceph-ansible-4.0.61-1.el7cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-27 18:26:24 UTC
Embargoed:


Attachments
See 2021-06-06 12:11:47,842 (5.60 MB, text/plain)
2021-06-06 14:08 UTC, Heðin


Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph-ansible pull 6627 0 None open switch2container: run ceph-validate role 2021-06-28 14:50:26 UTC
Red Hat Issue Tracker RHCEPH-294 0 None None None 2021-08-19 06:31:59 UTC
Red Hat Product Errata RHBA-2021:3670 0 None None None 2021-09-27 18:26:47 UTC

Description Heðin 2021-06-06 14:08:41 UTC
Created attachment 1789117 [details]
See 2021-06-06 12:11:47,842

Description of problem:
switch-to-containerized fails on RHCS-4 when the following variables are not set in all.yml:
ceph_docker_registry: "registry.redhat.io"
ceph_docker_registry_auth: true
ceph_docker_registry_username:
ceph_docker_registry_password:

However, the playbook does not fail until after the non-containerized mon service has already been removed.
This leaves the cluster missing a monitor, and subsequent runs of the playbook fail because they can no longer find the removed mon service. A minimal sketch of the required settings is given below.
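
For reference, this is roughly what the registry section of group_vars/all.yml needs to contain; the username and password values here are placeholders, not real credentials:

  ceph_docker_registry: "registry.redhat.io"
  ceph_docker_registry_auth: true
  ceph_docker_registry_username: "<registry-service-account>"    # placeholder
  ceph_docker_registry_password: "<registry-token-or-password>"  # placeholder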

Version-Release number of selected component (if applicable):
ceph-ansible.noarch                  4.0.41-1.el7cp          @rhel-7-server-rhceph-4-tools-rpms


How reproducible:
I deployed RHCS-3 non-containerized, upgraded to RHCS-4 non-containerized, then ran infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml without adding the above-mentioned ceph_docker_registry variables.

Steps to Reproduce:
1. Install RHCS-3 non-containerized
2. Upgrade to RHCS-4
3. Convert to containerized by running infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml with --limit rhceph01 (rhceph01 is the first monitor; see the example invocation below)
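
An invocation along these lines reproduces it; the inventory file name is an assumption here:

  # run from the ceph-ansible directory, limiting the play to the first monitor
  ansible-playbook -i hosts \
      infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml \
      --limit rhceph01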

Actual results:
The mon on rhceph01 is removed and the cluster is left with two functioning mons and HEALTH_WARN.

Expected results:
The playbook should fail early, with a message pointing out that registry.redhat.io requires these variables to be set, and leave the cluster HEALTH_OK.

Additional info:
See the log line timestamped 2021-06-06 12:11:47,842 in the attached ansible.log.

Comment 1 Heðin 2021-06-06 14:10:33 UTC
Set prio to high because the cluster is left in a degraded state.

Comment 2 Guillaume Abrioux 2021-07-02 12:09:52 UTC
v4.0.59 available upstream

Comment 7 Ameena Suhani S H 2021-08-04 06:10:36 UTC
Verified using 

ansible-2.9.24-1.el8ae.noarch
ceph-ansible-4.0.62-1.el8cp.noarch

Comment 9 errata-xmlrpc 2021-09-27 18:26:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 4.2 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3670

