Bug 1968177 - switch-to-containerized fails and leaves cluster in degraded state
Summary: switch-to-containerized fails and leaves cluster in degraded state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 4.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.2z3
Assignee: Dimitri Savineau
QA Contact: Ameena Suhani S H
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-06-06 14:08 UTC by Heðin
Modified: 2021-09-27 18:26 UTC
CC List: 8 users

Fixed In Version: ceph-ansible-4.0.61-1.el8cp, ceph-ansible-4.0.61-1.el7cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-09-27 18:26:24 UTC
Embargoed:


Attachments
See 2021-06-06 12:11:47,842 (5.60 MB, text/plain)
2021-06-06 14:08 UTC, Heðin


Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph-ansible pull 6627 0 None open switch2container: run ceph-validate role 2021-06-28 14:50:26 UTC
Red Hat Issue Tracker RHCEPH-294 0 None None None 2021-08-19 06:31:59 UTC
Red Hat Product Errata RHBA-2021:3670 0 None None None 2021-09-27 18:26:47 UTC

Description Heðin 2021-06-06 14:08:41 UTC
Created attachment 1789117 [details]
See 2021-06-06 12:11:47,842

Description of problem:
switch-to-containerized fails on RHCS-4 when the following variables are not set in all.yml:
ceph_docker_registry: "registry.redhat.io"
ceph_docker_registry_auth: true
ceph_docker_registry_username:
ceph_docker_registry_password:

However, the playbook does not fail until after the non-containerized mon service has already been removed.
This leaves the cluster missing a monitor, and subsequent runs of the playbook fail because they can no longer find the removed mon service. A minimal sketch of the required settings is given below.
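
For reference, this is roughly what the registry section of group_vars/all.yml needs to contain; the username and password values here are placeholders, not real credentials:

  ceph_docker_registry: "registry.redhat.io"
  ceph_docker_registry_auth: true
  ceph_docker_registry_username: "<registry-service-account>"    # placeholder
  ceph_docker_registry_password: "<registry-token-or-password>"  # placeholder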

Version-Release number of selected component (if applicable):
ceph-ansible.noarch                  4.0.41-1.el7cp          @rhel-7-server-rhceph-4-tools-rpms


How reproducible:
I deployed RHCS-3 non-containerized, upgraded to RHCS-4 non-containerized, then ran infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml without adding the above-mentioned ceph_docker_registry variables.

Steps to Reproduce:
1. Install RHCS-3 non-containerized
2. Upgrade to RHCS-4
3. Convert to containerized by running infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml with --limit rhceph01 (rhceph01 is the first monitor; see the example invocation below)
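
An invocation along these lines reproduces it; the inventory file name is an assumption here:

  # run from the ceph-ansible directory, limiting the play to the first monitor
  ansible-playbook -i hosts \
      infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml \
      --limit rhceph01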

Actual results:
The mon on rhceph01 is removed and the cluster is left with two functioning mons and HEALTH_WARN.

Expected results:
The playbook should fail early, with a message pointing out that registry.redhat.io requires these variables to be set, and leave the cluster HEALTH_OK.

Additional info:
See the log line timestamped 2021-06-06 12:11:47,842 in the attached ansible.log.

Comment 1 Heðin 2021-06-06 14:10:33 UTC
Set prio to high because the cluster is left in a degraded state.

Comment 2 Guillaume Abrioux 2021-07-02 12:09:52 UTC
v4.0.59 available upstream

Comment 7 Ameena Suhani S H 2021-08-04 06:10:36 UTC
Verified using 

ansible-2.9.24-1.el8ae.noarch
ceph-ansible-4.0.62-1.el8cp.noarch

Comment 9 errata-xmlrpc 2021-09-27 18:26:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 4.2 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3670

