Bug 1794195

Summary: ceph-container-common role is skipped with containerized HCI environment
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: Ceph-Ansible
Version: 4.0
Status: CLOSED ERRATA
Severity: medium
Priority: high
Reporter: Dimitri Savineau <dsavinea>
Assignee: Dimitri Savineau <dsavinea>
QA Contact: Vasishta <vashastr>
CC: aschoen, ceph-eng-bugs, flucifre, gabrioux, gmeno, hgurav, johfulto, nthomas, tchandra, tserlin, ykaul, yrabl
Target Milestone: rc
Target Release: 4.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ceph-ansible-4.0.12-1.el8cp, ceph-ansible-4.0.12-1.el7cp
Last Closed: 2020-01-31 12:48:52 UTC
Type: Bug
Bug Blocks: 1642481

Description Dimitri Savineau 2020-01-22 21:34:07 UTC
Description of problem:

In a containerized HCI environment, the OSD and client nodes are collocated.
The ceph-container-common role is executed only on the first client node (for keyring purposes) and is skipped on all other client nodes, so multiple tasks never run there, including:
  - registry authentication (if set to true)
  - udev rules removal
  - set facts (ceph and docker version) used in later roles

Version-Release number of selected component (if applicable):
ceph-ansible 4.0.11

How reproducible:
100%

Steps to Reproduce:
1. Deploy a containerized HCI Ceph cluster with ceph-ansible (see Additional info).


Actual results:

The result differs depending on whether docker or podman is used.

a) docker

TASK [ceph-osd : generate ceph osd docker run script] **************************
Tuesday 21 January 2020  23:01:03 +0300 (0:00:01.415)       0:24:14.961 *******
fatal: [hci-1]: FAILED! => changed=false
  msg: 'AnsibleUndefinedVariable: ''dict object'' has no attribute ''split'''
fatal: [hci-2]: FAILED! => changed=false
  msg: 'AnsibleUndefinedVariable: ''dict object'' has no attribute ''split'''

This happens because the ceph_docker_version variable isn't set: the ceph-container-common role is skipped on every node except hci-0.

TASK [ceph-container-common : include prerequisites.yml] ***********************
Wednesday 21 January 2020  22:55:21 +0300 (0:00:01.505)       0:04:07.762 ***** 
skipping: [hci-1] => changed=false 
  skip_reason: Conditional result was False
skipping: [hci-2] => changed=false 
  skip_reason: Conditional result was False

b) podman

If ceph_docker_registry_auth=false, the playbook succeeds but some important tasks aren't executed.
If ceph_docker_registry_auth=true, the playbook fails on the task that tries to pull the container image.


fatal: [hci-2]: FAILED! => changed=true
  (...)
  stderr: |-
    Trying to pull registry.redhat.io/rhceph-beta/rhceph-4-rhel8:latest...time="2020-01-22T16:27:23-05:00" level=error msg="Error pulling image ref //registry.redhat.io/rhceph-beta/rhceph-4-rhel8:latest: Error initializing source docker://registry.redhat.io/rhceph-beta/rhceph-4-rhel8:latest: unable to retrieve auth token: invalid username/password"
    Failed
    Error: unable to pull registry.redhat.io/rhceph-beta/rhceph-4-rhel8:latest: unable to pull image: Error initializing source docker://registry.redhat.io/rhceph-beta/rhceph-4-rhel8:latest: unable to retrieve auth token: invalid username/password

This happens because the registry authentication task was skipped.

Expected results:

The ceph-container-common role should not be skipped on any of the collocated nodes.


Additional info:

---------
containerized_deployment: true
ceph_docker_registry_auth: true  # optional
---------

---------
[osds]
hci-0
hci-1
hci-2

[clients]
hci-0
hci-1
hci-2
---------
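
As an illustration of why only hci-1 and hci-2 are affected with this inventory, here is a minimal Python sketch. It is not ceph-ansible code: the function names are hypothetical, and the "first client only" rule is an assumption that models the behaviour described in this report rather than the playbook's actual condition.

```python
# Inventory mirroring the [osds] and [clients] groups from this report.
INVENTORY = {
    "osds": ["hci-0", "hci-1", "hci-2"],
    "clients": ["hci-0", "hci-1", "hci-2"],
}


def runs_container_common(host, groups):
    """Hypothetical model of the reported behaviour: in the clients play,
    only the first host of the clients group runs ceph-container-common."""
    clients = groups.get("clients", [])
    return bool(clients) and host == clients[0]


def skipped_hosts(groups):
    """Return the client hosts on which the role would be skipped."""
    return [
        host
        for host in groups.get("clients", [])
        if not runs_container_common(host, groups)
    ]


if __name__ == "__main__":
    # Prints the hosts that never get registry auth, udev rules removal,
    # or the ceph/docker version facts under the modelled rule.
    print(skipped_hosts(INVENTORY))  # ['hci-1', 'hci-2']
```

Under this model the skipped hosts match the "skipping" lines in the task output above, which is consistent with the later failures appearing on hci-1 and hci-2 but not on hci-0.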

Comment 2 Federico Lucifredi 2020-01-23 04:12:57 UTC
This is a blocker.

Comment 10 Yogev Rabl 2020-01-29 14:29:40 UTC
Verified

Comment 12 errata-xmlrpc 2020-01-31 12:48:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0312