Bug 1851764

Summary: Dashboard deployment fails on the secondary site when deploying rgw-multisite
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Heðin <hmoller>
Component: Ceph-Ansible
Assignee: Dimitri Savineau <dsavinea>
Status: CLOSED ERRATA
QA Contact: Sunil Angadi <sangadi>
Severity: medium
Docs Contact: Aron Gunn <agunn>
Priority: unspecified
Version: 4.1
CC: agunn, aschoen, ceph-eng-bugs, dsavinea, gabrioux, gjose, gmeno, maydin, nthomas, sangadi, tserlin, ykaul
Target Milestone: z2
Flags: hmoller: needinfo-
Target Release: 4.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ceph-ansible-4.0.29-1.el8cp, ceph-ansible-4.0.29-1.el7cp
Doc Type: Bug Fix
Doc Text:
.{storage-product} Dashboard fails when deploying a Ceph Object Gateway secondary site
Previously, the {storage-product} Dashboard would fail to deploy the secondary site in a Ceph Object Gateway multi-site deployment because, when Ceph Ansible ran the `radosgw-admin user create` command, the command would return an error. With this release, the Ceph Ansible task in the deployment process has been split into two different tasks, which allows the {storage-product} Dashboard to deploy a Ceph Object Gateway secondary site successfully.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-09-30 17:26:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1816167

Description Heðin 2020-06-28 19:39:35 UTC
Description of problem:
With rgw-multisite, the dashboard deployment fails on the secondary site because the task "get radosgw system user" runs on all three mons but always executes through the container named ceph-mon-site2m1, which exists on only one of them.
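The failing pattern can be sketched as follows. This is a hypothetical minimal reproduction, not the actual role code; the task body, variable lookups, and retry settings are assumptions based on the log output below:

```yaml
# Hypothetical sketch of the failing pattern: the container name is rendered
# from a single host (the first mon, here site2m1), but the task runs on
# every mon, so on site2m2 and site2m3 podman finds no container named
# ceph-mon-site2m1 and exits with rc 125.
- name: get radosgw system user
  command: >
    timeout --foreground -s KILL 20
    podman exec ceph-mon-{{ hostvars[groups['mons'][0]]['ansible_hostname'] }}
    radosgw-admin --cluster ceph user info --uid=ceph-dashboard
  register: get_rgw_user
  retries: 3
  delay: 5
  until: get_rgw_user.rc == 0
```

Only the host whose own hostname matches the rendered container name (site2m1) can succeed; the other two mons exhaust their retries and fail.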

Version-Release number of selected component (if applicable):
container image rhceph:4-27 and ceph-ansible 4.0.23

How reproducible:
100%

Steps to Reproduce:
1. Deploy an rgw-multisite setup without deploying the dashboard initially.
2. Deploy the dashboard on the secondary site, collocated on an OSD node.
3. The deployment fails with the result below.

Actual results:
2020-06-28 17:24:26,943 p=99004 u=cephadmin n=ansible | TASK [ceph-dashboard : get radosgw system user] ***************************************************************************************************
2020-06-28 17:24:26,943 p=99004 u=cephadmin n=ansible | Sunday 28 June 2020  17:24:26 +0300 (0:00:00.119)       0:10:26.424 ***********
2020-06-28 17:24:27,340 p=99004 u=cephadmin n=ansible | FAILED - RETRYING: get radosgw system user (3 retries left).
2020-06-28 17:24:27,411 p=99004 u=cephadmin n=ansible | FAILED - RETRYING: get radosgw system user (3 retries left).
2020-06-28 17:24:27,701 p=99004 u=cephadmin n=ansible | changed: [site2m1]
2020-06-28 17:24:32,595 p=99004 u=cephadmin n=ansible | FAILED - RETRYING: get radosgw system user (2 retries left).
2020-06-28 17:24:32,671 p=99004 u=cephadmin n=ansible | FAILED - RETRYING: get radosgw system user (2 retries left).
2020-06-28 17:24:37,850 p=99004 u=cephadmin n=ansible | FAILED - RETRYING: get radosgw system user (1 retries left).
2020-06-28 17:24:37,929 p=99004 u=cephadmin n=ansible | FAILED - RETRYING: get radosgw system user (1 retries left).
2020-06-28 17:24:43,116 p=99004 u=cephadmin n=ansible | fatal: [site2m2]: FAILED! => changed=true
  attempts: 3
  cmd:
  - timeout
  - --foreground
  - -s
  - KILL
  - '20'
  - podman
  - exec
  - ceph-mon-site2m1
  - radosgw-admin
  - --cluster
  - ceph
  - user
  - info
  - --uid=ceph-dashboard
  delta: '0:00:00.055607'
  end: '2020-06-28 17:24:44.044413'
  msg: non-zero return code
  rc: 125
  start: '2020-06-28 17:24:43.988806'
  stderr: 'Error: no container with name or ID ceph-mon-site2m1 found: no such container'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
2020-06-28 17:24:43,192 p=99004 u=cephadmin n=ansible | fatal: [site2m3]: FAILED! => changed=true
  attempts: 3
  cmd:
  - timeout
  - --foreground
  - -s
  - KILL
  - '20'
  - podman
  - exec
  - ceph-mon-site2m1
  - radosgw-admin
  - --cluster
  - ceph
  - user
  - info
  - --uid=ceph-dashboard
  delta: '0:00:00.055752'
  end: '2020-06-28 17:24:44.121714'
  msg: non-zero return code
  rc: 125
  start: '2020-06-28 17:24:44.065962'
  stderr: 'Error: no container with name or ID ceph-mon-site2m1 found: no such container'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>

Expected results:
Successful playbook run.

Additional info:
The affected task is in roles/ceph-dashboard/tasks/configure_dashboard.yml.
The initial workaround has been to add `run_once: true` to the task.
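The workaround can be sketched as follows. This is illustrative only; `container_exec_cmd` and the surrounding keys are assumptions, not the verbatim role code:

```yaml
# Workaround sketch: with run_once: true, Ansible executes the task on a
# single host of the play instead of all mons, so the templated container
# name matches a container that actually exists on the host running it.
- name: get radosgw system user
  command: "{{ container_exec_cmd }} radosgw-admin --cluster ceph user info --uid=ceph-dashboard"
  register: get_rgw_user
  retries: 3
  delay: 5
  until: get_rgw_user.rc == 0
  run_once: true
```

Note that `run_once` is a workaround rather than the shipped fix; per the Doc Text, the fix in ceph-ansible-4.0.29 splits the task into two separate tasks.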

Comment 2 Dimitri Savineau 2020-06-29 14:36:22 UTC
*** Bug 1851917 has been marked as a duplicate of this bug. ***

Comment 4 Guillaume Abrioux 2020-08-20 14:11:43 UTC
*** Bug 1851793 has been marked as a duplicate of this bug. ***

Comment 12 errata-xmlrpc 2020-09-30 17:26:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 4.1 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4144