Bug 1851764 - Dashboard deployment fails on secondary site, when deploying rgw-multisite
Summary: Dashboard deployment fails on secondary site, when deploying rgw-multisite
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 4.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: z2
Target Release: 4.1
Assignee: Dimitri Savineau
QA Contact: Sunil Angadi
Docs Contact: Aron Gunn
URL:
Whiteboard:
Duplicates: 1851793 (view as bug list)
Depends On:
Blocks: 1816167
 
Reported: 2020-06-28 19:39 UTC by Heðin
Modified: 2023-12-15 18:22 UTC
CC: 12 users

Fixed In Version: ceph-ansible-4.0.29-1.el8cp, ceph-ansible-4.0.29-1.el7cp
Doc Type: Bug Fix
Doc Text:
.{storage-product} Dashboard fails when deploying a Ceph Object Gateway secondary site
Previously, the {storage-product} Dashboard would fail to deploy the secondary site in a Ceph Object Gateway multi-site deployment, because when Ceph Ansible ran the `radosgw-admin user create` command, the command would return an error. With this release, the Ceph Ansible task in the deployment process has been split into two different tasks: one that creates the user and one that retrieves it. Doing this allows the {storage-product} Dashboard to deploy a Ceph Object Gateway secondary site successfully. (An illustrative sketch of the split tasks follows the Links section below.)
Clone Of:
Environment:
Last Closed: 2020-09-30 17:26:19 UTC
Embargoed:
hmoller: needinfo-


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph-ansible pull 5471 0 None closed ceph-dashboard: update create/get rgw user tasks (bp #5077) 2021-01-22 06:33:02 UTC
Red Hat Issue Tracker RHCEPH-8053 0 None None None 2023-12-15 18:22:51 UTC
Red Hat Product Errata RHBA-2020:4144 0 None None None 2020-09-30 17:26:44 UTC
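
Illustrative Ansible sketch of the fix described in the Doc Text above: the single task that both created and queried the dashboard user is split into a create task and a get task, each run once against one monitor. This is not the actual ceph-ansible change from pull 5471; variable names such as container_exec_cmd, cluster, and mon_group_name follow ceph-ansible conventions, and the error handling on the create step is an assumption.

- name: create radosgw system user for the dashboard
  command: >
    {{ container_exec_cmd }} radosgw-admin --cluster {{ cluster }}
    user create --uid=ceph-dashboard --display-name='Ceph Dashboard' --system
  register: create_rgw_dashboard_user
  changed_when: false
  failed_when: false            # assumption: a pre-existing user on the secondary site is not an error
  run_once: true
  delegate_to: "{{ groups[mon_group_name][0] }}"

- name: get radosgw system user for the dashboard
  command: >
    {{ container_exec_cmd }} radosgw-admin --cluster {{ cluster }}
    user info --uid=ceph-dashboard
  register: get_rgw_dashboard_user
  changed_when: false
  retries: 3
  delay: 5
  until: get_rgw_dashboard_user.rc == 0
  run_once: true
  delegate_to: "{{ groups[mon_group_name][0] }}"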

Description Heðin 2020-06-28 19:39:35 UTC
Description of problem:
With rgw-multisite, the dashboard deployment fails on the secondary site because the "get radosgw system user" task runs on all mons, but on every one of the 3 mons it tries to execute the command through the container named ceph-mon-site2m1, which only exists on one node.
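
A simplified illustration of the failure mode, not the in-tree task (the way the container name is templated is an assumption based on the log below): the podman exec target is rendered from a single mon's hostname, but the task itself runs on every mon, so site2m2 and site2m3 look for a container that only exists on site2m1.

- name: get radosgw system user
  command: >
    timeout --foreground -s KILL 20
    podman exec ceph-mon-{{ hostvars[groups[mon_group_name][0]]['ansible_hostname'] }}
    radosgw-admin --cluster ceph user info --uid=ceph-dashboard
  register: get_rgw_user
  retries: 3
  delay: 5
  until: get_rgw_user.rc == 0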

Version-Release number of selected component (if applicable):
container image rhceph:4-27 and ceph-ansible 4.0.23

How reproducible:
100%

Steps to Reproduce:
1. Deploy an rgw-multisite setup without deploying the dashboard initially.
2. Deploy the dashboard on the secondary site, collocated on an OSD node (see the group_vars sketch after these steps).
3. The play fails with the result below.
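
For step 2, an illustrative set of group_vars toggles (variable names are taken from ceph-ansible; the values shown for this scenario, and the zone/zonegroup names, are assumptions):

dashboard_enabled: true
rgw_multisite: true
rgw_zonemaster: false          # secondary site
rgw_zonesecondary: true
rgw_zonegroup: default         # placeholder
rgw_zone: site2                # placeholder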

Actual results:
2020-06-28 17:24:26,943 p=99004 u=cephadmin n=ansible | TASK [ceph-dashboard : get radosgw system user] ***************************************************************************************
2020-06-28 17:24:26,943 p=99004 u=cephadmin n=ansible | Sunday 28 June 2020  17:24:26 +0300 (0:00:00.119)       0:10:26.424 ***********
2020-06-28 17:24:27,340 p=99004 u=cephadmin n=ansible | FAILED - RETRYING: get radosgw system user (3 retries left).
2020-06-28 17:24:27,411 p=99004 u=cephadmin n=ansible | FAILED - RETRYING: get radosgw system user (3 retries left).
2020-06-28 17:24:27,701 p=99004 u=cephadmin n=ansible | changed: [site2m1]
2020-06-28 17:24:32,595 p=99004 u=cephadmin n=ansible | FAILED - RETRYING: get radosgw system user (2 retries left).
2020-06-28 17:24:32,671 p=99004 u=cephadmin n=ansible | FAILED - RETRYING: get radosgw system user (2 retries left).
2020-06-28 17:24:37,850 p=99004 u=cephadmin n=ansible | FAILED - RETRYING: get radosgw system user (1 retries left).
2020-06-28 17:24:37,929 p=99004 u=cephadmin n=ansible | FAILED - RETRYING: get radosgw system user (1 retries left).
2020-06-28 17:24:43,116 p=99004 u=cephadmin n=ansible | fatal: [site2m2]: FAILED! => changed=true
  attempts: 3
  cmd:
  - timeout
  - --foreground
  - -s
  - KILL
  - '20'
  - podman
  - exec
  - ceph-mon-site2m1
  - radosgw-admin
  - --cluster
  - ceph
  - user
  - info
  - --uid=ceph-dashboard
  delta: '0:00:00.055607'
  end: '2020-06-28 17:24:44.044413'
  msg: non-zero return code
  rc: 125
  start: '2020-06-28 17:24:43.988806'
  stderr: 'Error: no container with name or ID ceph-mon-site2m1 found: no such container'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
2020-06-28 17:24:43,192 p=99004 u=cephadmin n=ansible | fatal: [site2m3]: FAILED! => changed=true
  attempts: 3
  cmd:
  - timeout
  - --foreground
  - -s
  - KILL
  - '20'
  - podman
  - exec
  - ceph-mon-site2m1
  - radosgw-admin
  - --cluster
  - ceph
  - user
  - info
  - --uid=ceph-dashboard
  delta: '0:00:00.055752'
  end: '2020-06-28 17:24:44.121714'
  msg: non-zero return code
  rc: 125
  start: '2020-06-28 17:24:44.065962'
  stderr: 'Error: no container with name or ID ceph-mon-site2m1 found: no such container'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>

Expected results:
Successful playbook run.

Additional info:
The affected task is in roles/ceph-dashboard/tasks/configure_dashboard.yml.
The initial workaround has been to add run_once: true to the task (see the sketch below).
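
A sketch of that workaround as it could look in configure_dashboard.yml (the task body is an approximation, not the verbatim in-tree task):

- name: get radosgw system user
  command: >
    {{ container_exec_cmd }} radosgw-admin --cluster {{ cluster }}
    user info --uid=ceph-dashboard
  register: get_rgw_user
  retries: 3
  delay: 5
  until: get_rgw_user.rc == 0
  run_once: true               # workaround: run on a single host so the other mons never exec a missing container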

Comment 2 Dimitri Savineau 2020-06-29 14:36:22 UTC
*** Bug 1851917 has been marked as a duplicate of this bug. ***

Comment 4 Guillaume Abrioux 2020-08-20 14:11:43 UTC
*** Bug 1851793 has been marked as a duplicate of this bug. ***

Comment 12 errata-xmlrpc 2020-09-30 17:26:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 4.1 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4144

