Description of problem:
With rgw-multisite, dashboard deployment fails on the secondary site because the "get radosgw system user" task runs on all mons but execs into the container named ceph-mon-site2m1 (the first mon's container) on all 3 mons.

Version-Release number of selected component (if applicable):
container image rhceph:4-27 and ceph-ansible 4.0.23

How reproducible:
100%

Steps to Reproduce:
1. Deploy an rgw-multisite setup without deploying the dashboard initially.
2. Deploy the dashboard on the secondary site, collocated on an OSD node.
3. The deployment fails with the result below.

Actual results:
2020-06-28 17:24:26,943 p=99004 u=cephadmin n=ansible | TASK [ceph-dashboard : get radosgw system user] *******************************
2020-06-28 17:24:26,943 p=99004 u=cephadmin n=ansible | Sunday 28 June 2020 17:24:26 +0300 (0:00:00.119) 0:10:26.424 ***********
2020-06-28 17:24:27,340 p=99004 u=cephadmin n=ansible | FAILED - RETRYING: get radosgw system user (3 retries left).
2020-06-28 17:24:27,411 p=99004 u=cephadmin n=ansible | FAILED - RETRYING: get radosgw system user (3 retries left).
2020-06-28 17:24:27,701 p=99004 u=cephadmin n=ansible | changed: [site2m1]
2020-06-28 17:24:32,595 p=99004 u=cephadmin n=ansible | FAILED - RETRYING: get radosgw system user (2 retries left).
2020-06-28 17:24:32,671 p=99004 u=cephadmin n=ansible | FAILED - RETRYING: get radosgw system user (2 retries left).
2020-06-28 17:24:37,850 p=99004 u=cephadmin n=ansible | FAILED - RETRYING: get radosgw system user (1 retries left).
2020-06-28 17:24:37,929 p=99004 u=cephadmin n=ansible | FAILED - RETRYING: get radosgw system user (1 retries left).
2020-06-28 17:24:43,116 p=99004 u=cephadmin n=ansible | fatal: [site2m2]: FAILED! => changed=true
  attempts: 3
  cmd:
  - timeout
  - --foreground
  - -s
  - KILL
  - '20'
  - podman
  - exec
  - ceph-mon-site2m1
  - radosgw-admin
  - --cluster
  - ceph
  - user
  - info
  - --uid=ceph-dashboard
  delta: '0:00:00.055607'
  end: '2020-06-28 17:24:44.044413'
  msg: non-zero return code
  rc: 125
  start: '2020-06-28 17:24:43.988806'
  stderr: 'Error: no container with name or ID ceph-mon-site2m1 found: no such container'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
2020-06-28 17:24:43,192 p=99004 u=cephadmin n=ansible | fatal: [site2m3]: FAILED! => changed=true
  attempts: 3
  cmd:
  - timeout
  - --foreground
  - -s
  - KILL
  - '20'
  - podman
  - exec
  - ceph-mon-site2m1
  - radosgw-admin
  - --cluster
  - ceph
  - user
  - info
  - --uid=ceph-dashboard
  delta: '0:00:00.055752'
  end: '2020-06-28 17:24:44.121714'
  msg: non-zero return code
  rc: 125
  start: '2020-06-28 17:24:44.065962'
  stderr: 'Error: no container with name or ID ceph-mon-site2m1 found: no such container'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>

Expected results:
Successful playthrough.

Additional info:
The affected task is found in roles/ceph-dashboard/tasks/configure_dashboard.yml. The initial workaround has been to add run_once: true to the task.
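For reference, a minimal sketch of the workaround, assuming the task resembles the ceph-ansible 4.0 version (the command string, the container_exec_cmd variable, and the retry settings are illustrative reconstructions from the log above, not the exact upstream code):

- name: get radosgw system user
  command: "{{ container_exec_cmd }} radosgw-admin --cluster {{ cluster }} user info --uid=ceph-dashboard"
  register: get_rgw_user
  retries: 3
  delay: 5
  until: get_rgw_user.rc == 0
  run_once: true   # workaround: container_exec_cmd always points at the first
                   # mon's container (ceph-mon-site2m1), which only exists on
                   # that host, so run the command once instead of on every mon

With run_once: true the command executes only on the first mon (site2m1), where the ceph-mon-site2m1 container actually exists, instead of being attempted on site2m2 and site2m3, where podman exec fails with rc 125.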
*** Bug 1851917 has been marked as a duplicate of this bug. ***
*** Bug 1851793 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 4.1 Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4144