Description of problem:
With rgw-multisite, dashboard deployment fails on the secondary site because the "get radosgw system user" task runs on all mons but execs into the container named ceph-mon-site2m1 (the first mon's container) on all 3 mons.

Version-Release number of selected component (if applicable):
container image rhceph:4-27 and ceph-ansible 4.0.23

How reproducible:
100%

Steps to Reproduce:
1. Deploy an rgw-multisite setup without deploying the dashboard initially.
2. Deploy the dashboard on the secondary site, collocated on an OSD node.
3. The deployment fails with the result below.

Actual results:
2020-06-28 17:24:26,943 p=99004 u=cephadmin n=ansible | TASK [ceph-dashboard : get radosgw system user] *******************************
2020-06-28 17:24:26,943 p=99004 u=cephadmin n=ansible | Sunday 28 June 2020 17:24:26 +0300 (0:00:00.119) 0:10:26.424 ***********
2020-06-28 17:24:27,340 p=99004 u=cephadmin n=ansible | FAILED - RETRYING: get radosgw system user (3 retries left).
2020-06-28 17:24:27,411 p=99004 u=cephadmin n=ansible | FAILED - RETRYING: get radosgw system user (3 retries left).
2020-06-28 17:24:27,701 p=99004 u=cephadmin n=ansible | changed: [site2m1]
2020-06-28 17:24:32,595 p=99004 u=cephadmin n=ansible | FAILED - RETRYING: get radosgw system user (2 retries left).
2020-06-28 17:24:32,671 p=99004 u=cephadmin n=ansible | FAILED - RETRYING: get radosgw system user (2 retries left).
2020-06-28 17:24:37,850 p=99004 u=cephadmin n=ansible | FAILED - RETRYING: get radosgw system user (1 retries left).
2020-06-28 17:24:37,929 p=99004 u=cephadmin n=ansible | FAILED - RETRYING: get radosgw system user (1 retries left).
2020-06-28 17:24:43,116 p=99004 u=cephadmin n=ansible | fatal: [site2m2]: FAILED! => changed=true
  attempts: 3
  cmd:
  - timeout
  - --foreground
  - -s
  - KILL
  - '20'
  - podman
  - exec
  - ceph-mon-site2m1
  - radosgw-admin
  - --cluster
  - ceph
  - user
  - info
  - --uid=ceph-dashboard
  delta: '0:00:00.055607'
  end: '2020-06-28 17:24:44.044413'
  msg: non-zero return code
  rc: 125
  start: '2020-06-28 17:24:43.988806'
  stderr: 'Error: no container with name or ID ceph-mon-site2m1 found: no such container'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
2020-06-28 17:24:43,192 p=99004 u=cephadmin n=ansible | fatal: [site2m3]: FAILED! => changed=true
  attempts: 3
  cmd:
  - timeout
  - --foreground
  - -s
  - KILL
  - '20'
  - podman
  - exec
  - ceph-mon-site2m1
  - radosgw-admin
  - --cluster
  - ceph
  - user
  - info
  - --uid=ceph-dashboard
  delta: '0:00:00.055752'
  end: '2020-06-28 17:24:44.121714'
  msg: non-zero return code
  rc: 125
  start: '2020-06-28 17:24:44.065962'
  stderr: 'Error: no container with name or ID ceph-mon-site2m1 found: no such container'
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>

Expected results:
Successful playthrough.

Additional info:
The affected task is found in roles/ceph-dashboard/tasks/configure_dashboard.yml. The initial workaround has been to add run_once: true to the task.
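For reference, a minimal sketch of the workaround, assuming the task resembles the ceph-ansible 4.0 version (the command string, the container_exec_cmd variable, and the retry settings are illustrative reconstructions from the log above, not the exact upstream code):

- name: get radosgw system user
  command: "{{ container_exec_cmd }} radosgw-admin --cluster {{ cluster }} user info --uid=ceph-dashboard"
  register: get_rgw_user
  retries: 3
  delay: 5
  until: get_rgw_user.rc == 0
  run_once: true   # workaround: container_exec_cmd always points at the first
                   # mon's container (ceph-mon-site2m1), which only exists on
                   # that host, so run the command once instead of on every mon

With run_once: true the command executes only on the first mon (site2m1), where the ceph-mon-site2m1 container actually exists, instead of being attempted on site2m2 and site2m3, where podman exec fails with rc 125.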
*** Bug 1851917 has been marked as a duplicate of this bug. ***
*** Bug 1851793 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 4.1 Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4144