Description of problem:
Containerized 4.3 deployments fail at the following step:

TASK [ceph-mgr : wait for all mgr to be up]
2023-08-01 05:15:48,049 p=135284 u=admin n=ansible | <dell-r640-039.dsal.lab.eng.rdu2.redhat.com> (0, b'\n{"cmd": ["podman", "exec", "ceph-mon-dell-r640-039", "ceph", "--cluster", "ceph", "mgr", "dump", "-f", "json"], "stdout": "\\n{\\"epoch\\":1,\\"active_gid\\":0,\\"active_name\\":\\"\\",\\"active_addrs\\":{\\"addrvec\\":[]},\\"active_addr\\":\\":/0\\",\\"active_change\\":\\"0.000000\\",\\"available\\":false,\\"standbys\\":[],\\"modules\\":[\\"iostat\\",\\"restful\\"],\\"available_modules\\":[],\\"services\\":{},\\"always_on_modules\\":{\\"nautilus\\":[\\"balancer\\",\\"crash\\",\\"devicehealth\\",\\"orchestrator_cli\\",\\"progress\\",\\"rbd_support\\",\\"status\\",\\"volumes\\"]}}", "stderr": "", "rc": 0, "start": "2023-08-01 05:15:47.658988", "end": "2023-08-01 05:15:48.033574", "delta": "0:00:00.374586", "changed": true, "invocation": {"module_args": {"_raw_params": "podman exec ceph-mon-dell-r640-039 ceph --cluster ceph mgr dump -f json", "warn": true, "_uses_shell": false, "stdin_add_newline": true, "strip_empty_ends": true, "argv": null, "chdir": null, "executable": null, "creates": null, "removes": null, "stdin": null}}}\n', b"OpenSSH_8.0p1, OpenSSL 1.1.1k FIPS 25 Mar 2021\r\ndebug1: Reading configuration data /home/admin/.ssh/config\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug3: /etc/ssh/ssh_config line 52: Including file /etc/ssh/ssh_config.d/05-redhat.conf depth 0\r\ndebug1: Reading configuration data /etc/ssh/ssh_config.d/05-redhat.conf\r\ndebug2: checking match for 'final all' host dell-r640-039.dsal.lab.eng.rdu2.redhat.com originally dell-r640-039.dsal.lab.eng.rdu2.redhat.com\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 3: not matched 'final'\r\ndebug2: match not found\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 5: Including file /etc/crypto-policies/back-ends/openssh.config depth 1
(parse only)\r\ndebug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config\r\ndebug3: gss kex names ok: [gss-curve25519-sha256-,gss-nistp256-sha256-,gss-group14-sha256-,gss-group16-sha512-,gss-gex-sha1-,gss-group14-sha1-]\r\ndebug3: kex names ok: [curve25519-sha256,curve25519-sha256,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1]\r\ndebug1: configuration requests final Match pass\r\ndebug1: re-parsing configuration\r\ndebug1: Reading configuration data /home/admin/.ssh/config\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug3: /etc/ssh/ssh_config line 52: Including file /etc/ssh/ssh_config.d/05-redhat.conf depth 0\r\ndebug1: Reading configuration data /etc/ssh/ssh_config.d/05-redhat.conf\r\ndebug2: checking match for 'final all' host dell-r640-039.dsal.lab.eng.rdu2.redhat.com originally dell-r640-039.dsal.lab.eng.rdu2.redhat.com\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 3: matched 'final'\r\ndebug2: match found\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 5: Including file /etc/crypto-policies/back-ends/openssh.config depth 1\r\ndebug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config\r\ndebug3: gss kex names ok: [gss-curve25519-sha256-,gss-nistp256-sha256-,gss-group14-sha256-,gss-group16-sha512-,gss-gex-sha1-,gss-group14-sha1-]\r\ndebug3: kex names ok: [curve25519-sha256,curve25519-sha256,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1]\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: 
mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 122037\r\ndebug3: mux_client_request_session: session request sent\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 0\r\n")
2023-08-01 05:15:53,061 p=135284 u=admin n=ansible | Using module file /usr/lib/python3.6/site-packages/ansible/modules/commands/command.py
2023-08-01 05:15:53,061 p=135284 u=admin n=ansible | Pipelining is enabled.
2023-08-01 05:15:53,062 p=135284 u=admin n=ansible | <dell-r640-039.dsal.lab.eng.rdu2.redhat.com> ESTABLISH SSH CONNECTION FOR USER: None
2023-08-01 05:15:53,062 p=135284 u=admin n=ansible | <dell-r640-039.dsal.lab.eng.rdu2.redhat.com> SSH: EXEC ssh -vvv -o ControlMaster=auto -o ControlPersist=600s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=60 -o ControlPath=/home/admin/.ansible/cp/%h-%r-%p dell-r640-039.dsal.lab.eng.rdu2.redhat.com '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-dozrddkvoqrvdxlqckfdfjegvqezdjlu ; /usr/libexec/platform-python'"'"'"'"'"'"'"'"' && sleep 0'"'"''
2023-08-01 05:15:53,103 p=135284 u=admin n=ansible | Escalation succeeded
2023-08-01 05:15:53,634 p=135284 u=admin n=ansible | <dell-r640-039.dsal.lab.eng.rdu2.redhat.com> (0, b'\n{"cmd": ["podman", "exec", "ceph-mon-dell-r640-039", "ceph", "--cluster", "ceph", "mgr", "dump", "-f", "json"], "stdout":
"\\n{\\"epoch\\":1,\\"active_gid\\":0,\\"active_name\\":\\"\\",\\"active_addrs\\":{\\"addrvec\\":[]},\\"active_addr\\":\\":/0\\",\\"active_change\\":\\"0.000000\\",\\"available\\":false,\\"standbys\\":[],\\"modules\\":[\\"iostat\\",\\"restful\\"],\\"available_modules\\":[],\\"services\\":{},\\"always_on_modules\\":{\\"nautilus\\":[\\"balancer\\",\\"crash\\",\\"devicehealth\\",\\"orchestrator_cli\\",\\"progress\\",\\"rbd_support\\",\\"status\\",\\"volumes\\"]}}", "stderr": "", "rc": 0, "start": "2023-08-01 05:15:53.222176", "end": "2023-08-01 05:15:53.618200", "delta": "0:00:00.396024", "changed": true, "invocation": {"module_args": {"_raw_params": "podman exec ceph-mon-dell-r640-039 ceph --cluster ceph mgr dump -f json", "warn": true, "_uses_shell": false, "stdin_add_newline": true, "strip_empty_ends": true, "argv": null, "chdir": null, "executable": null, "creates": null, "removes": null, "stdin": null}}}\n', b"OpenSSH_8.0p1, OpenSSL 1.1.1k FIPS 25 Mar 2021\r\ndebug1: Reading configuration data /home/admin/.ssh/config\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug3: /etc/ssh/ssh_config line 52: Including file /etc/ssh/ssh_config.d/05-redhat.conf depth 0\r\ndebug1: Reading configuration data /etc/ssh/ssh_config.d/05-redhat.conf\r\ndebug2: checking match for 'final all' host dell-r640-039.dsal.lab.eng.rdu2.redhat.com originally dell-r640-039.dsal.lab.eng.rdu2.redhat.com\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 3: not matched 'final'\r\ndebug2: match not found\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 5: Including file /etc/crypto-policies/back-ends/openssh.config depth 1 (parse only)\r\ndebug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config\r\ndebug3: gss kex names ok: [gss-curve25519-sha256-,gss-nistp256-sha256-,gss-group14-sha256-,gss-group16-sha512-,gss-gex-sha1-,gss-group14-sha1-]\r\ndebug3: kex names ok: 
[curve25519-sha256,curve25519-sha256,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1]\r\ndebug1: configuration requests final Match pass\r\ndebug1: re-parsing configuration\r\ndebug1: Reading configuration data /home/admin/.ssh/config\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug3: /etc/ssh/ssh_config line 52: Including file /etc/ssh/ssh_config.d/05-redhat.conf depth 0\r\ndebug1: Reading configuration data /etc/ssh/ssh_config.d/05-redhat.conf\r\ndebug2: checking match for 'final all' host dell-r640-039.dsal.lab.eng.rdu2.redhat.com originally dell-r640-039.dsal.lab.eng.rdu2.redhat.com\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 3: matched 'final'\r\ndebug2: match found\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 5: Including file /etc/crypto-policies/back-ends/openssh.config depth 1\r\ndebug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config\r\ndebug3: gss kex names ok: [gss-curve25519-sha256-,gss-nistp256-sha256-,gss-group14-sha256-,gss-group16-sha512-,gss-gex-sha1-,gss-group14-sha1-]\r\ndebug3: kex names ok: [curve25519-sha256,curve25519-sha256,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1]\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 122037\r\ndebug3: mux_client_request_session: session request sent\r\ndebug3: 
mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 0\r\n")
2023-08-01 05:15:53,644 p=122013 u=admin n=ansible | fatal: [dell-r640-078.dsal.lab.eng.rdu2.redhat.com -> dell-r640-039.dsal.lab.eng.rdu2.redhat.com]: FAILED! => changed=false
  attempts: 30
  cmd:
  - podman
  - exec
  - ceph-mon-dell-r640-039
  - ceph
  - --cluster
  - ceph
  - mgr
  - dump
  - -f
  - json
  delta: '0:00:00.396024'
  end: '2023-08-01 05:15:53.618200'
  invocation:
    module_args:
      _raw_params: podman exec ceph-mon-dell-r640-039 ceph --cluster ceph mgr dump -f json
      _uses_shell: false
      argv: null
      chdir: null
      creates: null
      executable: null
      removes: null
      stdin: null
      stdin_add_newline: true
      strip_empty_ends: true
      warn: true
  rc: 0
  start: '2023-08-01 05:15:53.222176'
  stderr: ''
  stderr_lines: <omitted>
  stdout: |2-

    {"epoch":1,"active_gid":0,"active_name":"","active_addrs":{"addrvec":[]},"active_addr":":/0","active_change":"0.000000","available":false,"standbys":[],"modules":["iostat","restful"],"available_modules":[],"services":{},"always_on_modules":{"nautilus":["balancer","crash","devicehealth","orchestrator_cli","progress","rbd_support","status","volumes"]}}
  stdout_lines: <omitted>
2023-08-01 05:15:53,644 p=122013 u=admin n=ansible | NO MORE HOSTS LEFT ************************************************************
2023-08-01 05:15:53,645 p=122013 u=admin n=ansible | PLAY RECAP ********************************************************************
2023-08-01 05:15:53,645 p=122013 u=admin n=ansible | dell-r640-039.dsal.lab.eng.rdu2.redhat.com : ok=218 changed=12 unreachable=0 failed=0 skipped=329 rescued=0 ignored=0
2023-08-01 05:15:53,645 p=122013 u=admin n=ansible | dell-r640-069.dsal.lab.eng.rdu2.redhat.com : ok=189 changed=8 unreachable=0 failed=0 skipped=305 rescued=0 ignored=0
2023-08-01 05:15:53,645 p=122013 u=admin n=ansible | dell-r640-073.dsal.lab.eng.rdu2.redhat.com : ok=128 changed=5 unreachable=0 failed=0 skipped=240 rescued=0 ignored=0
2023-08-01 05:15:53,645 p=122013 u=admin n=ansible | dell-r640-078.dsal.lab.eng.rdu2.redhat.com : ok=138 changed=15 unreachable=0 failed=1 skipped=246 rescued=0 ignored=0
2023-08-01 05:15:53,645 p=122013 u=admin n=ansible | dell-r640-083.dsal.lab.eng.rdu2.redhat.com : ok=70 changed=2 unreachable=0 failed=0 skipped=173 rescued=0 ignored=0
2023-08-01 05:15:53,645 p=122013 u=admin n=ansible | INSTALLER STATUS **************************************************************
2023-08-01 05:15:53,648 p=122013 u=admin n=ansible | Install Ceph Monitor : Complete (0:00:26)
2023-08-01 05:15:53,648 p=122013 u=admin n=ansible | Install Ceph Manager : In Progress (0:03:18)
2023-08-01 05:15:53,648 p=122013 u=admin n=ansible | This phase can be restarted by running: roles/ceph-mgr/tasks/main.yml

Errors captured from journalctl logs for mgr daemon:

Aug 01 04:09:21 dell-r640-069.dsal.lab.eng.rdu2.redhat.com systemd[1]: Starting Ceph Manager...
Aug 01 04:09:21 dell-r640-069.dsal.lab.eng.rdu2.redhat.com podman[43199]: Error: no container with ID or name "ceph-mgr-dell-r640-069" found: no such container
Aug 01 04:09:21 dell-r640-069.dsal.lab.eng.rdu2.redhat.com podman[43218]: Error: no container with ID or name "ceph-mgr-dell-r640-069" found: no such container
Aug 01 04:09:21 dell-r640-069.dsal.lab.eng.rdu2.redhat.com podman[43236]:
Aug 01 04:09:21 dell-r640-069.dsal.lab.eng.rdu2.redhat.com podman[43236]: be0b946aea64c017eb9841f3243e896ed72f2463bdb3125a94bb8cbf23e8bbe9
Aug 01 04:09:21 dell-r640-069.dsal.lab.eng.rdu2.redhat.com systemd[1]: Started Ceph Manager.
Aug 01 04:09:21 dell-r640-069.dsal.lab.eng.rdu2.redhat.com ceph-mgr-dell-r640-069[43260]: find: '/var/lib/ceph/mgr/ceph-dell-r640-069/keyring': Permission denied
Aug 01 04:09:21 dell-r640-069.dsal.lab.eng.rdu2.redhat.com ceph-mgr-dell-r640-069[43260]: chown: cannot access '/var/lib/ceph/mgr/ceph-dell-r640-069/keyring': Permission denied
Aug 01 04:09:21 dell-r640-069.dsal.lab.eng.rdu2.redhat.com systemd[1]: ceph-mgr: Main process exited, code=exited, status=1/FAILURE
Aug 01 04:09:21 dell-r640-069.dsal.lab.eng.rdu2.redhat.com systemd[1]: ceph-mgr: Failed with result 'exit-code'.
Aug 01 04:09:31 dell-r640-069.dsal.lab.eng.rdu2.redhat.com systemd[1]: ceph-mgr: Service RestartSec=10s expired, scheduling restart.
Aug 01 04:09:31 dell-r640-069.dsal.lab.eng.rdu2.redhat.com systemd[1]: ceph-mgr: Scheduled restart job, restart counter is at 1.
Aug 01 04:09:31 dell-r640-069.dsal.lab.eng.rdu2.redhat.com systemd[1]: Stopped Ceph Manager.
Aug 01 04:09:31 dell-r640-069.dsal.lab.eng.rdu2.redhat.com systemd[1]: Starting Ceph Manager...
Aug 01 04:09:31 dell-r640-069.dsal.lab.eng.rdu2.redhat.com podman[43354]: Error: no container with ID or name "ceph-mgr-dell-r640-069" found: no such container
Aug 01 04:09:31 dell-r640-069.dsal.lab.eng.rdu2.redhat.com podman[43375]: Error: no container with ID or name "ceph-mgr-dell-r640-069" found: no such container
Aug 01 04:09:32 dell-r640-069.dsal.lab.eng.rdu2.redhat.com podman[43394]:

$ cat /usr/share/ceph-ansible/group_vars/all.yml | grep -v "^#" | grep -v "^$"
---
dummy:
configure_firewall: False
ceph_repository_type: cdn
ceph_origin: repository
ceph_repository: rhcs
ceph_rhcs_version: 4
ceph_iscsi_config_dev: false
monitor_interface: eno1
public_network: 10.1.240.0/23
radosgw_interface: eno1
ceph_docker_image: "rhceph/rhceph-4-rhel8"
ceph_docker_image_tag: "latest"
ceph_docker_registry: "registry.redhat.io"
ceph_docker_registry_auth: true
ceph_docker_registry_username: "qa"
ceph_docker_registry_password: "MTQj5t3n5K86p3gH"
containerized_deployment: True
dashboard_admin_user: admin
dashboard_admin_password: passwd
node_exporter_container_image: registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.6
grafana_admin_user: admin
grafana_admin_password: passwd
grafana_container_image: registry.redhat.io/rhceph/rhceph-4-dashboard-rhel8:4
prometheus_container_image: registry.redhat.io/openshift4/ose-prometheus:v4.6
alertmanager_container_image: registry.redhat.io/openshift4/ose-prometheus-alertmanager:v4.6

Version-Release number of selected component (if applicable):
4.3z1

How reproducible:
3/3

Steps to Reproduce:
1. Deploy a containerized RHCS 4.3z1 cluster.
2. Observe failures during mgr deployment.

Actual results:
Deployment failed.

Expected results:
No failures.

Additional info:
The exact same all.yml with containerized_deployment: False passed 2/2 times. The issue appears to affect only containerized deployments.
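
For triage, the `ceph mgr dump -f json` output returned on every retry can be checked by hand. The sketch below is only an illustration: it assumes the "wait for all mgr to be up" task keys off the `available` field of that dump (not confirmed from the role source here), and the helper names `mgr_status` and `mgr_is_up` are hypothetical. It also assumes standby entries carry a `name` key. The JSON is the stdout copied from the failed task in this report.

```python
import json

# stdout of `podman exec ceph-mon-dell-r640-039 ceph --cluster ceph mgr dump -f json`,
# copied from the failed task in this report (note the leading newline).
MGR_DUMP_STDOUT = """
{"epoch":1,"active_gid":0,"active_name":"","active_addrs":{"addrvec":[]},"active_addr":":/0","active_change":"0.000000","available":false,"standbys":[],"modules":["iostat","restful"],"available_modules":[],"services":{},"always_on_modules":{"nautilus":["balancer","crash","devicehealth","orchestrator_cli","progress","rbd_support","status","volumes"]}}
"""


def mgr_status(stdout: str) -> dict:
    """Summarize a `ceph mgr dump -f json` document (hypothetical helper)."""
    dump = json.loads(stdout)  # json.loads tolerates surrounding whitespace
    return {
        "available": bool(dump.get("available", False)),
        "active_name": dump.get("active_name", ""),
        # Assumption: each standby entry is an object with a "name" key.
        "standbys": [s.get("name", "") for s in dump.get("standbys", [])],
    }


def mgr_is_up(stdout: str) -> bool:
    """Assumed readiness condition: the dump reports an available active mgr."""
    return mgr_status(stdout)["available"]


if __name__ == "__main__":
    status = mgr_status(MGR_DUMP_STDOUT)
    print("mgr available:", status["available"])
    print("active mgr   :", repr(status["active_name"]))
    print("standbys     :", status["standbys"])
```

With the dump from this report, no mgr is active and no standbys are registered, which is consistent with the mgr container exiting on the keyring "Permission denied" errors shown in the journal.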
*** This bug has been marked as a duplicate of bug 2169767 ***