Bug 2228413 - Containerized based 4.3 deployments are failing in step : TASK [ceph-mgr : wait for all mgr to be up]
Keywords:
Status: CLOSED DUPLICATE of bug 2169767
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 4.3
Hardware: Unspecified
OS: Unspecified
unspecified
urgent
Target Milestone: ---
: 7.1
Assignee: Teoman ONAY
QA Contact: Manisha Saini
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-08-02 10:54 UTC by Pawan
Modified: 2023-08-04 04:08 UTC
6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-04 04:08:32 UTC
Embargoed:




Links
System                 | ID          | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker  | RHCEPH-7144 | 0       | None     | None   | None    | 2023-08-02 13:35:17 UTC

Description Pawan 2023-08-02 10:54:29 UTC
Description of problem:
Containerized 4.3 deployments fail at TASK [ceph-mgr : wait for all mgr to be up].

2023-08-01 05:15:48,049 p=135284 u=admin n=ansible | <dell-r640-039.dsal.lab.eng.rdu2.redhat.com> (0, b'\n{"cmd": ["podman", "exec", "ceph-mon-dell-r640-039", "ceph", "--cluster", "ceph", "mgr", "dump", "-f", "json"], "stdout": "\\n{\\"epoch\\":1,\\"active_gid\\":0,\\"active_name\\":\\"\\",\\"active_addrs\\":{\\"addrvec\\":[]},\\"active_addr\\":\\":/0\\",\\"active_change\\":\\"0.000000\\",\\"available\\":false,\\"standbys\\":[],\\"modules\\":[\\"iostat\\",\\"restful\\"],\\"available_modules\\":[],\\"services\\":{},\\"always_on_modules\\":{\\"nautilus\\":[\\"balancer\\",\\"crash\\",\\"devicehealth\\",\\"orchestrator_cli\\",\\"progress\\",\\"rbd_support\\",\\"status\\",\\"volumes\\"]}}", "stderr": "", "rc": 0, "start": "2023-08-01 05:15:47.658988", "end": "2023-08-01 05:15:48.033574", "delta": "0:00:00.374586", "changed": true, "invocation": {"module_args": {"_raw_params": "podman exec ceph-mon-dell-r640-039 ceph --cluster ceph mgr dump -f json", "warn": true, "_uses_shell": false, "stdin_add_newline": true, "strip_empty_ends": true, "argv": null, "chdir": null, "executable": null, "creates": null, "removes": null, "stdin": null}}}\n', b"OpenSSH_8.0p1, OpenSSL 1.1.1k  FIPS 25 Mar 2021\r\ndebug1: Reading configuration data /home/admin/.ssh/config\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug3: /etc/ssh/ssh_config line 52: Including file /etc/ssh/ssh_config.d/05-redhat.conf depth 0\r\ndebug1: Reading configuration data /etc/ssh/ssh_config.d/05-redhat.conf\r\ndebug2: checking match for 'final all' host dell-r640-039.dsal.lab.eng.rdu2.redhat.com originally dell-r640-039.dsal.lab.eng.rdu2.redhat.com\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 3: not matched 'final'\r\ndebug2: match not found\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 5: Including file /etc/crypto-policies/back-ends/openssh.config depth 1 (parse only)\r\ndebug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config\r\ndebug3: gss kex names ok: 
[gss-curve25519-sha256-,gss-nistp256-sha256-,gss-group14-sha256-,gss-group16-sha512-,gss-gex-sha1-,gss-group14-sha1-]\r\ndebug3: kex names ok: [curve25519-sha256,curve25519-sha256,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1]\r\ndebug1: configuration requests final Match pass\r\ndebug1: re-parsing configuration\r\ndebug1: Reading configuration data /home/admin/.ssh/config\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug3: /etc/ssh/ssh_config line 52: Including file /etc/ssh/ssh_config.d/05-redhat.conf depth 0\r\ndebug1: Reading configuration data /etc/ssh/ssh_config.d/05-redhat.conf\r\ndebug2: checking match for 'final all' host dell-r640-039.dsal.lab.eng.rdu2.redhat.com originally dell-r640-039.dsal.lab.eng.rdu2.redhat.com\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 3: matched 'final'\r\ndebug2: match found\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 5: Including file /etc/crypto-policies/back-ends/openssh.config depth 1\r\ndebug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config\r\ndebug3: gss kex names ok: [gss-curve25519-sha256-,gss-nistp256-sha256-,gss-group14-sha256-,gss-group16-sha512-,gss-gex-sha1-,gss-group14-sha1-]\r\ndebug3: kex names ok: [curve25519-sha256,curve25519-sha256,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1]\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: 
mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 122037\r\ndebug3: mux_client_request_session: session request sent\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 0\r\n")
2023-08-01 05:15:53,061 p=135284 u=admin n=ansible | Using module file /usr/lib/python3.6/site-packages/ansible/modules/commands/command.py
2023-08-01 05:15:53,061 p=135284 u=admin n=ansible | Pipelining is enabled.
2023-08-01 05:15:53,062 p=135284 u=admin n=ansible | <dell-r640-039.dsal.lab.eng.rdu2.redhat.com> ESTABLISH SSH CONNECTION FOR USER: None
2023-08-01 05:15:53,062 p=135284 u=admin n=ansible | <dell-r640-039.dsal.lab.eng.rdu2.redhat.com> SSH: EXEC ssh -vvv -o ControlMaster=auto -o ControlPersist=600s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=60 -o ControlPath=/home/admin/.ansible/cp/%h-%r-%p dell-r640-039.dsal.lab.eng.rdu2.redhat.com '/bin/sh -c '"'"'sudo -H -S -n  -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-dozrddkvoqrvdxlqckfdfjegvqezdjlu ; /usr/libexec/platform-python'"'"'"'"'"'"'"'"' && sleep 0'"'"''
2023-08-01 05:15:53,103 p=135284 u=admin n=ansible | Escalation succeeded
2023-08-01 05:15:53,634 p=135284 u=admin n=ansible | <dell-r640-039.dsal.lab.eng.rdu2.redhat.com> (0, b'\n{"cmd": ["podman", "exec", "ceph-mon-dell-r640-039", "ceph", "--cluster", "ceph", "mgr", "dump", "-f", "json"], "stdout": "\\n{\\"epoch\\":1,\\"active_gid\\":0,\\"active_name\\":\\"\\",\\"active_addrs\\":{\\"addrvec\\":[]},\\"active_addr\\":\\":/0\\",\\"active_change\\":\\"0.000000\\",\\"available\\":false,\\"standbys\\":[],\\"modules\\":[\\"iostat\\",\\"restful\\"],\\"available_modules\\":[],\\"services\\":{},\\"always_on_modules\\":{\\"nautilus\\":[\\"balancer\\",\\"crash\\",\\"devicehealth\\",\\"orchestrator_cli\\",\\"progress\\",\\"rbd_support\\",\\"status\\",\\"volumes\\"]}}", "stderr": "", "rc": 0, "start": "2023-08-01 05:15:53.222176", "end": "2023-08-01 05:15:53.618200", "delta": "0:00:00.396024", "changed": true, "invocation": {"module_args": {"_raw_params": "podman exec ceph-mon-dell-r640-039 ceph --cluster ceph mgr dump -f json", "warn": true, "_uses_shell": false, "stdin_add_newline": true, "strip_empty_ends": true, "argv": null, "chdir": null, "executable": null, "creates": null, "removes": null, "stdin": null}}}\n', b"OpenSSH_8.0p1, OpenSSL 1.1.1k  FIPS 25 Mar 2021\r\ndebug1: Reading configuration data /home/admin/.ssh/config\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug3: /etc/ssh/ssh_config line 52: Including file /etc/ssh/ssh_config.d/05-redhat.conf depth 0\r\ndebug1: Reading configuration data /etc/ssh/ssh_config.d/05-redhat.conf\r\ndebug2: checking match for 'final all' host dell-r640-039.dsal.lab.eng.rdu2.redhat.com originally dell-r640-039.dsal.lab.eng.rdu2.redhat.com\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 3: not matched 'final'\r\ndebug2: match not found\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 5: Including file /etc/crypto-policies/back-ends/openssh.config depth 1 (parse only)\r\ndebug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config\r\ndebug3: gss kex names ok: 
[gss-curve25519-sha256-,gss-nistp256-sha256-,gss-group14-sha256-,gss-group16-sha512-,gss-gex-sha1-,gss-group14-sha1-]\r\ndebug3: kex names ok: [curve25519-sha256,curve25519-sha256,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1]\r\ndebug1: configuration requests final Match pass\r\ndebug1: re-parsing configuration\r\ndebug1: Reading configuration data /home/admin/.ssh/config\r\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug3: /etc/ssh/ssh_config line 52: Including file /etc/ssh/ssh_config.d/05-redhat.conf depth 0\r\ndebug1: Reading configuration data /etc/ssh/ssh_config.d/05-redhat.conf\r\ndebug2: checking match for 'final all' host dell-r640-039.dsal.lab.eng.rdu2.redhat.com originally dell-r640-039.dsal.lab.eng.rdu2.redhat.com\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 3: matched 'final'\r\ndebug2: match found\r\ndebug3: /etc/ssh/ssh_config.d/05-redhat.conf line 5: Including file /etc/crypto-policies/back-ends/openssh.config depth 1\r\ndebug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config\r\ndebug3: gss kex names ok: [gss-curve25519-sha256-,gss-nistp256-sha256-,gss-group14-sha256-,gss-group16-sha512-,gss-gex-sha1-,gss-group14-sha1-]\r\ndebug3: kex names ok: [curve25519-sha256,curve25519-sha256,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1]\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_forwards: request forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: 
mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 122037\r\ndebug3: mux_client_request_session: session request sent\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Received exit status from master 0\r\n")
2023-08-01 05:15:53,644 p=122013 u=admin n=ansible | fatal: [dell-r640-078.dsal.lab.eng.rdu2.redhat.com -> dell-r640-039.dsal.lab.eng.rdu2.redhat.com]: FAILED! => changed=false 
  attempts: 30
  cmd:
  - podman
  - exec
  - ceph-mon-dell-r640-039
  - ceph
  - --cluster
  - ceph
  - mgr
  - dump
  - -f
  - json
  delta: '0:00:00.396024'
  end: '2023-08-01 05:15:53.618200'
  invocation:
    module_args:
      _raw_params: podman exec ceph-mon-dell-r640-039 ceph --cluster ceph mgr dump -f json
      _uses_shell: false
      argv: null
      chdir: null
      creates: null
      executable: null
      removes: null
      stdin: null
      stdin_add_newline: true
      strip_empty_ends: true
      warn: true
  rc: 0
  start: '2023-08-01 05:15:53.222176'
  stderr: ''
  stderr_lines: <omitted>
  stdout: |2-
  
    {"epoch":1,"active_gid":0,"active_name":"","active_addrs":{"addrvec":[]},"active_addr":":/0","active_change":"0.000000","available":false,"standbys":[],"modules":["iostat","restful"],"available_modules":[],"services":{},"always_on_modules":{"nautilus":["balancer","crash","devicehealth","orchestrator_cli","progress","rbd_support","status","volumes"]}}
  stdout_lines: <omitted>
2023-08-01 05:15:53,644 p=122013 u=admin n=ansible | NO MORE HOSTS LEFT *********************************************************************************************************************************************************************************************************
2023-08-01 05:15:53,645 p=122013 u=admin n=ansible | PLAY RECAP *****************************************************************************************************************************************************************************************************************
2023-08-01 05:15:53,645 p=122013 u=admin n=ansible | dell-r640-039.dsal.lab.eng.rdu2.redhat.com : ok=218  changed=12   unreachable=0    failed=0    skipped=329  rescued=0    ignored=0   
2023-08-01 05:15:53,645 p=122013 u=admin n=ansible | dell-r640-069.dsal.lab.eng.rdu2.redhat.com : ok=189  changed=8    unreachable=0    failed=0    skipped=305  rescued=0    ignored=0   
2023-08-01 05:15:53,645 p=122013 u=admin n=ansible | dell-r640-073.dsal.lab.eng.rdu2.redhat.com : ok=128  changed=5    unreachable=0    failed=0    skipped=240  rescued=0    ignored=0   
2023-08-01 05:15:53,645 p=122013 u=admin n=ansible | dell-r640-078.dsal.lab.eng.rdu2.redhat.com : ok=138  changed=15   unreachable=0    failed=1    skipped=246  rescued=0    ignored=0   
2023-08-01 05:15:53,645 p=122013 u=admin n=ansible | dell-r640-083.dsal.lab.eng.rdu2.redhat.com : ok=70   changed=2    unreachable=0    failed=0    skipped=173  rescued=0    ignored=0   
2023-08-01 05:15:53,645 p=122013 u=admin n=ansible | INSTALLER STATUS ***********************************************************************************************************************************************************************************************************
2023-08-01 05:15:53,648 p=122013 u=admin n=ansible | Install Ceph Monitor           : Complete (0:00:26)
2023-08-01 05:15:53,648 p=122013 u=admin n=ansible | Install Ceph Manager           : In Progress (0:03:18)
2023-08-01 05:15:53,648 p=122013 u=admin n=ansible | 	This phase can be restarted by running: roles/ceph-mgr/tasks/main.yml
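The failing task re-runs "ceph mgr dump -f json" inside the monitor container roughly every 5 seconds and gives up after 30 attempts. Every dump above reports "available":false, an empty "active_name" and no "standbys", i.e. no mgr ever registered with the monitor. A minimal sketch of that wait loop follows, assuming the success condition is the "available" flag of the mgr map (the exact until expression in ceph-ansible is not shown in this report). MGR_DUMP_CMD is copied from the task's cmd field; the 5 s delay is inferred from the log timestamps; mgr_is_available, summarize and wait_for_mgr are hypothetical helper names.

```python
import json
import subprocess
import time

# Command copied from the "cmd" field of the failing task above.
MGR_DUMP_CMD = [
    "podman", "exec", "ceph-mon-dell-r640-039",
    "ceph", "--cluster", "ceph", "mgr", "dump", "-f", "json",
]


def mgr_is_available(dump_json):
    """Return True when the mgr map says an active mgr is available.

    Assumption: this is the condition the playbook waits for.
    """
    mgr_map = json.loads(dump_json)  # leading newline in stdout is fine
    return bool(mgr_map.get("available"))


def summarize(dump_json):
    """Pick out the mgr map fields that explain why the wait never ends."""
    mgr_map = json.loads(dump_json)
    return {
        "epoch": mgr_map.get("epoch"),
        "active_name": mgr_map.get("active_name"),
        "available": mgr_map.get("available"),
        "standbys": len(mgr_map.get("standbys", [])),
    }


def wait_for_mgr(retries=30, delay=5.0):
    """Poll the mgr map like the playbook's retry loop; True on success."""
    for attempt in range(1, retries + 1):
        # stdout/stderr pipes keep this compatible with Python 3.6.
        result = subprocess.run(
            MGR_DUMP_CMD,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        if result.returncode == 0 and mgr_is_available(result.stdout):
            return True
        if attempt < retries:
            time.sleep(delay)
    return False
```

Fed the stdout from the fatal result above, summarize returns {'epoch': 1, 'active_name': '', 'available': False, 'standbys': 0}, which points at the mgr daemon itself rather than at the monitor or the SSH transport.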



Errors captured from the journalctl logs for the mgr daemon on dell-r640-069:

Aug 01 04:09:21 dell-r640-069.dsal.lab.eng.rdu2.redhat.com systemd[1]: Starting Ceph Manager...
Aug 01 04:09:21 dell-r640-069.dsal.lab.eng.rdu2.redhat.com podman[43199]: Error: no container with ID or name "ceph-mgr-dell-r640-069" found: no such container
Aug 01 04:09:21 dell-r640-069.dsal.lab.eng.rdu2.redhat.com podman[43218]: Error: no container with ID or name "ceph-mgr-dell-r640-069" found: no such container
Aug 01 04:09:21 dell-r640-069.dsal.lab.eng.rdu2.redhat.com podman[43236]:
Aug 01 04:09:21 dell-r640-069.dsal.lab.eng.rdu2.redhat.com podman[43236]: be0b946aea64c017eb9841f3243e896ed72f2463bdb3125a94bb8cbf23e8bbe9
Aug 01 04:09:21 dell-r640-069.dsal.lab.eng.rdu2.redhat.com systemd[1]: Started Ceph Manager.
Aug 01 04:09:21 dell-r640-069.dsal.lab.eng.rdu2.redhat.com ceph-mgr-dell-r640-069[43260]: find: '/var/lib/ceph/mgr/ceph-dell-r640-069/keyring': Permission denied
Aug 01 04:09:21 dell-r640-069.dsal.lab.eng.rdu2.redhat.com ceph-mgr-dell-r640-069[43260]: chown: cannot access '/var/lib/ceph/mgr/ceph-dell-r640-069/keyring': Permission denied
Aug 01 04:09:21 dell-r640-069.dsal.lab.eng.rdu2.redhat.com systemd[1]: ceph-mgr: Main process exited, code=exited, status=1/FAILURE
Aug 01 04:09:21 dell-r640-069.dsal.lab.eng.rdu2.redhat.com systemd[1]: ceph-mgr: Failed with result 'exit-code'.
Aug 01 04:09:31 dell-r640-069.dsal.lab.eng.rdu2.redhat.com systemd[1]: ceph-mgr: Service RestartSec=10s expired, scheduling restart.
Aug 01 04:09:31 dell-r640-069.dsal.lab.eng.rdu2.redhat.com systemd[1]: ceph-mgr: Scheduled restart job, restart counter is at 1.
Aug 01 04:09:31 dell-r640-069.dsal.lab.eng.rdu2.redhat.com systemd[1]: Stopped Ceph Manager.
Aug 01 04:09:31 dell-r640-069.dsal.lab.eng.rdu2.redhat.com systemd[1]: Starting Ceph Manager...
Aug 01 04:09:31 dell-r640-069.dsal.lab.eng.rdu2.redhat.com podman[43354]: Error: no container with ID or name "ceph-mgr-dell-r640-069" found: no such container
Aug 01 04:09:31 dell-r640-069.dsal.lab.eng.rdu2.redhat.com podman[43375]: Error: no container with ID or name "ceph-mgr-dell-r640-069" found: no such container
Aug 01 04:09:32 dell-r640-069.dsal.lab.eng.rdu2.redhat.com podman[43394]:
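In the journal, a process inside the ceph-mgr container fails to find and chown /var/lib/ceph/mgr/ceph-dell-r640-069/keyring with "Permission denied", exits with status 1, and systemd restarts it every 10 s, so the mgr never reaches the monitor. A minimal diagnostic sketch, assuming it is run with Python 3 on the mgr host itself: it prints the POSIX mode and numeric owner of each path from /var/lib/ceph down to the keyring. It does not inspect SELinux labels or container user mapping, either of which can also produce "Permission denied", so treat it as a first check rather than a root-cause analysis. path_chain and describe_path are hypothetical helper names.

```python
import os
import socket
import stat


def path_chain(host, cluster="ceph"):
    """Paths from /var/lib/ceph down to the mgr keyring (pattern from the journal)."""
    mgr_dir = f"/var/lib/ceph/mgr/{cluster}-{host}"
    return ["/var/lib/ceph", "/var/lib/ceph/mgr", mgr_dir, f"{mgr_dir}/keyring"]


def describe_path(path):
    """Report mode and numeric owner of one path, or why it could not be read."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return f"{path}: missing"
    except PermissionError as exc:
        return f"{path}: {exc.strerror}"
    return f"{path}: mode={stat.filemode(st.st_mode)} uid={st.st_uid} gid={st.st_gid}"


if __name__ == "__main__":
    # Short host name, e.g. "dell-r640-069" from the FQDN.
    host = socket.gethostname().split(".")[0]
    for p in path_chain(host):
        print(describe_path(p))
```

Comparing this output between a containerized and a non-containerized run of the same all.yml would show whether ownership or mode differs.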


$ cat /usr/share/ceph-ansible/group_vars/all.yml | grep -v "^#" | grep -v "^$"
---
dummy:
configure_firewall: False
ceph_repository_type: cdn
ceph_origin: repository
ceph_repository: rhcs
ceph_rhcs_version: 4
ceph_iscsi_config_dev: false
monitor_interface: eno1
public_network: 10.1.240.0/23
radosgw_interface: eno1
ceph_docker_image: "rhceph/rhceph-4-rhel8"
ceph_docker_image_tag: "latest"
ceph_docker_registry: "registry.redhat.io"
ceph_docker_registry_auth: true
ceph_docker_registry_username: "qa"
ceph_docker_registry_password: "MTQj5t3n5K86p3gH"
containerized_deployment: True
dashboard_admin_user: admin
dashboard_admin_password: passwd
node_exporter_container_image: registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.6
grafana_admin_user: admin
grafana_admin_password: passwd
grafana_container_image: registry.redhat.io/rhceph/rhceph-4-dashboard-rhel8:4
prometheus_container_image: registry.redhat.io/openshift4/ose-prometheus:v4.6
alertmanager_container_image: registry.redhat.io/openshift4/ose-prometheus-alertmanager:v4.6


Version-Release number of selected component (if applicable):
4.3z1 

How reproducible:
3/3

Steps to Reproduce:
1. Deploy a containerized RHCS 4.3z1 cluster (containerized_deployment: True in all.yml).
2. Observe the failure during the ceph-mgr deployment.

Actual results:
Deployment fails at TASK [ceph-mgr : wait for all mgr to be up] after 30 attempts.

Expected results:
Deployment completes without failures.

Additional info:
The same all.yml with containerized_deployment: False passed 2/2 times, so the issue appears to affect only containerized deployments.

Comment 10 Pawan 2023-08-04 04:08:32 UTC

*** This bug has been marked as a duplicate of bug 2169767 ***

