Bug 2228425 - switch-from-non-containerized-to-containerized-ceph-daemons yaml failing in task : [waiting for the monitor to join the quorum...] [NEEDINFO]
Keywords:
Status: NEW
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 4.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 7.1
Assignee: Teoman ONAY
QA Contact: Manisha Saini
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-08-02 11:33 UTC by Pawan
Modified: 2023-08-02 12:15 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:
bkunal: needinfo? (tonay)



Description Pawan 2023-08-02 11:33:55 UTC
Description of problem:

The switch-from-non-containerized-to-containerized-ceph-daemons.yml playbook fails in the task [waiting for the monitor to join the quorum...].

ansible-playbook infrastructure-playbooks/switch-from-non-containerized-to-containerized-ceph-daemons.yml -i hosts 

2023-08-02 06:10:17,548 p=2408 u=cephuser n=ansible | TASK [waiting for the monitor to join the quorum...] **********************************************************************************
2023-08-02 06:10:17,549 p=2408 u=cephuser n=ansible | Wednesday 02 August 2023  06:10:17 -0400 (0:00:00.024)       0:01:04.506 ******
2023-08-02 06:14:02,688 p=2408 u=cephuser n=ansible | fatal: [ceph-amk-recovery-qmk2lk-node1-installer]: FAILED! => changed=false
  attempts: 20
  cmd:
  - podman
  - run
  - --rm
  - -v
  - /etc/ceph:/etc/ceph:z
  - --entrypoint=ceph
  - quay.io/ceph/daemon:latest-nautilus
  - --cluster
  - ceph
  - quorum_status
  - --format
  - json
  delta: '0:00:00.949622'
  end: '2023-08-02 06:14:02.660484'
  rc: 0
  start: '2023-08-02 06:14:01.710862'
  stderr: ''
  stderr_lines: <omitted>
  stdout: |2-

    {"election_epoch":28,"quorum":[0,2],"quorum_names":["ceph-amk-recovery-qmk2lk-node3","ceph-amk-recovery-qmk2lk-node2"],"quorum_leader_name":"ceph-amk-recovery-qmk2lk-node3","quorum_age":252,"monmap":{"epoch":1,"fsid":"139a26d2-7079-428c-81c5-4ff990d2a2b7","modified":"2023-08-02 00:33:39.815248","created":"2023-08-02 00:33:39.815248","min_mon_release":14,"min_mon_release_name":"nautilus","election_strategy":1,"disallowed_leaders: ":"","stretch_mode":false,"features":{"persistent":["kraken","luminous","mimic","osdmap-prune","nautilus","elector-pinging"],"optional":[]},"mons":[{"rank":0,"name":"ceph-amk-recovery-qmk2lk-node3","public_addrs":{"addrvec":[{"type":"v2","addr":"10.0.208.117:3300","nonce":0},{"type":"v1","addr":"10.0.208.117:6789","nonce":0}]},"addr":"10.0.208.117:6789/0","public_addr":"10.0.208.117:6789/0","crush_location":"{}"},{"rank":1,"name":"ceph-amk-recovery-qmk2lk-node1-installer","public_addrs":{"addrvec":[{"type":"v2","addr":"10.0.211.66:3300","nonce":0},{"type":"v1","addr":"10.0.211.66:6789","nonce":0}]},"addr":"10.0.211.66:6789/0","public_addr":"10.0.211.66:6789/0","crush_location":"{}"},{"rank":2,"name":"ceph-amk-recovery-qmk2lk-node2","public_addrs":{"addrvec":[{"type":"v2","addr":"10.0.211.227:3300","nonce":0},{"type":"v1","addr":"10.0.211.227:6789","nonce":0}]},"addr":"10.0.211.227:6789/0","public_addr":"10.0.211.227:6789/0","crush_location":"{}"}]}}
  stdout_lines: <omitted>
2023-08-02 06:14:02,689 p=2408 u=cephuser n=ansible | PLAY RECAP ****************************************************************************************************************************
2023-08-02 06:14:02,689 p=2408 u=cephuser n=ansible | ceph-amk-recovery-qmk2lk-node1-installer : ok=104  changed=10   unreachable=0    failed=1    skipped=209  rescued=0    ignored=0
2023-08-02 06:14:02,689 p=2408 u=cephuser n=ansible | ceph-amk-recovery-qmk2lk-node2 : ok=25   changed=0    unreachable=0    failed=0    skipped=75   rescued=0    ignored=0
2023-08-02 06:14:02,689 p=2408 u=cephuser n=ansible | ceph-amk-recovery-qmk2lk-node3 : ok=25   changed=0    unreachable=0    failed=0    skipped=75   rescued=0    ignored=0
2023-08-02 06:14:02,689 p=2408 u=cephuser n=ansible | ceph-amk-recovery-qmk2lk-node4 : ok=25   changed=0    unreachable=0    failed=0    skipped=86   rescued=0    ignored=0
2023-08-02 06:14:02,689 p=2408 u=cephuser n=ansible | ceph-amk-recovery-qmk2lk-node5 : ok=25   changed=0    unreachable=0    failed=0    skipped=86   rescued=0    ignored=0
2023-08-02 06:14:02,690 p=2408 u=cephuser n=ansible | ceph-amk-recovery-qmk2lk-node6 : ok=26   changed=0    unreachable=0    failed=0    skipped=87   rescued=0    ignored=0
2023-08-02 06:14:02,690 p=2408 u=cephuser n=ansible | ceph-amk-recovery-qmk2lk-node7 : ok=23   changed=0    unreachable=0    failed=0    skipped=75   rescued=0    ignored=0
2023-08-02 06:14:02,690 p=2408 u=cephuser n=ansible | localhost                  : ok=0    changed=0    unreachable=0    failed=0    skipped=2    rescued=0    ignored=0
2023-08-02 06:14:02,690 p=2408 u=cephuser n=ansible | Wednesday 02 August 2023  06:14:02 -0400 (0:03:45.141)       0:04:49.647 ******
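
The stdout of the failed task already shows which monitor is missing: the monmap lists three monitors, but quorum_names contains only two of them. The following is a minimal Python sketch for reading that output, not the check the playbook itself performs. The function name mons_out_of_quorum is hypothetical, and the sample is a trimmed copy of the quorum_status JSON above that keeps only the fields the check needs.

```python
import json
from typing import List


def mons_out_of_quorum(quorum_status_json: str) -> List[str]:
    """Return monitors that are in the monmap but not in the current quorum."""
    status = json.loads(quorum_status_json)
    in_quorum = set(status["quorum_names"])
    return [
        mon["name"]
        for mon in status["monmap"]["mons"]
        if mon["name"] not in in_quorum
    ]


# Trimmed copy of the quorum_status output from the failed task (assumption:
# only the fields used by the check are kept; values are taken from the log).
sample = json.dumps({
    "quorum": [0, 2],
    "quorum_names": [
        "ceph-amk-recovery-qmk2lk-node3",
        "ceph-amk-recovery-qmk2lk-node2",
    ],
    "monmap": {"mons": [
        {"rank": 0, "name": "ceph-amk-recovery-qmk2lk-node3"},
        {"rank": 1, "name": "ceph-amk-recovery-qmk2lk-node1-installer"},
        {"rank": 2, "name": "ceph-amk-recovery-qmk2lk-node2"},
    ]},
})

print(mons_out_of_quorum(sample))
# prints ['ceph-amk-recovery-qmk2lk-node1-installer']
```

Run against the full JSON from the task output, the same function should report the rank 1 monitor on the installer node, which is the host the task failed on.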


journalctl logs for the mon daemon after running the playbook:

Aug 02 06:50:12 ceph-amk-recovery-qmk2lk-node1-installer ceph-mon-ceph-amk-recovery-qmk2lk-node1-installer[6167]: debug 2023-08-02 06:50:12.812 7faf7e282700  0 can't decode unknown message type 140 MSG_AUTH=17
Aug 02 06:50:12 ceph-amk-recovery-qmk2lk-node1-installer ceph-mon-ceph-amk-recovery-qmk2lk-node1-installer[6167]: debug 2023-08-02 06:50:12.815 7faf8328c700  0 can't decode unknown message type 140 MSG_AUTH=17
Aug 02 06:50:12 ceph-amk-recovery-qmk2lk-node1-installer ceph-mon-ceph-amk-recovery-qmk2lk-node1-installer[6167]: debug 2023-08-02 06:50:12.818 7faf8328c700  0 can't decode unknown message type 140 MSG_AUTH=17
Aug 02 06:50:13 ceph-amk-recovery-qmk2lk-node1-installer ceph-mon-ceph-amk-recovery-qmk2lk-node1-installer[6167]: debug 2023-08-02 06:50:13.019 7faf83a8d700  0 can't decode unknown message type 140 MSG_AUTH=17
Aug 02 06:50:13 ceph-amk-recovery-qmk2lk-node1-installer ceph-mon-ceph-amk-recovery-qmk2lk-node1-installer[6167]: debug 2023-08-02 06:50:13.020 7faf8328c700  0 can't decode unknown message type 140 MSG_AUTH=17
Aug 02 06:50:13 ceph-amk-recovery-qmk2lk-node1-installer ceph-mon-ceph-amk-recovery-qmk2lk-node1-installer[6167]: debug 2023-08-02 06:50:13.021 7faf8328c700  0 can't decode unknown message type 140 MSG_AUTH=17
Aug 02 06:50:13 ceph-amk-recovery-qmk2lk-node1-installer ceph-mon-ceph-amk-recovery-qmk2lk-node1-installer[6167]: debug 2023-08-02 06:50:13.021 7faf8328c700  0 can't decode unknown message type 140 MSG_AUTH=17
Aug 02 06:50:13 ceph-amk-recovery-qmk2lk-node1-installer ceph-mon-ceph-amk-recovery-qmk2lk-node1-installer[6167]: debug 2023-08-02 06:50:13.022 7faf8328c700  0 can't decode unknown message type 140 MSG_AUTH=17
Aug 02 06:50:13 ceph-amk-recovery-qmk2lk-node1-installer ceph-mon-ceph-amk-recovery-qmk2lk-node1-installer[6167]: debug 2023-08-02 06:50:13.023 7faf8328c700  0 can't decode unknown message type 140 MSG_AUTH=17
Aug 02 06:50:13 ceph-amk-recovery-qmk2lk-node1-installer ceph-mon-ceph-amk-recovery-qmk2lk-node1-installer[6167]: debug 2023-08-02 06:50:13.024 7faf8328c700  0 can't decode unknown message type 140 MSG_AUTH=17
Aug 02 06:50:13 ceph-amk-recovery-qmk2lk-node1-installer ceph-mon-ceph-amk-recovery-qmk2lk-node1-installer[6167]: debug 2023-08-02 06:50:13.025 7faf8328c700  0 can't decode unknown message type 140 MSG_AUTH=17


Version-Release number of selected component (if applicable):
4.2z1 GA 
ceph version 14.2.22-128.el8cp (40a2bf9c4e79e39754d69a95cd51bd60991284be) nautilus (stable)

How reproducible:
3/3

Steps to Reproduce:
1. Deploy RPM based 4.3z1 RHCS cluster
2. Run playbook: switch-from-non-containerized-to-containerized-ceph-daemons.yml 
3. Observe the failure for the monitor daemon. After the failure, the mon daemon is running but is not part of the quorum.

Actual results:
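The playbook fails at the task [waiting for the monitor to join the quorum...] after 20 attempts; the monitor on ceph-amk-recovery-qmk2lk-node1-installer does not join the quorum.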

Expected results:
The playbook should complete without failures.

Additional info:

