While using test builds of ceph-ansible 4 and the rhel8-based rhcs4 container image, a command is run with python [1]. That command fails because "python" is not a working command within the container, though python3 is. Symlinking python to python3 in /usr/bin works around the issue [2], but the container should be updated to handle this scenario (e.g. maybe here [3]?). This problem presents itself in ceph-ansible as shown in [4].

[1]
[root@overcloud-computehci-0 ~]# podman exec -ti ceph-osd-0 bash
bash-4.4#
bash-4.4# cat /opt/ceph-container/bin/osd_volume_activate.sh
#!/bin/bash
set -e

function osd_volume_activate {
  : "${OSD_ID:?Give me an OSD ID to activate, eg: -e OSD_ID=0}"

  CEPH_VOLUME_LIST_JSON="$(ceph-volume lvm list --format json)"

  if ! echo "$CEPH_VOLUME_LIST_JSON" | python -c "import sys, json; print(json.load(sys.stdin)[\"$OSD_ID\"])" &> /dev/null; then
    log "OSD id $OSD_ID does not exist"
    exit 1
  fi

[2] Reproduce locally and then work around it:

bash-4.4# ceph-volume lvm list --format json | python -c "import sys, json; print(json.load(sys.stdin)[\"0\"])"
bash: python: command not found
bash-4.4#
bash-4.4# ln -s /usr/bin/python3 /usr/bin/python
bash-4.4# ceph-volume lvm list --format json | python -c "import sys, json; print(json.load(sys.stdin)[\"0\"])"
[{'devices': ... <snip>

[3] https://github.com/ceph/ceph-container/tree/master/ceph-releases/ALL/rhel8/daemon

[4]
2019-03-15 22:29:44,241 p=328317 u=root | fatal: [overcloud-computehci-1 -> 192.168.24.12]: FAILED! => changed=false
  attempts: 30
  cmd: test "$(podman exec ceph-mon-overcloud-controller-0 ceph --cluster ceph -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_osds"])')" -gt 0 && test "$(podman exec ceph-mon-overcloud-controller-0 ceph --cluster ceph -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_osds"])')" = "$(podman exec ceph-mon-overcloud-controller-0 ceph --cluster ceph -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_up_osds"])')"
  delta: '0:00:01.101414'
  end: '2019-03-15 22:29:44.218315'
  invocation:
    module_args:
      _raw_params: test "$(podman exec ceph-mon-overcloud-controller-0 ceph --cluster ceph -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_osds"])')" -gt 0 && test "$(podman exec ceph-mon-overcloud-controller-0 ceph --cluster ceph -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_osds"])')" = "$(podman exec ceph-mon-overcloud-controller-0 ceph --cluster ceph -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_up_osds"])')"
      _uses_shell: true
      argv: null
      chdir: null
      creates: null
      executable: null
      removes: null
      stdin: null
      warn: true
  msg: non-zero return code
  rc: 1
  start: '2019-03-15 22:29:43.116901'
  stderr: ''
  stderr_lines: []
  stdout: ''
  stdout_lines: <omitted>
2019-03-15 22:29:44,241 p=328317 u=root | NO MORE HOSTS LEFT *************************************************************
2019-03-15 22:29:44,242 p=328317 u=root | PLAY RECAP *********************************************************************
2019-03-15 22:29:44,242 p=328317 u=root | overcloud-computehci-0 : ok=90   changed=10   unreachable=0   failed=0
2019-03-15 22:29:44,242 p=328317 u=root | overcloud-computehci-1 : ok=89   changed=10   unreachable=0   failed=1
2019-03-15 22:29:44,242 p=328317 u=root | overcloud-controller-0 : ok=182  changed=11   unreachable=0   failed=0
2019-03-15 22:29:44,242 p=328317 u=root | overcloud-controller-1 : ok=170  changed=9    unreachable=0   failed=0
2019-03-15 22:29:44,242 p=328317 u=root | overcloud-controller-2 : ok=172  changed=11   unreachable=0   failed=0
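For reference, a minimal sketch of one way the script in [1] could avoid depending on a bare "python" command, by preferring python3 and falling back to python. This is only an illustration, not the actual ceph-container change; the PYTHON variable and the echo-based error handling are introduced here just for the example.

# Sketch only: select whichever interpreter the image actually ships,
# instead of assuming a "python" command exists in the rhel8 image.
PYTHON="$(command -v python3 || command -v python)" || {
  echo "no python interpreter found in the container" >&2
  exit 1
}

# Same check as osd_volume_activate.sh, but run through the detected interpreter.
CEPH_VOLUME_LIST_JSON="$(ceph-volume lvm list --format json)"
if ! echo "$CEPH_VOLUME_LIST_JSON" | "$PYTHON" -c "import sys, json; print(json.load(sys.stdin)[\"$OSD_ID\"])" &> /dev/null; then
  echo "OSD id $OSD_ID does not exist" >&2
  exit 1
fi

Alternatively, the image build under [3] could create the /usr/bin/python -> python3 symlink at build time, matching the manual workaround shown in [2].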
You might also see this error when debugging locally like this:

[root@overcloud-computehci-0 ~]# ./ceph-osd-run.sh 0
2019-03-15 23:10:16  /opt/ceph-container/bin/entrypoint.sh: OSD id 0 does not exist
[root@overcloud-computehci-0 ~]#
> You might also see this error when debugging locally like this:
>
> [root@overcloud-computehci-0 ~]# ./ceph-osd-run.sh 0
> 2019-03-15 23:10:16  /opt/ceph-container/bin/entrypoint.sh: OSD id 0 does not exist

In a containerized deployment you need to use the device name, not the OSD id.
> In a containerized deployment you need to use the device name, not the OSD id.

Never mind, I forgot that RHCS 4 is based on Nautilus, so it's ceph-volume only (the statement above was only true for ceph-disk deployments with containers).
Reproduced:

TASK [ceph-osd : wait for all osd to be up] ************************************
task path: /usr/share/ceph-ansible/roles/ceph-osd/tasks/openstack_config.yml:2
Friday 10 May 2019  18:27:47 +0000 (0:00:00.314)       0:04:05.481 ************
FAILED - RETRYING: wait for all osd to be up (60 retries left).
FAILED - RETRYING: wait for all osd to be up (59 retries left).
<snip>
FAILED - RETRYING: wait for all osd to be up (2 retries left).
FAILED - RETRYING: wait for all osd to be up (1 retries left).
fatal: [ceph-2]: FAILED! => changed=false
  attempts: 60
  test "$(podman exec ceph-mon-controller-0 ceph --cluster ceph -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_osds"])')" -gt 0 && test "$(podman exec ceph-mon-controller-0 ceph --cluster ceph -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_osds"])')" = "$(podman exec ceph-mon-controller-0 ceph --cluster ceph -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_up_osds"])')"
  delta: '0:00:01.748742'
  end: '2019-05-10 18:39:48.793998'
  rc: 1
  start: '2019-05-10 18:39:47.045256'
NO MORE HOSTS LEFT *************************************************************
PLAY RECAP *********************************************************************
ceph-0        : ok=102  changed=10  unreachable=0  failed=0  skipped=184  rescued=0  ignored=0
ceph-1        : ok=100  changed=10  unreachable=0  failed=0  skipped=179  rescued=0  ignored=0
ceph-2        : ok=101  changed=10  unreachable=0  failed=1  skipped=178  rescued=0  ignored=0
compute-0     : ok=31   changed=0   unreachable=0  failed=0  skipped=86   rescued=0  ignored=0
controller-0  : ok=189  changed=22  unreachable=0  failed=0  skipped=307  rescued=0  ignored=0

parameter_defaults:
  ContainerImagePrepare:
  - push_destination: true
    set:
      ceph_image: rhceph-4.0-rhel8
      ceph_namespace: docker-registry.upshift.redhat.com/ceph
      ceph_tag: latest

tar logs - https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/df/view/deployment/job/DFG-df-deployment-15-virthost-1cont_1comp_3ceph-no_UC_SSL-no_OC_SSL-ceph-ipv6-vlan-RHELOSP-31817/3/artifact/
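For context, a ContainerImagePrepare snippet like the one above is normally supplied to the overcloud deployment as an extra heat environment file; a minimal sketch follows (the file path is illustrative, and the rest of the deployment arguments are unchanged).

# Sketch: pass the ContainerImagePrepare overrides above as an environment file.
openstack overcloud deploy --templates \
  -e /home/stack/containers-prepare-parameter.yaml
  # ...plus the usual environment files for the rest of the deployment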
(In reply to Artem Hrechanychenko from comment #8)
> Reproduced:

Artem,

We need to be careful here. It is very easy to reproduce this error for reasons other than the root cause of this bug, e.g. unclean disks or more OSDs than you have time to bring up. The new docker-registry.upshift.redhat.com/ceph/rhceph-4.0-rhel8:latest container does have a python command, whose absence was the root cause of this bug (unfixed versions only had a python3 command). Feel free to launch the container and verify directly that the python command is present.

If you reproduce the issue and keep the system running, please ping me and I will help you debug it on that live system. I don't doubt that you are seeing the issue you reported in #8; I just don't think THIS bug is the root cause, since I see the container has the necessary binary. Let's figure out why you're running into the issue you reported and go from there. Please ping me after you reproduce and keep the system running.

John
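For example, a quick sketch of such a check (the image reference is the one named above; the exact podman invocation may differ in your environment):

# Sketch: confirm the python command exists in the rhcs4 image without
# starting any Ceph daemon.
podman run --rm --entrypoint /bin/bash \
  docker-registry.upshift.redhat.com/ceph/rhceph-4.0-rhel8:latest \
  -c 'command -v python && python --version'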
Agreed with John; this doesn't seem to be the same issue. The original issue was that the python command was not present in the rhceph 4 container (python3 only). In your situation the python command is executed on the host:

> podman exec ceph-mon-controller-0 ceph --cluster ceph -s -f json | python -c 'import ....'

Only 'ceph --cluster ceph -s -f json' is executed in the container; the rest of the pipeline runs on the host.
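To make that split visible, here is a sketch of the same check with the two halves separated. The container name and the nested osdmap JSON path are copied from the task output above; python3 is used here only so the check does not itself depend on a host "python" command (the actual ceph-ansible task calls python).

# Sketch: only the first command runs inside the mon container; the JSON
# parsing and the comparison run entirely on the host.
STATUS_JSON="$(podman exec ceph-mon-controller-0 ceph --cluster ceph -s -f json)"   # in the container
NUM_OSDS="$(echo "$STATUS_JSON" | python3 -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_osds"])')"       # on the host
NUM_UP_OSDS="$(echo "$STATUS_JSON" | python3 -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_up_osds"])')" # on the host
test "$NUM_OSDS" -gt 0 && test "$NUM_OSDS" = "$NUM_UP_OSDS"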
Confirmed that Artem had a DIFFERENT issue, related to IPv6. More details in https://bugzilla.redhat.com/show_bug.cgi?id=1710319.

The fix for THIS bug (not related to IPv6, but to the missing python command in the Ceph container) is ready to be tested with:

docker-registry.upshift.redhat.com/ceph/rhceph-4.0-rhel8:latest
Verified
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0313