Bug 1690093
Summary: | python command not in rhel8-based rhcs4 container image (only python3) | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | John Fulton <johfulto>
Component: | Container | Assignee: | Dimitri Savineau <dsavinea>
Status: | CLOSED ERRATA | QA Contact: | Yogev Rabl <yrabl>
Severity: | high | Docs Contact: |
Priority: | unspecified | |
Version: | 4.0 | CC: | ceph-eng-bugs, gabrioux, gfidente, tserlin, vashastr, yrabl
Target Milestone: | rc | |
Target Release: | 4.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | rhceph-4.0-rhel8:latest | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2020-01-31 14:44:57 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1594251 | |
Description
John Fulton
2019-03-18 18:59:18 UTC
You might also see this error when debugging locally like this:

[root@overcloud-computehci-0 ~]# ./ceph-osd-run.sh 0
2019-03-15 23:10:16 /opt/ceph-container/bin/entrypoint.sh: OSD id 0 does not exist
[root@overcloud-computehci-0 ~]#

> You might also see this error when debugging locally like this:
>
> [root@overcloud-computehci-0 ~]# ./ceph-osd-run.sh 0
> 2019-03-15 23:10:16 /opt/ceph-container/bin/entrypoint.sh: OSD id 0 does not exist
In a containerized deployment you need to use the device name, not the OSD id.
> In a containerized deployment you need to use the device name, not the OSD id.
Never mind, I forgot that RHCS 4 is based on Nautilus, so it's ceph-volume only (the statement was only true for ceph-disk deployments with containers).
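A quick way to confirm whether a given image actually ships a python command (the missing binary tracked by this bug) is to query the image directly. This is a minimal sketch, assuming podman on the host and the image tag quoted later in this report, and is not part of the original reproduction steps; the --entrypoint override simply bypasses the image's startup script:

    # Sketch: check which interpreters the image provides.
    podman run --rm --entrypoint sh \
        docker-registry.upshift.redhat.com/ceph/rhceph-4.0-rhel8:latest \
        -c 'command -v python || echo "python: not found"; command -v python3 || echo "python3: not found"'
    # An unfixed rhel8-based image reports python as missing (only python3 is
    # present); a fixed image resolves both.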
Reproduced:

TASK [ceph-osd : wait for all osd to be up] ************************************
task path: /usr/share/ceph-ansible/roles/ceph-osd/tasks/openstack_config.yml:2
Friday 10 May 2019 18:27:47 +0000 (0:00:00.314) 0:04:05.481 ************
FAILED - RETRYING: wait for all osd to be up (60 retries left).
[identical retry messages repeat from 59 down to 2 retries left]
FAILED - RETRYING: wait for all osd to be up (1 retries left).
fatal: [ceph-2]: FAILED! => changed=false
  attempts: 60
  test "$(podman exec ceph-mon-controller-0 ceph --cluster ceph -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_osds"])')" -gt 0 && test "$(podman exec ceph-mon-controller-0 ceph --cluster ceph -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_osds"])')" = "$(podman exec ceph-mon-controller-0 ceph --cluster ceph -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_up_osds"])')"
  delta: '0:00:01.748742'
  end: '2019-05-10 18:39:48.793998'
  rc: 1
  start: '2019-05-10 18:39:47.045256'

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
ceph-0       : ok=102 changed=10 unreachable=0 failed=0 skipped=184 rescued=0 ignored=0
ceph-1       : ok=100 changed=10 unreachable=0 failed=0 skipped=179 rescued=0 ignored=0
ceph-2       : ok=101 changed=10 unreachable=0 failed=1 skipped=178 rescued=0 ignored=0
compute-0    : ok=31  changed=0  unreachable=0 failed=0 skipped=86  rescued=0 ignored=0
controller-0 : ok=189 changed=22 unreachable=0 failed=0 skipped=307 rescued=0 ignored=0

parameter_defaults:
  ContainerImagePrepare:
  - push_destination: true
    set:
      ceph_image: rhceph-4.0-rhel8
      ceph_namespace: docker-registry.upshift.redhat.com/ceph
      ceph_tag: latest

tar logs - https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/df/view/deployment/job/DFG-df-deployment-15-virthost-1cont_1comp_3ceph-no_UC_SSL-no_OC_SSL-ceph-ipv6-vlan-RHELOSP-31817/3/artifact/

(In reply to Artem Hrechanychenko from comment #8)
> Reproduced:

Artem,

We need to be careful here. It is very easy to reproduce this error for reasons other than the root cause of this bug, e.g. unclean disks or more OSDs than there is time to bring up. The new docker-registry.upshift.redhat.com/ceph/rhceph-4.0-rhel8:latest container does have a python command (the missing python command is the root cause of this bug; unfixed versions only had a python3 command). Feel free to launch the container and verify that you have a python command directly.

If you reproduce the issue and keep the system running, then please ping me and I will help you debug it on that live system. I don't doubt that you are seeing the issue you reported in #8; I just don't think THIS bug is the root cause, since I see the container has the necessary binary. Let's figure out why you're running into the issue you reported and go from there. Please ping me after you reproduce and keep the system running.

John

Agreed with John because this doesn't seem to be the same issue.
The original issue was related to the python command not being present in the rhceph 4 container (python3 only).
In your situation the python command is executed on the host:
> podman exec ceph-mon-controller-0 ceph --cluster ceph -s -f json | python -c 'import ....'
Only 'ceph --cluster ceph -s -f json' is executed in the container; the rest of the pipeline runs on the host.
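To make the split concrete, here is an illustrative sketch; the container name and the JSON one-liner are taken from the log above, and this is not the ceph-ansible task itself:

    # The host shell parses the pipe, so only the command given to podman exec
    # runs inside the container; the python one-liner runs on the host.
    podman exec ceph-mon-controller-0 ceph --cluster ceph -s -f json \
        | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_osds"])'

    # If the whole pipeline had to run inside the container, it would have to be
    # wrapped in a shell there, and on an unfixed image it would need python3:
    podman exec ceph-mon-controller-0 sh -c \
        'ceph --cluster ceph -s -f json | python3 -c "import sys, json; print(json.load(sys.stdin)[\"osdmap\"][\"osdmap\"][\"num_osds\"])"'

In other words, the check in comment #8 depends on the python available on the host, not on the python inside the container.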
Confirmed that Artem had a DIFFERENT issue, related to IPv6. More details in https://bugzilla.redhat.com/show_bug.cgi?id=1710319. The fix for THIS bug (not related to IPv6, but related to python in the Ceph container) is ready to be tested with: docker-registry.upshift.redhat.com/ceph/rhceph-4.0-rhel8:latest

Verified

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0313