Bug 1690093 - python command not in rhel8-based rhcs4 container image (only python3)
Summary: python command not in rhel8-based rhcs4 container image (only python3)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Container
Version: 4.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 4.0
Assignee: Dimitri Savineau
QA Contact: Yogev Rabl
URL:
Whiteboard:
Depends On:
Blocks: 1594251
 
Reported: 2019-03-18 18:59 UTC by John Fulton
Modified: 2020-01-31 14:45 UTC
CC List: 6 users

Fixed In Version: rhceph-4.0-rhel8:latest
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-31 14:44:57 UTC
Embargoed:


Attachments


Links
Github ceph/ceph-container pull 1328 (closed): daemon: Add support for python3 (last updated 2020-07-08 07:09:31 UTC)
Red Hat Product Errata RHBA-2020:0313 (last updated 2020-01-31 14:45:14 UTC)

Description John Fulton 2019-03-18 18:59:18 UTC
While using test builds of ceph-ansible 4 and the rhel8-based rhcs4 container image, a command is run with python [1]. That command fails because "python" is not available within the container, though python3 is. Creating a /usr/bin/python symlink pointing to python3 works around the issue [2], but the container should be updated to handle this scenario (e.g. maybe here [3]?). From ceph-ansible, the problem presents itself as shown in [4].

[1]
[root@overcloud-computehci-0 ~]# podman exec -ti ceph-osd-0 bash                                                                                                                                                                                                              
bash-4.4# 
bash-4.4# cat /opt/ceph-container/bin/osd_volume_activate.sh
#!/bin/bash
set -e

function osd_volume_activate {
  : "${OSD_ID:?Give me an OSD ID to activate, eg: -e OSD_ID=0}"

  CEPH_VOLUME_LIST_JSON="$(ceph-volume lvm list --format json)"

  if ! echo "$CEPH_VOLUME_LIST_JSON" | python -c "import sys, json; print(json.load(sys.stdin)[\"$OSD_ID\"])" &> /dev/null; then
    log "OSD id $OSD_ID does not exist"
    exit 1
  fi


[2] Reproduce locally and then workaround:
bash-4.4# ceph-volume lvm list --format json | python -c "import sys, json; print(json.load(sys.stdin)[\"0\"])"
bash: python: command not found
bash-4.4#

bash-4.4# ln -s /usr/bin/python3 /usr/bin/python
bash-4.4# ceph-volume lvm list --format json | python -c "import sys, json; print(json.load(sys.stdin)[\"0\"])"
[{'devices': ... <snip>
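
A script-side alternative to the symlink workaround (only a hypothetical sketch, not the change that was merged; see the ceph-container pull request linked from this bug) would be for osd_volume_activate.sh to resolve the interpreter at runtime:

# Sketch only: prefer python3 and fall back to python, so the script works
# whether or not a /usr/bin/python symlink exists.
PYTHON="$(command -v python3 || command -v python)" || { log "no python interpreter found"; exit 1; }
if ! echo "$CEPH_VOLUME_LIST_JSON" | "$PYTHON" -c "import sys, json; print(json.load(sys.stdin)[\"$OSD_ID\"])" &> /dev/null; then
  log "OSD id $OSD_ID does not exist"
  exit 1
fi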


[3] https://github.com/ceph/ceph-container/tree/master/ceph-releases/ALL/rhel8/daemon
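
On the image side, a build-time change there could provide an unversioned python command on RHEL 8. A minimal sketch (not the actual fix; it assumes the python3 alternatives entry registered by the python36 package is present in the image):

# Sketch only: make /usr/bin/python point at python3 during the image build.
alternatives --set python /usr/bin/python3
# or, equivalent to the manual workaround above:
ln -sf /usr/bin/python3 /usr/bin/python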

[4] 
2019-03-15 22:29:44,241 p=328317 u=root |  fatal: [overcloud-computehci-1 -> 192.168.24.12]: FAILED! => changed=false
  attempts: 30
  cmd: test "$(podman exec ceph-mon-overcloud-controller-0 ceph --cluster ceph -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_osds"])')" -gt 0 && test "$(podman exec ceph-mon-overcloud-controller-0 ceph --cluster ceph -s -f
json | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_osds"])')" = "$(podman exec ceph-mon-overcloud-controller-0 ceph --cluster ceph -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_up_osds"]
)')"
  delta: '0:00:01.101414'
  end: '2019-03-15 22:29:44.218315'
  invocation:
    module_args:
      _raw_params: test "$(podman exec ceph-mon-overcloud-controller-0 ceph --cluster ceph -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_osds"])')" -gt 0 && test "$(podman exec ceph-mon-overcloud-controller-0 ceph --cluster
 ceph -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_osds"])')" = "$(podman exec ceph-mon-overcloud-controller-0 ceph --cluster ceph -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_up_osds"])')"
      _uses_shell: true
      argv: null
      chdir: null
      creates: null
      executable: null
      removes: null
      stdin: null
      warn: true
  msg: non-zero return code
  rc: 1
  start: '2019-03-15 22:29:43.116901'
  stderr: ''
  stderr_lines: []
  stdout: ''
  stdout_lines: <omitted>
2019-03-15 22:29:44,241 p=328317 u=root |  NO MORE HOSTS LEFT *************************************************************
2019-03-15 22:29:44,242 p=328317 u=root |  PLAY RECAP *********************************************************************
2019-03-15 22:29:44,242 p=328317 u=root |  overcloud-computehci-0     : ok=90   changed=10   unreachable=0    failed=0
2019-03-15 22:29:44,242 p=328317 u=root |  overcloud-computehci-1     : ok=89   changed=10   unreachable=0    failed=1
2019-03-15 22:29:44,242 p=328317 u=root |  overcloud-controller-0     : ok=182  changed=11   unreachable=0    failed=0
2019-03-15 22:29:44,242 p=328317 u=root |  overcloud-controller-1     : ok=170  changed=9    unreachable=0    failed=0
2019-03-15 22:29:44,242 p=328317 u=root |  overcloud-controller-2     : ok=172  changed=11   unreachable=0    failed=0

Comment 2 John Fulton 2019-03-18 19:20:04 UTC
You might also see this error when debugging locally like this:

[root@overcloud-computehci-0 ~]# ./ceph-osd-run.sh 0
2019-03-15 23:10:16  /opt/ceph-container/bin/entrypoint.sh: OSD id 0 does not exist
[root@overcloud-computehci-0 ~]#

Comment 3 Dimitri Savineau 2019-03-19 15:55:35 UTC
> You might also see this error when debugging locally like this:
> 
> [root@overcloud-computehci-0 ~]# ./ceph-osd-run.sh 0
> 2019-03-15 23:10:16  /opt/ceph-container/bin/entrypoint.sh: OSD id 0 does not exist

In a containerized deployment you need to use the device name, not the OSD id.

Comment 4 Dimitri Savineau 2019-03-19 20:21:35 UTC
> In containerized deployment you need to use device name not OSD id

Never mind, I forgot that RHCS 4 is based on Nautilus, so it's ceph-volume only (that statement was only true for ceph-disk deployments in containers).

Comment 8 Artem Hrechanychenko 2019-05-13 09:02:44 UTC
Reproduced:

TASK [ceph-osd : wait for all osd to be up] ************************************",
        "task path: /usr/share/ceph-ansible/roles/ceph-osd/tasks/openstack_config.yml:2",
        "Friday 10 May 2019  18:27:47 +0000 (0:00:00.314)       0:04:05.481 ************ ",
        "FAILED - RETRYING: wait for all osd to be up (60 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (59 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (58 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (57 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (56 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (55 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (54 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (53 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (52 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (51 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (50 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (49 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (48 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (47 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (46 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (45 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (44 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (43 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (42 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (41 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (40 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (39 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (38 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (37 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (36 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (35 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (34 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (33 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (32 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (31 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (30 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (29 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (28 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (27 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (26 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (25 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (24 retries left).",
 "FAILED - RETRYING: wait for all osd to be up (23 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (22 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (21 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (20 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (19 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (18 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (17 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (16 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (15 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (14 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (13 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (12 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (11 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (10 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (9 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (8 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (7 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (6 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (5 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (4 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (3 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (2 retries left).",
        "FAILED - RETRYING: wait for all osd to be up (1 retries left).",
        "fatal: [ceph-2]: FAILED! => changed=false ",
        "  attempts: 60",
        "    test \"$(podman exec ceph-mon-controller-0 ceph --cluster ceph -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)[\"osdmap\"][\"osdmap\"][\"num_osds\"])')\" -gt 0 && test \"$(podman exec ceph-mon-controller-0 ceph --cluster ceph -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)[\"osdmap\"][\"osdmap\"][\"num_osds\"])')\" = \"$(podman exec ceph-mon-controller-0 ceph --cluster ceph -s -f json | python -c 'import sys, json; print(json.load(sys.stdin)[\"osdmap\"][\"osdmap\"][\"num_up_osds\"])')\"",
        "  delta: '0:00:01.748742'",
        "  end: '2019-05-10 18:39:48.793998'",
        "  rc: 1",
        "  start: '2019-05-10 18:39:47.045256'",
        "NO MORE HOSTS LEFT *************************************************************",
        "PLAY RECAP *********************************************************************",
        "ceph-0                     : ok=102  changed=10   unreachable=0    failed=0    skipped=184  rescued=0    ignored=0   ",
        "ceph-1                     : ok=100  changed=10   unreachable=0    failed=0    skipped=179  rescued=0    ignored=0   ",
        "ceph-2                     : ok=101  changed=10   unreachable=0    failed=1    skipped=178  rescued=0    ignored=0   ",
        "compute-0                  : ok=31   changed=0    unreachable=0    failed=0    skipped=86   rescued=0    ignored=0   ",
        "controller-0               : ok=189  changed=22   unreachable=0    failed=0    skipped=307  rescued=0    ignored=0   ",

parameter_defaults:
  ContainerImagePrepare:
  - push_destination: true
    set:
      ceph_image: rhceph-4.0-rhel8
      ceph_namespace: docker-registry.upshift.redhat.com/ceph
      ceph_tag: latest


tar logs - https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/df/view/deployment/job/DFG-df-deployment-15-virthost-1cont_1comp_3ceph-no_UC_SSL-no_OC_SSL-ceph-ipv6-vlan-RHELOSP-31817/3/artifact/

Comment 9 John Fulton 2019-05-13 13:29:02 UTC
(In reply to Artem Hrechanychenko from comment #8)
> Reproduced:

Artem,

We need to be careful here. It is very easy to reproduce this error for reasons other than the root cause of this bug, e.g. unclean disks or more OSDs than there is time to bring up before the retries run out.

The new docker-registry.upshift.redhat.com/ceph/rhceph-4.0-rhel8:latest container does have a python command; the lack of that command was the root cause of this bug (unfixed versions only had a python3 command). Feel free to launch the container and verify directly that you have a python command (one way to do that is sketched below).
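
For example (just a suggestion; it assumes you can pull the image, and it overrides the image's entrypoint so no daemon logic runs):

podman run --rm --entrypoint /bin/bash docker-registry.upshift.redhat.com/ceph/rhceph-4.0-rhel8:latest -c 'command -v python && python --version'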

If you reproduce the issue and keep the system running, please ping me and I will help you debug it on that live system. I don't doubt that you are seeing the issue you reported in comment #8; I just don't think THIS bug is the root cause, since I can see the container has the necessary binary. Let's figure out why you're running into the issue you reported and go from there.

  John

Comment 10 Dimitri Savineau 2019-05-13 13:48:50 UTC
Agreed with John; this doesn't seem to be the same issue.

The original issue was that the python command was not present in the rhceph 4 container (python3 only).

In your situation the python command is executed on the host:

> podman exec ceph-mon-controller-0 ceph --cluster ceph -s -f json | python -c 'import ....'

Only 'ceph --cluster ceph -s -f json' is executed in the container; the rest of the pipeline runs on the host.
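
To illustrate the difference (an example only, reusing the container name and JSON paths from the log above; python3 is used explicitly so it works on hosts without a python command):

# The pipe is evaluated by the host shell, so python3 here runs on the host:
podman exec ceph-mon-controller-0 ceph --cluster ceph -s -f json | python3 -c 'import sys, json; print(json.load(sys.stdin)["osdmap"]["osdmap"]["num_osds"])'

# To run the whole pipeline inside the container, the pipe itself has to be passed to a shell in the container:
podman exec ceph-mon-controller-0 sh -c "ceph --cluster ceph -s -f json | python3 -c 'import sys, json; print(json.load(sys.stdin)[\"osdmap\"][\"osdmap\"][\"num_osds\"])'"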

Comment 11 John Fulton 2019-05-21 15:03:13 UTC
Confirmed that Artem had a DIFFERENT issue, related to IPv6. More details in https://bugzilla.redhat.com/show_bug.cgi?id=1710319.

The fix for THIS bug (not related to IPv6 but related to python in the Ceph container) is ready to be tested with:

 docker-registry.upshift.redhat.com/ceph/rhceph-4.0-rhel8:latest

Comment 12 Yogev Rabl 2019-06-04 13:24:27 UTC
Verified

Comment 14 errata-xmlrpc 2020-01-31 14:44:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0313

