Description of problem:
[cephadm] 5.0 - Ceph orch upgrade start --image option accepts invalid image names and status shows In progress.

Version-Release number of selected component (if applicable):
[cephuser@ceph-sunil1adm-1614692246522-node1-mon-mgr-installer-node-expor ~]$ sudo cephadm shell
Inferring fsid f64f341c-655d-11eb-8778-fa163e914bcc
Inferring config /var/lib/ceph/f64f341c-655d-11eb-8778-fa163e914bcc/mon.ceph-sunil1adm-1614692246522-node1-mon-mgr-installer-node-expor/config
Using recent ceph image registry.redhat.io/rhceph-alpha/rhceph-5-rhel8@sha256:9aaea414e2c263216f3cdcb7a096f57c3adf6125ec9f4b0f5f65fa8c43987155

How reproducible:

Steps to Reproduce:
1. Deploy a 5.0 cluster.
2. Enter the cephadm shell.
3. Start a build update using ceph orch upgrade start --image.
4. Pass an invalid image name (wrong input) and check the behaviour.

Actual results:
[ceph: root@ceph-sunil1adm-1614692246522-node1-mon-mgr-installer-node-expor /]# ceph orch upgrade start --image 11111111111111111111111112222aaaaaaaaaaaaaaa
Initiating upgrade to 11111111111111111111111112222aaaaaaaaaaaaaaa

[ceph: root@ceph-sunil1adm-1614692246522-node1-mon-mgr-installer-node-expor /]# ceph orch upgrade status
{
    "target_image": "11111111111111111111111112222aaaaaaaaaaaaaaa",
    "in_progress": true,
    "services_complete": [],
    "message": ""
}

Expected results:
Invalid inputs should not be accepted; instead, an error or warning should be shown.

Additional info:
10.0.210.149 cephuser/cephuser
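For reference, a minimal sketch of how to back out of an upgrade that was started with a bogus image name, using the standard orchestrator commands from inside the cephadm shell (the image string involved is just the garbage value from the report above):

# check what the upgrade module currently reports as the target
ceph orch upgrade status
# cancel the upgrade so the bogus target image is discarded
ceph orch upgrade stop
# confirm that in_progress has returned to false
ceph orch upgrade status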
Did it stay in this state or was it only for a split second? I tried this while using a downstream image and the upgrade was marked as failed in under 30 seconds.

[ceph: root@vm-00 /]# ceph orch ps
NAME                 HOST   STATUS          REFRESHED  AGE   VERSION           IMAGE NAME  IMAGE ID  CONTAINER ID
alertmanager.vm-00   vm-00  running (86s)   9s ago     5m    0.20.0            registry.redhat.io/openshift4/ose-prometheus-alertmanager:v4.5  32979bd08f6f  38e587128adb
crash.vm-00          vm-00  running (5m)    9s ago     5m    16.1.0-486.el8cp  registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest  6f642d99fe72  4bd6489a0d15
crash.vm-01          vm-01  running (2m)    10s ago    2m    16.1.0-486.el8cp  registry.redhat.io/rhceph-alpha/rhceph-5-rhel8@sha256:9aaea414e2c263216f3cdcb7a096f57c3adf6125ec9f4b0f5f65fa8c43987155  6f642d99fe72  1c9a13ae00bf
crash.vm-02          vm-02  running (2m)    10s ago    2m    16.1.0-486.el8cp  registry.redhat.io/rhceph-alpha/rhceph-5-rhel8@sha256:9aaea414e2c263216f3cdcb7a096f57c3adf6125ec9f4b0f5f65fa8c43987155  6f642d99fe72  54263ffe9379
grafana.vm-00        vm-00  running (81s)   9s ago     4m    6.7.4             registry.redhat.io/rhceph-alpha/rhceph-5-dashboard-rhel8:latest  ea002a20207d  af4214ec962d
mgr.vm-00.anypjj     vm-00  running (6m)    9s ago     6m    16.1.0-486.el8cp  registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest  6f642d99fe72  31651d816772
mon.vm-00            vm-00  running (6m)    9s ago     6m    16.1.0-486.el8cp  registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest  6f642d99fe72  66db0d641169
node-exporter.vm-00  vm-00  running (4m)    9s ago     4m    0.18.1            registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.5  e4be1e64c76a  17a61f43a634
node-exporter.vm-01  vm-01  running (2m)    10s ago    2m    0.18.1            registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.5  e4be1e64c76a  df380b4d66f9
node-exporter.vm-02  vm-02  running (2m)    10s ago    2m    0.18.1            registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.5  e4be1e64c76a  ae74c570749a
osd.0                vm-01  running (112s)  10s ago    112s  16.1.0-486.el8cp  registry.redhat.io/rhceph-alpha/rhceph-5-rhel8@sha256:9aaea414e2c263216f3cdcb7a096f57c3adf6125ec9f4b0f5f65fa8c43987155  6f642d99fe72  91491e7857fb
osd.1                vm-00  running (110s)  9s ago     109s  16.1.0-486.el8cp  registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest  6f642d99fe72  bbd780bf894c
osd.2                vm-02  running (110s)  10s ago    110s  16.1.0-486.el8cp  registry.redhat.io/rhceph-alpha/rhceph-5-rhel8@sha256:9aaea414e2c263216f3cdcb7a096f57c3adf6125ec9f4b0f5f65fa8c43987155  6f642d99fe72  f0b2405428b8
osd.3                vm-01  running (107s)  10s ago    107s  16.1.0-486.el8cp  registry.redhat.io/rhceph-alpha/rhceph-5-rhel8@sha256:9aaea414e2c263216f3cdcb7a096f57c3adf6125ec9f4b0f5f65fa8c43987155  6f642d99fe72  de382a104d8b
osd.4                vm-00  running (104s)  9s ago     104s  16.1.0-486.el8cp  registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest  6f642d99fe72  29e406a4e959
osd.5                vm-02  running (105s)  10s ago    105s  16.1.0-486.el8cp  registry.redhat.io/rhceph-alpha/rhceph-5-rhel8@sha256:9aaea414e2c263216f3cdcb7a096f57c3adf6125ec9f4b0f5f65fa8c43987155  6f642d99fe72  73c9c76142c5
prometheus.vm-00     vm-00  running (84s)   9s ago     4m    2.22.2            registry.redhat.io/openshift4/ose-prometheus:v4.6  aa176108957b  6ab465141483

[ceph: root@vm-00 /]# ceph version
ceph version 16.1.0-486.el8cp (f9701a56b7b8182352532afba8db2bf394c8585a) pacific (rc)

[ceph: root@vm-00 /]# ceph orch upgrade start --image 1111111111111aaaaaaaa
Initiating upgrade to 1111111111111aaaaaaaa

[ceph: root@vm-00 /]# ceph orch upgrade status
{
    "target_image": "1111111111111aaaaaaaa",
    "in_progress": true,
    "services_complete": [],
    "message": "Error: UPGRADE_FAILED_PULL: Upgrade: failed to pull target image"
}

[ceph: root@vm-00 /]#
This is basically how I expected it to fail when a garbage image name is given. Verifying the image name with a regular expression or something similar is very difficult because the image name may or may not include the full URL or the tag. Look at what docker does in this situation:

bash-5.0$ docker pull 111111111aaaaaaaaa
Using default tag: latest
Trying to pull repository docker.io/library/111111111aaaaaaaaa ...
Trying to pull repository registry.fedoraproject.org/111111111aaaaaaaaa ...
Trying to pull repository registry.access.redhat.com/111111111aaaaaaaaa ...
Trying to pull repository registry.centos.org/111111111aaaaaaaaa ...
Trying to pull repository quay.io/111111111aaaaaaaaa ...
Trying to pull repository docker.io/library/111111111aaaaaaaaa ...
repository docker.io/111111111aaaaaaaaa not found: does not exist or no pull access

Since it's so difficult to tell whether the image is valid until we attempt to pull it, failing on pull is basically the best option we have.
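Along the same lines, a rough manual pre-flight check one could do today (this is not something cephadm runs itself): pull the intended image on a host first, so a typo fails at pull time rather than after the upgrade has been initiated. The image name below is simply the downstream image already referenced in this bug:

# sketch of a manual pre-flight check before starting the upgrade
podman pull registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest   # fails fast on a bad name or unreachable registry
ceph orch upgrade start --image registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest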
@Adam, the VM where I originally saw the issue was destroyed, so I verified this issue on a fresh cluster. ceph orch upgrade status now reports "failed to pull target image", which is the expected behavior. I was seeing the "In progress" state on the cluster mentioned in the bug, which is why I logged the BZ. The difference is that the older cluster where the issue was seen was running the 3rd March build, while the cluster where the issue is verified is running the latest build.

[ceph: root@magna057 /]# ceph orch upgrade start --image 1111111111111aaaaaaaa
Initiating upgrade to 1111111111111aaaaaaaa

[ceph: root@magna057 /]# ceph orch upgrade status
{
    "target_image": "1111111111111aaaaaaaa",
    "in_progress": true,
    "services_complete": [],
    "message": "Error: UPGRADE_FAILED_PULL: Upgrade: failed to pull target image"
}
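For completeness, a small sketch of where else the same failure surfaces besides ceph orch upgrade status, using the usual cephadm troubleshooting commands:

# the failed pull is also raised as a cluster health warning
ceph health detail        # should list UPGRADE_FAILED_PULL with the target image
# and the cephadm log records the pull attempt and the error
ceph log last cephadm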
@Preethi, if the upgrade is being properly marked as failed with the latest build, do we want to move this bug to verified?
Moving to ON_QA since it seems the upgrade is now properly failed with a "failed to pull target image" message. If you see some other behavior where the upgrade isn't marked as failed after a few minutes, feel free to change the status back and post any new information.
The issue is not seen with the latest build. Hence, moving this to the verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 5.0 bug fix and enhancement), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3294