
Bug 1917543

Summary: [cephadm] 5.0 - Registry-based update to the latest build using "ceph orch upgrade start --image <registry url>" is throwing an error "Unable to pull the target image"
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Preethi <pnataraj>
Component: Cephadm
Assignee: Adam King <adking>
Status: CLOSED ERRATA
QA Contact: Vasishta <vashastr>
Severity: high
Docs Contact: Karen Norteman <knortema>
Priority: urgent
Version: 5.0
CC: jolmomar, kdreyer, sewagner, tserlin, vereddy
Target Milestone: ---
Target Release: 5.0
Hardware: x86_64
OS: All
Whiteboard:
Fixed In Version: ceph-16.2.0-13.el8cp
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-08-30 08:27:52 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Comment 1 Preethi 2021-01-18 17:20:37 UTC
I tried the same steps on another bootstrap node, i.e. magna061, and noticed the same issue:

[ceph: root@magna061 /]# ceph status
  cluster:
    id:     a2a63eea-51b8-11eb-889f-002590fbd650
    health: HEALTH_OK
 
  services:
    mon: 5 daemons, quorum magna061,magna064,magna066,magna063,magna065 (age 10d)
    mgr: magna061.codfuh(active, since 10d), standbys: magna063.zohfew
    osd: 17 osds: 17 up (since 6d), 17 in (since 6d)
    rgw: 2 daemons active (myrealm1.myzone1.magna063.jpljqe, myrealm1.myzone1.magna064.rribai)
 
  data:
    pools:   6 pools, 137 pgs
    objects: 394 objects, 38 KiB
    usage:   2.2 GiB used, 15 TiB / 15 TiB avail
    pgs:     137 active+clean
 
  io:
    client:   1.7 KiB/s rd, 3 op/s rd, 0 op/s wr
 
[ceph: root@magna061 /]# ceph orch upgrade start --image registry.redhat.io
Initiating upgrade to registry.redhat.io
[ceph: root@magna061 /]# ceph orch upgrade status
{
    "target_image": "registry.redhat.io",
    "in_progress": true,
    "services_complete": [],
    "message": ""
}
[ceph: root@magna061 /]# ceph status
  cluster:
    id:     a2a63eea-51b8-11eb-889f-002590fbd650
    health: HEALTH_OK
 
  services:
    mon: 5 daemons, quorum magna061,magna064,magna066,magna063,magna065 (age 10d)
    mgr: magna061.codfuh(active, since 10d), standbys: magna063.zohfew
    osd: 17 osds: 17 up (since 6d), 17 in (since 6d)
    rgw: 2 daemons active (myrealm1.myzone1.magna063.jpljqe, myrealm1.myzone1.magna064.rribai)
 
  data:
    pools:   6 pools, 137 pgs
    objects: 394 objects, 38 KiB
    usage:   2.2 GiB used, 15 TiB / 15 TiB avail
    pgs:     137 active+clean
 
  io:
    client:   8.9 KiB/s rd, 17 op/s rd, 0 op/s wr
 
  progress:
 
[ceph: root@magna061 /]# ceph status
  cluster:
    id:     a2a63eea-51b8-11eb-889f-002590fbd650
    health: HEALTH_WARN
            Upgrade: failed to pull target image
 
  services:
    mon: 5 daemons, quorum magna061,magna064,magna066,magna063,magna065 (age 10d)
    mgr: magna061.codfuh(active, since 10d), standbys: magna063.zohfew
    osd: 17 osds: 17 up (since 6d), 17 in (since 6d)
    rgw: 2 daemons active (myrealm1.myzone1.magna063.jpljqe, myrealm1.myzone1.magna064.rribai)
 
  data:
    pools:   6 pools, 137 pgs
    objects: 394 objects, 38 KiB
    usage:   2.2 GiB used, 15 TiB / 15 TiB avail
    pgs:     137 active+clean
 
  io:
    client:   2.0 KiB/s rd, 3 op/s rd, 0 op/s wr
 
  progress:
 
[ceph: root@magna061 /]# ceph orch upgrade status
{
    "target_image": "registry.redhat.io",
    "in_progress": true,
    "services_complete": [],
    "message": "Error: UPGRADE_FAILED_PULL: Upgrade: failed to pull target image"
}
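
For reference, a minimal recovery sketch (not from the original report) would be to stop the failed upgrade attempt and retry with a fully qualified image reference instead of the bare registry hostname; the image path below is the rhceph-alpha path used later in this bug and is only an example:

# stop the failed upgrade attempt, then retry with a fully qualified image (example path)
ceph orch upgrade stop
ceph orch upgrade start --image registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest
ceph orch upgrade status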

Comment 2 Adam King 2021-02-17 22:20:46 UTC
related upstream tracker: https://tracker.ceph.com/issues/48695

Comment 5 Adam King 2021-02-23 13:46:19 UTC
Unfortunately, I don't think "ceph -s" or "ceph orch upgrade status" shows any completion message after the update has finished; they are currently only useful for upgrade information while the upgrade is still in progress. The best way, currently, to tell whether an update completed successfully after "ceph orch upgrade status" reports "in_progress" as "false" is to run "ceph orch ls" and check the image name for all ceph daemon services (daemons of type 'mgr', 'mon', 'crash', 'osd', 'mds', 'rgw' and 'rbd-mirror'). For example, when upgrading from a 15.2.8 cluster to the downstream image, "ceph orch ls" before the upgrade gives me the following (note that all ceph daemon type services have image name "docker.io/amk3798/ceph:15.2.8"):

[ceph: root@vm-00 /]# ceph orch ls
NAME                       RUNNING  REFRESHED  AGE  PLACEMENT  IMAGE NAME                            IMAGE ID      
alertmanager                   1/1  1s ago     5m   count:1    docker.io/prom/alertmanager:v0.20.0   0881eb8f169f  
crash                          3/3  2s ago     5m   *          docker.io/amk3798/ceph:15.2.8         3c03696dbf74  
grafana                        1/1  1s ago     5m   count:1    docker.io/ceph/ceph-grafana:6.6.2     a0dce381714a  
mgr                            2/2  2s ago     5m   count:2    docker.io/amk3798/ceph:15.2.8         3c03696dbf74  
mon                            3/5  2s ago     5m   count:5    docker.io/amk3798/ceph:15.2.8         3c03696dbf74  
node-exporter                  3/3  2s ago     5m   *          docker.io/prom/node-exporter:v0.18.1  e5a616e4b9cf  
osd.all-available-devices      6/6  2s ago     4m   *          docker.io/amk3798/ceph:15.2.8         3c03696dbf74  
prometheus                     1/1  1s ago     5m   count:1    docker.io/prom/prometheus:v2.18.1     de242295e225  



In the middle of the upgrade I might see something like this. Notice that some of the ceph daemons still have the old image name, mgr has the new one (mgr is upgraded first), and the mon service has image name "mix", which means some of the mon daemons have been upgraded and some have not:


[ceph: root@vm-00 /]# ceph orch ls
NAME                       RUNNING  REFRESHED  AGE  PLACEMENT  IMAGE NAME                                             IMAGE ID      
alertmanager                   1/1  37s ago    9m   count:1    docker.io/prom/alertmanager:v0.20.0                    0881eb8f169f  
crash                          3/3  38s ago    10m  *          docker.io/amk3798/ceph:15.2.8                          3c03696dbf74  
grafana                        1/1  37s ago    9m   count:1    docker.io/ceph/ceph-grafana:6.6.2                      a0dce381714a  
mgr                            2/2  38s ago    10m  count:2    registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest  c88a5d60f510  
mon                            3/5  38s ago    10m  count:5    mix                                                    mix           
node-exporter                  3/3  38s ago    9m   *          docker.io/prom/node-exporter:v0.18.1                   e5a616e4b9cf  
osd.all-available-devices      6/6  38s ago    9m   *          docker.io/amk3798/ceph:15.2.8                          3c03696dbf74  
prometheus                     1/1  37s ago    10m  count:1    docker.io/prom/prometheus:v2.18.1                      de242295e225  



Finally, after the upgrade is fully complete, I can see that the image name for all ceph daemon type services is the new, upgraded image, telling me that all ceph daemons have been upgraded and the upgrade can be considered complete.


[ceph: root@vm-00 /]# ceph orch ls
NAME                       RUNNING  REFRESHED  AGE  PLACEMENT  IMAGE NAME                                             IMAGE ID      
alertmanager                   1/1  4m ago     17m  count:1    docker.io/prom/alertmanager:v0.20.0                    0881eb8f169f  
crash                          3/3  4m ago     17m  *          registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest  c88a5d60f510  
grafana                        1/1  4m ago     17m  count:1    docker.io/ceph/ceph-grafana:6.6.2                      a0dce381714a  
mgr                            2/2  4m ago     17m  count:2    registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest  c88a5d60f510  
mon                            3/5  4m ago     17m  count:5    registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest  c88a5d60f510  
node-exporter                  3/3  4m ago     17m  *          docker.io/prom/node-exporter:v0.18.1                   e5a616e4b9cf  
osd.all-available-devices      6/6  4m ago     16m  *          registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest  c88a5d60f510  
prometheus                     1/1  4m ago     17m  count:1    docker.io/prom/prometheus:v2.18.1                      de242295e225  


Can you share the "ceph orch ls" output on clusters where you've done this upgrade? This will give us a good idea if the daemons have all been upgraded successfully and if not, which ones failed to do so.
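
As a convenience, a rough sketch for narrowing "ceph orch ls" to just the ceph daemon services (assuming the default monitoring service names shown above) is to filter out the monitoring stack:

# show only the ceph daemon services, whose IMAGE NAME should change after the upgrade
ceph orch ls | grep -Ev '^(alertmanager|grafana|node-exporter|prometheus)'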

Comment 6 Adam King 2021-02-23 16:23:47 UTC
To adjust my previous comment: it's better to look at the "IMAGE ID" field than the "IMAGE NAME" field. Right now, the same image can sometimes temporarily show up under two different names, one with the expected name and one using the digest (e.g. "docker.io/amk3798/ceph:testing" and "docker.io/amk3798/ceph@sha256:2bd0cd0945534f321737f1d5959af199c05656d80d1fd4303bb55966876ab387" could actually be the same image). For that reason, the "IMAGE ID" field is a more consistent way to check. Simply find the image id for the image you're upgrading to and check that all services for ceph daemons have that image id in the output of "ceph orch ls".
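
A rough one-liner for that check, assuming the IMAGE ID stays the last column of the plain-text output as in the listings above:

# print each service with its IMAGE ID (last column); after a completed upgrade, every
# ceph daemon service (mgr, mon, crash, osd, mds, rgw, rbd-mirror) should show the
# image id of the target image, e.g. c88a5d60f510 in the example above
ceph orch ls | awk 'NR>1 {print $1, $NF}'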

Comment 7 Preethi 2021-02-24 09:16:44 UTC
@Adam, below is a snippet of "ceph orch ls". It is the same as before performing the build update. Since you tried the update commands from upstream to downstream, you can see the changes in the "ceph orch ls" image name. However, we do not see any changes before and after the upgrade unless there are version changes for prometheus/node-exporter etc. I would strongly suggest having a progress status when we perform build updates. Let me know if I can create an RFE BZ for the same. We had this progress status in earlier builds.

[root@ceph-adm7 ~]# sudo cephadm shell
Inferring fsid 58149bf2-66ac-11eb-84bf-001a4a000262
Inferring config /var/lib/ceph/58149bf2-66ac-11eb-84bf-001a4a000262/mon.ceph-adm7/config
Using recent ceph image registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest
[ceph: root@ceph-adm7 /]# ceph orch ls
NAME                       RUNNING  REFRESHED  AGE  PLACEMENT                      IMAGE NAME                                                       IMAGE ID      
alertmanager                   1/1  118s ago   2w   count:1                        registry.redhat.io/openshift4/ose-prometheus-alertmanager:v4.5   b7bae610cd46  
crash                          3/3  2m ago     2w   *                              registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest            c88a5d60f510  
grafana                        1/1  118s ago   2w   count:1                        registry.redhat.io/rhceph-alpha/rhceph-5-dashboard-rhel8:latest  bd3d7748747b  
mgr                            3/2  2m ago     7d   <unmanaged>                    registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest            c88a5d60f510  
mon                            3/3  2m ago     2w   ceph-adm7;ceph-adm8;ceph-adm9  registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest            c88a5d60f510  
node-exporter                  1/3  2m ago     2w   *                              registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.5  mix           
osd.all-available-devices    13/13  2m ago     12d  *                              registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest            c88a5d60f510  
prometheus                     1/1  118s ago   2w   count:1                        registry.redhat.io/openshift4/ose-prometheus:v4.6                bebb0ddef7f0  
[ceph: root@ceph-adm7 /]# history

NOTE: We need to wait for a new alpha release to test this again.

Comment 8 Adam King 2021-02-24 13:43:06 UTC
@Preethi, I'm a bit confused; it looks like the image for crash, mgr, mon and osd here is "registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest", which is the image you were originally trying to upgrade to with the command "ceph orch upgrade start --image registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest", and they have the same image id as in my orch ls output from a few comments ago (c88a5d60f510). It looks like the daemons were successfully upgraded?

Comment 9 Preethi 2021-02-24 14:51:21 UTC
@Adam, we do not have the latest build in registry.redhat.io to perform a build update. I was performing a same-build-to-same-build update, since we were seeing the "failed to pull target image" issue reported in this BZ. To verify that a build-to-latest-build upgrade works with this command, we may need to wait until the latest build is pushed to the alpha path. I will verify the BZ once we have the latest image and update you.

Comment 10 Preethi 2021-04-26 10:22:39 UTC
@Adam, the issue is no longer seen. Hence, moving to verified.

Below is a snippet of the output with the workaround:

[ceph: root@magna007 /]# ceph orch ps | grep mgr
mgr.magna007.wpgvme         magna007  running (6h)   11s ago    3w   *:9283         16.1.0-1325-geb5d7a86  0a963d7074de  585d98f2cc16  
mgr.magna010.syndxo         magna010  running (29s)  13s ago    3w   *:8443 *:9283  16.2.0-13.el8cp        89a188512eee  53aae976ffc6  
[ceph: root@magna007 /]# ceph orch daemon redeploy mgr.magna007.wpgvme --image registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.0-rhel-8-containers-candidate-99506-20210424023822
Scheduled to redeploy mgr.magna007.wpgvme on host 'magna007'
[ceph: root@magna007 /]# ceph mgr fail
[ceph: root@magna007 /]# ceph -s
  cluster:
    id:     802d6a00-9277-11eb-aa4f-002590fc2538
    health: HEALTH_OK
 
  services:
    mon:        3 daemons, quorum magna007,magna010,magna104 (age 6h)
    mgr:        magna007.wpgvme(active, since 7s), standbys: magna010.syndxo
    osd:        15 osds: 15 up (since 6h), 15 in (since 2w)
    rbd-mirror: 1 daemon active (1 hosts)
    rgw:        5 daemons active (3 hosts, 1 zones)
 
  data:
    pools:   7 pools, 169 pgs
    objects: 372 objects, 24 KiB
    usage:   9.5 GiB used, 14 TiB / 14 TiB avail
    pgs:     169 active+clean

 
[ceph: root@magna007 /]# ceph orch ps | grep mgr
mgr.magna007.wpgvme         magna007  running (19s)  0s ago     3w   *:8443 *:9283  16.2.0-13.el8cp        89a188512eee  a287245f4789  
mgr.magna010.syndxo         magna010  running (6m)   3s ago     3w   *:8443 *:9283  16.2.0-13.el8cp        89a188512eee  53aae976ffc6  
[ceph: root@magna007 /]# ceph orch 

[ceph: root@magna007 /]# ceph orch upgrade start --image registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.0-rhel-8-containers-candidate-99506-20210424023822
Initiating upgrade to registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.0-rhel-8-containers-candidate-99506-20210424023822
[ceph: root@magna007 /]# ceph orch upgrade status
{
    "target_image": "registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:55206326df77ef04991a3d4a59621f9dfcff5a8e68c151febc3d5e0e1cfd79e8",
    "in_progress": true,
    "services_complete": [
        "mgr"
    ],
    "progress": "2/41 ceph daemons upgraded",
    "message": ""
}
[ceph: root@magna007 /]#
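
For reference, a condensed sketch of the workaround exercised above (the commands are taken from the output; the daemon name and image URL are specific to this cluster and will differ elsewhere):

# redeploy the mgr daemon that is still on the old image, using the target image
ceph orch daemon redeploy mgr.magna007.wpgvme --image registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.0-rhel-8-containers-candidate-99506-20210424023822
# fail over so the active mgr is one already running the new image
ceph mgr fail
# then start the orchestrated upgrade with the same image and watch its status
ceph orch upgrade start --image registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-5.0-rhel-8-containers-candidate-99506-20210424023822
ceph orch upgrade status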

Comment 13 errata-xmlrpc 2021-08-30 08:27:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 5.0 bug fix and enhancement), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3294