Description of problem:
[cephadm] 5.0 - Ceph orch upgrade start --image option accepts invalid image names and status shows In progress.

Version-Release number of selected component (if applicable):
[cephuser@ceph-sunil1adm-1614692246522-node1-mon-mgr-installer-node-expor ~]$ sudo cephadm shell
Inferring fsid f64f341c-655d-11eb-8778-fa163e914bcc
Inferring config /var/lib/ceph/f64f341c-655d-11eb-8778-fa163e914bcc/mon.ceph-sunil1adm-1614692246522-node1-mon-mgr-installer-node-expor/config
Using recent ceph image registry.redhat.io/rhceph-alpha/rhceph-5-rhel8@sha256:9aaea414e2c263216f3cdcb7a096f57c3adf6125ec9f4b0f5f65fa8c43987155

How reproducible:

Steps to Reproduce:
1. Deploy a 5.0 cluster.
2. Enter the cephadm shell.
3. Start a build update using ceph orch upgrade start --image.
4. Pass an invalid image name (wrong input) and check the behaviour.

Actual results:
[ceph: root@ceph-sunil1adm-1614692246522-node1-mon-mgr-installer-node-expor /]# ceph orch upgrade start --image 11111111111111111111111112222aaaaaaaaaaaaaaa
Initiating upgrade to 11111111111111111111111112222aaaaaaaaaaaaaaa

[ceph: root@ceph-sunil1adm-1614692246522-node1-mon-mgr-installer-node-expor /]# ceph orch upgrade status
{
    "target_image": "11111111111111111111111112222aaaaaaaaaaaaaaa",
    "in_progress": true,
    "services_complete": [],
    "message": ""
}

Expected results:
Invalid inputs should not be accepted; instead, an error or warning should be shown.

Additional info:
10.0.210.149 cephuser/cephuser
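For reference, a minimal sketch of how to back out of an upgrade that was started with a bogus image name, using the standard orchestrator commands from inside the cephadm shell (the image string involved is just the garbage value from the report above):

# check what the upgrade module currently reports as the target
ceph orch upgrade status
# cancel the upgrade so the bogus target image is discarded
ceph orch upgrade stop
# confirm that in_progress has returned to false
ceph orch upgrade status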
Did it stay in this state or was it only for a split second? I tried this while using a downstream image and the upgrade was marked as failed in under 30 seconds.

[ceph: root@vm-00 /]# ceph orch ps
NAME                 HOST   STATUS          REFRESHED  AGE   VERSION           IMAGE NAME  IMAGE ID  CONTAINER ID
alertmanager.vm-00   vm-00  running (86s)   9s ago     5m    0.20.0            registry.redhat.io/openshift4/ose-prometheus-alertmanager:v4.5  32979bd08f6f  38e587128adb
crash.vm-00          vm-00  running (5m)    9s ago     5m    16.1.0-486.el8cp  registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest  6f642d99fe72  4bd6489a0d15
crash.vm-01          vm-01  running (2m)    10s ago    2m    16.1.0-486.el8cp  registry.redhat.io/rhceph-alpha/rhceph-5-rhel8@sha256:9aaea414e2c263216f3cdcb7a096f57c3adf6125ec9f4b0f5f65fa8c43987155  6f642d99fe72  1c9a13ae00bf
crash.vm-02          vm-02  running (2m)    10s ago    2m    16.1.0-486.el8cp  registry.redhat.io/rhceph-alpha/rhceph-5-rhel8@sha256:9aaea414e2c263216f3cdcb7a096f57c3adf6125ec9f4b0f5f65fa8c43987155  6f642d99fe72  54263ffe9379
grafana.vm-00        vm-00  running (81s)   9s ago     4m    6.7.4             registry.redhat.io/rhceph-alpha/rhceph-5-dashboard-rhel8:latest  ea002a20207d  af4214ec962d
mgr.vm-00.anypjj     vm-00  running (6m)    9s ago     6m    16.1.0-486.el8cp  registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest  6f642d99fe72  31651d816772
mon.vm-00            vm-00  running (6m)    9s ago     6m    16.1.0-486.el8cp  registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest  6f642d99fe72  66db0d641169
node-exporter.vm-00  vm-00  running (4m)    9s ago     4m    0.18.1            registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.5  e4be1e64c76a  17a61f43a634
node-exporter.vm-01  vm-01  running (2m)    10s ago    2m    0.18.1            registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.5  e4be1e64c76a  df380b4d66f9
node-exporter.vm-02  vm-02  running (2m)    10s ago    2m    0.18.1            registry.redhat.io/openshift4/ose-prometheus-node-exporter:v4.5  e4be1e64c76a  ae74c570749a
osd.0                vm-01  running (112s)  10s ago    112s  16.1.0-486.el8cp  registry.redhat.io/rhceph-alpha/rhceph-5-rhel8@sha256:9aaea414e2c263216f3cdcb7a096f57c3adf6125ec9f4b0f5f65fa8c43987155  6f642d99fe72  91491e7857fb
osd.1                vm-00  running (110s)  9s ago     109s  16.1.0-486.el8cp  registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest  6f642d99fe72  bbd780bf894c
osd.2                vm-02  running (110s)  10s ago    110s  16.1.0-486.el8cp  registry.redhat.io/rhceph-alpha/rhceph-5-rhel8@sha256:9aaea414e2c263216f3cdcb7a096f57c3adf6125ec9f4b0f5f65fa8c43987155  6f642d99fe72  f0b2405428b8
osd.3                vm-01  running (107s)  10s ago    107s  16.1.0-486.el8cp  registry.redhat.io/rhceph-alpha/rhceph-5-rhel8@sha256:9aaea414e2c263216f3cdcb7a096f57c3adf6125ec9f4b0f5f65fa8c43987155  6f642d99fe72  de382a104d8b
osd.4                vm-00  running (104s)  9s ago     104s  16.1.0-486.el8cp  registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest  6f642d99fe72  29e406a4e959
osd.5                vm-02  running (105s)  10s ago    105s  16.1.0-486.el8cp  registry.redhat.io/rhceph-alpha/rhceph-5-rhel8@sha256:9aaea414e2c263216f3cdcb7a096f57c3adf6125ec9f4b0f5f65fa8c43987155  6f642d99fe72  73c9c76142c5
prometheus.vm-00     vm-00  running (84s)   9s ago     4m    2.22.2            registry.redhat.io/openshift4/ose-prometheus:v4.6  aa176108957b  6ab465141483

[ceph: root@vm-00 /]# ceph version
ceph version 16.1.0-486.el8cp (f9701a56b7b8182352532afba8db2bf394c8585a) pacific (rc)

[ceph: root@vm-00 /]# ceph orch upgrade start --image 1111111111111aaaaaaaa
Initiating upgrade to 1111111111111aaaaaaaa

[ceph: root@vm-00 /]# ceph orch upgrade status
{
    "target_image": "1111111111111aaaaaaaa",
    "in_progress": true,
    "services_complete": [],
    "message": "Error: UPGRADE_FAILED_PULL: Upgrade: failed to pull target image"
}

[ceph: root@vm-00 /]#
This is basically how I expected it to fail when a garbage image name is given. Verifying the image name with a regular expression or something similar is very difficult because the image name may or may not include the full URL or the tag. Look at what docker does in this situation:

bash-5.0$ docker pull 111111111aaaaaaaaa
Using default tag: latest
Trying to pull repository docker.io/library/111111111aaaaaaaaa ...
Trying to pull repository registry.fedoraproject.org/111111111aaaaaaaaa ...
Trying to pull repository registry.access.redhat.com/111111111aaaaaaaaa ...
Trying to pull repository registry.centos.org/111111111aaaaaaaaa ...
Trying to pull repository quay.io/111111111aaaaaaaaa ...
Trying to pull repository docker.io/library/111111111aaaaaaaaa ...
repository docker.io/111111111aaaaaaaaa not found: does not exist or no pull access

Since it's so difficult to tell whether the image is valid until we attempt to pull it, failing on pull is basically the best option we have.
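Along the same lines, a rough manual pre-flight check one could do today (this is not something cephadm runs itself): pull the intended image on a host first, so a typo fails at pull time rather than after the upgrade has been initiated. The image name below is simply the downstream image already referenced in this bug:

# sketch of a manual pre-flight check before starting the upgrade
podman pull registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest   # fails fast on a bad name or unreachable registry
ceph orch upgrade start --image registry.redhat.io/rhceph-alpha/rhceph-5-rhel8:latest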
@Adam, the VM where I originally saw the issue was destroyed, so I verified this issue on a fresh cluster. ceph orch upgrade status now reports "failed to pull target image", which is the expected behavior. I was seeing the "In progress" state on the cluster mentioned in the bug, which is why I logged the BZ. The difference is that the older cluster where the issue was seen was running the 3rd March build, while the cluster where the issue is verified is running the latest build.

[ceph: root@magna057 /]# ceph orch upgrade start --image 1111111111111aaaaaaaa
Initiating upgrade to 1111111111111aaaaaaaa

[ceph: root@magna057 /]# ceph orch upgrade status
{
    "target_image": "1111111111111aaaaaaaa",
    "in_progress": true,
    "services_complete": [],
    "message": "Error: UPGRADE_FAILED_PULL: Upgrade: failed to pull target image"
}
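For completeness, a small sketch of where else the same failure surfaces besides ceph orch upgrade status, using the usual cephadm troubleshooting commands:

# the failed pull is also raised as a cluster health warning
ceph health detail        # should list UPGRADE_FAILED_PULL with the target image
# and the cephadm log records the pull attempt and the error
ceph log last cephadm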
@Preethi, if the upgrade is being properly marked as failed with the latest build, do we want to move this bug to verified?
Moving to ON_QA since it seems the upgrade is now properly failed with a "failed to pull target image" message. If you see some other behavior where the upgrade isn't marked as failed after a few minutes, feel free to change the status back and post any new information.
The issue is not seen with the latest build. Hence, moving this to the verified state.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 5.0 bug fix and enhancement), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3294