Bug 2246177

Summary: Build to Build upgrade failed with msg "Error EINVAL: Upgrade aborted - Some host(s) are currently offline: {'label'}"
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Sunil Kumar Nagaraju <sunnagar>
Component: Cephadm
Assignee: Adam King <adking>
Status: NEW
QA Contact: Mohit Bisht <mobisht>
Severity: high
Priority: unspecified
Version: 7.0
CC: adking, cephqe-warriors, saraut
Target Milestone: ---
Target Release: 9.0
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Type: Bug

Description Sunil Kumar Nagaraju 2023-10-25 16:18:40 UTC
Description of problem:

Build to Build upgrade failed with msg "Error EINVAL: Upgrade aborted - Some host(s) are currently offline: {'label'}"

Upgrade was from 18.2.0-81 to 18.2.0-99.

[ceph: root@cali001 /]# ceph orch upgrade start --image registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-7.0-rhel-9-containers-candidate-27036-20231025004606
Error EINVAL: Upgrade aborted - Some host(s) are currently offline: {'label'}
[ceph: root@cali001 /]# 

However, no host shows an offline status:

[ceph: root@cali001 /]# ceph orch host ls
HOST     ADDR         LABELS                        STATUS  
cali001  10.8.130.1   _admin,osd,installer,mon,mgr          
cali004  10.8.130.4   mgr,osd,mon,gw                        
cali005  10.8.130.5   osd,mon                               
cali008  10.8.130.8   osd,nvmeof-gw                         
cali010  10.8.130.10  nvmeof-gw,osd                         
5 hosts in cluster
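
The offline set in the error message contains the literal string 'label' rather than any hostname from the cluster, which suggests the upgrade's offline-host check may be picking up a placement-spec key instead of resolved hostnames. The following is a minimal, hypothetical Python sketch (not cephadm source; all function names and data shapes are illustrative) of how such a value could leak into the check:

```python
# Hypothetical sketch, not actual cephadm code: one way the literal string
# 'label' could end up in the "offline hosts" set. If the code that gathers
# the hosts to check reads a placement spec and takes the spec's KEYS instead
# of resolving the label to hostnames, the key 'label' itself is treated as a
# host; since no host named 'label' exists, it is reported as offline.

known_hosts = {"cali001", "cali004", "cali005", "cali008", "cali010"}

# Placement spec for a service deployed by label (shape is illustrative).
placement = {"label": "nvmeof-gw"}

def hosts_to_check_buggy(placement):
    # Bug: iterating a dict yields its keys, so the spec key 'label'
    # is mistaken for a hostname.
    return set(placement)  # -> {'label'}

def hosts_to_check_fixed(placement, host_labels):
    # Fix: resolve the label to the hosts that actually carry it.
    return {host for host, labels in host_labels.items()
            if placement.get("label") in labels}

# Labels per host, as in the `ceph orch host ls` output above (subset).
host_labels = {
    "cali008": {"osd", "nvmeof-gw"},
    "cali010": {"nvmeof-gw", "osd"},
}

print(hosts_to_check_buggy(placement) - known_hosts)               # {'label'} -> spurious abort
print(hosts_to_check_fixed(placement, host_labels) - known_hosts)  # set() -> no offline hosts
```

If this reading is correct, resolving the placement label to its member hosts before the offline check (as in `hosts_to_check_fixed`) would avoid the spurious abort.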

Version-Release number of selected component (if applicable):
[ceph: root@cali001 /]# ceph version 
ceph version 18.2.0-81.el9cp (1ad353f2afde2c897f3d381ae98a362298907a2d) reef (stable)


How reproducible:


Steps to Reproduce:
1. Deploy a cluster with 18.2.0-81 with all services.
2. Deploy nvmeof services and run I/Os.
3. Upgrade to the next version, 18.2.0-99, and observe the issue described above.

Actual results:


Expected results:
Upgrade should start and succeed.
 
Additional info: (creds: root/passwd)