Description of problem:
ceph orch upgrade was stuck and did not start in a cluster where one of the nodes had a stale mount, which caused the ceph-volume inventory listing to hang. The affected node was rebooted. The upgrade then started, but the cluster ended up in:

>>>>
health: HEALTH_ERR
Upgrade: failed due to an unexpected exception
Unexpected exception occurred during upgrade process: Failed to connect to <hostname with P>. Please make sure that the host is reachable and accepts connections using the cephadm SSH key
>>>>

The orchestrator had tried upgrading all daemons (crash and OSDs) on that node, but the cluster status did not change from HEALTH_ERR because of the reachability problem with that node. (The OSDs were down, possibly for a different reason.)

Tried:
>> ceph orch upgrade start <same-old-image>

The cluster state was refreshed and became HEALTH_OK.

Version-Release number of selected component (if applicable):
16.2.10-82.el8cp

How reproducible:
Tried once.

Steps to Reproduce:
<Explained above>

Actual results:
There is scope for improvement in how cluster upgrade failures are handled and reported. The error shown could be a stale report.

Expected results:
Cluster status should be updated based on actual progress, even with hosts offline.
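
The workaround described above can be sketched as a small shell helper. This is only an illustration and not part of the original report: the function name refresh_upgrade_state is hypothetical, and the sketch assumes the image passed in is the same one the failed upgrade was already targeting. It checks the orchestrator's view of the upgrade and the health detail before re-issuing the upgrade, so the stale error can be compared with the state afterwards.

```shell
# Hypothetical helper (not from the original report): inspect the upgrade
# state, then re-issue the upgrade with the same target image so that
# cephadm re-evaluates the cluster and refreshes the health status.
refresh_upgrade_state() {
    image="$1"
    if [ -z "$image" ]; then
        echo "usage: refresh_upgrade_state <same-old-image>" >&2
        return 1
    fi

    # Orchestrator's view of the upgrade (target image, progress, message).
    ceph orch upgrade status

    # Full text of the current health errors, including the upgrade failure.
    ceph health detail

    # Re-issue the upgrade against the same image, as the reporter did.
    ceph orch upgrade start --image "$image"
}

# Example call (assumption: replace the placeholder with the actual image):
# refresh_upgrade_state <same-old-image>
```

Running ceph health detail again after the call shows whether the HEALTH_ERR entry was stale or reflects a host that is still unreachable.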
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 6.1 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:3623