Bug 1844496

Summary: [Containerized UPGRADES] Upgrade from 4.0 to 4.1 on RHEL 8 fails due to error on set_fact ceph_osd_image_repodigest_before_pulling
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Mike Hackett <mhackett>
Component: Ceph-Ansible Assignee: Dimitri Savineau <dsavinea>
Status: CLOSED ERRATA QA Contact: Sunil Angadi <sangadi>
Severity: urgent Docs Contact:
Priority: high    
Version: 4.1 CC: aschoen, assingh, ceph-eng-bugs, dsavinea, gabrioux, gmeno, gsitlani, hmoller, hyelloji, jbrier, jelopez, lithomas, nthomas, pcfe, rmahique, sangadi, tpetr, tserlin, vashastr, vereddy, vumrao, ykaul
Target Milestone: z1 Keywords: Reopened, Upgrades
Target Release: 4.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ceph-ansible-4.0.24-1.el8cp, ceph-ansible-4.0.24-1.el7cp Doc Type: Bug Fix
Doc Text:
.Upgrading a containerized cluster from 4.0 to 4.1 on {os-product} 8.1 no longer fails
Previously, when upgrading a {storage-product} cluster from 4.0 to 4.1, the upgrade could fail with an error on `set_fact ceph_osd_image_repodigest_before_pulling`. Due to an issue with how the container image tag was updated, `ceph-ansible` could fail. In {storage-product} 4.1z1, `ceph-ansible` has been updated so that it no longer fails and upgrading works as expected.
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-07-30 15:05:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1816167    

Description Mike Hackett 2020-06-05 14:28:18 UTC
Description of problem:
When upgrading a containerized (podman) cluster from RHCS 4.0 to RHCS 4.1 on RHEL 8, the upgrade fails when it reaches the OSDs, with the following error:

TASK [ceph-container-common : set_fact ceph_osd_image_repodigest_before_pulling] *************************************************************************************************************************************************************
Friday 05 June 2020  19:37:35 +0530 (0:00:00.070)       0:08:21.175 *********** 
fatal: [ceph4node1.example.com]: FAILED! => 
  msg: |-
    The task includes an option with an undefined variable. The error was: None has no element 0
  
    The error appears to be in '/usr/share/ceph-ansible/roles/ceph-container-common/tasks/fetch_image.yml': line 137, column 3, but may
    be elsewhere in the file depending on the exact syntax problem.
  
    The offending line appears to be:
  
  
    - name: set_fact ceph_osd_image_repodigest_before_pulling
      ^ here

The monitor containers upgrade successfully; the failure occurs only when the OSD upgrade is attempted.
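
For illustration only: the message "None has no element 0" is what a template reports when it indexes element 0 of a value that is None, which suggests the task tried to take the first entry of a lookup that returned nothing. The actual task in fetch_image.yml is not shown in this report, so the following is a minimal Python sketch of a defensive version of such a lookup, not the ceph-ansible fix. The function name `first_repodigest` is hypothetical, and the input shape (a JSON list of objects with a `RepoDigests` key, modeled on image inspect output) is an assumption.

```python
import json
from typing import Any, Optional


def first_repodigest(inspect_json: str) -> Optional[str]:
    """Return the first RepoDigests entry, or None if there is none.

    Hypothetical helper: the input is assumed to be inspect-style JSON,
    i.e. a list whose first element is an object that may carry a
    "RepoDigests" list. Nothing here is taken from ceph-ansible.
    """
    data: Any = json.loads(inspect_json)
    # No image found locally: inspect-style output is an empty list.
    if not isinstance(data, list) or not data:
        return None
    first = data[0]
    if not isinstance(first, dict):
        return None
    digests = first.get("RepoDigests")
    # Key missing, null, or empty list: indexing [0] here is the
    # analogue of the "None has no element 0" failure.
    if not digests:
        return None
    return digests[0]
```

Under these assumptions, a caller would treat a None result as "no digest recorded before pulling" rather than letting the index operation fail.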


The cluster was installed on RHCS 4.0 three days ago following the Ceph documentation, when the 4-20 image was the latest. The upgrade was attempted following the documentation without deviation, and it failed at this step.

A customer has encountered this issue, as has a field resource in his home lab, so this is the third reproduction reported so far.

Comment 17 errata-xmlrpc 2020-07-20 14:21:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3003