Bug 2295143

Summary: Ceph upgrade fails when running FFU 16.2 to 17.1
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Itzik Brown <itbrown>
Component: Ceph-Ansible Assignee: Teoman ONAY <tonay>
Status: CLOSED ERRATA QA Contact: Manisha Saini <msaini>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 5.3 CC: alink, ceph-eng-bugs, cephqe-warriors, fpantano, gfidente, gmeno, gouthamr, jbadiapa, jcaratza, jelle.hoylaerts.ext, johfulto, jveiraca, kdreyer, ktordeur, lkuchlan, mcaldeir, msaini, rpollack, tonay, tserlin
Target Milestone: ---   
Target Release: 5.3z8   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ceph-ansible-6.0.28.17-1.el8cp Doc Type: Bug Fix
Doc Text:
.The "Update the placement of radosgw hosts" task no longer fails during upgrade
Previously, the "Update the placement of radosgw hosts" task would fail during an upgrade from Red Hat Ceph Storage 4 to Red Hat Ceph Storage 5. With this fix, the "Update the placement of radosgw hosts" task completes as expected.
Story Points: ---
Clone Of: Environment:
Last Closed: 2025-02-13 19:22:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2160009    

Description Itzik Brown 2024-07-02 08:54:37 UTC
Description of problem:
The fast forward upgrade (FFU) fails in the Ceph upgrade stage.
From ceph-upgrade-run.log:

2024-07-02 07:58:20,105 p=554280 u=root n=ansible | TASK [Update the placement of radosgw hosts] ***********************************
2024-07-02 07:58:20,105 p=554280 u=root n=ansible | Tuesday 02 July 2024  07:58:20 +0000 (0:00:00.231)       0:05:29.024 ********** 
2024-07-02 07:58:20,198 p=554280 u=root n=ansible | fatal: [controller-0 -> {{ groups[mon_group_name][0] }}]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: the inline if-expression on line 10 evaluated to false and no else section was defined.\n\nThe error appears to be in '/usr/share/ceph-ansible/infrastructure-playbooks/cephadm-adopt.yml': line 1016, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n    - name: Update the placement of radosgw hosts\n      ^ here\n"}
2024-07-02 07:58:20,289 p=554280 u=root n=ansible | fatal: [controller-1 -> {{ groups[mon_group_name][0] }}]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: the inline if-expression on line 10 evaluated to false and no else section was defined.\n\nThe error appears to be in '/usr/share/ceph-ansible/infrastructure-playbooks/cephadm-adopt.yml': line 1016, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n    - name: Update the placement of radosgw hosts\n      ^ here\n"}
2024-07-02 07:58:20,316 p=554280 u=root n=ansible | fatal: [controller-2 -> {{ groups[mon_group_name][0] }}]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: the inline if-expression on line 10 evaluated to false and no else section was defined.\n\nThe error appears to be in '/usr/share/ceph-ansible/infrastructure-playbooks/cephadm-adopt.yml': line 1016, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n    - name: Update the placement of radosgw hosts\n      ^ here\n"}
2024-07-02 07:58:20,316 p=554280 u=root n=ansible | NO MORE HOSTS LEFT 
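
The message "the inline if-expression ... evaluated to false and no else section was defined" is the error Jinja2 produces when an expression of the form `{{ value if condition }}` has a false condition and no `else` branch. Ansible treats the resulting undefined value as an error. The sketch below only illustrates that class of failure; it is not the content of cephadm-adopt.yml, and the `use_tls` variable and `--ssl` string are hypothetical names chosen for the example.

```yaml
# Hypothetical playbook, not taken from ceph-ansible.
- name: Illustrate an inline if-expression without an else branch
  hosts: localhost
  gather_facts: false
  vars:
    use_tls: false  # hypothetical variable, false on purpose
  tasks:
    # Expected to fail with the same "inline if-expression" message,
    # because the condition is false and there is no else branch.
    - name: Inline if without else
      ansible.builtin.debug:
        msg: "{{ '--ssl' if use_tls }}"

    # Adding an explicit else branch gives the expression a defined
    # value (here an empty string) when the condition is false.
    - name: Inline if with else
      ansible.builtin.debug:
        msg: "{{ '--ssl' if use_tls else '' }}"
```

As written, the play stops at the first task; removing it should let the second task run and print an empty message.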

Version-Release number of selected component (if applicable):
RHOS-16.2-RHEL-8-20240612.n.1
RHOS-17.1-RHEL-8-20240701.n.1

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 8 John Fulton 2024-07-16 13:50:57 UTC
Workaround:

Downgrade ceph-ansible to a version older than ceph-ansible-6.0.28.8-1.el8cp

And use workaround described in comment #0 of https://bugzilla.redhat.com/show_bug.cgi?id=2262133

```
extra_container_args:
- -v
- /etc/pki/ca-trust:/etc/pki/ca-trust:ro
```

Comment 19 John Fulton 2024-08-18 16:02:53 UTC
Manny, 

ceph-ansible-6.0.28.8-1.el8cp has bug 2295143, so as a workaround use the previous version to avoid hitting it.

The required version per the table (https://access.redhat.com/solutions/2045583) may be the latest, but it seems to have this bug. When we ship an update, it will become the required version, and that is what will ultimately solve this problem. Until then, the "required" version has this bug.

Comment 22 Manny 2024-08-21 00:09:12 UTC
Please see the KCS article (https://access.redhat.com/solutions/7083494) for this issue.

BR
Manny

Comment 45 errata-xmlrpc 2025-02-13 19:22:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 5.3 security and bug fix updates), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2025:1478

Comment 46 John Fulton 2025-02-18 14:43:16 UTC
(In reply to John Fulton from comment #8)
> Workaround:
> 
> Downgrade ceph-ansible to a version older than ceph-ansible-6.0.28.8-1.el8cp
> 
> And use workaround described in comment #0 of
> https://bugzilla.redhat.com/show_bug.cgi?id=2262133
> 
> ```
> extra_container_args:
> - -v
> - /etc/pki/ca-trust:/etc/pki/ca-trust:ro
> ```

It is no longer necessary to downgrade now that ceph-ansible 6.0.28.20-1, which contains a fix for this bug and others, has been released. The recommendation is the usual one: use the latest available version.