Bug 2073480

Summary: [ceph-ansible] cephadm-adopt playbook fails on TASK [install cephadm] on dedicated OSD nodes during upgrade
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Gaurav Sitlani <gsitlani>
Component: Ceph-Ansible Assignee: Guillaume Abrioux <gabrioux>
Status: CLOSED ERRATA QA Contact: Sayalee <saraut>
Severity: medium Docs Contact: Akash Raj <akraj>
Priority: unspecified    
Version: 5.0CC: akraj, aschoen, ceph-eng-bugs, gabrioux, gmeno, nthomas, saraut, sourdas, tserlin, vereddy, vumrao, ykaul
Target Milestone: ---   
Target Release: 5.1z2   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: ceph-ansible-6.0.25.7-1.el8cp Doc Type: Bug Fix
Doc Text:
.Adoption playbook can now install `cephadm` on OSD nodes
Previously, due to the tools repository being disabled on OSD nodes, you could not install `cephadm` on OSD nodes, resulting in the failure of the adoption playbook. With this fix, the tools repository is enabled on OSD nodes and the adoption playbook can now install `cephadm` on OSD nodes.
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-06-30 20:54:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2099589    

Description Gaurav Sitlani 2022-04-08 15:07:56 UTC
Description of problem:

The following failure is observed on dedicated OSD nodes while upgrading RHCS to the 5.0z4 release:

2022-04-05 12:09:54,181 p=3484089 u=root n=ansible | TASK [install cephadm] **************************************************************************************************************************************************************************************************************************
2022-04-05 12:09:54,181 p=3484089 u=root n=ansible | Tuesday 05 April 2022  12:09:54 +0000 (0:00:13.865)       0:01:11.768 ********* 
2022-04-05 12:09:59,761 p=3484089 u=root n=ansible | changed: [m1.test.com]
2022-04-05 12:10:00,239 p=3484089 u=root n=ansible | changed: [m2.test.com]
2022-04-05 12:10:00,262 p=3484089 u=root n=ansible | changed: [m3.test.com]
2022-04-05 12:10:00,469 p=3484089 u=root n=ansible | changed: [r1.test.com]
2022-04-05 12:10:00,503 p=3484089 u=root n=ansible | changed: [r2.test.com]
2022-04-05 12:10:00,514 p=3484089 u=root n=ansible | changed: [r3.test.com]
2022-04-05 12:10:00,665 p=3484089 u=root n=ansible | changed: [r4.test.com]
2022-04-05 12:10:00,750 p=3484089 u=root n=ansible | changed: [r5.test.com]
2022-04-05 12:10:38,459 p=3484089 u=root n=ansible | fatal: [o1.test.com]: FAILED! => changed=false 
  attempts: 3
  failures:
  - No package cephadm available.
  msg: Failed to install some of the specified packages
  rc: 1
  results: []
2022-04-05 12:10:38,510 p=3484089 u=root n=ansible | fatal: [o2.test.com]: FAILED! => changed=false 
  attempts: 3
  failures:
  - No package cephadm available.
  msg: Failed to install some of the specified packages
  rc: 1
  results: []
2022-04-05 12:10:39,601 p=3484089 u=root n=ansible | fatal: [o3.test.com]: FAILED! => changed=false 
  attempts: 3
  failures:
  - No package cephadm available.
  msg: Failed to install some of the specified packages
  rc: 1
  results: []
2022-04-05 12:10:39,602 p=3484089 u=root n=ansible | NO MORE HOSTS LEFT **************************************************************************************************************************************************************************************************************************
2022-04-05 12:10:39,602 p=3484089 u=root n=ansible | PLAY RECAP **********************************************************************************************************************************************************************************************************************************
2022-04-05 12:10:39,602 p=3484089 u=root n=ansible | m1.test.com : ok=14   changed=3    unreachable=0    failed=0    skipped=19   rescued=0    ignored=0   
2022-04-05 12:10:39,603 p=3484089 u=root n=ansible | m2.test.com : ok=12   changed=3    unreachable=0    failed=0    skipped=19   rescued=0    ignored=0   
2022-04-05 12:10:39,603 p=3484089 u=root n=ansible | m3.test.com : ok=12   changed=3    unreachable=0    failed=0    skipped=19   rescued=0    ignored=0   
2022-04-05 12:10:39,603 p=3484089 u=root n=ansible | r1.test.com : ok=12   changed=3    unreachable=0    failed=0    skipped=19   rescued=0    ignored=0   
2022-04-05 12:10:39,603 p=3484089 u=root n=ansible | r2.test.com : ok=12   changed=3    unreachable=0    failed=0    skipped=19   rescued=0    ignored=0   
2022-04-05 12:10:39,603 p=3484089 u=root n=ansible | r3.test.com : ok=12   changed=3    unreachable=0    failed=0    skipped=19   rescued=0    ignored=0   
2022-04-05 12:10:39,603 p=3484089 u=root n=ansible | r4.test.com : ok=12   changed=3    unreachable=0    failed=0    skipped=19   rescued=0    ignored=0   
2022-04-05 12:10:39,603 p=3484089 u=root n=ansible | r5.test.com : ok=12   changed=3    unreachable=0    failed=0    skipped=19   rescued=0    ignored=0   
2022-04-05 12:10:39,603 p=3484089 u=root n=ansible | o1.test.com : ok=10   changed=1    unreachable=0    failed=1    skipped=20   rescued=0    ignored=0   
2022-04-05 12:10:39,603 p=3484089 u=root n=ansible | o2.test.com : ok=10   changed=1    unreachable=0    failed=1    skipped=20   rescued=0    ignored=0   
2022-04-05 12:10:39,603 p=3484089 u=root n=ansible | o3.test.com : ok=10   changed=1    unreachable=0    failed=1    skipped=20   rescued=0    ignored=0   
2022-04-05 12:10:39,603 p=3484089 u=root n=ansible | localhost                  : ok=1    changed=1    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   


Version-Release number of selected component (if applicable):
ceph-ansible-6.0.20.2-1

How reproducible:
Observed in a test environment during a multi-site cluster upgrade.

Steps to Reproduce:

1. Have dedicated OSD hosts in the cluster, for example:

# cat hosts 
[mons]
m1.test.com
m2.test.com
m3.test.com

[mgrs]
m1.test.com
m2.test.com
m3.test.com

[osds]
o1.test.com
o2.test.com
o3.test.com

[rgws]
r1.test.com
r2.test.com
r3.test.com
r4.test.com

[grafana-server]
r5.test.com

2. After a successful rolling_update.yml run, execute: ansible-playbook infrastructure-playbooks/cephadm-adopt.yml -i hosts

3. The playbook fails as shown above because the tools repository is not enabled on the OSD nodes, so the cephadm package cannot be installed (see the check below).
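
To confirm the root cause before running the playbook, you can check whether the tools repository is enabled on an OSD node. This is a diagnostic sketch, assuming the node's repositories are managed with subscription-manager; the repository ID is the one named in this bug:

# subscription-manager repos --list-enabled | grep rhceph-5-tools
# dnf repolist --enabled | grep rhceph-5-tools

If neither command returns a match, the "install cephadm" task will fail on that node with "No package cephadm available".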

Actual results:
The playbook fails on the OSD nodes only, because it does not enable the tools repository on them.

Expected results:
The adoption playbook should enable the tools repository on the OSD nodes as well.

Additional info:
After the tools repository (rhceph-5-tools-for-rhel-8-x86_64-rpms) is enabled on the OSD nodes, the playbook succeeds; see the workaround below.
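
As a manual workaround until a fixed ceph-ansible build is installed, the repository can be enabled on each OSD node and the package installed by hand. This is a sketch assuming RHEL 8 nodes registered with subscription-manager:

# subscription-manager repos --enable=rhceph-5-tools-for-rhel-8-x86_64-rpms
# dnf install cephadm

The same repository change can be applied to all OSD nodes at once with an ad-hoc Ansible command from the admin node, using the [osds] group from the hosts inventory above:

# ansible osds -i hosts -b -m command -a "subscription-manager repos --enable=rhceph-5-tools-for-rhel-8-x86_64-rpms"

After that, re-running cephadm-adopt.yml should get past the "install cephadm" task.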

Comment 17 errata-xmlrpc 2022-06-30 20:54:48 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 5.1 Bug Fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:5450