
Bug 2071035

Summary: [ceph-ansible] shrink-osd MUST NOT zap partitions/disks without confirming they are for the expected OSD
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: Ceph-Ansible
Version: 4.2
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: urgent
Priority: high
Target Milestone: ---
Target Release: 4.3z1
Reporter: Michael J. Kidd <linuxkidd>
Assignee: Teoman ONAY <tonay>
QA Contact: Manisha Saini <msaini>
CC: aschoen, ceph-eng-bugs, gabrioux, gjose, gmeno, mmuench, msaini, nthomas, tserlin, vereddy, ykaul
Fixed In Version: ceph-ansible-4.0.70.10-1.el8cp, ceph-ansible-4.0.70.10-1.el7cp
Type: Bug
Last Closed: 2022-09-22 11:21:06 UTC

Description Michael J. Kidd 2022-04-01 16:21:24 UTC
Description of problem:
- When shrink-osd.yml is run on a node whose OSDs were deployed with ceph-disk and later taken over with 'ceph-volume simple scan', the playbook blindly zaps the disks / partitions referenced in /etc/ceph/osd/*.json, even when device enumeration has since changed the mapping between physical disks and device paths.

Version-Release number of selected component (if applicable):
RHCS 4.2z4

How reproducible:
100%

Steps to Reproduce:
1. On a node with partitioned OSDs deployed by 'ceph-disk', run 'ceph-volume simple scan'
2. Simulate an enumeration change by editing any one of the /etc/ceph/osd/*.json files so that a given OSD points at the OS disk's partitions
3. Run 'shrink-osd.yml' for the OSD which had the json modified.

Actual results:
OS partition(s) / LV(s) wiped or attempted to be wiped.

Expected results:
- The ceph-ansible code MUST abort, or skip any disk that is not Ceph-related.
- The ceph-ansible code MUST skip disks where the UUID does not match the targeted OSD.
- The ceph-ansible code should use the UUID to identify the proper disk and confirm it is an OSD partition before zapping.
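
The checks requested above can be sketched roughly as follows. This is only an illustration, not the fix that shipped in ceph-ansible. It assumes the scanned OSD metadata holds per-device entries of the form {"path": ..., "uuid": ...} (for example under "data" or "block") and that the recorded uuid is the partition UUID reported by 'blkid -s PARTUUID'. The function names current_partuuid and split_zap_candidates are hypothetical.

```python
import subprocess
from typing import Callable, Dict, List, Optional, Tuple


def current_partuuid(path: str) -> Optional[str]:
    """Return the PARTUUID that blkid reports for a device path right now."""
    result = subprocess.run(
        ["blkid", "-s", "PARTUUID", "-o", "value", path],
        capture_output=True, text=True, check=False,
    )
    value = result.stdout.strip()
    return value or None


def split_zap_candidates(
    osd_meta: Dict[str, object],
    lookup: Callable[[str], Optional[str]] = current_partuuid,
) -> Tuple[List[str], List[str]]:
    """Split the device paths recorded for one OSD into (safe, skipped).

    A path is only considered safe to zap when the UUID currently found
    behind it matches the UUID recorded in the OSD metadata (assumed layout).
    """
    safe: List[str] = []
    skipped: List[str] = []
    for entry in osd_meta.values():
        # Only look at entries shaped like {"path": ..., "uuid": ...}.
        if not isinstance(entry, dict) or "path" not in entry or "uuid" not in entry:
            continue
        actual = lookup(entry["path"])
        if actual is not None and actual.lower() == str(entry["uuid"]).lower():
            safe.append(entry["path"])
        else:
            # Mismatch or unknown device: never zap, report it instead.
            skipped.append(entry["path"])
    return safe, skipped
```

A caller would load the OSD's JSON file with json.load, pass the resulting dict in, and abort or skip whenever the skipped list is non-empty.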


Additional info:
- The environment was being converted from partition-based to LVM-deployed bluestore OSDs.
- In this instance, the 'shrink-osd.yml' playbook wiped the /boot partition and an unused swap LV from the OS disk.
- In addition, since the enumeration order had changed for multiple physical disks, the underlying partitions of non-targeted OSD ids were destroyed, instead of the partitions that now correspond to the targeted OSD id.
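
To illustrate why a UUID lookup survives an enumeration change where a stored /dev/sdX path does not, here is a minimal sketch that resolves a partition UUID to whatever device node currently carries it. It assumes the usual udev layout of lowercase symlinks under /dev/disk/by-partuuid. The helper name resolve_partuuid is hypothetical and is not part of ceph-ansible or ceph-volume.

```python
import os
from typing import Optional


def resolve_partuuid(
    uuid: str, by_partuuid_dir: str = "/dev/disk/by-partuuid"
) -> Optional[str]:
    """Return the device node currently behind a partition UUID, or None.

    Assumes udev publishes one lowercase-named symlink per partition UUID
    in by_partuuid_dir; the directory is a parameter so it can be tested.
    """
    link = os.path.join(by_partuuid_dir, uuid.lower())
    if not os.path.islink(link):
        # No such partition is present right now: nothing to zap.
        return None
    return os.path.realpath(link)
```

If the resolved node differs from the path stored in the OSD's JSON file, the stored path is stale and should not be passed to a zap operation.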

Comment 14 Ameena Suhani S H 2022-07-13 14:47:22 UTC
The shrink playbook fails with the error below:

$ rpm -qa|grep ansi
ansible-2.9.27-1.el7ae.noarch
ceph-ansible-4.0.70.9-1.el7cp.noarch


$ ansible-playbook -vvvv -e ireallymeanit=yes infrastructure-playbooks/shrink-osd.yml -e osd_to_kill=0 -i hosts
ansible-playbook 2.9.27
  config file = /usr/share/ceph-ansible/ansible.cfg
  configured module search path = [u'/usr/share/ceph-ansible/library']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible-playbook
  python version = 2.7.5 (default, May 27 2022, 11:27:32) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
Using /usr/share/ceph-ansible/ansible.cfg as config file
setting up inventory plugins
host_list declined parsing /usr/share/ceph-ansible/hosts as it did not pass its verify_file() method
script declined parsing /usr/share/ceph-ansible/hosts as it did not pass its verify_file() method
auto declined parsing /usr/share/ceph-ansible/hosts as it did not pass its verify_file() method
Parsed /usr/share/ceph-ansible/hosts inventory source with ini plugin
ERROR! couldn't resolve module/action 'ceph_volume_simple_scan'. This often indicates a misspelling, missing collection, or incorrect module path.

The error appears to be in '/usr/share/ceph-ansible/infrastructure-playbooks/shrink-osd.yml': line 125, column 7, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:


    - name: refresh /etc/ceph/osd files non containerized_deployment
      ^ here

Comment 24 errata-xmlrpc 2022-09-22 11:21:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 4.3 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:6684