Bug 1569413 - Add support to shrink-osd.yml to shrink OSDs deployed with ceph-volume
Summary: Add support to shrink-osd.yml to shrink OSDs deployed with ceph-volume
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z1
Target Release: 3.3
Assignee: Guillaume Abrioux
QA Contact: Yogesh Mane
Docs Contact: Bara Ancincova
URL:
Whiteboard:
Duplicates: 1564444 1643468 1643927 (view as bug list)
Depends On: 1644847 1728710
Blocks: 1557269 1584264 1629656
 
Reported: 2018-04-19 08:59 UTC by Sébastien Han
Modified: 2019-10-22 13:29 UTC (History)
CC: 22 users

Fixed In Version: RHEL: ceph-ansible-3.2.16-1.el7cp Ubuntu: ceph-ansible_3.2.16-2redhat1
Doc Type: Known Issue
Doc Text:
.The `shrink-osd.yml` playbook currently has no support for removing OSDs created by `ceph-volume`
The `shrink-osd.yml` playbook assumes all OSDs are created by the `ceph-disk` utility. Consequently, OSDs deployed by using the `ceph-volume` utility cannot be shrunk. To work around this issue, remove OSDs deployed by using `ceph-volume` manually.
Clone Of:
Clones: 1728710 (view as bug list)
Environment:
Last Closed: 2019-10-22 13:29:00 UTC


Attachments
ceph-volume.log (68.82 KB, text/plain)
2018-10-31 14:33 UTC, Noah Watkins


Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 40664 0 None None None 2019-07-03 18:18:56 UTC
Github ceph ceph-ansible pull 3280 0 'None' closed Support OSD removal with ceph-volume 2020-09-08 20:38:20 UTC
Github ceph ceph-ansible pull 3515 0 'None' closed shrink_osd: remove volumes with c-v zap by fsid 2020-09-08 20:38:19 UTC
Github ceph ceph-ansible pull 3695 0 'None' closed shrink-osd: fix lvm zap by osd-fsid 2020-09-08 20:38:19 UTC
Github ceph ceph-ansible pull 3530 0 None None None 2020-09-08 20:38:18 UTC
Red Hat Bugzilla 1644828 0 medium CLOSED ceph-volume zap --destroy should remove LVs completly 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHBA-2019:3173 0 None None None 2019-10-22 13:29:20 UTC

Internal Links: 1644828

Description Sébastien Han 2018-04-19 08:59:42 UTC
Description of problem:

The shrink-osd.yml playbook currently has no support for removing OSDs created by ceph-volume. It assumes all OSDs were created using ceph-disk.
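
A rough sketch of the manual workaround (removing a ceph-volume OSD without the playbook), assuming osd.3 and a hypothetical backing volume <vg_name>/<lv_name>; the real names come from `ceph-volume lvm list`:

    # take the OSD out of the cluster and stop its daemon
    ceph osd out osd.3
    systemctl stop ceph-osd@3
    # remove it from the CRUSH map, auth and OSD map (Luminous and later)
    ceph osd purge 3 --yes-i-really-mean-it
    # destroy the backing logical volume
    ceph-volume lvm zap --destroy <vg_name>/<lv_name>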

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 3 Sébastien Han 2018-09-25 15:40:20 UTC
*** Bug 1564444 has been marked as a duplicate of this bug. ***

Comment 4 Sébastien Han 2018-10-26 12:51:59 UTC
*** Bug 1643468 has been marked as a duplicate of this bug. ***

Comment 5 Sébastien Han 2018-10-29 13:37:43 UTC
*** Bug 1643927 has been marked as a duplicate of this bug. ***

Comment 6 Ramakrishnan Periyasamy 2018-10-29 13:44:54 UTC
Adding this info from bz 1643927 to make sure this part is not missed for ceph-volume OSD removal support.

After purging the cluster with purge-docker-cluster.yml, the cluster was purged without any issues, but the ceph-volume OSD entries (LVM volumes) were not cleared from the bare-metal disks. The same issue is expected in shrink-osd.yml too (see the cleanup sketch after the lsblk output below).

lsblk command output:

[ubuntu@host083 ~]$ lsblk
NAME                                              MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                                                 8:0    0 931.5G  0 disk 
└─sda1                                              8:1    0 931.5G  0 part /
sdb                                                 8:16   0 931.5G  0 disk 
└─sdb1                                              8:17   0 931.5G  0 part 
  └─ceph--ee79538f--30c1--4dbb--915c--f7c31a283fdc-osd--data--dd3ebb52--41ff--4dd9--82b9--820b706aa8ca
                                                  253:1    0 931.5G  0 lvm  
sdc                                                 8:32   0 931.5G  0 disk 
└─sdc1                                              8:33   0 931.5G  0 part 
  └─ceph--2a28630c--dbd9--4532--b85f--19022326f5ac-osd--data--5e1d755b--37d2--4674--aab2--f2eee8ffc7d8
                                                  253:0    0 931.5G  0 lvm
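
A minimal manual cleanup sketch for leftovers like these, using the volume group and logical volume names visible in the lsblk output above (they differ per deployment), shown for /dev/sdb1:

    # remove the leftover ceph LV, its VG and the PV on the partition
    lvremove -f ceph-ee79538f-30c1-4dbb-915c-f7c31a283fdc/osd-data-dd3ebb52-41ff-4dd9-82b9-820b706aa8ca
    vgremove -f ceph-ee79538f-30c1-4dbb-915c-f7c31a283fdc
    pvremove /dev/sdb1
    # wipe any remaining signatures
    wipefs --all /dev/sdb1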

Comment 7 Noah Watkins 2018-10-30 23:15:34 UTC
Alfredo,

The case mentioned by Ramakrishnan looks like one that isn't handled by `ceph-volume zap` -- correct? For instance, say I want to purge osd.3:

NAME                   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                      8:0    0   50G  0 disk 
sdb                      8:16   0   50G  0 disk 
├─test_group-data--lv1 252:1    0   25G  0 lvm  /var/lib/ceph/osd/ceph-2
└─test_group-data--lv2 252:2    0 12.5G  0 lvm  /var/lib/ceph/osd/ceph-3

Then zap the storage for osd.3:

[vagrant@osd2 ~]$ sudo ceph-volume lvm zap --destroy test_group/data-lv2


But the logical volume is still present:


[vagrant@osd2 ~]$ lsblk
NAME                   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                      8:0    0   50G  0 disk
sdb                      8:16   0   50G  0 disk
├─test_group-data--lv1 252:1    0   25G  0 lvm  /var/lib/ceph/osd/ceph-2
└─test_group-data--lv2 252:2    0 12.5G  0 lvm


It would seem consistent to remove the volume, as happens for raw devices with --destroy, but I suspect this has been thought through before. Should we add an option for this? Parsing the JSON output of ceph-volume and deciding what to do with Ansible loops is fairly tedious.
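
For illustration, the kind of parsing this implies outside of ceph-volume, assuming `ceph-volume lvm list --format json` keys its output by OSD id and that jq is available (a sketch only; the JSON schema may vary between releases):

    # find the LVs backing osd.3 and zap each of them
    ceph-volume lvm list --format json | jq -r '."3"[].lv_path' |
    while read lv; do
        ceph-volume lvm zap --destroy "$lv"
    done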

Comment 8 Alfredo Deza 2018-10-30 23:27:57 UTC
`zap --destroy` should destroy the LVs, but the OSD should be stopped first. You didn't share the output/logs, but I am guessing it refused to destroy the LVs because they were in use.
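
For example, a minimal sequence, assuming a systemd-managed (non-containerized) OSD:

    # stop the OSD so its LV is no longer mounted or in use
    systemctl stop ceph-osd@3
    # then zap and destroy the backing LV
    ceph-volume lvm zap --destroy test_group/data-lv2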

Comment 9 Noah Watkins 2018-10-30 23:33:55 UTC
Here are the logs. I stopped the OSDs first. Not destroying the LVs seems to align with what I read in the ceph-volume docs. But maybe I am not invoking it correctly? BTW, I hope I am using the BZ `needsinfo` feature correctly here to grab your feedback!

[vagrant@osd2 ~]$ sudo ceph-volume lvm zap --destroy test_group/data-lv2   
                                                                                                                                        
--> Unmounting /var/lib/ceph/osd/ceph-3                                                                                                                                                                            
Running command: umount -v /var/lib/ceph/osd/ceph-3                                                                                                                                                                
 stderr: umount: /var/lib/ceph/osd/ceph-3 (/dev/mapper/test_group-data--lv2) unmounted                                                                                                                             
--> Zapping: /dev/test_group/data-lv2                                                                                                                                                                              
Running command: wipefs --all /dev/test_group/data-lv2
 stdout: /dev/test_group/data-lv2: 4 bytes were erased at offset 0x00000000 (xfs): 58 46 53 42
Running command: dd if=/dev/zero of=/dev/test_group/data-lv2 bs=1M count=10
 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.00296541 s, 3.5 GB/s
Running command: lvchange --deltag ceph.type=data /dev/test_group/data-lv2
 stdout: Logical volume test_group/data-lv2 changed.
Running command: lvchange --deltag ceph.journal_uuid=NWOSlT-Yvwq-VBn8-tuOr-ZuLW-euhL-3RORhe /dev/test_group/data-lv2                                                                                              
 stdout: Logical volume test_group/data-lv2 changed.
Running command: lvchange --deltag ceph.osd_id=3 /dev/test_group/data-lv2
 stdout: Logical volume test_group/data-lv2 changed.
Running command: lvchange --deltag ceph.cluster_fsid=81c2cf74-7df3-4a46-a2ff-a0ddef4caf3f /dev/test_group/data-lv2                                                                                                
 stdout: Logical volume test_group/data-lv2 changed.
Running command: lvchange --deltag ceph.cluster_name=ceph /dev/test_group/data-lv2
 stdout: Logical volume test_group/data-lv2 changed.
Running command: lvchange --deltag ceph.osd_fsid=062f766b-5d69-4c27-be48-e5e26eb5c6cc /dev/test_group/data-lv2                                                                                                    
 stdout: Logical volume test_group/data-lv2 changed.
Running command: lvchange --deltag ceph.encrypted=0 /dev/test_group/data-lv2
 stdout: Logical volume test_group/data-lv2 changed.
Running command: lvchange --deltag ceph.data_uuid=rFCgPl-3qAc-JoUi-2B1W-iLkB-0qIS-z6Ggvw /dev/test_group/data-lv2                                                                                                 
 stdout: Logical volume test_group/data-lv2 changed.
Running command: lvchange --deltag ceph.cephx_lockbox_secret= /dev/test_group/data-lv2
 stdout: Logical volume test_group/data-lv2 changed.
Running command: lvchange --deltag ceph.crush_device_class=None /dev/test_group/data-lv2
 stdout: Logical volume test_group/data-lv2 changed.
Running command: lvchange --deltag ceph.data_device=/dev/test_group/data-lv2 /dev/test_group/data-lv2
 stdout: Logical volume test_group/data-lv2 changed.
Running command: lvchange --deltag ceph.vdo=0 /dev/test_group/data-lv2
 stdout: Logical volume test_group/data-lv2 changed.
Running command: lvchange --deltag ceph.journal_device=/dev/journals/journal1 /dev/test_group/data-lv2
 stdout: Logical volume test_group/data-lv2 changed.
--> Zapping successful for: test_group/data-lv2

[vagrant@osd2 ~]$ lsblk
NAME                   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                      8:0    0   50G  0 disk
sdb                      8:16   0   50G  0 disk
├─test_group-data--lv1 252:1    0   25G  0 lvm  /var/lib/ceph/osd/ceph-2
└─test_group-data--lv2 252:2    0 12.5G  0 lvm

Comment 10 Ramakrishnan Periyasamy 2018-10-31 08:38:54 UTC
As QE, we request that this bz be fixed in 3.2 itself. After every purge we spend at least 45 minutes to 1 hour on re-imaging. For example, for ceph-volume and ceph-ansible tests we need to create and destroy the cluster many times, and it is painful to spend an hour re-imaging every time.

Comment 11 Alfredo Deza 2018-10-31 13:31:35 UTC
Can you include /var/log/ceph/ceph-volume.log? The terminal output doesn't tell us the whole story.

Comment 12 Noah Watkins 2018-10-31 14:33:18 UTC
Created attachment 1499429 [details]
ceph-volume.log

Comment 13 Alfredo Deza 2018-10-31 14:34:48 UTC
@noah The right thing to do here is to zap the device to get rid of the VGs and LVs.

In this case you want:

    ceph-volume lvm zap --destroy /dev/sdb

Comment 14 Noah Watkins 2018-10-31 14:43:25 UTC
@alfredo

Indeed, that does work. But that doesn't seem to handle a case like this:

[vagrant@osd2 ~]$ lsblk
NAME                   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                      8:0    0   50G  0 disk
sdb                      8:16   0   50G  0 disk
├─test_group-data--lv1 252:1    0   25G  0 lvm  /var/lib/ceph/osd/ceph-2
└─test_group-data--lv2 252:2    0 12.5G  0 lvm

where the OSD that had been using test_group/data-lv2 is removed, but osd.2 remains. That is, a removal of a single OSD rather than a full purge. The lsblk output I posted here is what ceph-ansible creates for functional testing.

So it seems like either (1) we should not support that layout, (2) ceph-volume should have an option to run `lvremove` in this case, or (3) ceph-ansible runs lvremove itself. If I'm not missing something fundamental here, solving this in (2) seems vastly simpler than doing it in ceph-ansible.
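
As a rough sketch of what option (3) would mean on the ceph-ansible side, plain LVM commands against the layout above:

    # remove the orphaned LV; keep the VG because test_group/data-lv1 (osd.2) still uses it
    lvremove -f test_group/data-lv2
    # only if the VG had no remaining LVs would it be removed as well:
    # vgremove -f test_group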

Comment 15 Noah Watkins 2018-10-31 22:32:38 UTC
This is now supported in https://github.com/ceph/ceph-ansible/pull/3280, but the issue of removing the LV/VG is pending changes in ceph-volume.

Comment 16 Noah Watkins 2018-11-15 18:48:52 UTC
Merged upstream.

Comment 19 Noah Watkins 2018-11-19 14:18:34 UTC
Hi Harish, what information do you need? I'm on PTO, but the work for this bug is upstream in ceph-ansible. Only the ability to do a full removal of logical volumes and partitions is missing, and that is handled by work being completed in ceph-volume.

Comment 20 Harish NV Rao 2018-11-19 14:33:16 UTC
(In reply to Noah Watkins from comment #19)
> Hi Harish, what information do you need? I'm on PTO, but the work for this
> bug is upstream in ceph-ansible. Only the ability to do full removal of
> logical volumes and partitions is missing, and that handled by work being
> completed in ceph-volume.

Sorry to bother you. We want to know by when this BZ will be in ON_QA state.
Should I check that with Alfredo?

Comment 22 Alfredo Deza 2018-11-19 16:15:47 UTC
The work to destroy/remove LVs is already done by BZ 1644828 and committed to ceph-3.2-rhel-7 in RHEL dist-git:

http://pkgs.devel.redhat.com/cgit/rpms/ceph/commit/?id=371594a23d4fa8cadb4b462351d5ef201d7d2ab9

There isn't any ceph-volume work pending here for this to work. I'm unsure what needs to happen on the ceph-ansible side. Sébastien might know, since Noah is out.

Comment 23 Noah Watkins 2018-11-19 20:44:51 UTC
The work done by BZ 1644828 was a blocker for this, but it looks like that is done now according to comment #22, so I removed the blocker status here. I don't think there is anything else to do on this ticket. The ceph-volume work was for making sure the VG/LV is removed, and that should now happen transparently.

Comment 26 Sébastien Han 2018-11-22 15:57:41 UTC
Why did we retarget this for rc?
This was implemented way after the official dev freeze.

Comment 30 Noah Watkins 2019-02-05 00:28:05 UTC
Ken:

The backport for this is https://github.com/ceph/ceph-ansible/pull/3530 and I just updated it to fix the previous conflicts, so it is awaiting review.

Note that the backport of https://github.com/ceph/ceph-ansible/pull/3280 into 3.2 renamed shrink-osd.yml (the ceph-disk based version) to be shrink-osd-ceph-disk.yml and shrink-osd.yml is now the ceph-volume version.
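
For reference, invoking the renamed playbooks would look roughly like this, assuming the upstream `osd_to_kill` variable and an inventory file named `hosts` (both worth verifying against the shipped playbook):

    # ceph-volume based removal (3.2 and later)
    ansible-playbook -i hosts shrink-osd.yml -e osd_to_kill=3
    # legacy ceph-disk based removal
    ansible-playbook -i hosts shrink-osd-ceph-disk.yml -e osd_to_kill=3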

Comment 39 Alfredo Deza 2019-07-03 16:29:17 UTC
There is an issue in ceph-volume preventing this bug from being fixed. In zap.py, ceph-volume incorrectly defines 'block' where it should be 'db', which causes the collection of related devices to always skip block.db.

In addition to that, further enhancement of partition zapping needs to be done to prevent the following:

Running command: /usr/sbin/wipefs --all /dev/sdz2
 stderr: wipefs: error: /dev/sdz2: probing initialization failed: No such file or directory
-->  RuntimeError: command returned non-zero exit status: 1
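
For context, the linked PRs (3515, 3695) move shrink-osd.yml to zapping by OSD FSID rather than by device path; a minimal manual equivalent, assuming this ceph-volume build already supports the --osd-fsid flag, using the FSID shown in comment 9:

    # zap and destroy everything belonging to one OSD, selected by its FSID
    ceph-volume lvm zap --destroy --osd-fsid 062f766b-5d69-4c27-be48-e5e26eb5c6cc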

Comment 51 errata-xmlrpc 2019-10-22 13:29:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3173

