Bug 1569413 - Add support to shrink-osd.yml to shrink OSDs deployed with ceph-volume
Summary: Add support to shrink-osd.yml to shrink OSDs deployed with ceph-volume
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: z1
Target Release: 3.3
Assignee: Guillaume Abrioux
QA Contact: Yogesh Mane
Docs Contact: Bara Ancincova
URL:
Whiteboard:
Duplicates: 1564444 1643468 1643927 (view as bug list)
Depends On: 1644847 1728710
Blocks: 1557269 1584264 1629656
 
Reported: 2018-04-19 08:59 UTC by Sébastien Han
Modified: 2019-10-22 13:29 UTC (History)
CC: 22 users

Fixed In Version: RHEL: ceph-ansible-3.2.16-1.el7cp Ubuntu: ceph-ansible_3.2.16-2redhat1
Doc Type: Known Issue
Doc Text:
.The `shrink-osd.yml` playbook currently has no support for removing OSDs created by `ceph-volume`
The `shrink-osd.yml` playbook assumes all OSDs are created by the `ceph-disk` utility. Consequently, OSDs deployed by using the `ceph-volume` utility cannot be shrunk. To work around this issue, remove OSDs deployed by using `ceph-volume` manually.
Clone Of:
Clones: 1728710 (view as bug list)
Environment:
Last Closed: 2019-10-22 13:29:00 UTC


Attachments
ceph-volume.log (68.82 KB, text/plain)
2018-10-31 14:33 UTC, Noah Watkins


Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 40664 0 None None None 2019-07-03 18:18:56 UTC
Github ceph ceph-ansible pull 3280 0 'None' closed Support OSD removal with ceph-volume 2020-09-08 20:38:20 UTC
Github ceph ceph-ansible pull 3515 0 'None' closed shrink_osd: remove volumes with c-v zap by fsid 2020-09-08 20:38:19 UTC
Github ceph ceph-ansible pull 3695 0 'None' closed shrink-osd: fix lvm zap by osd-fsid 2020-09-08 20:38:19 UTC
Github ceph ceph-ansible pull 3530 0 None None None 2020-09-08 20:38:18 UTC
Red Hat Bugzilla 1644828 0 medium CLOSED ceph-volume zap --destroy should remove LVs completly 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHBA-2019:3173 0 None None None 2019-10-22 13:29:20 UTC

Internal Links: 1644828

Description Sébastien Han 2018-04-19 08:59:42 UTC
Description of problem:

The shrink-osd.yml playbook currently has no support for removing OSDs created by ceph-volume. It assumes all OSDs were created using ceph-disk.
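
A rough sketch of the manual workaround (removing a ceph-volume OSD without the playbook), assuming osd.3 and a hypothetical backing volume <vg_name>/<lv_name>; the real names come from `ceph-volume lvm list`:

    # take the OSD out of the cluster and stop its daemon
    ceph osd out osd.3
    systemctl stop ceph-osd@3
    # remove it from the CRUSH map, auth and OSD map (Luminous and later)
    ceph osd purge 3 --yes-i-really-mean-it
    # destroy the backing logical volume
    ceph-volume lvm zap --destroy <vg_name>/<lv_name>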

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 3 Sébastien Han 2018-09-25 15:40:20 UTC
*** Bug 1564444 has been marked as a duplicate of this bug. ***

Comment 4 Sébastien Han 2018-10-26 12:51:59 UTC
*** Bug 1643468 has been marked as a duplicate of this bug. ***

Comment 5 Sébastien Han 2018-10-29 13:37:43 UTC
*** Bug 1643927 has been marked as a duplicate of this bug. ***

Comment 6 Ramakrishnan Periyasamy 2018-10-29 13:44:54 UTC
Adding this info from bz 1643927 to make sure this part is not missed for ceph-volume OSD removal support.

After purging the cluster with purge-docker-cluster.yml, the cluster was purged without any issues, but the ceph-volume OSD entries (LVM volumes) were not cleared from the bare-metal disks. The same issue is expected in shrink-osd.yml too (see the cleanup sketch after the lsblk output below).

lsblk command output:

[ubuntu@host083 ~]$ lsblk
NAME                                              MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                                                 8:0    0 931.5G  0 disk 
└─sda1                                              8:1    0 931.5G  0 part /
sdb                                                 8:16   0 931.5G  0 disk 
└─sdb1                                              8:17   0 931.5G  0 part 
  └─ceph--ee79538f--30c1--4dbb--915c--f7c31a283fdc-osd--data--dd3ebb52--41ff--4dd9--82b9--820b706aa8ca
                                                  253:1    0 931.5G  0 lvm  
sdc                                                 8:32   0 931.5G  0 disk 
└─sdc1                                              8:33   0 931.5G  0 part 
  └─ceph--2a28630c--dbd9--4532--b85f--19022326f5ac-osd--data--5e1d755b--37d2--4674--aab2--f2eee8ffc7d8
                                                  253:0    0 931.5G  0 lvm
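
A minimal manual cleanup sketch for leftovers like these, using the volume group and logical volume names visible in the lsblk output above (they differ per deployment), shown for /dev/sdb1:

    # remove the leftover ceph LV, its VG and the PV on the partition
    lvremove -f ceph-ee79538f-30c1-4dbb-915c-f7c31a283fdc/osd-data-dd3ebb52-41ff-4dd9-82b9-820b706aa8ca
    vgremove -f ceph-ee79538f-30c1-4dbb-915c-f7c31a283fdc
    pvremove /dev/sdb1
    # wipe any remaining signatures
    wipefs --all /dev/sdb1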

Comment 7 Noah Watkins 2018-10-30 23:15:34 UTC
Alfredo,

The case mentioned by Ramakrishnan looks like one that isn't handled by `ceph-volume zap` -- correct? For instance, say I want to purge osd.3:

NAME                   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                      8:0    0   50G  0 disk 
sdb                      8:16   0   50G  0 disk 
├─test_group-data--lv1 252:1    0   25G  0 lvm  /var/lib/ceph/osd/ceph-2
└─test_group-data--lv2 252:2    0 12.5G  0 lvm  /var/lib/ceph/osd/ceph-3

Then zap the storage for osd.3:

[vagrant@osd2 ~]$ sudo ceph-volume lvm zap --destroy test_group/data-lv2


But the logical volume is still present:


[vagrant@osd2 ~]$ lsblk
NAME                   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                      8:0    0   50G  0 disk
sdb                      8:16   0   50G  0 disk
├─test_group-data--lv1 252:1    0   25G  0 lvm  /var/lib/ceph/osd/ceph-2
└─test_group-data--lv2 252:2    0 12.5G  0 lvm


It would seem consistent to remove the volume, as happens for raw devices with --destroy, but I suspect this has been thought through before. Should we add an option for this? Parsing the JSON output of ceph-volume and deciding what to do with Ansible loops is fairly tedious.
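
For illustration, the kind of parsing this implies outside of ceph-volume, assuming `ceph-volume lvm list --format json` keys its output by OSD id and that jq is available (a sketch only; the JSON schema may vary between releases):

    # find the LVs backing osd.3 and zap each of them
    ceph-volume lvm list --format json | jq -r '."3"[].lv_path' |
    while read lv; do
        ceph-volume lvm zap --destroy "$lv"
    done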

Comment 8 Alfredo Deza 2018-10-30 23:27:57 UTC
`zap --destroy` should destroy the LVs, but the OSD should be stopped first. You didn't share the output/logs, but I am guessing it refused to destroy the LVs because they were in use.
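
For example, a minimal sequence, assuming a systemd-managed (non-containerized) OSD:

    # stop the OSD so its LV is no longer mounted or in use
    systemctl stop ceph-osd@3
    # then zap and destroy the backing LV
    ceph-volume lvm zap --destroy test_group/data-lv2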

Comment 9 Noah Watkins 2018-10-30 23:33:55 UTC
Here are the logs. I stopped the OSDs first. Not destroying the LVs seems to align with what I read in the ceph-volume docs. But maybe I am not invoking it correctly? BTW, I hope I am using the BZ `needsinfo` feature correctly here to grab your feedback!

[vagrant@osd2 ~]$ sudo ceph-volume lvm zap --destroy test_group/data-lv2   
                                                                                                                                        
--> Unmounting /var/lib/ceph/osd/ceph-3                                                                                                                                                                            
Running command: umount -v /var/lib/ceph/osd/ceph-3                                                                                                                                                                
 stderr: umount: /var/lib/ceph/osd/ceph-3 (/dev/mapper/test_group-data--lv2) unmounted                                                                                                                             
--> Zapping: /dev/test_group/data-lv2                                                                                                                                                                              
Running command: wipefs --all /dev/test_group/data-lv2
 stdout: /dev/test_group/data-lv2: 4 bytes were erased at offset 0x00000000 (xfs): 58 46 53 42
Running command: dd if=/dev/zero of=/dev/test_group/data-lv2 bs=1M count=10
 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 0.00296541 s, 3.5 GB/s
Running command: lvchange --deltag ceph.type=data /dev/test_group/data-lv2
 stdout: Logical volume test_group/data-lv2 changed.
Running command: lvchange --deltag ceph.journal_uuid=NWOSlT-Yvwq-VBn8-tuOr-ZuLW-euhL-3RORhe /dev/test_group/data-lv2                                                                                              
 stdout: Logical volume test_group/data-lv2 changed.
Running command: lvchange --deltag ceph.osd_id=3 /dev/test_group/data-lv2
 stdout: Logical volume test_group/data-lv2 changed.
Running command: lvchange --deltag ceph.cluster_fsid=81c2cf74-7df3-4a46-a2ff-a0ddef4caf3f /dev/test_group/data-lv2                                                                                                
 stdout: Logical volume test_group/data-lv2 changed.
Running command: lvchange --deltag ceph.cluster_name=ceph /dev/test_group/data-lv2
 stdout: Logical volume test_group/data-lv2 changed.
Running command: lvchange --deltag ceph.osd_fsid=062f766b-5d69-4c27-be48-e5e26eb5c6cc /dev/test_group/data-lv2                                                                                                    
 stdout: Logical volume test_group/data-lv2 changed.
Running command: lvchange --deltag ceph.encrypted=0 /dev/test_group/data-lv2
 stdout: Logical volume test_group/data-lv2 changed.
Running command: lvchange --deltag ceph.data_uuid=rFCgPl-3qAc-JoUi-2B1W-iLkB-0qIS-z6Ggvw /dev/test_group/data-lv2                                                                                                 
 stdout: Logical volume test_group/data-lv2 changed.
Running command: lvchange --deltag ceph.cephx_lockbox_secret= /dev/test_group/data-lv2
 stdout: Logical volume test_group/data-lv2 changed.
Running command: lvchange --deltag ceph.crush_device_class=None /dev/test_group/data-lv2
 stdout: Logical volume test_group/data-lv2 changed.
Running command: lvchange --deltag ceph.data_device=/dev/test_group/data-lv2 /dev/test_group/data-lv2
 stdout: Logical volume test_group/data-lv2 changed.
Running command: lvchange --deltag ceph.vdo=0 /dev/test_group/data-lv2
 stdout: Logical volume test_group/data-lv2 changed.
Running command: lvchange --deltag ceph.journal_device=/dev/journals/journal1 /dev/test_group/data-lv2
 stdout: Logical volume test_group/data-lv2 changed.
--> Zapping successful for: test_group/data-lv2

[vagrant@osd2 ~]$ lsblk
NAME                   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                      8:0    0   50G  0 disk
sdb                      8:16   0   50G  0 disk
├─test_group-data--lv1 252:1    0   25G  0 lvm  /var/lib/ceph/osd/ceph-2
└─test_group-data--lv2 252:2    0 12.5G  0 lvm

Comment 10 Ramakrishnan Periyasamy 2018-10-31 08:38:54 UTC
As QE, we request that this bz be fixed in 3.2 itself. After every purge we spend at least 45 minutes to 1 hour on re-imaging. For example, for ceph-volume and ceph-ansible tests we need to create and destroy the cluster many times, and it is painful to spend an hour re-imaging every time.

Comment 11 Alfredo Deza 2018-10-31 13:31:35 UTC
Can you include /var/log/ceph/ceph-volume.log? The terminal output doesn't tell us the whole story.

Comment 12 Noah Watkins 2018-10-31 14:33:18 UTC
Created attachment 1499429 [details]
ceph-volume.log

Comment 13 Alfredo Deza 2018-10-31 14:34:48 UTC
@noah The right thing to do here is to zap the device to get rid of the VGs and LVs.

In this case you want:

    ceph-volume lvm zap --destroy /dev/sdb

Comment 14 Noah Watkins 2018-10-31 14:43:25 UTC
@alfredo

Indeed, that does work. But that doesn't seem to handle a case like this:

[vagrant@osd2 ~]$ lsblk
NAME                   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                      8:0    0   50G  0 disk
sdb                      8:16   0   50G  0 disk
├─test_group-data--lv1 252:1    0   25G  0 lvm  /var/lib/ceph/osd/ceph-2
└─test_group-data--lv2 252:2    0 12.5G  0 lvm

where the OSD that had been using test_group/data-lv2 is removed, but osd.2 remains. That is, a removal of a single OSD rather than a full purge. The lsblk output I posted here is what ceph-ansible creates for functional testing.

So it seems like either (1) we should not support that layout, (2) ceph-volume should have an option to run `lvremove` in this case, or (3) ceph-ansible runs lvremove itself. If I'm not missing something fundamental here, solving this in (2) seems vastly simpler than doing it in ceph-ansible.
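
As a rough sketch of what option (3) would mean on the ceph-ansible side, plain LVM commands against the layout above:

    # remove the orphaned LV; keep the VG because test_group/data-lv1 (osd.2) still uses it
    lvremove -f test_group/data-lv2
    # only if the VG had no remaining LVs would it be removed as well:
    # vgremove -f test_group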

Comment 15 Noah Watkins 2018-10-31 22:32:38 UTC
This is now supported in https://github.com/ceph/ceph-ansible/pull/3280, but the issue of removing the LV/VG is pending changes in ceph-volume.

Comment 16 Noah Watkins 2018-11-15 18:48:52 UTC
Merged upstream.

Comment 19 Noah Watkins 2018-11-19 14:18:34 UTC
Hi Harish, what information do you need? I'm on PTO, but the work for this bug is upstream in ceph-ansible. Only the ability to do a full removal of logical volumes and partitions is missing, and that is handled by work being completed in ceph-volume.

Comment 20 Harish NV Rao 2018-11-19 14:33:16 UTC
(In reply to Noah Watkins from comment #19)
> Hi Harish, what information do you need? I'm on PTO, but the work for this
> bug is upstream in ceph-ansible. Only the ability to do full removal of
> logical volumes and partitions is missing, and that handled by work being
> completed in ceph-volume.

Sorry to bother you. We want to know by when this BZ will be in ON_QA state.
Should I check that with Alfredo?

Comment 22 Alfredo Deza 2018-11-19 16:15:47 UTC
The work to destroy/remove LVs is already done by BZ 1644828 and committed to ceph-3.2-rhel-7 in RHEL dist-git:

http://pkgs.devel.redhat.com/cgit/rpms/ceph/commit/?id=371594a23d4fa8cadb4b462351d5ef201d7d2ab9

There isn't any ceph-volume work pending here for this to work. I'm unsure what needs to happen on the ceph-ansible side. Sébastien might know, since Noah is out.

Comment 23 Noah Watkins 2018-11-19 20:44:51 UTC
The work done by BZ 1644828 was a blocker for this, but it looks like that is done now according to comment #22, so I removed the blocker status here. I don't think there is anything else to do on this ticket. The ceph-volume work was for making sure the VG/LV is removed, and that should now happen transparently.

Comment 26 Sébastien Han 2018-11-22 15:57:41 UTC
Why did we retarget this for rc?
This was implemented way after the official dev freeze.

Comment 30 Noah Watkins 2019-02-05 00:28:05 UTC
Ken:

The backport for this is https://github.com/ceph/ceph-ansible/pull/3530 and I just updated it to fix the previous conflicts, so it is awaiting review.

Note that the backport of https://github.com/ceph/ceph-ansible/pull/3280 into 3.2 renamed shrink-osd.yml (the ceph-disk based version) to be shrink-osd-ceph-disk.yml and shrink-osd.yml is now the ceph-volume version.
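
For reference, invoking the renamed playbooks would look roughly like this, assuming the upstream `osd_to_kill` variable and an inventory file named `hosts` (both worth verifying against the shipped playbook):

    # ceph-volume based removal (3.2 and later)
    ansible-playbook -i hosts shrink-osd.yml -e osd_to_kill=3
    # legacy ceph-disk based removal
    ansible-playbook -i hosts shrink-osd-ceph-disk.yml -e osd_to_kill=3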

Comment 39 Alfredo Deza 2019-07-03 16:29:17 UTC
There is an issue in ceph-volume preventing this bug from being fixed. In zap.py, ceph-volume incorrectly defines 'block' where it should be 'db', which causes the collection of related devices to always skip block.db.

In addition to that, further enhancement of partition zapping needs to be done to prevent the following:

Running command: /usr/sbin/wipefs --all /dev/sdz2
 stderr: wipefs: error: /dev/sdz2: probing initialization failed: No such file or directory
-->  RuntimeError: command returned non-zero exit status: 1
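
For context, the linked PRs (3515, 3695) move shrink-osd.yml to zapping by OSD FSID rather than by device path; a minimal manual equivalent, assuming this ceph-volume build already supports the --osd-fsid flag, using the FSID shown in comment 9:

    # zap and destroy everything belonging to one OSD, selected by its FSID
    ceph-volume lvm zap --destroy --osd-fsid 062f766b-5d69-4c27-be48-e5e26eb5c6cc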

Comment 51 errata-xmlrpc 2019-10-22 13:29:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3173

