Bug 1875777 - Filestore to Bluestore migration skipped if osd_objectstore is not set to "filestore"
Summary: Filestore to Bluestore migration skipped if osd_objectstore is not set to "filestore"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 4.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.2z1
Assignee: Guillaume Abrioux
QA Contact: Ameena Suhani S H
Docs Contact: Karen Norteman
URL:
Whiteboard:
Duplicates: 1902153
Depends On:
Blocks: 1760354 1880316 1890121
 
Reported: 2020-09-04 11:01 UTC by Francesco Pantano
Modified: 2024-03-25 16:25 UTC
CC List: 26 users

Fixed In Version: ceph-ansible-4.0.44-1.el8cp, ceph-ansible-4.0.44-1.el7cp
Doc Type: Bug Fix
Doc Text:
.The FileStore to BlueStore migration process can fail for OSD nodes that have a mix of FileStore OSDs and BlueStore OSDs

Previously, if deployments running {storage-product} versions earlier than 3.2 never had `osd_objectstore` explicitly set in either `group_vars`, `host_vars`, or `inventory`, the deployment had FileStore OSDs. FileStore was the default prior to {storage-product} 3.2. After upgrading the deployed storage cluster to {storage-product} 3.2, new OSDs added to an existing OSD node would use the BlueStore backend because it became the new default. This resulted in a mix of FileStore and BlueStore OSDs on the same node. In some specific cases, a FileStore OSD might share a journal or DB device with a BlueStore OSD. In such cases, redeploying all the OSDs causes `ceph-volume` errors, either because partitions cannot be passed in `lvm batch` or because of the GPT header.

With this release, there are two options for migrating OSDs with a mix of FileStore and BlueStore configurations:

* Set the extra variable `force_filestore_to_bluestore` to `true` when running the `filestore-to-bluestore.yml` playbook. This setting forces the playbook to automatically migrate all OSDs, even those that already use BlueStore.

* Run the `filestore-to-bluestore.yml` playbook without setting `force_filestore_to_bluestore` (the default is `false`). This causes the playbook to automatically skip the migration on nodes where there is a mix of FileStore and BlueStore OSDs. It will migrate the nodes that have only FileStore OSDs. At the end of the playbook execution, a report displays to show which nodes were skipped.

Before upgrading from {storage-product} 3 to 4, manually examine each node that has been skipped in order to determine the best method for migrating the OSDs.
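
For reference, a rough sketch of the two playbook invocations described above, run from a ceph-ansible checkout. Only the playbook path and the `force_filestore_to_bluestore` extra variable come from the text; the inventory file name `hosts` is an assumption, so substitute your own inventory:

    # Option 1: force the migration of every OSD, including those already on BlueStore
    # ("hosts" is an assumed inventory name; adjust to your environment)
    ansible-playbook -i hosts infrastructure-playbooks/filestore-to-bluestore.yml \
        -e force_filestore_to_bluestore=true

    # Option 2: default behaviour (force_filestore_to_bluestore is false); nodes that
    # mix FileStore and BlueStore OSDs are skipped and reported at the end of the run
    ansible-playbook -i hosts infrastructure-playbooks/filestore-to-bluestore.yml
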
Clone Of:
Environment:
Last Closed: 2021-04-28 20:12:31 UTC
Embargoed:


Attachments:


Links
Github ceph/ceph-ansible pull 6143 (closed): [skip ci] fs2bs: skip migration when a mix of fs and bs is detected (last updated 2021-02-15 20:22:44 UTC)
Red Hat Product Errata RHSA-2021:1452 (last updated 2021-04-28 20:13:08 UTC)

Description Francesco Pantano 2020-09-04 11:01:07 UTC
Description of problem:

When Ceph is upgraded to Ceph 4, the Filestore to Bluestore playbook should be triggered to migrate all the OSDs previously deployed using filestore.
Looking at [1], if osd_objectstore: "filestore" wasn't explicitly set, all the tasks are skipped and the migration never happens.
There are use cases where the cluster was deployed using the default values (with Ceph 3); those defaults changed over time, so relying on the variable does not work for all use cases.

Instead of relying on the osd_objectstore parameter, wouldn't it be better to inspect the OSD metadata to determine whether each OSD should be migrated to bluestore?

[1] https://github.com/ceph/ceph-ansible/blob/master/infrastructure-playbooks/filestore-to-bluestore.yml 
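
A quick way to check which backend each OSD actually reports, independent of the osd_objectstore setting in the inventory, is the metadata exposed by the OSD daemons. A rough sketch, assuming the commands are run on a node with an admin keyring:

    # Dump the metadata of all OSDs and show the objectstore backend per OSD id
    ceph osd metadata | grep -E '"id"|"osd_objectstore"'

    # Or query a single OSD, for example osd.0
    ceph osd metadata 0 | grep osd_objectstore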

Comment 11 Francesco Pantano 2020-12-02 09:02:09 UTC
*** Bug 1902153 has been marked as a duplicate of this bug. ***

Comment 13 John Fulton 2021-01-04 14:44:52 UTC
*** Bug 1911669 has been marked as a duplicate of this bug. ***

Comment 32 John Fulton 2021-01-14 13:41:14 UTC
1911669 will be solved by 1875777 so it qualifies as a duplicate.
By way of 1911669 the release notes in 1733577 were called into question.
However, provided you have the patch from 1886175, then the release note is accurate.
I suppose in theory we could have also closed 1911669 as a duplicate of 1886175 (and made this ceph-ansible bug less noisy, sorry guits)

Comment 33 Ravi Singh 2021-01-20 06:29:37 UTC
(In reply to John Fulton from comment #32)
> 1911669 will be solved by 1875777 so it qualifies as a duplicate.
> By way of 1911669 the release notes in 1733577 were called into question.
> However, provided you have the patch from 1886175, then the release note is
> accurate.
> I suppose in theory we could have also closed 1911669 as a duplicate of
> 1886175 (and made this ceph-ansible bug less noisy, sorry guits)

Indeed, as we agreed, 1911669 is not a duplicate of this bug, so we will be discussing this again on 1911669.

Comment 34 Giulio Fidente 2021-01-21 09:34:16 UTC
(In reply to Ravi Singh from comment #33)
> Indeed, as we agreed, 1911669 is not a duplicate of this bug, so we will be
> discussing this again on 1911669.

To clarify the situation: two changes are needed in TripleO, tracked by [1] and [2], and these will both ship with the z4 update.

To complete the migration successfully, a fix for ceph-ansible is also needed, tracked by [3].

This bug is meant to solve a problem which *does not* block migration, but which makes it impossible to *restart* the automated process in case of failures and also makes the migration process longer, since the Heat stack has to be updated twice.

1. https://bugzilla.redhat.com/show_bug.cgi?id=1886175
2. https://bugzilla.redhat.com/show_bug.cgi?id=1895756
3. https://bugzilla.redhat.com/show_bug.cgi?id=1918327

Comment 35 Giulio Fidente 2021-01-21 16:21:05 UTC
(In reply to Giulio Fidente from comment #34)
> (In reply to Ravi Singh from comment #33)
> > Indeed, as we agreed, 1911669 is not a duplicate of this bug, so we will be
> > discussing this again on 1911669.
> 
> To clarify the situation: two changes are needed in TripleO, tracked by [1]
> and [2], and these will both ship with the z4 update.
> 
> To complete the migration successfully, a fix for ceph-ansible is also
> needed, tracked by [3].
> 
> This bug is meant to solve a problem which *does not* block migration, but
> which makes it impossible to *restart* the automated process in case of
> failures and also makes the migration process longer, since the Heat stack
> has to be updated twice.
> 
> 1. https://bugzilla.redhat.com/show_bug.cgi?id=1886175
> 2. https://bugzilla.redhat.com/show_bug.cgi?id=1895756
> 3. https://bugzilla.redhat.com/show_bug.cgi?id=1918327

As discussed with Francesco, Guillaume, and Dimitri, if we can fix BZ#1875777 in z1, then we don't need the fix for BZ#1918327; this would be our preferred approach.

Comment 44 Ameena Suhani S H 2021-02-16 04:35:24 UTC
Verified using
ceph-ansible-4.0.46-1.el7cp.noarch
ceph-base-14.2.11-121.el7cp.x86_64

Comment 50 errata-xmlrpc 2021-04-28 20:12:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage security, bug fix, and enhancement Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1452

