Bug 1902153

Summary: OSD migration from filestore to bluestore is not invoked properly
Product: Red Hat OpenStack Reporter: Takashi Kajinami <tkajinam>
Component: documentation Assignee: ndeevy <ndeevy>
Status: CLOSED DUPLICATE QA Contact: RHOS Documentation Team <rhos-docs>
Severity: high Docs Contact:
Priority: unspecified    
Version: 16.1 (Train) CC: dmacpher, fpantano, ndeevy
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-12-02 09:02:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Takashi Kajinami 2020-11-27 05:53:47 UTC
Description of problem:

A customer is testing OSD migration from filestore to bluestore by following the documentation:
 https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html/framework_for_upgrades_13_to_16.1/osd-migration-from-filestore-to-bluestore

However, they observe that their OSDs are still using filestore after the command that triggers the migration completes successfully.

~~~
[stack@undercloud-0 ~]$ openstack overcloud external-upgrade run --tags ceph_fstobs -e ceph_ansible_limit=ceph-0| tee oc-fstobs.log
...
Success
~~~

~~~
[root@controller-0 ~]# podman exec -it ceph-mon-controller-0 sh -c "ceph -f json osd metadata" | jq -c '.[] | select(.hostname == "ceph-0") | ["host", .hostname, "osd_id", .id, "objectstore", .osd_objectstore]'
["host","ceph-0","osd_id",0,"objectstore","filestore"]
["host","ceph-0","osd_id",1,"objectstore","filestore"]
...
~~~
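
The same check can be scripted, which may help when comparing several hosts before and after a migration run. The following is a minimal sketch in Python, assuming the JSON printed by `ceph -f json osd metadata` has been captured as text. The helper name `osds_by_objectstore` and the sample data are illustrative only; the field names (`hostname`, `id`, `osd_objectstore`) are the ones used by the jq filter above.

```python
import json
from collections import defaultdict


def osds_by_objectstore(metadata_json, hostname=None):
    """Group OSD ids by their reported objectstore backend.

    metadata_json: text printed by `ceph -f json osd metadata`.
    hostname: if given, only OSDs on that host are considered.
    """
    grouped = defaultdict(list)
    for osd in json.loads(metadata_json):
        if hostname is not None and osd.get("hostname") != hostname:
            continue
        grouped[osd.get("osd_objectstore", "unknown")].append(osd["id"])
    return dict(grouped)


# Illustrative sample shaped like the output shown above (not real cluster data).
SAMPLE = json.dumps([
    {"id": 0, "hostname": "ceph-0", "osd_objectstore": "filestore"},
    {"id": 1, "hostname": "ceph-0", "osd_objectstore": "filestore"},
    {"id": 2, "hostname": "ceph-1", "osd_objectstore": "bluestore"},
])

if __name__ == "__main__":
    # Any ids listed under "filestore" have not been migrated yet.
    print(osds_by_objectstore(SAMPLE, hostname="ceph-0"))
```

A host can be treated as migrated only when no ids remain under the `filestore` key for it.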

Version-Release number of selected component (if applicable):


ansible-role-tripleo-modify-image-1.2.1-0.20200804085623.1dffa21.el8ost.noarch
ansible-tripleo-ipa-0.2.1-1.20200813093411.3bb3c53.el8ost.noarch
ansible-tripleo-ipsec-9.2.1-0.20200311073016.0c8693c.el8ost.noarch
openstack-tripleo-common-11.4.1-1.20200914165651.el8ost.noarch
openstack-tripleo-common-containers-11.4.1-1.20200914165651.el8ost.noarch
openstack-tripleo-heat-templates-11.3.2-1.20200914170156.el8ost.noarch
openstack-tripleo-image-elements-10.6.2-0.20200528043425.7dc0fa1.el8ost.noarch
openstack-tripleo-puppet-elements-11.2.2-0.20200701163410.432518a.el8ost.noarch
openstack-tripleo-validations-11.3.2-1.20200914170825.el8ost.noarch
puppet-tripleo-11.5.0-1.20200914161840.f716ef5.el8ost.noarch
python3-tripleoclient-12.3.2-1.20200914164928.el8ost.noarch
python3-tripleoclient-heat-installer-12.3.2-1.20200914164928.el8ost.noarch
python3-tripleo-common-11.4.1-1.20200914165651.el8ost.noarch
tripleo-ansible-0.5.1-1.20200914163925.el8ost.noarch

ceph-ansible-4.0.31-1.el8cp.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy OSP16.1 + OCS4 with filestore
2. Follow the documentation to migrate osd from filestore to bluestore

Actual results:
OSD keeps using filestore even after successful command execution

Expected results:
OSD uses bluestore after successful command execution

Additional info:

Comment 1 Takashi Kajinami 2020-11-27 05:57:27 UTC
It turned out that filestore-to-bluestore.yml skipped the migration steps:

/var/lib/mistral/4eacc9bf-622b-43cc-9301-c8c1f6e328b6/ceph-ansible/ceph_ansible_command.log
~~~
Running /var/lib/mistral/4eacc9bf-622b-43cc-9301-c8c1f6e328b6/ceph-ansible/ceph_ansible_command.sh
...
2020-11-27 11:42:08,293 p=143980 u=root n=ansible | ok: [ceph-0] => {"ansible_facts": {"current_objectstore": "bluestore"}, "changed": false}
2020-11-27 11:42:08,343 p=143980 u=root n=ansible | TASK [warn user about osd already using bluestore] *****************************
2020-11-27 11:42:08,343 p=143980 u=root n=ansible | Friday 27 November 2020  11:42:08 +0900 (0:00:00.075)       0:00:05.484 ******* 
2020-11-27 11:42:08,368 p=143980 u=root n=ansible | ok: [ceph-0] => {
    "msg": "WARNING: ceph-0 is already using bluestore. Skipping all tasks."
}
...
~~~

This is because the playbook has logic to skip the migration when osd_objectstore is "bluestore":

https://github.com/ceph/ceph-ansible/tree/stable-4.0/infrastructure-playbooks/filestore-to-bluestore.yml
~~~
- hosts: "{{ osd_group_name }}"
  become: true
  serial: 1
  vars:
    delegate_facts_host: true
  tasks:
    - name: gather and delegate facts
      setup:
      delegate_to: "{{ item }}"
      delegate_facts: True
      with_items: "{{ groups[mon_group_name] }}"
      run_once: true
      when: delegate_facts_host | bool

    - import_role:
        name: ceph-defaults

    - name: set_fact current_objectstore
      set_fact:
        current_objectstore: '{{ osd_objectstore }}'

    - name: warn user about osd already using bluestore
      debug:
        msg: 'WARNING: {{ inventory_hostname }} is already using bluestore. Skipping all tasks.'
      when: current_objectstore == 'bluestore'
~~~

Our current documentation says that we should set "osd_objectstore: bluestore" in
CephAnsibleDisksConfig, and this is causing the issue.

Even if we remove that line, bluestore seems to be the default value in ceph-ansible,
so I'm afraid that removing it alone would not solve the issue.

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/16.1/html/framework_for_upgrades_13_to_16.1/osd-migration-from-filestore-to-bluestore#migrating-OSDs-from-FileStore-to-BlueStore
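
To make the failure mode concrete, here is a small Python model of the guard in the playbook excerpt above. It is only a sketch, not ceph-ansible code: `should_skip_migration` and its `actual_objectstore` argument are hypothetical names, and the model assumes, as the log and the excerpt suggest, that the guard compares only the configured `osd_objectstore` variable.

```python
def should_skip_migration(configured_objectstore, actual_objectstore):
    """Mirror the playbook guard: `when: current_objectstore == 'bluestore'`.

    current_objectstore is set from the configured osd_objectstore variable,
    so actual_objectstore (what the OSD really runs) is never consulted.
    It is accepted here only to make that gap visible.
    """
    current_objectstore = configured_objectstore  # set_fact current_objectstore
    return current_objectstore == "bluestore"


if __name__ == "__main__":
    # Documented setting: osd_objectstore is bluestore while OSDs run filestore.
    print(should_skip_migration("bluestore", "filestore"))
    # Configured value matches the on-disk state, so the guard lets tasks run.
    print(should_skip_migration("filestore", "filestore"))
```

In this model the first call returns `True` (all tasks skipped) even though the OSD still runs filestore, which matches the "Skipping all tasks" warning in the log.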

Comment 2 Dan Macpherson 2020-11-27 12:51:27 UTC
I don't think this is a documentation BZ. This sounds like an engineering BZ.

Comment 3 Takashi Kajinami 2020-11-30 15:02:06 UTC
(In reply to Dan Macpherson from comment #2)
> I don't think this is a documentation BZ. This sounds like an engineering BZ.

One possible solution without a code change would be to set
 osd_objectstore: filestore
before running the migration command, and to reset the parameter to bluestore after the migration completes.
We can use "openstack overcloud deploy --stack-only" to change the parameter without
triggering the actual deployment steps.

However, I tend to agree with you that this is an engineering BZ, and the above parameter settings
should ideally be handled in tripleo.

Do you want me to change the assigned component to tripleo-heat-templates (or a different package
if there is a better one), or can we get some insight from the Ceph squad before moving this BZ?

Comment 4 ndeevy 2020-12-01 09:03:12 UTC
Thanks Takashi

Hi @Francesco. Could you take a look and advise how best to address this BZ please? I agree with Dan that it seems more like an engineering BZ.

Thanks :)

Comment 14 Takashi Kajinami 2020-12-04 13:19:20 UTC
Thanks Naomi,

The updated version looks good to me!