Bug 1591074 - Support NVMe based bucket index pools

| Field | Value |
|---|---|
| Product | [Red Hat Storage] Red Hat Ceph Storage |
| Component | Ceph-Ansible |
| Version | 3.1 |
| Hardware | Unspecified |
| OS | Unspecified |
| Priority | urgent |
| Severity | urgent |
| Status | CLOSED ERRATA |
| Reporter | John Harrigan <jharriga> |
| Assignee | Ali Maredia <amaredia> |
| QA Contact | Tiffany Nguyen <tunguyen> |
| Docs Contact | Aron Gunn <agunn> |
| CC | agunn, amaredia, bengland, bniver, ceph-eng-bugs, dfuller, gmeno, hnallurv, jbrier, jdurgin, mhackett, nojha, nthomas, sankarshan, seb, shan, tserlin, tunguyen, vakulkar |
| Target Milestone | rc |
| Target Release | 3.1 |
| Fixed In Version | RHEL: ceph-ansible-3.1.0-0.1.rc18.el7cp; Ubuntu: ceph-ansible_3.1.0~rc18-2redhat1 |
| Doc Type | Enhancement |
| Type | Bug |
| Last Closed | 2018-09-26 18:22:08 UTC |
| Bug Depends On | 1593868, 1602919 |
| Bug Blocks | 1581350, 1584264 |

Doc Text:

.Support NVMe based bucket index pools
Previously, configuring Ceph to optimize storage on high-speed NVMe or SATA SSDs when using the Object Gateway was a completely manual process that required complicated LVM configuration.
With this release, the `ceph-ansible` package provides two new Ansible playbooks that facilitate setting up SSD storage using LVM to optimize performance when using the Object Gateway. See the link:{object-gw-production}#using-nvme-with-lvm-optimally[Using NVMe with LVM Optimally] chapter in the {product} Object Gateway for Production Guide for more information.
Description (John Harrigan, 2018-06-14 02:22:06 UTC)
Yes, this should move to 3.2. I'm re-targeting the work. Thanks.

Created attachment 1471129 [details]: sample osds.yml file showing the osd_scenario=lvm format used in the Scale Lab.
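The attachment itself is not reproduced in this report. For orientation, here is a minimal sketch of what an `osd_scenario=lvm` osds.yml might look like for the layout described in the test plan below, written as a shell heredoc. All VG and LV names are hypothetical, and the `lvm_volumes` key layout assumes the ceph-ansible 3.x lvm scenario; the real attachment may differ.

```sh
# Sketch only: VG/LV names (journals, hdd-sdb, data-sdb, ...) are hypothetical;
# attachment 1471129 may use different names and additional variables.
cat > group_vars/osds.yml <<'EOF'
osd_scenario: lvm
osd_objectstore: filestore

lvm_volumes:
  # One entry per HDD-backed OSD: data LV on the HDD, FS journal LV on the NVMe.
  - data: data-sdb
    data_vg: hdd-sdb
    journal: journal-sdb
    journal_vg: journals        # VG created on /dev/nvme0n1
  - data: data-sdc
    data_vg: hdd-sdc
    journal: journal-sdc
    journal_vg: journals
  # ...one entry per remaining HDD (sdd, sde)...

  # Bucket index OSD: data and journal LVs both live on the NVMe device.
  - data: bucket-index-1
    data_vg: journals
    journal: journal-bucket-index-1
    journal_vg: journals
EOF
```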
For QE testing I suggest the following.

Two hardware configurations:

* 1 NVMe device and (at least) four HDDs - one bucket index
* 2 NVMe devices and (at least) four HDDs - two bucket indexes

WORKFLOW (a command-level sketch of these steps follows these comments):

1. Start with these available raw block devices:
   * /dev/nvme0n1
   * /dev/sdb, /dev/sdc, /dev/sdd, /dev/sde
2. Edit the playbook to match the configuration.
3. Run the playbook.
4. Review the LVM configuration. Ten LVs total:
   * one FSjournal LV per HDD (placed on the NVMe) - four LVs on /dev/nvme0n1
   * one data LV per HDD (placed on each HDD) - one LV per HDD
   * one FSjournal LV for the bucket index (placed on the NVMe) - one LV on /dev/nvme0n1
   * one data LV for the bucket index (placed on the NVMe) - one LV on /dev/nvme0n1
5. Edit the osds.yml file for "osd_scenario=lvm".
6. Run ceph-ansible and verify successful deployment.
7. Run purge-cluster.
8. Run the playbook for teardown.
9. Verify the LVM configuration is removed (lvdisplay, pvdisplay).

Redo for the second configuration, this time using two NVMe devices:

* /dev/nvme0n1, /dev/nvme1n1
* /dev/sdb, /dev/sdc, /dev/sdd, /dev/sde

Edit the playbook for the first NVMe device and run it, then edit the playbook for the second NVMe device and run it again. The FSjournal and bucket index LVs should be split across the two NVMe devices, and there should be two bucket index OSDs (one per NVMe device) for a total of twelve LVs. The LVM configuration should look like this:

* one FSjournal LV per HDD (placed on both NVMe devices) - two LVs on /dev/nvme0n1, two on /dev/nvme1n1
* one data LV per HDD (placed on each HDD) - one LV per HDD
* one FSjournal LV per bucket index (placed on both NVMe devices) - one LV on /dev/nvme0n1, one on /dev/nvme1n1
* one data LV per bucket index (placed on both NVMe devices) - one LV on /dev/nvme0n1, one on /dev/nvme1n1

To add to John's comment: I know that teardown *seems* like an insignificant thing, because the purpose of ceph-ansible is to set up a cluster, not tear it down. After all, Ansible playbooks are supposed to be idempotent (doing it twice is the same as doing it once). In practice, however, we find that if you don't have teardown capabilities (i.e. infrastructure-playbooks/*purge-cluster*), then if setup fails, or if the wrong configuration was established, you often have no way to undo the damage. In a CI virtualized environment this is not necessary; you just create new VMs and new virtual drives and start over. But in the bare-metal world of real hardware, you can't do that. We could all write our own scripts to re-initialize storage, but that's exactly what we're trying to avoid; we don't want every ceph-ansible user to write their own teardown automation, because it's really hard to do right. Ansible is memory-less - it has no innate way of knowing what configuration was previously used, so it cannot and does not know how to unwind a previous configuration before establishing a new one. But with a teardown script, you can purge the old configuration, then change the inputs to ceph-ansible, then run site.yml to establish the new configuration. So please have pity on us poor souls who live outside the sunny sim-world of CI ;-)

It's worth noting that the current playbook only addresses filestore-based clusters, since it creates LVs for FSjournals and bucket indexes. In RHCS 3.2 bluestore will be supported and likely the default. How will the logical volumes be created in that release? Will this playbook need to be extended to support bluestore, which requires two LVs (WAL and DB) as well as the bucket index on NVMe?
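A minimal command-level sketch of the workflow above, run from the ceph-ansible directory. Only `lv-create.yml` and purge-cluster are named in this bug; the `infrastructure-playbooks/` paths and the `lv-teardown.yml` name are assumptions, and inventory and vars-file flags are omitted.

```sh
# Steps 2-3: edit the LV-creation playbook variables for 1x NVMe + 4x HDD, then run it.
ansible-playbook infrastructure-playbooks/lv-create.yml

# Step 4: review the LVM layout - expect ten LVs for the single-NVMe configuration.
lvs -o lv_name,vg_name,devices
pvdisplay

# Steps 5-6: set osd_scenario: lvm in group_vars/osds.yml, then deploy and verify.
ansible-playbook site.yml

# Step 7: purge the cluster.
ansible-playbook infrastructure-playbooks/purge-cluster.yml

# Steps 8-9: tear down the LVs and confirm the LVM configuration is gone.
# (lv-teardown.yml is an assumed name; the bug only says "run the playbook for teardown".)
ansible-playbook infrastructure-playbooks/lv-teardown.yml
lvdisplay
pvdisplay
```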
(In reply to John Harrigan from comment #20)

> It's worth noting that the current playbook only addresses filestore-based clusters, since it creates LVs for FSjournals and bucket indexes. In RHCS 3.2 bluestore will be supported and likely the default. How will the logical volumes be created in that release? Will this playbook need to be extended to support bluestore, which requires two LVs (WAL and DB) as well as the bucket index on NVMe?

If we need changes for 3.2 + bluestore, let's open a new BZ for them. To clarify, only one LV is needed in the common case of one fast device and one slow device - if you have a DB LV, the WAL will be stored there.

(In reply to John Harrigan from comment #20)

> Will this playbook need to be extended to support bluestore, which requires two LVs (WAL and DB) as well as the bucket index on NVMe?

The plan is to have ceph-volume handle the LV creation for 3.2, so this won't need to be extended. Although, this needs a BZ.

(In reply to leseb from comment #22)

> The plan is to have ceph-volume handle the LV creation for 3.2, so this won't need to be extended. Although, this needs a BZ.

Opened new BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1619812

Verified using build 12.2.5-39.el7cp. Both scenarios from comment #10 were used to verify. Two hardware configurations:

* 1 NVMe device and (at least) four HDDs - one bucket index
* 2 NVMe devices and (at least) four HDDs - two bucket indexes

Seeing an issue of "device excluded by a filter" while running "ansible-playbook lv-create.yml". Workaround: run "wipefs -a" on all devices on the OSD nodes to remove any FS/GPT signatures (a sketch follows at the end of these comments). This needs to be addressed. Other than that, everything is working as expected. I think it's related to BZ 1619090.

These doc updates should mention that they only apply to filestore, not bluestore (Chapter 10, Using NVMe with LVM Optimally).

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2819
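As referenced in the verification comment, here is a sketch of the wipefs workaround for the "device excluded by a filter" error, using the raw devices from the test plan; run it on each OSD node and adjust the device names as needed.

```sh
# Remove stale FS/GPT signatures that trigger LVM's "device excluded by a filter"
# error, then re-run the LV-creation playbook. Device names are illustrative.
for dev in /dev/nvme0n1 /dev/sdb /dev/sdc /dev/sdd /dev/sde; do
    wipefs -a "$dev"
done
ansible-playbook infrastructure-playbooks/lv-create.yml
```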