When Ceph runs on NVMe SSD OSDs, it needs multiple OSDs per NVMe SSD device to fully utilize the device, as stated in the "NVMe SSD partitioning" section of the Ceph documentation. However, ceph-ansible's normal osd_scenario values "collocated" and "non-collocated" do not support this at present: they expect "devices" to point to an entire block device, not a partition of a device. This is a downstream Bugzilla requesting for the product what upstream GitHub issue 2126 requests.
Here is a performance example of why this is important. Note that performance with 2 OSDs per NVMe device is almost double that with 1 OSD per NVMe device.
Note that I used ceph-volume, upstream ceph-ansible and Luminous 12.2.2 to achieve this result, not RHCS 3.0.
Technically, ceph-volume does this already, provided that the logical volumes are created beforehand and then specified for ceph-ansible to consume.
Commenting again, this is plain impossible with ceph-disk, and ceph-volume is fully capable of handling it, given the LVs are made. Can we close this? Or can we get clarification on what else is needed here?
I second what Alfredo is saying, if this is something supported out of the box by ceph-volume then there is no need for an RFE and this should be closed. We just need to make sure osp/ooo adds the support for ceph-volume.
I think it is fine to require ceph-volume for NVME support.
This has documentation impact and an OSP impact, so not closing the bug (but someone else should feel free to slice and dice into two if that helps).
Federico, there is still a slight functionality gap. ceph-volume supports LVM and therefore ceph-ansible would be fine, except that nothing will construct the LVM volumes that are fed to ceph-ansible at present. Not hard to do, but where does this happen automatically during OOO deployment?
What if ceph-ansible had a new feature which took parameters like this:
and then made "osd_lvm_count" LVs on each PV and then used those LVs as if they had originally been passed under the "devices" list?
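To illustrate the proposal, here is a minimal Python sketch of the expansion step only. osd_lvm_count is the parameter name used in this comment; the function name expand_devices, the VG/LV naming scheme, and the output shape (entries with "data" and "data_vg" keys, in the style of ceph-ansible's lvm_volumes list) are assumptions made for illustration. This is not existing ceph-ansible code.

```python
def expand_devices(devices, osd_lvm_count):
    """Return osd_lvm_count LV entries per device (hypothetical naming)."""
    if osd_lvm_count < 1:
        raise ValueError("osd_lvm_count must be at least 1")
    volumes = []
    for dev in devices:
        base = dev.rsplit("/", 1)[-1]   # "/dev/nvme0n1" -> "nvme0n1"
        vg = f"ceph-{base}"             # assumed: one VG per PV
        for i in range(osd_lvm_count):
            # Each entry stands for one LV that would back one OSD.
            volumes.append({"data": f"osd-{base}-{i}", "data_vg": vg})
    return volumes


if __name__ == "__main__":
    for entry in expand_devices(["/dev/nvme0n1", "/dev/nvme1n1"], 2):
        print(entry)
```

The resulting list could then be consumed the same way a hand-written list of LVs would be, which is the behaviour the comment asks for.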
1. update docs so user manually does what is in comment #12
2. if some future version of ceph-ansible gets the feature from comment #12, then update docs to use feature instead
We might be able to provide a preboot script along with the docs update which would set up the LVs during step1 (we used to use a script like that to clean the disks during deployment -- ironic cleans them now).
What do you think of comment #12 Seb?
(In reply to leseb from comment #9)
> I second what Alfredo is saying, if this is something supported out of the
> box by ceph-volume then there is no need for an RFE and this should be
> closed. We just need to make sure osp/ooo adds the support for ceph-volume.
Yes, osp/ooo can ship the appropriate ceph versions which include ceph-volume. However, as per Ben's comment #11, shouldn't ceph-ansible set up the LVs?
Suppose OSP were not in the picture and you had customers deploying Ceph in a new environment. Are you going to require as a prerequisite that they have the logical volumes already created on their systems? If so, they might configure those LVs inconsistently across their systems or not configure them optimally (e.g. too many LVs per PV). These are the types of problems which lead to the need for deployment tools which ensure it's done correctly in every deployment. For this reason, I am asking if what's in comment #12, or something like it, could be a future feature of ceph-ansible.
Work for this is in-progress and will be solved by the introduction of choose_disk + the ability to create required PV/VG/LV.
The subject of the PR has changed a bit, since we are now talking more about a set of pre-tasks that will create the PV/VG/LV.
Harish, it's in 3.*. Would you please say where you are proposing I put it?
3.2, if it's going to be fixed there.
I don't think this is the case anymore. I've validated ceph-disk can in fact support non-collocated scenario w/NVMEs. Update the osds.yml to reflect the following:
New partitions on nvme0n1 will be added automatically in line with journal_size configured.
Outcome of triage is that this is a difficult objective for 3.0Z5.
*** Bug 1588085 has been marked as a duplicate of this bug. ***
(In reply to Randy Martinez from comment #23)
> I don't think this is the case anymore. I've validated ceph-disk can in fact
> support non-collocated scenario w/NVMEs. Update the osds.yml to reflect the
> following:
> osd_scenario: non-collocated
> - /dev/sdb
> - /dev/sdc
> - /dev/sdd
> - /dev/nvme0n1
> - /dev/nvme0n1
> - /dev/nvme0n1
> New partitions on nvme0n1 will be added automatically in line with
> journal_size configured.
That works, but it's for something else. That's not what this bug is about. This bug is about passing a PV like /dev/nvme0n1 and a number, e.g. 4, and having ceph-ansible create 4 LVs on that PV and then use those LVs as devices, as if I had created them myself and then passed this:
For info on why you would do this see http://tracker.ceph.com/projects/ceph/wiki/Tuning_for_All_Flash_Deployments#NVMe-SSD-partitioning
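For anyone doing this by hand in the meantime, the manual step amounts to splitting one PV into N equal LVs. A rough Python sketch that only builds the LVM command lines (it does not run them) is below. The helper name lv_split_commands and the VG/LV names are made up for illustration, and the equal split via "lvcreate -l <percent>%VG" is one possible approach, not necessarily what ceph-ansible does.

```python
def lv_split_commands(pv, count):
    """Build LVM commands that carve one PV into `count` equal LVs."""
    if not 1 <= count <= 100:
        raise ValueError("count must be between 1 and 100")
    base = pv.rsplit("/", 1)[-1]            # "/dev/nvme0n1" -> "nvme0n1"
    vg = f"ceph-{base}"                     # hypothetical VG name
    cmds = [["pvcreate", pv], ["vgcreate", vg, pv]]
    share = f"{100 // count}%VG"            # e.g. 4 LVs -> "25%VG" each
    for i in range(count):
        cmds.append(["lvcreate", "-l", share, "-n", f"osd-data-{i}", vg])
    return cmds


if __name__ == "__main__":
    # Print the commands for review; run them only after checking them.
    for cmd in lv_split_commands("/dev/nvme0n1", 4):
        print(" ".join(cmd))
```

Integer division means a count that does not divide 100 evenly leaves a little of the VG unused, which seems acceptable for a sketch.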
What other changes must happen for ceph-ansible to resolve this BZ beyond the ceph-ansible v3.2.0beta2 release?
None, but I'll let Andrew confirm since he recently added the support for "batch" in ceph-ansible. Thanks.
Thanks Bara, the only thing to mention is that this is not supported by the containerized deployment. Thanks.
I'm actually assigning this to Andrew since he did the implementation and I'll let him answer your question as well John.
I don't follow why containerized Ceph should be different, other than that it's site-docker.yml vs site.yml.
Ben, it is different because batch does not support prepare only, see: http://tracker.ceph.com/issues/36363
John, please do not forget to mention that this does not support containerized deployments. Thanks
(In reply to leseb from comment #41)
> Ben, it is different because batch does not support prepare only, see:
> http://tracker.ceph.com/issues/36363
So support for this feature in containers depends on the above issue being completed.
As far as you know, when it is completed will that be sufficient and this feature will be supported with containers?
If so will that be tracked in a different bug?
(In reply to John Fulton from comment #49)
> (In reply to leseb from comment #41)
> > Ben, it is different because batch does not support prepare only, see:
> > http://tracker.ceph.com/issues/36363
> So support for this feature in containers depends on the above issue being
> completed.
> As far as you know, when it is completed will that be sufficient and this
> feature will be supported with containers?
> If so will that be tracked in a different bug?
The 'ceph-volume lvm batch --prepare' feature is completed upstream, merged to master and currently being backported to luminous. Once it's merged to master we'll get it cherry-picked downstream.
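For reference, a small Python sketch of how a prepare-only batch invocation with several OSDs per device might be assembled. The helper name batch_command is hypothetical. The flags (--bluestore, --yes, --osds-per-device, --prepare) are written as I understand them from the ceph-volume lvm batch documentation, so please check them against "ceph-volume lvm batch --help" on the installed version before relying on this.

```python
def batch_command(devices, osds_per_device=1, prepare_only=False):
    """Assemble an argv list for 'ceph-volume lvm batch' (not executed)."""
    if not devices:
        raise ValueError("at least one device is required")
    cmd = ["ceph-volume", "lvm", "batch", "--bluestore", "--yes"]
    if osds_per_device > 1:
        # Ask batch to create several OSDs on each listed device.
        cmd += ["--osds-per-device", str(osds_per_device)]
    if prepare_only:
        # Prepare the OSDs without activating them.
        cmd.append("--prepare")
    return cmd + list(devices)


if __name__ == "__main__":
    print(" ".join(batch_command(["/dev/nvme0n1"], 2, prepare_only=True)))
```

The function only returns the argument list, so it can be inspected or logged before anything touches the disks.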
Sebastian, when are you able to start on the container support for this?
Commits from https://github.com/ceph/ceph/pull/24587 have been pushed downstream
Andrew, the patch is upstream, I just added it to the BZ.
Seb added https://github.com/ceph/ceph-ansible/pull/3269 to this BZ, so I'm resetting Fixed In Version to ceph-ansible 3.2.0rc1.
All planned testcases have been completed successfully, moving BZ to VERIFIED state.
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.