Bug 1666822
Summary: ceph-volume does not always populate dictionary key rotational

Product: [Red Hat Storage] Red Hat Ceph Storage
Component: Ceph-Volume
Version: 3.2
Target Release: 3.3
Target Milestone: rc
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Hardware: Unspecified
OS: Unspecified
Reporter: John Fulton <johfulto>
Assignee: Andrew Schoen <aschoen>
QA Contact: Eliad Cohen <elicohen>
Docs Contact: Bara Ancincova <bancinco>
CC: adakopou, akaris, anharris, arkady_kanevsky, aschoen, assingh, ceph-eng-bugs, ceph-qe-bugs, cschwede, dhill, elicohen, flucifre, gabrioux, gael_rehault, gfidente, gmeno, gsitlani, jmelvin, kholtz, kplantjr, mmuench, nmorell, pgrist, schhabdi, sisadoun, ssigwald, tchandra, tserlin, vashastr
Flags: kholtz: needinfo-
Fixed In Version: RHEL: ceph-12.2.12-26.el7cp; Ubuntu: ceph_12.2.12-22redhat1xenial
Doc Type: Bug Fix
Doc Text:
.`ceph-volume` can determine if a device is rotational or not even if the device is not in the `/sys/block/` directory
If the device name did not exist in the `/sys/block/` directory, the `ceph-volume` utility could not determine whether a device was rotational. This was the case, for example, for loopback devices or devices listed in the `/dev/disk/by-path/` directory. Consequently, the `lvm batch` subcommand failed. With this update, `ceph-volume` uses the `lsblk` command to determine whether a device is rotational if no information is found in `/sys/block/` for the given device. As a result, `lvm batch` works as expected in this case.
Last Closed: 2019-08-21 15:10:24 UTC
Type: Bug
Bug Blocks: 1578730, 1726135
Description (John Fulton, 2019-01-16 16:39:32 UTC)
I understand the need for using loopback devices, but these aren't supported for ceph-volume and I don't foresee adding that as a feature. However, there are a couple of things here that should be noted:

1) This is still a bug: ceph-volume is trusting that device objects will always have the "rotational" flag. The device would still be rejected, but with an error message (vs. a traceback like today).

2) It is possible to get ceph-volume to work with loop devices and save resources; this is how ceph-volume is able to test rotational+NVMe devices, for example. In short, it:

- finds an available loop device
- creates a sparse file
- attaches the sparse file onto the loop device
- tells NVMe to make a target out of it

The last portion sets everything right, with the kernel recognizing the loop device as a new NVMe device. The playbook is at:
https://github.com/ceph/ceph/blob/master/src/ceph-volume/ceph_volume/tests/functional/batch/playbooks/setup_mixed_type.yml

(In reply to Alfredo Deza from comment #1)
> I understand the need for using loop back devices, but these aren't
> supported for ceph-volume and I don't foresee adding that as a feature.
>
> 1) this is still a bug, where ceph-volume is trusting that device objects
> will always have the "rotational" flag, the device would still be rejected
> but with an error message (vs. a traceback like today)

OK, I'm fine with you using this bug to solve the above issue if that's what you'd like to do.

> 2) it is possible to get ceph-volume to work with loop devices and save
> resources, this is how ceph-volume is able to test rotational+NVMe devices
> for example. [...]

Thanks, that's a nice trick to simulate having NVMe devices so that I could continue to use 'ceph-volume batch' on loopback devices.

For TripleO CI I found another way to use loopback devices without using the deprecated [1] collocated or non-collocated osd_scenarios, which is simply to not use 'ceph-volume batch' mode. In this case I just pass the info about a pre-created LVM, and when I do that it doesn't hit this issue:

    sudo dd if=/dev/zero of=/var/lib/ceph-osd.img bs=1 count=0 seek=7G
    sudo losetup /dev/loop3 /var/lib/ceph-osd.img
    sudo pvcreate /dev/loop3
    sudo vgcreate vg2 /dev/loop3
    sudo lvcreate -n data-lv2 -l 597 vg2
    sudo lvcreate -n db-lv2 -l 597 vg2
    sudo lvcreate -n wal-lv2 -l 597 vg2

and then in my THT pass:

    parameter_defaults:
      CephAnsibleDisksConfig:
        osd_scenario: lvm
        osd_objectstore: bluestore
        lvm_volumes:
          - data: data-lv2
            data_vg: vg2
            db: db-lv2
            db_vg: vg2
            wal: wal-lv2
            wal_vg: vg2

It worked on my testing VM with a loopback, so I'll try having TripleO CI create the LVM structure before running ceph-ansible.

[1] https://github.com/ceph/ceph-ansible/blob/master/docs/source/osds/scenarios.rst#collocated

I have received reports of people hitting this issue even when they are not using loopback devices, so I have updated the bug title. I have asked them to update this bug with their lsblk output. The issue is more serious if people are hitting it with real disks.
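For reference, the behavior the Doc Text describes for the fix (preferring the sysfs flag and falling back to lsblk when a device has no entry under /sys/block/) can be approximated with a few shell commands. This is a minimal illustrative sketch, not ceph-volume's actual implementation, and the device path used is a hypothetical example:

    # Illustrative only: approximate the rotational check described in the Doc Text.
    # Prefer the sysfs flag; fall back to lsblk when the device has no /sys/block
    # entry (e.g. /dev/disk/by-path names or other non-whole-disk paths).
    dev=/dev/loop3                               # hypothetical example device
    name=$(basename "$(readlink -f "$dev")")     # resolve by-path style symlinks
    if [ -r "/sys/block/$name/queue/rotational" ]; then
        rotational=$(cat "/sys/block/$name/queue/rotational")
    else
        rotational=$(lsblk -d -n -o ROTA "$dev" | tr -d ' ')
    fi
    echo "$dev rotational=$rotational"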
I am seeing the same behavior with 24 real disks, 20 spinning and 4 solid state. Just for the record, I am using docker instead of podman. lsblk is able to determine whether or not the disks are rotating:

    [heat-admin@overcloud-cephstorage-0 ~]$ lsblk -d -o ROTA $(for i in {a..x}; do echo -n "/dev/sd$i "; done)
    ROTA
       0
       0
       0
       0
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1
       1

(In reply to Jeremy from comment #15)
> The mentioned workaround https://access.redhat.com/solutions/3954161 says
> "If you don't want to use the ceph-volume batch feature and have direct
> control of what disk gets picked for what, then you may create LVM volumes
> directly on the devices with an OSPd preboot script". Could we get that
> script or some directions to give our customers on how to do that?

You can have director run any script on first boot as described here:
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/advanced_overcloud_customization/chap-configuration_hooks#sect-Customizing_Configuration_on_First_Boot

Instead of having the embedded bash script in the example above echo a line into /etc/resolv.conf, you could have it create LVMs with the lvcreate command, something like:

    config: |
      #!/bin/bash
      pvcreate {{ ceph_loop_device }}
      vgcreate {{ ceph_logical_volume_group }} {{ ceph_loop_device }}
      lvcreate -n {{ ceph_logical_volume_wal }} -l 375 {{ ceph_logical_volume_group }}
      lvcreate -n {{ ceph_logical_volume_db }} -l 375 {{ ceph_logical_volume_group }}
      lvcreate -n {{ ceph_logical_volume_data }} -l 1041 {{ ceph_logical_volume_group }}
      lvs

Naturally you'll need to change the sizes and the LVM names based on what you choose. So for this example:

    parameter_defaults:
      CephAnsibleDisksConfig:
        osd_objectstore: bluestore
        osd_scenario: lvm
        lvm_volumes:
          - data: ceph_lv_data
            data_vg: ceph_vg
            db: ceph_lv_db
            db_vg: ceph_vg
            wal: ceph_lv_wal
            wal_vg: ceph_vg

we could set:

    {{ ceph_logical_volume_group }}  to ceph_vg
    {{ ceph_logical_volume_wal }}    to ceph_lv_wal
    {{ ceph_logical_volume_data }}   to ceph_lv_data
    {{ ceph_logical_volume_db }}     to ceph_lv_db

That's for ONE PV, which would be {{ ceph_loop_device }}. If the devices list is longer, the above would need to be expanded.

Created attachment 1549277 [details]
Example workaround for OSPd
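As a rough sketch of the "expanded" multi-device case mentioned above (not the contents of the attachment), the same first-boot idea could loop over a device list. The device names, volume group naming scheme, and extent counts here are hypothetical placeholders and would need to match the lvm_volumes entries passed to ceph-ansible:

    #!/bin/bash
    # Hypothetical expansion of the single-PV first-boot snippet to several devices.
    # Device list, VG naming, and LV sizes are placeholders; adjust as needed.
    for dev in /dev/sdb /dev/sdc /dev/sdd; do
        vg="ceph_vg_$(basename "$dev")"
        pvcreate "$dev"
        vgcreate "$vg" "$dev"
        lvcreate -n ceph_lv_wal  -l 375  "$vg"
        lvcreate -n ceph_lv_db   -l 375  "$vg"
        lvcreate -n ceph_lv_data -l 1041 "$vg"
    done
    lvs

Each device then gets its own data/db/wal entry in lvm_volumes, referencing the per-device volume group.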
(In reply to Siggy Sigwald from comment #46)
> A message from our customer on the support case:
>
> Looking at the BZ, it looks like it is targeted for Ceph 3.3. We are
> unfortunately not able to wait for that release due to date constraints for
> our own release. Would you be able to provide us with a way to patch this
> fix into the existing 3.2? We are not entirely sure where the fix needs to
> go, the ceph container image or the overcloud image, but having this would
> be much easier for us to implement than the previously proposed workaround
> where we need to "manually" create the LVMs.
>
> Please advise.
> Thanks.

Siggy,

Unfortunately there is just too much change between 3.2 and 3.3, and it is not possible to deliver a simple patch here. It requires many of the changes present in 3.3 to implement a fix.

Thanks,
Andrew

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2538

*** Bug 1674022 has been marked as a duplicate of this bug. ***

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.