Bug 1666822

Summary: ceph-volume does not always populate dictionary key rotational
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: John Fulton <johfulto>
Component: Ceph-Volume
Assignee: Andrew Schoen <aschoen>
Status: CLOSED ERRATA
QA Contact: Eliad Cohen <elicohen>
Severity: urgent
Docs Contact: Bara Ancincova <bancinco>
Priority: urgent
Version: 3.2
CC: adakopou, akaris, anharris, arkady_kanevsky, aschoen, assingh, ceph-eng-bugs, ceph-qe-bugs, cschwede, dhill, elicohen, flucifre, gabrioux, gael_rehault, gfidente, gmeno, gsitlani, jmelvin, kholtz, kplantjr, mmuench, nmorell, pgrist, schhabdi, sisadoun, ssigwald, tchandra, tserlin, vashastr
Target Milestone: rc
Flags: kholtz: needinfo-
Target Release: 3.3   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: RHEL: ceph-12.2.12-26.el7cp Ubuntu: ceph_12.2.12-22redhat1xenial
Doc Type: Bug Fix
Doc Text:
.`ceph-volume` can determine whether a device is rotational even if the device is not in the `/sys/block/` directory
If the device name did not exist in the `/sys/block/` directory, the `ceph-volume` utility could not determine whether a device was rotational. This was, for example, the case for loopback devices or devices listed in the `/dev/disk/by-path/` directory. Consequently, the `lvm batch` subcommand failed. With this update, `ceph-volume` uses the `lsblk` command to determine whether a device is rotational if no information is found in `/sys/block/` for the given device. As a result, `lvm batch` works as expected in this case.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-08-21 15:10:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1578730, 1726135    
Attachments: Example workaround for OSPd

Description John Fulton 2019-01-16 16:39:32 UTC
When using ceph-volume with ceph-nautilus (dev) [1] via ceph-ansible master and passing a loopback device [2] for the disk, my deployment fails because the generated dictionary does not contain the key 'rotational'. Could ceph-volume please handle this case and populate the dictionary with a 'rotational' key? I'm not sure if the value the key maps to matters for this case.

Though loopback devices shouldn't be used in production, OpenStack's TripleO CI system uses a loopback device to simulate a block device for Ceph deployment, and this issue prevents us from getting our CI working with ceph-volume (it currently works with ceph-disk) with the limited resources we have.


[1]

[root@fultonj ~]# podman exec -ti 7224e510ead3 ceph --version
ceph version 14.0.1-2605-g6b17068 (6b170687d1b8ffc393eaf9194b615758049fcc40) nautilus (dev)
[root@fultonj ~]# 

[2]
    devices:
      - /dev/loop3

as created with:

dd if=/dev/zero of=/var/lib/ceph-osd.img bs=1 count=0 seek=7G
losetup /dev/loop3 /var/lib/ceph-osd.img
sgdisk -Z /dev/loop3
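
For context, the rotational value normally comes from the device's sysfs attribute, and the fix eventually delivered for this bug (see the Doc Text above) falls back to lsblk when /sys/block/ has no entry matching the given name (for example /dev/disk/by-path/ aliases). The following is only a minimal shell sketch of those two lookups, using /dev/loop3 from this report as the example device; it is not the actual ceph-volume code path:

dev=/dev/loop3
name=$(basename "$(readlink -f "$dev")")       # resolve symlinks such as by-path names
if [ -r "/sys/block/${name}/queue/rotational" ]; then
  cat "/sys/block/${name}/queue/rotational"    # 1 = rotational, 0 = non-rotational
else
  lsblk --nodeps --noheadings -o ROTA "$dev"   # fallback the fix describes
fi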

2019-01-16 16:16:10,638 p=265329 u=root |  TASK [ceph-osd : read information about the devices] ***************************
2019-01-16 16:16:10,638 p=265329 u=root |  task path: /home/stack/ceph-ansible/roles/ceph-osd/tasks/main.yml:24
2019-01-16 16:16:10,638 p=265329 u=root |  Wednesday 16 January 2019  16:16:10 +0000 (0:00:00.049)       0:01:16.293 ***** 
2019-01-16 16:16:10,820 p=265329 u=root |  Using module file /usr/lib/python3.6/site-packages/ansible/modules/system/parted.py
2019-01-16 16:16:12,025 p=265329 u=root |  ok: [fultonj] => (item=/dev/loop3) => changed=false 
  disk:
    dev: /dev/loop3
    logical_block: 512
    model: Loopback device
    physical_block: 512
    size: 7168.0
    table: unknown
    unit: mib
  invocation:
    module_args:
      align: optimal
      device: /dev/loop3
      flags: null
      label: msdos
      name: null
      number: null
      part_end: 100%
      part_start: 0%
      part_type: primary
      state: info
      unit: MiB
  item: /dev/loop3
  partitions: []
  script: unit 'MiB' print
2019-01-16 16:16:12,058 p=265329 u=root |  TASK [ceph-osd : include check_gpt.yml] ****************************************
2019-01-16 16:16:12,058 p=265329 u=root |  task path: /home/stack/ceph-ansible/roles/ceph-osd/tasks/main.yml:31
2019-01-16 16:16:12,058 p=265329 u=root |  Wednesday 16 January 2019  16:16:12 +0000 (0:00:01.420)       0:01:17.713 ***** 
2019-01-16 16:16:12,080 p=265329 u=root |  skipping: [fultonj] => changed=false 
  skip_reason: Conditional result was False
2019-01-16 16:16:12,112 p=265329 u=root |  TASK [ceph-osd : include_tasks scenarios/collocated.yml] ***********************
2019-01-16 16:16:12,113 p=265329 u=root |  task path: /home/stack/ceph-ansible/roles/ceph-osd/tasks/main.yml:36
2019-01-16 16:16:12,113 p=265329 u=root |  Wednesday 16 January 2019  16:16:12 +0000 (0:00:00.054)       0:01:17.768 ***** 
2019-01-16 16:16:12,129 p=265329 u=root |  skipping: [fultonj] => changed=false 
  skip_reason: Conditional result was False
2019-01-16 16:16:12,162 p=265329 u=root |  TASK [ceph-osd : include_tasks scenarios/non-collocated.yml] *******************
2019-01-16 16:16:12,162 p=265329 u=root |  task path: /home/stack/ceph-ansible/roles/ceph-osd/tasks/main.yml:41
2019-01-16 16:16:12,162 p=265329 u=root |  Wednesday 16 January 2019  16:16:12 +0000 (0:00:00.049)       0:01:17.817 ***** 
2019-01-16 16:16:12,180 p=265329 u=root |  skipping: [fultonj] => changed=false 
  skip_reason: Conditional result was False
2019-01-16 16:16:12,212 p=265329 u=root |  TASK [ceph-osd : include_tasks scenarios/lvm.yml] ******************************
2019-01-16 16:16:12,212 p=265329 u=root |  task path: /home/stack/ceph-ansible/roles/ceph-osd/tasks/main.yml:47
2019-01-16 16:16:12,212 p=265329 u=root |  Wednesday 16 January 2019  16:16:12 +0000 (0:00:00.050)       0:01:17.867 ***** 
2019-01-16 16:16:12,230 p=265329 u=root |  skipping: [fultonj] => changed=false 
  skip_reason: Conditional result was False
2019-01-16 16:16:12,262 p=265329 u=root |  TASK [ceph-osd : include_tasks scenarios/lvm-batch.yml] ************************
2019-01-16 16:16:12,263 p=265329 u=root |  task path: /home/stack/ceph-ansible/roles/ceph-osd/tasks/main.yml:55
2019-01-16 16:16:12,263 p=265329 u=root |  Wednesday 16 January 2019  16:16:12 +0000 (0:00:00.050)       0:01:17.918 ***** 
2019-01-16 16:16:12,315 p=265329 u=root |  included: /home/stack/ceph-ansible/roles/ceph-osd/tasks/scenarios/lvm-batch.yml for fultonj
2019-01-16 16:16:12,359 p=265329 u=root |  TASK [ceph-osd : use ceph-volume lvm batch to create bluestore osds] ***********
2019-01-16 16:16:12,359 p=265329 u=root |  task path: /home/stack/ceph-ansible/roles/ceph-osd/tasks/scenarios/lvm-batch.yml:3
2019-01-16 16:16:12,360 p=265329 u=root |  Wednesday 16 January 2019  16:16:12 +0000 (0:00:00.096)       0:01:18.015 ***** 
2019-01-16 16:16:12,545 p=265329 u=root |  Using module file /home/stack/ceph-ansible/library/ceph_volume.py
2019-01-16 16:16:14,325 p=265329 u=root |  The full traceback is:
  File "/tmp/ansible_ceph_volume_payload_2k1mkdqh/__main__.py", line 602, in run_module
    report_result = json.loads(out)
  File "/usr/lib64/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None

2019-01-16 16:16:14,329 p=265329 u=root |  fatal: [fultonj]: FAILED! => changed=true 
  cmd:
  - podman
  - run
  - --rm
  - --privileged
  - --net=host
  - -v
  - /run/lock/lvm:/run/lock/lvm:z
  - -v
  - /var/run/udev/:/var/run/udev/:z
  - -v
  - /dev:/dev
  - -v
  - /etc/ceph:/etc/ceph:z
  - -v
  - /run/lvm/:/run/lvm/
  - -v
  - /var/lib/ceph/:/var/lib/ceph/:z
  - -v
  - /var/log/ceph/:/var/log/ceph/:z
  - --entrypoint=ceph-volume
  - docker.io/ceph/daemon:latest-master
  - --cluster
  - ceph
  - lvm
  - batch
  - --bluestore
  - --yes
  - --prepare
  - /dev/loop3
  - --report
  - --format=json
  invocation:
    module_args:
      action: batch
      batch_devices:
      - /dev/loop3
      block_db_size: '-1'
      cluster: ceph
      containerized: 'False'
      crush_device_class: ''
      data: null
      data_vg: null
      db: null
      db_vg: null
      dmcrypt: false
      journal: null
      journal_size: '5120'
      journal_vg: null
      objectstore: bluestore
      osds_per_device: 1
      report: false
      wal: null
      wal_vg: null
  msg: non-zero return code
  rc: 1
  stderr: '-->  KeyError: ''rotational'''
  stderr_lines:
  - '-->  KeyError: ''rotational'''
  stdout: ''
  stdout_lines: <omitted>
2019-01-16 16:16:14,330 p=265329 u=root |  NO MORE HOSTS LEFT *************************************************************
2019-01-16 16:16:14,330 p=265329 u=root |  PLAY RECAP *********************************************************************
2019-01-16 16:16:14,331 p=265329 u=root |  fultonj                    : ok=197  changed=5    unreachable=0    failed=1

Comment 1 Alfredo Deza 2019-01-16 18:35:45 UTC
I understand the need for using loopback devices, but these aren't supported by ceph-volume and I don't foresee adding that as a feature.

However, there are a couple of things here that should be noted:

1) This is still a bug: ceph-volume trusts that device objects will always have the "rotational" flag. Once fixed, the device would still be rejected, but with an error message (vs. a traceback like today).

2) It is possible to get ceph-volume to work with loop devices and save resources; this is how ceph-volume is able to test rotational+NVMe devices, for example. In short, the test setup:
 - finds an available loop device
 - creates a sparse file
 - attaches the sparse file onto the loop device
 - tells NVMe to make a target out of it

The last step is what gets the kernel to recognize the loop device as a new NVMe device (a rough sketch of the idea follows the playbook link below). The playbook is at:

https://github.com/ceph/ceph/blob/master/src/ceph-volume/ceph_volume/tests/functional/batch/playbooks/setup_mixed_type.yml
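
Roughly, that procedure looks like the following from the shell. This is a hedged, untested outline only, assuming the nvmet and nvme-loop kernel modules and nvme-cli are available; the subsystem name ceph-test, the image path, and the size are illustrative, and the linked setup_mixed_type.yml playbook is the authoritative version.

truncate -s 10G /var/lib/ceph-osd-nvme.img         # sparse backing file
LOOP=$(losetup --find --show /var/lib/ceph-osd-nvme.img)

modprobe nvmet                                     # kernel NVMe target
modprobe nvme-loop                                 # "loop" transport

SUBSYS=/sys/kernel/config/nvmet/subsystems/ceph-test
mkdir "$SUBSYS"
echo 1 > "$SUBSYS/attr_allow_any_host"
mkdir "$SUBSYS/namespaces/1"
echo "$LOOP" > "$SUBSYS/namespaces/1/device_path"  # back the namespace with the loop device
echo 1 > "$SUBSYS/namespaces/1/enable"

PORT=/sys/kernel/config/nvmet/ports/1
mkdir "$PORT"
echo loop > "$PORT/addr_trtype"                    # expose over the loop transport
ln -s "$SUBSYS" "$PORT/subsystems/ceph-test"

nvme connect --transport=loop --nqn=ceph-test      # device then appears as /dev/nvmeXn1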

Comment 3 John Fulton 2019-02-14 19:49:15 UTC
(In reply to Alfredo Deza from comment #1)
> I understand the need for using loop back devices, but these aren't
> supported for ceph-volume and I don't foresee adding that as a feature.
> 
> However, there are a couple of things here that should be noted:
> 
> 1) this is still a bug, where ceph-volume is trusting that device objects
> will always have the "rotational" flag, the device would still be rejected
> but with an error message (vs. a traceback like today)

OK, I'm fine with you using this bug to solve the above issue if that's what you'd like to do. 

> 2) it is possible to get ceph-volume to work with loop devices and save
> resources, this is how ceph-volume is able to test rotational+NVMe devices
> for example. In short:
>  - finds an available loop device
>  - creates a sparse file
>  - attaches the sparse file onto the loop device
>  - tells NVMe to make a target out of it
> 
> The last portion sets everything right with the kernel recognizing the loop
> device as a new NVMe device. The playbook is at:
> 
> https://github.com/ceph/ceph/blob/master/src/ceph-volume/ceph_volume/tests/
> functional/batch/playbooks/setup_mixed_type.yml

Thanks, that's a nice trick to simulate having NVMe devices so that I could continue to use 'ceph-volume batch' on loopback devices.

For TripleO CI I found another way to use loopback devices without using the deprecated [1] collocated or non-collocated osd_scenarios: simply don't use 'ceph-volume batch' mode. Instead I pass the information about a precreated LVM volume, and that path doesn't hit this issue.

sudo dd if=/dev/zero of=/var/lib/ceph-osd.img bs=1 count=0 seek=7G
sudo losetup /dev/loop3 /var/lib/ceph-osd.img
sudo pvcreate /dev/loop3
sudo vgcreate vg2 /dev/loop3
sudo lvcreate -n data-lv2 -l 597 vg2
sudo lvcreate -n db-lv2 -l 597 vg2
sudo lvcreate -n wal-lv2 -l 597 vg2

and then in my THT (TripleO Heat templates) I pass

parameter_defaults:
  CephAnsibleDisksConfig:
    osd_scenario: lvm
    osd_objectstore: bluestore
    lvm_volumes:
      - data: data-lv2
        data_vg: vg2
        db: db-lv2
        db_vg: vg2
        wal: wal-lv2
        wal_vg: vg2

It worked on my testing VM with a loopback so I'll try having TripleO CI create the LVM structure before running ceph-ansible.

[1] https://github.com/ceph/ceph-ansible/blob/master/docs/source/osds/scenarios.rst#collocated

Comment 4 John Fulton 2019-03-11 18:06:25 UTC
I have received reports of people hitting this issue even when they are not using loopback devices, so I have updated the bug title. I have asked them to update this bug with their lsblk output. The issue is more serious if people are hitting it with real disks.

Comment 5 Keith Plant 2019-03-11 18:26:01 UTC
I am seeing the same behavior with 24 real disks: 20 spinning and 4 solid state. Just for the record, I am using docker instead of podman.

lsblk is able to determine whether or not the disks are rotational:

[heat-admin@overcloud-cephstorage-0 ~]$ lsblk -d -o ROTA $(for i in {a..x}; do echo -n "/dev/sd$i "; done)
ROTA
   0
   0
   0
   0
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1
   1

Comment 17 John Fulton 2019-03-28 21:07:58 UTC
(In reply to Jeremy from comment #15)
> The mentioned workaround https://access.redhat.com/solutions/3954161 says
> "If you don't want to use the ceph-volume batch feature and have direct
> control of what disk gets picked for what, then you may create LVM volumes
> directly on the devices with an OSPd preboot script" .. Could we get that
> script or some directions to give our customers how to do that.

You can have director run any script on first boot as described here:

 https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/advanced_overcloud_customization/chap-configuration_hooks#sect-Customizing_Configuration_on_First_Boot

Instead of having the embedded bash script in the example above echo a line into /etc/resolv.conf, you could have it create LVMs with the lvcreate command, something like:

  config: |
    #!/bin/bash
    pvcreate {{ ceph_loop_device }}
    vgcreate {{ ceph_logical_volume_group }} {{ ceph_loop_device }}
    lvcreate -n {{ ceph_logical_volume_wal }} -l 375 {{ ceph_logical_volume_group }}
    lvcreate -n {{ ceph_logical_volume_db }} -l 375 {{ ceph_logical_volume_group }}
    lvcreate -n {{ ceph_logical_volume_data }} -l 1041 {{ ceph_logical_volume_group }}
    lvs

Naturally, you'll need to change the sizes and the LVM names based on what you choose. So, given this example:

parameter_defaults:
  CephAnsibleDisksConfig:
    osd_objectstore: bluestore
    osd_scenario: lvm
    lvm_volumes:
      - data: ceph_lv_data
        data_vg: ceph_vg
        db: ceph_lv_db
        db_vg: ceph_vg
        wal: ceph_lv_wal
        wal_vg: ceph_vg

We could set:

{{ ceph_logical_volume_group }} to ceph_vg
{{ ceph_logical_volume_wal }} to ceph_lv_wal
{{ ceph_logical_volume_data }} to ceph_lv_data
{{ ceph_logical_volume_db }} to ceph_lv_db

That's for ONE PV, which would be {{ ceph_loop_device }}. If the devices list is longer, the above would need to be expanded; a sketch of one way to do that follows.
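
For example, a hedged sketch of how that expansion could look. The device list and the per-device ceph_vg_N/ceph_lv_*_N names are illustrative assumptions, not taken from the attached workaround, and each device would then need a matching lvm_volumes entry in THT:

#!/bin/bash
# Assumed device list; replace with the real devices for the node.
DEVICES="/dev/sdb /dev/sdc /dev/sdd"
i=0
for dev in $DEVICES; do
  vg="ceph_vg_${i}"
  pvcreate "$dev"
  vgcreate "$vg" "$dev"
  lvcreate -n "ceph_lv_wal_${i}"  -l 375  "$vg"
  lvcreate -n "ceph_lv_db_${i}"   -l 375  "$vg"
  lvcreate -n "ceph_lv_data_${i}" -l 1041 "$vg"
  i=$((i + 1))
done
lvs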

Comment 20 John Fulton 2019-03-29 03:01:26 UTC
Created attachment 1549277 [details]
Example workaround for OSPd

Comment 48 Andrew Schoen 2019-07-18 14:34:30 UTC
(In reply to Siggy Sigwald from comment #46)
> A message from our customer on the support case:
> 
> Looking at the BZ, it looks like it s targetted for ceph 3.3 - we are
> unfortunatly not able to wait for that to release due to date constraints
> for our release.
> Would you be able to provide us with a way to patch this fix into the
> existing 3.2 , not entirely sure where the fix needs to go, ceph container
> image or the overcloud image, but having this would be much more easier for
> us to implement than the previously proposed workaround where we need to
> "manually" create the vlm's 
> thanks
> 
> Please advice.
> Thanks.

Siggy,

Unfortunately, there is just too much change between 3.2 and 3.3, and it is not possible to deliver a simple patch here. Implementing a fix requires many of the changes that are present in 3.3.

Thanks,
Andrew

Comment 60 errata-xmlrpc 2019-08-21 15:10:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2538

Comment 61 Alfredo Deza 2019-09-26 14:58:53 UTC
*** Bug 1674022 has been marked as a duplicate of this bug. ***

Comment 62 Red Hat Bugzilla 2024-01-06 04:25:58 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days