Bug 1537980 - ceph osd: Cannot use NVMe device as an OSD when running Ceph in Containers
Summary: ceph osd: Cannot use NVMe device as an OSD when running Ceph in Containers
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Container
Version: 3.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 3.1
Assignee: Sébastien Han
QA Contact: Vasishta
URL:
Whiteboard:
Depends On:
Blocks: 1553254 1548353 1572368
 
Reported: 2018-01-24 08:55 UTC by Deepthi Dharwar
Modified: 2021-06-10 14:19 UTC
19 users

Fixed In Version: rhceph-rhel7-docker-3-4
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-08 18:32:42 UTC
Embargoed:


Attachments


Links
Github ceph ceph-container pull 885 (closed): disk_list.sh: improve 'ceph data' mount partition - last updated 2020-06-05 16:56:51 UTC
Github ceph ceph-container pull 892 (closed): disk_list: fix wrong path to device - last updated 2020-06-05 16:56:51 UTC

Description Deepthi Dharwar 2018-01-24 08:55:39 UTC
Description of problem: 

Using the latest ceph-ansible to deploy Ceph in containers.
The Ceph cluster comprises 1 MON, 1 MGR, and 1 OSD backed by an NVMe device.
When we try to deploy it, the OSD container keeps restarting because it fails to execute entrypoint.sh disk_list.sh.

When we use NVMe, the partitions are /dev/nvme0n1p1 and /dev/nvme0n1p2.
The script appends '1' to OSD_DEVICE, so it searches for /dev/nvme0n11 instead while trying to mount the ceph data partition.
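
For example (hypothetical device names; assuming one data and one journal partition on each disk):

  # SATA/SAS disk: the partition number is appended directly
  ls /dev/sdb*         # -> /dev/sdb  /dev/sdb1  /dev/sdb2
  # NVMe disk: a 'p' separator precedes the partition number
  ls /dev/nvme0n1*     # -> /dev/nvme0n1  /dev/nvme0n1p1  /dev/nvme0n1p2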

The device is prepared properly, but activating the disk fails.
There are no errors on the ceph-ansible side, since it deployed the container successfully.

Version-Release number of selected component (if applicable):


How reproducible: Very easy. Just have a single OSD backed by an NVMe device and run the Ceph OSD process for it in a container.


Steps to Reproduce:
1. Use ceph-ansible to deploy Ceph in containers, comprising 1 MON, 1 MGR, and 1 OSD.
2. Make sure the OSD is backed by an NVMe device.
3. Once ceph-ansible finishes deploying, the ceph-osd container keeps restarting because it fails to execute the disk_list.sh script.

Actual results:
The ceph-osd container keeps restarting because it fails to mount the ceph data partition.
# ./ceph-osd-run.sh nvme0n1
mount: special device /dev/nvme0n11 does not exist
Error response from daemon: No such container: expose_partitions_nvme0n1
2018-01-24 03:35:22  /entrypoint.sh: static: does not generate config
mount: special device /dev/nvme0n11 does not exist



Expected results:
The Ceph OSD container should be up and running when backed by an NVMe device.

Additional info:
function mount_ceph_data () {
  if is_dmcrypt; then
    mount /dev/mapper/"${data_uuid}" "$tmp_dir"
  else
    if is_loop_dev "${OSD_DEVICE}"; then
      mount "${OSD_DEVICE}p1" "$tmp_dir"
    else
      mount "${OSD_DEVICE}1" "$tmp_dir"
    fi
  fi
}

We need a check for whether the device is an NVMe device, so that 'p1' is appended instead of '1' when mounting the disk.
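
A minimal sketch of such a check (not necessarily the exact upstream fix from the pull requests linked above; the data_part variable name is only illustrative): the partition suffix can be chosen by testing whether the device name ends in a digit, as NVMe device names do.

  # Sketch only: pick the partition suffix based on the device name.
  # Devices whose kernel name ends in a digit (e.g. nvme0n1, mmcblk0,
  # loop0) get partitions named <dev>p1; others (e.g. sdb) get <dev>1.
  if [[ "${OSD_DEVICE}" =~ [0-9]$ ]]; then
    data_part="${OSD_DEVICE}p1"
  else
    data_part="${OSD_DEVICE}1"
  fi
  mount "${data_part}" "$tmp_dir"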

Upstream bug: https://github.com/ceph/ceph-container/issues/884

Comment 4 Vikhyat Umrao 2018-02-01 14:35:41 UTC
For the non-containerized case we have another bug: https://bugzilla.redhat.com/show_bug.cgi?id=1541016

Comment 33 Vasishta 2018-05-03 07:30:57 UTC
QE tried this using ceph-ansible-3.0.28-1.el7cp.noarch and the container image ceph-3.0-rhel-7-docker-candidate-53533-20180320051359.

It was working fine.


Regards,
Vasishta Shastry
AQE, Ceph

Comment 35 Vasishta 2018-05-04 06:47:53 UTC
Hi Vikhyat,

You're welcome.
Moving to VERIFIED state.


Regards,
Vasishta Shastry
AQE, Ceph

Comment 37 Ken Dreyer (Red Hat) 2018-05-08 18:32:42 UTC
Fixed as of the latest container image announced at https://access.redhat.com/errata/RHBA-2018:1260

