Bug 1378090 - 'ceph-deploy osd create' fails on a manually encrypted disk [NEEDINFO]
Summary: 'ceph-deploy osd create' fails on a manually encrypted disk
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Ceph-Disk
Version: 1.3.3
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: medium
Target Milestone: rc
Target Release: 1.3.4
Assignee: Kefu Chai
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks: 1329797
 
Reported: 2016-09-21 13:09 UTC by Tejas
Modified: 2022-02-21 18:01 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-02-20 20:59:27 UTC
Target Upstream Version:
Flags: tchandra: needinfo? (hnallurv)



Comment 5 Loic Dachary 2016-09-21 14:50:18 UTC
I seem to remember there was an issue with creating partitions on devicemapper devices, but it's a vague recollection. Are you able to create a partition table on the /dev/dm-0 device manually with sgdisk or parted?
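For instance, something along these lines (the device path and the small test size are placeholders, and --zap-all is destructive, so only run this on a disk that can be wiped):

    sgdisk --zap-all /dev/dm-0                 # destroys any existing GPT/MBR structures
    sgdisk --new=1:0:+100M --change-name=1:test -- /dev/dm-0
    sgdisk --print /dev/dm-0

or, with parted:

    parted -s /dev/dm-0 mklabel gpt mkpart primary 1MiB 100MiB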

Comment 6 Tejas 2016-09-21 16:21:00 UTC
Hi Loic,

    Yes, I am able to create a partition table on the mapper device.

Also, the partitions do get created by the "osd create" command:
Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes, 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.

Disk /dev/mapper/cry3: 1000.2 GB, 1000202788864 bytes, 1953521072 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt


#         Start          End    Size  Type            Name
 1     20482048   1953521038  921.8G  unknown         ceph data
 2         2048     20480000    9.8G  unknown         ceph journal


But ceph-disk still throws an error that the partition doesn't exist:
magna061][WARNIN] ceph-disk: Error: partition 1 for /dev/dm-0 does not appear to exist

Thanks,
Tejas

Comment 7 Tejas 2016-09-22 13:39:26 UTC
Hi Loic,

    We are able to create partitions manually on encrypted disks.
However, the partitions created by ceph-deploy on the encrypted disk are not visible, and it fails to create the XFS filesystem.

Thanks,
Tejas

Comment 8 Loic Dachary 2016-09-22 14:58:16 UTC
We should first verify that the partition devices are created:

It's worth checking if

   ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/mapper/cry1

throws the same error as

   ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/dm-0

Also run ls -ltr /dev/mapper/cry1* to verify whether /dev/mapper/cry1p1 and /dev/mapper/cry1p2 exist. To avoid device name parsing issues, I suggest naming the encrypted device cry instead of cry1, because the trailing number could be mistaken for a partition number. I'd have to check the code to be sure, but let's be on the safe side.

You should not use ceph-deploy for these tests, in order to reduce the number of layers involved.
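Concretely, the whole check could look like this (renaming via dmsetup is just one way to drop the trailing digit; the names are assumptions, adapt to how the mapping was created):

    dmsetup rename cry1 cry       # or close and re-open the LUKS mapping as "cry"
    ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/mapper/cry
    ls -ltr /dev/mapper/cry*      # the data and journal partitions should show up as cry1 and cry2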

Comment 9 Loic Dachary 2016-09-22 15:05:30 UTC
@Tejas: if you provide me with access to a machine where this can be reproduced, I will be able to figure it out by experimenting myself. Thanks!

Comment 11 Loic Dachary 2016-09-22 16:50:35 UTC
TL;DR: I don't see how we can make this work with ceph-disk; there are too many issues. The last resort is to use ceph-osd --mkfs manually.

Using /dev/mapper/swift instead of /dev/dm-0 only marginally improves things.

[root@magna061 ~]# ceph-disk -v prepare /dev/mapper/swift
INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_type
INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_type
INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs
INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size
INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_cryptsetup_parameters
INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_key_size
INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_type
INFO:ceph-disk:Will colocate journal with data on /dev/mapper/swift
DEBUG:ceph-disk:Creating journal partition num 2 size 10000 on /dev/mapper/swift
INFO:ceph-disk:Running command: /usr/sbin/sgdisk --new=2:0:10000M --change-name=2:ceph journal --partition-guid=2:19c3a473-2d4b-488c-bc9d-0e13e51c9914 --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/mapper/swift
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
INFO:ceph-disk:calling partx on prepared device /dev/mapper/swift
INFO:ceph-disk:re-reading known partitions will display errors
INFO:ceph-disk:Running command: /usr/sbin/partx -a /dev/mapper/swift
partx: /dev/mapper/swift: error adding partition 2
INFO:ceph-disk:Running command: /usr/bin/udevadm settle
DEBUG:ceph-disk:Journal is GPT partition /dev/disk/by-partuuid/19c3a473-2d4b-488c-bc9d-0e13e51c9914
DEBUG:ceph-disk:Journal is GPT partition /dev/disk/by-partuuid/19c3a473-2d4b-488c-bc9d-0e13e51c9914
DEBUG:ceph-disk:Creating osd partition on /dev/mapper/swift
INFO:ceph-disk:Running command: /usr/sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:f44cc730-5726-4c69-a872-669c7541030e --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be -- /dev/mapper/swift
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot.
The operation has completed successfully.
INFO:ceph-disk:calling partx on created device /dev/mapper/swift
INFO:ceph-disk:re-reading known partitions will display errors
INFO:ceph-disk:Running command: /usr/sbin/partx -a /dev/mapper/swift
partx: /dev/mapper/swift: error adding partitions 1-2
INFO:ceph-disk:Running command: /usr/bin/udevadm settle
ceph-disk: Error: partition 1 for /dev/mapper/swift does not appear to exist
[root@magna061 ~]# sgdisk --print /dev/mapper/swift
Disk /dev/mapper/swift: 1953521072 sectors, 931.5 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 5A07CDAE-970B-4550-BCE0-B863A9591D8E
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 1953521038
Partitions will be aligned on 2048-sector boundaries
Total free space is 4061 sectors (2.0 MiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1        20482048      1953521038   921.7 GiB   FFFF  ceph data
   2            2048        20480000   9.8 GiB     FFFF  ceph journal

The problem here is that partx -a does not handle devicemapper devices; we should use partprobe instead.

[root@magna061 ~]# partprobe /dev/mapper/swift
[root@magna061 ~]# ls -l /dev/mapper/swift*
lrwxrwxrwx. 1 root root 7 Sep 22 16:24 /dev/mapper/swift -> ../dm-0
lrwxrwxrwx. 1 root root 7 Sep 22 16:24 /dev/mapper/swift1 -> ../dm-1
lrwxrwxrwx. 1 root root 7 Sep 22 16:24 /dev/mapper/swift2 -> ../dm-2

We now have the expected block devices. The preparation was interrupted, but let's see how far we can get by monkey-patching ceph-disk to use partprobe instead of partx.
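Roughly, such a patch can be applied in place; the exact partx call sites in /usr/sbin/ceph-disk are an assumption here, so verify the substitution before running:

    cp /usr/sbin/ceph-disk /usr/sbin/ceph-disk.orig
    # swap the partx -a invocations for partprobe in the python source
    sed -i "s/'partx', '-a'/'partprobe'/g" /usr/sbin/ceph-disk
    grep -n partprobe /usr/sbin/ceph-disk    # check the result of the edit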

(same as above after ceph-disk zap /dev/mapper/swift)
DEBUG:ceph-disk:Calling partprobe on created device /dev/mapper/swift
INFO:ceph-disk:Running command: /usr/sbin/partprobe /dev/mapper/swift
INFO:ceph-disk:Running command: /usr/bin/udevadm settle
ceph-disk: Error: partition 1 for /dev/mapper/swift does not appear to exist

Now we hit another problem: to figure out the partition name, ceph-disk uses /sys/block and expects the names there to reflect the partitions. However, we have:

[root@magna061 ~]# ls -l /sys/block
total 0
lrwxrwxrwx. 1 root root 0 Sep 22 11:28 dm-0 -> ../devices/virtual/block/dm-0
lrwxrwxrwx. 1 root root 0 Sep 22 16:35 dm-1 -> ../devices/virtual/block/dm-1
lrwxrwxrwx. 1 root root 0 Sep 22 16:35 dm-2 -> ../devices/virtual/block/dm-2
lrwxrwxrwx. 1 root root 0 Sep 22 11:10 sda -> ../devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda
lrwxrwxrwx. 1 root root 0 Sep 22 11:10 sdb -> ../devices/pci0000:00/0000:00:1f.2/ata2/host1/target1:0:0/1:0:0:0/block/sdb
lrwxrwxrwx. 1 root root 0 Sep 22 11:10 sdc -> ../devices/pci0000:00/0000:00:1f.2/ata3/host2/target2:0:0/2:0:0:0/block/sdc
lrwxrwxrwx. 1 root root 0 Sep 22 11:10 sdd -> ../devices/pci0000:00/0000:00:1f.2/ata4/host3/target3:0:0/3:0:0:0/block/sdd

and only the canonical devicemapper names are shown here.

For the record, these issues have been fixed in infernalis (the release after 0.94), but backporting the changes would be too much of a risk: they are extensive.

Comment 12 Loic Dachary 2016-09-22 16:52:36 UTC
Should we try to find a way to do that manually using ceph-osd --mkfs?
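Something along these lines might work (untested sketch; the partition names follow comment 11, and the OSD id allocation and keyring handling assume a hammer cluster with a reachable monitor and the usual /var/lib/ceph layout):

    # assumes partprobe already exposed /dev/mapper/swift1 (data) and swift2 (journal)
    mkfs.xfs -f /dev/mapper/swift1
    OSD_ID=$(ceph osd create)                     # allocate a new osd id
    mkdir -p /var/lib/ceph/osd/ceph-$OSD_ID
    mount /dev/mapper/swift1 /var/lib/ceph/osd/ceph-$OSD_ID
    ceph-osd -i $OSD_ID --mkfs --mkkey \
        --osd-data /var/lib/ceph/osd/ceph-$OSD_ID \
        --osd-journal /dev/mapper/swift2
    ceph auth add osd.$OSD_ID osd 'allow *' mon 'allow profile osd' \
        -i /var/lib/ceph/osd/ceph-$OSD_ID/keyring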

Comment 13 Tejas 2016-09-22 17:36:00 UTC
IMO, if we can get the ceph-deploy --dmcrypt method to the customer without the need for an OSD reboot, that would be better than pursuing this:
https://bugzilla.redhat.com/show_bug.cgi?id=1377639

Harish, can you also pitch in on this?

Thanks,
Tejas

Comment 14 Federico Lucifredi 2016-09-22 17:52:55 UTC
#13: yes, it would be better, but it seems too late in the Hammer cycle to do this.

We will use the process outlined by Loic in #11 as our setup process for encrypted OSDs. Please test those steps and document them in the corresponding doc bug.

Comment 16 Loic Dachary 2016-11-23 07:16:13 UTC
For the record, https://bugzilla.redhat.com/show_bug.cgi?id=1377639 is now closed, and this bug should probably be closed too.

