Bug 1300617 - Unable to prepare/activate an SSD disk in a Ceph cluster
Summary: Unable to prepare/activate an SSD disk in a Ceph cluster
Keywords:
Status: CLOSED DUPLICATE of bug 1300703
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 1.3.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 1.3.3
Assignee: Loic Dachary
QA Contact: ceph-qe-bugs
Docs Contact: Bara Ancincova
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-01-21 09:46 UTC by Tanay Ganguly
Modified: 2017-07-30 15:10 UTC
9 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
.The "ceph-disk prepare" command fails on SSD disks An attempt to prepare Solid-state Drives (SSDs) by running the `ceph-disk prepare` command fails. To work around this issue, perform the steps below: . Manually remove the `udev` rules by running the following command as `root`: + ---- # rm /usr/lib/udev/rules.d/95-ceph-osd.rules ---- . Prepare the disks: + ---- $ ceph-disk prepare ---- . Add the "ceph-disk activate-all" string to the `/etc/rc.local` file. Run the following command as `root`: + ---- # echo "ceph-disk activate-all" | tee -a /etc/rc.local ---- . Reboot the system or activate the disks by running the following command as `root`: + ---- # ceph-disk activate-all ----
Clone Of:
Environment:
Last Closed: 2016-03-21 23:47:36 UTC
Embargoed:


Attachments


Links
System: Ceph Project Bug Tracker, ID: 14099, Last Updated: 2016-01-22 02:03:54 UTC

Description Tanay Ganguly 2016-01-21 09:46:53 UTC
Description of problem:
Unable to prepare an SSD disk in a Ceph cluster

Version-Release number of selected component (if applicable):
1.3.2
ceph-0.94.5-1.el7cp.x86_64
ceph-common-0.94.5-1.el7cp.x86_64
ceph-osd-0.94.5-1.el7cp.x86_64

RHEL 7.2

How reproducible:
Tried 3 times after zapping the disk; it fails every time.

Steps to Reproduce:
1. Zap the disk; this step succeeds.
2. Try to prepare the disk; this step fails (see the reproduction sketch below).
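
A minimal reproduction sketch, assuming the cluster was deployed with ceph-deploy (as the console log below suggests) and using the host and device from that log (cephqe9 and /dev/sdi are examples; substitute your own):

----
$ ceph-deploy disk zap cephqe9:/dev/sdi       # step 1: zapping succeeds
$ ceph-deploy osd prepare cephqe9:/dev/sdi    # step 2: prepare fails with the mkfs.xfs error in the console log
----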

Actual results:
It fails only when I try to prepare the SSD drive.
On another machine in the same cluster, the SSD disk was prepared and also activated successfully, although I am using the same packages and distro on this machine.

Expected results:


Additional info:
This happens only with SSD drives.


Console log:
--------------------------------------------------------------------------

[cephqe9][DEBUG ] connection detected need for sudo
[cephqe9][DEBUG ] connected to host: cephqe9
[cephqe9][DEBUG ] detect platform information from remote host
[cephqe9][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: Red Hat Enterprise Linux Server 7.2 Maipo
[ceph_deploy.osd][DEBUG ] Preparing host cephqe9 disk /dev/sdi journal None activate False
[cephqe9][INFO  ] Running command: sudo ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/sdi
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_cryptsetup_parameters
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_key_size
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_type
[cephqe9][WARNIN] INFO:ceph-disk:Will colocate journal with data on /dev/sdi
[cephqe9][WARNIN] DEBUG:ceph-disk:Creating journal partition num 2 size 5120 on /dev/sdi
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /sbin/sgdisk --new=2:0:5120M --change-name=2:ceph journal --partition-guid=2:2a1d237c-5caf-4eda-8213-cb64aa7f56d3 --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdi
[cephqe9][DEBUG ] Warning: The kernel is still using the old partition table.
[cephqe9][DEBUG ] The new table will be used at the next reboot.
[cephqe9][DEBUG ] The operation has completed successfully.
[cephqe9][WARNIN] INFO:ceph-disk:calling partx on prepared device /dev/sdi
[cephqe9][WARNIN] INFO:ceph-disk:re-reading known partitions will display errors
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /sbin/partx -a /dev/sdi
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle
[cephqe9][WARNIN] DEBUG:ceph-disk:Journal is GPT partition /dev/disk/by-partuuid/2a1d237c-5caf-4eda-8213-cb64aa7f56d3
[cephqe9][WARNIN] DEBUG:ceph-disk:Creating osd partition on /dev/sdi
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:bc8615dc-072e-4395-b80a-770add113d2c --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be -- /dev/sdi
[cephqe9][DEBUG ] Warning: The kernel is still using the old partition table.
[cephqe9][DEBUG ] The new table will be used at the next reboot.
[cephqe9][DEBUG ] The operation has completed successfully.
[cephqe9][WARNIN] INFO:ceph-disk:calling partx on created device /dev/sdi
[cephqe9][WARNIN] INFO:ceph-disk:re-reading known partitions will display errors
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /sbin/partx -a /dev/sdi
[cephqe9][WARNIN] partx: /dev/sdi: error adding partitions 1-2
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle
[cephqe9][WARNIN] DEBUG:ceph-disk:Creating xfs fs on /dev/sdi1
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdi1
[cephqe9][WARNIN] mkfs.xfs: /dev/sdi1 contains a mounted filesystem
[cephqe9][WARNIN] Usage: mkfs.xfs
[cephqe9][WARNIN] /* blocksize */               [-b log=n|size=num]
[cephqe9][WARNIN] /* metadata */                [-m crc=0|1,finobt=0|1]
[cephqe9][WARNIN] /* data subvol */     [-d agcount=n,agsize=n,file,name=xxx,size=num,
[cephqe9][WARNIN]                           (sunit=value,swidth=value|su=num,sw=num|noalign),
[cephqe9][WARNIN]                           sectlog=n|sectsize=num
[cephqe9][WARNIN] /* force overwrite */ [-f]
[cephqe9][WARNIN] /* inode size */      [-i log=n|perblock=n|size=num,maxpct=n,attr=0|1|2,
[cephqe9][WARNIN]                           projid32bit=0|1]
[cephqe9][WARNIN] /* no discard */      [-K]
[cephqe9][WARNIN] /* log subvol */      [-l agnum=n,internal,size=num,logdev=xxx,version=n
[cephqe9][WARNIN]                           sunit=value|su=num,sectlog=n|sectsize=num,
[cephqe9][WARNIN]                           lazy-count=0|1]
[cephqe9][WARNIN] /* label */           [-L label (maximum 12 characters)]
[cephqe9][WARNIN] /* naming */          [-n log=n|size=num,version=2|ci,ftype=0|1]
[cephqe9][WARNIN] /* no-op info only */ [-N]
[cephqe9][WARNIN] /* prototype file */  [-p fname]
[cephqe9][WARNIN] /* quiet */           [-q]
[cephqe9][WARNIN] /* realtime subvol */ [-r extsize=num,size=num,rtdev=xxx]
[cephqe9][WARNIN] /* sectorsize */      [-s log=n|size=num]
[cephqe9][WARNIN] /* version */         [-V]
[cephqe9][WARNIN]                       devicename
[cephqe9][WARNIN] <devicename> is required unless -d name=xxx is given.
[cephqe9][WARNIN] <num> is xxx (bytes), xxxs (sectors), xxxb (fs blocks), xxxk (xxx KiB),
[cephqe9][WARNIN]       xxxm (xxx MiB), xxxg (xxx GiB), xxxt (xxx TiB) or xxxp (xxx PiB).
[cephqe9][WARNIN] <value> is xxx (512 byte blocks).
[cephqe9][WARNIN] ceph-disk: Error: Command '['/sbin/mkfs', '-t', 'xfs', '-f', '-i', 'size=2048', '--', '/dev/sdi1']' returned non-zero exit status 1
[cephqe9][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy.osd][ERROR ] Failed to execute command: ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/sdi
[ceph_deploy][ERROR ] GenericError: Failed to create 2 OSDs

-------------------------------------------------------------------------------

Comment 3 Ken Dreyer (Red Hat) 2016-01-21 15:38:40 UTC
Loic, is this an issue in ceph-disk?

Comment 5 Loic Dachary 2016-01-22 02:01:18 UTC
This is most likely a consequence of http://tracker.ceph.com/issues/14099

Comment 6 Ken Dreyer (Red Hat) 2016-01-26 02:30:47 UTC
http://tracker.ceph.com/issues/14099 is not fixed upstream - re-targeting to RHCS 1.3.3

Comment 8 Loic Dachary 2016-01-28 15:02:35 UTC
@Ken, backporting is behind schedule because the test infrastructure was disrupted over the past few weeks.

Comment 9 Federico Lucifredi 2016-02-03 16:46:46 UTC
Needinfo Ken: if we can do this in 1.3.2 he will let us know. At this time I am not re-targeting from 1.3.3 unless Dev gives us the go-ahead.

Comment 10 Ken Dreyer (Red Hat) 2016-02-03 18:12:46 UTC
Loic helped me understand that in Infernalis/Jewel, there was a large refactor of the ceph-disk workflow and how it relates to the init system and udev. So things are going to get more stable in RHCS 2.0. But we can't easily cherry-pick this work to RHCS 1.3.

In the meantime, for Hammer, it's possible that we can document a workaround for users who hit ceph-disk issues. Discussion ongoing upstream: http://www.spinics.net/lists/ceph-devel/msg28384.html

Comment 11 Loic Dachary 2016-02-04 10:20:21 UTC
@Ken, right!

The safe way to address this problem is to remove the udev rules and manually run ceph-disk prepare for the desired disks. It won't activate them, but it should be free of undesired interference and races. Once the disks are prepared, a call to ceph-disk activate-all can be added to /etc/rc.local or something similar. It will sequentially activate all prepared disks and, again, will be safe from any udev/init-system interference.
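
A consolidated sketch of that workaround, run as root on the OSD node; the rule path matches the doc text above, and /dev/sdi is only an example device taken from this bug's log (prepare whichever disks you actually need):

----
# Remove the ceph-osd udev rules so they cannot race with ceph-disk:
rm /usr/lib/udev/rules.d/95-ceph-osd.rules

# Prepare each desired disk manually (repeat per disk; /dev/sdi is an example):
ceph-disk prepare /dev/sdi

# Activate all prepared disks at boot instead of relying on udev:
echo "ceph-disk activate-all" | tee -a /etc/rc.local

# Activate them now, without waiting for a reboot:
ceph-disk activate-all
----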

Some stability fixes are backported, and having them will help in some cases, but they are not a cure for the more general problem. The infernalis release fixed all this, fortunately ;-)

Comment 12 Federico Lucifredi 2016-02-16 20:58:34 UTC
This is presumed to be raciness in ceph-disk, not a regression.

It should be documented in the release notes nonetheless.

Comment 14 Tanay Ganguly 2016-02-18 17:00:46 UTC
Hi Bara,

As per the doc text:

. Manually remove the `udev` rules by running the following command as `root`:

+
----
# rm /usr/lib/path/to/udev.rule

I am unable to find the udev.rule file. Do you mean /usr/lib/udev/rules.d/*?

If yes, which specific file do we need to remove?
Is it /usr/lib/udev/rules.d/50-udev-default.rules?

Comment 16 Loic Dachary 2016-02-19 11:46:42 UTC
rm /usr/lib/udev/rules.d/95-ceph-osd.rules

That is the file that needs to be removed.

Comment 17 Tanay Ganguly 2016-02-19 11:53:14 UTC
Bara,

I followed the doc text mentioned, and it is working fine.
I am not seeing any error while preparing the SSD.


NOTE: I just tried it once.

Comment 18 Federico Lucifredi 2016-03-21 23:47:36 UTC

*** This bug has been marked as a duplicate of bug 1300703 ***

