Bug 1300617 - Unable to prepare/Activate a SSD disk in Ceph Cluster
Status: CLOSED DUPLICATE of bug 1300703
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RADOS
Version: 1.3.2
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 1.3.3
Assigned To: Loic Dachary
QA Contact: ceph-qe-bugs
Docs Contact: Bara Ancincova
Depends On:
Blocks:
Reported: 2016-01-21 04:46 EST by Tanay Ganguly
Modified: 2017-07-30 11:10 EDT
CC List: 9 users

See Also:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
.The "ceph-disk prepare" command fails on SSD disks An attempt to prepare Solid-state Drives (SSDs) by running the `ceph-disk prepare` command fails. To work around this issue, perform the steps below: . Manually remove the `udev` rules by running the following command as `root`: + ---- # rm /usr/lib/udev/rules.d/95-ceph-osd.rules ---- . Prepare the disks: + ---- $ ceph-disk prepare ---- . Add the "ceph-disk activate-all" string to the `/etc/rc.local` file. Run the following command as `root`: + ---- # echo "ceph-disk activate-all" | tee -a /etc/rc.local ---- . Reboot the system or activate the disks by running the following command as `root`: + ---- # ceph-disk activate-all ----
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-21 19:47:36 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments


External Trackers
Tracker ID Priority Status Summary Last Updated
Ceph Project Bug Tracker 14099 None None None 2016-01-21 21:03 EST

Description Tanay Ganguly 2016-01-21 04:46:53 EST
Description of problem:
Unable to prepare an SSD disk in a Ceph cluster

Version-Release number of selected component (if applicable):
1.3.2
ceph-0.94.5-1.el7cp.x86_64
ceph-common-0.94.5-1.el7cp.x86_64
ceph-osd-0.94.5-1.el7cp.x86_64

RHEL 7.2

How reproducible:
Tried 3 times; the disk was zapped each time, but prepare still fails.

Steps to Reproduce:
1. Zap the disk (this succeeds).
2. Try to prepare the disk; it fails (see the command sketch below).
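For reference, a sketch of the corresponding ceph-deploy invocations (host and device names are taken from the console log below; the exact ceph-deploy syntax is an assumption based on that log):

$ ceph-deploy disk zap cephqe9:/dev/sdi       # step 1: zap the disk (succeeds)
$ ceph-deploy osd prepare cephqe9:/dev/sdi    # step 2: prepare the disk (fails on the SSD)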

Actual results:
The prepare step fails only when I try to prepare the SSD drive.
On another machine in the same cluster, the SSD disk was prepared and also activated successfully, even though I am using the same packages and distribution on this machine.

Expected results:


Additional info:
This happens only with SSD drives.


Console log:
--------------------------------------------------------------------------

[cephqe9][DEBUG ] connection detected need for sudo
[cephqe9][DEBUG ] connected to host: cephqe9
[cephqe9][DEBUG ] detect platform information from remote host
[cephqe9][DEBUG ] detect machine type
[ceph_deploy.osd][INFO  ] Distro info: Red Hat Enterprise Linux Server 7.2 Maipo
[ceph_deploy.osd][DEBUG ] Preparing host cephqe9 disk /dev/sdi journal None activate False
[cephqe9][INFO  ] Running command: sudo ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/sdi
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_cryptsetup_parameters
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_key_size
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_type
[cephqe9][WARNIN] INFO:ceph-disk:Will colocate journal with data on /dev/sdi
[cephqe9][WARNIN] DEBUG:ceph-disk:Creating journal partition num 2 size 5120 on /dev/sdi
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /sbin/sgdisk --new=2:0:5120M --change-name=2:ceph journal --partition-guid=2:2a1d237c-5caf-4eda-8213-cb64aa7f56d3 --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdi
[cephqe9][DEBUG ] Warning: The kernel is still using the old partition table.
[cephqe9][DEBUG ] The new table will be used at the next reboot.
[cephqe9][DEBUG ] The operation has completed successfully.
[cephqe9][WARNIN] INFO:ceph-disk:calling partx on prepared device /dev/sdi
[cephqe9][WARNIN] INFO:ceph-disk:re-reading known partitions will display errors
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /sbin/partx -a /dev/sdi
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle
[cephqe9][WARNIN] DEBUG:ceph-disk:Journal is GPT partition /dev/disk/by-partuuid/2a1d237c-5caf-4eda-8213-cb64aa7f56d3
[cephqe9][WARNIN] DEBUG:ceph-disk:Journal is GPT partition /dev/disk/by-partuuid/2a1d237c-5caf-4eda-8213-cb64aa7f56d3

[cephqe9][WARNIN] DEBUG:ceph-disk:Creating osd partition on /dev/sdi
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:bc8615dc-072e-4395-b80a-770add113d2c --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be -- /dev/sdi
[cephqe9][DEBUG ] Warning: The kernel is still using the old partition table.
[cephqe9][DEBUG ] The new table will be used at the next reboot.
[cephqe9][DEBUG ] The operation has completed successfully.
[cephqe9][WARNIN] INFO:ceph-disk:calling partx on created device /dev/sdi
[cephqe9][WARNIN] INFO:ceph-disk:re-reading known partitions will display errors
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /sbin/partx -a /dev/sdi
[cephqe9][WARNIN] partx: /dev/sdi: error adding partitions 1-2
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle
[cephqe9][WARNIN] DEBUG:ceph-disk:Creating xfs fs on /dev/sdi1
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdi1
[cephqe9][WARNIN] mkfs.xfs: /dev/sdi1 contains a mounted filesystem
[cephqe9][WARNIN] Usage: mkfs.xfs
[cephqe9][WARNIN] /* blocksize */               [-b log=n|size=num]
[cephqe9][WARNIN] /* metadata */                [-m crc=0|1,finobt=0|1]
[cephqe9][WARNIN] /* data subvol */     [-d agcount=n,agsize=n,file,name=xxx,size=num,
[cephqe9][WARNIN]                           (sunit=value,swidth=value|su=num,sw=num|noalign),
[cephqe9][WARNIN]                           sectlog=n|sectsize=num
[cephqe9][WARNIN] /* force overwrite */ [-f]
[cephqe9][WARNIN] /* inode size */      [-i log=n|perblock=n|size=num,maxpct=n,attr=0|1|2,
[cephqe9][WARNIN]                           projid32bit=0|1]
[cephqe9][WARNIN] /* no discard */      [-K]
[cephqe9][WARNIN] /* log subvol */      [-l agnum=n,internal,size=num,logdev=xxx,version=n
[cephqe9][WARNIN]                           sunit=value|su=num,sectlog=n|sectsize=num,
[cephqe9][WARNIN]                           lazy-count=0|1]
[cephqe9][WARNIN] /* label */           [-L label (maximum 12 characters)]
[cephqe9][WARNIN] /* naming */          [-n log=n|size=num,version=2|ci,ftype=0|1]
[cephqe9][WARNIN] /* no-op info only */ [-N]
[cephqe9][WARNIN] /* prototype file */  [-p fname]
[cephqe9][WARNIN] /* quiet */           [-q]
[cephqe9][WARNIN] /* realtime subvol */ [-r extsize=num,size=num,rtdev=xxx]
[cephqe9][WARNIN] /* sectorsize */      [-s log=n|size=num]
[cephqe9][WARNIN] /* version */         [-V]
[cephqe9][WARNIN]                       devicename
[cephqe9][WARNIN] <devicename> is required unless -d name=xxx is given.
[cephqe9][WARNIN] <num> is xxx (bytes), xxxs (sectors), xxxb (fs blocks), xxxk (xxx KiB),
[cephqe9][WARNIN]       xxxm (xxx MiB), xxxg (xxx GiB), xxxt (xxx TiB) or xxxp (xxx PiB).
[cephqe9][WARNIN] <value> is xxx (512 byte blocks).
[cephqe9][WARNIN] ceph-disk: Error: Command '['/sbin/mkfs', '-t', 'xfs', '-f', '-i', 'size=2048', '--', '/dev/sdi1']' returned non-zero exit status 1
[cephqe9][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy.osd][ERROR ] Failed to execute command: ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/sdi
[ceph_deploy][ERROR ] GenericError: Failed to create 2 OSDs

-------------------------------------------------------------------------------
Comment 3 Ken Dreyer (Red Hat) 2016-01-21 10:38:40 EST
Loic, is this an issue in ceph-disk?
Comment 5 Loic Dachary 2016-01-21 21:01:18 EST
This is most likely a consequence of http://tracker.ceph.com/issues/14099
Comment 6 Ken Dreyer (Red Hat) 2016-01-25 21:30:47 EST
http://tracker.ceph.com/issues/14099 is not fixed upstream - re-targeting to RHCS 1.3.3
Comment 8 Loic Dachary 2016-01-28 10:02:35 EST
@Ken, backporting is behind because the test infrastructure was disturbed in the past few weeks.
Comment 9 Federico Lucifredi 2016-02-03 11:46:46 EST
Needinfo Ken: if we can do this in 1.3.2 he will let us know. At this time I am not re-targeting from 1.3.3 unless Dev gives us the go-ahead.
Comment 10 Ken Dreyer (Red Hat) 2016-02-03 13:12:46 EST
Loic helped me understand that in Infernalis/Jewel, there was a large refactor of the ceph-disk workflow and how it relates to the init system and udev. So things are going to get more stable in RHCS 2.0. But we can't easily cherry-pick this work to RHCS 1.3.

In the meantime, for Hammer, it's possible that we can document a workaround for users who hit ceph-disk issues. Discussion is ongoing upstream: http://www.spinics.net/lists/ceph-devel/msg28384.html
Comment 11 Loic Dachary 2016-02-04 05:20:21 EST
@Ken, right!

The safe way to address this problem is to remove the udev rules and manually run ceph-disk prepare for the desired disks. It won't activate them, but it should be free of undesired interference and races. Once the disks are prepared, a call to ceph-disk activate-all can be added to /etc/rc.local or something similar. It will sequentially activate all prepared disks and, again, will be safe from any udev / init-system interference (sketched below).

Some stability fixes are backported, and having them will help in some cases, but they are no cure for the more general problem. The Infernalis release fixed all this, fortunately ;-)
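Put together, a sketch of that sequence (run as root; the /dev/sdi device name and the note about making rc.local executable on RHEL 7 are assumptions for illustration):

# rm -f /usr/lib/udev/rules.d/95-ceph-osd.rules     # stop udev from racing ceph-disk
# ceph-disk prepare /dev/sdi                        # repeat for each disk to prepare
# echo "ceph-disk activate-all" >> /etc/rc.local    # activate all prepared disks at boot
# chmod +x /etc/rc.d/rc.local                       # on RHEL 7, rc.local only runs at boot if it is executable
# ceph-disk activate-all                            # or reboot to activate now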
Comment 12 Federico Lucifredi 2016-02-16 15:58:34 EST
This is presumed to be raciness in ceph-disk, not a regression.

It should be documented in the release notes nonetheless.
Comment 14 Tanay Ganguly 2016-02-18 12:00:46 EST
Hi Bara,

As per the doc text:

. Manually remove the `udev` rules by running the following command as `root`:

+
----
# rm /usr/lib/path/to/udev.rule

I am unable to find the udev.rule file. Do you mean /usr/lib/udev/rules.d/* ?

If yes, which specific file do we need to remove?
Is it /usr/lib/udev/rules.d/50-udev-default.rules?
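One way to narrow it down is to list only the ceph-related rules on the node (a sketch using standard shell tools):

$ ls /usr/lib/udev/rules.d/ | grep -i ceph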
Comment 16 Loic Dachary 2016-02-19 06:46:42 EST
rm /usr/lib/udev/rules.d/95-ceph-osd.rules

That is the file that needs to be removed.
Comment 17 Tanay Ganguly 2016-02-19 06:53:14 EST
Bara,

I followed the doc text as mentioned, and it is working fine.
I am not seeing any error while preparing the SSD.


NOTE: I just tried it once.
Comment 18 Federico Lucifredi 2016-03-21 19:47:36 EDT

*** This bug has been marked as a duplicate of bug 1300703 ***
