Description of problem:
Unable to prepare an SSD disk in a Ceph cluster.

Version-Release number of selected component (if applicable):
1.3.2
ceph-0.94.5-1.el7cp.x86_64
ceph-common-0.94.5-1.el7cp.x86_64
ceph-osd-0.94.5-1.el7cp.x86_64
RHEL 7.2

How reproducible:
Tried 3 times; preparing the disk fails every time, even after zapping it.

Steps to Reproduce:
1. Zap the disk (succeeds).
2. Prepare the disk (fails).

Actual results:
Preparing fails only for the SSD drive. On another machine in the same cluster an SSD disk was prepared and activated successfully, even though I am using the same packages and distro on this machine.

Expected results:

Additional info:
This happens only with the SSD drive.

Console log:
--------------------------------------------------------------------------
[cephqe9][DEBUG ] connection detected need for sudo
[cephqe9][DEBUG ] connected to host: cephqe9
[cephqe9][DEBUG ] detect platform information from remote host
[cephqe9][DEBUG ] detect machine type
[ceph_deploy.osd][INFO ] Distro info: Red Hat Enterprise Linux Server 7.2 Maipo
[ceph_deploy.osd][DEBUG ] Preparing host cephqe9 disk /dev/sdi journal None activate False
[cephqe9][INFO ] Running command: sudo ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/sdi
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_cryptsetup_parameters
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_key_size
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_type
[cephqe9][WARNIN] INFO:ceph-disk:Will colocate journal with data on /dev/sdi
[cephqe9][WARNIN] DEBUG:ceph-disk:Creating journal partition num 2 size 5120 on /dev/sdi
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /sbin/sgdisk --new=2:0:5120M --change-name=2:ceph journal --partition-guid=2:2a1d237c-5caf-4eda-8213-cb64aa7f56d3 --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdi
[cephqe9][DEBUG ] Warning: The kernel is still using the old partition table.
[cephqe9][DEBUG ] The new table will be used at the next reboot.
[cephqe9][DEBUG ] The operation has completed successfully.
[cephqe9][WARNIN] INFO:ceph-disk:calling partx on prepared device /dev/sdi
[cephqe9][WARNIN] INFO:ceph-disk:re-reading known partitions will display errors
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /sbin/partx -a /dev/sdi
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle
[cephqe9][WARNIN] DEBUG:ceph-disk:Journal is GPT partition /dev/disk/by-partuuid/2a1d237c-5caf-4eda-8213-cb64aa7f56d3
[cephqe9][WARNIN] DEBUG:ceph-disk:Journal is GPT partition /dev/disk/by-partuuid/2a1d237c-5caf-4eda-8213-cb64aa7f56d3
[cephqe9][WARNIN] DEBUG:ceph-disk:Creating osd partition on /dev/sdi
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:bc8615dc-072e-4395-b80a-770add113d2c --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be -- /dev/sdi
[cephqe9][DEBUG ] Warning: The kernel is still using the old partition table.
[cephqe9][DEBUG ] The new table will be used at the next reboot.
[cephqe9][DEBUG ] The operation has completed successfully.
[cephqe9][WARNIN] INFO:ceph-disk:calling partx on created device /dev/sdi
[cephqe9][WARNIN] INFO:ceph-disk:re-reading known partitions will display errors
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /sbin/partx -a /dev/sdi
[cephqe9][WARNIN] partx: /dev/sdi: error adding partitions 1-2
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle
[cephqe9][WARNIN] DEBUG:ceph-disk:Creating xfs fs on /dev/sdi1
[cephqe9][WARNIN] INFO:ceph-disk:Running command: /sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdi1
[cephqe9][WARNIN] mkfs.xfs: /dev/sdi1 contains a mounted filesystem
[cephqe9][WARNIN] Usage: mkfs.xfs
[cephqe9][WARNIN] /* blocksize */ [-b log=n|size=num]
[cephqe9][WARNIN] /* metadata */ [-m crc=0|1,finobt=0|1]
[cephqe9][WARNIN] /* data subvol */ [-d agcount=n,agsize=n,file,name=xxx,size=num,
[cephqe9][WARNIN] (sunit=value,swidth=value|su=num,sw=num|noalign),
[cephqe9][WARNIN] sectlog=n|sectsize=num
[cephqe9][WARNIN] /* force overwrite */ [-f]
[cephqe9][WARNIN] /* inode size */ [-i log=n|perblock=n|size=num,maxpct=n,attr=0|1|2,
[cephqe9][WARNIN] projid32bit=0|1]
[cephqe9][WARNIN] /* no discard */ [-K]
[cephqe9][WARNIN] /* log subvol */ [-l agnum=n,internal,size=num,logdev=xxx,version=n
[cephqe9][WARNIN] sunit=value|su=num,sectlog=n|sectsize=num,
[cephqe9][WARNIN] lazy-count=0|1]
[cephqe9][WARNIN] /* label */ [-L label (maximum 12 characters)]
[cephqe9][WARNIN] /* naming */ [-n log=n|size=num,version=2|ci,ftype=0|1]
[cephqe9][WARNIN] /* no-op info only */ [-N]
[cephqe9][WARNIN] /* prototype file */ [-p fname]
[cephqe9][WARNIN] /* quiet */ [-q]
[cephqe9][WARNIN] /* realtime subvol */ [-r extsize=num,size=num,rtdev=xxx]
[cephqe9][WARNIN] /* sectorsize */ [-s log=n|size=num]
[cephqe9][WARNIN] /* version */ [-V]
[cephqe9][WARNIN] devicename
[cephqe9][WARNIN] <devicename> is required unless -d name=xxx is given.
[cephqe9][WARNIN] <num> is xxx (bytes), xxxs (sectors), xxxb (fs blocks), xxxk (xxx KiB),
[cephqe9][WARNIN] xxxm (xxx MiB), xxxg (xxx GiB), xxxt (xxx TiB) or xxxp (xxx PiB).
[cephqe9][WARNIN] <value> is xxx (512 byte blocks).
[cephqe9][WARNIN] ceph-disk: Error: Command '['/sbin/mkfs', '-t', 'xfs', '-f', '-i', 'size=2048', '--', '/dev/sdi1']' returned non-zero exit status 1
[cephqe9][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy.osd][ERROR ] Failed to execute command: ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/sdi
[ceph_deploy][ERROR ] GenericError: Failed to create 2 OSDs
-------------------------------------------------------------------------------
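For reference, the reproduction steps above roughly correspond to the following ceph-deploy invocations. The host and device are taken from the log; the exact HOST:DISK syntax is assumed from ceph-deploy of that era, not quoted from the reporter:

ceph-deploy disk zap cephqe9:/dev/sdi      # step 1: zap, succeeds
ceph-deploy osd prepare cephqe9:/dev/sdi   # step 2: prepare, fails with the mkfs.xfs error above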
Loic, is this an issue in ceph-disk?
This is most likely a consequence of http://tracker.ceph.com/issues/14099
http://tracker.ceph.com/issues/14099 is not fixed upstream - re-targeting to RHCS 1.3.3
@Ken, backporting is behind schedule because the test infrastructure was disrupted over the past few weeks.
Setting needinfo on Ken; if we can do this in 1.3.2 he will let us know. At this time I am not re-targeting from 1.3.3 unless Dev gives us the go-ahead.
Loic helped me understand that in Infernalis/Jewel there was a large refactor of the ceph-disk workflow and how it relates to the init system and udev, so things are going to get more stable in RHCS 2.0. But we can't easily cherry-pick this work to RHCS 1.3. In the meantime, for Hammer, it's possible that we can document a workaround for users who hit ceph-disk issues. Discussion ongoing upstream: http://www.spinics.net/lists/ceph-devel/msg28384.html
@Ken, right! The safe way to address this problem is to remove the udev rules and manually run ceph-disk prepare for the desired disks. It won't activate them, but it should be free of undesired interference and races. Once the disks are prepared, a call to ceph-disk activate-all can be added to /etc/rc.local or something similar; it will sequentially activate all prepared disks and, again, will be safe from any udev/init-system interference. Some stability fixes have been backported and having them will help in some cases, but they are no cure for the more general problem. The infernalis release fixed all this, fortunately ;-)
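A minimal sketch of that sequence, run as root on the OSD node. The rules file name is the one identified later in this thread, the prepare flags and /dev/sdi come from the log above, and the rc.local step is illustrative:

rm /usr/lib/udev/rules.d/95-ceph-osd.rules                  # stop udev from racing ceph-disk
ceph-disk prepare --cluster ceph --fs-type xfs -- /dev/sdi  # prepare only; no automatic activation now
echo 'ceph-disk activate-all' >> /etc/rc.local              # sequentially activate prepared OSDs at boot
chmod +x /etc/rc.local                                      # rc.local must be executable on RHEL 7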
This is presumed to be raciness in ceph-disk, not a regression. It should be documented in the release notes nonetheless.
Hi Bara, the doc text says to manually remove the `udev` rules by running the following command as `root`:

# rm /usr/lib/path/to/udev.rule

I am unable to find the udev.rule file. Do you mean /usr/lib/udev/rules.d/*? If yes, which specific file do we need to remove? Is it /usr/lib/udev/rules.d/50-udev-default.rules?
The file that needs to be removed is /usr/lib/udev/rules.d/95-ceph-osd.rules:

# rm /usr/lib/udev/rules.d/95-ceph-osd.rules
Bara, I followed the doc text mentioned above and it is working fine; I am no longer seeing any error while preparing the SSD. NOTE: I have only tried it once.
*** This bug has been marked as a duplicate of bug 1300703 ***