Description of problem:
The "ceph-deploy disk activate" command fails to work every time. I think it's because "ceph-disk prepare" did not initialize the SSD journal, which it should have.

Version-Release number of selected component (if applicable):
RHCS 1.3.1, RHEL 7.2 GA
ceph-common-0.94.3-3.el7cp.x86_64
ceph-deploy-1.5.27.3-1.el7cp.noarch

How reproducible:
Not sure yet.

Steps to Reproduce:
1. Bring up a Ceph cluster using the RHCS documentation at
   https://access.redhat.com/documentation/en/red-hat-ceph-storage/
   right up to the point where it's time to deploy OSDs.
2. Zap all the disks using something like:

   devices="b c d e f g h i j k l m n o p q r s t u v w x y"
   for d in $devices ; do
       ceph-deploy disk zap hp60ds{1,2,3,4}:/dev/sd$d
   done

3. Try to deploy OSDs using a bash script like this:

   k=1
   devices="d e f g h i j k l m n o p q r s t u v w x y"
   for device in $devices ; do
       ((k=$k+1))
       ceph-deploy disk prepare hp60ds{1,2,3,4}:/dev/sd${device}:/dev/nvme0n1p$k
       ceph-deploy disk activate hp60ds{1,2,3,4}:/dev/sd${device}1:/dev/nvme0n1p$k
   done

Actual results:
The "ceph-deploy disk activate" commands sometimes fail, with errors like those at the bottom, but the "ceph-deploy disk prepare" commands seem to succeed. I haven't yet determined whether they really did succeed. So it appears that the problem is that the ceph-disk activate command fails.

Expected results:
Either all OSDs work, or there is a clear error message about why they didn't.

Additional info:
I did get some complaints about ceph.conf being different on the OSD hosts, which said to use the --overwrite-conf option.
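As a sanity check (my addition, not part of the original reproduction steps), the data-disk-to-journal-partition pairing that the deploy loop computes can be printed in a dry run before invoking ceph-deploy at all:

```shell
# Dry-run sketch: print the data disk -> journal partition pairing the
# deploy loop above will use, without touching any disks.
k=1
devices="d e f g h i j k l m n o p q r s t u v w x y"
for device in $devices ; do
    k=$((k + 1))
    echo "/dev/sd${device} -> /dev/nvme0n1p${k}"
done
```

The first pair printed should be /dev/sdd -> /dev/nvme0n1p2 and the last /dev/sdy -> /dev/nvme0n1p23; a mismatch here would mean an OSD is being pointed at some other OSD's journal partition, which is exactly the "someone else's journal" symptom below.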
The reason for this is that I added:

   osd crush location hook = /usr/bin/calamari-crush-location

as a result of reading:
https://access.redhat.com/documentation/en/red-hat-ceph-storage/1.3/installation-guide-for-rhel-x86-64/chapter-13-adjust-crush-tunables

+ ceph-deploy disk activate hp60ds1:/dev/sdd1:/dev/nvme0n1p2 hp60ds2:/dev/sdd1:/dev/nvme0n1p2 hp60ds3:/dev/sdd1:/dev/nvme0n1p2 hp60ds4:/dev/sdd1:/dev/nvme0n1p2
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/ceph-config/cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.27.3): /usr/bin/ceph-deploy disk activate hp60ds1:/dev/sdd1:/dev/nvme0n1p2 hp60ds2:/dev/sdd1:/dev/nvme0n1p2 hp60ds3:/dev/sdd1:/dev/nvme0n1p2 hp60ds4:/dev/sdd1:/dev/nvme0n1p2
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ]  username          : None
[ceph_deploy.cli][INFO ]  verbose           : False
[ceph_deploy.cli][INFO ]  overwrite_conf    : False
[ceph_deploy.cli][INFO ]  subcommand        : activate
[ceph_deploy.cli][INFO ]  quiet             : False
[ceph_deploy.cli][INFO ]  cd_conf           : <ceph_deploy.conf.cephdeploy.Conf instance at 0x1081c20>
[ceph_deploy.cli][INFO ]  cluster           : ceph
[ceph_deploy.cli][INFO ]  func              : <function disk at 0x1076f50>
[ceph_deploy.cli][INFO ]  ceph_conf         : None
[ceph_deploy.cli][INFO ]  default_release   : False
[ceph_deploy.cli][INFO ]  disk              : [('hp60ds1', '/dev/sdd1', '/dev/nvme0n1p2'), ('hp60ds2', '/dev/sdd1', '/dev/nvme0n1p2'), ('hp60ds3', '/dev/sdd1', '/dev/nvme0n1p2'), ('hp60ds4', '/dev/sdd1', '/dev/nvme0n1p2')]
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks hp60ds1:/dev/sdd1:/dev/nvme0n1p2 hp60ds2:/dev/sdd1:/dev/nvme0n1p2 hp60ds3:/dev/sdd1:/dev/nvme0n1p2 hp60ds4:/dev/sdd1:/dev/nvme0n1p2
[hp60ds1][DEBUG ] connected to host: hp60ds1
[hp60ds1][DEBUG ] detect platform information from remote host
[hp60ds1][DEBUG ] detect machine type
[ceph_deploy.osd][INFO ] Distro info: Red Hat Enterprise Linux Client 7.2 Maipo
[ceph_deploy.osd][DEBUG ] activating host hp60ds1 disk /dev/sdd1
[ceph_deploy.osd][DEBUG ] will use init type: sysvinit
[hp60ds1][INFO ] Running command: ceph-disk -v activate --mark-init sysvinit --mount /dev/sdd1
[hp60ds1][WARNING] INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE -ovalue -- /dev/sdd1
[hp60ds1][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
[hp60ds1][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
[hp60ds1][WARNING] DEBUG:ceph-disk:Mounting /dev/sdd1 on /var/lib/ceph/tmp/mnt.yd8Glu with options noatime,inode64
[hp60ds1][WARNING] INFO:ceph-disk:Running command: /usr/bin/mount -t xfs -o noatime,inode64 -- /dev/sdd1 /var/lib/ceph/tmp/mnt.yd8Glu
[hp60ds1][WARNING] DEBUG:ceph-disk:Cluster uuid is 05ec8c06-7c7d-473a-b49a-fb8bdd631035
[hp60ds1][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
[hp60ds1][WARNING] DEBUG:ceph-disk:Cluster name is ceph
[hp60ds1][WARNING] DEBUG:ceph-disk:OSD uuid is c55065cf-8914-46c9-9c54-684da1c768fa
[hp60ds1][WARNING] DEBUG:ceph-disk:OSD id is 8
[hp60ds1][WARNING] DEBUG:ceph-disk:Marking with init system sysvinit
[hp60ds1][WARNING] DEBUG:ceph-disk:ceph osd.8 data dir is ready at /var/lib/ceph/tmp/mnt.yd8Glu
[hp60ds1][WARNING] DEBUG:ceph-disk:Moving mount to final location...
[hp60ds1][WARNING] INFO:ceph-disk:Running command: /bin/mount -o noatime,inode64 -- /dev/sdd1 /var/lib/ceph/osd/ceph-8
[hp60ds1][WARNING] INFO:ceph-disk:Running command: /bin/umount -l -- /var/lib/ceph/tmp/mnt.yd8Glu
[hp60ds1][WARNING] DEBUG:ceph-disk:Starting ceph osd.8...
[hp60ds1][WARNING] INFO:ceph-disk:Running command: /usr/sbin/service ceph --cluster ceph start osd.8
[hp60ds1][DEBUG ] === osd.8 ===
[hp60ds1][WARNING] ERROR:calamari_osd_location:Error 2 running ceph config-key get:'libust[93887/93887]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
[hp60ds1][WARNING] Error ENOENT: error obtaining 'daemon-private/osd.8/v1/calamari/osd_crush_location': (2) No such file or directory'
[hp60ds1][WARNING] ERROR:calamari_osd_location:Error 1 running ceph config-key get:'2016-01-05 09:39:30.596497 7f8138802700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
[hp60ds1][WARNING] 2016-01-05 09:39:30.596500 7f8138802700 0 librados: client.admin initialization error (2) No such file or directory
[hp60ds1][WARNING] Error connecting to cluster: ObjectNotFound'
[hp60ds1][WARNING] libust[93955/93955]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
[hp60ds1][WARNING] create-or-move updated item name 'osd.8' weight 2.73 at location {host=hp60ds1} to crush map
[hp60ds1][DEBUG ] Starting Ceph osd.8 on hp60ds1...
[hp60ds1][WARNING] Running as unit run-93998.service.
[hp60ds1][INFO ] checking OSD status...
[hp60ds1][INFO ] Running command: ceph --cluster=ceph osd stat --format=json
[hp60ds1][INFO ] Running command: systemctl enable ceph
[hp60ds1][WARNING] ceph.service is not a native service, redirecting to /sbin/chkconfig.
[hp60ds1][WARNING] Executing /sbin/chkconfig ceph on

When I go to one of the OSDs and try to activate it directly, I get the same errors. It looks like everything is set up, but when I run ceph-osd directly, I get:

2016-01-05 10:14:45.759221 7f524e37f840 1 journal _open /var/lib/ceph/osd/ceph-8/journal fd 18: 10239344640 bytes, block size 4096 bytes, directio = 1, aio = 1
2016-01-05 10:14:45.759253 7f524e37f840 -1 journal FileJournal::open: ondisk fsid e24f6e2d-17b2-4909-aa43-34f17f64ce67 doesn't match expected c55065cf-8914-46c9-9c54-684da1c768fa, invalid (someone else's?) journal
2016-01-05 10:14:45.759297 7f524e37f840 -1 filestore(/var/lib/ceph/osd/ceph-8) mount failed to open journal /var/lib/ceph/osd/ceph-8/journal: (22) Invalid argument
2016-01-05 10:14:45.766750 7f524e37f840 -1 osd.8 0 OSD:init: unable to mount object store

So "ceph-deploy disk zap" did not zap the journal as well. I guess I have to do that manually. Ughh. Why doesn't ceph-disk prepare do this?

[root@hp60ds1 ceph-8]# ceph-osd -i 8 --pid-file /var/run/ceph/osd.8.pid -c /etc/ceph/ceph.conf --cluster ceph --mkjournal -d --debug_osd 10
2016-01-05 10:24:32.644350 7f1f4db4d840 0 ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b), process ceph-osd, pid 108247
SG_IO: questionable sense data, results may be incorrect
2016-01-05 10:24:32.670894 7f1f4db4d840 1 journal _open /var/lib/ceph/osd/ceph-8/journal fd 6: 10239344640 bytes, block size 4096 bytes, directio = 1, aio = 1
2016-01-05 10:24:32.670940 7f1f4db4d840 -1 journal check: ondisk fsid e24f6e2d-17b2-4909-aa43-34f17f64ce67 doesn't match expected c55065cf-8914-46c9-9c54-684da1c768fa, invalid (someone else's?) journal
SG_IO: questionable sense data, results may be incorrect
2016-01-05 10:24:32.672832 7f1f4db4d840 1 journal _open /var/lib/ceph/osd/ceph-8/journal fd 6: 10239344640 bytes, block size 4096 bytes, directio = 1, aio = 1
2016-01-05 10:24:32.673088 7f1f4db4d840 0 filestore(/var/lib/ceph/osd/ceph-8) mkjournal created journal on /var/lib/ceph/osd/ceph-8/journal
2016-01-05 10:24:32.673115 7f1f4db4d840 -1 created new journal /var/lib/ceph/osd/ceph-8/journal for object store /var/lib/ceph/osd/ceph-8

Then ceph-disk activate works and the OSD comes online, despite all the errors reported by ceph-disk (ughh). I really doubt the average ceph-deploy user will figure all this out. If they could, they wouldn't be using ceph-deploy, would they?
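For anyone hitting the same fsid mismatch, here is a dry-run sketch of the manual workaround that got osd.8 online above. JOURNAL and OSD_ID are placeholders for your own layout, and the dd step (zeroing the stale journal header, on the assumption that "disk zap" left the old journal partition untouched) is my own addition, not something ceph-deploy runs:

```shell
# Dry-run sketch of the per-OSD workaround: prints the commands instead of
# running them. Drop the "echo" prefixes to execute for real.
# JOURNAL and OSD_ID are placeholders -- adjust for each affected OSD.
JOURNAL=/dev/nvme0n1p2
OSD_ID=8

# 1. Zero the start of the journal partition to clear the stale fsid
#    (assumption: zap did not touch the journal partition).
echo dd if=/dev/zero of="$JOURNAL" bs=1M count=10 oflag=direct

# 2. Recreate the journal with the fsid this OSD expects (the command that
#    worked in the log above):
echo ceph-osd -i "$OSD_ID" -c /etc/ceph/ceph.conf --cluster ceph --mkjournal

# 3. Re-run activation on the data partition:
echo ceph-disk activate /dev/sdd1
```

This has to be repeated for every OSD whose journal reports someone else's fsid, which is exactly the kind of toil ceph-disk prepare/zap should be doing for the user.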
Alfredo, I think we can close this, but I'd like to hear your thoughts.
The newest ceph-deploy (2.0.0) uses ceph-volume, and none of the symptoms described here (which are really ceph-disk issues) are present. I'm unsure what the best 'CLOSED' category for this issue is, since it was really a ceph-disk problem, and ceph-disk is no longer used.