Description of problem:
On all OSD nodes in the cluster, a ceph-disk@<dev>2.service unit is seen in the failed state. This partition is where the journal is colocated on the disk.

Version-Release number of selected component (if applicable):
ceph 10.1.1

How reproducible:
Always

Steps to Reproduce:
1. Install a ceph cluster
2. Check the service status on the OSD nodes

Actual results:
A failed ceph-disk service is seen alongside every correctly running OSD service entry.

Expected results:
Is this service required? If so, why is it in a failed state?

Additional info:
[root@magna080 ~]# systemctl status ceph*
● ceph-osd - Ceph object storage daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2016-04-13 11:03:13 UTC; 8min ago
 Main PID: 11290 (ceph-osd)
   CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd
           └─11290 /usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph

Apr 13 11:03:12 magna080 systemd[1]: Starting Ceph object storage daemon...
Apr 13 11:03:12 magna080 ceph-osd-prestart.sh[11204]: getopt: unrecognized option '--setuser'
Apr 13 11:03:12 magna080 ceph-osd-prestart.sh[11204]: getopt: unrecognized option '--setgroup'
Apr 13 11:03:13 magna080 ceph-osd-prestart.sh[11204]: create-or-move updating item name 'osd.5' weight 0.8995 at location {host=magna080,root=default} to crush map
Apr 13 11:03:13 magna080 systemd[1]: Started Ceph object storage daemon.
Apr 13 11:03:13 magna080 ceph-osd[11290]: starting osd.5 at :/0 osd_data /var/lib/ceph/osd/ceph-5 /var/lib/ceph/osd/ceph-5/journal
Apr 13 11:03:14 magna080 ceph-osd[11290]: 2016-04-13 11:03:14.190475 7f3e6ae9d800 -1 osd.5 0 log_to_monitors {default=true}
Apr 13 11:03:14 magna080 systemd[1]: Started Ceph object storage daemon.
Apr 13 11:03:15 magna080 systemd[1]: Started Ceph object storage daemon.
Apr 13 11:05:26 magna080 systemd[1]: Started Ceph object storage daemon.
● ceph-disk - Ceph disk activation: /dev/sdd2
   Loaded: loaded (/usr/lib/systemd/system/ceph-disk@.service; static; vendor preset: disabled)
   Active: failed (Result: start-limit) since Wed 2016-04-13 11:04:16 UTC; 7min ago
 Main PID: 11815

Apr 13 11:04:17 magna080 sh[11815]: ceph_disk.main.Error: Error: return code 1
Apr 13 11:04:21 magna080 systemd[1]: start request repeated too quickly for ceph-disk
Apr 13 11:04:21 magna080 systemd[1]: Failed to start Ceph disk activation: /dev/sdd2.
Apr 13 11:04:21 magna080 systemd[1]: ceph-disk failed.
Apr 13 11:04:23 magna080 systemd[1]: start request repeated too quickly for ceph-disk
Apr 13 11:04:23 magna080 systemd[1]: Failed to start Ceph disk activation: /dev/sdd2.
Apr 13 11:04:23 magna080 systemd[1]: ceph-disk failed.
Apr 13 11:04:23 magna080 systemd[1]: start request repeated too quickly for ceph-disk
Apr 13 11:04:23 magna080 systemd[1]: Failed to start Ceph disk activation: /dev/sdd2.
Apr 13 11:04:23 magna080 systemd[1]: ceph-disk failed.

● ceph-disk - Ceph disk activation: /dev/sdc2
   Loaded: loaded (/usr/lib/systemd/system/ceph-disk@.service; static; vendor preset: disabled)
   Active: failed (Result: start-limit) since Wed 2016-04-13 11:03:05 UTC; 8min ago
 Main PID: 10694 (code=exited, status=1/FAILURE)

Apr 13 11:03:08 magna080 sh[10283]: ceph_disk.main.Error: Error: return code 1
Apr 13 11:03:08 magna080 systemd[1]: start request repeated too quickly for ceph-disk
Apr 13 11:03:08 magna080 systemd[1]: Failed to start Ceph disk activation: /dev/sdc2.
Apr 13 11:03:08 magna080 systemd[1]: ceph-disk failed.
Apr 13 11:03:09 magna080 systemd[1]: start request repeated too quickly for ceph-disk
Apr 13 11:03:09 magna080 systemd[1]: Failed to start Ceph disk activation: /dev/sdc2.
Apr 13 11:03:09 magna080 systemd[1]: ceph-disk failed.
Apr 13 11:03:10 magna080 systemd[1]: start request repeated too quickly for ceph-disk
Apr 13 11:03:10 magna080 systemd[1]: Failed to start Ceph disk activation: /dev/sdc2.
Apr 13 11:03:10 magna080 systemd[1]: ceph-disk failed.

● ceph-osd - Ceph object storage daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2016-04-13 11:03:00 UTC; 8min ago
 Main PID: 10325 (ceph-osd)
   CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd
           └─10325 /usr/bin/ceph-osd -f --cluster ceph --id 2 --setuser ceph --setgroup ceph

Apr 13 11:02:58 magna080 systemd[1]: Starting Ceph object storage daemon...
Apr 13 11:02:58 magna080 ceph-osd-prestart.sh[10230]: getopt: unrecognized option '--setuser'
Apr 13 11:02:58 magna080 ceph-osd-prestart.sh[10230]: getopt: unrecognized option '--setgroup'
Apr 13 11:03:00 magna080 ceph-osd-prestart.sh[10230]: create-or-move updating item name 'osd.2' weight 0.8995 at location {host=magna080,root=default} to crush map
Apr 13 11:03:00 magna080 systemd[1]: Started Ceph object storage daemon.
Apr 13 11:03:00 magna080 ceph-osd[10325]: starting osd.2 at :/0 osd_data /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
Apr 13 11:03:00 magna080 ceph-osd[10325]: 2016-04-13 11:03:00.261144 7fcd57450800 -1 osd.2 0 log_to_monitors {default=true}
Apr 13 11:03:01 magna080 systemd[1]: Started Ceph object storage daemon.
Apr 13 11:05:25 magna080 systemd[1]: Started Ceph object storage daemon.

● ceph-osd - Ceph object storage daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2016-04-13 11:04:26 UTC; 7min ago
 Main PID: 12201 (ceph-osd)
   CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd
           └─12201 /usr/bin/ceph-osd -f --cluster ceph --id 8 --setuser ceph --setgroup ceph

Apr 13 11:04:25 magna080 systemd[1]: Starting Ceph object storage daemon...
Apr 13 11:04:25 magna080 ceph-osd-prestart.sh[12152]: getopt: unrecognized option '--setuser'
Apr 13 11:04:25 magna080 ceph-osd-prestart.sh[12152]: getopt: unrecognized option '--setgroup'
Apr 13 11:04:26 magna080 ceph-osd-prestart.sh[12152]: create-or-move updating item name 'osd.8' weight 0.8995 at location {host=magna080,root=default} to crush map
Apr 13 11:04:26 magna080 systemd[1]: Started Ceph object storage daemon.
Apr 13 11:04:26 magna080 ceph-osd[12201]: starting osd.8 at :/0 osd_data /var/lib/ceph/osd/ceph-8 /var/lib/ceph/osd/ceph-8/journal
Apr 13 11:04:26 magna080 ceph-osd[12201]: 2016-04-13 11:04:26.660755 7fd05be43800 -1 osd.8 0 log_to_monitors {default=true}
Apr 13 11:04:26 magna080 systemd[1]: Started Ceph object storage daemon.
Apr 13 11:04:27 magna080 systemd[1]: Started Ceph object storage daemon.
Apr 13 11:05:26 magna080 systemd[1]: Started Ceph object storage daemon.

● ceph-disk - Ceph disk activation: /dev/sdb2
   Loaded: loaded (/usr/lib/systemd/system/ceph-disk@.service; static; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2016-04-13 10:42:08 UTC; 29min ago
 Main PID: 761 (code=exited, status=1/FAILURE)

Apr 13 10:42:08 magna080 sh[761]: main(sys.argv[1:])
Apr 13 10:42:08 magna080 sh[761]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4915, in main
Apr 13 10:42:08 magna080 sh[761]: args.func(args)
Apr 13 10:42:08 magna080 sh[761]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4352, in main_trigger
Apr 13 10:42:08 magna080 sh[761]: raise Error('return code ' + str(ret))
Apr 13 10:42:08 magna080 sh[761]: ceph_disk.main.Error: Error: return code 1
Apr 13 10:42:08 magna080 systemd[1]: ceph-disk: main process exited, code=exited, status=1/FAILURE
Apr 13 10:42:08 magna080 systemd[1]: Failed to start Ceph disk activation: /dev/sdb2.
Apr 13 10:42:08 magna080 systemd[1]: Unit ceph-disk entered failed state.
Apr 13 10:42:08 magna080 systemd[1]: ceph-disk failed.
[root@magna080 ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       917G  2.7G  868G   1% /
devtmpfs         16G     0   16G   0% /dev
tmpfs            16G     0   16G   0% /dev/shm
tmpfs            16G  8.7M   16G   1% /run
tmpfs            16G     0   16G   0% /sys/fs/cgroup
tmpfs           3.2G     0  3.2G   0% /run/user/0
/dev/sdb1       922G   35M  922G   1% /var/lib/ceph/osd/ceph-2
/dev/sdc1       922G   34M  922G   1% /var/lib/ceph/osd/ceph-5
/dev/sdd1       922G   34M  922G   1% /var/lib/ceph/osd/ceph-8

[root@magna080 ~]# fdisk -l
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.

Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes, 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt

#         Start          End    Size  Type     Name
 1     20973568   1953525134  921.5G  unknown  ceph data
 2         2048     20973567     10G  unknown  ceph journal

WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes, 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt

#         Start          End    Size  Type     Name
 1     20973568   1953525134  921.5G  unknown  ceph data
 2         2048     20973567     10G  unknown  ceph journal

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes, 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x0002b997

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048  1953523711   976760832   83  Linux

WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes, 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: gpt

#         Start          End    Size  Type     Name
 1     20973568   1953525134  921.5G  unknown  ceph data
 2         2048     20973567     10G  unknown  ceph journal
Loic, can you comment on what is going on here?

Thank you,
G
Is this still an issue with 10.2.0?
Yes, we are still seeing this issue in ceph 10.2.0.
Loic, what would cause this?
> Is this service required?

Yes.

> if so why is it in a failed state?

Here is what happens at boot time (or after a new OSD is prepared and activated):

Scenario a)
* udev runs the service to activate /dev/sda1, which is the data partition of the OSD, but /dev/sdb1, which is the journal partition, is not up yet, and the service fails
* udev runs the service to activate /dev/sdb1, which is the journal partition of the OSD, and since the data partition is already up, it succeeds and the OSD is ready to be used

Scenario b)
The same as scenario a), except the journal partition shows up first and the data partition after it.

Either way, one of the two services is going to fail because the resources are only partially available (one disk out of two). The more general case is a little more involved, but it is not substantially different.

Please let me know if that explanation makes sense to you. I thought at first that it was really failing and looked hard to find the bug causing it :-)
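The race above can be sketched in a few lines of Python. This is purely illustrative — the function and partition names are hypothetical, not the actual ceph-disk code — but it shows why exactly one of the two udev-triggered activations fails regardless of arrival order:

```python
# Hypothetical model of the activation race described above.
# Each udev event triggers activation of one partition; the OSD can
# only come up once both the data and journal partitions are present.

def activate(partition, present):
    """Simulate ceph-disk activation for one partition (illustrative)."""
    present.add(partition)
    if {"data", "journal"} <= present:
        return "activated"  # both partitions present, OSD starts
    return "failed"         # peer partition not up yet, unit fails

# Scenario a): the data partition's udev event arrives first
present = set()
first = activate("data", present)      # journal not up yet -> fails
second = activate("journal", present)  # data already up -> succeeds
print(first, second)  # -> failed activated
```

Swapping the order of the two events (scenario b) simply swaps which unit ends up in the failed state; the OSD itself still comes up.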
Expected behavior, so moving out of 2.0. However, we should plan on a better way to handle this scenario for the user in a future 2.z release.
Loic, on many setups I consistently see only scenario a) that you mentioned in comment 9, on both Ubuntu and RHEL. Is there anything we can do to fix this in 2.0? It looks pretty bad even though it is display-only, and it makes the user think that disk activation has failed.
I created an issue upstream to work on this.
This could be fixed if we added 1 as an acceptable exit code for the ceph-disk command, e.g. by adding this line to the ceph-disk service file:

SuccessExitStatus=1

We would just be ignoring the error, though.
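For reference, a minimal sketch of how that setting could be applied without editing the packaged unit, via a systemd drop-in (the drop-in path and file name here are illustrative, not part of any shipped package):

```ini
# /etc/systemd/system/ceph-disk@.service.d/success-exit.conf
# Treat exit status 1 from ceph-disk as success, so the transient
# activation failure does not leave the unit in a failed state.
[Service]
SuccessExitStatus=1
```

After creating the drop-in, `systemctl daemon-reload` would pick it up. Note that this also masks legitimate activation errors that exit with status 1.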
As mentioned by Vasu in comment 11, this gives a bad user experience. I request this to be fixed in 2.0. Adding it to "Bug 1343229 - [Tracker] engineering work required for 2.0 GA".
Boris, SuccessExitStatus=1 would do the trick. But it would also hide legitimate errors. It's a judgement call. Cheers
It looks good to me.
Upstream PR: https://github.com/ceph/ceph/pull/9943 We should be able to fix this in the next async errata.
*** Bug 1346308 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2016-2815.html