Bug 1326740
Summary: | ceph-disk@dev-sd<>2.service is created on all OSD nodes, and it is in a failed state | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Tejas <tchandra>
Component: | Ceph-Disk | Assignee: | Boris Ranto <branto>
Status: | CLOSED ERRATA | QA Contact: | Tejas <tchandra>
Severity: | medium | Docs Contact: | Bara Ancincova <bancinco>
Priority: | high | |
Version: | 2.0 | CC: | adeza, branto, ceph-eng-bugs, hnallurv, icolle, kdreyer, ldachary, mbukatov, nthomas, sankarshan, tchandra, uboppana, vakulkar, vashastr
Target Milestone: | rc | |
Target Release: | 2.1 | |
Hardware: | Unspecified | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | RHEL: ceph-10.2.3-2.el7cp; Ubuntu: ceph_10.2.3-3redhat1xenial | Doc Type: | Bug Fix
Doc Text: | The `ceph-disk` service no longer fails because of limited resources. After installation of a new Ceph storage cluster, failed instances of the `ceph-disk` service appeared because the service was started twice: once to activate the data partition, and once to activate the journal partition. After the disk activation, one of these instances failed because of limited resources. With this update, the instance without the resources terminates with the `0` exit status and an informative message is returned. | |
Story Points: | --- | |
Clone Of: | | Environment: |
Last Closed: | 2016-11-22 19:25:15 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1322504, 1383917 | |
Description (Tejas, 2016-04-13 11:59:27 UTC)
Loic, can you comment on what is going on here? Thank you, G.

Is it still an issue with 10.2.0?

Yes, we are still seeing this issue in ceph 10.2.0. Loic, what would cause this?

> Is this service required?

Yes.

> If so, why is it in a failed state?

Here is what happens at boot time (or after a new OSD is prepared and activated):

Scenario a)

* udev runs the service to activate /dev/sda1, which is the data partition of the OSD, but /dev/sdb1, which is the journal partition, is not up yet, and the service fails.
* udev runs the service to activate /dev/sdb1, which is the journal partition of the OSD, and since the data partition is already up, it succeeds and the OSD is ready to be used.

Scenario b)

It is the same as Scenario a), except the journal partition shows up first and the data partition after it.

Either way, one of the two services is going to fail because the resources are only partially available (one disk out of two). The more general case is a little more involved, but it is not substantially different. Please let me know if that explanation makes sense to you. I thought at first that it was really failing and looked hard to find the bug causing it :-)

Expected behavior, so moving out of 2.0. However, we should plan on a better way to handle this scenario for the user in a future 2.z release.

Loic, on many setups I consistently see only scenario a) that you mentioned in comment 9. It happens on both Ubuntu and RHEL. Is there anything we can do to fix this in 2.0? It looks pretty bad even though it is display-only, and it makes the user think that disk activation has failed.

I created an issue upstream to work on this.

This could be fixed if we added 1 as an acceptable exit code for the ceph-disk command, e.g. by adding this line to the service file for ceph-disk:

SuccessExitStatus=1

We would just be ignoring the error, though.

As mentioned by Vasu in comment 11, it gives a bad user experience. Request this to be fixed in 2.0. Adding it to "Bug 1343229 - [Tracker] engineering work required for 2.0 GA".

Boris, SuccessExitStatus=1 would do the trick, but it would also hide legitimate errors. It is a judgement call. Cheers.

It looks good to me. Upstream PR: https://github.com/ceph/ceph/pull/9943

We should be able to fix this in the next async errata.

*** Bug 1346308 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2815.html
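For readers who hit the same symptom, the leftover failed instances described in scenarios a) and b) above can be observed and cleared with standard `systemctl` commands. This is only a minimal sketch; the device name `dev-sdb2` is an illustrative placeholder, not taken from this report.

```
# List any failed ceph-disk activation instances on an OSD node
systemctl list-units --failed 'ceph-disk@*'

# Inspect one specific instance (device name is a placeholder example)
systemctl status 'ceph-disk@dev-sdb2.service'

# Per the discussion above the failure is cosmetic; the state can be cleared with
systemctl reset-failed 'ceph-disk@*'
```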
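The `SuccessExitStatus=1` idea discussed in the comments could be applied locally as a systemd drop-in, as sketched below. This is not the fix that shipped in ceph-10.2.3-2.el7cp (which, per the Doc Text, makes the losing instance exit with status `0` and an informative message); the drop-in file name is arbitrary, and as noted above this approach also hides legitimate errors that exit with status 1.

```
# Sketch of the workaround only: a drop-in applying to every ceph-disk@<device>
# instance on this node (file name is arbitrary)
mkdir -p /etc/systemd/system/ceph-disk@.service.d
cat > /etc/systemd/system/ceph-disk@.service.d/50-success-exit.conf <<'EOF'
[Service]
# Treat exit status 1 as success so the "losing" activation instance is not
# reported as failed. Caveat: real errors that exit with 1 are hidden too.
SuccessExitStatus=1
EOF
systemctl daemon-reload
```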