Description of problem
======================
During the "Create Cluster" task, ceph-disk crashes multiple times with this error:

~~~
FilesystemTypeError: Cannot discover filesystem type: device /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: Command '/sbin/blkid' returned non-zero exit status 2
~~~

Information about the crash can be found in the /var/log/messages log file and the crash is also detected and processed by abrt. That said, skyring seems to be able to recover and the "Create Cluster" task completes successfully after all.

Version-Release
===============
On the RHSC 2.0 machine:

rhscon-ceph-0.0.20-1.el7scon.x86_64
rhscon-core-0.0.21-1.el7scon.x86_64
rhscon-ui-0.0.34-1.el7scon.noarch
ceph-ansible-1.0.5-19.el7scon.noarch
ceph-installer-1.0.11-1.el7scon.noarch

On the storage machine:

rhscon-agent-0.0.9-1.el7scon.noarch
ceph-common-10.2.1-13.el7cp.x86_64

How reproducible
================
100 %

Steps to Reproduce
==================
1. Install RHSC 2.0 following the documentation.
2. Accept a few nodes for the ceph cluster.
3. Make sure abrt is installed and configured on all storage machines.
4. Create a new ceph cluster named 'alpha'.
5. On all storage machines, check the /var/log/messages log file for blkid errors related to the "zero UUID":

~~~
# grep 00000000-0000-0000-0000-000000000000 /var/log/messages | grep blkid
~~~

6. On all storage machines, list the crashes detected by abrt:

~~~
# abrt-cli list
~~~

Actual results
==============
There are a few such errors in the /var/log/messages log file; see an example from one storage machine of my cluster:

~~~
# grep 00000000-0000-0000-0000-000000000000 /var/log/messages | grep blkid
Jun 13 16:11:04 mbukatov-usm1-node1 sh: command: Running command: /sbin/blkid -p -s TYPE -o value -- /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000
Jun 13 16:11:04 mbukatov-usm1-node1 sh: ceph_disk.main.FilesystemTypeError: Cannot discover filesystem type: device /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: Command '/sbin/blkid' returned non-zero exit status 2
Jun 13 16:11:05 mbukatov-usm1-node1 sh: command: Running command: /sbin/blkid -p -s TYPE -o value -- /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000
Jun 13 16:11:05 mbukatov-usm1-node1 sh: ceph_disk.main.FilesystemTypeError: Cannot discover filesystem type: device /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: Command '/sbin/blkid' returned non-zero exit status 2
~~~

In this example it crashed 2 times, but the actual number of crashes can vary; see these stats for the whole cluster:

~~~
$ for i in mbukatov-usm*-node*; do echo -n "$i "; grep -R 00000000-0000-0000-0000-000000000000 $i | grep blkid | grep FilesystemTypeError | wc -l; done
mbukatov-usm1-node1 2
mbukatov-usm1-node2 3
mbukatov-usm1-node3 2
mbukatov-usm1-node4 2
mbukatov-usm2-node1 2
mbukatov-usm2-node2 3
~~~

To add context, here is a larger excerpt from the /var/log/messages file:

~~~
Jun 13 16:11:04 mbukatov-usm1-node1 sh: main_trigger:
Jun 13 16:11:04 mbukatov-usm1-node1 sh: main_trigger: get_dm_uuid: get_dm_uuid /dev/vdb1 uuid path is /sys/dev/block/253:17/dm/uuid
Jun 13 16:11:04 mbukatov-usm1-node1 sh: command: Running command: /usr/sbin/blkid -o udev -p /dev/vdb1
Jun 13 16:11:04 mbukatov-usm1-node1 sh: command: Running command: /usr/bin/ceph-osd --get-device-fsid /dev/vdb1
Jun 13 16:11:04 mbukatov-usm1-node1 sh: get_space_osd_uuid: Journal /dev/vdb1 has OSD UUID 00000000-0000-0000-0000-000000000000
Jun 13 16:11:04 mbukatov-usm1-node1 sh: command: Running command: /sbin/blkid -p -s TYPE -o value -- /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000
Jun 13 16:11:04 mbukatov-usm1-node1 sh: Traceback (most recent call last):
Jun 13 16:11:04 mbukatov-usm1-node1 sh: File "/usr/sbin/ceph-disk", line 9, in <module>
Jun 13 16:11:04 mbukatov-usm1-node1 sh: load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
Jun 13 16:11:04 mbukatov-usm1-node1 sh: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4993, in run
Jun 13 16:11:04 mbukatov-usm1-node1 sh: main(sys.argv[1:])
Jun 13 16:11:04 mbukatov-usm1-node1 sh: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4944, in main
Jun 13 16:11:04 mbukatov-usm1-node1 sh: args.func(args)
Jun 13 16:11:04 mbukatov-usm1-node1 sh: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4702, in <lambda>
Jun 13 16:11:04 mbukatov-usm1-node1 sh: func=lambda args: main_activate_space(name, args),
Jun 13 16:11:04 mbukatov-usm1-node1 sh: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3721, in main_activate_space
Jun 13 16:11:04 mbukatov-usm1-node1 sh: reactivate=args.reactivate,
Jun 13 16:11:04 mbukatov-usm1-node1 sh: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3008, in mount_activate
Jun 13 16:11:04 mbukatov-usm1-node1 sh: e,
Jun 13 16:11:04 mbukatov-usm1-node1 sh: ceph_disk.main.FilesystemTypeError: Cannot discover filesystem type: device /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: Command '/sbin/blkid' returned non-zero exit status 2
Jun 13 16:11:04 mbukatov-usm1-node1 python: detected unhandled Python exception in '/usr/sbin/ceph-disk'
Jun 13 16:11:04 mbukatov-usm1-node1 abrt-server: Not saving repeating crash in '/usr/sbin/ceph-disk'
~~~

Also, since abrt detects the crash:

~~~
# abrt-cli list
id dcb84ec438af88af00d317b7b1c1e016e54f9c09
reason:         main.py:3008:mount_activate:FilesystemTypeError: Cannot discover filesystem type: device /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: Command '/sbin/blkid' returned non-zero exit status 2
time:           Mon 13 Jun 2016 04:10:59 PM CEST
cmdline:        /usr/bin/python /usr/sbin/ceph-disk --verbose activate-journal /dev/vdb1
package:        ceph-osd-10.2.1-13.el7cp
uid:            0 (root)
count:          1
Directory:      /var/spool/abrt/Python-2016-06-13-16:10:59-8293
Run 'abrt-cli report /var/spool/abrt/Python-2016-06-13-16:10:59-8293' for creating a case in Red Hat Customer Portal

The Autoreporting feature is disabled. Please consider enabling it by issuing
'abrt-auto-reporting enabled' as a user with root privileges
~~~

we have additional details about the crash available, such as this traceback:

~~~
main.py:3008:mount_activate:FilesystemTypeError: Cannot discover filesystem type: device /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: Command '/sbin/blkid' returned non-zero exit status 2

Traceback (most recent call last):
  File "/usr/sbin/ceph-disk", line 9, in <module>
    load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4993, in run
    main(sys.argv[1:])
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4944, in main
    args.func(args)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4702, in <lambda>
    func=lambda args: main_activate_space(name, args),
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3721, in main_activate_space
    reactivate=args.reactivate,
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3008, in mount_activate
    e,
FilesystemTypeError: Cannot discover filesystem type: device /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: Command '/sbin/blkid' returned non-zero exit status 2

Local variables in innermost frame:
activate_key_template: '{statedir}/bootstrap-osd/{cluster}.keyring'
dmcrypt: None
reactivate: False
dmcrypt_key_dir: '/etc/ceph/dmcrypt-keys'
dev: '/dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000'
init: 'auto'
e: CalledProcessError()
~~~

Expected results
================
ceph-disk doesn't crash, and there are no related errors or tracebacks in either /var/log/messages or abrt.

Additional info
===============
I'm not sure whether the root cause of this bug is in some RHSC 2.0 component or in ceph-disk itself. If the RHSC 2.0 dev team or the Ceph dev team decides to reassign this BZ to ceph-disk, it would be good to include full reasoning and/or additional technical details to back such a decision.
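For illustration only, here is a minimal sketch of how the affected crashes could be counted on a node without paging through `abrt-cli list`. It is not part of the reproduction steps and it assumes the usual abrt problem-directory layout (a `reason` file inside each crash directory under /var/spool/abrt); run as root.

~~~
#!/usr/bin/env python
# Sketch: count abrt problem directories whose 'reason' mentions the
# zero-UUID FilesystemTypeError crash (assumed abrt layout, run as root).
import os

ABRT_DIR = '/var/spool/abrt'
ZERO_UUID = '00000000-0000-0000-0000-000000000000'

count = 0
for entry in sorted(os.listdir(ABRT_DIR)):
    reason_file = os.path.join(ABRT_DIR, entry, 'reason')
    if not os.path.isfile(reason_file):
        continue
    with open(reason_file) as f:
        reason = f.read()
    if 'FilesystemTypeError' in reason and ZERO_UUID in reason:
        count += 1
        print('%s: %s' % (entry, reason.strip()))
print('matching crashes: %d' % count)
~~~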
Created attachment 1167885 [details]
tarball with abrt report directory

Attaching a tarball with the abrt report directory (without the sos report).
This looks like a ceph-installer/ansible issue. Moving the component.
Since this BZ has been reassigned to the ceph component, I'm attaching the full list of ceph packages on a Ceph storage machine:

ceph-base-10.2.1-13.el7cp.x86_64
ceph-common-10.2.1-13.el7cp.x86_64
ceph-osd-10.2.1-13.el7cp.x86_64
ceph-selinux-10.2.1-13.el7cp.x86_64
libcephfs1-10.2.1-13.el7cp.x86_64
python-cephfs-10.2.1-13.el7cp.x86_64
From the RHSCON program call today:

> this is a well known ceph-disk issue.

See http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-October/043630.html for troubleshooting info.
Would it be possible to get access to the platform where this problem has been reproduced?
(In reply to Loic Dachary from comment #10)
> Would it be possible to get access to the platform where this problem has
> been reproduced?

Yes, I just sent you an email with the details.
Wait, did this actually prevent cluster set up?
Working on this now.
I ran in circles because I was assuming the cluster failed, but I think Sam is on the right path. Here is what happened, I think:

* ceph-disk prepare /dev/vdc:/dev/vdb

* The journal is prepared first, and this triggers a udev event which tries to activate the OSD from the journal. The journal is not set yet and contains zeros. When activating the journal, ceph-osd tries to extract the uuid of the matching data partition with:

~~~
/usr/bin/ceph-osd --get-device-fsid /dev/vdb1
get_space_osd_uuid: Journal /dev/vdb1 has OSD UUID 00000000-0000-0000-0000-000000000000
~~~

There is no partition with that zero uuid, so

~~~
/sbin/blkid -p -s TYPE -o value -- /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000
~~~

fails.

* The data is prepared, and this triggers another udev event which tries to activate the OSD from the data partition. When doing that, the journal partition is set with the uuid of the data partition and any subsequent activation via the journal will work:

~~~
/usr/bin/ceph-osd --get-device-fsid /dev/vdb1
79e98f4c-b538-4361-9bf5-15675919b019

# ceph-disk list
/dev/vda :
 /dev/vda1 other, swap
 /dev/vda2 other, xfs, mounted on /
/dev/vdb :
 /dev/vdb1 ceph journal, for /dev/vdc1
/dev/vdc :
 /dev/vdc1 ceph data, active, cluster alpha, osd.0, journal /dev/vdb1
~~~

At this point the OSD enters the cluster and it becomes healthy:

~~~
# ceph --cluster alpha osd tree
ID WEIGHT  TYPE NAME                    UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.05878 root default
-2 0.00980     host mbukatov-usm1-node1
 0 0.00980         osd.0                     up  1.00000          1.00000
-3 0.00980     host mbukatov-usm1-node2
 1 0.00980         osd.1                     up  1.00000          1.00000
-4 0.00980     host mbukatov-usm1-node3
 2 0.00980         osd.2                     up  1.00000          1.00000
-5 0.00980     host mbukatov-usm1-node4
 3 0.00980         osd.3                     up  1.00000          1.00000
-6 0.00980     host mbukatov-usm2-node1
 4 0.00980         osd.4                     up  1.00000          1.00000
-7 0.00980     host mbukatov-usm2-node2
 5 0.00980         osd.5                     up  1.00000          1.00000

# ceph --cluster alpha -s
    cluster 37b32727-6713-4237-84f1-fb4d2fc0a651
     health HEALTH_OK
     monmap e3: 3 mons at {mbukatov-usm1-mon1=172.16.180.102:6789/0,mbukatov-usm1-mon2=172.16.180.79:6789/0,mbukatov-usm1-mon3=172.16.180.89:6789/0}
            election epoch 12, quorum 0,1,2 mbukatov-usm1-mon2,mbukatov-usm1-mon3,mbukatov-usm1-mon1
     osdmap e36: 6 osds: 6 up, 6 in
            flags sortbitwise
      pgmap v102: 0 pgs, 0 pools, 0 bytes data, 0 objects
            202 MB used, 61170 MB / 61373 MB avail
~~~

If that matches what you have seen, this is expected. If what you have seen is a non-functional ceph cluster, it would be great if you could provide steps to reproduce (because the cluster currently is healthy). Thanks :-)
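To make this race easier to check on a node, here is a minimal illustrative sketch (not actual ceph-disk code; the journal device path is an assumption matching the logs above). It asks ceph-osd for the OSD UUID recorded in the journal and reports whether the corresponding by-partuuid link exists yet:

~~~
#!/usr/bin/env python
# Sketch: check whether a journal partition already points at a real
# data partition. Assumes the journal is /dev/vdb1 as in the logs above.
import os
import subprocess

JOURNAL = '/dev/vdb1'
ZERO_UUID = '00000000-0000-0000-0000-000000000000'

uuid = subprocess.check_output(
    ['/usr/bin/ceph-osd', '--get-device-fsid', JOURNAL]).strip().decode()
if uuid == ZERO_UUID:
    # The udev-triggered activation raced ahead of 'ceph-disk prepare';
    # the journal header is still zeroed, so there is nothing to activate.
    print('journal %s not initialized yet, activation would fail' % JOURNAL)
elif os.path.exists('/dev/disk/by-partuuid/' + uuid):
    print('journal %s maps to data partition uuid %s' % (JOURNAL, uuid))
else:
    print('journal %s has uuid %s but no matching partition' % (JOURNAL, uuid))
~~~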
(In reply to Samuel Just from comment #12)
> Wait, did this actually prevent cluster set up?

No, the issue doesn't seem to prevent cluster setup. As I wrote in the description of this BZ, RHSC 2.0 tries multiple times and the cluster is created in the end:

> That said, skyring seems to be able to recover and the "Create Cluster" task
> completes successfully after all.

> In this example it crashed 2 times, but the actual number of crashes can
> vary; see these stats for the whole cluster ...
(In reply to Loic Dachary from comment #14)
> If that matches what you have seen, this is expected. If what you have seen
> is a non-functional ceph cluster, it would be great if you could provide
> steps to reproduce (because the cluster currently is healthy).

Your description matches what I see. And while I understand that functionality is not affected, there is one detail I would not expect to happen: the crash of ceph-disk. If I read your comment right, this is a valid use case (from the ceph point of view) and ceph-disk is actually expected to fail at some point during the process. But that also means ceph-disk should not crash - I would expect it to report the failure and shut itself down gracefully.
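To be concrete about "report the failure and shut itself down gracefully", this is the general pattern I have in mind - a simplified, hypothetical sketch with stand-in names, not the actual ceph-disk code or its fix:

~~~
#!/usr/bin/env python
# Sketch of the generic pattern: turn known activation errors into a
# log message and a non-zero exit instead of an unhandled traceback
# (which is what abrt picks up as a "crash").
import sys


class FilesystemTypeError(Exception):
    """Stand-in for ceph_disk.main.FilesystemTypeError."""


def activate_journal(dev):
    # Placeholder for the real activation logic, which raises
    # FilesystemTypeError when blkid cannot probe the device.
    raise FilesystemTypeError('Cannot discover filesystem type: device %s' % dev)


def main(argv):
    try:
        activate_journal(argv[0] if argv else '/dev/null')
    except FilesystemTypeError as e:
        # Report and exit cleanly; no traceback reaches abrt.
        sys.stderr.write('ceph-disk: %s\n' % e)
        return 1
    return 0


if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]))
~~~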
Right, this is work in progress at http://tracker.ceph.com/issues/15990
This doesn't actually prevent cluster creation, so I am deferring it to 2.1.
It will prevent cluster creation for things like ceph-ansible that call ceph-disk directly.
(In reply to Loic Dachary from comment #17)
> Right, this is work in progress at http://tracker.ceph.com/issues/15990

Maybe I'm missing something here, but how is a crash of ceph-disk related to the ceph-disk systemd unit file? Do you mean that some change in the unit file would make ceph-disk just report the failure without crashing with a traceback? (Sorry for the stupid question, I'm just confused.)
@mbukatov are there any OSDs that failed to come up because of this problem? I only looked into a few of them and they were healthy. It would be great if you could confirm that all is well, except for the error messages found in the logs.
@Martin this is not a stupid question at all :-)

It's pretty much like having a ceph cluster in which nodes are not available and messages are logged accordingly. But then the cluster is back to being healthy and the messages can be ignored.

The ceph-disk preparation is asynchronous, and while it's happening, messages are issued that reflect the fact that it's not complete yet. These messages show up in the form of ceph-disk activate failures, in various contexts. They are to be ignored when and if the OSD activates. However, if the OSD does not activate, they need to be investigated to figure out what's wrong. Does that make sense?
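As a triage aid for "ignore if the OSD activates, investigate if it does not", something along these lines could be used - a hedged sketch, assuming the JSON output of `ceph osd tree`, which lists each OSD node with its status:

~~~
#!/usr/bin/env python
# Sketch: list OSDs that did NOT come up, i.e. the cases where the
# logged ceph-disk activate failures actually need investigating.
import json
import subprocess

out = subprocess.check_output(
    ['ceph', '--cluster', 'alpha', 'osd', 'tree', '--format', 'json'])
tree = json.loads(out.decode())

down = [n['name'] for n in tree.get('nodes', [])
        if n.get('type') == 'osd' and n.get('status') != 'up']
if down:
    print('OSDs needing investigation: %s' % ', '.join(down))
else:
    print('all OSDs are up; the ceph-disk activate errors can be ignored')
~~~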
(In reply to Loic Dachary from comment #21)
> @mbukatov are there any OSDs that failed to come up because of this
> problem? I only looked into a few of them and they were healthy. It would
> be great if you could confirm that all is well, except for the error
> messages found in the logs.

I have rechecked with a new cluster using the latest QE-ready builds. As noted in the "How reproducible" section of this BZ, it happens every time one creates a cluster via the RHSC 2.0 web interface, so I reproduced this easily. The only difference is how many times ceph-disk crashes before it finally succeeds. This time:

~~~
$ for i in mbukatov-usm*-node*; do echo -n "$i "; grep -R 00000000-0000-0000-0000-000000000000 $i | grep blkid | grep FilesystemTypeError | wc -l; done
mbukatov-usm3-node1.usersys.redhat.com 2
mbukatov-usm3-node2.usersys.redhat.com 1
mbukatov-usm3-node3.usersys.redhat.com 3
mbukatov-usm3-node4.usersys.redhat.com 2
~~~

But to answer your question: the cluster has been created successfully and all OSDs seem to be fine:

~~~
# ceph -c /etc/ceph/alpha.conf osd stat
     osdmap e51: 4 osds: 4 up, 4 in
            flags sortbitwise
~~~
@Martin, thanks for the confirmation, it is much appreciated.
*** This bug has been marked as a duplicate of bug 1326740 ***