Description of problem
======================
During the "Create Cluster" task, ceph-disk crashes multiple times with this error:

~~~
FilesystemTypeError: Cannot discover filesystem type: device /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: Command '/sbin/blkid' returned non-zero exit status 2
~~~

Information about the crash can be found in the /var/log/messages log file and the crash is also detected and processed by abrt. That said, skyring seems to be able to recover and the "Create Cluster" task completes successfully after all.

Version-Release
===============
On the RHSC 2.0 machine:

rhscon-ceph-0.0.20-1.el7scon.x86_64
rhscon-core-0.0.21-1.el7scon.x86_64
rhscon-ui-0.0.34-1.el7scon.noarch
ceph-ansible-1.0.5-19.el7scon.noarch
ceph-installer-1.0.11-1.el7scon.noarch

On the storage machine:

rhscon-agent-0.0.9-1.el7scon.noarch
ceph-common-10.2.1-13.el7cp.x86_64

How reproducible
================
100 %

Steps to Reproduce
==================
1. Install RHSC 2.0 following the documentation.
2. Accept a few nodes for the ceph cluster.
3. Make sure abrt is installed and configured on all storage machines.
4. Create a new ceph cluster named 'alpha'.
5. On all storage machines, check the /var/log/messages log file for blkid errors related to the "zero UUID":

~~~
# grep 00000000-0000-0000-0000-000000000000 /var/log/messages | grep blkid
~~~

6. On all storage machines, list the crashes detected by abrt:

~~~
# abrt-cli list
~~~

Actual results
==============
There are a few such errors in the /var/log/messages log file; see an example from one storage machine of my cluster:

~~~
# grep 00000000-0000-0000-0000-000000000000 /var/log/messages | grep blkid
Jun 13 16:11:04 mbukatov-usm1-node1 sh: command: Running command: /sbin/blkid -p -s TYPE -o value -- /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000
Jun 13 16:11:04 mbukatov-usm1-node1 sh: ceph_disk.main.FilesystemTypeError: Cannot discover filesystem type: device /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: Command '/sbin/blkid' returned non-zero exit status 2
Jun 13 16:11:05 mbukatov-usm1-node1 sh: command: Running command: /sbin/blkid -p -s TYPE -o value -- /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000
Jun 13 16:11:05 mbukatov-usm1-node1 sh: ceph_disk.main.FilesystemTypeError: Cannot discover filesystem type: device /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: Command '/sbin/blkid' returned non-zero exit status 2
~~~

In this example it crashed 2 times, but the actual number of crashes can vary; see these stats for the whole cluster:

~~~
$ for i in mbukatov-usm*-node*; do echo -n "$i "; grep -R 00000000-0000-0000-0000-000000000000 $i | grep blkid | grep FilesystemTypeError | wc -l; done
mbukatov-usm1-node1 2
mbukatov-usm1-node2 3
mbukatov-usm1-node3 2
mbukatov-usm1-node4 2
mbukatov-usm2-node1 2
mbukatov-usm2-node2 3
~~~

To add context, here is a larger excerpt from the /var/log/messages file:

~~~
Jun 13 16:11:04 mbukatov-usm1-node1 sh: main_trigger:
Jun 13 16:11:04 mbukatov-usm1-node1 sh: main_trigger: get_dm_uuid: get_dm_uuid /dev/vdb1 uuid path is /sys/dev/block/253:17/dm/uuid
Jun 13 16:11:04 mbukatov-usm1-node1 sh: command: Running command: /usr/sbin/blkid -o udev -p /dev/vdb1
Jun 13 16:11:04 mbukatov-usm1-node1 sh: command: Running command: /usr/bin/ceph-osd --get-device-fsid /dev/vdb1
Jun 13 16:11:04 mbukatov-usm1-node1 sh: get_space_osd_uuid: Journal /dev/vdb1 has OSD UUID 00000000-0000-0000-0000-000000000000
Jun 13 16:11:04 mbukatov-usm1-node1 sh: command: Running command: /sbin/blkid -p -s TYPE -o value -- /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000
Jun 13 16:11:04 mbukatov-usm1-node1 sh: Traceback (most recent call last):
Jun 13 16:11:04 mbukatov-usm1-node1 sh: File "/usr/sbin/ceph-disk", line 9, in <module>
Jun 13 16:11:04 mbukatov-usm1-node1 sh: load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
Jun 13 16:11:04 mbukatov-usm1-node1 sh: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4993, in run
Jun 13 16:11:04 mbukatov-usm1-node1 sh: main(sys.argv[1:])
Jun 13 16:11:04 mbukatov-usm1-node1 sh: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4944, in main
Jun 13 16:11:04 mbukatov-usm1-node1 sh: args.func(args)
Jun 13 16:11:04 mbukatov-usm1-node1 sh: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4702, in <lambda>
Jun 13 16:11:04 mbukatov-usm1-node1 sh: func=lambda args: main_activate_space(name, args),
Jun 13 16:11:04 mbukatov-usm1-node1 sh: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3721, in main_activate_space
Jun 13 16:11:04 mbukatov-usm1-node1 sh: reactivate=args.reactivate,
Jun 13 16:11:04 mbukatov-usm1-node1 sh: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3008, in mount_activate
Jun 13 16:11:04 mbukatov-usm1-node1 sh: e,
Jun 13 16:11:04 mbukatov-usm1-node1 sh: ceph_disk.main.FilesystemTypeError: Cannot discover filesystem type: device /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: Command '/sbin/blkid' returned non-zero exit status 2
Jun 13 16:11:04 mbukatov-usm1-node1 python: detected unhandled Python exception in '/usr/sbin/ceph-disk'
Jun 13 16:11:04 mbukatov-usm1-node1 abrt-server: Not saving repeating crash in '/usr/sbin/ceph-disk'
~~~

Also, since abrt detects the crash:

~~~
# abrt-cli list
id dcb84ec438af88af00d317b7b1c1e016e54f9c09
reason:         main.py:3008:mount_activate:FilesystemTypeError: Cannot discover filesystem type: device /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: Command '/sbin/blkid' returned non-zero exit status 2
time:           Mon 13 Jun 2016 04:10:59 PM CEST
cmdline:        /usr/bin/python /usr/sbin/ceph-disk --verbose activate-journal /dev/vdb1
package:        ceph-osd-10.2.1-13.el7cp
uid:            0 (root)
count:          1
Directory:      /var/spool/abrt/Python-2016-06-13-16:10:59-8293
Run 'abrt-cli report /var/spool/abrt/Python-2016-06-13-16:10:59-8293' for creating a case in Red Hat Customer Portal

The Autoreporting feature is disabled. Please consider enabling it by issuing
'abrt-auto-reporting enabled' as a user with root privileges
~~~

we have additional details about the crash available, such as this traceback:

~~~
main.py:3008:mount_activate:FilesystemTypeError: Cannot discover filesystem type: device /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: Command '/sbin/blkid' returned non-zero exit status 2

Traceback (most recent call last):
  File "/usr/sbin/ceph-disk", line 9, in <module>
    load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4993, in run
    main(sys.argv[1:])
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4944, in main
    args.func(args)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4702, in <lambda>
    func=lambda args: main_activate_space(name, args),
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3721, in main_activate_space
    reactivate=args.reactivate,
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3008, in mount_activate
    e,
FilesystemTypeError: Cannot discover filesystem type: device /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000: Command '/sbin/blkid' returned non-zero exit status 2

Local variables in innermost frame:
activate_key_template: '{statedir}/bootstrap-osd/{cluster}.keyring'
dmcrypt: None
reactivate: False
dmcrypt_key_dir: '/etc/ceph/dmcrypt-keys'
dev: '/dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000'
init: 'auto'
e: CalledProcessError()
~~~

Expected results
================
ceph-disk doesn't crash, and there are no related errors or tracebacks in either /var/log/messages or abrt.

Additional info
===============
I'm not sure whether the root cause of this bug is in some RHSC 2.0 component or in ceph-disk itself. If the RHSC 2.0 dev team or the Ceph dev team decides to reassign this BZ to ceph-disk, it would be good to include full reasoning and/or additional technical details to back such a decision.
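For illustration only, here is a minimal sketch of how the affected crashes could be counted on a node without paging through `abrt-cli list`. It is not part of the reproduction steps and it assumes the usual abrt problem-directory layout (a `reason` file inside each crash directory under /var/spool/abrt); run as root.

~~~
#!/usr/bin/env python
# Sketch: count abrt problem directories whose 'reason' mentions the
# zero-UUID FilesystemTypeError crash (assumed abrt layout, run as root).
import os

ABRT_DIR = '/var/spool/abrt'
ZERO_UUID = '00000000-0000-0000-0000-000000000000'

count = 0
for entry in sorted(os.listdir(ABRT_DIR)):
    reason_file = os.path.join(ABRT_DIR, entry, 'reason')
    if not os.path.isfile(reason_file):
        continue
    with open(reason_file) as f:
        reason = f.read()
    if 'FilesystemTypeError' in reason and ZERO_UUID in reason:
        count += 1
        print('%s: %s' % (entry, reason.strip()))
print('matching crashes: %d' % count)
~~~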
Created attachment 1167885 [details]
tarball with abrt report directory

Attaching a tarball with the abrt report directory (without the sos report).
This looks like a ceph-installer/ansible issue. Moving the component.
Since this BZ has been reassigned to the ceph component, I'm attaching the full list of ceph packages on a Ceph storage machine:

ceph-base-10.2.1-13.el7cp.x86_64
ceph-common-10.2.1-13.el7cp.x86_64
ceph-osd-10.2.1-13.el7cp.x86_64
ceph-selinux-10.2.1-13.el7cp.x86_64
libcephfs1-10.2.1-13.el7cp.x86_64
python-cephfs-10.2.1-13.el7cp.x86_64
From the RHSCON program call today:

> this is a well known ceph-disk issue.

See http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-October/043630.html for troubleshooting info.
Would it be possible to get access to the platform where this problem has been reproduced?
(In reply to Loic Dachary from comment #10)
> Would it be possible to get access to the platform where this problem has
> been reproduced?

Yes, I just sent you an email with the details.
Wait, did this actually prevent cluster set up?
Working on this now.
I ran in circles because I was assuming the cluster failed, but I think Sam is on the right path. Here is what happened, I think:

* ceph-disk prepare /dev/vdc:/dev/vdb

* The journal is prepared first, and this triggers a udev event which tries to activate the OSD from the journal. The journal is not set yet and contains zeros. When activating the journal, ceph-osd tries to extract the uuid of the matching data partition with:

~~~
/usr/bin/ceph-osd --get-device-fsid /dev/vdb1
get_space_osd_uuid: Journal /dev/vdb1 has OSD UUID 00000000-0000-0000-0000-000000000000
~~~

There is no partition with that zero uuid, so

~~~
/sbin/blkid -p -s TYPE -o value -- /dev/disk/by-partuuid/00000000-0000-0000-0000-000000000000
~~~

fails.

* The data is prepared, and this triggers another udev event which tries to activate the OSD from the data partition. When doing that, the journal partition is set with the uuid of the data partition and any subsequent activation via the journal will work:

~~~
/usr/bin/ceph-osd --get-device-fsid /dev/vdb1
79e98f4c-b538-4361-9bf5-15675919b019

# ceph-disk list
/dev/vda :
 /dev/vda1 other, swap
 /dev/vda2 other, xfs, mounted on /
/dev/vdb :
 /dev/vdb1 ceph journal, for /dev/vdc1
/dev/vdc :
 /dev/vdc1 ceph data, active, cluster alpha, osd.0, journal /dev/vdb1
~~~

At this point the OSD enters the cluster and it becomes healthy:

~~~
# ceph --cluster alpha osd tree
ID WEIGHT  TYPE NAME                    UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.05878 root default
-2 0.00980     host mbukatov-usm1-node1
 0 0.00980         osd.0                     up  1.00000          1.00000
-3 0.00980     host mbukatov-usm1-node2
 1 0.00980         osd.1                     up  1.00000          1.00000
-4 0.00980     host mbukatov-usm1-node3
 2 0.00980         osd.2                     up  1.00000          1.00000
-5 0.00980     host mbukatov-usm1-node4
 3 0.00980         osd.3                     up  1.00000          1.00000
-6 0.00980     host mbukatov-usm2-node1
 4 0.00980         osd.4                     up  1.00000          1.00000
-7 0.00980     host mbukatov-usm2-node2
 5 0.00980         osd.5                     up  1.00000          1.00000

# ceph --cluster alpha -s
    cluster 37b32727-6713-4237-84f1-fb4d2fc0a651
     health HEALTH_OK
     monmap e3: 3 mons at {mbukatov-usm1-mon1=172.16.180.102:6789/0,mbukatov-usm1-mon2=172.16.180.79:6789/0,mbukatov-usm1-mon3=172.16.180.89:6789/0}
            election epoch 12, quorum 0,1,2 mbukatov-usm1-mon2,mbukatov-usm1-mon3,mbukatov-usm1-mon1
     osdmap e36: 6 osds: 6 up, 6 in
            flags sortbitwise
      pgmap v102: 0 pgs, 0 pools, 0 bytes data, 0 objects
            202 MB used, 61170 MB / 61373 MB avail
~~~

If that matches what you have seen, this is expected. If what you have seen is a non-functional ceph cluster, it would be great if you could provide steps to reproduce (because the cluster currently is healthy). Thanks :-)
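To make this race easier to check on a node, here is a minimal illustrative sketch (not actual ceph-disk code; the journal device path is an assumption matching the logs above). It asks ceph-osd for the OSD UUID recorded in the journal and reports whether the corresponding by-partuuid link exists yet:

~~~
#!/usr/bin/env python
# Sketch: check whether a journal partition already points at a real
# data partition. Assumes the journal is /dev/vdb1 as in the logs above.
import os
import subprocess

JOURNAL = '/dev/vdb1'
ZERO_UUID = '00000000-0000-0000-0000-000000000000'

uuid = subprocess.check_output(
    ['/usr/bin/ceph-osd', '--get-device-fsid', JOURNAL]).strip().decode()
if uuid == ZERO_UUID:
    # The udev-triggered activation raced ahead of 'ceph-disk prepare';
    # the journal header is still zeroed, so there is nothing to activate.
    print('journal %s not initialized yet, activation would fail' % JOURNAL)
elif os.path.exists('/dev/disk/by-partuuid/' + uuid):
    print('journal %s maps to data partition uuid %s' % (JOURNAL, uuid))
else:
    print('journal %s has uuid %s but no matching partition' % (JOURNAL, uuid))
~~~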
(In reply to Samuel Just from comment #12)
> Wait, did this actually prevent cluster set up?

No, the issue doesn't seem to prevent cluster setup. As I wrote in the description of this BZ, RHSC 2.0 tries multiple times and the cluster is created in the end:

> That said, skyring seems to be able to recover and the "Create Cluster" task
> completes successfully after all.

> In this example it crashed 2 times, but the actual number of crashes can
> vary; see these stats for the whole cluster ...
(In reply to Loic Dachary from comment #14)
> If that matches what you have seen, this is expected. If what you have seen
> is a non-functional ceph cluster, it would be great if you could provide
> steps to reproduce (because the cluster currently is healthy).

Your description matches what I see. And while I understand that functionality is not affected, there is one detail I would not expect to happen: the crash of ceph-disk. If I read your comment right, this is a valid use case (from the ceph point of view) and ceph-disk is actually expected to fail at some point during the process. But that also means ceph-disk should not crash - I would expect it to report the failure and shut itself down gracefully.
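To be concrete about "report the failure and shut itself down gracefully", this is the general pattern I have in mind - a simplified, hypothetical sketch with stand-in names, not the actual ceph-disk code or its fix:

~~~
#!/usr/bin/env python
# Sketch of the generic pattern: turn known activation errors into a
# log message and a non-zero exit instead of an unhandled traceback
# (which is what abrt picks up as a "crash").
import sys


class FilesystemTypeError(Exception):
    """Stand-in for ceph_disk.main.FilesystemTypeError."""


def activate_journal(dev):
    # Placeholder for the real activation logic, which raises
    # FilesystemTypeError when blkid cannot probe the device.
    raise FilesystemTypeError('Cannot discover filesystem type: device %s' % dev)


def main(argv):
    try:
        activate_journal(argv[0] if argv else '/dev/null')
    except FilesystemTypeError as e:
        # Report and exit cleanly; no traceback reaches abrt.
        sys.stderr.write('ceph-disk: %s\n' % e)
        return 1
    return 0


if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]))
~~~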
Right, this is work in progress at http://tracker.ceph.com/issues/15990
This doesn't actually prevent cluster creation, so I am deferring it to 2.1.
It will prevent cluster creation for things like ceph-ansible that call ceph-disk directly.
(In reply to Loic Dachary from comment #17)
> Right, this is work in progress at http://tracker.ceph.com/issues/15990

Maybe I'm missing something here, but how is a crash of ceph-disk related to the ceph-disk systemd unit file? Do you mean that some change in the unit file would make ceph-disk just report the failure without crashing with a traceback? (Sorry for the stupid question, I'm just confused.)
@mbukatov are there any OSDs that failed to come up because of this problem? I only looked into a few of them and they were healthy. It would be great if you could confirm that all is well, except for the error messages found in the logs.
@Martin this is not a stupid question at all :-)

It's pretty much like having a ceph cluster in which nodes are not available and messages are logged accordingly. But then the cluster is back to being healthy and the messages can be ignored.

The ceph-disk preparation is asynchronous, and while it's happening, messages are issued that reflect the fact that it's not complete yet. These messages show up in the form of ceph-disk activate failures, in various contexts. They are to be ignored when and if the OSD activates. However, if the OSD does not activate, they need to be investigated to figure out what's wrong. Does that make sense?
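As a triage aid for "ignore if the OSD activates, investigate if it does not", something along these lines could be used - a hedged sketch, assuming the JSON output of `ceph osd tree`, which lists each OSD node with its status:

~~~
#!/usr/bin/env python
# Sketch: list OSDs that did NOT come up, i.e. the cases where the
# logged ceph-disk activate failures actually need investigating.
import json
import subprocess

out = subprocess.check_output(
    ['ceph', '--cluster', 'alpha', 'osd', 'tree', '--format', 'json'])
tree = json.loads(out.decode())

down = [n['name'] for n in tree.get('nodes', [])
        if n.get('type') == 'osd' and n.get('status') != 'up']
if down:
    print('OSDs needing investigation: %s' % ', '.join(down))
else:
    print('all OSDs are up; the ceph-disk activate errors can be ignored')
~~~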
(In reply to Loic Dachary from comment #21)
> @mbukatov are there any OSDs that failed to come up because of this
> problem? I only looked into a few of them and they were healthy. It would
> be great if you could confirm that all is well, except for the error
> messages found in the logs.

I have rechecked with a new cluster using the latest QE-ready builds. As noted in the "How reproducible" section of this BZ, it happens every time one creates a cluster via the RHSC 2.0 web interface, so I reproduced this easily. The only difference is how many times ceph-disk crashes before it finally succeeds. This time:

~~~
$ for i in mbukatov-usm*-node*; do echo -n "$i "; grep -R 00000000-0000-0000-0000-000000000000 $i | grep blkid | grep FilesystemTypeError | wc -l; done
mbukatov-usm3-node1.usersys.redhat.com 2
mbukatov-usm3-node2.usersys.redhat.com 1
mbukatov-usm3-node3.usersys.redhat.com 3
mbukatov-usm3-node4.usersys.redhat.com 2
~~~

But to answer your question: the cluster has been created successfully and all OSDs seem to be fine:

~~~
# ceph -c /etc/ceph/alpha.conf osd stat
     osdmap e51: 4 osds: 4 up, 4 in
            flags sortbitwise
~~~
@Martin, thanks for the confirmation, it is much appreciated.
*** This bug has been marked as a duplicate of bug 1326740 ***