We should be hitting this in 3.0 as well.

+++ This bug was initially created as a clone of Bug #1476453 +++

Description of problem:

ceph-disk activate fails with an illegal instruction:

# ceph-disk -v activate --reactivate /dev/mapper/mpathg1
main_activate: path = /dev/mapper/mpathg1
get_dm_uuid: get_dm_uuid /dev/mapper/mpathg1 uuid path is /sys/dev/block/253:13/dm/uuid
get_dm_uuid: get_dm_uuid /dev/mapper/mpathg1 uuid is part1-mpath-3600508b300954b90aa418f385e820016
get_dm_uuid: get_dm_uuid /dev/mapper/mpathg1 uuid path is /sys/dev/block/253:13/dm/uuid
get_dm_uuid: get_dm_uuid /dev/mapper/mpathg1 uuid is part1-mpath-3600508b300954b90aa418f385e820016
command: Running command: /usr/sbin/blkid -o udev -p /dev/mapper/mpathg1
get_dm_uuid: get_dm_uuid /dev/mapper/mpathg1 uuid path is /sys/dev/block/253:13/dm/uuid
get_dm_uuid: get_dm_uuid /dev/mapper/mpathg1 uuid is part1-mpath-3600508b300954b90aa418f385e820016
command: Running command: /sbin/blkid -p -s TYPE -o value -- /dev/mapper/mpathg1
command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
mount: Mounting /dev/mapper/mpathg1 on /var/lib/ceph/tmp/mnt.WJ9Mm4 with options noatime,inode64
command_check_call: Running command: /usr/bin/mount -t xfs -o noatime,inode64 -- /dev/mapper/mpathg1 /var/lib/ceph/tmp/mnt.WJ9Mm4
command: Running command: /usr/sbin/restorecon /var/lib/ceph/tmp/mnt.WJ9Mm4
activate: Cluster uuid is 40216c8a-ba3c-4cec-9e80-212396e214a1
command: Running command: /usr/bin/ceph-osd --cluster=toad --show-config-value=fsid
command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
activate: Cluster name is ceph
activate: OSD uuid is 25efb560-300e-48d3-9156-2f4c0746a313
activate: OSD id is 13
activate: Initializing OSD...
command_check_call: Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/tmp/mnt.WJ9Mm4/activate.monmap
got monmap epoch 8
command: Running command: /usr/bin/timeout 300 ceph-osd --cluster ceph --mkfs --mkkey -i 13 --monmap /var/lib/ceph/tmp/mnt.WJ9Mm4/activate.monmap --osd-data /var/lib/ceph/tmp/mnt.WJ9Mm4 --osd-uuid 25efb560-300e-48d3-9156-2f4c0746a313 --keyring /var/lib/ceph/tmp/mnt.WJ9Mm4/keyring --setuser ceph --setgroup ceph
mount_activate: Failed to activate
unmount: Unmounting /var/lib/ceph/tmp/mnt.WJ9Mm4
command_check_call: Running command: /bin/umount -- /var/lib/ceph/tmp/mnt.WJ9Mm4
Traceback (most recent call last):
  File "/usr/sbin/ceph-disk", line 11, in <module>
    load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5731, in run
    main(sys.argv[1:])
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5682, in main
    args.func(args)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3755, in main_activate
    reactivate=args.reactivate,
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3512, in mount_activate
    (osd_id, cluster) = activate(path, activate_key_template, init)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3688, in activate
    keyring=keyring,
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3100, in mkfs
    '--setgroup', get_ceph_group(),
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3064, in ceph_osd_mkfs
    raise Error('%s failed : %s' % (str(arguments), error))
ceph_disk.main.Error: Error: ['ceph-osd', '--cluster', 'ceph', '--mkfs', '--mkkey', '-i', u'13', '--monmap', '/var/lib/ceph/tmp/mnt.WJ9Mm4/activate.monmap', '--osd-data', '/var/lib/ceph/tmp/mnt.WJ9Mm4', '--osd-uuid', u'25efb560-300e-48d3-9156-2f4c0746a313', '--keyring', '/var/lib/ceph/tmp/mnt.WJ9Mm4/keyring', '--setuser', 'ceph',
'--setgroup', 'ceph'] failed : *** Caught signal (Illegal instruction) **
 in thread 7f0c68b4bd00 thread_name:ceph-osd

 ceph version 12.1.1 (f3e663a190bf2ed12c7e3cda288b9a159572c800) luminous (rc)
 1: (()+0x9b3458) [0x55a55ca1a458]
 2: (()+0x122c0) [0x7f0c6620d2c0]
 3: (rocksdb::VersionBuilder::SaveTo(rocksdb::VersionStorageInfo*)+0x7c9) [0x55a55ce0c939]
 4: (rocksdb::VersionSet::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool)+0x15cf) [0x55a55cd0787f]
 5: (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool)+0x113) [0x55a55ccd3ed3]
 6: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**)+0xf31) [0x55a55ccd5941]
 7: (rocksdb::DB::Open(rocksdb::Options const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::DB**)+0x66e) [0x55a55ccd6fde]
 8: (RocksDBStore::do_open(std::ostream&, bool)+0x62d) [0x55a55c9646cd]
 9: (RocksDBStore::create_and_open(std::ostream&)+0x174) [0x55a55c965e14]
 10: (BlueStore::_open_db(bool)+0x4f3) [0x55a55c8f48e3]
 11: (BlueStore::mkfs()+0x8e8) [0x55a55c920a38]
 12: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x259) [0x55a55c48f8b9]
 13: (main()+0xe29) [0x55a55c3e78b9]
 14: (__libc_start_main()+0xea) [0x7f0c6517a4da]
 15: (_start()+0x2a) [0x55a55c46d98a]

2017-07-29 09:15:16.982152 7f0c68b4bd00 -1 *** Caught signal (Illegal instruction) ** in thread 7f0c68b4bd00 thread_name:ceph-osd
[… identical backtrace repeated twice more in the log, snipped …]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
/usr/bin/timeout: the monitored command dumped core

Version-Release number of selected component (if applicable):
ceph-osd-12.1.1-3.fc27.x86_64

How reproducible:
Always, on two of my machines (one with a Pentium 4 CPU, the other with an AMD Opteron 8354)

Steps to Reproduce:
1. ceph-disk prepare /dev/xxx
2. ceph-disk activate /dev/xxx

Actual results:
The traceback above.

Expected results:
A running OSD.

Additional info:

--- Additional comment from Loic Dachary on 2017-07-29 09:30:12 CEST ---

Hi! It would be very useful to have detailed steps to reproduce. Would you mind explaining how multipath was set up? I'm also curious why you used the --reactivate flag:

ceph-disk -v activate --reactivate /dev/mapper/mpathg1

Thanks!

--- Additional comment from Tomasz Torcz on 2017-07-29 09:40:29 CEST ---

I'm using --reactivate because I'm recreating btrfs-based OSDs with BlueStore. I'm following the steps documented at http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#replacing-an-osd , but with '--reactivate', because the '--osd-id' option from step 3 seems to no longer exist. Multipath is irrelevant; I get the same backtrace on an OSD using plain partitions, but for the sake of completeness:

# multipath -ll /dev/mapper/mpathg
mpathg (3600508b300954b90aa418f385e820016) dm-6 COMPAQ ,MSA1000 VOLUME
size=279G features='1 queue_if_no_path' hwhandler='1 hp_sw' wp=rw
|-+- policy='service-time 0' prio=2 status=enabled
| `- 4:0:0:5 sdl 8:176 active ghost running
`-+- policy='service-time 0' prio=4 status=active
  `- 2:0:0:5 sde 8:64 active ready running

# lsblk
sde             8:64   0 279.4G  0 disk
└─mpathg      253:6    0 279.4G  0 mpath
  ├─mpathg1   253:13   0   100M  0 part
  └─mpathg2   253:14   0 279.3G  0 part
sdl             8:176  0 279.4G  0 disk
└─mpathg      253:6    0 279.4G  0 mpath
  ├─mpathg1   253:13   0   100M  0 part
  └─mpathg2   253:14   0 279.3G  0 part

And no special configuration, just multipath defaults. To rule out multipath, here's the backtrace from the OTHER machine. It has sda3 as the mountable OSD store and sda6 as the block store.
# ceph-disk activate /dev/sda3 1>&2 2>/tmp/out.txt
got monmap epoch 8
mount_activate: Failed to activate
ceph-disk: Error: ['ceph-osd', '--cluster', 'ceph', '--mkfs', '--mkkey', '-i', u'8', '--monmap', '/var/lib/ceph/tmp/mnt.eD_OyV/activate.monmap', '--osd-data', '/var/lib/ceph/tmp/mnt.eD_OyV', '--osd-uuid', u'7671ea65-8cdd-407f-963b-fa4ad85ba9b1', '--keyring', '/var/lib/ceph/tmp/mnt.eD_OyV/keyring', '--setuser', 'ceph', '--setgroup', 'ceph'] failed : *** Caught signal (Illegal instruction) **
 in thread 7f0029f4cd00 thread_name:ceph-osd

 ceph version 12.1.1 (f3e663a190bf2ed12c7e3cda288b9a159572c800) luminous (rc)
 1: (()+0x9b3458) [0x22fe05a458]
 2: (()+0x12720) [0x7f002761b720]
 3: (rocksdb::VersionBuilder::SaveTo(rocksdb::VersionStorageInfo*)+0x7c9) [0x22fe44c939]
 4: (rocksdb::VersionSet::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool)+0x15cf) [0x22fe34787f]
 5: (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool)+0x113) [0x22fe313ed3]
 6: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**)+0xf31) [0x22fe315941]
 7: (rocksdb::DB::Open(rocksdb::Options const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::DB**)+0x66e) [0x22fe316fde]
 8: (RocksDBStore::do_open(std::ostream&, bool)+0x62d) [0x22fdfa46cd]
 9: (RocksDBStore::create_and_open(std::ostream&)+0x174) [0x22fdfa5e14]
 10: (BlueStore::_open_db(bool)+0x4f3) [0x22fdf348e3]
 11: (BlueStore::mkfs()+0x8e8) [0x22fdf60a38]
 12: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x259) [0x22fdacf8b9]
 13: (main()+0xe29) [0x22fda278b9]
 14: (__libc_start_main()+0xea) [0x7f002653400a]
 15: (_start()+0x2a) [0x22fdaad98a]

2017-07-29 09:38:28.304361 7f0029f4cd00 -1 *** Caught signal (Illegal instruction) ** in thread 7f0029f4cd00 thread_name:ceph-osd
[… identical backtrace repeated twice more in the log, snipped …]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
/usr/bin/timeout: the monitored command dumped core

--- Additional comment from Loic Dachary on 2017-07-29 13:49:34 CEST ---

It's interesting that it can be reproduced without multipath (because multipath adds a level of complexity that may make the problem more difficult to diagnose). Would it be possible for me to reproduce the same problem? Should I first create an OSD with a btrfs file system, then try to migrate that OSD to BlueStore? Ideally I would run a series of commands on my own machine and run into the same problem. Could you send me such a series of commands? Thanks!

--- Additional comment from Tomasz Torcz on 2017-07-30 11:19:20 CEST ---

There's no need to start from a btrfs OSD. This is 100% reproducible for me on clean disks, when creating a new OSD with 12.1.1. The following steps should let you reproduce it:

1. Get Fedora rawhide installed.
2. The rawhide repositories seem not to have been updated for the past few days, so fetch the ceph 12.1.1-3 build manually:
   koji download-build --arch=x86_64 923148
3. Install the downloaded RPMs.
4. Prepare a disk with two partitions. For me that would be sda3 and sda6:
   # wipefs -a /dev/sda6
   # wipefs -a /dev/sda3
   /dev/sda3: 4 bytes were erased at offset 0x00000000 (xfs): 58 46 53 42
5. ceph-disk prepare the disk:
   # ceph-disk prepare /dev/sda3
   set_data_partition: incorrect partition UUID: 0x83, expected ['4fbd7e29-9d25-41b8-afd0-5ec00ceff05d', '4fbd7e29-9d25-41b8-afd0-062c0ceff05d', '4fbd7e29-8ae0-4982-bf9d-5a8d867af560', '4fbd7e29-9d25-41b8-afd0-35865ceff05d']
   meta-data=/dev/sda3      isize=2048   agcount=4, agsize=1310720 blks
            =               sectsz=512   attr=2, projid32bit=1
            =               crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
   data     =               bsize=4096   blocks=5242880, imaxpct=25
            =               sunit=0      swidth=0 blks
   naming   =version 2      bsize=4096   ascii-ci=0 ftype=1
   log      =internal log   bsize=4096   blocks=2560, version=2
            =               sectsz=512   sunit=0 blks, lazy-count=1
   realtime =none           extsz=4096   blocks=0, rtextents=0
6. More preparation: point to the block storage partition. It should be a symlink to the partuuid, but for clarity I've used the short name:
   # mount /dev/sda3 /mnt/tmp
   # ls -l /mnt/tmp
   total 16
   -rw-r--r--. 1 ceph ceph 37 Jul 30 11:06 ceph_fsid
   -rw-r--r--. 1 ceph ceph 37 Jul 30 11:06 fsid
   -rw-r--r--. 1 ceph ceph 21 Jul 30 11:06 magic
   -rw-r--r--. 1 ceph ceph 10 Jul 30 11:06 type
   # ln -s /dev/sda6 /mnt/tmp/block
   # chown ceph:ceph /dev/sda6
   # umount /mnt/tmp
7. Activate the new OSD and receive the backtrace:
   # ceph-disk activate /dev/sda3
   got monmap epoch 8
   mount_activate: Failed to activate
   ceph-disk: Error: ['ceph-osd', '--cluster', 'ceph', '--mkfs', '--mkkey', '-i', u'17', '--monmap', '/var/lib/ceph/tmp/mnt.qG3_2R/activate.monmap', '--osd-data', '/var/lib/ceph/tmp/mnt.qG3_2R', '--osd-uuid', u'2e70309e-12dd-4a54-9547-ab68a3f842de', '--keyring', '/var/lib/ceph/tmp/mnt.qG3_2R/keyring', '--setuser', 'ceph', '--setgroup', 'ceph'] failed : *** Caught signal (Illegal instruction) **
    in thread 7f2653804d00 thread_name:ceph-osd
    ceph version 12.1.1 (f3e663a190bf2ed12c7e3cda288b9a159572c800) luminous (rc)
    1: (()+0x9b3458) [0x906fb14458]
    2: (()+0x12720) [0x7f2650ed3720]
    3: (rocksdb::VersionBuilder::SaveTo(rocksdb::VersionStorageInfo*)+0x7c9) [0x906ff06939]
    4: (rocksdb::VersionSet::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool)+0x15cf) [0x906fe0187f]
    5: (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool)+0x113) [0x906fdcded3]
    6: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**)+0xf31) [0x906fdcf941]
    7: (rocksdb::DB::Open(rocksdb::Options const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::DB**)+0x66e) [0x906fdd0fde]
    8: (RocksDBStore::do_open(std::ostream&, bool)+0x62d) [0x906fa5e6cd]
    9: (RocksDBStore::create_and_open(std::ostream&)+0x174) [0x906fa5fe14]
    10: (BlueStore::_open_db(bool)+0x4f3) [0x906f9ee8e3]
    11: (BlueStore::mkfs()+0x8e8) [0x906fa1aa38]
    12: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x259) [0x906f5898b9]
    13: (main()+0xe29) [0x906f4e18b9]
    14: (__libc_start_main()+0xea) [0x7f264fdec00a]
    15: (_start()+0x2a) [0x906f56798a]
   2017-07-30 11:09:10.793415 7f2653804d00 -1 *** Caught signal (Illegal instruction) ** in thread 7f2653804d00 thread_name:ceph-osd
   [… snipped …]

--- Additional comment from Loic Dachary on 2017-07-31 08:27:46 CEST ---

Thanks for the detailed instructions, this is very helpful. Preparing each partition individually is uncommon, but it should not crash the way it does. I'll reproduce this and figure out what to do. Unrelated question: is there a reason why you do not simply run ceph-disk prepare /dev/sda and let it partition the disk itself?

--- Additional comment from Tomasz Torcz on 2017-07-31 15:33:34 CEST ---

This is my experimental Ceph cluster, running the latest code to catch bugs like this one early. As such, it is not production ready; it was created from generally decommissioned hardware, and this particular node has only one HDD. The drive is shared between the operating system and the OSD, hence the partitions.

--- Additional comment from Jan Kurik on 2017-08-15 08:43:15 CEST ---

This bug appears to have been reported against 'rawhide' during the Fedora 27 development cycle. Changing version to '27'.

--- Additional comment from Tomasz Torcz on 2017-09-12 12:00:21 CEST ---

I was able to run ceph-osd under GDB. The result is the following:

Thread 1 "ceph-osd" received signal SIGILL, Illegal instruction.
0x00005555563ad449 in std::__sort<__gnu_cxx::__normal_iterator<rocksdb::FileMetaData**, std::vector<rocksdb::FileMetaData*, std::allocator<rocksdb::FileMetaData*> > >, __gnu_cxx::__ops::_Iter_comp_iter<rocksdb::VersionBuilder::Rep::FileComparator> > (__comp=..., __last=..., __first=...) at /usr/include/c++/7/bits/stl_algo.h:1966
1966        if (__first != __last)
(gdb) x/i 0x00005555563ad449
=> 0x5555563ad449 <rocksdb::VersionBuilder::SaveTo(rocksdb::VersionStorageInfo*)+1993>: pinsrd $0x0,%ecx,%xmm1

"pinsrd" is an SSE4.1/AVX instruction, and my servers' CPUs do not support SSE4.1/AVX. I'm not sure how the RocksDB embedded in ceph-osd got miscompiled to include this instruction, but it is clearly the cause of the crash.

--- Additional comment from Tomasz Torcz on 2017-09-12 12:05:00 CEST ---

See: https://github.com/facebook/rocksdb/issues/690
"Right now the 'default' build is to build with -march=native. […] The issue with this is that my build box CPU has instructions that my cluster CPU's do not support."
Also: https://github.com/ceph/ceph/pull/11677

--- Additional comment from Boris Ranto on 2017-09-12 17:10:39 CEST ---

The commit you referenced is already in the 12.x packages. I was able to find a reference to march=native in the sources if we are doing a dpdk-enabled build. Maybe that is one more suspect to look at. Alternatively, we might want to add PORTABLE=1 before calling 'make' in the spec file to see if that helps.

--- Additional comment from Boris Ranto on 2017-09-12 19:35:13 CEST ---

There is an upstream issue for this: http://tracker.ceph.com/issues/20529
The upstream PR that should fix it is still open/in review: https://github.com/ceph/ceph/pull/17388

--- Additional comment from Boris Ranto on 2017-09-13 00:38:51 CEST ---

Can you test this build? https://koji.fedoraproject.org/koji/taskinfo?taskID=21828197
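[Editor's note] The GDB diagnosis above (a pinsrd instruction on a CPU without SSE4.1) can be cross-checked on any machine before ceph-osd is even started. A minimal sketch, not part of the original report, using the feature flags Linux exposes in /proc/cpuinfo:

```shell
# Check whether this CPU advertises SSE4.1. A ceph-osd/RocksDB binary
# compiled with -march=native on an SSE4.1-capable build host will hit
# SIGILL (e.g. on pinsrd) on a machine where this flag is missing.
if grep -qw sse4_1 /proc/cpuinfo 2>/dev/null; then
    echo "sse4_1: present"
else
    echo "sse4_1: MISSING - a -march=native build may crash with SIGILL"
fi
```

On the reporter's Pentium 4 and Opteron 8354 machines this check would report the flag as missing, which matches the observed crash.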
Will be fixed in the rebase to v12.2.1.
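[Editor's note] The PORTABLE=1 knob suggested in the comments follows RocksDB's Makefile convention, where PORTABLE=1 suppresses -march=native so the binary sticks to baseline instructions. A sketch of how it would be passed; the spec-file make invocation shown in the comment is an assumption, not taken from the actual ceph.spec:

```shell
# RocksDB's Makefile honours PORTABLE=1 to avoid -march=native.
# In an RPM spec file the %build step would prefix the make call, roughly:
#   PORTABLE=1 make %{?_smp_mflags}
# Demonstrate that the variable reaches the build environment:
PORTABLE=1 sh -c 'echo "building with PORTABLE=$PORTABLE"'
```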
Sanity verified with a normal disk (none of our lab machines has mpath devices).
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:3387