Bug 1476453 - caught signal (Illegal instruction) when activation bluestore OSD
Status: NEW
Product: Fedora
Classification: Fedora
Component: ceph
Version: 27
Assigned To: Boris Ranto
QA Contact: Fedora Extras Quality Assurance
Reported: 2017-07-29 03:21 EDT by Tomasz Torcz
Modified: 2017-08-15 02:43 EDT (History)
Type: Bug
Description Tomasz Torcz 2017-07-29 03:21:32 EDT
Description of problem:
ceph-disk activate fails with illegal instruction:

# ceph-disk -v activate --reactivate /dev/mapper/mpathg1

main_activate: path = /dev/mapper/mpathg1
get_dm_uuid: get_dm_uuid /dev/mapper/mpathg1 uuid path is /sys/dev/block/253:13/dm/uuid
get_dm_uuid: get_dm_uuid /dev/mapper/mpathg1 uuid is part1-mpath-3600508b300954b90aa418f385e820016

get_dm_uuid: get_dm_uuid /dev/mapper/mpathg1 uuid path is /sys/dev/block/253:13/dm/uuid
get_dm_uuid: get_dm_uuid /dev/mapper/mpathg1 uuid is part1-mpath-3600508b300954b90aa418f385e820016

command: Running command: /usr/sbin/blkid -o udev -p /dev/mapper/mpathg1
get_dm_uuid: get_dm_uuid /dev/mapper/mpathg1 uuid path is /sys/dev/block/253:13/dm/uuid
get_dm_uuid: get_dm_uuid /dev/mapper/mpathg1 uuid is part1-mpath-3600508b300954b90aa418f385e820016

command: Running command: /sbin/blkid -p -s TYPE -o value -- /dev/mapper/mpathg1
command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
command: Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
mount: Mounting /dev/mapper/mpathg1 on /var/lib/ceph/tmp/mnt.WJ9Mm4 with options noatime,inode64
command_check_call: Running command: /usr/bin/mount -t xfs -o noatime,inode64 -- /dev/mapper/mpathg1 /var/lib/ceph/tmp/mnt.WJ9Mm4
command: Running command: /usr/sbin/restorecon /var/lib/ceph/tmp/mnt.WJ9Mm4
activate: Cluster uuid is 40216c8a-ba3c-4cec-9e80-212396e214a1
command: Running command: /usr/bin/ceph-osd --cluster=toad --show-config-value=fsid
command: Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
activate: Cluster name is ceph
activate: OSD uuid is 25efb560-300e-48d3-9156-2f4c0746a313
activate: OSD id is 13
activate: Initializing OSD...
command_check_call: Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/tmp/mnt.WJ9Mm4/activate.monmap
got monmap epoch 8
command: Running command: /usr/bin/timeout 300 ceph-osd --cluster ceph --mkfs --mkkey -i 13 --monmap /var/lib/ceph/tmp/mnt.WJ9Mm4/activate.monmap --osd-data /var/lib/ceph/tmp/mnt.WJ9Mm4 --osd-uuid 25efb560-300e-48d3-9156-2f4c0746a313 --keyring /var/lib/ceph/tmp/mnt.WJ9Mm4/keyring --setuser ceph --setgroup ceph
mount_activate: Failed to activate
unmount: Unmounting /var/lib/ceph/tmp/mnt.WJ9Mm4
command_check_call: Running command: /bin/umount -- /var/lib/ceph/tmp/mnt.WJ9Mm4
Traceback (most recent call last):
  File "/usr/sbin/ceph-disk", line 11, in <module>
    load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5731, in run
    main(sys.argv[1:])
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5682, in main
    args.func(args)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3755, in main_activate
    reactivate=args.reactivate,
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3512, in mount_activate
    (osd_id, cluster) = activate(path, activate_key_template, init)
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3688, in activate
    keyring=keyring,
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3100, in mkfs
    '--setgroup', get_ceph_group(),
  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3064, in ceph_osd_mkfs
    raise Error('%s failed : %s' % (str(arguments), error))
ceph_disk.main.Error: Error: ['ceph-osd', '--cluster', 'ceph', '--mkfs', '--mkkey', '-i', u'13', '--monmap', '/var/lib/ceph/tmp/mnt.WJ9Mm4/activate.monmap', '--osd-data', '/var/lib/ceph/tmp/mnt.WJ9Mm4', '--osd-uuid', u'25efb560-300e-48d3-9156-2f4c0746a313', '--keyring', '/var/lib/ceph/tmp/mnt.WJ9Mm4/keyring', '--setuser', 'ceph', '--setgroup', 'ceph'] failed :

*** Caught signal (Illegal instruction) **
 in thread 7f0c68b4bd00 thread_name:ceph-osd
 ceph version 12.1.1 (f3e663a190bf2ed12c7e3cda288b9a159572c800) luminous (rc)
 1: (()+0x9b3458) [0x55a55ca1a458]
 2: (()+0x122c0) [0x7f0c6620d2c0]
 3: (rocksdb::VersionBuilder::SaveTo(rocksdb::VersionStorageInfo*)+0x7c9) [0x55a55ce0c939]
 4: (rocksdb::VersionSet::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool)+0x15cf) [0x55a55cd0787f]
 5: (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool)+0x113) [0x55a55ccd3ed3]
 6: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**)+0xf31) [0x55a55ccd5941]
 7: (rocksdb::DB::Open(rocksdb::Options const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::DB**)+0x66e) [0x55a55ccd6fde]
 8: (RocksDBStore::do_open(std::ostream&, bool)+0x62d) [0x55a55c9646cd]
 9: (RocksDBStore::create_and_open(std::ostream&)+0x174) [0x55a55c965e14]
 10: (BlueStore::_open_db(bool)+0x4f3) [0x55a55c8f48e3]
 11: (BlueStore::mkfs()+0x8e8) [0x55a55c920a38]
 12: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x259) [0x55a55c48f8b9]
 13: (main()+0xe29) [0x55a55c3e78b9]
 14: (__libc_start_main()+0xea) [0x7f0c6517a4da]
 15: (_start()+0x2a) [0x55a55c46d98a]
2017-07-29 09:15:16.982152 7f0c68b4bd00 -1 *** Caught signal (Illegal instruction) **
 in thread 7f0c68b4bd00 thread_name:ceph-osd

 ceph version 12.1.1 (f3e663a190bf2ed12c7e3cda288b9a159572c800) luminous (rc)
 1: (()+0x9b3458) [0x55a55ca1a458]
 2: (()+0x122c0) [0x7f0c6620d2c0]
 3: (rocksdb::VersionBuilder::SaveTo(rocksdb::VersionStorageInfo*)+0x7c9) [0x55a55ce0c939]
 4: (rocksdb::VersionSet::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool)+0x15cf) [0x55a55cd0787f]
 5: (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool)+0x113) [0x55a55ccd3ed3]
 6: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**)+0xf31) [0x55a55ccd5941]
 7: (rocksdb::DB::Open(rocksdb::Options const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::DB**)+0x66e) [0x55a55ccd6fde]
 8: (RocksDBStore::do_open(std::ostream&, bool)+0x62d) [0x55a55c9646cd]
 9: (RocksDBStore::create_and_open(std::ostream&)+0x174) [0x55a55c965e14]
 10: (BlueStore::_open_db(bool)+0x4f3) [0x55a55c8f48e3]
 11: (BlueStore::mkfs()+0x8e8) [0x55a55c920a38]
 12: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x259) [0x55a55c48f8b9]
 13: (main()+0xe29) [0x55a55c3e78b9]
 14: (__libc_start_main()+0xea) [0x7f0c6517a4da]
 15: (_start()+0x2a) [0x55a55c46d98a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

/usr/bin/timeout: the monitored command dumped core
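
The NOTE in the log says a disassembly of the executable is needed to interpret the backtrace. As a minimal sketch of how a frame address could be mapped back into the binary: this assumes that frame 1's `()+0x9b3458` is an offset from the load base of ceph-osd (plausible for a position-independent executable, but not confirmed in this report). The helper name `rel_offset` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: compute the module-relative offset of a frame.
# It uses an anchor frame from the same binary for which the log prints
# both an absolute address and a "()+0x..." offset.
# usage: rel_offset <frame_addr> <anchor_addr> <anchor_offset>
rel_offset() {
    base=$(( $2 - $3 ))               # assumed load base of the binary
    printf '0x%x\n' $(( $1 - base ))  # frame address relative to that base
}

# Frame 3 (rocksdb::VersionBuilder::SaveTo) from the log above,
# anchored on frame 1 of the same backtrace.
rel_offset 0x55a55ce0c939 0x55a55ca1a458 0x9b3458
```

Under that assumption the offset could then be inspected with something like `objdump -d --start-address=<offset - 0x40> --stop-address=<offset + 0x10> /usr/bin/ceph-osd` on a machine with the same ceph-osd build installed, to see which instruction the CPU rejected.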


Version-Release number of selected component (if applicable):
ceph-osd-12.1.1-3.fc27.x86_64

How reproducible:
Always, on two of my machines (one with a Pentium 4 CPU, the other with an AMD Opteron 8354).
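
Both affected machines have older CPUs, so one thing worth checking is whether they lack instruction-set extensions that the binary might use. This is only a guess at the cause; nothing in this report confirms it. The sketch below is a hypothetical helper (`missing_flags` is not part of any Ceph tooling) and the flag names passed to it are illustrative, not a list of what ceph-osd actually requires.

```shell
#!/bin/sh
# Hypothetical helper: print each wanted flag that is absent from a
# space-separated list of CPU flags.
# usage: missing_flags "<cpu flags>" flag1 flag2 ...
missing_flags() {
    have=" $1 "
    shift
    for f in "$@"; do
        case "$have" in
            *" $f "*) ;;                 # flag is present, print nothing
            *) printf '%s\n' "$f" ;;     # flag is missing
        esac
    done
}

# On Linux the flags of the first CPU can be read from /proc/cpuinfo, e.g.:
#   flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
#   missing_flags "$flags" ssse3 sse4_1 sse4_2 pclmulqdq
```

Any flag it prints on the failing hosts but not on a working host would be a candidate for closer inspection.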

Steps to Reproduce:
1. ceph-disk prepare /dev/xxx
2. ceph-disk activate /dev/xxx

Actual results:
The traceback above.

Expected results:
Running OSD

Additional info:
Comment 1 Loic Dachary 2017-07-29 03:30:12 EDT
Hi !

It would be very useful to have detailed steps to reproduce. Would you mind explaining how multipath was set up? Also, I'm curious why you used the --reactivate flag:

    ceph-disk -v activate --reactivate /dev/mapper/mpathg1

Thanks !
Comment 2 Tomasz Torcz 2017-07-29 03:40:29 EDT
I'm using --reactivate because I'm recreating btrfs-based OSDs as BlueStore OSDs. I'm following the steps documented at http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/#replacing-an-osd , but with '--reactivate' because the '--osd-id' option from step 3 no longer seems to exist.

Multipath is irrelevant: I get the same backtrace on an OSD using plain partitions. For the sake of completeness:

# multipath -ll /dev/mapper/mpathg
mpathg (3600508b300954b90aa418f385e820016) dm-6 COMPAQ  ,MSA1000 VOLUME  
size=279G features='1 queue_if_no_path' hwhandler='1 hp_sw' wp=rw
|-+- policy='service-time 0' prio=2 status=enabled
| `- 4:0:0:5 sdl        8:176  active ghost running
`-+- policy='service-time 0' prio=4 status=active
  `- 2:0:0:5 sde        8:64   active ready running

lsblk:
sde              8:64   0 279.4G  0 disk  
└─mpathg       253:6    0 279.4G  0 mpath 
  ├─mpathg1    253:13   0   100M  0 part  
  └─mpathg2    253:14   0 279.3G  0 part  
sdl              8:176  0 279.4G  0 disk  
└─mpathg       253:6    0 279.4G  0 mpath 
  ├─mpathg1    253:13   0   100M  0 part  
  └─mpathg2    253:14   0 279.3G  0 part  

And no special configuration, just multipath defaults.



To rule out multipath, here's the backtrace from the other machine. It has sda3 as the mountable OSD store and sda6 as the block store.


# ceph-disk activate /dev/sda3 1>&2 2>/tmp/out.txt
got monmap epoch 8
mount_activate: Failed to activate
ceph-disk: Error: ['ceph-osd', '--cluster', 'ceph', '--mkfs', '--mkkey', '-i', u'8', '--monmap', '/var/lib/ceph/tmp/mnt.eD_OyV/activate.monmap', '--osd-data', '/var/lib/ceph/tmp/mnt.eD_OyV', '--osd-uuid', u'7671ea65-8cdd-407f-963b-fa4ad85ba9b1', '--keyring', '/var/lib/ceph/tmp/mnt.eD_OyV/keyring', '--setuser', 'ceph', '--setgroup', 'ceph'] failed : *** Caught signal (Illegal instruction) **
 in thread 7f0029f4cd00 thread_name:ceph-osd
 ceph version 12.1.1 (f3e663a190bf2ed12c7e3cda288b9a159572c800) luminous (rc)
 1: (()+0x9b3458) [0x22fe05a458]
 2: (()+0x12720) [0x7f002761b720]
 3: (rocksdb::VersionBuilder::SaveTo(rocksdb::VersionStorageInfo*)+0x7c9) [0x22fe44c939]
 4: (rocksdb::VersionSet::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool)+0x15cf) [0x22fe34787f]
 5: (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool)+0x113) [0x22fe313ed3]
 6: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**)+0xf31) [0x22fe315941]
 7: (rocksdb::DB::Open(rocksdb::Options const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::DB**)+0x66e) [0x22fe316fde]
 8: (RocksDBStore::do_open(std::ostream&, bool)+0x62d) [0x22fdfa46cd]
 9: (RocksDBStore::create_and_open(std::ostream&)+0x174) [0x22fdfa5e14]
 10: (BlueStore::_open_db(bool)+0x4f3) [0x22fdf348e3]
 11: (BlueStore::mkfs()+0x8e8) [0x22fdf60a38]
 12: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x259) [0x22fdacf8b9]
 13: (main()+0xe29) [0x22fda278b9]
 14: (__libc_start_main()+0xea) [0x7f002653400a]
 15: (_start()+0x2a) [0x22fdaad98a]
2017-07-29 09:38:28.304361 7f0029f4cd00 -1 *** Caught signal (Illegal instruction) **
 in thread 7f0029f4cd00 thread_name:ceph-osd

 ceph version 12.1.1 (f3e663a190bf2ed12c7e3cda288b9a159572c800) luminous (rc)
 1: (()+0x9b3458) [0x22fe05a458]
 2: (()+0x12720) [0x7f002761b720]
 3: (rocksdb::VersionBuilder::SaveTo(rocksdb::VersionStorageInfo*)+0x7c9) [0x22fe44c939]
 4: (rocksdb::VersionSet::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool)+0x15cf) [0x22fe34787f]
 5: (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool)+0x113) [0x22fe313ed3]
 6: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**)+0xf31) [0x22fe315941]
 7: (rocksdb::DB::Open(rocksdb::Options const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::DB**)+0x66e) [0x22fe316fde]
 8: (RocksDBStore::do_open(std::ostream&, bool)+0x62d) [0x22fdfa46cd]
 9: (RocksDBStore::create_and_open(std::ostream&)+0x174) [0x22fdfa5e14]
 10: (BlueStore::_open_db(bool)+0x4f3) [0x22fdf348e3]
 11: (BlueStore::mkfs()+0x8e8) [0x22fdf60a38]
 12: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x259) [0x22fdacf8b9]
 13: (main()+0xe29) [0x22fda278b9]
 14: (__libc_start_main()+0xea) [0x7f002653400a]
 15: (_start()+0x2a) [0x22fdaad98a]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

/usr/bin/timeout: the monitored command dumped core
Comment 3 Loic Dachary 2017-07-29 07:49:34 EDT
It's interesting that it can be reproduced without multipath, since multipath adds a level of complexity that can make the problem harder to diagnose.

Would it be possible for me to reproduce the same problem? Should I first create an OSD with a btrfs file system, then try to migrate that OSD to BlueStore? Ideally I would run a series of commands on my own machine and run into the same problem. Could you send me such a series of commands?

Thanks !
Comment 4 Tomasz Torcz 2017-07-30 05:19:20 EDT
There's no need to start from a btrfs OSD. This is 100% reproducible for me on clean disks when creating a new OSD with 12.1.1. The following steps should let you reproduce it:

1. Get Fedora Rawhide installed.
2. The Rawhide repositories do not seem to have been updated for the past few days, so get the ceph 12.1.1-3 build manually:

koji download-build --arch=x86_64 923148

3. Install the downloaded RPMs.

4. Prepare a disk with two partitions. For me these are sda3 and sda6:

# wipefs -a /dev/sda6
# wipefs -a /dev/sda3
/dev/sda3: 4 bytes were erased at offset 0x00000000 (xfs): 58 46 53 42

5. Prepare the disk with ceph-disk:
# ceph-disk prepare /dev/sda3
set_data_partition: incorrect partition UUID: 0x83, expected ['4fbd7e29-9d25-41b8-afd0-5ec00ceff05d', '4fbd7e29-9d25-41b8-afd0-062c0ceff05d', '4fbd7e29-8ae0-4982-bf9d-5a8d867af560', '4fbd7e29-9d25-41b8-afd0-35865ceff05d']
meta-data=/dev/sda3              isize=2048   agcount=4, agsize=1310720 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
data     =                       bsize=4096   blocks=5242880, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0


6. More preparation: point to the block storage partition. It should be a symlink to the partuuid, but for clarity I've used the short name:

# mount /dev/sda3 /mnt/tmp
# ls -l /mnt/tmp
total 16
-rw-r--r--. 1 ceph ceph 37 Jul 30 11:06 ceph_fsid
-rw-r--r--. 1 ceph ceph 37 Jul 30 11:06 fsid
-rw-r--r--. 1 ceph ceph 21 Jul 30 11:06 magic
-rw-r--r--. 1 ceph ceph 10 Jul 30 11:06 type
# ln -s /dev/sda6 /mnt/tmp/block
# chown ceph:ceph /dev/sda6
# umount /mnt/tmp


7. Activate the new OSD and receive the backtrace:

# ceph-disk activate /dev/sda3
got monmap epoch 8
mount_activate: Failed to activate
ceph-disk: Error: ['ceph-osd', '--cluster', 'ceph', '--mkfs', '--mkkey', '-i', u'17', '--monmap', '/var/lib/ceph/tmp/mnt.qG3_2R/activate.monmap', '--osd-data', '/var/lib/ceph/tmp/mnt.qG3_2R', '--osd-uuid', u'2e70309e-12dd-4a54-9547-ab68a3f842de', '--keyring', '/var/lib/ceph/tmp/mnt.qG3_2R/keyring', '--setuser', 'ceph', '--setgroup', 'ceph'] failed : *** Caught signal (Illegal instruction) **
 in thread 7f2653804d00 thread_name:ceph-osd
 ceph version 12.1.1 (f3e663a190bf2ed12c7e3cda288b9a159572c800) luminous (rc)
 1: (()+0x9b3458) [0x906fb14458]
 2: (()+0x12720) [0x7f2650ed3720]
 3: (rocksdb::VersionBuilder::SaveTo(rocksdb::VersionStorageInfo*)+0x7c9) [0x906ff06939]
 4: (rocksdb::VersionSet::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool)+0x15cf) [0x906fe0187f]
 5: (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool)+0x113) [0x906fdcded3]
 6: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**)+0xf31) [0x906fdcf941]
 7: (rocksdb::DB::Open(rocksdb::Options const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::DB**)+0x66e) [0x906fdd0fde]
 8: (RocksDBStore::do_open(std::ostream&, bool)+0x62d) [0x906fa5e6cd]
 9: (RocksDBStore::create_and_open(std::ostream&)+0x174) [0x906fa5fe14]
 10: (BlueStore::_open_db(bool)+0x4f3) [0x906f9ee8e3]
 11: (BlueStore::mkfs()+0x8e8) [0x906fa1aa38]
 12: (OSD::mkfs(CephContext*, ObjectStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, uuid_d, int)+0x259) [0x906f5898b9]
 13: (main()+0xe29) [0x906f4e18b9]
 14: (__libc_start_main()+0xea) [0x7f264fdec00a]
 15: (_start()+0x2a) [0x906f56798a]
2017-07-30 11:09:10.793415 7f2653804d00 -1 *** Caught signal (Illegal instruction) **
 in thread 7f2653804d00 thread_name:ceph-osd

[… snipped … ]
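
The `ln -s /dev/sda6 /mnt/tmp/block` command above uses the short device name for clarity, while noting the link should point at the partuuid. A minimal sketch of the partuuid variant follows, assuming the usual udev-managed `/dev/disk/by-partuuid/` directory is present; the helper name `partuuid_path` is made up for illustration, and the device and mount point are those from the steps above.

```shell
#!/bin/sh
# Hypothetical helper: build the stable /dev/disk/by-partuuid path
# for a given partition UUID string.
partuuid_path() {
    printf '/dev/disk/by-partuuid/%s\n' "$1"
}

# Example use on a live system (needs root; /dev/sda6 and /mnt/tmp as above):
#   uuid=$(blkid -s PARTUUID -o value /dev/sda6)
#   ln -s "$(partuuid_path "$uuid")" /mnt/tmp/block
```

A partuuid-based link keeps working if the kernel enumerates the disks in a different order after a reboot, which a `/dev/sdXN` name does not guarantee.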
Comment 5 Loic Dachary 2017-07-31 02:27:46 EDT
Thanks for the detailed instructions, this is very helpful. Preparing each partition individually is uncommon, but it should not crash the way it does. I'll reproduce this and figure out what to do.

Unrelated question: is there a reason why you do not simply run ceph-disk prepare /dev/sda and let it partition the disk itself?
Comment 6 Tomasz Torcz 2017-07-31 09:33:34 EDT
This is my experimental Ceph cluster, running the latest code to catch bugs like this one early. As such, it is not production-ready: it is built from mostly decommissioned hardware, and this particular node has only one HDD. This drive is shared between the operating system and the OSD, hence the partitions.
Comment 7 Jan Kurik 2017-08-15 02:43:15 EDT
This bug appears to have been reported against 'rawhide' during the Fedora 27 development cycle.
Changing version to '27'.
