Description of problem:
I am not sure if this is a ceph-ansible or a core Ceph OSD issue; please feel free to reassign after first-level analysis.

Specify a dedicated journal on NVMe, with the rest of the config as follows:

[clients]
pluto005.ceph.redhat.com dedicated_devices='["/dev/nvme0n1"]' devices='["/dev/sdb"]' monitor_interface='eno1' public_network='10.8.128.0/21' radosgw_interface='eno1'

[mdss]
pluto008.ceph.redhat.com dedicated_devices='["/dev/nvme0n1"]' devices='["/dev/sdb"]' monitor_interface='eno1' public_network='10.8.128.0/21' radosgw_interface='eno1'

[mgrs]
pluto004.ceph.redhat.com dedicated_devices='["/dev/nvme0n1"]' devices='["/dev/sdb"]' monitor_interface='eno1' public_network='10.8.128.0/21' radosgw_interface='eno1'

[mons]
pluto004.ceph.redhat.com dedicated_devices='["/dev/nvme0n1"]' devices='["/dev/sdb"]' monitor_interface='eno1' public_network='10.8.128.0/21' radosgw_interface='eno1'
pluto009.ceph.redhat.com dedicated_devices='["/dev/nvme0n1"]' devices='["/dev/sdb"]' monitor_interface='eno1' public_network='10.8.128.0/21' radosgw_interface='eno1'

[osds]
pluto005.ceph.redhat.com dedicated_devices='["/dev/nvme0n1"]' devices='["/dev/sdb"]' monitor_interface='eno1' public_network='10.8.128.0/21' radosgw_interface='eno1'
pluto006.ceph.redhat.com dedicated_devices='["/dev/nvme0n1"]' devices='["/dev/sdb"]' monitor_interface='eno1' public_network='10.8.128.0/21' radosgw_interface='eno1'
pluto010.ceph.redhat.com dedicated_devices='["/dev/nvme0n1"]' devices='["/dev/sdb"]' monitor_interface='eno1' public_network='10.8.128.0/21' radosgw_interface='eno1'

Running the ansible playbook, the following issue is seen:

2018-08-18T17:04:26.885 INFO:teuthology.orchestra.run.pluto009.stdout:got monmap epoch 1
2018-08-18T17:04:26.885 INFO:teuthology.orchestra.run.pluto009.stdout:2018-08-18 21:04:24.837653 7f4463512d80 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2018-08-18T17:04:26.885 INFO:teuthology.orchestra.run.pluto009.stdout:2018-08-18 21:04:24.837685 7f4463512d80 -1 journal check: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected c325f439-6849-47ef-ac43-439d9909d391, invalid (someone else's?) journal
2018-08-18T17:04:26.885 INFO:teuthology.orchestra.run.pluto009.stdout:2018-08-18 21:04:24.837727 7f4463512d80 -1 filestore(/var/lib/ceph/tmp/mnt.7k5fVX) mkjournal(1068): error creating journal on /var/lib/ceph/tmp/mnt.7k5fVX/journal: (22) Invalid argument
2018-08-18T17:04:26.885 INFO:teuthology.orchestra.run.pluto009.stdout:2018-08-18 21:04:24.837783 7f4463512d80 -1 OSD::mkfs: ObjectStore::mkfs failed with error (22) Invalid argument
2018-08-18T17:04:26.885 INFO:teuthology.orchestra.run.pluto009.stdout:2018-08-18 21:04:24.837855 7f4463512d80 -1  ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.7k5fVX: (22) Invalid argument
2018-08-18T17:04:26.886 INFO:teuthology.orchestra.run.pluto009.stdout:mount_activate: Failed to activate
2018-08-18T17:04:26.886 INFO:teuthology.orchestra.run.pluto009.stdout:Traceback (most recent call last):
2018-08-18T17:04:26.886 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/sbin/ceph-disk", line 9, in <module>
2018-08-18T17:04:26.886 INFO:teuthology.orchestra.run.pluto009.stdout:    load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
2018-08-18T17:04:26.886 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5735, in run
2018-08-18T17:04:26.886 INFO:teuthology.orchestra.run.pluto009.stdout:    main(sys.argv[1:])
2018-08-18T17:04:26.886 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5688, in main
2018-08-18T17:04:26.886 INFO:teuthology.orchestra.run.pluto009.stdout:    main_catch(args.func, args)
2018-08-18T17:04:26.887 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5713, in main_catch
2018-08-18T17:04:26.887 INFO:teuthology.orchestra.run.pluto009.stdout:    func(args)
2018-08-18T17:04:26.887 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3776, in main_activate
2018-08-18T17:04:26.887 INFO:teuthology.orchestra.run.pluto009.stdout:    reactivate=args.reactivate,
2018-08-18T17:04:26.887 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3539, in mount_activate
2018-08-18T17:04:26.887 INFO:teuthology.orchestra.run.pluto009.stdout:    (osd_id, cluster) = activate(path, activate_key_template, init)
2018-08-18T17:04:26.887 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3716, in activate
2018-08-18T17:04:26.887 INFO:teuthology.orchestra.run.pluto009.stdout:    keyring=keyring,
2018-08-18T17:04:26.888 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3183, in mkfs
2018-08-18T17:04:26.888 INFO:teuthology.orchestra.run.pluto009.stdout:    '--setgroup', get_ceph_group(),
2018-08-18T17:04:26.888 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 566, in command_check_call
2018-08-18T17:04:26.888 INFO:teuthology.orchestra.run.pluto009.stdout:    return subprocess.check_call(arguments)
2018-08-18T17:04:26.888 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
2018-08-18T17:04:26.888 INFO:teuthology.orchestra.run.pluto009.stdout:    raise CalledProcessError(retcode, cmd)
2018-08-18T17:04:26.888 INFO:teuthology.orchestra.run.pluto009.stdout:subprocess.CalledProcessError: Command '['/usr/bin/ceph-osd', '--cluster', 'ceph', '--mkfs', '-i', u'0', '--monmap', '/var/lib/ceph/tmp/mnt.7k5fVX/activate.monmap', '--osd-data', '/var/lib/ceph/tmp/mnt.7k5fVX', '--osd-journal', '/var/lib/ceph/tmp/mnt.7k5fVX/journal', '--osd-uuid', u'c325f439-6849-47ef-ac43-439d9909d391', '--setuser', 'ceph', '--setgroup', 'ceph']' returned non-zero exit status 1

Full logs: http://magna002.ceph.redhat.com/rakesh-2018-08-17_07:29:56-smoke-luminous-distro-basic-pluto/306626/teuthology.log
This error is reported by Ceph itself when doing mkfs. Was the journal device purged correctly, and was any old Ceph metadata removed from it? This does not seem like a ceph-ansible issue, although I suppose we could add checks for this. Thanks.
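One non-destructive way to answer that question is wipefs from util-linux: with no options it only reports filesystem/partition signatures, while --all erases them. Below is a self-contained demo on a scratch file image (a stand-in for the real /dev/nvme0n1, since the device paths here are from the inventory above); mkswap is used only to plant a known signature:

```shell
# Create a throwaway 1 MiB file image to act as a fake device.
img=$(mktemp)
truncate -s 1M "$img"

# Plant a known signature, like a stale deployment would leave behind.
mkswap "$img" >/dev/null 2>&1

# Report-only mode: lists the swap signature without touching it.
wipefs "$img"

# Erase all signatures, then confirm nothing is reported anymore.
wipefs --all "$img" >/dev/null
wipefs "$img"

rm -f "$img"
```

On the real journal device, a non-empty `wipefs /dev/nvme0n1` report before deployment would indicate leftover metadata from a previous run.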
osd_auto_discovery is false, so I believe the devices specified in the inventory should be cleaned up by ansible before use. teuthology also does its own cleanup at the beginning; I need to check that.
John, I was not using the lv_create option in ceph-ansible at that time. I think we can document that the admin should manually clean up any old partitions and retry if they hit this issue. Thanks.
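A minimal sketch of that manual cleanup, assuming the /dev/nvme0n1 journal device from the inventory above (the helper name and the DRYRUN guard are mine, not part of ceph-ansible; wipefs, sgdisk, and partprobe are standard tools):

```shell
#!/bin/sh
# Sketch: wipe leftover partition tables and signatures from a dedicated
# journal device, so a fresh deployment does not find "someone else's"
# journal. Set DRYRUN=1 to only print the commands instead of running them.
cleanup_journal_device() {
    dev="$1"
    for cmd in "wipefs --all $dev" \
               "sgdisk --zap-all $dev" \
               "partprobe $dev"; do
        echo "+ $cmd"
        [ -n "$DRYRUN" ] || $cmd
    done
}

# Example (dry run): show what would be done to the nvme journal device.
DRYRUN=1 cleanup_journal_device /dev/nvme0n1
```

This is destructive on a live device, so it should only ever run against devices listed in the inventory, and only before re-running the playbook.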
Vasu, is the Doc Text I added accurate/good?
That looks good to me, thanks.
Thanks Vasu. I need to rebuild the Release Notes to pull in this Doc Text.
Level-setting the severity of this defect to "High" with a bulk update. Please refine it to a more accurate value, as defined by the severity definitions in https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity