Bug 1619090 - nvme journal: ondisk fsid 00000000-0000-0000-0000-00000000000 doesn't match expected c325f439-6849-47ef-ac43-439d9909d391
Status: ASSIGNED
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Ceph-Ansible
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: z2
Target Release: 3.1
Assigned To: leseb
QA Contact: Vasu Kulkarni
Docs Contact: Bara Ancincova
Keywords: Reopened
Depends On:
Blocks: 1629656 1584264
Reported: 2018-08-20 00:22 EDT by Vasu Kulkarni
Modified: 2018-10-24 15:40 EDT
CC List: 9 users

See Also:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
.When putting a dedicated journal on an NVMe device, installation can fail
When the `dedicated_devices` setting contains an NVMe device that has partitions or signatures on it, the Ansible installation might fail with an error like the following:
----
journal check: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected c325f439-6849-47ef-ac43-439d9909d391, invalid (someone else's?) journal
----
To work around this issue, ensure there are no partitions or signatures on the NVMe device before running the playbook.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-10-08 14:32:39 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Vasu Kulkarni 2018-08-20 00:22:42 EDT
Description of problem:


I am not sure whether this is a ceph-ansible issue or a core Ceph OSD issue. Please feel free to change the component after first-level analysis.

Specify a dedicated journal on the NVMe device, with the rest of the configuration as follows:

[clients]
pluto005.ceph.redhat.com dedicated_devices='["/dev/nvme0n1"]' devices='["/dev/sdb"]' monitor_interface='eno1' public_network='10.8.128.0/21' radosgw_interface='eno1'

[mdss]
pluto008.ceph.redhat.com dedicated_devices='["/dev/nvme0n1"]' devices='["/dev/sdb"]' monitor_interface='eno1' public_network='10.8.128.0/21' radosgw_interface='eno1'

[mgrs]
pluto004.ceph.redhat.com dedicated_devices='["/dev/nvme0n1"]' devices='["/dev/sdb"]' monitor_interface='eno1' public_network='10.8.128.0/21' radosgw_interface='eno1'

[mons]
pluto004.ceph.redhat.com dedicated_devices='["/dev/nvme0n1"]' devices='["/dev/sdb"]' monitor_interface='eno1' public_network='10.8.128.0/21' radosgw_interface='eno1'
pluto009.ceph.redhat.com dedicated_devices='["/dev/nvme0n1"]' devices='["/dev/sdb"]' monitor_interface='eno1' public_network='10.8.128.0/21' radosgw_interface='eno1'

[osds]
pluto005.ceph.redhat.com dedicated_devices='["/dev/nvme0n1"]' devices='["/dev/sdb"]' monitor_interface='eno1' public_network='10.8.128.0/21' radosgw_interface='eno1'
pluto006.ceph.redhat.com dedicated_devices='["/dev/nvme0n1"]' devices='["/dev/sdb"]' monitor_interface='eno1' public_network='10.8.128.0/21' radosgw_interface='eno1'
pluto010.ceph.redhat.com dedicated_devices='["/dev/nvme0n1"]' devices='["/dev/sdb"]' monitor_interface='eno1' public_network='10.8.128.0/21' radosgw_interface='eno1'


Running the Ansible playbook, the following issue is seen:

2018-08-18T17:04:26.885 INFO:teuthology.orchestra.run.pluto009.stdout:got monmap epoch 1
2018-08-18T17:04:26.885 INFO:teuthology.orchestra.run.pluto009.stdout:2018-08-18 21:04:24.837653 7f4463512d80 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2018-08-18T17:04:26.885 INFO:teuthology.orchestra.run.pluto009.stdout:2018-08-18 21:04:24.837685 7f4463512d80 -1 journal check: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected c325f439-6849-47ef-ac43-439d9909d391, invalid (someone else's?) journal
2018-08-18T17:04:26.885 INFO:teuthology.orchestra.run.pluto009.stdout:2018-08-18 21:04:24.837727 7f4463512d80 -1 filestore(/var/lib/ceph/tmp/mnt.7k5fVX) mkjournal(1068): error creating journal on /var/lib/ceph/tmp/mnt.7k5fVX/journal: (22) Invalid argument
2018-08-18T17:04:26.885 INFO:teuthology.orchestra.run.pluto009.stdout:2018-08-18 21:04:24.837783 7f4463512d80 -1 OSD::mkfs: ObjectStore::mkfs failed with error (22) Invalid argument
2018-08-18T17:04:26.885 INFO:teuthology.orchestra.run.pluto009.stdout:2018-08-18 21:04:24.837855 7f4463512d80 -1  ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.7k5fVX: (22) Invalid argument
2018-08-18T17:04:26.886 INFO:teuthology.orchestra.run.pluto009.stdout:mount_activate: Failed to activate
2018-08-18T17:04:26.886 INFO:teuthology.orchestra.run.pluto009.stdout:Traceback (most recent call last):
2018-08-18T17:04:26.886 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/sbin/ceph-disk", line 9, in <module>
2018-08-18T17:04:26.886 INFO:teuthology.orchestra.run.pluto009.stdout:    load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
2018-08-18T17:04:26.886 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5735, in run
2018-08-18T17:04:26.886 INFO:teuthology.orchestra.run.pluto009.stdout:    main(sys.argv[1:])
2018-08-18T17:04:26.886 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5688, in main
2018-08-18T17:04:26.886 INFO:teuthology.orchestra.run.pluto009.stdout:    main_catch(args.func, args)
2018-08-18T17:04:26.887 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5713, in main_catch
2018-08-18T17:04:26.887 INFO:teuthology.orchestra.run.pluto009.stdout:    func(args)
2018-08-18T17:04:26.887 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3776, in main_activate
2018-08-18T17:04:26.887 INFO:teuthology.orchestra.run.pluto009.stdout:    reactivate=args.reactivate,
2018-08-18T17:04:26.887 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3539, in mount_activate
2018-08-18T17:04:26.887 INFO:teuthology.orchestra.run.pluto009.stdout:    (osd_id, cluster) = activate(path, activate_key_template, init)
2018-08-18T17:04:26.887 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3716, in activate
2018-08-18T17:04:26.887 INFO:teuthology.orchestra.run.pluto009.stdout:    keyring=keyring,
2018-08-18T17:04:26.888 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3183, in mkfs
2018-08-18T17:04:26.888 INFO:teuthology.orchestra.run.pluto009.stdout:    '--setgroup', get_ceph_group(),
2018-08-18T17:04:26.888 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 566, in command_check_call
2018-08-18T17:04:26.888 INFO:teuthology.orchestra.run.pluto009.stdout:    return subprocess.check_call(arguments)
2018-08-18T17:04:26.888 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
2018-08-18T17:04:26.888 INFO:teuthology.orchestra.run.pluto009.stdout:    raise CalledProcessError(retcode, cmd)
2018-08-18T17:04:26.888 INFO:teuthology.orchestra.run.pluto009.stdout:subprocess.CalledProcessError: Command '['/usr/bin/ceph-osd', '--cluster', 'ceph', '--mkfs', '-i', u'0', '--monmap', '/var/lib/ceph/tmp/mnt.7k5fVX/activate.monmap', '--osd-data', '/var/lib/ceph/tmp/mnt.7k5fVX', '--osd-journal', '/var/lib/ceph/tmp/mnt.7k5fVX/journal', '--osd-uuid', u'c325f439-6849-47ef-ac43-439d9909d391', '--setuser', 'ceph', '--setgroup', 'ceph']' returned non-zero exit status 1

Full logs:

http://magna002.ceph.redhat.com/rakesh-2018-08-17_07:29:56-smoke-luminous-distro-basic-pluto/306626/teuthology.log
Comment 3 leseb 2018-08-20 06:23:32 EDT
This error is reported by Ceph itself during mkfs. Was the journal device purged correctly, with any old Ceph metadata removed from it?
Thanks.

This does not seem like a ceph-ansible issue, although I suppose we could run checks for this.
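A pre-flight check along those lines could be sketched as follows (hypothetical, not part of ceph-ansible; `/dev/nvme0n1` is an example path), treating the journal device as usable only when `wipefs --no-act` reports no signatures on it:

```shell
# Hypothetical pre-flight check (sketch, not part of ceph-ansible): a journal
# device should be used only if wipefs finds no leftover signatures on it.
is_clean() {
    # wipefs prints one line per signature it finds; empty output means clean
    [ -z "$1" ]
}

DEV=/dev/nvme0n1   # example path; adjust to your inventory
if is_clean "$(wipefs --no-act "$DEV" 2>/dev/null)"; then
    echo "$DEV looks clean"
else
    echo "$DEV has leftover signatures; wipe it before deploying OSDs" >&2
fi
```

The `--no-act` flag only reports signatures without erasing anything, so the check itself is non-destructive.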
Comment 4 Vasu Kulkarni 2018-08-20 22:40:41 EDT
osd_auto_discovery is false, so I believe the devices specified in the inventory should be cleaned up by Ansible. teuthology also does its own cleanup at the beginning; I need to check that.
Comment 7 Vasu Kulkarni 2018-10-01 17:17:30 EDT
John,

I was not using the lv_create option in ceph-ansible at that time. I think we can document that an admin who hits this issue should clean up any old partitions manually and retry.

Thanks
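As a sketch of that manual cleanup, assuming standard util-linux/gdisk tools are available (`/dev/nvme0n1` is an example path, and these commands are destructive), the wipe commands can be staged and reviewed before running:

```shell
# Sketch of the manual cleanup: stage the wipe commands for the journal
# device without executing them, so they can be reviewed first.
# DEV is an example path; double-check it before wiping anything.
DEV=/dev/nvme0n1

ZAP_CMD="sgdisk --zap-all $DEV"   # clear GPT/MBR partition tables
WIPE_CMD="wipefs --all $DEV"      # erase leftover filesystem/journal signatures
PROBE_CMD="partprobe $DEV"        # ask the kernel to re-read the partition table

# Print for review; run them only once the device is confirmed disposable,
# then retry the playbook.
printf '%s\n' "$ZAP_CMD" "$WIPE_CMD" "$PROBE_CMD"
```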
Comment 8 John Brier 2018-10-02 16:53:00 EDT
Vasu, is the Doc Text I added accurate/good?
Comment 10 Vasu Kulkarni 2018-10-05 17:41:42 EDT
That looks good to me, thanks.
Comment 11 John Brier 2018-10-05 18:33:04 EDT
Thanks Vasu.

I need to rebuild the Release Notes to pull in this Doc Text.
