
Bug 1619090

Summary: nvme journal: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected c325f439-6849-47ef-ac43-439d9909d391
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Vasu Kulkarni <vakulkar>
Component: Ceph-Ansible Assignee: Sébastien Han <shan>
Status: CLOSED NOTABUG QA Contact: Vasishta <vashastr>
Severity: high Docs Contact: Bara Ancincova <bancinco>
Priority: medium    
Version: 3.1 CC: agunn, aschoen, ceph-eng-bugs, gabrioux, gmeno, hnallurv, jbrier, nthomas, pasik, tchandra, vakulkar, vashastr
Target Milestone: z2 Keywords: Reopened
Target Release: 3.3   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Known Issue
Doc Text:
.When putting a dedicated journal on an NVMe device, installation can fail

When the `dedicated_devices` setting contains an NVMe device that has partitions or signatures on it, Ansible installation might fail with an error like the following:

----
journal check: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected c325f439-6849-47ef-ac43-439d9909d391, invalid (someone else's?) journal
----

To work around this issue, ensure there are no partitions or signatures on the NVMe device.
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-10-03 15:12:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1584264, 1629656    

Description Vasu Kulkarni 2018-08-20 04:22:42 UTC
Description of problem:


I am not sure whether this is a ceph-ansible issue or a core Ceph OSD issue; please feel free to change the component after first-level analysis.

Specify a dedicated journal on NVMe, with the rest of the configuration as follows:

[clients]
pluto005.ceph.redhat.com dedicated_devices='["/dev/nvme0n1"]' devices='["/dev/sdb"]' monitor_interface='eno1' public_network='10.8.128.0/21' radosgw_interface='eno1'

[mdss]
pluto008.ceph.redhat.com dedicated_devices='["/dev/nvme0n1"]' devices='["/dev/sdb"]' monitor_interface='eno1' public_network='10.8.128.0/21' radosgw_interface='eno1'

[mgrs]
pluto004.ceph.redhat.com dedicated_devices='["/dev/nvme0n1"]' devices='["/dev/sdb"]' monitor_interface='eno1' public_network='10.8.128.0/21' radosgw_interface='eno1'

[mons]
pluto004.ceph.redhat.com dedicated_devices='["/dev/nvme0n1"]' devices='["/dev/sdb"]' monitor_interface='eno1' public_network='10.8.128.0/21' radosgw_interface='eno1'
pluto009.ceph.redhat.com dedicated_devices='["/dev/nvme0n1"]' devices='["/dev/sdb"]' monitor_interface='eno1' public_network='10.8.128.0/21' radosgw_interface='eno1'

[osds]
pluto005.ceph.redhat.com dedicated_devices='["/dev/nvme0n1"]' devices='["/dev/sdb"]' monitor_interface='eno1' public_network='10.8.128.0/21' radosgw_interface='eno1'
pluto006.ceph.redhat.com dedicated_devices='["/dev/nvme0n1"]' devices='["/dev/sdb"]' monitor_interface='eno1' public_network='10.8.128.0/21' radosgw_interface='eno1'
pluto010.ceph.redhat.com dedicated_devices='["/dev/nvme0n1"]' devices='["/dev/sdb"]' monitor_interface='eno1' public_network='10.8.128.0/21' radosgw_interface='eno1'


Running the ansible playbook, the following issue is seen:

2018-08-18T17:04:26.885 INFO:teuthology.orchestra.run.pluto009.stdout:got monmap epoch 1
2018-08-18T17:04:26.885 INFO:teuthology.orchestra.run.pluto009.stdout:2018-08-18 21:04:24.837653 7f4463512d80 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2018-08-18T17:04:26.885 INFO:teuthology.orchestra.run.pluto009.stdout:2018-08-18 21:04:24.837685 7f4463512d80 -1 journal check: ondisk fsid 00000000-0000-0000-0000-000000000000 doesn't match expected c325f439-6849-47ef-ac43-439d9909d391, invalid (someone else's?) journal
2018-08-18T17:04:26.885 INFO:teuthology.orchestra.run.pluto009.stdout:2018-08-18 21:04:24.837727 7f4463512d80 -1 filestore(/var/lib/ceph/tmp/mnt.7k5fVX) mkjournal(1068): error creating journal on /var/lib/ceph/tmp/mnt.7k5fVX/journal: (22) Invalid argument
2018-08-18T17:04:26.885 INFO:teuthology.orchestra.run.pluto009.stdout:2018-08-18 21:04:24.837783 7f4463512d80 -1 OSD::mkfs: ObjectStore::mkfs failed with error (22) Invalid argument
2018-08-18T17:04:26.885 INFO:teuthology.orchestra.run.pluto009.stdout:2018-08-18 21:04:24.837855 7f4463512d80 -1  ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.7k5fVX: (22) Invalid argument
2018-08-18T17:04:26.886 INFO:teuthology.orchestra.run.pluto009.stdout:mount_activate: Failed to activate
2018-08-18T17:04:26.886 INFO:teuthology.orchestra.run.pluto009.stdout:Traceback (most recent call last):
2018-08-18T17:04:26.886 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/sbin/ceph-disk", line 9, in <module>
2018-08-18T17:04:26.886 INFO:teuthology.orchestra.run.pluto009.stdout:    load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
2018-08-18T17:04:26.886 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5735, in run
2018-08-18T17:04:26.886 INFO:teuthology.orchestra.run.pluto009.stdout:    main(sys.argv[1:])
2018-08-18T17:04:26.886 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5688, in main
2018-08-18T17:04:26.886 INFO:teuthology.orchestra.run.pluto009.stdout:    main_catch(args.func, args)
2018-08-18T17:04:26.887 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5713, in main_catch
2018-08-18T17:04:26.887 INFO:teuthology.orchestra.run.pluto009.stdout:    func(args)
2018-08-18T17:04:26.887 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3776, in main_activate
2018-08-18T17:04:26.887 INFO:teuthology.orchestra.run.pluto009.stdout:    reactivate=args.reactivate,
2018-08-18T17:04:26.887 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3539, in mount_activate
2018-08-18T17:04:26.887 INFO:teuthology.orchestra.run.pluto009.stdout:    (osd_id, cluster) = activate(path, activate_key_template, init)
2018-08-18T17:04:26.887 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3716, in activate
2018-08-18T17:04:26.887 INFO:teuthology.orchestra.run.pluto009.stdout:    keyring=keyring,
2018-08-18T17:04:26.888 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 3183, in mkfs
2018-08-18T17:04:26.888 INFO:teuthology.orchestra.run.pluto009.stdout:    '--setgroup', get_ceph_group(),
2018-08-18T17:04:26.888 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 566, in command_check_call
2018-08-18T17:04:26.888 INFO:teuthology.orchestra.run.pluto009.stdout:    return subprocess.check_call(arguments)
2018-08-18T17:04:26.888 INFO:teuthology.orchestra.run.pluto009.stdout:  File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
2018-08-18T17:04:26.888 INFO:teuthology.orchestra.run.pluto009.stdout:    raise CalledProcessError(retcode, cmd)
2018-08-18T17:04:26.888 INFO:teuthology.orchestra.run.pluto009.stdout:subprocess.CalledProcessError: Command '['/usr/bin/ceph-osd', '--cluster', 'ceph', '--mkfs', '-i', u'0', '--monmap', '/var/lib/ceph/tmp/mnt.7k5fVX/activate.monmap', '--osd-data', '/var/lib/ceph/tmp/mnt.7k5fVX', '--osd-journal', '/var/lib/ceph/tmp/mnt.7k5fVX/journal', '--osd-uuid', u'c325f439-6849-47ef-ac43-439d9909d391', '--setuser', 'ceph', '--setgroup', 'ceph']' returned non-zero exit status 1

Full logs:

http://magna002.ceph.redhat.com/rakesh-2018-08-17_07:29:56-smoke-luminous-distro-basic-pluto/306626/teuthology.log
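A quick way to confirm whether the dedicated journal device already carried partitions or signatures is to inspect it on the failing host. This is only a read-only sketch, with /dev/nvme0n1 taken from the inventory above and pluto009 from the log:

# Run on the failing OSD host (for example pluto009); inspection only, nothing is modified.
lsblk /dev/nvme0n1     # lists any leftover partitions on the dedicated journal device
blkid /dev/nvme0n1*    # prints filesystem or partition-table signatures, if any are present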

Comment 3 Sébastien Han 2018-08-20 10:23:32 UTC
This error is reported by Ceph itself during mkfs. Was the journal device purged correctly, with any Ceph metadata removed from it?
Thanks.

This does not seem like a ceph-ansible issue, although I suppose we could run checks for this.
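Such a check could be as simple as refusing to proceed when blkid finds an existing signature on the dedicated device. This is only a sketch of the idea, not an actual ceph-ansible task; /dev/nvme0n1 stands in for whatever is listed in dedicated_devices:

# Pre-flight guard sketch: blkid exits 0 when it finds a signature or partition table on the device.
if blkid /dev/nvme0n1 > /dev/null 2>&1; then
    echo "ERROR: /dev/nvme0n1 already carries a signature; wipe it before running the playbook" >&2
    exit 1
fi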

Comment 4 Vasu Kulkarni 2018-08-21 02:40:41 UTC
osd_auto_discovery is false, so I believe the device specified in the inventory should be cleaned up by ansible. teuthology also does its own cleanup at the beginning, and I need to check that.

Comment 7 Vasu Kulkarni 2018-10-01 21:17:30 UTC
John,

I was not using the lv_create option in ceph-ansible at that time. I think we can add a note telling the admin to clean up any old partitions manually (roughly as sketched below) and retry if they hit this issue.

Thanks
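For reference, the manual cleanup mentioned above could look roughly like this on each affected host. This is a hedged sketch, not a documented procedure; it assumes /dev/nvme0n1 is the dedicated journal device from the inventory and that it holds no data worth keeping:

# Destructive: removes all signatures and partition structures from the journal device.
wipefs --all /dev/nvme0n1        # clear filesystem and partition-table signatures
sgdisk --zap-all /dev/nvme0n1    # remove GPT and protective MBR structures
# 'ceph-disk zap /dev/nvme0n1' on this release should achieve a similar result.
# Re-run the ceph-ansible playbook afterwards.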

Comment 8 John Brier 2018-10-02 20:53:00 UTC
Vasu, is the Doc Text I added accurate/good?

Comment 10 Vasu Kulkarni 2018-10-05 21:41:42 UTC
That looks good to me, thanks.

Comment 11 John Brier 2018-10-05 22:33:04 UTC
Thanks Vasu.

I need to rebuild the Release Notes to pull in this Doc Text.

Comment 14 Giridhar Ramaraju 2019-08-20 07:17:10 UTC
Setting the severity of this defect to "High" with a bulk update. Please refine it to a more accurate value, as defined by the severity definitions in
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity