Created attachment 1865996 [details]
ceph-osd.0.2022-03-14.log

Description of problem:
After upgrading to 17.1.0-0.2.rc1.fc37.x86_64, 5 out of 6 of my OSDs are crashing on start.

2022-03-14T11:20:44.682+0100 7ff5a50d0180 -1 bluestore::NCB::__restore_allocator::Failed open_for_read with error-code -2
2022-03-14T11:20:44.682+0100 7ff5a50d0180  0 bluestore(/var/lib/ceph/osd/ceph-0) _init_alloc::NCB::restore_allocator() failed! Run Full Recovery from ONodes (might take a while) ...
2022-03-14T11:20:54.767+0100 7ff5a50d0180 -1 /builddir/build/BUILD/ceph-17.1.0/src/os/bluestore/AvlAllocator.cc: In function 'virtual void AvlAllocator::init_add_free(uint64_t, uint64_t)' thread 7ff5a50d0180 time 2022-03-14T11:20:54.766296+0100
/builddir/build/BUILD/ceph-17.1.0/src/os/bluestore/AvlAllocator.cc: 442: FAILED ceph_assert(offset + length <= uint64_t(device_size))

 ceph version 17.1.0 (c675060073a05d40ef404d5921c81178a52af6e0) quincy (dev)

(full log attached)

Version-Release number of selected component (if applicable):
17.1.0-0.2.rc1.fc37.x86_64

How reproducible:

Steps to Reproduce:
1. Upgrade working cluster to quincy rc1 release.

Actual results:
OSD crashing

Expected results:
OSD working.

Additional info:
My cluster has 3 control nodes running rawhide (mons, mgrs, mds) and 1 physical server with 6 HDDs running 6 OSDs (rawhide). I'm using CephFS and RGW.
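For readers unfamiliar with the failed assertion: the condition in the log, offset + length <= uint64_t(device_size), requires that a free extent handed to the allocator ends at or before the end of the device. The following is only a minimal Python sketch of that condition. The function name and the numbers are hypothetical, chosen for illustration; this is not the Ceph code and the values are not taken from the attached log.

```python
def extent_fits_device(offset: int, length: int, device_size: int) -> bool:
    """Return True if the extent [offset, offset + length) lies within the device.

    Mirrors the condition from the failed assertion in the log:
    offset + length <= device_size. The helper name is hypothetical.
    """
    return offset + length <= device_size


if __name__ == "__main__":
    # Illustrative values only, not taken from the attached log.
    device_size = 4 * 1024**3  # assume a 4 GiB device

    # An extent that ends exactly at the end of the device passes the check.
    print(extent_fits_device(0, device_size, device_size))

    # An extent that runs 4096 bytes past the end of the device fails it;
    # this is the situation in which the assertion would abort the OSD.
    print(extent_fits_device(device_size - 4096, 8192, device_size))
```

In other words, the crash indicates that the recovery path tried to register a free range extending beyond what the allocator considers the device size.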
Try the latest build ceph-17.1.0-0.3.28.g1b309fef.fc37 at https://koji.fedoraproject.org/koji/buildinfo?buildID=1934049. I believe we are waiting for a compose before you can just dnf update. Or the scratch build of ceph-17.1.0-0.4.31.g1ccf6db7 at https://koji.fedoraproject.org/koji/taskinfo?taskID=84236387
https://github.com/ceph/ceph/pull/45342
FEDORA-2022-5ca7aa480b has been submitted as an update to Fedora 37. https://bodhi.fedoraproject.org/updates/FEDORA-2022-5ca7aa480b
FEDORA-2022-5ca7aa480b has been pushed to the Fedora 37 stable repository. If problem still persists, please make note of it in this bug report.
Actually, the g1ccf6db7 scratch build did not fix the problem; the OSDs are still crashing. However, I cannot locate this commit in https://github.com/ceph/ceph/commits/quincy , so I do not know whether the PR was merged before g1ccf6db7. Anyway, the fix is known and is in the upstream repo, so the next release should work for me. I'm going to leave this bug closed.
FYI, the scratch build did not contain the fix. The fix was added (as Patch0020) to the koji build at https://koji.fedoraproject.org/koji/buildinfo?buildID=1935306. The fix is commit bf57e1631607dfb8446e9a2061a855c6cab4c09b in the quincy branch.