Description of problem (please be as detailed as possible and provide log
snippets):
- The OSD pods are in CrashLoopBackOff (CLBO) state, with several container restarts.
-----------------------------------------------
rook-ceph-osd-0-588b7db67b-r9hnr 1/2 CrashLoopBackOff 329 (2m2s ago) 5d14h 10.130.2.33 dvtslocnw03-data.nbsdev.co.uk <none> <none>
rook-ceph-osd-2-767b5c54f5-rdwrj 1/2 CrashLoopBackOff 330 (88s ago) 5d14h 10.128.4.40 dvtslocnw01-data.nbsdev.co.uk <none> <none>
-----------------------------------------------
- The devices are attached to the nodes and there are no issues with the disks.
- The Ceph OSD pods crash with the error below (the hex figures from the first log line are decoded in the sketch after the snippet):
-----------------------------------------------
2022-05-17T08:22:47.459126416Z debug -3> 2022-05-17T08:22:47.416+0000 7f10dcdd6080 1 bluefs _allocate unable to allocate 0x400000 on bdev 1, allocator name block, allocator type hybrid, capacity 0x4b00000000, block size 0x1000, free 0xd8e089000, fragmentation 0.586552, allocated 0x0
2022-05-17T08:22:47.459126416Z debug -2> 2022-05-17T08:22:47.416+0000 7f10dcdd6080 -1 bluefs _allocate allocation failed, needed 0x400000
2022-05-17T08:22:47.459139711Z debug -1> 2022-05-17T08:22:47.424+0000 7f10dcdd6080 -1 /builddir/build/BUILD/ceph-16.2.7/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_and_sync_log(std::unique_lock<std::mutex>&, uint64_t, uint64_t)' thread 7f10dcdd6080 time 2022-05-17T08:22:47.417513+0000
2022-05-17T08:22:47.459139711Z /builddir/build/BUILD/ceph-16.2.7/src/os/bluestore/BlueFS.cc: 2554: FAILED ceph_assert(r == 0)
2022-05-17T08:22:47.459139711Z
2022-05-17T08:22:47.459139711Z ceph version 16.2.7-98.el8cp (b20d33c3b301e005bed203d3cad7245da3549f80) pacific (stable)
2022-05-17T08:22:47.459139711Z 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x158) [0x55cd61fc2e3c]
2022-05-17T08:22:47.459139711Z 2: ceph-osd(+0x56b056) [0x55cd61fc3056]
2022-05-17T08:22:47.459139711Z 3: (BlueFS::_flush_and_sync_log(std::unique_lock<std::mutex>&, unsigned long, unsigned long)+0x1c93) [0x55cd626c24f3]
2022-05-17T08:22:47.459139711Z 4: (BlueFS::_fsync(BlueFS::FileWriter*, std::unique_lock<std::mutex>&)+0x322) [0x55cd626c2f22]
2022-05-17T08:22:47.459139711Z 5: (BlueRocksWritableFile::Sync()+0x6c) [0x55cd626ea79c]
2022-05-17T08:22:47.459139711Z 6: (rocksdb::LegacyWritableFileWrapper::Sync(rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x1f) [0x55cd62b83aef]
2022-05-17T08:22:47.459139711Z 7: (rocksdb::WritableFileWriter::SyncInternal(bool)+0x402) [0x55cd62c95262]
2022-05-17T08:22:47.459139711Z 8: (rocksdb::WritableFileWriter::Sync(bool)+0x88) [0x55cd62c968a8]
2022-05-17T08:22:47.459139711Z 9: (rocksdb::BuildTable(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::Env*, rocksdb::FileSystem*, rocksdb::ImmutableCFOptions const&, rocksdb::MutableCFOptions const&, rocksdb::FileOptions const&, rocksdb::TableCache*, rocksdb::InternalIteratorBase<rocksdb::Slice>*, std::vector<std::unique_ptr<rocksdb::FragmentedRangeTombstoneIterator, std::default_delete<rocksdb::FragmentedRangeTombstoneIterator> >, std::allocator<std::unique_ptr<rocksdb::FragmentedRangeTombstoneIterator, std::default_delete<rocksdb::FragmentedRangeTombstoneIterator> > > >, rocksdb::FileMetaData*, rocksdb::InternalKeyComparator const&, std::vector<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> >, std::allocator<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> > > > const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<unsigned long, std::allocator<unsigned long> >, unsigned long, rocksdb::SnapshotChecker*, rocksdb::CompressionType, unsigned long, rocksdb::CompressionOptions const&, bool, rocksdb::InternalStats*, rocksdb::TableFileCreationReason, rocksdb::EventLogger*, int, rocksdb::Env::IOPriority, rocksdb::TableProperties*, int, unsigned long, unsigned long, rocksdb::Env::WriteLifeTimeHint, unsigned long)+0x2ddb) [0x55cd62d63a0b]
-----------------------------------------------
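- For quick reference, below is a minimal sketch (plain Python, written for this report, not taken from the cluster) that decodes the hex figures from the "bluefs _allocate" line above into human-readable sizes; the helper name "human" is illustrative only, and the values are copied verbatim from the log:
-----------------------------------------------
# Decode the bluefs _allocate figures from the -3> log line above.
# The numbers are taken verbatim from the log; the conversion is plain arithmetic.

def human(n: float) -> str:
    """Format a byte count as B/KiB/MiB/GiB/TiB for readability."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if n < 1024 or unit == "TiB":
            return f"{n:.2f} {unit}"
        n /= 1024

needed        = 0x400000        # allocation request that failed (4 MiB)
capacity      = 0x4b00000000    # bdev 1 capacity (~300 GiB)
block_size    = 0x1000          # allocator block size (4 KiB)
free          = 0xd8e089000     # free space reported by the allocator (~54 GiB)
fragmentation = 0.586552        # fragmentation score reported by the hybrid allocator

print(f"needed:        {human(needed)}")
print(f"capacity:      {human(capacity)}")
print(f"free:          {human(free)} ({free / capacity:.1%} of capacity)")
print(f"block size:    {human(block_size)}")
print(f"fragmentation: {fragmentation}")
-----------------------------------------------
- In other words, the allocator still reports roughly 54 GiB free out of ~300 GiB but returns "allocated 0x0" for a 4 MiB request, so the assert in BlueFS::_flush_and_sync_log appears to be triggered by the failed BlueFS allocation rather than by a simply full device.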
Version of all relevant components (if applicable):
- v4.10
Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
- Two OSD pods are continuously restarting, which makes the cluster unstable.
Is there any workaround available to the best of your knowledge?
N/A
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
N/A
Is this issue reproducible?
No, this is specific to the customer environment.
Can this issue be reproduced from the UI?
N/A
If this is a regression, please provide more details to justify this:
N/A
Steps to Reproduce:
N/A
Actual results:
- The OSD pods are in CLBO state.
Expected results:
- The OSD pods should be running without crashes or restarts.
Additional info:
In the next comments