Created attachment 1458119 [details]
osd.10.log

Description of problem:
One of the OSDs randomly flapped multiple times with:
    1525: FAILED assert(0 == "bluefs enospc")

Version-Release number of selected component (if applicable):
ceph version 12.2.5-27.el7cp

How reproducible:
Intermittent

Steps:
1. Created a cluster with only BlueStore OSDs. Filled the cluster with ~70% data.

Actual results:
One of the OSDs (osd.10) started flapping.

2018-07-11 07:35:48.258555 7f354e0ae700 -1 bluefs _allocate failed to allocate 0x3000000 on bdev 2, free 0x1700000
2018-07-11 07:35:48.258569 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump free bin 0: 0 extents
2018-07-11 07:35:48.258574 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump free bin 1: 0 extents
2018-07-11 07:35:48.258576 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump free bin 2: 0 extents
2018-07-11 07:35:48.258577 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump free bin 3: 0 extents
2018-07-11 07:35:48.258579 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump free bin 4: 0 extents
2018-07-11 07:35:48.258580 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump free bin 5: 0 extents
2018-07-11 07:35:48.258582 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump free bin 6: 0 extents
2018-07-11 07:35:48.258584 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump free bin 7: 0 extents
2018-07-11 07:35:48.258585 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump free bin 8: 0 extents
2018-07-11 07:35:48.258587 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump free bin 9: 14 extents
2018-07-11 07:35:48.258588 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0x26fcc00000~100000
2018-07-11 07:35:48.258590 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0x26ff300000~100000
2018-07-11 07:35:48.258591 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0x33ed800000~100000
2018-07-11 07:35:48.258592 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0x7649900000~100000
2018-07-11 07:35:48.258594 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0x9c2d700000~100000
2018-07-11 07:35:48.258595 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0x9c4d200000~100000
2018-07-11 07:35:48.258597 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0x9d69d00000~200000
2018-07-11 07:35:48.258598 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0xa1f8e00000~100000
2018-07-11 07:35:48.258599 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0xa3b9d00000~100000
2018-07-11 07:35:48.258601 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0xc70a100000~500000
2018-07-11 07:35:48.258602 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0xc72bb00000~200000
2018-07-11 07:35:48.258603 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0xc76ae00000~100000
2018-07-11 07:35:48.258604 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0xc781400000~200000
2018-07-11 07:35:48.258606 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0xc98f300000~300000
2018-07-11 07:35:48.258608 7f354e0ae700 -1 bluefs _flush_range allocated: 0x0 offset: 0x0 length: 0x2f0901c
2018-07-11 07:35:48.264086 7f354e0ae700 -1 /builddir/build/BUILD/ceph-12.2.5/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' thread 7f354e0ae700 time 2018-07-11 07:35:48.258621
/builddir/build/BUILD/ceph-12.2.5/src/os/bluestore/BlueFS.cc: 1525: FAILED assert(0 == "bluefs enospc")

 ceph version 12.2.5-27.el7cp (6b7d1a8cf2e2376fd0a12424947c8c97aa1c0890) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55a304abd710]
 2: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0x1b1b) [0x55a304a3c13b]
 3: (BlueFS::_flush(BlueFS::FileWriter*, bool)+0x188) [0x55a304a3c6a8]
 4: (BlueRocksWritableFile::Flush()+0x3c) [0x55a304a5823c]
 5: (rocksdb::WritableFileWriter::Flush()+0xaf) [0x55a304e2755f]
 6: (rocksdb::WritableFileWriter::Sync(bool)+0x33) [0x55a304e27773]
 7: (rocksdb::CompactionJob::FinishCompactionOutputFile(rocksdb::Status const&, rocksdb::CompactionJob::SubcompactionState*, rocksdb::RangeDelAggregator*, CompactionIterationStats*, rocksdb::Slice const*)+0xa1b) [0x55a304eed7ab]
 8: (rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::CompactionJob::SubcompactionState*)+0xea2) [0x55a304ef0222]
 9: (rocksdb::CompactionJob::Run()+0x1de) [0x55a304ef0cde]
 10: (rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, void*)+0x9c7) [0x55a304e7c687]
 11: (rocksdb::DBImpl::BackgroundCallCompaction(void*)+0x283) [0x55a304e7fc73]
 12: (rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long)+0x271) [0x55a304ee6e31]
 13: (rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)+0x3d) [0x55a304ee701d]
 14: (()+0xb5070) [0x7f355fafc070]
 15: (()+0x7dd5) [0x7f356016fdd5]
 16: (clone()+0x6d) [0x7f355f260b3d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---

Additional info:
There were no IOs going on in the cluster. There were slow/blocked requests while scrubbing was going on. 1 PG was stuck in the active+clean+remapped state (bz #1600040).
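For readability, the key numbers in the `_allocate` failure above can be decoded from hex (a quick sketch, not part of the original log):

```python
# Decode the sizes from:
#   "bluefs _allocate failed to allocate 0x3000000 on bdev 2, free 0x1700000"
requested = 0x3000000   # bytes BlueFS tried to allocate on bdev 2
free_space = 0x1700000  # bytes actually free on that bdev

MiB = 1024 * 1024
print(f"requested: {requested // MiB} MiB")                 # 48 MiB
print(f"free:      {free_space // MiB} MiB")                # 23 MiB
print(f"shortfall: {(requested - free_space) // MiB} MiB")  # 25 MiB
```

So BlueFS needed a 48 MiB allocation but only 23 MiB remained free, which is what trips the `bluefs enospc` assert.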
Created attachment 1458772 [details] ceph.log
Created attachment 1458773 [details] mon logs
How reproducible is this? Looking at the luminous code, I think the only way this can happen is if the device is fast and/or the OSD is idle while rocksdb is doing compaction. We have a configurable that guarantees bluefs a minimum of 1 GB of free space. I think the way to address this is to bump that to, say, 5 GB or 10 GB. It's hard to tell whether that will do the trick, though, without being able to reproduce it. Can you verify that it reproduces, and then try again with bluestore_bluefs_min_free = 10737418240 # 10 GB (the current default is 1 GB)?
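For anyone testing the suggested bump, a minimal ceph.conf fragment might look like this (the value is the 10 GB from the comment above; placement in the [osd] section is a sketch, verify against your deployment before rolling it out):

```ini
[osd]
# Reserve 10 GiB of free space for BlueFS instead of the 1 GiB default.
bluestore_bluefs_min_free = 10737418240
```

The same value can also be injected at runtime with `ceph tell osd.* injectargs '--bluestore_bluefs_min_free=10737418240'`, though injected values do not survive an OSD restart, so the ceph.conf change is still needed for persistence.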
Moving to 3.* unless we can reproduce this in order to test setting bluestore_bluefs_min_free to a higher value.
Tomas, what is the current state of the customer cluster? We are working on backporting this patch, but that might take some time.
Adam (akupczyk) is investigating a way to get around this issue without applying the patch immediately.
Sounds good. Thanks Vikhyat.
We will be backporting https://github.com/ceph/ceph/pull/26735 to luminous and shipping it in 3.2z2.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2019:0911