Bug 1600138 - [Bluestore]: one of the osds flapped multiple times with 1525: FAILED assert(0 == "bluefs enospc")
Summary: [Bluestore]: one of the osds flapped multiple times with 1525: FAILED assert(0 == "bluefs enospc")
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 3.1
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: z2
Target Release: 3.2
Assignee: Sage Weil
QA Contact: Manohar Murthy
URL:
Whiteboard:
Depends On:
Blocks: 1629656
 
Reported: 2018-07-11 13:57 UTC by Parikshith
Modified: 2023-10-06 17:52 UTC (History)
CC List: 23 users

Fixed In Version: RHEL: ceph-12.2.8-108.el7cp Ubuntu: ceph_12.2.8-93redhat1xenial
Doc Type: Bug Fix
Doc Text:
.RocksDB compaction no longer exhausts the free space of BlueFS
Previously, the balancing of free space between the main storage and the storage for RocksDB, which is managed by BlueFS, happened only while write operations were underway. As a result, BlueFS could return an `ENOSPC` error when RocksDB compaction was triggered right before a long interval without write operations. With this update, the code periodically checks the free space balance even when no write operations are ongoing, so that compaction no longer exhausts the free space of BlueFS.
Clone Of:
Environment:
Last Closed: 2019-04-30 15:56:43 UTC
Embargoed:


Attachments (Terms of Use)
osd.10.log (17.54 MB, text/plain)
2018-07-11 13:57 UTC, Parikshith
no flags Details
ceph.log (657.53 KB, text/plain)
2018-07-13 18:02 UTC, Parikshith
no flags Details
mon logs (4.83 MB, text/plain)
2018-07-13 18:03 UTC, Parikshith
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 38574 0 None None None 2019-03-07 19:00:44 UTC
Github ceph ceph pull 26735 0 'None' closed mimic: os/bluestore: Fix bluefs - block balance. 2021-02-08 11:01:55 UTC
Red Hat Issue Tracker RHCEPH-7625 0 None None None 2023-10-06 17:52:36 UTC
Red Hat Knowledge Base (Solution) 3973081 0 None None None 2019-04-13 06:55:17 UTC
Red Hat Product Errata RHSA-2019:0911 0 None None None 2019-04-30 15:57:00 UTC

Description Parikshith 2018-07-11 13:57:13 UTC
Created attachment 1458119 [details]
osd.10.log

Description of problem:
One of the OSDs randomly flapped multiple times with BlueFS.cc: 1525: FAILED assert(0 == "bluefs enospc")

Version-Release number of selected component (if applicable):
ceph version 12.2.5-27.el7cp

How reproducible:  
Intermittent

Steps: 
1. Created a cluster with only BlueStore OSDs and filled the cluster with ~70% data.

Actual results:
One of the osds (osd.10) started flapping.

2018-07-11 07:35:48.258555 7f354e0ae700 -1 bluefs _allocate failed to allocate 0x3000000 on bdev 2, free 0x1700000
2018-07-11 07:35:48.258569 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump free bin 0: 0 extents
2018-07-11 07:35:48.258574 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump free bin 1: 0 extents
2018-07-11 07:35:48.258576 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump free bin 2: 0 extents
2018-07-11 07:35:48.258577 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump free bin 3: 0 extents
2018-07-11 07:35:48.258579 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump free bin 4: 0 extents
2018-07-11 07:35:48.258580 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump free bin 5: 0 extents
2018-07-11 07:35:48.258582 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump free bin 6: 0 extents
2018-07-11 07:35:48.258584 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump free bin 7: 0 extents
2018-07-11 07:35:48.258585 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump free bin 8: 0 extents
2018-07-11 07:35:48.258587 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump free bin 9: 14 extents
2018-07-11 07:35:48.258588 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0x26fcc00000~100000
2018-07-11 07:35:48.258590 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0x26ff300000~100000
2018-07-11 07:35:48.258591 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0x33ed800000~100000
2018-07-11 07:35:48.258592 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0x7649900000~100000
2018-07-11 07:35:48.258594 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0x9c2d700000~100000
2018-07-11 07:35:48.258595 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0x9c4d200000~100000
2018-07-11 07:35:48.258597 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0x9d69d00000~200000
2018-07-11 07:35:48.258598 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0xa1f8e00000~100000
2018-07-11 07:35:48.258599 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0xa3b9d00000~100000
2018-07-11 07:35:48.258601 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0xc70a100000~500000
2018-07-11 07:35:48.258602 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0xc72bb00000~200000
2018-07-11 07:35:48.258603 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0xc76ae00000~100000
2018-07-11 07:35:48.258604 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0xc781400000~200000
2018-07-11 07:35:48.258606 7f354e0ae700  0 stupidalloc 0x0x55a30f764070 dump  0xc98f300000~300000
2018-07-11 07:35:48.258608 7f354e0ae700 -1 bluefs _flush_range allocated: 0x0 offset: 0x0 length: 0x2f0901c
2018-07-11 07:35:48.264086 7f354e0ae700 -1 /builddir/build/BUILD/ceph-12.2.5/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' thread 7f354e0ae700 time 2018-07-11 07:35:48.258621
/builddir/build/BUILD/ceph-12.2.5/src/os/bluestore/BlueFS.cc: 1525: FAILED assert(0 == "bluefs enospc")

 ceph version 12.2.5-27.el7cp (6b7d1a8cf2e2376fd0a12424947c8c97aa1c0890) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x55a304abd710]
 2: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned long)+0x1b1b) [0x55a304a3c13b]
 3: (BlueFS::_flush(BlueFS::FileWriter*, bool)+0x188) [0x55a304a3c6a8]
 4: (BlueRocksWritableFile::Flush()+0x3c) [0x55a304a5823c]
 5: (rocksdb::WritableFileWriter::Flush()+0xaf) [0x55a304e2755f]
 6: (rocksdb::WritableFileWriter::Sync(bool)+0x33) [0x55a304e27773]
 7: (rocksdb::CompactionJob::FinishCompactionOutputFile(rocksdb::Status const&, rocksdb::CompactionJob::SubcompactionState*, rocksdb::RangeDelAggregator*, CompactionIterationStats*, rocksdb::Slice const*)+0xa1b) [0x55a304eed7ab]
 8: (rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::CompactionJob::SubcompactionState*)+0xea2) [0x55a304ef0222]
 9: (rocksdb::CompactionJob::Run()+0x1de) [0x55a304ef0cde]
 10: (rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, void*)+0x9c7) [0x55a304e7c687]
 11: (rocksdb::DBImpl::BackgroundCallCompaction(void*)+0x283) [0x55a304e7fc73]
 12: (rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long)+0x271) [0x55a304ee6e31]
 13: (rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)+0x3d) [0x55a304ee701d]
 14: (()+0xb5070) [0x7f355fafc070]
 15: (()+0x7dd5) [0x7f356016fdd5]
 16: (clone()+0x6d) [0x7f355f260b3d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---

Additional info:
There were no IOs going on in the cluster.
There were slow/blocked requests while scrubbing was going on.
One PG was stuck in the active+clean+remapped state (bz #1600040).

Comment 7 Parikshith 2018-07-13 18:02:17 UTC
Created attachment 1458772 [details]
ceph.log

Comment 8 Parikshith 2018-07-13 18:03:07 UTC
Created attachment 1458773 [details]
mon logs

Comment 11 Sage Weil 2018-07-24 15:56:46 UTC
How reproducible is this?  I'm looking at the luminous code and I think the only way this would happen is if the device is fast and/or the osd is idle, but rocksdb is doing compaction.  We have a configurable that gives bluefs a minimum of 1 GB of free space.  I think the way to address this is to bump that to, say, 5 GB or 10 GB.  It's hard to tell if that will do the trick, though, without being able to reproduce...

Can you verify it reproduces, and then try it again with

 bluestore_bluefs_min_free = 10737418240   # 10gb (current default is 1gb)

?
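
For reference, one way that setting could be applied on a test cluster; this is only an illustrative sketch of the usual ceph.conf / injectargs mechanisms, not a verified procedure for this bug:

 # ceph.conf, [osd] section; requires an OSD restart to take effect
 [osd]
 bluestore_bluefs_min_free = 10737418240   # 10 GB

 # or inject into the running OSDs (not persisted across restarts)
 ceph tell osd.* injectargs '--bluestore_bluefs_min_free 10737418240'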

Comment 13 Josh Durgin 2018-09-14 22:29:57 UTC
Moving to 3.* unless we can reproduce to test setting bluestore_bluefs_min_free to a higher value.

Comment 21 Neha Ojha 2019-02-15 17:51:31 UTC
Tomas, what is the current state of the customer cluster? We are working on backporting this patch, but that might take some time.

Comment 25 Neha Ojha 2019-02-15 23:32:29 UTC
Adam(akupczyk) is investigating a way to get around this issue without applying the patch immediately.

Comment 39 Neha Ojha 2019-02-22 18:54:17 UTC
Sounds good. Thanks Vikhyat.

Comment 65 Neha Ojha 2019-03-07 19:00:06 UTC
We will be backporting https://github.com/ceph/ceph/pull/26735 to luminous and shipping it in 3.2z2.
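
As a side note for anyone following along, here is a rough conceptual sketch of what that "block balance" fix amounts to, going by the release-note text in this bug: the BlueFS free-space balance is now checked periodically rather than only while writes are in flight. This is illustrative C++ only, not the actual patch from the pull request; the names KvSyncLoop, kv_sync_cond_, balance_interval_ and rebalance_bluefs_freespace() are hypothetical placeholders.

#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>

struct KvSyncLoop {
  std::mutex lock_;
  std::condition_variable kv_sync_cond_;
  std::deque<int> queued_txcs_;              // pending write transactions
  bool stop_ = false;
  std::chrono::seconds balance_interval_{1}; // periodic wakeup while idle

  void rebalance_bluefs_freespace() {
    // Compare BlueFS free space against bluestore_bluefs_min_free and
    // gift/reclaim extents from the main device as needed (elided).
  }

  void kv_sync_thread() {
    std::unique_lock<std::mutex> l(lock_);
    while (!stop_) {
      // Waiting only for new transactions means no rebalancing happens while
      // the OSD is idle; waiting with a timeout lets the balance check run
      // periodically even when there are no writes, which is the gist of the
      // fix described above.
      kv_sync_cond_.wait_for(l, balance_interval_,
                             [this] { return stop_ || !queued_txcs_.empty(); });
      if (!queued_txcs_.empty()) {
        queued_txcs_.clear();  // commit queued transactions (elided)
      }
      rebalance_bluefs_freespace();
    }
  }
};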

Comment 74 errata-xmlrpc 2019-04-30 15:56:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0911

