Bug 1856960 - [Tool] Update the ceph-bluestore-tool for adding rescue procedure for bluefs log replay
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RADOS
Version: 3.3
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.2
Assignee: Adam Kupczyk
QA Contact: Manohar Murthy
URL:
Whiteboard:
Depends On: 1821133 1856961
Blocks:
Reported: 2020-07-14 19:12 UTC by Neha Ojha
Modified: 2021-01-13 15:44 UTC
CC List: 23 users

Fixed In Version: ceph-14.2.11-13.el8cp, ceph-14.2.11-13.el7cp
Doc Type: Bug Fix
Doc Text:
Cause: There were not enough checks on the size of the BlueFS log during replay. Even when an OSD was not processing any external requests, it still periodically sent small updates to RocksDB, which appended data to the BlueFS log. (Any actual operation on the OSD would have triggered log compaction.)
Consequence: The BlueFS log could grow so large that it could no longer be read. This remained unnoticed until the OSD restarted, at which point the OSD was unable to boot.
Fix: A heuristic rescue procedure was added that attempts to find the missing parts of the log on the device. It is enabled by setting "bluefs_replay_recovery=true". Because the procedure is only a heuristic, fsck must be run to check whether it succeeded. In normal mode, BlueFS compacts the log right after bootup; to prevent this compaction, "bluefs_replay_recovery_disable_compact=true" should be used until *fsck* returns success. The fix procedure therefore has two steps: 1) CHECK: ceph-bluestore-tool -l /proc/self/fd/1 --log-level 5 --path *osd path* fsck --debug_bluefs=5/5 --bluefs_replay_recovery=true --bluefs_replay_recovery_disable_compact=true 2) ACTUAL FIX: ceph-bluestore-tool -l /proc/self/fd/1 --log-level 5 --path *osd path* fsck --debug_bluefs=5/5 --bluefs_replay_recovery=true
Result: The OSD can boot again.
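The two-step procedure from the Doc Text can be sketched as a shell session. The OSD_PATH variable and the example path are placeholders not present in the original text; substitute the actual OSD data directory, and run the tool only while the OSD daemon is stopped.

```shell
# OSD_PATH is a hypothetical placeholder for the OSD data directory.
OSD_PATH=/var/lib/ceph/osd/ceph-0   # example; adjust for your cluster

# Step 1: CHECK. Run the heuristic BlueFS log recovery with compaction
# disabled, and let fsck verify that the recovered log is consistent.
ceph-bluestore-tool -l /proc/self/fd/1 --log-level 5 --path "$OSD_PATH" fsck \
    --debug_bluefs=5/5 \
    --bluefs_replay_recovery=true \
    --bluefs_replay_recovery_disable_compact=true

# Step 2: ACTUAL FIX. Only after step 1's fsck reports success:
# re-run with compaction allowed, so BlueFS rewrites the log compactly.
ceph-bluestore-tool -l /proc/self/fd/1 --log-level 5 --path "$OSD_PATH" fsck \
    --debug_bluefs=5/5 \
    --bluefs_replay_recovery=true
```

Keeping compaction disabled in step 1 matters: if BlueFS compacted the log before fsck confirmed the heuristic recovery, a failed recovery could be made permanent.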
Clone Of: 1821133
Environment:
Last Closed: 2021-01-12 14:56:02 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph pull 36930 0 None closed nautilus: Rescue procedure for extremely large bluefs log 2021-01-13 09:38:06 UTC
Red Hat Product Errata RHSA-2021:0081 0 None None None 2021-01-12 14:56:35 UTC

Comment 8 errata-xmlrpc 2021-01-12 14:56:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 4.2 Security and Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0081

