Bug 2151762 - ceph osd pod failed to start up with error: rocksdb: Corruption: Bad table magic number [NEEDINFO]
Summary: ceph osd pod failed to start up with error: rocksdb: Corruption: Bad table magic number
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph
Version: 4.10
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Neha Ojha
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-12-08 05:19 UTC by jiaxl
Modified: 2023-08-09 16:37 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-04-05 18:34:37 UTC
Embargoed:
sheggodu: needinfo? (nojha)


Attachments:

Description jiaxl 2022-12-08 05:19:56 UTC
Description of problem (please be as detailed as possible and provide log snippets):

After installing ODF 4.10 on OCP 4.10, one of the Ceph OSD pods kept crashing.
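
For context, a minimal sketch of how the crash loop and the log below can be observed (the pod name suffix is generated and shown here as a placeholder; the namespace and labels assume the ODF defaults):

# List the OSD pods in the openshift-storage namespace
oc get pods -n openshift-storage -l app=rook-ceph-osd
# Dump the log of the failing OSD container; <suffix> stands in for the generated pod name
oc logs -n openshift-storage rook-ceph-osd-2-<suffix> -c osd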
Logs:

debug 2022-12-07T12:18:23.249+0000 7f2a7013d080 4 rocksdb: Options.optimize_filters_for_hits: 0
debug 2022-12-07T12:18:23.249+0000 7f2a7013d080 4 rocksdb: Options.paranoid_file_checks: 0
debug 2022-12-07T12:18:23.249+0000 7f2a7013d080 4 rocksdb: Options.force_consistency_checks: 0
debug 2022-12-07T12:18:23.249+0000 7f2a7013d080 4 rocksdb: Options.report_bg_io_stats: 0
debug 2022-12-07T12:18:23.249+0000 7f2a7013d080 4 rocksdb: Options.ttl: 2592000
debug 2022-12-07T12:18:23.249+0000 7f2a7013d080 4 rocksdb: Options.periodic_compaction_seconds: 0
debug 2022-12-07T12:18:23.250+0000 7f2a7013d080 4 rocksdb: [column_family.cc:555] (skipping printing options)
debug 2022-12-07T12:18:23.250+0000 7f2a7013d080 4 rocksdb: [column_family.cc:555] (skipping printing options)
debug 2022-12-07T12:18:23.264+0000 7f2a7013d080 4 rocksdb: [db_impl/db_impl.cc:397] Shutdown: canceling all background work
debug 2022-12-07T12:18:23.265+0000 7f2a7013d080 4 rocksdb: [db_impl/db_impl.cc:573] Shutdown complete
debug 2022-12-07T12:18:23.267+0000 7f2a7013d080 -1 rocksdb: Corruption: Bad table magic number: expected 9863518390377041911, found 0 in db/000094.sst
debug 2022-12-07T12:18:23.268+0000 7f2a7013d080 -1 bluestore(/var/lib/ceph/osd/ceph-2) _open_db erroring opening db:
debug 2022-12-07T12:18:23.268+0000 7f2a7013d080 1 bluefs umount
debug 2022-12-07T12:18:23.269+0000 7f2a7013d080 1 bdev(0x557c4f81d000 /var/lib/ceph/osd/ceph-2/block) close
debug 2022-12-07T12:18:23.361+0000 7f2a7013d080 1 bdev(0x557c4f81cc00 /var/lib/ceph/osd/ceph-2/block) close
debug 2022-12-07T12:18:23.619+0000 7f2a7013d080 -1 osd.2 0 OSD:init: unable to mount object store
debug 2022-12-07T12:18:23.619+0000 7f2a7013d080 -1 ** ERROR: osd init failed: (5) Input/output error
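
(For reference, a hedged sketch of how the corruption could be confirmed directly against the OSD data directory once the OSD is stopped; the path matches the log above, but running these from the hosting node or a debug container is an assumption about the environment:)

# Consistency check of the BlueStore instance backing osd.2 (the OSD must not be running)
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-2
# Optionally export the embedded RocksDB/BlueFS files (e.g. db/000094.sst) for offline inspection
ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-2 --out-dir /tmp/osd2-bluefs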

Version of all relevant components (if applicable):

ODF version: 4.10.8
Local Storage operator: 4.10.0
Ceph version: 16.2.7-126.el8cp (fe0af61d104d48cb9d116cde6e593b5fc8c197e4) pacific (stable)
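
(For reference, a sketch of how these versions can be collected; the namespace names and the rook-ceph-tools deployment assume the ODF defaults:)

# Operator versions (ODF and Local Storage operator)
oc get csv -n openshift-storage
oc get csv -n openshift-local-storage
# Ceph daemon versions, queried via the toolbox pod
oc rsh -n openshift-storage deploy/rook-ceph-tools ceph versions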

Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?


Is there any workaround available to the best of your knowledge?
Based on some Google searches, the following upstream Ceph tracker issue appears to be the most relevant:
https://tracker.ceph.com/issues/54547
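
If the on-disk corruption turns out to be unrecoverable, the usual ODF path is to replace the affected OSD rather than repair it. A rough sketch of that flow, assuming the documented ocs-osd-removal template and the default namespace (the authoritative steps are in the ODF device-replacement documentation):

# Scale down the failing OSD deployment (osd.2 in this case)
oc scale -n openshift-storage deployment rook-ceph-osd-2 --replicas=0
# Run the OSD removal job for the failed OSD ID
oc process -n openshift-storage ocs-osd-removal -p FAILED_OSD_IDS=2 | oc create -f -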

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
From the log, it looks likely that this is an occasional/intermittent issue rather than a consistently reproducible one.

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1.
2.
3.


Actual results:


Expected results:


Additional info:

