Bug 2193399
| Field | Value |
|---|---|
| Summary | ceph: Corruption: unknown checksum type 4 (ceph-osd fails to start) |
| Product | [Fedora] Fedora |
| Component | ceph |
| Version | rawhide |
| Status | CLOSED RAWHIDE |
| Severity | urgent |
| Priority | unspecified |
| Hardware | Unspecified |
| OS | Linux |
| Reporter | Tomasz Torcz <tomek> |
| Assignee | Kaleb KEITHLEY <kkeithle> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | branto, go-sig, hegjon, i, josef, kkeithle, loic, steve |
| Fixed In Version | ceph-18.1.0-0.1 |
| Last Closed | 2023-06-19 09:41:21 UTC |
Description
Tomasz Torcz
2023-05-05 14:13:43 UTC
Created attachment 1962578: ceph-osd.2.log
I see that rocksdb was updated, too: rocksdb-7.8.3-2.fc39.x86_64 → rocksdb-8.1.1-1.fc39.x86_64. Adding rocksdb maintainer.

Checksum type 4 is kXXH3. It was added to RocksDB in 6.27.0 (2021-11-19) and was later made the default checksum type. When you switched Ceph compilation to the bundled RocksDB, it resulted in using RocksDB version 6.15.5, which does not know type 4. Thus ceph clusters broke.
What can be done for a distribution? Either Ceph will be ported to RocksDB 8.1, or the bundled rocksdb should be updated to at least 6.27.0. Or maybe the new checksum type could be backported to the bundled 6.15.5? Patch adding XXH3: https://github.com/facebook/rocksdb/pull/9069/files

Well, ceph doesn't use the system rocksdb (as noted above). See https://src.fedoraproject.org/rpms/ceph/blob/rawhide/f/ceph.spec#_1389
It uses its own bundled rocksdb because it doesn't build with the latest system rocksdb after rocksdb was updated in rawhide recently. (Yes, it has a BR for rocksdb-devel, and it could be argued that that's a bug. It's a bit misleading at best.)
It's unknown when ceph will refresh the bundled rocksdb or update ceph to work with newer rocksdb.

Just to be clear, ceph is not using system rocksdb NOW. But you switched to using bundled only two weeks ago (https://src.fedoraproject.org/rpms/ceph/c/e5f159485648a6d52f19d47e799c924eeb787fe8?branch=rawhide). Before it was using system RocksDB.
This means ceph clusters which were created more than 2 weeks ago are all broken.

(In reply to Tomasz Torcz from comment #6)
> Just to be clear, ceph is not using system rocksdb NOW. But you switched to
> using bundled only two weeks ago
Correct. That's when rocksdb was updated in rawhide to rocksdb-8.x.
> (https://src.fedoraproject.org/rpms/ceph/c/e5f159485648a6d52f19d47e799c924eeb787fe8?branch=rawhide).
> Before it was using system RocksDB.
That's correct. rocksdb was updated in rawhide and ceph 17.2.5 and 17.2.6 do not build with rocksdb-8.x.
> This means ceph clusters which were created more than 2 weeks ago are all
> broken.
ceph clusters created on Fedora Rawhide in the last two weeks!
You're certainly welcome to escalate with the ceph developers. The ceph tracker corresponding to this BZ is above.
Anyone using rawhide, and who has built a new ceph cluster in the last two weeks will just have to rebuild. AFAIC that's just how it is on rawhide. Nobody in their right mind should deploy a production environment on rawhide.

> ceph clusters created on Fedora Rawhide in the last two weeks!
No, that's not correct. My cluster was created in 2018 and it has stopped working because of this change. While it was using the system rocksdb, months ago, the checksum in use was updated to kXXH3. Now that Fedora's Ceph uses the bundled rocksdb, which is old and does not know kXXH3, the OSD doesn't start.
My cluster is not critical (I fire it up once a month to store some backups) and it's kept on Rawhide to notice such compatibility problems early. I understand that fixing this in Fedora is a lot of work and may not be worth it. I'll wait until upstream Ceph updates the bundled rocksdb or becomes compatible with 8.1. No worries.
I appreciate the work you do with keeping Ceph running in Rawhide. Let's leave this bug as an explanation for other people who may stumble on this checksum issue.
Should be fixed in ceph-18.1.0-0.1.fc39 (RC1).
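For anyone who lands here from the "unknown checksum type 4" error, the version mismatch discussed in this bug can be sketched in a few lines of Python. This is only an illustration, not how RocksDB or Ceph perform the check. The enum values mirror RocksDB's `ChecksumType`; the 6.27.0 minimum for kXXH3 comes from the comments above; the helper names `parse_version` and `can_read` are hypothetical, and the sketch assumes types 0 through 3 are understood by every version mentioned in this report.

```python
from enum import IntEnum


class ChecksumType(IntEnum):
    """Numeric values of RocksDB's ChecksumType enum."""
    kNoChecksum = 0
    kCRC32c = 1
    kxxHash = 2
    kxxHash64 = 3
    kXXH3 = 4


# From this report: kXXH3 was added in RocksDB 6.27.0.
# Assumption: types 0-3 are known to every version discussed here.
MIN_VERSION = {ChecksumType.kXXH3: (6, 27, 0)}


def parse_version(text):
    """Turn '6.15.5' into (6, 15, 5) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def can_read(reader_version, checksum_type):
    """Hypothetical helper: does this RocksDB version know the checksum type?"""
    needed = MIN_VERSION.get(ChecksumType(checksum_type), (0, 0, 0))
    return parse_version(reader_version) >= needed


# Bundled 6.15.5 versus the system versions named in this bug.
for version in ("6.15.5", "7.8.3", "8.1.1"):
    print(version, can_read(version, 4))
# 6.15.5 False
# 7.8.3 True
# 8.1.1 True
```

Data written by a newer RocksDB with type 4 is therefore unreadable by the bundled 6.15.5, regardless of when the cluster itself was created.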