Bug 2310344

Summary: [ceph-bluestore-tool] qfsck using CBT fails with allocator mismatch error
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Harsh Kumar <hakumar>
Component: RADOS
Assignee: Adam Kupczyk <akupczyk>
Status: CLOSED ERRATA
QA Contact: Harsh Kumar <hakumar>
Severity: high
Priority: unspecified
Version: 8.0
Target Release: 8.0
CC: akraj, akupczyk, bhubbard, ceph-eng-bugs, cephqe-warriors, nojha, tserlin, vumrao
Keywords: Automation, Regression
Hardware: x86_64
OS: Linux
Fixed In Version: ceph-19.2.0-37.el9cp
Doc Type: No Doc Update
Last Closed: 2024-11-25 09:08:56 UTC
Type: Bug
Bug Blocks: 2317218

Description Harsh Kumar 2024-09-06 02:24:36 UTC
Description of problem:
Following the instructions in the upstream documentation at https://docs.ceph.com/en/latest/man/8/ceph-bluestore-tool/:

  qfsck

    run consistency check on BlueStore metadata comparing allocator data (from RocksDB CFB when exists and if not uses allocation-file) with ONodes state.


Upon execution of the command, ceph-bluestore-tool (CBT) reports an allocator mismatch between the allocation file and the metadata:
    bluestore::NCB::read_allocation_from_drive_for_bluestore_tool::FAILURE. Allocator from file and allocator from metadata differ::ret=-1
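
When qfsck reports such a mismatch, the allocator state can be inspected further with the tool's free-score and free-dump commands. This is a minimal diagnostic sketch, not part of the original report; it assumes the same osd.4 used in the reproduction below and the "block" allocator of the main device:

    # fragmentation score of the main (block) allocator, in the range [0, 1]
    ceph-bluestore-tool free-score --path /var/lib/ceph/osd/ceph-4/ --allocator block
    # dump the free regions tracked by the allocator for manual inspection
    ceph-bluestore-tool free-dump --path /var/lib/ceph/osd/ceph-4/ --allocator block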

>>> NOTE: The command execution was passing with the downstream Squid build 19.1.0-60.el9cp as of 29-Aug-2024
Automation pass logs: http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-BD1Z0Q/ceph-bluestore-tool_utility_0.log

>>> Also passes in Reef: http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-X5LVH7/ceph-bluestore-tool_utility_0.log

Version-Release number of selected component (if applicable):
ceph version 19.1.0-71.el9cp (783f8688bece0437eed13f61111c3a4f633f4e38) squid (rc)

How reproducible:
1/1

Steps to Reproduce:
1. Choose an OSD at random and stop its service
    # systemctl stop ceph-cdb7f7c4-6b34-11ef-a888-fa163efb0216.service
2. Get inside the OSD container to run ceph-bluestore-tool commands
    # cephadm shell --name osd.4
3. Execute 'qfsck' command
    # ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-4/ qfsck     
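
The same reproduction can also be driven non-interactively from the host. This is a sketch based on the steps above; the systemd unit name and OSD id are the ones from this environment, and restarting the OSD afterwards is an assumption, not part of the original steps:

    # stop the OSD, run qfsck inside its container without an interactive shell, then restart it
    systemctl stop ceph-cdb7f7c4-6b34-11ef-a888-fa163efb0216.service
    cephadm shell --name osd.4 -- ceph-bluestore-tool qfsck --path /var/lib/ceph/osd/ceph-4
    systemctl start ceph-cdb7f7c4-6b34-11ef-a888-fa163efb0216.service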



Actual results:
    # ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-4/ qfsck     
    qfsck bluestore.quick-fsck
    2024-09-06T01:41:40.863+0000 7fae27a65980 -1 bluestore::NCB::operator()::(2)compare_allocators:: spillover
    2024-09-06T01:41:40.863+0000 7fae27a65980 -1 bluestore::NCB::compare_allocators::mismatch:: idx1=491 idx2=492
    2024-09-06T01:41:40.863+0000 7fae27a65980 -1 bluestore::NCB::read_allocation_from_drive_for_bluestore_tool::FAILURE. Allocator from file and allocator from metadata differ::ret=-1
    qfsck failed: (1) Operation not permitted
    [ceph: root@ceph-hakumar-pzd763-node10 /]# ceph --version
    ceph version 19.1.0-71.el9cp (783f8688bece0437eed13f61111c3a4f633f4e38) squid (rc)

Expected results:
    No error message; the consistency check should return success, for example:
    # cephadm shell --name osd.12 -- ceph-bluestore-tool qfsck --path /var/lib/ceph/osd/ceph-12
    qfsck bluestore.quick-fsck
    qfsck success
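
For automated verification the exit status can be checked directly. A minimal sketch, assuming the tool exits non-zero whenever it prints "qfsck failed", as seen in the actual results above (osd.12 as in the expected-results example):

    # fail the automation step if qfsck does not succeed on the stopped OSD
    cephadm shell --name osd.12 -- ceph-bluestore-tool qfsck --path /var/lib/ceph/osd/ceph-12
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "qfsck failed (exit code $rc)" >&2
        exit 1
    fi
    echo "qfsck passed"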

Additional info:
    [root@ceph-hakumar-pzd763-node10 ~]# systemctl stop ceph-cdb7f7c4-6b34-11ef-a888-fa163efb0216.service
    [root@ceph-hakumar-pzd763-node10 ~]# cephadm shell --name osd.4
    Inferring fsid cdb7f7c4-6b34-11ef-a888-fa163efb0216
    Inferring config /var/lib/ceph/cdb7f7c4-6b34-11ef-a888-fa163efb0216/osd.4/config
    Using ceph image with id '37e3ade1853a' and tag '<none>' created on 2024-09-04 14:23:38 +0000 UTC
    registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:6293da836007c2453cbf8ce5b948dfa7814b69ba8e7347295e77039a98af98c9
    Creating an OSD daemon form without an OSD FSID value
    [ceph: root@ceph-hakumar-pzd763-node10 /]# ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-4/ fsck
    fsck success
    [ceph: root@ceph-hakumar-pzd763-node10 /]# ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-4/ fsck --deep true
    fsck success
    [ceph: root@ceph-hakumar-pzd763-node10 /]# ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-4/ show-label      
    inferring bluefs devices from bluestore path
    {
    "/var/lib/ceph/osd/ceph-4/block": {
        "osd_uuid": "123a547c-4371-40af-9bb1-d859bb1b93bf",
        "size": 26839351296,
        "btime": "2024-09-05T03:27:51.529609+0000",
        "description": "main",
        "bfm_blocks": "6552576",
        "bfm_blocks_per_key": "128",
        "bfm_bytes_per_block": "4096",
        "bfm_size": "26839351296",
        "bluefs": "1",
        "ceph_fsid": "cdb7f7c4-6b34-11ef-a888-fa163efb0216",
        "ceph_version_when_created": "ceph version 19.1.0-71.el9cp (783f8688bece0437eed13f61111c3a4f633f4e38) squid (rc)",
        "created_at": "2024-09-05T03:27:53.170078Z",
        "elastic_shared_blobs": "1",
        "epoch": "23",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "multi": "yes",
        "osd_key": "AQA2Jdlm6bgLABAARTWsEAWSacK6pdOjm0rklA==",
        "osdspec_affinity": "osd_spec_collocated",
        "ready": "ready",
        "require_osd_release": "19",
        "type": "bluestore",
        "whoami": "4"
    }
    }
    [ceph: root@ceph-hakumar-pzd763-node10 /]# ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-4/ qfsck     
    qfsck bluestore.quick-fsck
    2024-09-06T01:41:40.863+0000 7fae27a65980 -1 bluestore::NCB::operator()::(2)compare_allocators:: spillover
    2024-09-06T01:41:40.863+0000 7fae27a65980 -1 bluestore::NCB::compare_allocators::mismatch:: idx1=491 idx2=492
    2024-09-06T01:41:40.863+0000 7fae27a65980 -1 bluestore::NCB::read_allocation_from_drive_for_bluestore_tool::FAILURE. Allocator from file and allocator from metadata differ::ret=-1
    qfsck failed: (1) Operation not permitted
    [ceph: root@ceph-hakumar-pzd763-node10 /]# ceph --version
    ceph version 19.1.0-71.el9cp (783f8688bece0437eed13f61111c3a4f633f4e38) squid (rc)

Comment 9 errata-xmlrpc 2024-11-25 09:08:56 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 8.0 security, bug fix, and enhancement updates), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:10216

Comment 10 Red Hat Bugzilla 2025-03-26 04:26:04 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days