Bug 2292323

Summary: [ceph-bluestore-tool] bluefs-export using CBT returns mount failed error
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Harsh Kumar <hakumar>
Component: RADOS
Assignee: Adam Kupczyk <akupczyk>
Status: CLOSED ERRATA
QA Contact: Harsh Kumar <hakumar>
Severity: high
Priority: unspecified
Version: 5.3
CC: akupczyk, bhubbard, bkunal, ceph-eng-bugs, cephqe-warriors, dwalveka, gsitlani, jcaratza, ngangadh, nojha, racpatel, rzarzyns, vumrao
Target Milestone: ---
Keywords: Automation, Regression
Target Release: 5.3z7
Flags: dwalveka: needinfo-
Hardware: x86_64
OS: Linux
Fixed In Version: ceph-16.2.10-265
Doc Type: No Doc Update
Type: Bug
Last Closed: 2024-06-26 10:02:35 UTC

Description Harsh Kumar 2024-06-14 03:36:27 UTC
Description of problem:
  Following the upstream documentation for the ceph-bluestore-tool bluefs-export command:

      Export the contents of BlueFS (i.e., RocksDB files) to an output directory.

  ceph-bluestore-tool bluefs-export --path <osd path> --out-dir <dir>

  Refer: https://docs.ceph.com/en/latest/man/8/ceph-bluestore-tool/
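
  As a concrete example, on the cluster used in this report the documented
  invocation maps to the commands below. The OSD id and output directory are
  only examples; the OSD has to be stopped first and the command run where the
  OSD data path is accessible (here, inside the OSD container), as in the
  reproduction steps further down.

      # export the BlueFS (RocksDB) contents of osd.4 into /tmp/
      ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-4 --out-dir /tmp/
      # inspect what was exported
      ls /tmp/db /tmp/db.wal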

  The bluefs-export command is currently failing on the latest nightly build of RHCS 5.3 (16.2.10-264.el8cp).

  Version-Release number of selected component (if applicable):
  ceph version 16.2.10-264.el8cp (8f5f7a32a6ad0fa100fdf9e8823564d26e554e9d) pacific (stable)


How reproducible:
3/3

Steps to Reproduce:
1. Deploy a RHCS 5.3 cluster
2. On an OSD node, stop any OSD service at random
  # systemctl stop ceph-3d3ab846-2951-11ef-b4fa-fa163e72f4bd.service
3. From inside the OSD container, run the ceph-bluestore-tool bluefs-export command (a consolidated sketch of these steps follows the list)
  # cephadm shell --name osd.4 -- ceph-bluestore-tool bluefs-export --out-dir /tmp/ --path /var/lib/ceph/osd/ceph-4
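
A consolidated sketch of the steps above. The fsid, OSD id and output directory
are placeholders taken from this reproduction; the systemd unit name assumes the
usual cephadm per-daemon naming, whereas the exact unit stopped in this run is
the one shown in step 2.

  #!/bin/bash
  # Placeholders: adjust to the cluster under test
  FSID=3d3ab846-2951-11ef-b4fa-fa163e72f4bd
  OSD_ID=4
  OUT_DIR=/tmp/

  # 1. Stop the OSD so bluestore is not in use (assumed cephadm unit naming)
  systemctl stop ceph-${FSID}@osd.${OSD_ID}.service

  # 2. Run bluefs-export from inside the OSD container
  cephadm shell --name osd.${OSD_ID} -- \
      ceph-bluestore-tool bluefs-export --out-dir ${OUT_DIR} --path /var/lib/ceph/osd/ceph-${OSD_ID}
  echo "bluefs-export exit status: $?"

  # 3. Restart the OSD afterwards
  systemctl start ceph-${FSID}@osd.${OSD_ID}.service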

Actual results:
[root@ceph-hakumar-pl12lg-node4 ~]# systemctl stop ceph-3d3ab846-2951-11ef-b4fa-fa163e72f4bd.service
[root@ceph-hakumar-pl12lg-node4 ~]# cephadm shell --name osd.4
Inferring fsid 3d3ab846-2951-11ef-b4fa-fa163e72f4bd
Inferring config /var/lib/ceph/3d3ab846-2951-11ef-b4fa-fa163e72f4bd/osd.4/config
Using recent ceph image registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:64dd6a2d61230837791b0bcf23b79b2b022f41cab332670593502ee9458fc4bc
[ceph: root@ceph-hakumar-pl12lg-node4 /]# ceph-bluestore-tool bluefs-export --out-dir /tmp/ --path /var/lib/ceph/osd/ceph-4
inferring bluefs devices from bluestore path
 slot 1 /var/lib/ceph/osd/ceph-4/block -> /dev/dm-1
unable to mount bluefs: (14) Bad address
2024-06-13T10:44:58.778+0000 7f9351dd2540 -1 bluefs _verify_alloc_granularity OP_FILE_UPDATE of 1:0x4c2000~50000 does not align to alloc_size 0x10000
2024-06-13T10:44:58.778+0000 7f9351dd2540 -1 bluefs mount failed to replay log: (14) Bad address

Expected results:
[ceph: root@ceph-sumar-regression-1pz1zj-node4 /]# ceph-bluestore-tool bluefs-export --out-dir /tmp/ --path /var/lib/ceph/osd/ceph-6
inferring bluefs devices from bluestore path
 slot 1 /var/lib/ceph/osd/ceph-6/block -> /dev/dm-2
db/
db/000030.sst
db/000035.sst
db/000036.sst
db/000037.sst
db/CURRENT
db/IDENTITY
db/LOCK
db/MANIFEST-000040
db/OPTIONS-000034
db/OPTIONS-000042
db.slow/
db.wal/
db.wal/000039.log
db.wal/000043.log
db.wal/000044.log
db.wal/000045.log
db.wal/000046.log
db.wal/000047.log
db.wal/000048.log
db.wal/000049.log
db.wal/000050.log
db.wal/000051.log
db.wal/000052.log
db.wal/000053.log
db.wal/000054.log
db.wal/000055.log
sharding/
sharding/def
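
When automating this check, one quick (hypothetical) sanity test is to confirm
that the export produced the top-level entries seen in the healthy listing
above; OUT_DIR is whatever was passed as --out-dir:

  OUT_DIR=/tmp
  for entry in db db.slow db.wal sharding; do
      [ -e "${OUT_DIR}/${entry}" ] && echo "ok: ${entry}" || echo "missing: ${entry}"
  done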

Additional info:
Failure on RHCS 5.3 
=======================================
[root@ceph-hakumar-pl12lg-node1-installer ~]# cephadm shell -- ceph versions
Inferring fsid 3d3ab846-2951-11ef-b4fa-fa163e72f4bd
Using recent ceph image registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:f5ec1577dc3deeb5add748f08ecb54f55b1ebecc2d2d5a1d470c390083de9428
{
    "mon": {
        "ceph version 16.2.10-264.el8cp (8f5f7a32a6ad0fa100fdf9e8823564d26e554e9d) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.10-264.el8cp (8f5f7a32a6ad0fa100fdf9e8823564d26e554e9d) pacific (stable)": 3
    },
    "osd": {
        "ceph version 16.2.10-264.el8cp (8f5f7a32a6ad0fa100fdf9e8823564d26e554e9d) pacific (stable)": 7
    },
    "mds": {
        "ceph version 16.2.10-264.el8cp (8f5f7a32a6ad0fa100fdf9e8823564d26e554e9d) pacific (stable)": 2
    },
    "rgw": {
        "ceph version 16.2.10-264.el8cp (8f5f7a32a6ad0fa100fdf9e8823564d26e554e9d) pacific (stable)": 2
    },
    "overall": {
        "ceph version 16.2.10-264.el8cp (8f5f7a32a6ad0fa100fdf9e8823564d26e554e9d) pacific (stable)": 17
    }
}

[ceph: root@ceph-hakumar-pl12lg-node4 /]# ceph-bluestore-tool bluefs-export --out-dir /tmp/ --path /var/lib/ceph/osd/ceph-4
inferring bluefs devices from bluestore path
 slot 1 /var/lib/ceph/osd/ceph-4/block -> /dev/dm-1
unable to mount bluefs: (14) Bad address
2024-06-13T10:44:58.778+0000 7f9351dd2540 -1 bluefs _verify_alloc_granularity OP_FILE_UPDATE of 1:0x4c2000~50000 does not align to alloc_size 0x10000
2024-06-13T10:44:58.778+0000 7f9351dd2540 -1 bluefs mount failed to replay log: (14) Bad address

RHCS 7.1
===========================================
[root@ceph-sumar-regression-1pz1zj-node1-installer ~]# cephadm shell -- ceph versions
Inferring fsid 0ebe00a6-2945-11ef-acbc-fa163e3305ce
Inferring config /var/lib/ceph/0ebe00a6-2945-11ef-acbc-fa163e3305ce/mon.ceph-sumar-regression-1pz1zj-node1-installer/config
Using ceph image with id '5412073bd769' and tag '7-385' created on 2024-05-31 19:37:19 +0000 UTC
registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:579e5358418e176194812eeab523289a0c65e366250688be3f465f1a633b026d
{
    "mon": {
        "ceph version 18.2.1-194.el9cp (04a992766839cd3207877e518a1238cdbac3787e) reef (stable)": 3
    },
    "mgr": {
        "ceph version 18.2.1-194.el9cp (04a992766839cd3207877e518a1238cdbac3787e) reef (stable)": 2
    },
    "osd": {
        "ceph version 18.2.1-194.el9cp (04a992766839cd3207877e518a1238cdbac3787e) reef (stable)": 11
    },
    "mds": {
        "ceph version 18.2.1-194.el9cp (04a992766839cd3207877e518a1238cdbac3787e) reef (stable)": 5
    },
    "overall": {
        "ceph version 18.2.1-194.el9cp (04a992766839cd3207877e518a1238cdbac3787e) reef (stable)": 21
    }
}

[ceph: root@ceph-sumar-regression-1pz1zj-node4 /]# ceph-bluestore-tool bluefs-export --out-dir /tmp/ --path /var/lib/ceph/osd/ceph-6
inferring bluefs devices from bluestore path
 slot 1 /var/lib/ceph/osd/ceph-6/block -> /dev/dm-2
db/
db/000030.sst
db/000035.sst
db/000036.sst
db/000037.sst
db/CURRENT
db/IDENTITY
db/LOCK
db/MANIFEST-000040
db/OPTIONS-000034
db/OPTIONS-000042
db.slow/
db.wal/
db.wal/000039.log
db.wal/000043.log
db.wal/000044.log
db.wal/000045.log
db.wal/000046.log
db.wal/000047.log
db.wal/000048.log
db.wal/000049.log
db.wal/000050.log
db.wal/000051.log
db.wal/000052.log
db.wal/000053.log
db.wal/000054.log
db.wal/000055.log
sharding/
sharding/def

Comment 15 errata-xmlrpc 2024-06-26 10:02:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 5.3 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:4118

Comment 16 Red Hat Bugzilla 2024-10-25 04:25:20 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.