Previously, the code did not check whether the ObjectStore collection (the equivalent of a PG) existed. As a result, accessing the null collection object caused segmentation faults.
With this fix, the code checks the collection and skips the operation if it is null, and COT (ceph-objectstore-tool) prints a message that the collection does not exist.
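The check described above can be illustrated with a small, self-contained sketch. This is not the actual Ceph patch: `Collection`, `Store`, `open_collection` and `list_objects` are hypothetical stand-ins for the ObjectStore types, and the error message wording is assumed. It only shows the general pattern of testing the collection handle before listing its objects.

```cpp
// Simplified model of a null-collection guard; NOT the actual Ceph source.
#include <memory>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for ObjectStore::CollectionImpl.
struct Collection {
    std::vector<std::string> objects;
};

// Hypothetical stand-in for an object store that maps PG ids to collections.
class Store {
public:
    void add_collection(const std::string& pgid, std::vector<std::string> objects) {
        auto c = std::make_shared<Collection>();
        c->objects = std::move(objects);
        collections_[pgid] = std::move(c);
    }

    // Returns nullptr when the collection does not exist in this store.
    std::shared_ptr<Collection> open_collection(const std::string& pgid) const {
        auto it = collections_.find(pgid);
        return it == collections_.end() ? nullptr : it->second;
    }

private:
    std::unordered_map<std::string, std::shared_ptr<Collection>> collections_;
};

// Lists the objects of a PG. If the collection is missing, the function
// fills `err` and returns std::nullopt instead of dereferencing a null handle.
std::optional<std::vector<std::string>> list_objects(const Store& store,
                                                     const std::string& pgid,
                                                     std::string& err) {
    auto ch = store.open_collection(pgid);
    if (!ch) {
        err = "collection " + pgid + " does not exist";  // assumed message text
        return std::nullopt;
    }
    return ch->objects;
}
```

A caller would print `err` and skip the operation when `list_objects` returns no value, which mirrors the behavior the fix describes.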
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Red Hat Ceph Storage 7.1 security, bug fix, and enhancement updates), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2024:9010
Description of problem:
Segmentation faults are observed on the OSD when we try to list the omap entries of an object with ceph-objectstore-tool.

# systemctl stop ceph-4ac55332-c500-11ee-ad37-fa163e664e45.service
[root@ceph-pdhiran-hd3aat-node9 ~]# cephadm shell --name osd.8 -- ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-8 --pgid 11.1f benchmark_data_ceph-pdhiran-hd3aat-node7_4171_object20 list-omap
Inferring fsid 4ac55332-c500-11ee-ad37-fa163e664e45
Inferring config /var/lib/ceph/4ac55332-c500-11ee-ad37-fa163e664e45/osd.8/config
Using ceph image with id '23f1e3d0a21b' and tag '<none>' created on 2024-01-31 00:09:32 +0000 UTC
registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:4ce4fff33a42564a2f877420a1898e060d316b2c818c53258e7beb2cd57ce7f3
*** Caught signal (Segmentation fault) **
 in thread 7ff86c8e6580 thread_name:ceph-objectstor
 ceph version 18.2.0-144.el9cp (f2621d6df88c0fe16f313952d9dd897bbec5d90d) reef (stable)
 1: /lib64/libc.so.6(+0x54db0) [0x7ff86ceeedb0]
 2: (BlueStore::collection_list(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x4b) [0x55987c529c4b]
 3: (_action_on_all_objects_in_pg(ObjectStore*, coll_t, action_on_object_t&, bool)+0x4cc) [0x55987c07476c]
 4: (action_on_all_objects_in_exact_pg(ObjectStore*, coll_t, action_on_object_t&, bool)+0x64) [0x55987c075654]
 5: main()
 6: /lib64/libc.so.6(+0x3feb0) [0x7ff86ced9eb0]
 7: __libc_start_main()
 8: _start()

Version-Release number of selected component (if applicable):
# ceph version
ceph version 18.2.0-144.el9cp (f2621d6df88c0fe16f313952d9dd897bbec5d90d) reef (stable)

How reproducible:
Always

Steps to Reproduce:
1. Create an EC pool and write objects to it.
2. Pick a test object and identify the primary OSD for its PG.

# rados -p Inconsistent_snap_pool_ec ls
benchmark_data_ceph-pdhiran-hd3aat-node7_4171_object37
benchmark_data_ceph-pdhiran-hd3aat-node7_4171_object38
benchmark_data_ceph-pdhiran-hd3aat-node7_4171_object48
benchmark_data_ceph-pdhiran-hd3aat-node7_4171_object43

# ceph osd map Inconsistent_snap_pool_ec benchmark_data_ceph-pdhiran-hd3aat-node7_4171_object20 -f json-pretty
{
    "epoch": 250,
    "pool": "Inconsistent_snap_pool_ec",
    "pool_id": 11,
    "objname": "benchmark_data_ceph-pdhiran-hd3aat-node7_4171_object20",
    "raw_pgid": "11.367617bf",
    "pgid": "11.1f",
    "up": [
        8,
        17,
        11,
        5
    ],
    "up_primary": 8,
    "acting": [
        8,
        17,
        11,
        5
    ],
    "acting_primary": 8
}

3. Run the COT command to get the omap list. Observe the crash.

# systemctl stop ceph-4ac55332-c500-11ee-ad37-fa163e664e45.service
[root@ceph-pdhiran-hd3aat-node9 ~]# cephadm shell --name osd.8 -- ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-8 --pgid 11.1f benchmark_data_ceph-pdhiran-hd3aat-node7_4171_object20 list-omap
Inferring fsid 4ac55332-c500-11ee-ad37-fa163e664e45
Inferring config /var/lib/ceph/4ac55332-c500-11ee-ad37-fa163e664e45/osd.8/config
Using ceph image with id '23f1e3d0a21b' and tag '<none>' created on 2024-01-31 00:09:32 +0000 UTC
registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:4ce4fff33a42564a2f877420a1898e060d316b2c818c53258e7beb2cd57ce7f3
*** Caught signal (Segmentation fault) **
 in thread 7ff86c8e6580 thread_name:ceph-objectstor
 ceph version 18.2.0-144.el9cp (f2621d6df88c0fe16f313952d9dd897bbec5d90d) reef (stable)
 1: /lib64/libc.so.6(+0x54db0) [0x7ff86ceeedb0]
 2: (BlueStore::collection_list(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x4b) [0x55987c529c4b]
 3: (_action_on_all_objects_in_pg(ObjectStore*, coll_t, action_on_object_t&, bool)+0x4cc) [0x55987c07476c]
 4: (action_on_all_objects_in_exact_pg(ObjectStore*, coll_t, action_on_object_t&, bool)+0x64) [0x55987c075654]
 5: main()
 6: /lib64/libc.so.6(+0x3feb0) [0x7ff86ced9eb0]
 7: __libc_start_main()
 8: _start()

Actual results:
The command fails with a segmentation fault.

Expected results:
The command should run without a segmentation fault.

Additional info: