Bug 2263023 - Segmentation fault observed on OSD upon running COT command to list omap entries on the OSD for an EC pool
Summary: Segmentation fault observed on OSD upon running COT command to list omap entries on the OSD for an EC pool
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RADOS
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 7.1z2
Assignee: Adam Kupczyk
QA Contact: Harsh Kumar
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-02-06 16:41 UTC by Pawan
Modified: 2024-11-07 20:58 UTC
CC List: 10 users

Fixed In Version: ceph-18.2.1-244.el9cp
Doc Type: Bug Fix
Doc Text:
Previously, the code did not check whether the ObjectStore collection (the equivalent of a PG) existed, so accessing the resulting null object caused segmentation faults. With this fix, the code checks for a null collection and skips the operation, and COT prints that the collection does not exist.
Clone Of:
Environment:
Last Closed: 2024-11-07 14:38:28 UTC
Embargoed:




Links
Github ceph/ceph pull 58353 (open): tools/objectstore: check for wrong coll open_collection (last updated 2024-07-01 07:49:54 UTC)
Github ceph/ceph pull 58734 (open): reef: tools/objectstore: check for wrong coll open_collection (last updated 2024-07-23 13:55:21 UTC)
Red Hat Bugzilla 2262907 (POST): Segmentation fault encountered during object manipulation using ceph-objectstore-tool (last updated 2024-11-01 08:28:33 UTC)
Red Hat Issue Tracker RHCEPH-8267 (last updated 2024-02-06 16:45:36 UTC)
Red Hat Product Errata RHBA-2024:9010 (last updated 2024-11-07 14:38:31 UTC)

Description Pawan 2024-02-06 16:41:59 UTC
Description of problem:

A segmentation fault is observed when running ceph-objectstore-tool (COT) against the OSD to list the omap entries of an object in an EC pool.

# systemctl stop ceph-4ac55332-c500-11ee-ad37-fa163e664e45.service
[root@ceph-pdhiran-hd3aat-node9 ~]# cephadm shell --name osd.8 -- ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-8 --pgid 11.1f benchmark_data_ceph-pdhiran-hd3aat-node7_4171_object20  list-omap
Inferring fsid 4ac55332-c500-11ee-ad37-fa163e664e45
Inferring config /var/lib/ceph/4ac55332-c500-11ee-ad37-fa163e664e45/osd.8/config
Using ceph image with id '23f1e3d0a21b' and tag '<none>' created on 2024-01-31 00:09:32 +0000 UTC
registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:4ce4fff33a42564a2f877420a1898e060d316b2c818c53258e7beb2cd57ce7f3
*** Caught signal (Segmentation fault) **
 in thread 7ff86c8e6580 thread_name:ceph-objectstor
 ceph version 18.2.0-144.el9cp (f2621d6df88c0fe16f313952d9dd897bbec5d90d) reef (stable)
 1: /lib64/libc.so.6(+0x54db0) [0x7ff86ceeedb0]
 2: (BlueStore::collection_list(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x4b) [0x55987c529c4b]
 3: (_action_on_all_objects_in_pg(ObjectStore*, coll_t, action_on_object_t&, bool)+0x4cc) [0x55987c07476c]
 4: (action_on_all_objects_in_exact_pg(ObjectStore*, coll_t, action_on_object_t&, bool)+0x64) [0x55987c075654]
 5: main()
 6: /lib64/libc.so.6(+0x3feb0) [0x7ff86ced9eb0]
 7: __libc_start_main()
 8: _start()


Version-Release number of selected component (if applicable):
# ceph version
ceph version 18.2.0-144.el9cp (f2621d6df88c0fe16f313952d9dd897bbec5d90d) reef (stable)

How reproducible:
Always

Steps to Reproduce:
1. Create an EC pool and write objects to it.
2. Identify a test object and the acting primary OSD for its PG.

# rados -p Inconsistent_snap_pool_ec ls
benchmark_data_ceph-pdhiran-hd3aat-node7_4171_object37
benchmark_data_ceph-pdhiran-hd3aat-node7_4171_object38
benchmark_data_ceph-pdhiran-hd3aat-node7_4171_object48
benchmark_data_ceph-pdhiran-hd3aat-node7_4171_object43

# ceph osd map Inconsistent_snap_pool_ec benchmark_data_ceph-pdhiran-hd3aat-node7_4171_object20 -f json-pretty

{
    "epoch": 250,
    "pool": "Inconsistent_snap_pool_ec",
    "pool_id": 11,
    "objname": "benchmark_data_ceph-pdhiran-hd3aat-node7_4171_object20",
    "raw_pgid": "11.367617bf",
    "pgid": "11.1f",
    "up": [
        8,
        17,
        11,
        5
    ],
    "up_primary": 8,
    "acting": [
        8,
        17,
        11,
        5
    ],
    "acting_primary": 8
}
3. Stop the OSD and run the COT command to list the object's omap entries. Observe the crash.

# systemctl stop ceph-4ac55332-c500-11ee-ad37-fa163e664e45.service
[root@ceph-pdhiran-hd3aat-node9 ~]# cephadm shell --name osd.8 -- ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-8 --pgid 11.1f benchmark_data_ceph-pdhiran-hd3aat-node7_4171_object20  list-omap
Inferring fsid 4ac55332-c500-11ee-ad37-fa163e664e45
Inferring config /var/lib/ceph/4ac55332-c500-11ee-ad37-fa163e664e45/osd.8/config
Using ceph image with id '23f1e3d0a21b' and tag '<none>' created on 2024-01-31 00:09:32 +0000 UTC
registry-proxy.engineering.redhat.com/rh-osbs/rhceph@sha256:4ce4fff33a42564a2f877420a1898e060d316b2c818c53258e7beb2cd57ce7f3
*** Caught signal (Segmentation fault) **
 in thread 7ff86c8e6580 thread_name:ceph-objectstor
 ceph version 18.2.0-144.el9cp (f2621d6df88c0fe16f313952d9dd897bbec5d90d) reef (stable)
 1: /lib64/libc.so.6(+0x54db0) [0x7ff86ceeedb0]
 2: (BlueStore::collection_list(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, ghobject_t const&, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x4b) [0x55987c529c4b]
 3: (_action_on_all_objects_in_pg(ObjectStore*, coll_t, action_on_object_t&, bool)+0x4cc) [0x55987c07476c]
 4: (action_on_all_objects_in_exact_pg(ObjectStore*, coll_t, action_on_object_t&, bool)+0x64) [0x55987c075654]
 5: main()
 6: /lib64/libc.so.6(+0x3feb0) [0x7ff86ced9eb0]
 7: __libc_start_main()
 8: _start()

Actual results:
The ceph-objectstore-tool command crashes with a segmentation fault.

Expected results:
The command should complete without a segmentation fault.

Additional info:
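The upstream fix is tracked in the linked GitHub pulls 58353 and 58734 ("tools/objectstore: check for wrong coll open_collection"). Per the Doc Text above, the tool dereferenced the collection handle returned by ObjectStore::open_collection() without checking it; for a collection (PG) that is not present on the OSD the handle is null, which is why BlueStore::collection_list() in the backtrace faults. A minimal C++ sketch of the kind of guard described there, assuming a hypothetical helper name (list_objects_in_pg) and omitting the tool's surrounding plumbing:

// Illustrative sketch only, not the actual patch. ObjectStore, coll_t,
// CollectionHandle and ghobject_t are types from the Ceph source tree;
// the function name is hypothetical.
#include <cerrno>
#include <iostream>
#include <limits>
#include <vector>

int list_objects_in_pg(ObjectStore *store, const coll_t &coll)
{
  ObjectStore::CollectionHandle ch = store->open_collection(coll);
  if (!ch) {
    // The collection (PG) does not exist on this OSD, e.g. a wrong --pgid
    // (for EC pools the on-disk PG carries a shard suffix such as 11.1fs0).
    // Report it and skip instead of passing a null handle to collection_list().
    std::cerr << "Collection " << coll << " does not exist" << std::endl;
    return -ENOENT;
  }
  // Safe to enumerate the objects now and act on each of them.
  std::vector<ghobject_t> objects;
  ghobject_t next;
  store->collection_list(ch, ghobject_t(), ghobject_t::get_max(),
                         std::numeric_limits<int>::max(), &objects, &next);
  return 0;
}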

Comment 21 errata-xmlrpc 2024-11-07 14:38:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 7.1 security, bug fix, and enhancement updates), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:9010

