Bug 2245147 - [rgw][indexless]: on Indexless placement, rgw daemon crashes with " ceph_assert(index.type == BucketIndexType::Normal)" (6.1)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RGW
Version: 6.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 6.1z3
Assignee: J. Eric Ivancich
QA Contact: Madhavi Kasturi
Docs Contact: Disha Walvekar
URL:
Whiteboard:
Depends On:
Blocks: 2247624
Reported: 2023-10-19 19:11 UTC by J. Eric Ivancich
Modified: 2023-12-12 13:56 UTC (History)

Fixed In Version: ceph-17.2.6-149.el9cp
Doc Type: Bug Fix
Doc Text:
.Testing for reshardable bucket layouts is added to prevent crashes
Previously, with the added bucket layout code to enable dynamic bucket resharding with multisite, there was no check to verify whether the bucket layout supported resharding during dynamic, immediate, or scheduled resharding. As a result, the Ceph Object Gateway daemon would crash during dynamic bucket resharding, and the `radosgw-admin` command would crash during immediate or scheduled resharding. With this fix, a test for reshardable bucket layouts is added and the crashes no longer occur. When immediate or scheduled resharding is requested for a bucket whose layout does not support it, an error message is displayed. When dynamic bucket resharding encounters such a bucket, the bucket is skipped.
Clone Of:
Environment:
Last Closed: 2023-12-12 13:56:04 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHCEPH-7749 0 None None None 2023-10-19 19:15:10 UTC
Red Hat Product Errata RHSA-2023:7740 0 None None None 2023-12-12 13:56:09 UTC

Description J. Eric Ivancich 2023-10-19 19:11:45 UTC
This bug was initially created as a copy of Bug #2242987

I am copying this bug because: 



Description of problem:
The RGW daemon crashes with "ceph_assert(index.type == BucketIndexType::Normal)" when the placement target is indexless.

Version-Release number of selected component (if applicable):
18.2.0-79.el9cp

How reproducible:
always

Steps to Reproduce:
1. Deploy a cluster on 7.0 (18.2.0-79.el9cp).
2. Follow the Object Gateway Guide [1] to change the default placement to indexless:
2a) Add a new placement target to the zonegroup:
  radosgw-admin zonegroup placement add --rgw-zonegroup="default" \
    --placement-id="indexless-placement"
2b) Add the new placement target to the zone:
  radosgw-admin zone placement add --rgw-zone="default" \
    --placement-id="indexless-placement" \
    --data-pool="default.rgw.buckets.data" \
    --index-pool="default.rgw.buckets.index" \
    --data_extra_pool="default.rgw.buckets.non-ec" \
    --placement-index-type="indexless"
2c) Set the zonegroup’s default placement to indexless-placement:
  radosgw-admin zonegroup placement default --placement-id "indexless-placement"
3. Restart the RGW daemon for the changes to take effect.
4. The RGW daemon silently crashes with "ceph_assert(index.type == BucketIndexType::Normal)".

[1]: https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/6/html-single/object_gateway_guide/index#creating-indexless-buckets_rgw

Actual results:
The RGW daemon silently crashes upon restart when the default placement target is indexless.

Expected results:
No crash seen 
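
The expected behaviour can be illustrated with a small, self-contained sketch based on the Doc Text of this bug: a dynamic resharding pass skips buckets whose layout cannot be resharded, while an explicit (immediate or scheduled) request reports an error. This is not the actual Ceph patch. `Bucket`, `is_reshardable`, `pick_dynamic_reshard_candidates`, `request_reshard`, the error text, and the `-EINVAL` return code are all hypothetical names and values chosen for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <vector>

enum class BucketIndexType { Normal, Indexless };

// Hypothetical stand-in for the per-bucket information RGW keeps.
struct Bucket {
  std::string name;
  BucketIndexType index_type;
  unsigned long objects_per_shard;
};

// Hypothetical predicate: only a bucket with a normal index has shards
// that can be resharded.
inline bool is_reshardable(const Bucket& b) {
  return b.index_type == BucketIndexType::Normal;
}

// Dynamic path: the background pass skips non-reshardable buckets
// instead of querying their shard count.
inline std::vector<std::string> pick_dynamic_reshard_candidates(
    const std::vector<Bucket>& buckets, unsigned long max_objs_per_shard) {
  std::vector<std::string> candidates;
  for (const auto& b : buckets) {
    if (!is_reshardable(b)) {
      continue;  // skipped, no crash
    }
    if (b.objects_per_shard > max_objs_per_shard) {
      candidates.push_back(b.name);
    }
  }
  return candidates;
}

// Immediate/scheduled path: an explicit request on a non-reshardable
// bucket fails with an error message (error code is an assumption).
inline int request_reshard(const Bucket& b, std::string* err) {
  assert(err != nullptr);
  if (!is_reshardable(b)) {
    *err = "bucket " + b.name + " has a layout that does not support resharding";
    return -EINVAL;
  }
  err->clear();
  return 0;
}
```

With an indexless bucket in the input, `pick_dynamic_reshard_candidates` simply leaves it out of the result, and `request_reshard` returns a negative error code with a message rather than aborting.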

Additional info:
Snippet of Crash:
    -4> 2023-10-10T03:10:57.211+0000 7f4089572640 -1 /builddir/build/BUILD/ceph-18.2.0/src/rgw/rgw_bucket_layout.h: In function 'uint32_t rgw::num_shards(const rgw::bucket_index_layout&)' thread 7f4089572640 time 2023-10-10T03:10:57.210080+0000
/builddir/build/BUILD/ceph-18.2.0/src/rgw/rgw_bucket_layout.h: 269: FAILED ceph_assert(index.type == BucketIndexType::Normal)

 ceph version 18.2.0-79.el9cp (56bbf1a74fb28dfc94f31cedcb46d9f55b1eda56) reef (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x12e) [0x7f40b440ff57]
 2: /usr/lib64/ceph/libceph-common.so.2(+0x161115) [0x7f40b4410115]
 3: (RGWRados::check_bucket_shards(RGWBucketInfo const&, rgw_bucket const&, unsigned long, DoutPrefixProvider const*)+0x417) [0x55a0690ab197]
 4: (rgw_user_sync_all_stats(DoutPrefixProvider const*, rgw::sal::Driver*, rgw::sal::User*, optional_yield)+0x1bd) [0x55a068fc60bd]
 5: (RGWUserStatsCache::sync_user(DoutPrefixProvider const*, rgw_user const&, optional_yield)+0x209) [0x55a068ef4b09]
 6: (RGWUserStatsCache::sync_all_users(DoutPrefixProvider const*, optional_yield)+0x367) [0x55a068ef51f7]
 7: /usr/bin/radosgw(+0x5ae884) [0x55a068ef5884]
 8: /lib64/libc.so.6(+0x9f802) [0x7f40b31a5802]
 9: /lib64/libc.so.6(+0x3f450) [0x7f40b3145450]

    -3> 2023-10-10T03:10:57.211+0000 7f408a574640  5 lifecycle: RGWLC::process(): ENTER: index: 15 worker ix: 2
    -2> 2023-10-10T03:10:57.214+0000 7f408c578640  5 lifecycle: RGWLC::process() process shard rollover lc_shard=lc.28 head.marker= head.shard_rollover_date=0
    -1> 2023-10-10T03:10:57.214+0000 7f4092d85640 20 garbage collection: RGWGC::process cls_rgw_gc_list returned with returned:0, entries.size=0, truncated=0, next_marker=''
     0> 2023-10-10T03:10:57.214+0000 7f4089572640 -1 *** Caught signal (Aborted) **
 in thread 7f4089572640 thread_name:rgw_user_st_syn

 ceph version 18.2.0-79.el9cp (56bbf1a74fb28dfc94f31cedcb46d9f55b1eda56) reef (stable)
 1: /lib64/libc.so.6(+0x54df0) [0x7f40b315adf0]
 2: /lib64/libc.so.6(+0xa154c) [0x7f40b31a754c]
 3: raise()
 4: abort()
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x188) [0x7f40b440ffb1]
 6: /usr/lib64/ceph/libceph-common.so.2(+0x161115) [0x7f40b4410115]
 7: (RGWRados::check_bucket_shards(RGWBucketInfo const&, rgw_bucket const&, unsigned long, DoutPrefixProvider const*)+0x417) [0x55a0690ab197]
 8: (rgw_user_sync_all_stats(DoutPrefixProvider const*, rgw::sal::Driver*, rgw::sal::User*, optional_yield)+0x1bd) [0x55a068fc60bd]
 9: (RGWUserStatsCache::sync_user(DoutPrefixProvider const*, rgw_user const&, optional_yield)+0x209) [0x55a068ef4b09]
 10: (RGWUserStatsCache::sync_all_users(DoutPrefixProvider const*, optional_yield)+0x367) [0x55a068ef51f7]
 11: /usr/bin/radosgw(+0x5ae884) [0x55a068ef5884]
 12: /lib64/libc.so.6(+0x9f802) [0x7f40b31a5802]
 13: /lib64/libc.so.6(+0x3f450) [0x7f40b3145450]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Please find the coredump and RGW logs at http://magna002.ceph.redhat.com/ceph-qe-logs/madhavi/indexless/
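
The trace shows the user-stats sync thread (`rgw_user_st_syn`) reaching `RGWRados::check_bucket_shards`, which calls `rgw::num_shards` on a bucket whose index layout is not `Normal`. The following is a minimal model of that failure pattern, not the code from `rgw_bucket_layout.h`: the struct layout, the field name `normal_num_shards`, and the helper `try_num_shards` are assumptions made for illustration, and the standard `assert` stands in for `ceph_assert`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Simplified stand-ins for the types named in the assert message.
enum class BucketIndexType : std::uint8_t { Normal, Indexless };

struct bucket_index_layout {
  BucketIndexType type;
  std::uint32_t normal_num_shards;  // hypothetical field name
};

// Same shape as the asserting accessor in the trace: calling this on an
// indexless layout aborts the process. (std assert stands in for
// ceph_assert; unlike ceph_assert it is compiled out under NDEBUG.)
inline std::uint32_t num_shards(const bucket_index_layout& index) {
  assert(index.type == BucketIndexType::Normal);
  return index.normal_num_shards;
}

// Hypothetical non-aborting variant: a caller that may encounter
// indexless buckets gets an empty optional instead of tripping the assert.
inline std::optional<std::uint32_t> try_num_shards(
    const bucket_index_layout& index) {
  if (index.type != BucketIndexType::Normal) {
    return std::nullopt;
  }
  return num_shards(index);
}
```

A caller in the position of `check_bucket_shards` would test the result of `try_num_shards` (or the index type directly) and return early for indexless buckets, which matches the "bucket is skipped" behaviour described in the Doc Text.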

Comment 1 RHEL Program Management 2023-10-19 19:12:59 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 10 errata-xmlrpc 2023-12-12 13:56:04 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 6.1 security, enhancements, and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:7740

