Bug 2245147

Summary: [rgw][indexless]: on Indexless placement, rgw daemon crashes with " ceph_assert(index.type == BucketIndexType::Normal)" (6.1)
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: J. Eric Ivancich <ivancich>
Component: RGW Assignee: J. Eric Ivancich <ivancich>
Status: CLOSED ERRATA QA Contact: Madhavi Kasturi <mkasturi>
Severity: high Docs Contact: Disha Walvekar <dwalveka>
Priority: unspecified    
Version: 6.1 CC: ceph-eng-bugs, cephqe-warriors, dwalveka, tserlin, vereddy
Target Milestone: ---   
Target Release: 6.1z3   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ceph-17.2.6-149.el9cp Doc Type: Bug Fix
Doc Text:
.Testing for reshardable bucket layouts is added to prevent crashes
Previously, with the added bucket layout code to enable dynamic bucket resharding with multisite, there was no check to verify whether the bucket layout supported resharding during dynamic, immediate, or scheduled resharding. As a result, the Ceph Object Gateway daemon would crash during dynamic bucket resharding, and the `radosgw-admin` command would crash during immediate or scheduled resharding. With this fix, a test for reshardable bucket layouts is added and the crashes no longer occur. When immediate or scheduled resharding is requested for a bucket whose layout does not support resharding, an error message is displayed. When dynamic bucket resharding encounters such a bucket, the bucket is skipped.
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-12-12 13:56:04 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2247624    
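
The behavior described in the Doc Text can be illustrated with a minimal sketch. This is not the Ceph source: the namespace `sketch`, the reduced `BucketIndexLayout` struct, and the names `is_reshardable`, `precheck_reshard`, `ReshardMode`, and `ReshardOutcome` are all hypothetical and exist only for illustration. Only `BucketIndexType::Normal` comes from the assertion message in this report. The sketch assumes that a layout is reshardable exactly when its index type is normal.

```cpp
#include <cstdint>

namespace sketch {

// Simplified stand-in for the index type named in the assertion message.
enum class BucketIndexType : uint8_t { Normal, Indexless };

// Hypothetical, reduced layout; the real structure carries more fields.
struct BucketIndexLayout {
  BucketIndexType type;
  uint32_t num_shards;
};

// The three ways a reshard can be requested, per the Doc Text.
enum class ReshardMode { Dynamic, Immediate, Scheduled };

// What the caller should do after the pre-check.
enum class ReshardOutcome { Proceed, Skipped, Error };

// Assumption: only a normal (indexed) layout can be resharded.
inline bool is_reshardable(const BucketIndexLayout& layout) {
  return layout.type == BucketIndexType::Normal;
}

// Test the layout before touching shard counts. A non-reshardable
// bucket is skipped during dynamic resharding and reported as an
// error for immediate or scheduled resharding.
inline ReshardOutcome precheck_reshard(const BucketIndexLayout& layout,
                                       ReshardMode mode) {
  if (is_reshardable(layout)) {
    return ReshardOutcome::Proceed;
  }
  return mode == ReshardMode::Dynamic ? ReshardOutcome::Skipped
                                      : ReshardOutcome::Error;
}

}  // namespace sketch
```

The point of the design is that the layout test runs before any code that assumes a normal index, so an indexless bucket never reaches the asserting path.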

Description J. Eric Ivancich 2023-10-19 19:11:45 UTC
This bug was initially created as a copy of Bug #2242987

Description of problem:
The RGW daemon crashes with "ceph_assert(index.type == BucketIndexType::Normal)" when the placement target is indexless.

Version-Release number of selected component (if applicable):
18.2.0-79.el9cp

How reproducible:
always

Steps to Reproduce:
1. Deploy a cluster on 7.0 (18.2.0-79.el9cp).
2. Follow the object gateway guide [1] to change the placement to indexless:
2a) Add a new placement to the zonegroup:
  radosgw-admin zonegroup placement add --rgw-zonegroup="default" \
    --placement-id="indexless-placement"
2b) Add the new placement to the zone:
  radosgw-admin zone placement add --rgw-zone="default" \
    --placement-id="indexless-placement" \
    --data-pool="default.rgw.buckets.data" \
    --index-pool="default.rgw.buckets.index" \
    --data_extra_pool="default.rgw.buckets.non-ec" \
    --placement-index-type="indexless"
2c) Set the zonegroup’s default placement to indexless-placement:
  radosgw-admin zonegroup placement default --placement-id "indexless-placement"
3. Restart the RGW daemon for the changes to take effect.
4. The RGW daemon silently crashes with "ceph_assert(index.type == BucketIndexType::Normal)".

[1]: https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/6/html-single/object_gateway_guide/index#creating-indexless-buckets_rgw

Actual results:
The RGW daemon silently crashes upon restart when the placement target is indexless.

Expected results:
No crash occurs.

Additional info:
Snippet of Crash:
    -4> 2023-10-10T03:10:57.211+0000 7f4089572640 -1 /builddir/build/BUILD/ceph-18.2.0/src/rgw/rgw_bucket_layout.h: In function 'uint32_t rgw::num_shards(const rgw::bucket_index_layout&)' thread 7f4089572640 time 2023-10-10T03:10:57.210080+0000
/builddir/build/BUILD/ceph-18.2.0/src/rgw/rgw_bucket_layout.h: 269: FAILED ceph_assert(index.type == BucketIndexType::Normal)

 ceph version 18.2.0-79.el9cp (56bbf1a74fb28dfc94f31cedcb46d9f55b1eda56) reef (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x12e) [0x7f40b440ff57]
 2: /usr/lib64/ceph/libceph-common.so.2(+0x161115) [0x7f40b4410115]
 3: (RGWRados::check_bucket_shards(RGWBucketInfo const&, rgw_bucket const&, unsigned long, DoutPrefixProvider const*)+0x417) [0x55a0690ab197]
 4: (rgw_user_sync_all_stats(DoutPrefixProvider const*, rgw::sal::Driver*, rgw::sal::User*, optional_yield)+0x1bd) [0x55a068fc60bd]
 5: (RGWUserStatsCache::sync_user(DoutPrefixProvider const*, rgw_user const&, optional_yield)+0x209) [0x55a068ef4b09]
 6: (RGWUserStatsCache::sync_all_users(DoutPrefixProvider const*, optional_yield)+0x367) [0x55a068ef51f7]
 7: /usr/bin/radosgw(+0x5ae884) [0x55a068ef5884]
 8: /lib64/libc.so.6(+0x9f802) [0x7f40b31a5802]
 9: /lib64/libc.so.6(+0x3f450) [0x7f40b3145450]

    -3> 2023-10-10T03:10:57.211+0000 7f408a574640  5 lifecycle: RGWLC::process(): ENTER: index: 15 worker ix: 2
    -2> 2023-10-10T03:10:57.214+0000 7f408c578640  5 lifecycle: RGWLC::process() process shard rollover lc_shard=lc.28 head.marker= head.shard_rollover_date=0
    -1> 2023-10-10T03:10:57.214+0000 7f4092d85640 20 garbage collection: RGWGC::process cls_rgw_gc_list returned with returned:0, entries.size=0, truncated=0, next_marker=''
     0> 2023-10-10T03:10:57.214+0000 7f4089572640 -1 *** Caught signal (Aborted) **
 in thread 7f4089572640 thread_name:rgw_user_st_syn

 ceph version 18.2.0-79.el9cp (56bbf1a74fb28dfc94f31cedcb46d9f55b1eda56) reef (stable)
 1: /lib64/libc.so.6(+0x54df0) [0x7f40b315adf0]
 2: /lib64/libc.so.6(+0xa154c) [0x7f40b31a754c]
 3: raise()
 4: abort()
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x188) [0x7f40b440ffb1]
 6: /usr/lib64/ceph/libceph-common.so.2(+0x161115) [0x7f40b4410115]
 7: (RGWRados::check_bucket_shards(RGWBucketInfo const&, rgw_bucket const&, unsigned long, DoutPrefixProvider const*)+0x417) [0x55a0690ab197]
 8: (rgw_user_sync_all_stats(DoutPrefixProvider const*, rgw::sal::Driver*, rgw::sal::User*, optional_yield)+0x1bd) [0x55a068fc60bd]
 9: (RGWUserStatsCache::sync_user(DoutPrefixProvider const*, rgw_user const&, optional_yield)+0x209) [0x55a068ef4b09]
 10: (RGWUserStatsCache::sync_all_users(DoutPrefixProvider const*, optional_yield)+0x367) [0x55a068ef51f7]
 11: /usr/bin/radosgw(+0x5ae884) [0x55a068ef5884]
 12: /lib64/libc.so.6(+0x9f802) [0x7f40b31a5802]
 13: /lib64/libc.so.6(+0x3f450) [0x7f40b3145450]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
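
The trace shows the user stats sync thread reaching `RGWRados::check_bucket_shards`, which ends in `rgw::num_shards` and its assertion on the index type. A reduced model of that precondition, and of a caller that checks the index type first, might look like the following. Everything here is a simplified assumption rather than the actual Ceph code: the namespace `trace_sketch`, the two-field `bucket_index_layout`, and the helper `try_num_shards` are hypothetical, and plain `assert` stands in for `ceph_assert`.

```cpp
#include <cassert>
#include <cstdint>

namespace trace_sketch {

enum class BucketIndexType : uint8_t { Normal, Indexless };

// Hypothetical reduced layout: index type plus a shard count.
struct bucket_index_layout {
  BucketIndexType type;
  uint32_t shards;
};

// Models the precondition reported in the trace: calling this with an
// indexless layout trips the assertion and aborts the process.
inline uint32_t num_shards(const bucket_index_layout& index) {
  assert(index.type == BucketIndexType::Normal);
  return index.shards;
}

// Guarded variant (hypothetical helper): returns false for an indexless
// layout instead of reaching the assertion, so the caller can skip it.
inline bool try_num_shards(const bucket_index_layout& index, uint32_t& out) {
  if (index.type != BucketIndexType::Normal) {
    return false;
  }
  out = num_shards(index);
  return true;
}

}  // namespace trace_sketch
```

A periodic task that walks every bucket, such as the stats sync in the trace, would use the guarded form so that one indexless bucket cannot abort the daemon.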

The coredump and RGW logs are available at http://magna002.ceph.redhat.com/ceph-qe-logs/madhavi/indexless/

Comment 1 RHEL Program Management 2023-10-19 19:12:59 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 10 errata-xmlrpc 2023-12-12 13:56:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage 6.1 security, enhancements, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:7740