Bug 2311290

Summary: rgw-multisite: segfault during data sync init
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: shilpa <smanjara>
Component: RGW-Multisite Assignee: shilpa <smanjara>
Status: CLOSED ERRATA QA Contact: Vidushi Mishra <vimishra>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 8.0 CC: akraj, ceph-eng-bugs, cephqe-warriors, mbenjamin, tserlin, vimishra
Target Milestone: ---   
Target Release: 8.0   
Hardware: Unspecified   
OS: Unspecified   
Fixed In Version: ceph-19.1.1-28.el9cp Doc Type: No Doc Update
Last Closed: 2024-11-25 09:09:35 UTC Type: Bug
Bug Blocks: 2317218    

Description shilpa 2024-09-10 19:04:33 UTC
Description of problem:
Based on the upstream Ceph tracker issue:
https://tracker.ceph.com/issues/63378

Version-Release number of selected component (if applicable):


How reproducible:
Reproduces in teuthology; not easily reproducible by hand.

Steps to Reproduce:
Test `data sync init` after bucket sync disable/enable, in combination with resharding.
1. Create multiple buckets.
2. Add some objects.
3. Reshard the buckets.
4. Disable bucket sync, add more objects, then re-enable bucket sync.
5. Wait for bucket sync to catch up.
6. Stop the secondary gateway and add more objects and buckets on the primary.
7. Run `data sync init` on the secondary.
8. Restart the secondary gateway.
(The actual tests are in the upstream multisite suite; all bucket sync and data sync init tests need to be enabled.)
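The steps above can be sketched with radosgw-admin against a running multisite cluster. This is a minimal illustration only: the bucket names, zone names, service unit name, and the s3cmd upload helper are assumptions, not taken from the report or the teuthology tests.

```shell
#!/bin/sh
# Hedged reproducer sketch. Bucket/zone names, the systemd unit, and
# the s3cmd helper are illustrative assumptions; run on the secondary
# zone unless noted otherwise.
set -e

for b in bkt1 bkt2 bkt3; do
    s3cmd mb "s3://$b"                               # 1. create buckets
    s3cmd put obj.bin "s3://$b/obj1"                 # 2. add objects
    radosgw-admin bucket reshard --bucket="$b" \
        --num-shards=13                              # 3. reshard
    radosgw-admin bucket sync disable --bucket="$b"  # 4. disable sync,
    s3cmd put obj.bin "s3://$b/obj2"                 #    add objects,
    radosgw-admin bucket sync enable --bucket="$b"   #    re-enable sync
done

# 5. wait for bucket sync to catch up (poll until caught up)
radosgw-admin bucket sync status --bucket=bkt1

# 6. stop the secondary gateway; write more objects/buckets on the
#    primary while it is down (not shown)
systemctl stop ceph-radosgw@rgw.secondary

# 7. reinitialize data sync on the secondary from the primary zone
radosgw-admin data sync init --source-zone=primary

# 8. restart the secondary gateway so a full data sync runs
systemctl start ceph-radosgw@rgw.secondary
```

The crash was observed during the full-sync pass that follows step 7, once the restarted gateway resumes data sync.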

Actual results:
Crash with a segfault:

2023-10-30T22:07:41.493+0000 7f4899a5a640 20 rgw rados thread: cr:s=0x55e0b6295900:op=0x55e0b6478000:28RGWDataFullSyncSingleEntryCR: operate()
2023-10-30T22:07:41.494+0000 7f4899a5a640 -1 *** Caught signal (Segmentation fault) **
in thread 7f4899a5a640 thread_name:data-sync

ceph version 18.0.0-6880-g8b1cc681 (8b1cc681d09f809ade48e839fde79ae1b6bd1850) reef (dev)
 1: /lib64/libc.so.6(+0x54db0) [0x7f48c2454db0]
 2: radosgw(+0xc8a07d) [0x55e0aefe807d]
 3: radosgw(+0x38ad82) [0x55e0ae6e8d82]
 4: radosgw(+0x836fa9) [0x55e0aeb94fa9]
 5: radosgw(+0x9d14c7) [0x55e0aed2f4c7]
 6: (RGWCoroutinesStack::operate(DoutPrefixProvider const*, RGWCoroutinesEnv*)+0x125) [0x55e0ae90f405]
 7: (RGWCoroutinesManager::run(DoutPrefixProvider const*, std::__cxx11::list<RGWCoroutinesStack*, std::allocator<RGWCoroutinesStack*> >&)+0x2b6) [0x55e0ae910c76]
 8: (RGWCoroutinesManager::run(DoutPrefixProvider const*, RGWCoroutine*)+0xad) [0x55e0ae911c2d]
 9: (RGWRemoteDataLog::run_sync(DoutPrefixProvider const*, int)+0x4dc) [0x55e0aed3c02c]
 10: radosgw(+0x781f08) [0x55e0aeadff08]
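Most frames in the trace are unsymbolized offsets (e.g. `radosgw(+0xc8a07d)`). Assuming the debuginfo package matching this exact build is installed, such frames can be resolved with addr2line; the binary path below is an assumption:

```shell
# Resolve an unsymbolized frame such as "radosgw(+0xc8a07d)".
# -C demangles C++ names, -f prints the function, -e names the binary.
# Requires debug symbols for this exact build (assumed installed).
addr2line -Cfe /usr/bin/radosgw 0xc8a07d
```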

Expected results:
Data sync init should complete without a segfault.

Additional info:

Comment 1 Storage PM bot 2024-09-10 19:04:44 UTC
Please specify the severity of this bug. Severity is defined here:
https://bugzilla.redhat.com/page.cgi?id=fields.html#bug_severity.

Comment 7 errata-xmlrpc 2024-11-25 09:09:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 8.0 security, bug fix, and enhancement updates), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:10216