Bug 2177220

Summary: [CEE/sd][RGW-Multisite][Ceph Upgrade] RGW crashes with Segmentation fault on s3:copy_obj post RHCS 5.3.1 upgrade
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Tridibesh Chakraborty <trchakra>
Component: RGW-Multisite
Assignee: Mark Kogan <mkogan>
Status: CLOSED ERRATA
QA Contact: Vidushi Mishra <vimishra>
Severity: urgent
Docs Contact: Akash Raj <akraj>
Priority: urgent
Version: 5.3
CC: aemerson, akraj, cbodley, ceph-eng-bugs, cephqe-warriors, ckulal, hklein, ivancich, jpoole, mbenjamin, mcaldeir, mkogan, mwatts, prsrivas, tpetr, tserlin
Target Milestone: ---
Flags: ivancich: needinfo? (mwatts)
aemerson: needinfo-
ivancich: needinfo? (prsrivas)
aemerson: needinfo-
trchakra: needinfo? (mkogan)
akraj: needinfo? (mkogan)
Target Release: 5.3z2   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
.Segmentation fault no longer occurs in the Ceph Object Gateway process
Previously, a segmentation fault would occur in the Ceph Object Gateway process when an admin user performed either of the following operations:
- Copying a non-existing object.
- Copying an existing object over itself.
With this fix, objects that were not initialized can be initialized when the operation is performed with admin or system privileges, and the segmentation fault no longer occurs.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-04-11 20:07:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2185621    
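For triage, the two triggering conditions listed in the Doc Text can be expressed as a simple check over copy requests. The following is a minimal sketch and not RGW code: the CopyRequest type, the crash_trigger() function, and the in-memory set of existing (bucket, key) pairs are hypothetical illustrations. It assumes, as the Doc Text states, that only requests issued with admin or system privileges are affected.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class CopyRequest:
    """Hypothetical model of an S3 copy request (s3:copy_obj), for illustration only."""
    src_bucket: str
    src_key: str
    dst_bucket: str
    dst_key: str
    is_admin: bool  # True if issued with admin or system privileges


def crash_trigger(req: CopyRequest, existing: Set[Tuple[str, str]]) -> Optional[str]:
    """Return the Doc Text condition a copy request matches, or None.

    `existing` is an assumed snapshot of the (bucket, key) pairs that exist.
    """
    if not req.is_admin:
        # The Doc Text describes the crash only for admin/system users.
        return None
    src = (req.src_bucket, req.src_key)
    if src not in existing:
        return "copy of a non-existing object"
    if src == (req.dst_bucket, req.dst_key):
        return "copy of an existing object over itself"
    return None


if __name__ == "__main__":
    objects = {("photos", "a.jpg")}
    requests = [
        CopyRequest("photos", "missing.jpg", "photos", "b.jpg", True),
        CopyRequest("photos", "a.jpg", "photos", "a.jpg", True),
        CopyRequest("photos", "a.jpg", "photos", "c.jpg", True),
    ]
    for r in requests:
        # Print each request's source/destination and the matched condition.
        print(r.src_key, "->", r.dst_key, ":", crash_trigger(r, objects))
```

Such a check could be run against copy requests collected from an affected cluster to see whether a crash coincided with one of the two described patterns; how those request records are gathered is outside the scope of this sketch.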

Description Tridibesh Chakraborty 2023-03-10 13:06:47 UTC
Description of problem:
Last night, the customer upgraded the primary site of their RGW multisite deployment from RHCS 5.3 to RHCS 5.3.1, and they now observe RGW crashes with a segmentation fault on s3:copy_obj.

Version-Release number of selected component (if applicable):
RHCS 5.3z1 (16.2.10-138.el8cp)

How reproducible:
Specific to the customer environment

Steps to Reproduce:
1. Have an RGW multisite deployment running RHCS 5.3 with a testfix applied.
2. Upgrade the primary site to RHCS 5.3.1.
3. Enable RGW sync.
4. The RGW daemons on the primary site crash with a segmentation fault on s3:copy_obj.
5. If the customer stops the secondary site, they are able to bring up the primary site RGW daemons, which have now been running for the last 15 hours.

Actual results:
RGW daemons crash with a segmentation fault.

Expected results:
Customer should be able to start the RGW daemons


Additional info:

Comment 52 errata-xmlrpc 2023-04-11 20:07:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 5.3 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:1732

Comment 53 Manny 2023-06-09 19:21:51 UTC
Added a link to another impacted customer, SFDC #03530266.

Also wrote KCS #7017201 for this issue (https://access.redhat.com/solutions/7017201).

Best regards,
Manny