Bug 1476888
| Summary: | rgw: segfault in RGWMetaSyncShardCR::incremental_sync completion | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Matt Benjamin (redhat) <mbenjamin> | ||||||||
| Component: | RGW | Assignee: | Matt Benjamin (redhat) <mbenjamin> | ||||||||
| Status: | CLOSED ERRATA | QA Contact: | Warren <wusui> | ||||||||
| Severity: | high | Docs Contact: | Bara Ancincova <bancinco> | ||||||||
| Priority: | high | ||||||||||
| Version: | 2.3 | CC: | anharris, cbodley, ceph-eng-bugs, gmeno, hklein, hnallurv, kbader, kdreyer, mbenjamin, owasserm, smanjara, sweil, tmuthami, tserlin, uboppana, vakulkar, vimishra, vumrao | ||||||||
| Target Milestone: | rc | ||||||||||
| Target Release: | 2.4 | ||||||||||
| Hardware: | All | ||||||||||
| OS: | All | ||||||||||
| Whiteboard: | |||||||||||
| Fixed In Version: | RHEL: ceph-10.2.7-33.el7cp Ubuntu: ceph_10.2.7-34redhat1 | Doc Type: | Bug Fix | ||||||||
| Doc Text: | .The multi-site synchronization works as expected
Due to an object lifetime defect in the Ceph Object Gateway multi-site synchronization code path, a failure could occur during incremental sync. The underlying source code has been modified, and the multi-site synchronization works as expected. | Story Points: | --- | ||||||||
| Clone Of: | Environment: | ||||||||||
| Last Closed: | 2017-10-17 18:12:51 UTC | Type: | Bug | ||||||||
| Regression: | --- | Mount Type: | --- | ||||||||
| Documentation: | --- | CRM: | |||||||||
| Verified Versions: | Category: | --- | |||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||
| Embargoed: | |||||||||||
| Bug Depends On: | |||||||||||
| Bug Blocks: | 1473436, 1479701 | ||||||||||
| Attachments: | 
 | ||||||||||
| 
        
Description: Matt Benjamin (redhat), 2017-07-31 17:51:08 UTC
        
Hi Casey, could you please provide steps to reproduce this BZ?
Created attachment 1331489 [details]
bucketbrigade.py
Used to reproduce this problem (run on master).
Created attachment 1331490 [details]
bucketbrigade.py
Used to reproduce the problem.  Run on master.
Created attachment 1331491 [details]
rstart.sh
Used to reproduce the bug.  Run on secondary.
Successfully reproduced: running bucketbrigade.py on the master and rstart.sh on the secondary, this problem occurred 3 times in about 14 hours. I reproduced this on magna009 (in case anyone wants to look at the segmentation faults in the rgw log).

This test has been running with the patch for over 5 hours without reporting the problem. I will leave it to run overnight, and will be in before 9 AM PST. If this test shows no more problems at that time, then I will mark it as Verified.

The fix has been running on the test bed for 17 hours now with NO sign of a segmentation fault. Marking it as "verified".

Running this test on the 2.4A async build failed. Talking to tserlin, it appears that this change is in both sets of patches:
ON 24.A https://code.engineering.redhat.com/gerrit/gitweb?p=ceph.git;a=commit;h=0c28f6912f03f2def4532c9c6a4c958f714bd206
ON Hotfix https://code.engineering.redhat.com/gerrit/gitweb?p=ceph.git;a=commit;h=d1aad1b7c92e7305fe3e1a8cd6496c7d1df124a2
The crash appears once and is a different one from the bug that was fixed. I will report that crash as another bug. I am marking this as verified for 2.4Async.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2017:2903 |
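
For anyone repeating the verification above, checking the rgw log for segmentation faults could be scripted roughly as follows. This is only a sketch and is not one of the attached scripts: the helper name `find_segfaults` is hypothetical, and it assumes a crash shows up in the log as a line containing the text "Segmentation fault". The log path is passed on the command line because this report does not state it.

```python
import sys
from typing import Iterable, List, Tuple

# Assumption: a crash is recorded as a log line containing this text.
MARKER = "Segmentation fault"


def find_segfaults(lines: Iterable[str], marker: str = MARKER) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs for every line that contains the marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if marker in line:
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: python find_segfaults.py <path-to-rgw-log>")
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(sys.argv[1], errors="replace") as log:
        hits = find_segfaults(log)
    for number, line in hits:
        print(f"{number}: {line}")
    print(f"{len(hits)} matching line(s)")
```

A count of zero after a long run only means the marker text was not logged; it does not rule out other crashes, such as the different one seen on the 2.4A async build.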