Red Hat Bugzilla – Bug 1476888
rgw: segfault in RGWMetaSyncShardCR::incremental_sync completion
Last modified: 2017-10-18 14:13:58 EDT
--- Comment #13 from Casey Bodley <email@example.com> ---
(In reply to Harald Klein from comment #3)
> 0> 2017-07-27 16:50:59.945399 7fe5fffcf700 -1 *** Caught signal
> (Segmentation fault) **
> in thread 7fe5fffcf700 thread_name:radosgw
> ceph version 10.2.7-28.el7cp (216cda64fd9a9b43c4b0c2f8c402d36753ee35f7)
> 1: (()+0x58e79a) [0x7fe8325b979a]
> 2: (()+0xf370) [0x7fe8319aa370]
> 3: (Mutex::Lock(bool)+0x4) [0x7fe832735b44]
> 4: (RGWCompletionManager::wakeup(void*)+0x18) [0x7fe832305418]
> 5: (RGWMetaSyncShardCR::incremental_sync()+0xda1) [0x7fe8323b5b41]
> 6: (RGWMetaSyncShardCR::operate()+0x44) [0x7fe8323b7714]
> 7: (RGWCoroutinesStack::operate(RGWCoroutinesEnv*)+0x7e) [0x7fe83230497e]
> 8: (RGWCoroutinesManager::run(std::list<RGWCoroutinesStack*,
> std::allocator<RGWCoroutinesStack*> >&)+0x3f8) [0x7fe832
> 9: (RGWCoroutinesManager::run(RGWCoroutine*)+0x70) [0x7fe832307fe0]
> 10: (RGWRemoteMetaLog::run_sync()+0xfc2) [0x7fe8323a7202]
> 11: (RGWMetaSyncProcessorThread::process()+0xd) [0x7fe83248d7cd]
> 12: (RGWRadosThread::Worker::entry()+0x133) [0x7fe83242d043]
> 13: (()+0x7dc5) [0x7fe8319a2dc5]
> 14: (clone()+0x6d) [0x7fe830faf73d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
Steps to Reproduce:
Could you please provide steps to reproduce this BZ?
Created attachment 1331489 [details]
Used to reproduce this problem. Run on master.
Created attachment 1331490 [details]
Used to reproduce the problem. Run on master.
Created attachment 1331491 [details]
Used to reproduce the bug. Run on secondary.
Running bucketbrigade.py on the master and rstart.sh on the secondary, this problem occurred 3 times in about 14 hours.
I reproduced this on magna009 (in case anyone wants to look at the Segmentation faults in the rgw log).
This test has been running with the patch for over 5 hours without reporting the problem. I will leave it to run overnight, and will be in before 9 AM PST. If this test shows no more problems at that time, then I will mark it as Verified.
The fix has been running on the test bed for 17 hours now with NO sign of segmentation fault.
Marking it as "verified".
Running this test on the 2.4 async build failed. After talking to tserlin, it appears that this change is in both sets of patches.
The crash appeared once and is a different one from the bug that was fixed; I will report it as a separate bug. I am marking this as verified for 2.4 async.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.