Bug 1144794
| Summary: | [abrt] ceph-common: send(): rados killed by SIGILL | | |
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Jeffrey C. Ollie <jeff> |
| Component: | ceph | Assignee: | Boris Ranto <branto> |
| Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 21 | CC: | arjun, branto, codonell, crobinso, dave, david, fedora, jakub, kkeithle, law, pfrankli, spoyarek, steve, zaitcev |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Unspecified | | |
| URL: | https://retrace.fedoraproject.org/faf/reports/bthash/5650292d87291074b49dea20c1828ea430aa9839 | | |
| Whiteboard: | abrt_hash:b3314392b86dcc337684e419082aea13152385ea | | |
| Fixed In Version: | ceph-0.80.7-2.fc21 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2014-12-23 18:32:19 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Jeffrey C. Ollie
2014-09-21 03:39:03 UTC
Created attachment 939678: backtrace
Created attachment 939679: cgroup
Created attachment 939680: core_backtrace
Created attachment 939681: dso_list
Created attachment 939682: environ
Created attachment 939683: limits
Created attachment 939684: maps
Created attachment 939685: open_fds
Created attachment 939686: proc_pid_status
Created attachment 939687: var_log_messages
*** Bug 1157192 has been marked as a duplicate of this bug. ***

FYI: I suspect that this is related to bz1146967. It appears to be an Intel TSX instruction that causes the SIGILL.

This appears to be a glibc issue with Intel TSX instructions -> reassigning.

Copying from bz1157192, which seems to be related to this:

On shutdown, the program calls rados_shutdown(), which calls the appropriate destructors. In particular, it calls ~RWLock(), which issues pthread_rwlock_unlock(). This causes the program to receive a SIGILL signal. Debugging with gdb, it seems that the instruction that causes this is xend [1], which is an Intel TSX instruction. I am no expert in this matter, but it seems that the issue in [2] is not fully resolved yet. I've looked at the patches there, and xbegin seems to be explicitly disabled, unlike xend. Maybe we need to explicitly disable xend in the code as well?

[1] `layout asm` in gdb shows this as the crashing line: `0x7ffff6c75153 <__GI___pthread_rwlock_unlock+19> xend`
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1146967

FYI, this is actually a bug in Ceph: it's unlocking an already unlocked pthread_rwlock_t, which invokes undefined behavior. It was reported upstream in https://sourceware.org/bugzilla/show_bug.cgi?id=17561, and the glibc devs decided not to make the crash more graceful, as it would slow down the unlock path for correct programs. I'm attaching a minimal repro C file in case you care to reproduce, but afaict the linked bug is actually just a bug in Ceph, nothing to do with TSX or glibc directly.

Created attachment 964858
Reproduction of the Ceph xend crash
Note the undefined behavior of unlocking an unlocked rwlock.
(In reply to David Anderson from comment #14)
> FYI, this is actually a bug in Ceph: it's unlocking an already unlocked
> pthread_rwlock_t, which invokes undefined behavior. It was reported upstream
> in https://sourceware.org/bugzilla/show_bug.cgi?id=17561 , and the glibc
> devs decided to not make the crash more graceful as it would slow down the
> unlock path for correct programs.
>
> I'm attaching a minimal repro C file in case you care to reproduce, but
> afaict the linked bug is actually just a bug in Ceph, nothing to do with TSX
> or glibc directly.

Thanks David. I'm assigning this back to ceph for them to investigate the potential double unlock.

Turns out the Ceph folks fixed it upstream 3 days ago: http://tracker.ceph.com/issues/10085. From the bug details, it looks like they're planning to publish this in a point release, so it should eventually find its way into the Fedora package. And if you care to patch this in Fedora ahead of the upstream release, `git diff 42c85e8 77deeaa` in the Ceph git repository will produce the minimum necessary patch to apply. It applies cleanly to the 0.87 Ceph source tree.

I added the patch David mentions to a private build of Ceph and it fixed my problem. I'm not sure when upstream is going to release a 0.87.x point release, so I think it would be good to apply the patch before that.

ceph-0.80.7-2.fc21 has been submitted as an update for Fedora 21. https://admin.fedoraproject.org/updates/ceph-0.80.7-2.fc21

Package ceph-0.80.7-2.fc21:
* should fix your issue,
* was pushed to the Fedora 21 testing repository,
* should be available at your local mirror within two days.

Update it with: `# su -c 'yum update --enablerepo=updates-testing ceph-0.80.7-2.fc21'` as soon as you are able to. Please go to the following URL: https://admin.fedoraproject.org/updates/FEDORA-2014-16519/ceph-0.80.7-2.fc21 then log in and leave karma (feedback).

*** Bug 1170657 has been marked as a duplicate of this bug. ***

ceph-0.80.7-2.fc21 has been pushed to the Fedora 21 stable repository. If problems still persist, please make note of it in this bug report.