Bug 918959
Summary: [abrt] libvirt-0.10.2-18.el6: _int_free: Process /usr/sbin/libvirtd was killed by signal 11 (SIGSEGV)
Product: Red Hat Enterprise Linux 6
Reporter: David Jaša <djasa>
Component: libvirt
Assignee: John Ferlan <jferlan>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Priority: high
Version: 6.4
CC: acathrow, dyasny, dyuan, eblake, jentrena, mzhan, pkrempa, pzhukov, rdassen, rwu, tdosek, whuang, ydu
Target Milestone: rc
Keywords: Regression
Hardware: x86_64
OS: Unspecified
Whiteboard: abrt_hash:2fc968e737a27deb64b13469804ac233fbd92448
Doc Type: Bug Fix
Last Closed: 2013-05-08 17:53:35 UTC
Bug Blocks: 835616, 928309, 960054
Description
David Jaša
2013-03-07 10:07:24 UTC
Created attachment 706483: maps
Created attachment 706484: var_log_messages
Created attachment 706485: environ
Created attachment 706486: dso_list
Created attachment 706487: limits
Created attachment 706488: sosreport.tar.xz
Created attachment 706489: backtrace
Created attachment 706490: build_ids
Created attachment 706491: cgroup
Is https://bugzilla.redhat.com/show_bug.cgi?id=924756 a duplicate of this bug?

Just so you know, I am digging into this. It's a bit slow going, as I am new at digging into RH/libvirtd problems. I'm working under the assumption that it's an error-path issue right now. I know the report indicates the error occurred during back-and-forth migration; however, I'm curious whether anything else was being attempted.

I note in the messages output from the sos tarball that there is a series of "Listening on interface #xx" and "Deleting interface #xx" messages right around the crash (where xx = 10, 11, 12, 13, 14, 15, and 16). Around the time interfaces 13, 15, and 16 go through their iterations, there are other segfaults listed in the output involving qemu-kvm and libspice-server.so.

The reason I note this is that I have to "wonder" whether this type of migration was working without error until only recently. What caught my eye was the yum.log output indicating a recent change/update to spice-server, and I'm wondering if there's a relationship between the two. I'm not pointing fingers, just trying to glean some more data. In particular, if this was working well previously and libvirt didn't change, then what other environmental factor could have caused a failure?

(In reply to comment #15)
> The reason I note this is that I have to "wonder" whether this type of
> migration was working without error until only recently. What caught my
> eye was the yum.log output indicating a recent change/update to
> spice-server, and I'm wondering if there's a relationship between the two.
> I'm not pointing fingers, just trying to glean some more data. In
> particular, if this was working well previously and libvirt didn't change,
> then what other environmental factor could have caused a failure?

John,
This is a RHEV-H system. You won't find any changes because there is no yum on it, and we cannot install custom RPMs without hacks...
FYI, the problem occurred with "Red Hat Enterprise Virtualization Hypervisor release 6.4 (20130306.2.el6_4)" and its bundled libvirt-0.10.2-18.el6.

I wonder if this upstream patch is related:
https://www.redhat.com/archives/libvir-list/2013-March/msg01489.html

Another one worth looking at (it still needs upstream review as I type this comment):
https://www.redhat.com/archives/libvir-list/2013-March/msg01469.html

(In reply to comment #0)
> Description of problem:
> This crash occurred during back-and-forth migration of a VM (with another
> instance of the libvirt).

Was this using peer-to-peer migration? If so, then I'm pretty sure this patch series explains the problem:
https://www.redhat.com/archives/libvir-list/2013-March/msg01682.html

>
> truncated backtrace:
> :Thread no. 1 (7 frames)
> : #0 _int_free at malloc.c
> : #1 virFree at util/memory.c
> : #2 virObjectUnref at util/virobject.c
> : #3 virEventPollCleanupHandles at util/event_poll.c

At any rate, this portion of the stack trace is consistent with freeing memory through a pointer that another thread had already freed.

Peter's patches to fix the close callback race solve a problem introduced in upstream 0.10.0, which is therefore present in RHEL 6.4 (based on upstream 0.10.2) but not in RHEL 6.3 (based on upstream 0.9.10):
https://www.redhat.com/archives/libvir-list/2013-April/msg00672.html
As such, I'm adding the regression flag.

A scratch build containing fixes that are believed to solve this problem is available at:
https://brewweb.devel.redhat.com/taskinfo?taskID=5610687

(In reply to comment #21)
> (In reply to comment #0)
> > Description of problem:
> > This crash occurred during back-and-forth migration of a VM (with another
> > instance of the libvirt).
>
> Was this using peer-to-peer migration?

If peer-to-peer migration is the result of commands like this:
virsh -c qemu+tcp://source_host/system migrate --live VM_NAME qemu+tcp://dest_host/system
then yes, it was peer-to-peer migration.
I hit the bug only once, though, so I'm not able to say decisively that the bug is fixed for me.

Peer-to-peer migration involves the --p2p flag of 'virsh migrate'. The command line you used omitted --p2p, so it was a direct migration. http://libvirt.org/migration.html shows the difference: in direct migration, libvirt.so is the client of two different libvirtd processes; in peer-to-peer migration, libvirt.so is the client of only one libvirtd process, and that libvirtd is in turn a client of the other libvirtd. Peter's patches (comment 23) dealt with a crash in the client. That would explain the source libvirtd crashing during a peer-to-peer migration (since the source is a client of the destination), but it would not explain a libvirtd crash during a direct migration (there, you would expect virsh, as the client of the source and destination daemons, to die, but not libvirtd). See also bug 911609.

I'm still looking for other potential races that would affect the server rather than the client, to match your report of libvirtd crashing during a direct migration.

Bug 915353 describes a crash on shutdown; it was fixed in libvirt-0.10.2-18.el6_4.1. I'm starting to think that this particular fix is the one that solves the problem at hand.

Another possible cause is a crash on auto-destroy, bug 950286. Migration uses auto-destroy on the destination until the source is far enough along in the migration process, so a bug there could crash libvirtd.

Since this problem was not easily reproduced and a patch is available that resolves similarly described problems, I was asked to close this bug as INSUFFICIENT_DATA with a reference to the available patch. If the problem still occurs after installing the updates described here:
http://rhn.redhat.com/errata/RHBA-2013-0756.html
then feel free to reopen this bug or open a new one.
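The truncated backtrace in this bug (virEventPollCleanupHandles -> virObjectUnref -> virFree -> _int_free) shows an event-loop cleanup pass releasing a handle's data through an unref. A minimal sketch of the ownership rule that makes such a pass safe: a handle that will later unref its opaque data must hold its own reference to it. All names here (RefObj, Handle, handle_add, cleanup_handles, ...) are hypothetical stand-ins, not libvirt's virObject or event_poll code, and the sketch is single-threaded; a real event loop would also lock its handle table.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object; not libvirt's virObject. */
typedef struct {
    atomic_int refs;
    int *dispose_count;   /* lets callers observe when the object is freed */
} RefObj;

static RefObj *refobj_new(int *dispose_count) {
    RefObj *o = calloc(1, sizeof *o);
    if (!o)
        return NULL;
    atomic_init(&o->refs, 1);   /* the creator owns the first reference */
    o->dispose_count = dispose_count;
    return o;
}

static RefObj *refobj_ref(RefObj *o) {
    atomic_fetch_add(&o->refs, 1);
    return o;
}

/* Drops one reference and frees the object when the last one goes away.
 * Returns true if the object is still alive afterwards. */
static bool refobj_unref(RefObj *o) {
    if (!o)
        return false;
    int before = atomic_fetch_sub(&o->refs, 1);
    /* Catches an unbalanced unref on a still-live object only; an unref
     * after free() is undefined behavior (often a crash inside free()). */
    assert(before > 0);
    if (before == 1) {
        if (o->dispose_count)
            (*o->dispose_count)++;
        free(o);
        return false;
    }
    return true;
}

#define MAX_HANDLES 8

/* Hypothetical handle with deferred deletion: removal only marks the slot,
 * and a later cleanup pass releases it. */
typedef struct {
    bool in_use;
    bool deleted;
    RefObj *opaque;   /* the handle owns one reference to this object */
} Handle;

static Handle handles[MAX_HANDLES];

static int handle_add(RefObj *opaque) {
    for (int i = 0; i < MAX_HANDLES; i++) {
        if (!handles[i].in_use) {
            /* Take a reference instead of borrowing the caller's pointer. */
            handles[i] = (Handle){ .in_use = true, .deleted = false,
                                   .opaque = refobj_ref(opaque) };
            return i;
        }
    }
    return -1;
}

static void handle_remove(int id) {
    if (id >= 0 && id < MAX_HANDLES && handles[id].in_use)
        handles[id].deleted = true;
}

/* Releases every slot marked as deleted, dropping the handle's own reference. */
static void cleanup_handles(void) {
    for (int i = 0; i < MAX_HANDLES; i++) {
        if (handles[i].in_use && handles[i].deleted) {
            refobj_unref(handles[i].opaque);
            handles[i] = (Handle){ 0 };
        }
    }
}
```

If handle_add stored the pointer without calling refobj_ref, the caller's own unref could free the object first, and cleanup_handles would then unref freed memory: the "free through a pointer another thread already freed" pattern described in the comment on the backtrace.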
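Peter's patches are described in this bug only as fixing a close callback race on the client side. The general hazard with such callbacks is that a connection-close path and an unregister path run concurrently and both invoke or free the same callback data. A minimal sketch of one way to rule that out, assuming at most one registered callback per connection: whichever path atomically detaches the registration first owns it. Conn, CloseCallback and the conn_* functions are hypothetical; this is not libvirt's virConnectRegisterCloseCallback implementation and not the content of the referenced patches.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef void (*CloseFunc)(int reason, void *opaque);
typedef void (*FreeFunc)(void *opaque);

/* Hypothetical registration record: callback, its data, and the data's destructor. */
typedef struct {
    CloseFunc cb;
    void *opaque;
    FreeFunc freecb;
} CloseCallback;

/* Hypothetical connection holding at most one close callback. */
typedef struct {
    _Atomic(CloseCallback *) close;
} Conn;

static void conn_init(Conn *c) {
    atomic_init(&c->close, NULL);
}

/* Returns 0 on success, -1 if a callback is already registered or on OOM. */
static int conn_register_close(Conn *c, CloseFunc cb, void *opaque, FreeFunc freecb) {
    CloseCallback *cc = malloc(sizeof *cc);
    if (!cc)
        return -1;
    *cc = (CloseCallback){ cb, opaque, freecb };
    CloseCallback *expected = NULL;
    if (!atomic_compare_exchange_strong(&c->close, &expected, cc)) {
        free(cc);
        return -1;
    }
    return 0;
}

/* Atomically detach the registration; at most one caller gets it. */
static CloseCallback *conn_take_close(Conn *c) {
    return atomic_exchange(&c->close, NULL);
}

static void close_callback_release(CloseCallback *cc) {
    if (cc->freecb)
        cc->freecb(cc->opaque);
    free(cc);
}

/* Called when the connection drops: runs the callback at most once. The
 * registration is detached before the callback runs, so a concurrent
 * unregister finds nothing and cannot free data that is still in use. */
static void conn_fire_close(Conn *c, int reason) {
    CloseCallback *cc = conn_take_close(c);
    if (!cc)
        return;   /* already fired or unregistered */
    cc->cb(reason, cc->opaque);
    close_callback_release(cc);
}

static int conn_unregister_close(Conn *c) {
    CloseCallback *cc = conn_take_close(c);
    if (!cc)
        return -1;
    close_callback_release(cc);
    return 0;
}

/* Example payload used to observe how often each callback runs. */
typedef struct { int fired; int freed; } Counters;
static void count_close(int reason, void *opaque) { (void)reason; ((Counters *)opaque)->fired++; }
static void count_free(void *opaque) { ((Counters *)opaque)->freed++; }
```

The design choice is to make "take ownership" a single atomic step; a mutex around register, fire and unregister would serve the same purpose.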
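The direct versus peer-to-peer distinction above can be written down as a small decision function. This is a hypothetical model for illustration only: the MIG_* bits are local stand-ins (not libvirt's migration flag values), and only the --live and --p2p options of 'virsh migrate' are recognized.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Local stand-in bits for this sketch; NOT libvirt's flag values. */
enum { MIG_LIVE = 1u << 0, MIG_P2P = 1u << 1 };

/* Derive the mode from a 'virsh migrate' command line, looking only at
 * --live and --p2p; every other argument is ignored. */
static unsigned flags_from_virsh_args(int argc, const char *const argv[]) {
    unsigned flags = 0;
    for (int i = 0; i < argc; i++) {
        if (strcmp(argv[i], "--live") == 0)
            flags |= MIG_LIVE;
        else if (strcmp(argv[i], "--p2p") == 0)
            flags |= MIG_P2P;
    }
    return flags;
}

/* Number of libvirtd daemons the client library (libvirt.so in virsh) talks to. */
static int daemons_contacted_by_client(unsigned flags) {
    return (flags & MIG_P2P) ? 1 : 2;
}

/* True when the source libvirtd itself is a client of the destination
 * libvirtd, so a client-side bug could take down a daemon. */
static bool source_daemon_is_client(unsigned flags) {
    return (flags & MIG_P2P) != 0;
}
```

Applied to the reporter's command line, this model reports a direct migration in which virsh talks to both daemons, which is why a client-side fix alone would not explain a libvirtd crash in this case.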
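Auto-destroy, mentioned above as another possible cause, ties a domain's lifetime to the client connection that started it: if that connection closes while the domain is still registered, the domain is destroyed. During migration the destination domain stays registered until the source is far enough along, so a connection drop in that window tears it down. A minimal single-threaded sketch of such a registry; the names, the fixed-size table and the integer connection IDs are hypothetical and are not taken from libvirt's qemu driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 16

/* Hypothetical auto-destroy entry: which connection a domain is tied to. */
typedef struct {
    bool used;
    int conn_id;
    char domain[64];
} AutoDestroyEntry;

static AutoDestroyEntry registry[MAX_ENTRIES];

/* Register a domain (e.g. when an incoming migration is prepared). */
static bool autodestroy_add(int conn_id, const char *domain) {
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (!registry[i].used) {
            registry[i].used = true;
            registry[i].conn_id = conn_id;
            snprintf(registry[i].domain, sizeof registry[i].domain, "%s", domain);
            return true;
        }
    }
    return false;
}

/* Unregister a domain (e.g. once migration has progressed far enough). */
static void autodestroy_remove(const char *domain) {
    for (int i = 0; i < MAX_ENTRIES; i++)
        if (registry[i].used && strcmp(registry[i].domain, domain) == 0)
            registry[i].used = false;
}

/* Called when a client connection closes: destroys every domain still tied
 * to it and returns how many were destroyed. Each entry is unregistered
 * before the destroy step so a later pass cannot act on it twice. */
static int autodestroy_run(int conn_id, void (*destroy)(const char *domain)) {
    int n = 0;
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (registry[i].used && registry[i].conn_id == conn_id) {
            char name[sizeof registry[i].domain];
            memcpy(name, registry[i].domain, sizeof name);
            registry[i].used = false;
            if (destroy)
                destroy(name);
            n++;
        }
    }
    return n;
}
```

Passing NULL as the destroy callback only unregisters and counts, which is enough to follow the lifecycle without a real hypervisor.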