Bug 919057

Summary: [libvirt] Service Libvirt crashed during power off multiple VM's action
Product: Red Hat Enterprise Linux 6
Reporter: vvyazmin <vvyazmin>
Component: libvirt
Assignee: Eric Blake <eblake>
Status: CLOSED DUPLICATE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: urgent
Priority: unspecified
Version: 6.4
CC: abaron, acathrow, bazulay, dallan, dyasny, dyuan, eblake, hateya, honzhang, iheim, mzhan, rwu, weizhan, whuang, ydu, ykaul
Target Milestone: rc
Keywords: Regression, TestBlocker
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2013-04-19 15:51:18 UTC
Type: Bug
Attachments:
Logs vdsm, rhevm, libvirt, messages, dump (flags: none)

Description vvyazmin@redhat.com 2013-03-07 14:04:52 UTC
Created attachment 706608: Logs vdsm, rhevm, libvirt, messages, dump

Description of problem:
The libvirtd service crashed while powering off multiple VMs.

Version-Release number of selected component (if applicable):
RHEVM 3.1.3 - SI27.3 environment:

RHEVM: rhevm-3.1.0-50.el6ev.noarch
VDSM: vdsm-4.10.2-1.6.el6.x86_64
LIBVIRT: libvirt-0.10.2-18.el6.x86_64
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.355.el6_4.1.x86_64
SANLOCK: sanlock-2.6-2.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
My environment:
DC: FC
Hosts: Servers x 4 (tigris[01-02].environment)
SD: XIO connected via FC switch (8Gbit)
VMs: pool with 1000 VMs (automatic pool)

1. Create a pool with multiple VMs (VMs in state “UP”).
2. Power off multiple VMs in bulk, 20-30 VMs at a time.
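Step 2 could be scripted roughly as follows. This is a minimal Python sketch, not the reporter's tooling, and it rests on several assumptions: domains are libvirt-python `virDomain`-like objects (only `destroy()` and `name()` are used); the power-off is forceful, since the report does not say which libvirt call the bulk action uses; and the calls within one batch are issued concurrently. The helper `power_off_in_bulk` and its default batch size of 25 (inside the reported 20-30 range) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def power_off_in_bulk(domains, batch_size=25):
    """Hypothetical helper: forcefully power off domains, one batch at a time.

    Each element of `domains` is assumed to behave like libvirt-python's
    virDomain, exposing destroy() and name(). Returns the names of the
    domains whose destroy() call raised an exception.
    """
    failed = []
    for start in range(0, len(domains), batch_size):
        batch = domains[start:start + batch_size]
        # Issue every destroy() in this batch concurrently, assuming the
        # bulk action reaches libvirt as parallel requests.
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            futures = {pool.submit(dom.destroy): dom for dom in batch}
            for future, dom in futures.items():
                # exception() blocks until this destroy() call finishes.
                if future.exception() is not None:
                    failed.append(dom.name())
    return failed
```

With a live connection, `domains` could come from `conn.listAllDomains(libvirt.VIR_CONNECT_LIST_DOMAINS_ACTIVE)`; afterwards, `initctl status libvirtd` on each host shows whether the daemon survived.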
  
Actual results:
The libvirtd service crashed (in my scenario it happened on both hosts):
[root@tigris02 ~]# initctl status libvirtd
libvirtd stop/waiting

Expected results:
The VMs are deleted successfully.

Additional info:

/var/log/ovirt-engine/engine.log

/var/log/vdsm/vdsm.log

Comment 2 Huang Wenlong 2013-03-11 06:12:30 UTC
Hi, vvyazmin

I cannot reproduce this bug with
libvirt-0.10.2-18.el6.x86_64
vdsm-4.10.2-1.8.el6ev.x86_64

1) Add a host and an FC storage domain to RHEV-M.

2) Create 40 VMs.

3) Start the VMs, destroy them, and restart them; this works well.

4) Check that libvirtd is still running:

# service libvirtd status
libvirtd (pid  3755) is running...


Can you provide more specific steps to reproduce it? Or, if a newer package fixes this bug, can you verify it?
Thanks very much

Wenlong

Comment 3 Eric Blake 2013-03-27 15:07:48 UTC
This sounds like the same symptoms as bug 924756, although I don't yet have a root cause for either bug, so I can't say for certain.

Comment 4 Eric Blake 2013-04-01 23:41:51 UTC
I'm still investigating whether the crash on close fixed by this upstream patch series could be the cause of the memory corruption you are seeing:
https://www.redhat.com/archives/libvir-list/2013-April/msg00057.html

Comment 5 Eric Blake 2013-04-05 22:45:55 UTC
(In reply to comment #0)
> 
> 1. Create a pool with multiple VM's (VM's in state “UP”
> 2. Power off multiple VM's, with bulk 20-30 VM's

How are you powering off VMs?  Is it with virDomainDestroy (forceful, should always work) or virDomainShutdown (graceful, but guest interaction required)?

I didn't see any crash message in the libvirt.log.9.gz included in your log capture, and still haven't reproduced anything locally, so I'm trying to figure out what API was involved just before the crash.
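The distinction Comment 5 asks about can be illustrated with a minimal Python sketch. It assumes a libvirt-python `virDomain`-like object whose `destroy()`, `shutdown()`, and `isActive()` methods correspond to virDomainDestroy, virDomainShutdown, and virDomainIsActive. The helper name `power_off` and the fallback to a forceful destroy after a timeout are hypothetical choices for illustration, not what RHEV-M or VDSM is known to do.

```python
import time


def power_off(dom, graceful=False, timeout=30.0, poll=0.5):
    """Hypothetical helper: power off one domain.

    `dom` is assumed to behave like libvirt-python's virDomain.
    """
    if not graceful:
        # virDomainDestroy: forceful; the guest is stopped without
        # waiting for the guest OS to cooperate.
        dom.destroy()
        return "destroyed"

    # virDomainShutdown: only a request; the guest OS must act on it,
    # so the domain may still be running when the call returns.
    dom.shutdown()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not dom.isActive():
            return "shut down"
        time.sleep(poll)

    # Hypothetical policy: fall back to a forceful destroy if the guest
    # ignored the shutdown request within `timeout` seconds.
    dom.destroy()
    return "destroyed after timeout"
```

The graceful path needs polling precisely because virDomainShutdown can return before the guest has stopped, which is why knowing which call was used narrows down what libvirt was doing just before the crash.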

Comment 6 Eric Blake 2013-04-05 23:02:59 UTC
Bug 924756 mentions another heap-smashing crash seen when shutting down a domain; I suspect both have the same cause, but I have not yet found where the heap smashing happens.

Comment 7 Eric Blake 2013-04-05 23:05:14 UTC
Are the domains being shut down transient or persistent?
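The question above can be answered per host with a short Python sketch. It assumes libvirt-python domain objects, whose `isPersistent()` returns 1 for a persistent domain and 0 for a transient one; the helper `classify_by_persistence` is hypothetical. The distinction matters because a transient domain is removed from libvirt entirely once it is destroyed or shut down, whereas a persistent domain stays defined in the shut-off state, so the two cases exercise different cleanup paths.

```python
def classify_by_persistence(domains):
    """Hypothetical helper: split domains into persistent and transient.

    Each domain is assumed to behave like libvirt-python's virDomain,
    exposing name() and isPersistent() (1 = persistent, 0 = transient).
    """
    result = {"persistent": [], "transient": []}
    for dom in domains:
        key = "persistent" if dom.isPersistent() else "transient"
        result[key].append(dom.name())
    return result
```

The input would be the list of active domains on the affected host, collected before the bulk power-off.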

Comment 8 Eric Blake 2013-04-10 02:03:35 UTC
Bug 915353 describes a crash on shutdown; it was fixed in libvirt-0.10.2-18.el6_4.1, which is newer than the libvirt-0.10.2-18.el6 build in the original report. I'm starting to think that this particular fix is the one that solves the problem at hand.

Comment 9 Eric Blake 2013-04-19 15:51:18 UTC
I'm closing this as a duplicate of bug 915353; we can reopen it if we find information showing that it is an independent issue.

*** This bug has been marked as a duplicate of bug 915353 ***