Bug 667309

Summary: [RHEVM] Running Vms for the second time always fails
Product: Red Hat Enterprise Linux 6 Reporter: Idan Mansano <imansano>
Component: libvirt Assignee: Jiri Denemark <jdenemar>
Status: CLOSED DUPLICATE QA Contact: Virtualization Bugs <virt-bugs>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 6.0 CC: abaron, ajia, bazulay, berrange, dallan, danken, dnaori, dyuan, eblake, hateya, jialiu, jyang, mgoldboi, vbian, xen-maint, ykaul
Target Milestone: rc Keywords: Regression
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-01-24 21:06:04 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
libvirt log none

Description Idan Mansano 2011-01-05 07:58:37 UTC
VDSM Version: vdsm-4.9-39.el6.
Libvirt Version: libvirt-0.8.1-27.el6_0.2

We encountered the following issue:
1. We create and run a new VM
2. we destroy that VM (Stop it)
3. we run the VM again
After the second run, libvirt sends a destroyed event which means the run failed.  
We believe this is a stale event that belongs to step 2 (the VM is actually running fine), but as far as VDSM is concerned the second run of the VM failed.

Comment 2 Idan Mansano 2011-01-05 08:54:38 UTC
Important info:
1. It seems that this issue already existed in previous RHEVM builds (for example, ic74).
2. The issue happens only if there are no running VMs at all in the cluster.
If there is at least one running VM in the cluster, everything works fine.

Moving this bug to urgent, as requested by my manager.

Comment 3 Daniel Berrangé 2011-01-05 11:18:34 UTC
This could be a similar problem to that described in bug 666158

Comment 5 Daniel Berrangé 2011-01-06 15:02:03 UTC
Please provide the full XML for the guest being created, the logfile in /var/log/libvirt/qemu/$GUEST.log, and finally the exact error message received from libvirt when the 2nd virDomainCreate attempt fails.

Comment 6 Daniel Berrangé 2011-01-06 15:03:37 UTC
If this is truly a regression, then please also provide info on which version of libvirt you need to downgrade to before it works correctly again. For example, does the previous build 0.8.1-27.el6_0.1 work?

Comment 7 Haim 2011-01-06 17:40:09 UTC
Created attachment 472103
libvirt log

(In reply to comment #5)
> Please provide the full XML for the guest being created, the logfile in
> /var/log/libvirt/qemu/$GUEST.log, and finally the exact error message received
> from libvirt when the 2nd virDomainCreate attempt fails.

LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -S -M rhel6.0.0 -cpu Conroe -enable-kvm -m 512 -smp 1,sockets=1,cores=1,threads=1 -name rhel6-nfs-1 -uuid 796d95ea-1640-4aea-9f12-0d9ea0440ee3 -nodefconfig -nodefaults -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/rhel6-nfs-1.monitor,server,nowait -mon chardev=monitor,mode=control -rtc base=2011-01-06T17:31:36 -boot nc -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x4 -drive file=/rhev/data-center/cf4e325a-482b-4e20-8b1d-6b1acd5c7dc4/78cbee4a-f021-47d1-9f90-c6ef34c2935d/images/7c571638-4826-46ee-8a9b-9d4232154ace/f5d32eff-5adc-4787-a47d-3cccb98b8ccb,if=none,id=drive-virtio-disk0,boot=on,format=raw,serial=ee-8a9b-9d4232154ace,cache=none,werror=stop,rerror=stop -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=25,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=00:1a:4a:16:87:30,bus=pci.0,addr=0x3 -chardev socket,id=channel0,path=/var/lib/libvirt/qemu/channels/rhel6-nfs-1.com.redhat.rhevm.vdsm,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=0,chardev=channel0,name=com.redhat.rhevm.vdsm -usb -device usb-tablet,id=input0 -vnc 0:0,password -k en-us -vga cirrus
2011-01-06 19:31:38.982: shutting down

The error is not clear from the logs. I found a warning about a permission error at the end, though I'm not sure it's related.
Please find attached the full libvirt log covering two virDomainCreate attempts.

19:31:20.243: 30734: warning : virDomainDiskDefForeachPath:8298 : Ignoring open failure on /rhev/data-center/cf4e325a-482b-4e20-8b1d-6b1acd5c7dc4/78cbee4a-f021-47d1-9f90-c6ef34c2935d/images/7c571638-4826-46ee-8a9b-9d4232154ace/f5d32eff-5adc-4787-a47d-3cccb98b8ccb: Permission denied

Comment 8 Daniel Berrangé 2011-01-06 17:49:49 UTC
> 3. we run the VM again
> After the second run, libvirt sends a destroyed event which means the run
> failed.  

I don't see any evidence in the logs that the 2nd VM run failed. Everything indicates it successfully started QEMU, and then QEMU shut down. Did you actually get an *error code* when starting the guest, or are you simply inferring failure from the fact that you got an async event?

Comment 9 Dan Kenigsberg 2011-01-06 20:45:09 UTC
Daniel, the VM fails to start, as perceived by rhevm. Libvirt creates the domain, but immediately *after* creation, vdsm receives a VIR_DOMAIN_EVENT_STOPPED event with detail VIR_DOMAIN_EVENT_STOPPED_CRASHED. Could this be related to bug 624252?

(I suddenly think there might be an inherent race in the way vdsm handles libvirt's events. We may need to add a barrier that makes sure that all events have been processed before issuing a critical libvirt API. Do you have a suggestion?)
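One way to picture the barrier idea is the minimal Python sketch below. It is not vdsm code: `VmTracker`, `on_event`, `drain`, and `start` are hypothetical names, and the `create` argument stands in for whatever libvirt call actually starts the domain. The sketch only assumes that lifecycle events arrive on a queue and that the client drains that queue before starting a VM again.

```python
import queue


class VmTracker:
    """Hypothetical client-side tracker; not part of vdsm or libvirt."""

    def __init__(self):
        self._pending = queue.Queue()  # events delivered but not yet processed
        self.state = {}                # last known state per VM name

    def on_event(self, vm_name, event):
        # Would be called from the event-loop thread when an event arrives.
        self._pending.put((vm_name, event))

    def drain(self):
        # Barrier: apply every event queued so far, return how many were handled.
        handled = 0
        while True:
            try:
                vm_name, event = self._pending.get_nowait()
            except queue.Empty:
                return handled
            self.state[vm_name] = event
            handled += 1

    def start(self, vm_name, create):
        # Process old events first so they cannot be attributed to the new run.
        self.drain()
        create(vm_name)  # placeholder for the real "start domain" call
        self.state[vm_name] = "started"


tracker = VmTracker()
tracker.start("vm1", lambda name: None)   # first run
tracker.on_event("vm1", "stopped")        # stop event from the destroy, still queued
tracker.start("vm1", lambda name: None)   # second run drains the old event first
tracker.drain()
```

A barrier like this can only cover events that have already reached the client; it cannot flush events that libvirt itself has not dispatched yet.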

Comment 10 Jiri Denemark 2011-01-18 12:58:45 UTC
I think you are right that this could be related to bug 624252. Quoting from Cole's description:

The events actually weren't being lost, it's just that the event loop didn't
think there was anything that needed to be dispatched. So all those 'lost
events' should actually get re-triggered if you manually kick the loop by
generating a new event (like creating a new guest).

That is, the stopped event might have been waiting for dispatch until a new domain was started.

We will need to check this again after that bug is fixed.
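The behaviour quoted above can be illustrated with a toy model. This is a sketch only, not libvirt's event loop: `ToyEventLoop`, `queue_event`, `run_once`, and the `wake` flag are hypothetical, and `wake=False` merely stands in for whatever caused the loop to think nothing needed dispatching.

```python
class ToyEventLoop:
    """Toy model of a loop that dispatches only when it believes work is pending."""

    def __init__(self, callback):
        self._queue = []
        self._needs_dispatch = False
        self._callback = callback

    def queue_event(self, event, wake=True):
        self._queue.append(event)
        if wake:
            # Normal path: tell the loop there is something to dispatch.
            self._needs_dispatch = True

    def run_once(self):
        if not self._needs_dispatch:
            return  # queued events stay queued; nothing is lost
        pending, self._queue = self._queue, []
        self._needs_dispatch = False
        for event in pending:
            self._callback(event)


received = []
loop = ToyEventLoop(received.append)

loop.queue_event("started (run 1)")
loop.run_once()

# The stop event is queued, but the loop is never told about it.
loop.queue_event("stopped (run 1)", wake=False)
loop.run_once()
delivered_before_second_start = list(received)

# Starting the guest again generates a new event, which kicks the loop
# and flushes the old stop event along with it.
loop.queue_event("started (run 2)")
loop.run_once()
```

In this model the client sees the stop event from the first run only once the second start has been requested, which would match the symptom reported here if the same mechanism is at work.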

Comment 11 Jiri Denemark 2011-01-24 12:47:50 UTC
Could you check whether this issue still exists with the libvirt-0.8.7-3.el6 package, which fixes bug 624252?

Comment 12 David Naori 2011-01-24 18:37:14 UTC
Checked on libvirt-0.8.7-3; the issue seems to be solved.

Comment 13 Jiri Denemark 2011-01-24 21:06:04 UTC
Great, thanks for testing. I'll close this as a duplicate.

*** This bug has been marked as a duplicate of bug 624252 ***