Red Hat Bugzilla – Bug 1255033
VM status events are ignored after host is rebooted
Last modified: 2016-04-19 21:11:49 EDT
Description of problem:
A Linux VM gets stuck in the "Wait for Launch" state when run with a GPU attached.
(Reproduced using Fedora and RHEL VMs.)
A Windows 7 VM with a GPU attached runs properly.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create a RHEL 7 or Fedora 22 VM and attach a GPU to it (in my case a Quadro K4200).
2. Run VM.
Actual results:
VM gets stuck in the "Wait for Launch" state.
Expected results:
VM should run properly.
Log collector output attached. The relevant host for this bug is intel-vfio.
Created attachment 1064850 [details]
I could not find any error in the logs, nor any VM stuck in "Wait for Launch".
Please specify the VM name/ID and the time of the error.
I forgot to mention it...
VM name: fedora_intel
Host name: intel-vfio
Start VM time: 2015-Aug-19, 15:28
VM ID: f9796cb9-758e-49cf-a23a-1484c3fb13be
The VM was run on host intel-vfio, but the attached log collector output contains logs only from the amd-vfio host, so we are missing vdsm.log from the correct host.
Please attach this vdsm.log for the time of the run.
Also, did you happen to check the VM status on the host, to verify it is reported in the same status there as well? (Just to rule out monitoring issues.)
I did not check the VM status on the host.
As for the missing host logs, I guess it's a log collector issue.
VDSM log attached (vdsm.log.5.xz).
Created attachment 1065128 [details]
intel-vfio host, vdsm log
The VDSM log indicates the create call got stuck in libvirt.
Please include the relevant logs from libvirt and QEMU as well, and the journal/system messages.
As I understand it, these logs are included in the attached log collector file.
(In reply to Nisim Simsolo from comment #9)
> As I understand it, these logs are included in the attached log collector file.
As I understand from comment #4, they are not. Without the relevant logs there is nothing to look at. Perhaps this is reproducible? If so, please include only data from the relevant time frame. Thanks.
The issue occurred again using engine version RHEVM 3.6.0-0.12.master.el6.
This time I can see output on the monitor attached to the GPU card, but in the webadmin the VM is still in the "Wait for Launch" state.
Also, the status of the VM on the host is Up.
Refreshing the browser and logging out and back in does not solve the issue.
VDSM and engine logs attached.
VM ID: ef302904-aab5-4814-b33e-48a8c6de5eb6
VM name: win12_intel
Created attachment 1070934 [details]
Created attachment 1070935 [details]
VM start time: 2015-09-07 12:58:38
Looking at the latest logs attached, I can see that we lost connectivity to VDSM due to a heartbeat exceeded issue. This means that VDSM did not send a response on time and the connection was dropped. I can see that the host is heavily loaded; there are a bunch of places in the logs like:
Thread-21431::INFO::2015-09-07 10:12:07,178::xmlrpc::92::vds.XMLRPCServer::(_process_requests) Request handler for 127.0.0.1:56157 stopped
Thread-21432::DEBUG::2015-09-07 10:12:11,445::stompreactor::304::yajsonrpc.StompServer::(send) Sending response
Thread-4076::DEBUG::2015-09-07 10:12:16,893::fileSD::169::Storage.Misc.excCmd::(getReadDelay) SUCCESS: <err> = '0+1 records in\n0+1 records out\n691 bytes (691 B) copied, 0.000593475 s, 1.2 MB/s\n'; <rc> = 0
Reactor thread::INFO::2015-09-07 10:12:22,191::protocoldetector::72::ProtocolDetector.AcceptorImpl::(handle_accept) Accepting connection from 127.0.0.1:56158
where several seconds are missing from the log.
Please make sure that your host is able to work properly, or increase your heartbeat interval.
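The heartbeat check described above can be sketched roughly as follows. This is a generic illustration of a heartbeat timeout, not the actual VDSM/engine code; the function name and parameters are hypothetical:

```python
import time

def connection_alive(last_heartbeat, heartbeat_interval, now=None):
    """Consider the connection dead if no heartbeat arrived within the interval.

    Timestamps are taken from a monotonic clock so wall-clock changes
    (NTP jumps, DST) do not trigger false timeouts.
    """
    now = time.monotonic() if now is None else now
    return (now - last_heartbeat) <= heartbeat_interval

# A heavily loaded host may delay its responses past the interval,
# at which point the engine drops the connection:
assert connection_alive(last_heartbeat=100.0, heartbeat_interval=30.0, now=120.0)
assert not connection_alive(last_heartbeat=100.0, heartbeat_interval=30.0, now=140.0)
```

Raising the interval only hides the symptom; the real question is why the host is too loaded to respond in time.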
This is related to VM events rather than VFIO.
The host was rebooted in the meantime (11:57), and since then events have been ignored when the VM was started again.
A more detailed explanation: we use monotonic time from the host, which is not monotonic across a host reboot. So, say a VM runs on a host; after the host is rebooted, we run the VM again on the same host, and all events for this VM are ignored, because the engine thinks it has already processed newer events.
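The stale-event check described above can be sketched as follows. This is a minimal illustration of the bug, with hypothetical names, not the actual engine code:

```python
class VmEventTracker:
    """Tracks the newest event timestamp seen per VM and drops older events."""

    def __init__(self):
        # vm_id -> newest host-monotonic timestamp already processed
        self.last_seen = {}

    def should_process(self, vm_id, event_timestamp):
        # Events are stamped with the host's monotonic clock. After a host
        # reboot the monotonic clock restarts near zero, so post-reboot
        # events look "older" than pre-reboot ones and are wrongly dropped.
        if event_timestamp <= self.last_seen.get(vm_id, -1.0):
            return False  # considered stale, ignored
        self.last_seen[vm_id] = event_timestamp
        return True

tracker = VmEventTracker()
assert tracker.should_process("f9796cb9", 1000.0)       # before the reboot
# Host reboots; its monotonic clock restarts from ~0.
assert not tracker.should_process("f9796cb9", 5.0)      # post-reboot event ignored
```

A fix would need to reset (or namespace) the per-VM timestamp whenever the engine detects the host has rebooted, so post-reboot events are not compared against pre-reboot ones.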
1. Run a VM with a GPU attached.
2. While the VM is launching, reboot the related host.
3. After the host is up again, run the VM and verify it runs properly.
4. Repeat the test case, but this time reboot the related host while the VM is up.