Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1102814

Summary: [REST API] Failed to GetVmStatsVDS from VM which is DOWN
Product: Red Hat Enterprise Virtualization Manager
Component: ovirt-engine
Version: 3.4.0
Hardware: x86_64
OS: Linux
Status: CLOSED DUPLICATE
Severity: medium
Priority: unspecified
Reporter: Yuri Obshansky <yobshans>
Assignee: Michal Skrivanek <michal.skrivanek>
CC: acathrow, bazulay, gklein, iheim, lpeer, oourfali, oramraz, rgolan, Rhev-m-bugs, yeylon, yobshans
Whiteboard: virt
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-06-24 10:03:43 UTC
Attachments:
  engine.log (no flags)
  vdsm.log (no flags)

Description Yuri Obshansky 2014-05-29 15:26:01 UTC
Description of problem:
Failed to GetVmStatsVDS from a VM which is DOWN.
I ran a performance test that retrieves various information and shuts down and starts VMs. The JMeter script performs GET requests on the VM to make sure it is down before sending the start command. Make sure Authorization is not admin@internal.
An exception is raised in the RHEVM engine.log.
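
(For reference, the script's check-then-start logic is roughly equivalent to the following Python sketch; the host address and VM id are taken from the steps below, the credentials are hypothetical, and the status path assumes the 3.4-era XML API.)

import time
import xml.etree.ElementTree as ET
import requests  # third-party HTTP client

API = "https://10.1.32.58/api"
VM_ID = "5dbcc22f-df99-495e-b804-b825c46d8ec1"
AUTH = ("user@domain", "password")  # non-admin user, as in the report

def vm_state(vm_id):
    # GET the VM resource and read <status><state> from the XML body
    resp = requests.get("%s/vms/%s" % (API, vm_id), auth=AUTH, verify=False)
    resp.raise_for_status()
    return ET.fromstring(resp.content).findtext("status/state")

# poll until the VM is really down, then send the start action
while vm_state(VM_ID) != "down":
    time.sleep(5)
requests.post("%s/vms/%s/start" % (API, VM_ID), auth=AUTH, verify=False,
              headers={"Content-Type": "application/xml"}, data="<action/>")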

Version-Release number of selected component (if applicable):
RHEVM: 3.4.0-0.16.rc.el6ev

How reproducible:
100%

Steps to Reproduce:
1. Shut down a VM (make sure Authorization is not admin@internal).
2. Perform a REST API request:
GET https://10.1.32.58/api/vms/5dbcc22f-df99-495e-b804-b825c46d8ec1
Request Headers:
Connection: keep-alive
Session-TTL: 360
Accept-Language: en-US,en;q=0.5
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20131023 Firefox/17.0
Referer: *.*.*.*/UserPortal/org.ovirt.engine.ui.userportal.UserPortal/UserPortal.html?locale=en_US
Accept-Encoding: gzip, deflate
Prefer: persistent-auth
Authorization: Basic <Base64-encoded user@domain credentials>
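
(The same request can be reproduced outside JMeter; a minimal sketch with hypothetical credentials; the requests library builds the Basic Authorization header itself:)

import requests

resp = requests.get(
    "https://10.1.32.58/api/vms/5dbcc22f-df99-495e-b804-b825c46d8ec1",
    auth=("user@domain", "password"),       # Base64-encoded into the Basic header
    headers={"Prefer": "persistent-auth",   # keep the engine session alive
             "Session-TTL": "360",
             "Accept": "application/xml"},
    verify=False)
print(resp.status_code)  # 200 OK even while the engine logs the VDS errors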

Actual results:
2014-05-29 14:09:30,651 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVmStatsVDSCommand] (DefaultQuartzScheduler_Worker-6) [215c3fcb] Failed in GetVmStatsVDS method
2014-05-29 14:09:30,652 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVmStatsVDSCommand] (DefaultQuartzScheduler_Worker-6) [215c3fcb] Command GetVmStatsVDSCommand(HostName = HOST-20, HostId = 1fe72e64-d142-438c-89af-3e64ce4f879f, vmId=4517d577-5680-456b-a470-2966513b8ffb) execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetVmStatsVDS, error = Virtual machine does not exist, code = 1
2014-05-29 14:09:30,652 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-6) [215c3fcb] failed to fetch scale_39 stats. status remain unchanged (PoweringDown)

The REST response itself is OK.

Expected results:
No exception

Additional info:

Comment 1 Juan Hernández 2014-06-03 14:30:28 UTC
Actually, these error messages come from one of the scheduler threads, which run periodically; they are not triggered by a call to the REST API.

The only way I can reproduce this is the following:

1. Start the VM, without an operating system inside.

2. Shut it down with the "shutdown" option, not with the "power off" option. Note that this requires the collaboration of the operating system, and as the VM doesn't have any operating system it will just stay in "powering down" forever.

3. Manually kill the qemu-kvm process on the host (a sketch follows the log line below). This produces exactly the same messages that you see, but immediately afterwards the engine should detect it and move the VM to "down", showing something like this in the log:

2014-06-03 16:11:06,278 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-40) VM myvm (4e72e2de-d172-484f-adb6-e0e652e03b28) is running in db and not running in VDS node0
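
(Step 3 as a sketch, to be run on the host: kill the qemu-kvm process that carries the VM's UUID on its command line; the UUID here is the one from the log line above.)

import subprocess

# find and kill the qemu-kvm process of the VM by the UUID in its command line
subprocess.call(["pkill", "-f", "qemu.*4e72e2de-d172-484f-adb6-e0e652e03b28"])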

Comment 2 Michal Skrivanek 2014-06-10 12:52:08 UTC
I can confirm Juan's explanation; that's how it behaves right now, and it's not really a problem.
Can you please describe how you are killing the VMs in your scenario?

Comment 3 Yuri Obshansky 2014-06-10 13:00:01 UTC
I shut down the VM using the REST API POST request /api/vms/${vm_id}/shutdown.
I agree with you that it is not really a problem, but our engine.log is full of exceptions. Would you like to clean it up? In that case we could return the VM information from the DB instead of raising an exception.
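
(For reference, a sketch of that call; /shutdown is the graceful action, while /stop is the forced power-off that Michal asks about below. Host and credentials are hypothetical.)

import requests

API = "https://10.1.32.58/api"
AUTH = ("user@domain", "password")

def shutdown_vm(vm_id):
    # graceful shutdown: the guest OS has to cooperate (ACPI / guest agent)
    return requests.post("%s/vms/%s/shutdown" % (API, vm_id), auth=AUTH,
                         headers={"Content-Type": "application/xml"},
                         data="<action/>", verify=False)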

Comment 4 Michal Skrivanek 2014-06-10 13:06:08 UTC
shutdown does a graceful shutdown only, which doesn't kill the VM. Are all of them actually shutting down?
Do you have an extra poweroff call anywhere?
A regular shutdown should leave the VM object in VDSM until it's detected by the engine and destroyed - so you'd never see the "no VM" error you mentioned.

Comment 5 Yuri Obshansky 2014-06-10 13:08:34 UTC
No, I don't have an additional poweroff call.
BTW, all VMs have RHEL installed along with the guest agent.

Comment 6 Michal Skrivanek 2014-06-10 13:24:40 UTC
hm… then this could not happen unless there is a bug :-)
Can you please capture the vdsm and engine logs and pinpoint a specific VM that demonstrates the problem? Just the window from the shutdown call until the error you see, plus a few seconds.

Comment 7 Yuri Obshansky 2014-06-10 13:27:52 UTC
Well, it is a little bit complicated, so I need some time to reproduce, pinpoint, and capture the logs. You need the vdsm log from the host where the VMs are running, right?

Comment 8 Michal Skrivanek 2014-06-10 13:31:28 UTC
Yes, the engine log and the log from the one host where the VM resides.

I understand it takes time and effort to get, but unfortunately we have exhausted all the simple possibilities… :/

Comment 9 Yuri Obshansky 2014-06-11 09:08:31 UTC
See the errors from the log files below:
1. engine.log
2014-06-09 06:50:33,225 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVmStatsVDSCommand] (DefaultQuartzScheduler_Worker-25) Failed in GetVmStatsVDS method
2014-06-09 06:50:33,226 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVmStatsVDSCommand] (DefaultQuartzScheduler_Worker-25) Command org.ovirt.engine.core.vdsbroker.vdsbroker.GetVmStatsVDSCommand return value 
 org.ovirt.engine.core.vdsbroker.vdsbroker.VMInfoListReturnForXmlRpc@6167d9e4
2014-06-09 06:50:33,226 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVmStatsVDSCommand] (DefaultQuartzScheduler_Worker-25) HostName = HOST-20
2014-06-09 06:50:33,234 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVmStatsVDSCommand] (DefaultQuartzScheduler_Worker-25) Command GetVmStatsVDSCommand(HostName = HOST-20, HostId = 1fe72e64-d142-438c-89af-3e64ce4f879f, vmId=065722a5-7835-4b56-9794-51e338689cf1) execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetVmStatsVDS, error = Virtual machine does not exist, code = 1
2014-06-09 06:50:33,234 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-25) failed to fetch scale_10 stats. status remain unchanged (PoweringDown)

2. vdsm.log
Thread-34821::ERROR::2014-06-09 06:50:33,234::BindingXMLRPC::1077::vds::(wrapper) libvirt error
Traceback (most recent call last):
  File "/usr/share/vdsm/BindingXMLRPC.py", line 1070, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/BindingXMLRPC.py", line 410, in vmGetStats
    return vm.getStats()
  File "/usr/share/vdsm/API.py", line 357, in getStats
    stats = v.getStats().copy()
  File "/usr/share/vdsm/vm.py", line 2747, in getStats
    stats = self._getStatsInternal()
  File "/usr/share/vdsm/vm.py", line 2847, in _getStatsInternal
    stats['balloonInfo'] = self._getBalloonInfo()
  File "/usr/share/vdsm/vm.py", line 4674, in _getBalloonInfo
    cur_mem = self._dom.info()[2]
  File "/usr/share/vdsm/vm.py", line 860, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 92, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1874, in info
    if ret is None: raise libvirtError ('virDomainGetInfo() failed', dom=self)
libvirtError: Domain not found: no domain with matching uuid '065722a5-7835-4b56-9794-51e338689cf1'
VM Channels Listener::DEBUG::2014-06-09 06:50:33,249::vmChannels::128::vds::(_handle_unconnected) Trying to connect fileno 97.

Comment 10 Michal Skrivanek 2014-06-11 10:23:56 UTC
Yuri, I need the full logs, as per comment #6 ;-)

Comment 11 Yuri Obshansky 2014-06-12 05:11:15 UTC
Created attachment 907943 [details]
engine.log

Comment 12 Yuri Obshansky 2014-06-12 05:11:52 UTC
Created attachment 907944 [details]
vdsm.log

Comment 13 Michal Skrivanek 2014-06-16 11:44:32 UTC
The system is a bit slow (you have a high load, I suppose), and it takes time between when the VM is shut down and when the stats thread gets properly ended. There are getVmStats calls in the middle… and so you see lots of errors in the vdsm log, but they are all harmless.

However, I can't find the original issue you mention in the description in the logs you've attached.

Comment 14 Yuri Obshansky 2014-06-16 11:57:54 UTC
I don't think the load was high:
Min server hits per second = 1
Average server hits per second = 5
Max server hits per second = 34
There was no load on the application from my point of view; the system should be faster.
Regarding the logs, please look at line 1270 in engine.log and line 1570 in vdsm.log.

Comment 15 Michal Skrivanek 2014-06-24 10:03:43 UTC
Thanks for pointing out the specific case.
This error is a consequence of bug 1110835, and it should not be happening anymore. The problem was exactly in the stats retrieval: it is supposed to only copy the information gathered by the stats thread (where errors when a VM goes down are expected and ignored), but due to that bug we were accessing libvirt directly while the VM was already gone...
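
(To illustrate the fix: a minimal sketch of the intended pattern, not the actual vdsm code; all names are hypothetical. A sampling thread talks to libvirt and caches the result, and the query path only copies that cache, so a domain that has already disappeared cannot raise from getStats.)

import copy
import threading
import time

class VmStatsSampler(threading.Thread):
    # periodically samples libvirt and caches the result; libvirt errors
    # while the VM is going down are expected here and simply ignored
    def __init__(self, dom):
        threading.Thread.__init__(self)
        self.daemon = True
        self._dom = dom
        self.cache = {}

    def run(self):
        while True:
            try:
                self.cache = {"balloon_cur_mem": self._dom.info()[2]}
            except Exception:  # e.g. libvirtError: domain no longer exists
                pass
            time.sleep(2)

def get_stats(sampler):
    # the query path must only copy the cached sample and never touch
    # libvirt itself; the bug was that it queried the domain directly
    return copy.copy(sampler.cache)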

*** This bug has been marked as a duplicate of bug 1110835 ***