Bug 1102814
| Summary: | [REST API] Failed to GetVmStatsVDS from VM which is DOWN | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Yuri Obshansky <yobshans> | ||||||
| Component: | ovirt-engine | Assignee: | Michal Skrivanek <michal.skrivanek> | ||||||
| Status: | CLOSED DUPLICATE | QA Contact: | |||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | unspecified | ||||||||
| Version: | 3.4.0 | CC: | acathrow, bazulay, gklein, iheim, lpeer, oourfali, oramraz, rgolan, Rhev-m-bugs, yeylon, yobshans | ||||||
| Target Milestone: | --- | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | virt | ||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2014-06-24 10:03:43 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
|
||||||||
|
Description
Yuri Obshansky
2014-05-29 15:26:01 UTC
Actually these error messages come from one of the scheduler threads, that run periodically, not triggered by a call to the RESTAPI. The only way I can reproduce this is the following: 1. Start the VM, without an operating system inside. 2. Shut it down with the "shutdown" option, not with the "power off" option. Note that this requires the collaboration of the operating system, and as the VM doesn't have any operating system it will just stay in "powering down" for ever. 3. Manually kill the qemu-kvm process in the host. This will produce exactly the same messages that you see, but in inmediately after the engine should detect that and move the VM to "down", showing something like this in the log: 2014-06-03 16:11:06,278 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-40) VM myvm (4e72e2de-d172-484f-adb6-e0e652e03b28) is running in db and not running in VDS node0 I can confirm Juan's explanation, that's how it behaves right now and it's not really a problem Can you please describe how you are killing the VMs in your scenario? I shutdown VM using REST API POST request /api/vms/${vm_id}/shutdown
I agree with you that it is not really a problem, but our engine.log is full of exceptions. Would you like to cleanup it? In this case we can pass information about VM from DB instead of raise Exception.
shutdown does a graceful shutdown only, which doesn't kill the VM. Are all of them actually shutting down? Do you have an extra poweroff call anywhere? Regular shutdown should leave the VM object in VDSM till it's detected by engine and destroyed - so you'd never see the noVM error you mentioned No, I don't have additional poweroff call. BTW, all VMs are with installed RHEL and guest-agent applications. hm…then this could not happen unless there is a bug:-) can you please capture vdsm+engine log and pinpoint a specific VM which demonstrates the problem? Just the time since the shutdown call till the error you see + few secs Well, it is a little bit complicated. So, I need a time for reproduce,pinpoint and capture log. Do you need vdsm log from host where is running VMs, right? yes, engine and that one host where the VM resides. I understand it needs time&effort to get, but unfortunately we exhausted all the simple possibilities…:/ See below errors from log files:
1. engine.log
2014-06-09 06:50:33,225 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVmStatsVDSCommand] (DefaultQuartzScheduler_Worker-25) Failed in GetVmStatsVDS method
2014-06-09 06:50:33,226 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVmStatsVDSCommand] (DefaultQuartzScheduler_Worker-25) Command org.ovirt.engine.core.vdsbroker.vdsbroker.GetVmStatsVDSCommand return value
org.ovirt.engine.core.vdsbroker.vdsbroker.VMInfoListReturnForXmlRpc@6167d9e4
2014-06-09 06:50:33,226 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVmStatsVDSCommand] (DefaultQuartzScheduler_Worker-25) HostName = HOST-20
2014-06-09 06:50:33,234 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVmStatsVDSCommand] (DefaultQuartzScheduler_Worker-25) Command GetVmStatsVDSCommand(HostName = HOST-20, HostId = 1fe72e64-d142-438c-89af-3e64ce4f879f, vmId=065722a5-7835-4b56-9794-51e338689cf1) execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to GetVmStatsVDS, error = Virtual machine does not exist, code = 1
2014-06-09 06:50:33,234 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-25) failed to fetch scale_10 stats. status remain unchanged (PoweringDown)
2. vdsm.log
Thread-34821::ERROR::2014-06-09 06:50:33,234::BindingXMLRPC::1077::vds::(wrapper) libvirt error
Traceback (most recent call last):
File "/usr/share/vdsm/BindingXMLRPC.py", line 1070, in wrapper
res = f(*args, **kwargs)
File "/usr/share/vdsm/BindingXMLRPC.py", line 410, in vmGetStats
return vm.getStats()
File "/usr/share/vdsm/API.py", line 357, in getStats
stats = v.getStats().copy()
File "/usr/share/vdsm/vm.py", line 2747, in getStats
stats = self._getStatsInternal()
File "/usr/share/vdsm/vm.py", line 2847, in _getStatsInternal
stats['balloonInfo'] = self._getBalloonInfo()
File "/usr/share/vdsm/vm.py", line 4674, in _getBalloonInfo
cur_mem = self._dom.info()[2]
File "/usr/share/vdsm/vm.py", line 860, in f
ret = attr(*args, **kwargs)
File "/usr/lib64/python2.6/site-packages/vdsm/libvirtconnection.py", line 92, in wrapper
ret = f(*args, **kwargs)
File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1874, in info
if ret is None: raise libvirtError ('virDomainGetInfo() failed', dom=self)
libvirtError: Domain not found: no domain with matching uuid '065722a5-7835-4b56-9794-51e338689cf1'
VM Channels Listener::DEBUG::2014-06-09 06:50:33,249::vmChannels::128::vds::(_handle_unconnected) Trying to connect fileno 97.
Yuri, I need the full logs as per comment #6;-) Created attachment 907943 [details]
engine.log
Created attachment 907944 [details]
vdsm.log
The system is a bit slow (you have a high load I suppose) and it takes time between the VM is shut down and the stat thread gets properly ended. There are getVmStats calls in the middle…and so you see lots of errors in vdsm log, but they are all harmless However I don't find the original issue you mention in description in the logs you've attached. I don't think that load was high: Min Server Hits per second = 1 Average Server Hits per second = 5 Max Server Hits per second = 34 There was not LOAD on application from my point of view. System should be faster. Regarding logs: Please, look for line 1270 at engine.log and line 1570 at vdsm.log thanks for pointing out the specific case. this error is a consequence of bug 1110835, and should not be happening anymore. The problem was exactly in the stats retrieval, it's supposed to only copy the information from stats thread (where there are errors when VM goes down - that is expected and ignored), but due to that bug we were accessing libvirt and the VM was already gone... *** This bug has been marked as a duplicate of bug 1110835 *** |