Bug 1467696
| Summary: | Update of headless VM in "PoweringUP" state is not handled properly by VDSM. Multiple tracebacks appear. | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [oVirt] vdsm | Reporter: | Vitalii Yerys <vyerys> | ||||
| Component: | General | Assignee: | Nobody <nobody> | ||||
| Status: | CLOSED DUPLICATE | QA Contact: | Vitalii Yerys <vyerys> | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | medium | ||||||
| Version: | 4.20.0 | CC: | bugs, mavital, michal.skrivanek, mtessun, tjelinek, vyerys | ||||
| Target Milestone: | ovirt-4.2.0 | Flags: | michal.skrivanek: ovirt-4.2? mtessun: planning_ack+ mtessun: devel_ack? mtessun: testing_ack? | ||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2017-11-24 14:53:09 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | Virt | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: | |||||||
Description
Vitalii Yerys
2017-07-04 16:03:23 UTC
If QEMU guest agent is indeed not responding, why not file a bug on it?

As I understood from the tracebacks, qemu becomes unresponsive because of the issues that come before it, which I think are vdsm issues; you can see them in the first part of the bug's description. Sorry if I chose an incorrect title for this bug.

(In reply to Vitalii Yerys from comment #2)
> As I understood from the tracebacks, qemu becomes not responsive because of
> the issues that come before it, which I think are vdsm issues, you can see
> them in the first part of the bugs description. Sorry if I chose incorrect
> title for this bug.

The title can be edited. Please fix it. Does the VM have a balloon device, btw?

> Does the VM have a balloon device, btw?

I think the balloon device was enabled, because the VM was booted from a template which was created from one of our base VMs (golden_env_mixed_virtio_0), which has this option enabled.
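Whether a balloon device is really present can be read from the libvirt domain XML instead of being inferred from the template. The sketch below is only an illustration: the function names `balloon_model` and `has_balloon` and the sample XML strings are assumptions made up for this example, not part of vdsm or libvirt. It treats a missing `<memballoon>` element or `model="none"` as "no balloon device".

```python
import xml.etree.ElementTree as ET


def balloon_model(domain_xml):
    """Return the model attribute of the first <memballoon> element, or None."""
    root = ET.fromstring(domain_xml)
    # iter() searches the whole tree, so the element is found wherever it sits.
    for node in root.iter("memballoon"):
        return node.get("model")
    return None


def has_balloon(domain_xml):
    """True only when a <memballoon> element exists and its model is not 'none'."""
    model = balloon_model(domain_xml)
    return model is not None and model != "none"


# Hypothetical, heavily shortened domain XML snippets for illustration.
HEADLESS = '<domain type="kvm"><devices><memballoon model="none"/></devices></domain>'
REGULAR = '<domain type="kvm"><devices><memballoon model="virtio"/></devices></domain>'

print(has_balloon(HEADLESS))
print(has_balloon(REGULAR))
```

The XML itself could come from the vdsm log or from `virsh dumpxml` on the host.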
The title is still misleading - does that happen with regular VMs, or only headless? Can you check the qemu command line or libvirt XML and verify if the balloon device is enabled or not?

> The title is still misleading - does that happen with regular VMs, or only
> headless?

changed.

> Can you check the qemu command line or libvirt XML and verify if the balloon
> device is enabled or not?

vdsm log:

2017-06-30 08:14:20,746+0300 INFO (vm/86bf22fe) [virt.vm] (vmId='86bf22fe-e904-4d28-be5c-684fb2b014e4') <?xml version='1.0' encoding='UTF-8'?>
<domain xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0" type="kvm">
...
<memballoon model="none" />
...

Art log:

<memory_policy>
<ballooning>false</ballooning>
<guaranteed>2147483648</guaranteed>
<max>4294967296</max>
</memory_policy>

Looks like the ballooning device was not enabled. Nevertheless I don't think we should see such tracebacks, with:

IndexError: list index out of range
TypeError: 'NoneType' object has no attribute '__getitem__'
etc.

I was not able to reproduce it manually; it happened with the headless VM during a test run.

(In reply to Vitalii Yerys from comment #6)
> > The title is still misleading - does that happen with regular VMs, or only
> > headless?
>
> changed.
>
> > Can you check the qemu command line or libvirt XML and verify if the balloon
> > device is enabled or not?
>
> vdsm log:
>
> 2017-06-30 08:14:20,746+0300 INFO (vm/86bf22fe) [virt.vm]
> (vmId='86bf22fe-e904-4d28-be5c-684fb2b014e4') <?xml version='1.0'
> encoding='UTF-8'?>
> <domain xmlns:ovirt-tune="http://ovirt.org/vm/tune/1.0"
> xmlns:ovirt-vm="http://ovirt.org/vm/1.0" type="kvm">
>
> ...
> <memballoon model="none" />
> ...

As expected, the headless VM does not enable the balloon device.

>
> Art log:
> <memory_policy>
> <ballooning>false</ballooning>
> <guaranteed>2147483648</guaranteed>
> <max>4294967296</max>
> </memory_policy>
>
> Looks like ballooning device was not enabled. Nevertheless I don't think we
> should see such tracebacks, with:
>
> IndexError: list index out of range
>
> TypeError: 'NoneType' object has no attribute '__getitem__'
> etc.

Of course not, but now we know some more details on the environment.

>
> I was not able to reproduce it manually, it happened with the headless VM
> during test-run.

Not even when manually running a headless VM?

> As expected, the headless VM does not enable the balloon device.

But why did I see such messages in the engine log?

[org.ovirt.engine.core.utils.ObjectIdentityChecker] (default task-25) [vms_update_3e4ffa42-53ea-4092] Field 'balloonEnabled' can not be updated when status is 'PoweringUp'

> Not even when manually running a headless VM?

I tried doing it manually, but I think I could not catch the proper moment with 'PoweringUp'. I will try to reproduce it a few more times; if I succeed, I will post exact steps to reproduce.

The vdsm balloon tracebacks are supposedly fixed by bug 1458901. The engine's WARN messages on updating fields which can't be updated are known and not worth fixing, IMHO, so the only remaining issue should be about the qemu guest agent calls raising unhandled exceptions.

The fact that the VM is headless is unlikely to matter for the "Cannot run VM. Selected display type is not supported by the operating system." error. I need to know what the original state of the VM was, what it was updated with, and how it looked before start (check specifically the graphics type and OS).

Sorry for the late response. The state of the VM before start can be seen here: http://pastebin.test.redhat.com/505747

Thanks, but I need all of what I asked for: before and after, and what was POSTed. Note that graphics details are in the graphicsconsoles subresource, so either include that one as well or take a look at what the GUI says.

Sorry for the incomplete response previously; here you can find the VM state and options from the "setup" up to the "teardown" state.
http://pastebin.test.redhat.com/506130

I took this info from the attached log (art_test_runner.log.3; engine/vdsm/etc. logs are also in the attachments). Unfortunately this is all the info there is; I was not able to reproduce the issue previously. It looks like the VM could not be set to the "DOWN" state after the previous test case, which caused some trouble, so it was up when the graphics console removal was attempted. Hope it helps.

Yes, I do not see the changed VM being stopped, so you still see the same configuration until it's powered down. That's a test case error then. So, the only issue is the qemu-ga traceback on powering up, i.e. when it's unlikely to run?

Sorry for the late response.

> Yes, I do not see the changed VM to be stopped, then you still see the same
> configuration until it's powered down. That's a test case error then.

Actually this does not look like a test case issue; more likely something went wrong in the previous test case's teardown. In the logs (http://pastebin.test.redhat.com/512138) we can clearly see the shutdown request and the response with complete state. Even though the VM did go to "powering_down", it never reached "down" and ended up "UP" during the current test case execution. But I think this is a matter for a separate investigation.

> So, the only issue is the qemu-ga traceback on powering up, i.e. when it's
> unlikely to run?

Yes. Also, what about these tracebacks? Did both of them get fixed by https://bugzilla.redhat.com/show_bug.cgi?id=1458901 ?
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 1515, in _getRunningVmStats
    vm_sample.interval)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vmstats.py", line 47, in produce
    balloon(vm, stats, last_sample)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vmstats.py", line 153, in balloon
    balloon_info = vm.get_balloon_info()
  File "/usr/share/vdsm/virt/vm.py", line 4674, in get_balloon_info
    dev = self._devices[hwclass.BALLOON][0]
IndexError: list index out of range

And

2017-06-30 08:14:59,080+0300 ERROR (Thread-97) [utils.Callback] qemuGuestAgentCallback failed (utils:405)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/utils.py", line 403, in __call__
    result = self.func(*self.args, **self.kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vmpowerdown.py", line 107, in qemuGuestAgentCallback
    if response.is_error(self.vm.qemuGuestAgentShutdown()):
  File "/usr/lib/python2.7/site-packages/vdsm/common/response.py", line 79, in is_error
    code = res["status"]["code"]
TypeError: 'NoneType' object has no attribute '__getitem__'

> Yes. Also what about this tracebacks? Both of them got fixed by:
> https://bugzilla.redhat.com/show_bug.cgi?id=1458901 ?
>

so, closing this bug.

*** This bug has been marked as a duplicate of bug 1458901 ***
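Both tracebacks share a pattern: the code indexes into a value that can legitimately be empty or missing (an empty balloon device list, a `None` response). The following is a minimal sketch of defensive handling for both cases, not the actual change made for bug 1458901. The names `first_balloon_device` and `is_error_tolerant`, the `BALLOON` constant, and the dictionary layout are hypothetical stand-ins, written for Python 3 (where the second error would read "'NoneType' object is not subscriptable").

```python
BALLOON = "balloon"  # hypothetical stand-in for vdsm's hwclass.BALLOON


def first_balloon_device(devices):
    """Return the first balloon device, or None when the VM has none."""
    balloon_devices = devices.get(BALLOON, [])
    if not balloon_devices:
        # Avoids IndexError on an empty list (e.g. memballoon model="none").
        return None
    return balloon_devices[0]


def is_error_tolerant(res):
    """Report an error for a missing or malformed response instead of raising."""
    if res is None:
        # Avoids TypeError when the call returned nothing at all.
        return True
    try:
        return res["status"]["code"] != 0
    except (KeyError, TypeError):
        return True


# Hypothetical inputs for illustration only.
headless_vm_devices = {BALLOON: []}
print(first_balloon_device(headless_vm_devices))
print(is_error_tolerant(None))
print(is_error_tolerant({"status": {"code": 0, "message": "Done"}}))
```

Whether a missing response should count as an error or be ignored is a design choice; this sketch treats it as an error so that callers fall through to their failure path rather than crash.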