I have an oVirt host machine running oVirt installed on top of CentOS 7.5. I cannot be 100% sure which version of oVirt is running on the host, but the engine VM reports 4.2.3.5-1.el7.centos for itself, and the host is most likely on the same version. oVirt Cockpit reports the following for the host:

OS Version: RHEL - 7 - 5.1804.el7.centos
OS Description: CentOS Linux 7 (Core)
Kernel Version: 3.10.0 - 862.2.3.el7.x86_64
KVM Version: 2.10.0 - 21.el7_5.2.1
LIBVIRT Version: libvirt-3.9.0-14.el7_5.2
VDSM Version: vdsm-4.20.27.1-1.el7.centos
SPICE Version: 0.14.0 - 2.el7
GlusterFS Version: [N/A]
CEPH Version: librbd1-0.94.5-2.el7
Kernel Features: PTI: 1, IBRS: 0, RETP: 1

The problem I have is that I need to be able to create a VM directly on the host machine, not managed by oVirt at all. On the previous version of oVirt (i.e. 4.1) this was possible, but now that I've upgraded to 4.2, when the newly created external VM shuts down it is deleted from the host entirely. I am using "virsh define MyVm.xml" to create the external VM on the host.

I have also tested this with a new server built from scratch, running first CentOS alone without oVirt installed, then with oVirt 4.1 installed, and then with the 4.1 upgraded to 4.2. In the first two cases, the above command creates a VM that can be run and shut down without being deleted. However, as soon as oVirt 4.2 is installed on the server, doing a "Shutdown" or "Force Off" from virt-manager, or even doing a "sudo shutdown" from inside the VM, causes the VM definition to be deleted as soon as the VM has shut down.

What is interesting is that an external VM created before oVirt 4.2 is installed (e.g. while oVirt 4.1 is installed) is safe against deletion: even after oVirt 4.2 is installed, starting and stopping that VM does not delete it. Only VMs created after oVirt 4.2 is installed get deleted in this fashion.
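For reference, here is a minimal sketch of the flow I am using to reproduce this (the VM name and XML path are just examples):

virsh define MyVm.xml      # register the external VM with libvirt
virsh start MyVm           # boot it
virsh shutdown MyVm        # graceful shutdown from the host
virsh list --all           # on an affected host the VM has vanished from this list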
I note that the XML definition file has the line <on_poweroff>destroy</on_poweroff>, but all the other VMs have the same line, so this does not seem to be the factor. Beyond that, there is no observable difference between the XML of a fully functional VM and one that is deleted as soon as it is shut down, so I can only think that it is due to some configuration option inside oVirt that I am not aware of. It is not possible, however, to set the "delete protection" option on the VM from inside oVirt, since it complains that it cannot change the settings of an external VM. In any case, none of the external VMs have this flag set, so this cannot be the factor either.
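For anyone who wants to check this for themselves, this is roughly how I compared the persistent definitions (the VM names here are just examples):

virsh dumpxml --inactive WorkingVm  > working.xml
virsh dumpxml --inactive AffectedVm > affected.xml
diff working.xml affected.xml      # no meaningful difference shows up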
Just updated to oVirt 4.2.7 and now all external VMs are automatically deleted when they shut down. On the plus side, at least the behaviour is consistent; on the down side, it's insane!!! Anyway, it seems there is now a workaround:

engine-config -s MaintenanceVdsIgnoreExternalVms=true

This configuration prevents the deletion of external VMs. For me, this is solution enough. And for anyone else who follows in my footsteps, I hope the solution above also works for you :o)
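(For the record, and as far as I understand it, engine-config changes only take effect after restarting the engine service; the current value can be checked with -g. Run on the engine VM:)

engine-config -g MaintenanceVdsIgnoreExternalVms
systemctl restart ovirt-engine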
Sorry, re-opening bug. I spoke too soon. The "fix" above is not a fix. The external VMs still delete themselves after they are shut down ... if given a few more seconds to do so. In my testing, I was too quick. And now all external VMs are doing this, so it is really bad :o(
Ok, I have some more data (with help from the libvirt mailing list), and I can confirm that it is definitely oVirt that is deleting the external VMs once they shut down.

From the vdsm.log file I can see the "acknowledgement" that the VM has been shut down:

2018-11-08 08:21:30,072+0100 INFO (libvirt/events) [virt.vm] (vmId='0bab4766-4765-40c1-abaf-a1c1774ac0ff') underlying process disconnected (vm:1062)
2018-11-08 08:21:30,072+0100 INFO (libvirt/events) [virt.vm] (vmId='0bab4766-4765-40c1-abaf-a1c1774ac0ff') Release VM resources (vm:5283)
2018-11-08 08:21:30,072+0100 INFO (libvirt/events) [virt.vm] (vmId='0bab4766-4765-40c1-abaf-a1c1774ac0ff') Stopping connection (guestagent:442)
2018-11-08 08:21:30,072+0100 INFO (libvirt/events) [virt.vm] (vmId='0bab4766-4765-40c1-abaf-a1c1774ac0ff') Stopping connection (guestagent:442)
2018-11-08 08:21:30,072+0100 WARN (libvirt/events) [root] File: /var/lib/libvirt/qemu/channels/0bab4766-4765-40c1-abaf-a1c1774ac0ff.com.redhat.rhevm.vdsm already removed (fileutils:51)
2018-11-08 08:21:30,073+0100 WARN (libvirt/events) [root] File: /var/lib/libvirt/qemu/channel/target/domain-12-Norah/org.qemu.guest_agent.0 already removed (fileutils:51)
2018-11-08 08:21:30,073+0100 INFO (libvirt/events) [vdsm.api] START inappropriateDevices(thiefId='0bab4766-4765-40c1-abaf-a1c1774ac0ff') from=internal, task_id=5ab358c8-afd3-4b5c-8df3-933f622e20e6 (api:46)
2018-11-08 08:21:30,075+0100 INFO (libvirt/events) [vdsm.api] FINISH inappropriateDevices return=None from=internal, task_id=5ab358c8-afd3-4b5c-8df3-933f622e20e6 (api:52)
2018-11-08 08:21:30,076+0100 INFO (libvirt/events) [virt.vm] (vmId='0bab4766-4765-40c1-abaf-a1c1774ac0ff') Changed state to Down: User shut down from within the guest (code=7) (vm:1693)
2018-11-08 08:21:30,076+0100 INFO (libvirt/events) [virt.vm] (vmId='0bab4766-4765-40c1-abaf-a1c1774ac0ff') Stopping connection (guestagent:442)

But then 15 seconds later:

2018-11-08 08:21:44,728+0100 INFO (jsonrpc/5) [api.virt] START destroy(gracefulAttempts=1) from=::ffff:172.16.1.102,56482, vmId=0bab4766-4765-40c1-abaf-a1c1774ac0ff (api:46)
2018-11-08 08:21:44,731+0100 INFO (jsonrpc/5) [api.virt] FINISH destroy return={'status': {'message': 'Machine destroyed', 'code': 0}} from=::ffff:172.16.1.102,56482, vmId=0bab4766-4765-40c1-abaf-a1c1774ac0ff (api:52)

And the corresponding debug from libvirt:

2018-11-08 08:21:44.729+0000: 49630: debug : virDomainUndefine:6242 : dom=0x7f9d64003420, (VM: name=Norah, uuid=0bab4766-4765-40c1-abaf-a1c1774ac0ff)
2018-11-08 08:21:44.729+0000: 49630: debug : qemuDomainObjBeginJobInternal:4625 : Starting job: modify (vm=0x7f9d8429be10 name=Norah, current job=none async=none)
2018-11-08 08:21:44.729+0000: 49630: debug : qemuDomainObjBeginJobInternal:4666 : Started job: modify (async=none vm=0x7f9d8429be10 name=Norah)
2018-11-08 08:21:44.730+0000: 49630: info : qemuDomainUndefineFlags:7560 : Undefining domain 'Norah'

Please, please, please can there be a way to prevent this behaviour?
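PS: for anyone following along, these are roughly the searches I used to pull the lines above out of the logs (the vdsm path is the default one; the libvirt lines come from libvirtd debug logging, which I had enabled and pointed at libvirtd.log):

grep -E 'destroy|Undefin' /var/log/vdsm/vdsm.log
grep 'qemuDomainUndefineFlags' /var/log/libvirt/libvirtd.log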
(In reply to Andy G from comment #4)
> Please, please, please can there be a way to prevent this behaviour?

Yes. There could be a simple way that does not interfere with expected oVirt behaviour. Please be aware that this is a very, very specific corner case. Are you willing to test some Vdsm patches?

The flow will be:
- add some metadata (https://libvirt.org/formatdomain.html#elementsMetadata) to the external VMs that you want oVirt to leave alone
- Vdsm will skip these VMs, like it does today for guestfs or for external VMs in the down state (https://github.com/oVirt/vdsm/blob/master/lib/vdsm/virt/recovery.py#L41)
One of the basic assumptions of Vdsm is that it owns libvirt on the host. Every VM running on a host on which Vdsm is installed is meant to be either managed or adopted (e.g. external VMs detected and adopted) by Vdsm. So this very BZ falls in an "unsupported" region, even though I reckon the behaviour looks annoying in this (again, VERY specific and uncommon) use case. However, tools we use and rely on to implement important flows, like guestfs, may need to run VMs that Vdsm should leave alone. I'd like to take this chance to have a universal and supported way to signal such VMs to Vdsm, and this could also solve this issue as a side effect.
(In reply to Francesco Romani from comment #5)
> Yes.
> There could be a simple way that does not interfere with expected oVirt
> behaviour. Please be aware that this is a very, very specific corner case.
> Are you willing to test some Vdsm patches?

I would be very happy to test patches. I appreciate that I'm pushing oVirt at the edges of its normal use. Thanks for your help :o)
(In reply to Andy G from comment #7)
> (In reply to Francesco Romani from comment #5)
> > Yes.
> > There could be a simple way that does not interfere with expected oVirt
> > behaviour. Please be aware that this is a very, very specific corner case.
> > Are you willing to test some Vdsm patches?
>
> I would be very happy to test patches. I appreciate that I'm pushing oVirt
> at the edges of its normal use. Thanks for your help :o)

Great!

The patch itself: https://gerrit.ovirt.org/#/c/95550/

The steps for testing:

First, amend the domain XML of the VMs you want to be skipped by oVirt. You need to add this tag in the metadata section:

<ovirt-vm:ignore/>

An example of how it could look: https://gerrit.ovirt.org/#/c/95550/2/tests/virt/data/domain_ignore.xml

This tag will be ignored by the current oVirt code, but it will be used by the patched code.

Here you can find RPMs for CentOS: https://jenkins.ovirt.org/job/vdsm_4.2_build-artifacts-on-demand-el7-x86_64/61/artifact/

These RPMs contain Vdsm 4.2.7 rc3 plus the patch that should fix this issue. Upgrade Vdsm and restart it; that should be enough. Once Vdsm is upgraded, it should leave alone VMs with that metadata element in their domain XML. If so, you should see something like this in the logs:

External VM $VMID has 'ignore me' metadata, skipped

meaning that Vdsm detected that VM and decided to skip it.

Let me know if you need any help, and whether the patch fixes your issue, here and/or on the oVirt mailing lists.

PS: please be aware that even if this patch fixes your issue, because we are dealing with a *very* specific corner case, we will still need a discussion about the merge.
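A rough sketch of the whole flow on the host (the package file names are placeholders - use whatever the Jenkins job actually produced):

# install the patched Vdsm packages downloaded from the Jenkins artifacts page
yum install ./vdsm-*.rpm
systemctl restart vdsmd

# after defining, starting and stopping a tagged external VM,
# look for the skip message in the Vdsm log
grep "has 'ignore me' metadata" /var/log/vdsm/vdsm.log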
Wonderful! Success!

After a few false starts I found that the metadata section had to be defined as follows:

<metadata xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
  <ovirt-vm:ignore/>
</metadata>

And it works :o) Thanks!
(In reply to Andy G from comment #9)
> Wonderful! Success!
>
> After a few false starts I found that the metadata section had to be defined
> as follows:
>
> <metadata xmlns:ovirt-vm="http://ovirt.org/vm/1.0">
>   <ovirt-vm:ignore/>
> </metadata>

You may place the "xmlns" part in a few places, and all of them are legal. The above snippet looks correct indeed.

> And it works :o)

Great. Please let's wait a few hours/days before claiming success, to make sure the periodic check Vdsm does is working too :)

> Thanks!

No problem! Now we (Vdsm developers) will discuss the merge process. Stay tuned.
(In reply to Francesco Romani from comment #10)
> (In reply to Andy G from comment #9)
> > And it works :o)
>
> Great. Please let's wait a few hours/days before claiming success, to make sure
> the periodic check Vdsm does is working too :)

I think it is all good. I have left one VM powered off for over 24 hours, and I have had others which have come up and gone down and not been deleted. So it is looking good.

> > Thanks!
>
> No problem! Now we (Vdsm developers) will discuss the merge process. Stay
> tuned.

Thank you very much.
(In reply to Andy G from comment #11)
> (In reply to Francesco Romani from comment #10)
> > (In reply to Andy G from comment #9)
> > > And it works :o)
> >
> > Great. Please let's wait a few hours/days before claiming success, to make sure
> > the periodic check Vdsm does is working too :)
>
> I think it is all good. I have left one VM powered off for over 24 hours,
> and I have had others which have come up and gone down and not been deleted.
> So it is looking good.
>
> > > Thanks!
> >
> > No problem! Now we (Vdsm developers) will discuss the merge process. Stay
> > tuned.
>
> Thank you very much.

Hi! We (Vdsm developers) had a chat about how to fix this issue. We agreed on a different approach: instead of making the external VMs invisible to Vdsm, we chose to keep them visible, but avoid undefining them when they are destroyed.

Advantages of the new approach:
1. oVirt Engine is (and must be kept) aware of the external VMs
2. (thus) minimal management is still possible from the oVirt Engine UI

So, here's a new batch of RPMs patched on top of oVirt 4.2.7rc3: http://jenkins.ovirt.org/job/vdsm_4.2_build-artifacts-on-demand-el7-x86_64/62/

To test them:

- upgrading from pristine Vdsm:
  1. just upgrade the RPM packages and play with it :)

- upgrading from the previous patch:
  1. install the new RPMs. This may require some forcing, sorry about that
  2. the metadata tag is no longer needed, but it will be happily ignored

Sorry for the inconvenience!
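With this approach, once the guest powers off, the domain should simply stay defined on the host. Roughly (the VM name is taken from the logs earlier in this bug):

virsh list --all
#  Id   Name    State
# ----------------------
#  -    Norah   shut off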
Re-targeting to 4.3.1 since it is missing a patch, an acked blocker flag, or both
Francesco, can we push these patches forward?
From the testing I've done here, the patches seem to be working well, certainly for me.
(In reply to Andy G from comment #15)
> From the testing I've done here, the patches seem to be working well,
> certainly for me.

Thanks Andy. Did you try again with the updated patches described in https://bugzilla.redhat.com/show_bug.cgi?id=1610917#c12 ?
Yes, I did. Thank you :o)
relevant patch merged
Verification version:
ovirt-engine-4.3.2.1-0.0.master.20190310172919.git0b3fbad.el7
vdsm-4.40.0-59.git5533158.el7.x86_64
qemu-kvm-ev-2.12.0-18.el7_6.3.1.x86_64
libvirt-client-4.5.0-10.el7_6.4.x86_64

Verification scenario:
1) Create an external VM on the host using the virt-install command.
2) After installation is completed, power off the VM.
3) Observe vdsm.log and verify that "Will not undefine external VM" is logged for this VM ID. For example:
2019-03-14 11:23:48,971+0200 INFO (jsonrpc/3) [virt.vm] (vmId='4d5387ed-0de8-4dd9-8898-a8ef5ae7021f') Will not undefine external VM 4d5387ed-0de8-4dd9-8898-a8ef5ae7021f (vm:2390)
4) Keep the VM powered off for a few hours and verify it was not deleted.
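For step 1, the external VM can be created with something along these lines (the name, disk size, ISO path and OS variant are illustrative only):

virt-install --name external-test \
  --memory 1024 --vcpus 1 \
  --disk size=5 \
  --cdrom /var/lib/libvirt/images/CentOS-7-x86_64-Minimal.iso \
  --os-variant centos7.0

# after powering the VM off, check the vdsm log on the host:
grep 'Will not undefine external VM' /var/log/vdsm/vdsm.log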
This bugzilla is included in the oVirt 4.3.1 release, published on February 28th 2019. Since the problem described in this bug report should be resolved in the oVirt 4.3.1 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.