Bug 1253043 - Ghost VMs created with prefix "external-" by "null@N/A"
Summary: Ghost VMs created with prefix "external-" by "null@N/A"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.5.3
Hardware: x86_64
OS: Linux
high
high
Target Milestone: ovirt-3.6.3
: 3.6.3
Assignee: Francesco Romani
QA Contact: sefi litmanovich
URL:
Whiteboard:
: 1345942 (view as bug list)
Depends On:
Blocks: 1213937 1121089
TreeView+ depends on / blocked
 
Reported: 2015-08-12 19:38 UTC by Robert McSwain
Modified: 2021-08-30 12:30 UTC (History)
23 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, incorrect handling of the virtual machine recovery file caused virtual machine shut down inside the guest. This caused the virtual machine recovery file to be leaked. This may cause incorrect detection of virtual machines, especially in the case of network failures, and cause the Red Hat Enterprise Virtualization Manager to find unknown external virtual machines. Now, VDSM has been fixed to no longer leak recovery files. Recovery files are now correctly handled and this prevents the Red Hat Enterprise Virtualization Manager from incorrectly detecting virtual machines.
Clone Of:
Environment:
Last Closed: 2016-03-09 21:11:44 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-43220 0 None None None 2021-08-30 12:17:33 UTC
Red Hat Knowledge Base (Solution) 1472473 0 None None None 2016-04-04 17:17:03 UTC
Red Hat Product Errata RHEA-2016:0376 0 normal SHIPPED_LIVE Red Hat Enterprise Virtualization Manager 3.6.0 2016-03-10 01:20:52 UTC
oVirt gerrit 51387 0 master MERGED vm: safer removal of the recovery file 2020-07-02 20:24:11 UTC
oVirt gerrit 51799 0 master MERGED vm: delete recovery file only after destroy() 2020-07-02 20:24:11 UTC
oVirt gerrit 52514 0 ovirt-3.6 MERGED vm: extract cleanupRecoveryFile() helper 2020-07-02 20:24:11 UTC
oVirt gerrit 52515 0 ovirt-3.6 MERGED vm: safer removal of the recovery file 2020-07-02 20:24:11 UTC
oVirt gerrit 52516 0 ovirt-3.6 MERGED vm: make _deleteVm() more robust 2020-07-02 20:24:11 UTC
oVirt gerrit 52517 0 ovirt-3.6 MERGED vm: delete recovery file only after destroy() 2020-07-02 20:24:11 UTC

Description Robert McSwain 2015-08-12 19:38:50 UTC
Description of problem:
Empty VMs which had already been removed from RHEV environment are seen reappearing, created by "null@N/A". A VM was removed on July 15, but it reappeared on July 20:

2015-Jul-15, 18:03: VM CHG0007972_www01_remedy_prod_int_phx2 was successfully removed.
2015-Jul-20, 07:21: VM external-www01_remedy_prod_int_phx2 was created by null@N/A.

From splunk, another event right before the import:
 7/20/15 7:21:16.206 AM 	 2015-07-20 07:21:16,206 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-71) [1cda8be9] Importing VM www01_remedy_prod_int_phx2 as external-www01_remedy_prod_int_phx2, as it is running on the on Host, but does not exist in the engine.


Version-Release number of selected component (if applicable):
rhevm-3.5.3.1-1.4.el6ev.noarch

How reproducible:
Unknown

Actual results:
VM should stay deleted

Expected results:
VM returned after 5 days 

Additional info:
Information coming in a future update

Comment 2 Omer Frenkel 2015-08-13 11:36:29 UTC
usually this is a result of unclean removal of host.
the vm is still running or still reported by vdsm, you should stop it and then remove it again.

please let me know if it helped

Comment 3 Robert McSwain 2015-08-26 14:07:57 UTC
I've pinged the customer again and will let you know if we get any more info from them.

Comment 4 Michal Skrivanek 2015-10-22 09:20:42 UTC
any news?

Comment 8 Simon Barrett 2015-11-13 12:20:42 UTC
I have seen the same behaviour where a number of previously removed VM's have come back as external. I see the output below in the engine log but unfortunately I don't have the log from when the VM's were originally removed so I can't see if anything went wrong during the removal. I was also upgrading my nodes from 3.4.2 to 3.5.4 just prior to these re-appearing.

Not sure if that helps in any way but I thought I'd share it anyway. I am running 3.5.4.

015-11-10 05:33:21,685 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-laa-012.demo as external-ny2-laa-012.demo, as it is running on the on Host, but does not exist in the engine.
2015-11-10 05:33:21,685 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-laa-011-rhel.demo as external-ny2-laa-011-rhel.demo, as it is running on the on Host, but does not exist in the engine.
2015-11-10 05:33:21,686 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-laa-013-rhel.demo as external-ny2-laa-013-rhel.demo, as it is running on the on Host, but does not exist in the engine.
2015-11-10 05:33:21,686 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-laa-012-rhel.demo as external-ny2-laa-012-rhel.demo, as it is running on the on Host, but does not exist in the engine.
2015-11-10 05:33:21,687 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-lda-004a.demo as external-ny2-lda-004a.demo, as it is running on the on Host, but does not exist in the engine.
2015-11-10 05:33:21,687 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-laa-010.demo as external-ny2-laa-010.demo, as it is running on the on Host, but does not exist in the engine.
2015-11-10 05:33:21,688 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-laa-014-rhel.demo as external-ny2-laa-014-rhel.demo, as it is running on the on Host, but does not exist in the engine.
2015-11-10 05:33:21,688 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-laa-014.demo as external-ny2-laa-014.demo, as it is running on the on Host, but does not exist in the engine.
2015-11-10 05:33:21,689 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-lda-004b.demo as external-ny2-lda-004b.demo, as it is running on the on Host, but does not exist in the engine.
2015-11-10 05:33:21,689 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-laa-013.demo as external-ny2-laa-013.demo, as it is running on the on Host, but does not exist in the engine.
2015-11-10 05:33:21,690 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-laa-011.demo as external-ny2-laa-011.demo, as it is running on the on Host, but does not exist in the engine.
2015-11-10 05:33:21,690 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-laa-009.demo as external-ny2-laa-009.demo, as it is running on the on Host, but does not exist in the engine.
2015-11-10 05:33:21,690 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-laa-010-rhel.demo as external-ny2-laa-010-rhel.demo, as it is running on the on Host, but does not exist in the engine.
2015-11-10 05:33:21,691 INFO  [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-laa-009-rhel.demo as external-ny2-laa-009-rhel.demo, as it is running on the on Host, but does not exist in the engine.


root> grep "was created by null" engine.log-20151111
2015-11-10 05:33:23,034 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [593a4795] Correlation ID: 593a4795, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-laa-012.demo was created by null@N/A.
2015-11-10 05:33:23,180 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [2795b718] Correlation ID: 2795b718, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-laa-011-rhel.demo was created by null@N/A.
2015-11-10 05:33:23,367 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [2c7eccae] Correlation ID: 2c7eccae, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-laa-013-rhel.demo was created by null@N/A.
2015-11-10 05:33:23,488 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [6d050906] Correlation ID: 6d050906, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-laa-012-rhel.demo was created by null@N/A.
2015-11-10 05:33:23,643 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [77327bf0] Correlation ID: 77327bf0, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-lda-004a.demo was created by null@N/A.
2015-11-10 05:33:23,809 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [7d8ece3f] Correlation ID: 7d8ece3f, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-laa-010.demo was created by null@N/A.
2015-11-10 05:33:23,897 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [6d939916] Correlation ID: 6d939916, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-laa-014-rhel.demo was created by null@N/A.
2015-11-10 05:33:23,981 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [5c6ca5bc] Correlation ID: 5c6ca5bc, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-laa-014.demo was created by null@N/A.
2015-11-10 05:33:24,146 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [cbe9bfb] Correlation ID: cbe9bfb, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-lda-004b.demo was created by null@N/A.
2015-11-10 05:33:24,238 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [5487ee3b] Correlation ID: 5487ee3b, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-laa-013.demo was created by null@N/A.
2015-11-10 05:33:24,328 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [280caed6] Correlation ID: 280caed6, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-laa-011.demo was created by null@N/A.
2015-11-10 05:33:24,417 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [6087c1f1] Correlation ID: 6087c1f1, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-laa-009.demo was created by null@N/A.
2015-11-10 05:33:24,505 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [18e4ef1d] Correlation ID: 18e4ef1d, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-laa-010-rhel.demo was created by null@N/A.
2015-11-10 05:33:24,603 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [92aba53] Correlation ID: 92aba53, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-laa-009-rhel.demo was created by null@N/A.

Comment 10 Michal Skrivanek 2015-11-18 12:55:32 UTC
related to bug 1121089 (same issue apparently)

Comment 11 Michal Skrivanek 2015-11-27 08:27:21 UTC
Investigation on locally reproduced problem showed a possible reason - network connectivity/instability is triggering fencing actions - restart vdsm multiple times. When the vdsm VM recovery is interrupted before it finishes its job, the recovery files are left behind and reused on following restarts. 

Current 3.5 code makes troubleshooting quite difficult as it doesn't provide sufficient information to know what's going on for sure, so this is still more of a theory than a proof.
There are improvements to logging as well as logic within the recovery thread on master, worth backporting to 3.6 at least

Targeting 3.6 at the moment, though we may need to wait for more occurrences there

Comment 12 Francesco Romani 2016-01-04 14:45:35 UTC
(In reply to Michal Skrivanek from comment #11)
> Investigation on locally reproduced problem showed a possible reason -
> network connectivity/instability is triggering fencing actions - restart
> vdsm multiple times. When the vdsm VM recovery is interrupted before it
> finishes its job, the recovery files are left behind and reused on following
> restarts. 
> 
> Current 3.5 code makes troubleshooting quite difficult as it doesn't provide
> sufficient information to know what's going on for sure, so this is still
> more of a theory than a proof.
> There are improvements to logging as well as logic within the recovery
> thread on master, worth backporting to 3.6 at least
> 
> Targeting 3.6 at the moment, though we may need to wait for more occurrences
> there

After more investigation, we think the most probable cause is Vdsm misdetecting shutdown from within the guest.
In that case Vdsm correctly deletes the recovery file, but recreates it after the deletion with 'Down' status.

The reason is Vdsm does 'saveState' after releaseVm in the onQemuDeath callback.

saveState has a safety guard to avoid this recreation, which triggers only in the 'destroy' path.

While we have pretty solid evidence this is the root cause, I haven't reproduced it yet. Will do ASAP, and post a patch if this is indeed the case.

Comment 13 Francesco Romani 2016-01-13 15:49:49 UTC
(In reply to Francesco Romani from comment #12)
> (In reply to Michal Skrivanek from comment #11)

> While we have pretty solid evidence this is the root cause, I haven't
> reproduced it yet. Will do ASAP, and post a patch if this is indeed the case.

Forgot to update (sorry about that): confirmed/reproduced.

Comment 14 Francesco Romani 2016-01-25 13:59:48 UTC
patches merged on 3.6 branch -> MODIFIED

Comment 15 sefi litmanovich 2016-02-10 10:37:12 UTC
Verified with rhevm-3.6.3-0.1.el6.noarch.

1. Created vm with rhel 7.2 installed.
2. Started vm - recovery file created.
3. Shutdown vm from within the vm.
4. After shutting down and engine reports vm in down status - no recovery file re created on host.

Comment 23 errata-xmlrpc 2016-03-09 21:11:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0376.html

Comment 24 Michal Skrivanek 2016-06-14 07:00:39 UTC
*** Bug 1345942 has been marked as a duplicate of this bug. ***


Note You need to log in before you can comment on or make changes to this bug.