Description of problem:
Empty VMs which had already been removed from the RHEV environment are seen reappearing, created by "null@N/A". A VM was removed on July 15, but it reappeared on July 20:

2015-Jul-15, 18:03: VM CHG0007972_www01_remedy_prod_int_phx2 was successfully removed.
2015-Jul-20, 07:21: VM external-www01_remedy_prod_int_phx2 was created by null@N/A.

From splunk, another event right before the import:

7/20/15 7:21:16.206 AM
2015-07-20 07:21:16,206 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-71) [1cda8be9] Importing VM www01_remedy_prod_int_phx2 as external-www01_remedy_prod_int_phx2, as it is running on the on Host, but does not exist in the engine.

Version-Release number of selected component (if applicable):
rhevm-3.5.3.1-1.4.el6ev.noarch

How reproducible:
Unknown

Actual results:
VM returned after 5 days

Expected results:
VM should stay deleted

Additional info:
Information coming in a future update
Usually this is the result of an unclean removal of a host: the VM is still running, or is still reported by vdsm. You should stop it and then remove it again. Please let me know if that helps.
I've pinged the customer again and will let you know if we get any more info from them.
Any news?
I have seen the same behaviour, where a number of previously removed VMs have come back as external. I see the output below in the engine log, but unfortunately I don't have the log from when the VMs were originally removed, so I can't tell whether anything went wrong during the removal. I was also upgrading my nodes from 3.4.2 to 3.5.4 just prior to these re-appearing. Not sure if that helps in any way, but I thought I'd share it anyway. I am running 3.5.4.

2015-11-10 05:33:21,685 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-laa-012.demo as external-ny2-laa-012.demo, as it is running on the on Host, but does not exist in the engine.
2015-11-10 05:33:21,685 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-laa-011-rhel.demo as external-ny2-laa-011-rhel.demo, as it is running on the on Host, but does not exist in the engine.
2015-11-10 05:33:21,686 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-laa-013-rhel.demo as external-ny2-laa-013-rhel.demo, as it is running on the on Host, but does not exist in the engine.
2015-11-10 05:33:21,686 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-laa-012-rhel.demo as external-ny2-laa-012-rhel.demo, as it is running on the on Host, but does not exist in the engine.
2015-11-10 05:33:21,687 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-lda-004a.demo as external-ny2-lda-004a.demo, as it is running on the on Host, but does not exist in the engine.
2015-11-10 05:33:21,687 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-laa-010.demo as external-ny2-laa-010.demo, as it is running on the on Host, but does not exist in the engine.
2015-11-10 05:33:21,688 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-laa-014-rhel.demo as external-ny2-laa-014-rhel.demo, as it is running on the on Host, but does not exist in the engine.
2015-11-10 05:33:21,688 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-laa-014.demo as external-ny2-laa-014.demo, as it is running on the on Host, but does not exist in the engine.
2015-11-10 05:33:21,689 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-lda-004b.demo as external-ny2-lda-004b.demo, as it is running on the on Host, but does not exist in the engine.
2015-11-10 05:33:21,689 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-laa-013.demo as external-ny2-laa-013.demo, as it is running on the on Host, but does not exist in the engine.
2015-11-10 05:33:21,690 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-laa-011.demo as external-ny2-laa-011.demo, as it is running on the on Host, but does not exist in the engine.
2015-11-10 05:33:21,690 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-laa-009.demo as external-ny2-laa-009.demo, as it is running on the on Host, but does not exist in the engine.
2015-11-10 05:33:21,690 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-laa-010-rhel.demo as external-ny2-laa-010-rhel.demo, as it is running on the on Host, but does not exist in the engine.
2015-11-10 05:33:21,691 INFO [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-32) [518940e8] Importing VM ny2-laa-009-rhel.demo as external-ny2-laa-009-rhel.demo, as it is running on the on Host, but does not exist in the engine.

root> grep "was created by null" engine.log-20151111
2015-11-10 05:33:23,034 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [593a4795] Correlation ID: 593a4795, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-laa-012.demo was created by null@N/A.
2015-11-10 05:33:23,180 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [2795b718] Correlation ID: 2795b718, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-laa-011-rhel.demo was created by null@N/A.
2015-11-10 05:33:23,367 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [2c7eccae] Correlation ID: 2c7eccae, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-laa-013-rhel.demo was created by null@N/A.
2015-11-10 05:33:23,488 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [6d050906] Correlation ID: 6d050906, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-laa-012-rhel.demo was created by null@N/A.
2015-11-10 05:33:23,643 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [77327bf0] Correlation ID: 77327bf0, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-lda-004a.demo was created by null@N/A.
2015-11-10 05:33:23,809 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [7d8ece3f] Correlation ID: 7d8ece3f, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-laa-010.demo was created by null@N/A.
2015-11-10 05:33:23,897 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [6d939916] Correlation ID: 6d939916, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-laa-014-rhel.demo was created by null@N/A.
2015-11-10 05:33:23,981 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [5c6ca5bc] Correlation ID: 5c6ca5bc, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-laa-014.demo was created by null@N/A.
2015-11-10 05:33:24,146 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [cbe9bfb] Correlation ID: cbe9bfb, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-lda-004b.demo was created by null@N/A.
2015-11-10 05:33:24,238 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [5487ee3b] Correlation ID: 5487ee3b, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-laa-013.demo was created by null@N/A.
2015-11-10 05:33:24,328 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [280caed6] Correlation ID: 280caed6, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-laa-011.demo was created by null@N/A.
2015-11-10 05:33:24,417 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [6087c1f1] Correlation ID: 6087c1f1, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-laa-009.demo was created by null@N/A.
2015-11-10 05:33:24,505 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [18e4ef1d] Correlation ID: 18e4ef1d, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-laa-010-rhel.demo was created by null@N/A.
2015-11-10 05:33:24,603 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (DefaultQuartzScheduler_Worker-32) [92aba53] Correlation ID: 92aba53, Call Stack: null, Custom Event ID: -1, Message: VM external-ny2-laa-009-rhel.demo was created by null@N/A.
Related to bug 1121089 (apparently the same issue).
Investigation of a locally reproduced problem showed a possible cause: network connectivity instability triggers fencing actions, which restart vdsm multiple times. When the vdsm VM recovery is interrupted before it finishes its job, the recovery files are left behind and reused on subsequent restarts.

The current 3.5 code makes troubleshooting quite difficult, as it doesn't provide sufficient information to know for sure what's going on, so this is still more of a theory than a proof. There are improvements to the logging, as well as to the logic within the recovery thread, on master; these are worth backporting to 3.6 at least.

Targeting 3.6 at the moment, though we may need to wait for more occurrences there.
(In reply to Michal Skrivanek from comment #11)
> Investigation on locally reproduced problem showed a possible reason -
> network connectivity/instability is triggering fencing actions - restart
> vdsm multiple times. When the vdsm VM recovery is interrupted before it
> finishes its job, the recovery files are left behind and reused on following
> restarts.
>
> Current 3.5 code makes troubleshooting quite difficult as it doesn't provide
> sufficient information to know what's going on for sure, so this is still
> more of a theory than a proof.
> There are improvements to logging as well as logic within the recovery
> thread on master, worth backporting to 3.6 at least
>
> Targeting 3.6 at the moment, though we may need to wait for more occurrences
> there

After more investigation, we think the most probable cause is Vdsm misdetecting a shutdown initiated from within the guest. In that case Vdsm correctly deletes the recovery file, but then recreates it with 'Down' status, because Vdsm calls 'saveState' after releaseVm in the onQemuDeath callback. saveState has a safety guard to avoid this recreation, but it triggers only in the 'destroy' path.

While we have pretty solid evidence this is the root cause, I haven't reproduced it yet. Will do ASAP, and post a patch if this is indeed the case.
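The sequence described above can be sketched as follows. This is illustrative Python only, not actual Vdsm code; the method names (save_state, release_vm, on_qemu_death) and the single destroy-path flag are simplified stand-ins for the Vdsm internals mentioned in this comment:

```python
import os
import tempfile

class Vm:
    """Toy model of the guest-shutdown race described in this comment."""

    def __init__(self, recovery_file):
        self.recovery_file = recovery_file
        self.status = 'Up'
        self.destroyed = False  # set only on the explicit 'destroy' path

    def save_state(self):
        # Buggy behavior: the guard skips the write only on the destroy
        # path, so a guest-initiated shutdown still recreates the file.
        if self.destroyed:
            return
        with open(self.recovery_file, 'w') as f:
            f.write(self.status)

    def release_vm(self):
        # Cleanup correctly removes the recovery file...
        if os.path.exists(self.recovery_file):
            os.remove(self.recovery_file)

    def on_qemu_death(self):
        self.status = 'Down'
        self.release_vm()
        self.save_state()   # ...but this recreates it with 'Down' status

path = os.path.join(tempfile.mkdtemp(), 'vm.recovery')
vm = Vm(path)
vm.save_state()     # normal running state written
vm.on_qemu_death()  # guest shut itself down
# The stale 'Down' recovery file survives; on the next vdsm restart it
# would be re-read, and the engine would import the VM as external-*.
print(os.path.exists(path))  # → True
```

With this model, the fix corresponds to making the guard in save_state also cover the qemu-death path (or skipping the saveState call after releaseVm), so the deleted recovery file stays deleted.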
(In reply to Francesco Romani from comment #12)
> (In reply to Michal Skrivanek from comment #11)
> While we have pretty solid evidence this is the root cause, I haven't
> reproduced it yet. Will do ASAP, and post a patch if this is indeed the case.

Forgot to update (sorry about that): confirmed/reproduced.
patches merged on 3.6 branch -> MODIFIED
Verified with rhevm-3.6.3-0.1.el6.noarch.
1. Created a VM with RHEL 7.2 installed.
2. Started the VM - recovery file created on the host.
3. Shut down the VM from within the guest.
4. After shutdown, once the engine reports the VM in Down status - no recovery file is recreated on the host.
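For anyone repeating step 4, a quick manual check on the host might look like the sketch below. The /var/run/vdsm path is the usual vdsm recovery-file location and the UUID is a placeholder; both are assumptions to adjust for your environment:

```shell
# Hypothetical check: after shutting the VM down from inside the guest,
# the vdsm recovery file for that VM should NOT reappear on the host.
# VM_ID is a placeholder; substitute the real VM UUID from the engine.
VM_ID="00000000-0000-0000-0000-000000000000"
if [ -e "/var/run/vdsm/${VM_ID}.recovery" ]; then
    echo "BUG: recovery file recreated"
else
    echo "OK: no recovery file left behind"
fi
```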
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-0376.html
*** Bug 1345942 has been marked as a duplicate of this bug. ***