Bug 1780022 - An internal retry to run VM is ignoring ignition payload, RHCOS goes into emergency
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.3.7.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.4.0
Target Release: ---
Assignee: Liran Rotenberg
QA Contact: Nisim Simsolo
URL:
Whiteboard:
Depends On:
Blocks: 1726907
Reported: 2019-12-05 09:22 UTC by Roy Golan
Modified: 2020-05-20 20:04 UTC (History)

Fixed In Version: rhv-4.4.0-29
Clone Of:
Environment:
Last Closed: 2020-05-20 20:04:17 UTC
oVirt Team: Virt
Embargoed:
pm-rhel: ovirt-4.4+


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 105857 0 master MERGED core: move init status to running vm 2020-09-30 11:32:24 UTC

Description Roy Golan 2019-12-05 09:22:31 UTC
Description of problem:

An RHCOS VM that is run for the first time, with ignition set correctly
in custom_script, will fail to start if it enters the 'retry' mechanism.
The engine starts the VM on vdsm, but some errors on the host can make the run fail before libvirt reports the status 'Running'. In that case the engine
retries running the VM on a different host.
On that retry the ignition payload is lost, probably because we have already
saved the state that the 'initial run' was done, which is obviously wrong because the VM never ran.

Version-Release number of selected component (if applicable):


Steps to Reproduce:
1. Have 2 hosts
2. Create an RHCOS VM with custom_script containing a basic ignition config: https://coreos.com/ignition/docs/latest/examples.html
3. Hack vdsm to fail the VM, or kill the qemu process manually from the command line
4. Watch the VM restart on the other host and examine the domainXML passed in the engine.log. It's base64.
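
To compare what was sent on the first attempt with what was sent on the retry, the payloads can be decoded and checked offline. The following is only a minimal sketch: it assumes you have already copied the two base64 strings out of engine.log by hand (how they appear in the log is not covered here), and the helper names `decode_custom_script` and `same_payload` are made up for illustration.

```python
import base64
import json


def decode_custom_script(encoded: str) -> dict:
    """Decode a base64 string and parse it as an Ignition JSON config.

    Raises ValueError if the payload is not base64, not JSON, or has no
    top-level "ignition" key.
    """
    raw = base64.b64decode(encoded, validate=True)
    config = json.loads(raw.decode("utf-8"))
    if not isinstance(config, dict) or "ignition" not in config:
        raise ValueError("payload has no top-level 'ignition' key")
    return config


def same_payload(first_encoded: str, retry_encoded: str) -> bool:
    """Return True if both attempts carried byte-identical payloads."""
    return base64.b64decode(first_encoded) == base64.b64decode(retry_encoded)


# Stand-in payload for demonstration only; not taken from a real log.
sample = base64.b64encode(
    json.dumps({"ignition": {"version": "2.2.0"}}).encode("utf-8")
).decode("ascii")

if __name__ == "__main__":
    print(decode_custom_script(sample))
    print(same_payload(sample, sample))
```

If `same_payload` returns False for the first run and the retry, or the retry payload does not decode to an Ignition config at all, that matches the behaviour described in this report.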

Actual results:
The custom_script is different on the retry, and the VM fails to start.


Expected results:
We don't know exactly when the initialization ran, but we can be sure it didn't run if the VM never reached the status Running. We must therefore make sure that VM.isInitialized is set correctly.
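
The difference between the suspected behaviour and the expected one can be illustrated with a toy model. This is not ovirt-engine code: the `Vm` class, `build_payload`, `run_with_retry` and the `mark_early` switch are all hypothetical, and the model assumes the cause suggested in the description (the "initialized" state being saved before the VM actually reaches Running).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Vm:
    custom_script: str
    initialized: bool = False  # stands in for the VM.isInitialized state


def build_payload(vm: Vm) -> Optional[str]:
    """The initialization payload is sent only while the VM is not initialized."""
    return None if vm.initialized else vm.custom_script


def run_with_retry(vm: Vm, host_results: List[bool], mark_early: bool) -> List[Optional[str]]:
    """Try hosts in order and return the payload sent on each attempt.

    host_results[i] is True if the VM reaches Running on the i-th host.
    mark_early=True models the suspected bug: the flag is saved as soon as
    the run is issued. mark_early=False models the expected behaviour: the
    flag is saved only once the VM reports Running.
    """
    sent: List[Optional[str]] = []
    for reached_running in host_results:
        sent.append(build_payload(vm))
        if mark_early:
            vm.initialized = True
        if reached_running:
            vm.initialized = True
            break
    return sent


if __name__ == "__main__":
    # First host fails before Running, second host succeeds.
    print(run_with_retry(Vm("ignition-config"), [False, True], mark_early=True))
    print(run_with_retry(Vm("ignition-config"), [False, True], mark_early=False))
```

With `mark_early=True` the second attempt carries no payload, which corresponds to the failure reported here. With `mark_early=False` both attempts carry it, and later runs of an already initialized VM do not.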

Additional info:

Comment 1 Roy Golan 2019-12-05 09:28:47 UTC
Might happen on a single host just as well.

Workaround: use RunOnce, possibly choosing a host that is known to work.

Comment 2 RHV bug bot 2020-01-24 19:50:11 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Tag 'ovirt-engine-4.4.0' doesn't contain patch 'https://gerrit.ovirt.org/105857']
gitweb: https://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=shortlog;h=refs/tags/ovirt-engine-4.4.0

For more info please contact: infra

Comment 4 Nisim Simsolo 2020-05-03 14:38:41 UTC
Verified:
ovirt-engine-4.4.0-0.33.master.el8ev
qemu-kvm-4.2.0-19.module+el8.2.0+6296+6b821950.x86_64
vdsm-4.40.13-1.el8ev.x86_64
libvirt-daemon-6.0.0-17.module+el8.2.0+6257+0d066c28.x86_64

Verification scenario:
1. Create an RHCOS VM with custom_script with basic ignition 
2. Run VM and verify VM is running with ignition script correctly.
3. Kill the vdsm process on the host.
4. Run the VM again on a different host and verify VM is running with ignition script correctly.
5. Migrate VM to different host. 
6. Power off and run the VM. Verify VM is running with ignition script correctly.

Comment 5 Sandro Bonazzola 2020-05-20 20:04:17 UTC
This bugzilla is included in oVirt 4.4.0 release, published on May 20th 2020.

Since the problem described in this bug report should be
resolved in oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

