Bug 1780022

Summary: An internal retry to run a VM ignores the Ignition payload, and RHCOS goes into emergency mode
Product: [oVirt] ovirt-engine Reporter: Roy Golan <rgolan>
Component: BLL.Virt Assignee: Liran Rotenberg <lrotenbe>
Status: CLOSED CURRENTRELEASE QA Contact: Nisim Simsolo <nsimsolo>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.3.7.2 CC: bugs, dfodor, nsimsolo, rbarry
Target Milestone: ovirt-4.4.0 Flags: pm-rhel: ovirt-4.4+
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: rhv-4.4.0-29 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-05-20 20:04:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Virt RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1726907    

Description Roy Golan 2019-12-05 09:22:31 UTC
Description of problem:

An RHCOS VM that is run for the first time, with Ignition set correctly
in custom_script, will fail to start if it enters the 'retry' mechanism.
The engine starts a VM on vdsm, but some errors on the host can make the run fail before libvirt reports the status 'Running'. In that case the engine
retries running the VM on a different host.
On that retry the Ignition payload is lost, probably because the engine has already
saved the state that the 'initial run' is done, which is obviously wrong because the VM never ran.
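A minimal Java sketch of the suspected failure, under stated assumptions: the names Vm, payloadFor and runWithRetries are hypothetical and only model the behavior described above, namely a single "initialized" flag that decides whether the Ignition payload is attached to a run attempt. They are not taken from ovirt-engine. If the flag is set when the run is dispatched rather than when the VM is Running, a failing first host followed by a working second host means the retry is sent without the payload.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the flawed flow; these names are illustrative and
// are NOT the real ovirt-engine API.
public class RetryLosesPayload {

    static final class Vm {
        boolean initialized;          // stands in for the "initial run done" state
        final String ignitionPayload; // the custom_script content

        Vm(String ignitionPayload) {
            this.ignitionPayload = ignitionPayload;
        }
    }

    // Only a VM that is not yet initialized gets the Ignition payload attached.
    static String payloadFor(Vm vm) {
        return vm.initialized ? null : vm.ignitionPayload;
    }

    // Flawed policy: the VM is marked initialized as soon as a run attempt is
    // dispatched, before any host reports that the VM is actually Running.
    static List<String> runWithRetries(Vm vm, List<Boolean> hostSucceeds) {
        List<String> sentPayloads = new ArrayList<>();
        for (boolean succeeds : hostSucceeds) {
            sentPayloads.add(payloadFor(vm));
            vm.initialized = true; // bug: set even if this attempt fails
            if (succeeds) {
                break;
            }
        }
        return sentPayloads;
    }

    public static void main(String[] args) {
        Vm vm = new Vm("IGNITION_CONFIG"); // placeholder payload
        // First host fails before Running, second host succeeds.
        List<String> sent = runWithRetries(vm, List.of(false, true));
        System.out.println(sent); // prints [IGNITION_CONFIG, null]
    }
}
```

The second element being null corresponds to the retry that boots RHCOS without its Ignition config.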

Version-Release number of selected component (if applicable):


Steps to Reproduce:
1. Have 2 hosts
2. Create an RHCOS VM with custom_script containing a basic Ignition config (https://coreos.com/ignition/docs/latest/examples.html).
3. Hack vdsm to fail the VM, or kill the qemu process manually on the command line.
4. Watch the VM restart on the other host and examine the domain XML passed in engine.log; the payload is base64-encoded.
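To compare the payloads of the first attempt and the retry, the base64 strings can be decoded and compared. The sketch below is a hypothetical helper (ComparePayloads, decode and samePayload are illustrative names), assuming the two base64 strings have already been copied out of engine.log by hand. It uses the MIME decoder so that line breaks inside a wrapped log value are ignored.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: decode two base64 payloads copied from engine.log
// and report whether the retry carried the same content as the first run.
public class ComparePayloads {

    static String decode(String base64) {
        // The MIME decoder skips line separators, which helps with wrapped log lines.
        byte[] raw = Base64.getMimeDecoder().decode(base64.trim());
        return new String(raw, StandardCharsets.UTF_8);
    }

    static boolean samePayload(String firstAttempt, String retryAttempt) {
        return decode(firstAttempt).equals(decode(retryAttempt));
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: ComparePayloads <base64-first-run> <base64-retry>");
            System.exit(2);
        }
        System.out.println(samePayload(args[0], args[1]) ? "payloads match" : "payloads differ");
    }
}
```

With this bug present, the expectation is "payloads differ" for the original run and the retry.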

Actual results:
The custom_script payload passed on the retry differs from the original one, and the VM fails to start.


Expected results:
We cannot know exactly when the initialization ran, but we can be sure it did not run if the VM never reached the Running status. Therefore VM.isInitialized must be set correctly, that is, only once the VM has reached Running.
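A minimal sketch of the expected behavior, assuming the same simplified model as before: the VM.isInitialized flag mentioned above is represented by a plain boolean field, and InitializeOnRunning, onStatusReport and the two-value Status enum are hypothetical names, not ovirt-engine code and not the actual patch. The flag flips only when a Running status is reported, so a failed attempt leaves the VM uninitialized and the retry still carries the Ignition payload.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names are illustrative and NOT from ovirt-engine.
public class InitializeOnRunning {

    enum Status { DOWN, RUNNING }

    static final class Vm {
        boolean initialized;          // models VM.isInitialized
        final String ignitionPayload; // the custom_script content

        Vm(String ignitionPayload) {
            this.ignitionPayload = ignitionPayload;
        }
    }

    // Attach the Ignition payload until the VM has really come up once.
    static String payloadFor(Vm vm) {
        return vm.initialized ? null : vm.ignitionPayload;
    }

    // Only a Running report proves the guest booted with the payload,
    // so only Running marks the VM as initialized.
    static void onStatusReport(Vm vm, Status status) {
        if (status == Status.RUNNING) {
            vm.initialized = true;
        }
    }

    static List<String> runWithRetries(Vm vm, List<Status> outcomes) {
        List<String> sent = new ArrayList<>();
        for (Status outcome : outcomes) {
            sent.add(payloadFor(vm));
            onStatusReport(vm, outcome);
            if (outcome == Status.RUNNING) {
                break;
            }
        }
        return sent;
    }

    public static void main(String[] args) {
        Vm vm = new Vm("IGNITION_CONFIG"); // placeholder payload
        List<String> sent = runWithRetries(vm, List.of(Status.DOWN, Status.RUNNING));
        System.out.println(sent);           // prints [IGNITION_CONFIG, IGNITION_CONFIG]
        System.out.println(vm.initialized); // prints true
        System.out.println(payloadFor(vm)); // prints null: later plain runs omit the payload
    }
}
```

The design choice here is to tie the flag to an observed state transition rather than to the dispatch of the run command, since dispatch says nothing about whether the guest actually consumed the payload.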

Additional info:

Comment 1 Roy Golan 2019-12-05 09:28:47 UTC
This might happen just as well on a single host.

Workaround: use Run Once, possibly choosing a host known to work.

Comment 2 RHV bug bot 2020-01-24 19:50:11 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Tag 'ovirt-engine-4.4.0' doesn't contain patch 'https://gerrit.ovirt.org/105857']
gitweb: https://gerrit.ovirt.org/gitweb?p=ovirt-engine.git;a=shortlog;h=refs/tags/ovirt-engine-4.4.0

For more info please contact: infra

Comment 4 Nisim Simsolo 2020-05-03 14:38:41 UTC
Verified:
ovirt-engine-4.4.0-0.33.master.el8ev
qemu-kvm-4.2.0-19.module+el8.2.0+6296+6b821950.x86_64
vdsm-4.40.13-1.el8ev.x86_64
libvirt-daemon-6.0.0-17.module+el8.2.0+6257+0d066c28.x86_64

Verification scenario:
1. Create an RHCOS VM with custom_script containing a basic Ignition config.
2. Run the VM and verify it is running with the Ignition script applied correctly.
3. Kill the vdsm process on the host.
4. Run the VM again on a different host and verify it is running with the Ignition script applied correctly.
5. Migrate the VM to a different host.
6. Power off and run the VM. Verify it is running with the Ignition script applied correctly.

Comment 5 Sandro Bonazzola 2020-05-20 20:04:17 UTC
This bugzilla is included in the oVirt 4.4.0 release, published on May 20th 2020.

Since the problem described in this bug report should be
resolved in the oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.