Bug 1753618

Summary: VM created from container fails to start for the first time when image is pulled
Product: OpenShift Container Platform
Reporter: Radim Hrazdil <rhrazdil>
Component: Console Kubevirt Plugin
Assignee: Marek Libra <mlibra>
Status: CLOSED ERRATA
QA Contact: Radim Hrazdil <rhrazdil>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 2.1.0
CC: aos-bugs, cnv-qe-bugs, mlibra, sgott, tjelinek
Target Milestone: ---
Target Release: 4.3.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-01-23 11:06:22 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Radim Hrazdil 2019-09-19 12:48:09 UTC
Description of problem:
Running a VM from a container after a fresh CNV deployment often leaves the VM stuck on the following error:
Readiness probe errored: rpc error: code = Unknown desc = command error: command timed out, stdout: , stderr: , exit code -1

Eventually, the VM manages to start and becomes Running, but it takes up to a few minutes.

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-09-19-040356
HCO_BUNDLE_REGISTRY_TAG=v2.1.0-56

How reproducible:
75%

Steps to Reproduce:
1. Deploy OCP + CNV
2. Create a cirros VM from container
3.

Actual results:
VM is stuck in the Starting phase for a considerable amount of time

Expected results:
VM should be able to start after the image is pulled

Additional info:
must gather: http://file.rdu.redhat.com/rhrazdil/cirros-stuck-must-gather.tar.gz
However, the VM reached the Running phase before the must gather finished
VM YAML: http://pastebin.test.redhat.com/798843

Comment 2 Radim Hrazdil 2019-09-25 13:15:34 UTC
Update:
After a discussion with Peter Kotas, the probable cause is the short timeout on the Readiness probe check (25 seconds).
If the VM doesn't pass the readiness check within 25 seconds, it is not reported as Ready and therefore is not shown as Running in the UI (although the VM is actually running).

Unfortunately, we rely heavily on the status in UI automation, and there is no easy workaround for us at the moment. (We could check for the presence of an IP address in the VM Detail View, but the list view doesn't display IP addresses.) Adding the AutomationBlocker keyword.
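
For illustration only, here is a minimal Python sketch of the distinction described in this comment: a VM whose phase is Running but whose Ready condition is not yet true. It assumes a VMI-shaped dict with `status.phase`, a `Ready` entry in `status.conditions`, and `status.interfaces[].ipAddress`. The function names and the "Running (not ready)" label are hypothetical, and this is not the console's actual status logic.

```python
from typing import Any, Dict


def has_ip(vmi: Dict[str, Any]) -> bool:
    """Return True if any reported interface carries a non-empty IP address."""
    interfaces = vmi.get("status", {}).get("interfaces", [])
    return any(bool(iface.get("ipAddress")) for iface in interfaces)


def describe_vmi(vmi: Dict[str, Any]) -> str:
    """Classify a VMI-like dict (hypothetical helper, not console code).

    Returns the raw phase when the VMI is not Running, "Running" when the
    Ready condition is true, and "Running (not ready)" otherwise.
    """
    status = vmi.get("status", {})
    phase = status.get("phase", "Unknown")
    if phase != "Running":
        return phase
    # A failed or timed-out readiness probe leaves the Ready condition
    # absent or set to "False" even though the phase is already Running.
    ready = any(
        cond.get("type") == "Ready" and cond.get("status") == "True"
        for cond in status.get("conditions", [])
    )
    return "Running" if ready else "Running (not ready)"
```

A test that only needs to know the guest has booted could accept either Running result, or fall back to `has_ip` where the interface data is available.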

Comment 11 Radim Hrazdil 2019-11-28 15:36:38 UTC
Verified that this issue no longer occurs in
4.3.0-0.nightly-2019-11-21-122827

Comment 13 errata-xmlrpc 2020-01-23 11:06:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062