Description of problem:
Running a VM from container after a fresh CNV deployment often leaves the VM stuck on the following error:

Readiness probe errored: rpc error: code = Unknown desc = command error: command timed out, stdout: , stderr: , exit code -1

Eventually, the VM manages to start and becomes Running, but it takes up to a few minutes.

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-09-19-040356
HCO_BUNDLE_REGISTRY_TAG=v2.1.0-56

How reproducible:
75%

Steps to Reproduce:
1. Deploy OCP + CNV
2. Create a cirros VM from container

Actual results:
VM is stuck in the Starting phase for a considerable amount of time

Expected results:
VM should be able to start after the image is pulled

Additional info:
must gather: http://file.rdu.redhat.com/rhrazdil/cirros-stuck-must-gather.tar.gz
However, the VM reached the Running phase before the must gather finished.
VM YAML: http://pastebin.test.redhat.com/798843
Update: After discussion with Peter Kotas, the probable cause is the short timeout of the readiness probe check (25 seconds). If the VM doesn't pass the readiness check within 25 seconds, it's not reported as Ready and therefore isn't shown as Running in the UI (although the VM is actually running). Unfortunately, our UI automation relies heavily on this status, and there is no easy workaround for us at the moment. (We could check for the presence of an IP address in the VM Detail View, but the list view doesn't display IP addresses.) Adding the AutomationBlocker keyword.
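For illustration only: one way automation could tolerate the delay is to poll the reported status against a deadline well above the 25-second probe timeout. This is a minimal sketch, not a fix for the probe timeout and not the approach actually used in the UI tests. The helper name `wait_for_status`, the `get_status` callback, and the 300-second default (chosen because the VM was seen to take up to a few minutes) are all assumptions.

```python
import time
from typing import Callable


def wait_for_status(
    get_status: Callable[[], str],
    expected: str = "Running",
    timeout: float = 300.0,   # assumed: comfortably above the 25 s probe timeout
    interval: float = 5.0,    # assumed polling interval
) -> bool:
    """Poll get_status() until it returns `expected` or `timeout` seconds elapse.

    `get_status` is a hypothetical callback that returns the status string
    the automation reads (for example, from the VM list view).
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_status() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Simulated status source: the VM reports Starting twice, then Running.
    statuses = iter(["Starting", "Starting", "Running"])
    print(wait_for_status(lambda: next(statuses), interval=0.0))
```

The trade-off is slower test runs whenever the probe is late; the helper only hides the delay from the automation and does not shorten it.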
Verified that this issue no longer occurs in 4.3.0-0.nightly-2019-11-21-122827.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0062