Bug 1970372

Summary: Virt-handler fails to verify container-disk
Product: Container Native Virtualization (CNV) Reporter: lpivarc
Component: Virtualization Assignee: sgott
Status: CLOSED ERRATA QA Contact: Israel Pinto <ipinto>
Severity: high Docs Contact:
Priority: high    
Version: 2.6.4 CC: cnv-qe-bugs, fdeutsch, sgott, zpeng
Target Milestone: ---   
Target Release: 2.6.6   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: virt-operator-container-v2.6.6-4 hco-bundle-registry-container-v2.6.6-31 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1970454 Environment:
Last Closed: 2021-08-10 17:33:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1970454    

Description lpivarc 2021-06-10 11:52:18 UTC
Description of problem:
When we create new VMs, we can see `runtime: cannot allocate memory` warnings in the virt-handler log. They are caused by verifying container disk ownership (so only VMs with a container disk are affected). AFAIK this issue happens at random, but if we get unlucky it could block creation of the VM. Otherwise the handler should retry the operation and the VM should start.
The root cause is new behaviour in golang; it can only be seen with golang 1.14+.


Version-Release number of selected component (if applicable):
2.6.z, 4.8


How reproducible:
Create VM and observe logs of virt-handler.


Steps to Reproduce:
1.
2.
3.

Actual results:
Log of virt-handler:
failed to get image info: failed to invoke qemu-img: exit status 2: 'fatal error: runtime: cannot allocate memory


Expected results:
Nothing in log.


Additional info:

Comment 2 Fabian Deutsch 2021-06-10 12:28:01 UTC
How reproducible is this bug?

Comment 3 lpivarc 2021-06-10 12:34:40 UTC
I saw 2-4 failures from the whole test suite. So something like 2/600 probability?

Comment 4 Fabian Deutsch 2021-06-10 12:38:58 UTC
Okay, although I can't tell whether all 600 cases used a containerDisk.

What will happen if the bug is triggered? That is, what happens to a VM that runs into this bug? Will it resolve itself or does it need admin attention?

Comment 5 lpivarc 2021-06-10 12:45:21 UTC
This operation will be retried in the next `sync` of the handler. The VM should eventually start as long as we don't hit the timeout (read: as long as the error does not occur too often).

Comment 6 Fabian Deutsch 2021-06-10 13:03:55 UTC
Okay, due to the low level of reproducibility, and because it might eventually fix itself, I'm not considering this to be a blocker.
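
As a rough back-of-the-envelope check on the "might eventually fix itself" reasoning, the sketch below computes how likely it is that several retries in a row all fail. It rests on two loud assumptions that the thread does not confirm: that the 2/600 estimate from comment 3 is the per-attempt failure rate, and that attempts fail independently of each other. `failureRunProbability` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"math"
)

// failureRunProbability returns the chance that k consecutive attempts all
// fail, ASSUMING each attempt fails independently with probability p.
func failureRunProbability(p float64, k int) float64 {
	return math.Pow(p, float64(k))
}

func main() {
	p := 2.0 / 600.0 // rough per-attempt estimate taken from comment 3
	for k := 1; k <= 3; k++ {
		fmt.Printf("%d consecutive failures: %.2e\n", k, failureRunProbability(p, k))
	}
}
```

If failures are correlated, for example by memory pressure on a particular node, the independence assumption breaks and a run of failures becomes more likely than this estimate suggests.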

Comment 12 zhe peng 2021-07-26 06:41:28 UTC
After checking the test suite results, this issue no longer occurs; moving this to verified.
Verified build:
hco-bundle-registry-container-v2.6.6-35

Comment 17 errata-xmlrpc 2021-08-10 17:33:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 2.6.6 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3119