Bug 1970454

Summary: Virt-handler fails to verify container-disk[4.8]
Product: Container Native Virtualization (CNV) Reporter: lpivarc
Component: Virtualization Assignee: sgott
Status: CLOSED DUPLICATE QA Contact: Israel Pinto <ipinto>
Severity: high Docs Contact:
Priority: high    
Version: 4.8.0 CC: cnv-qe-bugs, fdeutsch, ipinto, sgott
Target Milestone: ---   
Target Release: 4.8.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1970372 Environment:
Last Closed: 2021-07-12 17:46:12 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1970372    
Bug Blocks:    

Description lpivarc 2021-06-10 13:49:25 UTC
+++ This bug was initially created as a clone of Bug #1970372 +++

Description of problem:
When we create new VMs, virt-handler logs warnings containing `runtime: cannot allocate memory`. The warning is raised while verifying container disk ownership, so only VMs with a container disk are affected. As far as I know the issue occurs at random, but if we get unlucky it could block creation of a VM. Otherwise the handler retries the operation and the VM should start.
The root cause is a behaviour change in Go; it can only be seen with Go 1.14+.


Version-Release number of selected component (if applicable):
2.6.z, 4.8


How reproducible:
Create a VM and observe the virt-handler logs.


Steps to Reproduce:
1.
2.
3.

Actual results:
Log of virt-handler:
failed to get image info: failed to invoke qemu-img: exit status 2: 'fatal error: runtime: cannot allocate memory


Expected results:
Nothing in log.


Additional info:

--- Additional comment from  on 2021-06-10 12:19:24 UTC ---

To clear up some confusion: this is specific to CNV downstream, as we already fixed it in https://github.com/kubevirt/kubevirt/pull/5495. The question is whether we want to backport it; this bug also serves as justification for the backport.

--- Additional comment from Fabian Deutsch on 2021-06-10 12:28:01 UTC ---

How reproducible is this bug?

--- Additional comment from  on 2021-06-10 12:34:40 UTC ---

I saw 2-4 failures from the whole test suite. So something like a 2/600 probability?

--- Additional comment from Fabian Deutsch on 2021-06-10 12:38:58 UTC ---

Okay, although I can't tell whether all 600 cases used a containerDisk.

What will happen if the bug is triggered? That is, what happens to a VM that runs into this bug? Will it resolve itself or does it need admin attention?

--- Additional comment from  on 2021-06-10 12:45:21 UTC ---

This operation will be retried in the next `sync` of the handler. The VM should eventually start if we don't hit the timeout (read: as long as the error does not occur too often).

--- Additional comment from Fabian Deutsch on 2021-06-10 13:03:55 UTC ---

Okay, due to the low level of reproducibility, and because it might eventually fix itself, I'm not considering this to be a blocker.

Comment 1 lpivarc 2021-07-08 11:18:30 UTC
PR merged: https://github.com/kubevirt/kubevirt/pull/5866

Comment 2 sgott 2021-07-12 17:46:12 UTC
This BZ is effectively a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1979631 as they both address the same issue.

Closing this BZ as a duplicate, as the PR to address this one is superseded by the PR to make this setting configurable.

*** This bug has been marked as a duplicate of bug 1979631 ***