1970454 – Virt-handler fails to verify container-disk[4.8]

Summary: Virt-handler fails to verify container-disk[4.8]

Keywords:
Status:	CLOSED DUPLICATE of bug 1979631
Alias:	None
Product:	Container Native Virtualization (CNV)
Classification:	Red Hat
Component:	Virtualization
Sub Component:
Version:	4.8.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.8.1
Assignee:	sgott
QA Contact:	Israel Pinto
Docs Contact:
URL:
Whiteboard:
Depends On:	1970372
Blocks:

Reported:	2021-06-10 13:49 UTC by lpivarc
Modified:	2021-07-12 17:46 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1970372
Environment:
Last Closed:	2021-07-12 17:46:12 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Description lpivarc 2021-06-10 13:49:25 UTC

+++ This bug was initially created as a clone of Bug #1970372 +++

Description of problem:
When we create new VMs we can see the warning `runtime: cannot allocate memory`. It is caused by verifying container disk ownership (therefore only VMs with a container disk are affected). AFAIK this issue happens at random, but if we get unlucky it could block creation of a VM. Otherwise the verification should be retried by the handler and the VM should start.
The root cause is a change in Go runtime behaviour. It can only be seen with Go 1.14+.


Version-Release number of selected component (if applicable):
2.6.z, 4.8


How reproducible:
Create a VM and observe the logs of virt-handler.


Steps to Reproduce:
1.
2.
3.

Actual results:
Log of virt-handler:
failed to get image info: failed to invoke qemu-img: exit status 2: 'fatal error: runtime: cannot allocate memory


Expected results:
Nothing in log.


Additional info:

--- Additional comment from  on 2021-06-10 12:19:24 UTC ---

To clear up some confusion: this is specific to CNV downstream, as we already fixed it in https://github.com/kubevirt/kubevirt/pull/5495. The question is whether we want to backport it, and this bug also serves as justification for the backport.

--- Additional comment from Fabian Deutsch on 2021-06-10 12:28:01 UTC ---

How reproducible is this bug?

--- Additional comment from  on 2021-06-10 12:34:40 UTC ---

I saw 2-4 failures from the whole test suite. So something like 2/600 probability?

--- Additional comment from Fabian Deutsch on 2021-06-10 12:38:58 UTC ---

Okay - although I can't tell whether all 600 cases used a containerDisk.

What will happen if the bug is triggered? That is, what happens to a VM running into this bug? Will it resolve itself or does it need admin attention?

--- Additional comment from  on 2021-06-10 12:45:21 UTC ---

This operation will be retried in the next `sync` of the handler. The VM should eventually start if we don't hit the timeout (read: as long as the error does not occur too often).

--- Additional comment from Fabian Deutsch on 2021-06-10 13:03:55 UTC ---

Okay, due to the low level of reproducibility, and because it might eventually fix itself, I'm not considering this to be a blocker.

Comment 1 lpivarc 2021-07-08 11:18:30 UTC

PR merged: https://github.com/kubevirt/kubevirt/pull/5866

Comment 2 sgott 2021-07-12 17:46:12 UTC

This BZ is effectively a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1979631 as they both address the same issue.

Closing this BZ as a duplicate, as the PR to address this one is superseded by the PR to make this setting configurable.

*** This bug has been marked as a duplicate of bug 1979631 ***
