Description of problem:
If a node is NotReady while the pipelines are creating VMs, the VMs are not moved to a new node and we see the message:

  EvictionStrategy is set but vmi is not migratable; cannot migrate VMI: PVC windows-5n76q2-installcdrom is not shared, live migration requires that all PVCs must be shared (using ReadWriteMany access mode)

In the tekton job the PVC is RWO:
https://github.com/kubevirt/tekton-tasks-operator/blob/main/data/tekton-pipelines/okd/windows-efi-installer-pipeline.yaml#L263

$ oc get nodes
NAME                           STATUS     ROLES                         AGE    VERSION
c01-gk413ccl3-gww9w-master-0   Ready      control-plane,master,worker   5h9m   v1.26.0+9eb81c2
c01-gk413ccl3-gww9w-master-1   NotReady   control-plane,master,worker   5h9m   v1.26.0+9eb81c2
c01-gk413ccl3-gww9w-master-2   Ready      control-plane,master,worker   5h9m   v1.26.0+9eb81c2

[cloud-user@ocp-psi-executor ~]$ oc get vmi -A
NAMESPACE       NAME             AGE   PHASE     IP             NODENAME                       READY
openshift-cnv   windows-5g7u8n   22m   Running   10.129.0.182   c01-gk413ccl3-gww9w-master-1   False

Version-Release number of selected component (if applicable):
4.13.0

How reproducible:
always

Steps to Reproduce:
1. Used a compact cluster to reproduce.
2. Run the automation job.
3.

Actual results:
The VM is not migrated, and the pipeline ends with an error because the VMI task never completes.

Expected results:
The VM should be migrated to a different node.

Additional info:
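For reference, a minimal sketch of the access-mode change the error message asks for, applied to the DataVolume/PVC definition referenced above (illustrative only, the actual definition lives in the linked pipeline YAML):

  pvc:
    accessModes:
      - ReadWriteMany        # live migration needs RWX; the pipeline currently requests ReadWriteOnce
    resources:
      requests:
        storage: 9Gi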
Would it work if we use the storage API instead of a PVC, e.g. like this?

  storage:
    resources:
      requests:
        storage: 9Gi
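A hedged sketch of what that suggestion would look like as a full DataVolume (names and source are illustrative, not the pipeline's actual template): with spec.storage instead of spec.pvc, accessModes and volumeMode can be left out and CDI fills them in from the cluster's StorageProfile, which may result in ReadWriteMany where the storage class supports it:

  apiVersion: cdi.kubevirt.io/v1beta1
  kind: DataVolume
  metadata:
    name: windows-installcdrom            # hypothetical name
  spec:
    source:
      http:
        url: "..."                        # ISO source elided
    storage:                              # storage API: no explicit accessModes
      resources:
        requests:
          storage: 9Gi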
When the VM is running and the node goes down, the VM will be in an error state and the wait-for-vmi-status task will fail the whole pipeline. Because the VM is in an error state, even if it were migrated to a different node, the pipeline would not continue. So changing the access mode will not help in this case. If we want the pipeline not to fail while the VM is in an error state, we would have to change the behaviour of wait-for-vmi-status so it does not fail when an error occurs. That opens a potential issue: the VM could be stuck in an error state, unable to recover, and the pipeline would keep running instead of failing too.
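For illustration only, this is roughly what relaxing the task would look like in a pipeline step (parameter names are from memory and should be verified against the wait-for-vmi-status task definition); it shows exactly the trade-off described above, since a VM stuck in an error state would then never fail the run:

  - name: wait-for-vmi-status
    taskRef:
      kind: Task
      name: wait-for-vmi-status
    params:
      - name: vmiName
        value: "$(params.vmName)"          # hypothetical wiring
      - name: successCondition
        value: "status.phase == Succeeded"
      - name: failureCondition
        value: ""                          # assumed to mean: do not fail on error phases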
The current behavior of failing the whole pipeline on the first internal error should be kept; nevertheless, live migration of the VM should be enabled, and we will retry the scenario of running the VM on a node that changes to the NotReady state.
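For context, a minimal sketch of the VM-side requirement (illustrative; the error message above shows the eviction strategy is already set, so the missing piece is the shared storage):

  apiVersion: kubevirt.io/v1
  kind: VirtualMachine
  spec:
    template:
      spec:
        evictionStrategy: LiveMigrate   # already set according to the error message
        # in addition, every disk's backing PVC must be ReadWriteMany for the
        # migration to actually be possible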
With the rework of the example pipelines in https://github.com/kubevirt/ssp-operator/pull/550, this issue should be fixed.