Description of problem:
When a PVC is mounted in a libguestfs pod, "virtctl start <vm>" should check the PVC and fail the start operation with a "PVC in use" error.

Version-Release number of selected component (if applicable):

oc version
Client Version: 4.9.0-202107292313.p0.git.1557476.assembly.stream-1557476
Server Version: 4.9.0-0.nightly-2021-08-04-025616
Kubernetes Version: v1.21.1+8268f88

virtctl version
Client Version: version.Info{GitVersion:"v0.44.0-rc.0-59-g656b60bc1", GitCommit:"656b60bc114d592b77b5a25b42dbec2801f9b882", GitTreeState:"clean", BuildDate:"2021-08-08T08:24:09Z", GoVersion:"go1.15.13", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{GitVersion:"v0.44.0-rc.0-59-g656b60bc1", GitCommit:"656b60bc114d592b77b5a25b42dbec2801f9b882", GitTreeState:"clean", BuildDate:"2021-08-08T09:29:38Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}

oc get csv -n openshift-cnv
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v4.9.0   OpenShift Virtualization   4.9.0     kubevirt-hyperconverged-operator.v2.6.5   Succeeded

How reproducible:
100%

Steps to Reproduce:

1. Create a VM and do not start it.

oc get vm
NAME             AGE   STATUS    READY
vm-cirros-dv-2   30h   Stopped   False

2. Identify the PVC name used by the VM.

oc get pvc
NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
cirros-dv-2   Bound    pvc-be94b0d1-42f2-4817-ae1a-39502eff7443   96Mi       RWO            ocs-storagecluster-ceph-rbd   30h

3. Run "virtctl guestfs <pvc-name>" so that the PVC is mounted in the libguestfs pod.

virtctl guestfs cirros-dv-2
Use image: registry.redhat.io/container-native-virtualization/libguestfs-tools@sha256:0fcbf6e3099dd2597cdc350da39ff486b08482a9f0907c01cea15c93927ba460
The PVC has been mounted at /disk
Waiting for container libguestfs still in pending, reason: ContainerCreating, message:
If you don't see a command prompt, try pressing enter.
+ /bin/bash

oc get pods -o wide
NAME                           READY   STATUS    RESTARTS   AGE    IP            NODE                               NOMINATED NODE   READINESS GATES
libguestfs-tools-cirros-dv-2   1/1     Running   0          2m4s   10.128.2.43   stg10-kevin-jdtkw-worker-0-7f69w   <none>           <none>

4. Run "virtctl start <vm-name>".
>>>>> The VM is started. This poses the danger of the PVC being manipulated while the VM is running.

virtctl start vm-cirros-dv-2
VM vm-cirros-dv-2 was scheduled to start

5. Check that the VMI is running.

oc get vmi
NAME             AGE   PHASE     IP            NODENAME                           READY
vm-cirros-dv-2   7s    Running   10.128.2.44   stg10-kevin-jdtkw-worker-0-7f69w   True
6. Check that both the libguestfs and virt-launcher pods are running, on the SAME node.

oc get pods -o wide
NAME                                 READY   STATUS    RESTARTS   AGE     IP            NODE                               NOMINATED NODE   READINESS GATES
libguestfs-tools-cirros-dv-2         1/1     Running   0          3m17s   10.128.2.43   stg10-kevin-jdtkw-worker-0-7f69w   <none>           <none>
virt-launcher-vm-cirros-dv-2-cj6jm   1/1     Running   0          16s     10.128.2.44   stg10-kevin-jdtkw-worker-0-7f69w   <none>           <none>

Actual results:
The VM is started.

Expected results:
virtctl should verify whether the PVC is already mounted by another pod and fail the start operation with an error such as "PVC of <vm-name> is in use by another pod".

Additional info:

apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: vm-cirros-dv-2
  name: vm-cirros-dv-2
spec:
  dataVolumeTemplates:
  - metadata:
      name: cirros-dv-2
    spec:
      pvc:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100M
        storageClassName: ocs-storagecluster-ceph-rbd
      source:
        http:
          url: "http://cnv-qe-server.rhevdev.lab.eng.rdu2.redhat.com/files/cnv-tests/cirros-images/cirros-0.4.0-x86_64-disk.qcow2"
  running: false
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-datavolume
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: datavolumevolume
        machine:
          type: ""
        resources:
          requests:
            memory: 64M
      terminationGracePeriodSeconds: 0
      volumes:
      - dataVolume:
          name: cirros-dv-2
        name: datavolumevolume
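As a quick manual check before running "virtctl start", you can ask which pods currently mount the PVC. A minimal sketch, assuming the PVC name from the steps above and that the describe output includes the usual "Mounted By:" field:

# The "Mounted By:" field in the describe output lists the pods using the claim.
oc describe pvc cirros-dv-2

# If it shows libguestfs-tools-cirros-dv-2 (or any other pod), starting the VM
# would add a second consumer of the same disk.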
This is a known issue because Kubernetes has no locking or protection mechanism for PVCs. The problem is not that 2 pods access the PVC at the same time, but that 2 QEMU instances could write to the disk at the same time. QEMU offers partial protection: in certain cases the second instance fails to run because it cannot acquire the write lock.

Kevin, in your example you should try running libguestfs again after the VM has started. It should fail in that case because QEMU cannot acquire the lock. However, I have already been able to reproduce a case where QEMU did not detect the lock and 2 QEMU instances started with the same disk. In my setup it was an RWX PVC on Ceph, and the 2 pods were scheduled on 2 different nodes. At least in my experience, QEMU has always been able to detect the lock when the QEMU instances were running on the same node.

Kubernetes is introducing a new access mode (ReadWriteOncePod) that prevents 2 pods from using the same PVC. This could prevent multiple pods from using the same PVC, but it is a very restrictive mode and it prevents the VM from being migratable, since live migration needs the source and target virt-launcher pods to mount the volume at the same time. To solve this properly, we need a way to protect PVCs, at least in KubeVirt, that also works when the access mode is ReadWriteMany.
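For reference, a minimal sketch of a PVC requesting the ReadWriteOncePod access mode mentioned above (the PVC name is illustrative, the storage class is taken from the report; at the time of this bug the mode is still alpha, so it needs the ReadWriteOncePod feature gate and CSI driver support, and as noted it would block live migration):

cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cirros-dv-rwop          # illustrative name
spec:
  accessModes:
  - ReadWriteOncePod            # only a single pod may mount the volume
  resources:
    requests:
      storage: 100M
  storageClassName: ocs-storagecluster-ceph-rbd
EOF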
Thanks, Alice. What do you envision KubeVirt being able to do about this, beyond what Kubernetes is already planning?
This PR should fix the issue once merged:
- https://github.com/kubevirt/kubevirt/pull/6362
Alice, it looks like https://github.com/kubevirt/kubevirt/pull/6362 was closed. Is this something you're actively working on?
Stu, unfortunately this requires a locking mechanism in Kubernetes or coordination from KubeVirt. I tried to introduce a partial check in KubeVirt in that PR, but it was rejected, so I still need to figure out a proper approach.
Deferring to the next release due to the anticipated complexity of fixing this.
Added release note > known issue:

"In some instances, multiple virtual machines can mount the same PVC in read-write mode, which might result in data corruption. As a workaround, avoid using a single PVC in read-write mode with multiple VMs. (BZ#1992753)"

https://github.com/openshift/openshift-docs/pull/42530
https://deploy-preview-42530--osdocs.netlify.app/openshift-enterprise/latest/virt/virt-4-10-release-notes#virt-4-10-known-issues

Future link: after OpenShift Virtualization 4.10 is released, you can find the release notes here: https://docs.openshift.com/container-platform/4.10/virt/virt-4-10-release-notes.html or on the portal: https://access.redhat.com/documentation/en-us/openshift_container_platform/4.10
Moving this to 4.12 due to the complexity of fixing this.
Adam, this bug looks more related to the Storage component, since it deals with PVCs and ReadWrite access. Do you want to take ownership of this bug and have it moved to the Storage component?
Yes, taking this into the Storage component.
Alice, can we introduce a check, before starting the VM, that its PVCs are not in use? I understand that this is racy, but it may be better than nothing. Thoughts?
Adam, yes, I tried to do that in the PR mentioned above (https://github.com/kubevirt/kubevirt/pull/6362), but the solution was rejected upstream.
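For illustration only, the kind of racy pre-start check discussed here could look roughly like the sketch below; the rejected PR implemented this inside KubeVirt itself, and even with such a check there is still a window between the check and the creation of the virt-launcher pod:

# Hypothetical pre-start check: list pods that mount the VM's PVC.
PVC=cirros-dv-2
IN_USE=$(oc get pods -o json \
  | jq -r --arg pvc "$PVC" \
      '.items[]
       | select(any(.spec.volumes[]?; .persistentVolumeClaim.claimName == $pvc))
       | .metadata.name')
if [ -n "$IN_USE" ]; then
  echo "PVC $PVC is in use by another pod: $IN_USE" >&2
else
  virtctl start vm-cirros-dv-2
fi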
Our proposed solution was rejected by the KubeVirt maintainers, so at this time it will not be possible to provide a fix for this issue.
Closed - Won't Fix, per @apinnick. Reviewed on Jan 12, 2023: leave this known issue in the 4.12 release notes because it is still a known issue and will not be fixed.