Description of problem:
When a PVC is mounted in a libguestfs pod, "virtctl start <vm>" should check the PVC and fail the start operation with a "PVC in use" error.

Version-Release number of selected component (if applicable):

oc version
Client Version: 4.9.0-202107292313.p0.git.1557476.assembly.stream-1557476
Server Version: 4.9.0-0.nightly-2021-08-04-025616
Kubernetes Version: v1.21.1+8268f88

virtctl version
Client Version: version.Info{GitVersion:"v0.44.0-rc.0-59-g656b60bc1", GitCommit:"656b60bc114d592b77b5a25b42dbec2801f9b882", GitTreeState:"clean", BuildDate:"2021-08-08T08:24:09Z", GoVersion:"go1.15.13", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{GitVersion:"v0.44.0-rc.0-59-g656b60bc1", GitCommit:"656b60bc114d592b77b5a25b42dbec2801f9b882", GitTreeState:"clean", BuildDate:"2021-08-08T09:29:38Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}

oc get csv -n openshift-cnv
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v4.9.0   OpenShift Virtualization   4.9.0     kubevirt-hyperconverged-operator.v2.6.5   Succeeded

How reproducible:
100%

Steps to Reproduce:

1. Create a VM and do not start it.

oc get vm
NAME             AGE   STATUS    READY
vm-cirros-dv-2   30h   Stopped   False

2. Identify the PVC name used by the VM.

oc get pvc
NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
cirros-dv-2   Bound    pvc-be94b0d1-42f2-4817-ae1a-39502eff7443   96Mi       RWO            ocs-storagecluster-ceph-rbd   30h

3. Run "virtctl guestfs <pvc-name>" so that the PVC is mounted in the libguestfs pod.

virtctl guestfs cirros-dv-2
Use image: registry.redhat.io/container-native-virtualization/libguestfs-tools@sha256:0fcbf6e3099dd2597cdc350da39ff486b08482a9f0907c01cea15c93927ba460
The PVC has been mounted at /disk
Waiting for container libguestfs still in pending, reason: ContainerCreating, message:
If you don't see a command prompt, try pressing enter.
+ /bin/bash

oc get pods -o wide
NAME                           READY   STATUS    RESTARTS   AGE    IP            NODE                               NOMINATED NODE   READINESS GATES
libguestfs-tools-cirros-dv-2   1/1     Running   0          2m4s   10.128.2.43   stg10-kevin-jdtkw-worker-0-7f69w   <none>           <none>

4. Run "virtctl start <vm-name>".
>>>>> The VM is started. This poses the danger of the PVC being manipulated while the VM is running.

virtctl start vm-cirros-dv-2
VM vm-cirros-dv-2 was scheduled to start

5. Check that the VMI is running.

oc get vmi
NAME             AGE   PHASE     IP            NODENAME                           READY
vm-cirros-dv-2   7s    Running   10.128.2.44   stg10-kevin-jdtkw-worker-0-7f69w   True
6. Check that both the libguestfs and virt-launcher pods are running, on the SAME node.

oc get pods -o wide
NAME                                 READY   STATUS    RESTARTS   AGE     IP            NODE                               NOMINATED NODE   READINESS GATES
libguestfs-tools-cirros-dv-2         1/1     Running   0          3m17s   10.128.2.43   stg10-kevin-jdtkw-worker-0-7f69w   <none>           <none>
virt-launcher-vm-cirros-dv-2-cj6jm   1/1     Running   0          16s     10.128.2.44   stg10-kevin-jdtkw-worker-0-7f69w   <none>           <none>

Actual results:
The VM is started.

Expected results:
virtctl should verify whether the PVC is already mounted by another pod and fail the start operation with an error such as "PVC of <vm-name> is in use by another pod".

Additional info:

apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  labels:
    kubevirt.io/vm: vm-cirros-dv-2
  name: vm-cirros-dv-2
spec:
  dataVolumeTemplates:
  - metadata:
      name: cirros-dv-2
    spec:
      pvc:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100M
        storageClassName: ocs-storagecluster-ceph-rbd
      source:
        http:
          url: "http://cnv-qe-server.rhevdev.lab.eng.rdu2.redhat.com/files/cnv-tests/cirros-images/cirros-0.4.0-x86_64-disk.qcow2"
  running: false
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-datavolume
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: datavolumevolume
        machine:
          type: ""
        resources:
          requests:
            memory: 64M
      terminationGracePeriodSeconds: 0
      volumes:
      - dataVolume:
          name: cirros-dv-2
        name: datavolumevolume
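As a quick manual check before running "virtctl start", you can ask which pods currently mount the PVC. A minimal sketch, assuming the PVC name from the steps above and that the describe output includes the usual "Mounted By:" field:

# The "Mounted By:" field in the describe output lists the pods using the claim.
oc describe pvc cirros-dv-2

# If it shows libguestfs-tools-cirros-dv-2 (or any other pod), starting the VM
# would add a second consumer of the same disk.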
This is a known issue because Kubernetes has no locking or protection mechanism for PVCs. The problem is not that 2 pods access the PVC at the same time, but that 2 QEMU instances could write to the disk at the same time. QEMU offers partial protection: in certain cases the second instance fails to run because it cannot acquire the write lock.

Kevin, in your example you should try running libguestfs again after the VM has started. It should fail in that case because QEMU cannot acquire the lock. However, I have already been able to reproduce a case where QEMU did not detect the lock and 2 QEMU instances started with the same disk. In my setup it was an RWX PVC on Ceph, and the 2 pods were scheduled on 2 different nodes. At least in my experience, QEMU has always been able to detect the lock when the QEMU instances were running on the same node.

Kubernetes is introducing a new access mode (ReadWriteOncePod) that prevents 2 pods from using the same PVC. This could prevent multiple pods from using the same PVC, but it is a very restrictive mode and it prevents the VM from being migratable, since live migration needs the source and target virt-launcher pods to mount the volume at the same time. To solve this properly, we need a way to protect PVCs, at least in KubeVirt, that also works when the access mode is ReadWriteMany.
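For reference, a minimal sketch of a PVC requesting the ReadWriteOncePod access mode mentioned above (the PVC name is illustrative, the storage class is taken from the report; at the time of this bug the mode is still alpha, so it needs the ReadWriteOncePod feature gate and CSI driver support, and as noted it would block live migration):

cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cirros-dv-rwop          # illustrative name
spec:
  accessModes:
  - ReadWriteOncePod            # only a single pod may mount the volume
  resources:
    requests:
      storage: 100M
  storageClassName: ocs-storagecluster-ceph-rbd
EOF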
Thanks, Alice. What do you envision KubeVirt being able to do about this, beyond what Kubernetes is already planning?
This PR should fix the issue once merged:
- https://github.com/kubevirt/kubevirt/pull/6362
Alice, it looks like https://github.com/kubevirt/kubevirt/pull/6362 was closed. Is this something you're actively working on?
Stu, unfortunately this requires a locking mechanism in Kubernetes or coordination from KubeVirt. I tried to introduce a partial check in KubeVirt in that PR, but it was rejected, so I still need to figure out a proper approach.
Deferring to the next release due to the anticipated complexity of fixing this.
Added release note > known issue:

"In some instances, multiple virtual machines can mount the same PVC in read-write mode, which might result in data corruption. As a workaround, avoid using a single PVC in read-write mode with multiple VMs. (BZ#1992753)"

https://github.com/openshift/openshift-docs/pull/42530
https://deploy-preview-42530--osdocs.netlify.app/openshift-enterprise/latest/virt/virt-4-10-release-notes#virt-4-10-known-issues

Future link: after OpenShift Virtualization 4.10 is released, you can find the release notes here: https://docs.openshift.com/container-platform/4.10/virt/virt-4-10-release-notes.html or on the portal: https://access.redhat.com/documentation/en-us/openshift_container_platform/4.10
Moving this to 4.12 due to the complexity of fixing this.
Adam, this bug looks more related to the Storage component, since it deals with PVCs and ReadWrite access. Do you want to take ownership of this bug and have it moved to the Storage component?
Yes, taking this into the Storage component.
Alice, can we introduce a check, before starting the VM, that its PVCs are not in use? I understand that this is racy, but it may be better than nothing. Thoughts?
Adam, yes, I tried to do that in the PR mentioned above (https://github.com/kubevirt/kubevirt/pull/6362), but the solution was rejected upstream.
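For illustration only, the kind of racy pre-start check discussed here could look roughly like the sketch below; the rejected PR implemented this inside KubeVirt itself, and even with such a check there is still a window between the check and the creation of the virt-launcher pod:

# Hypothetical pre-start check: list pods that mount the VM's PVC.
PVC=cirros-dv-2
IN_USE=$(oc get pods -o json \
  | jq -r --arg pvc "$PVC" \
      '.items[]
       | select(any(.spec.volumes[]?; .persistentVolumeClaim.claimName == $pvc))
       | .metadata.name')
if [ -n "$IN_USE" ]; then
  echo "PVC $PVC is in use by another pod: $IN_USE" >&2
else
  virtctl start vm-cirros-dv-2
fi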
Our proposed solution was rejected by the KubeVirt maintainers, so at this time it will not be possible to provide a fix for this issue.
Closed - Won't Fix, per @apinnick. Reviewed on Jan 12, 2023: leave this known issue in the 4.12 release notes because it is still a known issue and will not be fixed.