Description of problem:
Failed to migrate a VM using the ocs-storagecluster-cephfs storage class; the VM's status changes to Paused after the migration attempt.

Version-Release number of selected component (if applicable):
CNV 4.10

How reproducible:
100%

Steps to Reproduce:
1. Start a VM from a DV with storage class ocs-storagecluster-cephfs:

# oc create -f asb-vm-dv-ocs-cephfs.yaml

2. Log in to the VM and touch a file: migration

3. Try to migrate the VM in the web console by clicking "Migrate Node to Node". The VM is not migrated and its status changes to Paused.

# oc get pod -o wide | grep virt-launcher | grep cephfs
virt-launcher-asb-vm-dv-ocs-cephfs-dg747   1/1   Running   0   46s    10.129.1.92   dell-per730-64.lab.eng.pek2.redhat.com   <none>   0/1
virt-launcher-asb-vm-dv-ocs-cephfs-wfft5   1/1   Running   0   2m1s   10.128.1.61   dell-per730-63.lab.eng.pek2.redhat.com   <none>

# oc get pod -o wide | grep virt-launcher | grep cephfs
virt-launcher-asb-vm-dv-ocs-cephfs-dg747   0/1   Completed   0   71s     10.129.1.92   dell-per730-64.lab.eng.pek2.redhat.com   <none>   0/1
virt-launcher-asb-vm-dv-ocs-cephfs-wfft5   1/1   Running     0   2m26s   10.128.1.61   dell-per730-63.lab.eng.pek2.redhat.com   <none>

# oc rsh virt-launcher-asb-vm-dv-ocs-cephfs-wfft5
sh-4.4# virsh list --all
 Id   Name                                 State
---------------------------------------------------
 1    openshift-cnv_asb-vm-dv-ocs-cephfs   paused

# mount | grep cephfs
172.30.225.152:6789,172.30.162.143:6789,172.30.149.241:6789:/volumes/csi/csi-vol-a604ad71-e17e-11ec-93c3-0a580a82017d/484a041a-0d62-43da-bcfe-218c2985be1f on /run/kubevirt-private/vmi-disks/rootdisk type ceph (rw,relatime,seclabel,name=csi-cephfs-node,secret=<hidden>,acl,mds_namespace=ocs-storagecluster-cephfilesystem)

4. Get the error messages:
"server error. command Migrate failed: "migration job 60df6743-158c-4afd-b07f-01e1f7c6b33d already executed, finished at 2022-06-01 07:46:51.413073411 +0000 UTC, completed: true, failed: true, abortStatus: "

Actual results:
In step 3: the migration fails and the VM status changes to Paused.

Expected results:
In step 3: the VM migrates successfully, or the operation is forbidden if it is not supported.

Additional info:
- asb-vm-dv-ocs-cephfs.yaml
- /var/log/libvirt/qemu/openshift-cnv_asb-vm-dv-ocs-cephfs.log
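The attached asb-vm-dv-ocs-cephfs.yaml is not reproduced in this report. For readers without access to the attachment, a VM backed by a CephFS DataVolume generally has the following shape; every name, the image URL, and the sizes below are illustrative placeholders, not the actual attached manifest:

```yaml
# Illustrative sketch only, NOT the attached manifest. A DataVolume on the
# ocs-storagecluster-cephfs storage class is provisioned ReadWriteMany,
# which is what makes the resulting VM a live-migration candidate.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: asb-vm-dv-ocs-cephfs
spec:
  running: true
  dataVolumeTemplates:
    - metadata:
        name: asb-dv-ocs-cephfs
      spec:
        storage:
          accessModes: ["ReadWriteMany"]
          resources:
            requests:
              storage: 12Gi
          storageClassName: ocs-storagecluster-cephfs
        source:
          http:
            url: "http://example.com/rhel8.qcow2"   # placeholder image URL
  template:
    spec:
      domain:
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
        resources:
          requests:
            memory: 2Gi
      volumes:
        - name: rootdisk
          dataVolume:
            name: asb-dv-ocs-cephfs
```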
Chenli, would you be able to re-test this scenario while using the RBD storage class? It might be that the issue here is IO related.

It would be helpful if you were able to capture the related virt-launcher and virt-handler logs. Would you also be able to post the Pod and VMI manifests?
(In reply to sgott from comment #3)
> Chenli, would you be able to re-test this scenario while using the RBD
> storage class? It might be that the issue here could be IO related.
>
> It would be helpful if you were able to capture the related virt-launcher
> and virt-handler logs. Would you also be able to post the Pod and VMI
> manifests?

Stu,

I re-tested this scenario with the Ceph RBD storage class: the VM migrated from one node to another successfully. The issue only happens on a VM with the cephfs storage class. Please see the attached file asb-vm-dv-ocs-cephfs.yaml for the VMI manifest, and the pod descriptions in the files virt-launcher-*-tjs6l-source/target.

- Create VM

# oc create -f asb-vm-dv-ocs-cephfs.yaml
# oc get pod -o wide | grep virt-launcher
virt-launcher-asb-vm-dv-ocs-cephfs-tjs6l   1/1   Running   0   2m33s   10.128.1.72   dell-per730-63.lab.eng.pek2.redhat.com   <none>   1/1

- Migrate VM in web console

# oc get pod -o wide | grep virt-launcher
virt-launcher-asb-vm-dv-ocs-cephfs-pjlsx   0/1   ContainerCreating   0   4s      <none>        dell-per730-64.lab.eng.pek2.redhat.com   <none>   0/1
virt-launcher-asb-vm-dv-ocs-cephfs-tjs6l   1/1   Running             0   2m49s   10.128.1.72   dell-per730-63.lab.eng.pek2.redhat.com   <none>   1/1

# oc get pod -o wide | grep virt-launcher
virt-launcher-asb-vm-dv-ocs-cephfs-pjlsx   1/1   Running   0   9s      10.129.0.65   dell-per730-64.lab.eng.pek2.redhat.com   <none>   0/1
virt-launcher-asb-vm-dv-ocs-cephfs-tjs6l   1/1   Running   0   2m54s   10.128.1.72   dell-per730-63.lab.eng.pek2.redhat.com   <none>   1/1

- Describe the pod information

# oc describe pod virt-launcher-asb-vm-dv-ocs-cephfs-tjs6l > virt-launcher-asb-vm-dv-ocs-cephfs-tjs6l-source
# oc describe pod virt-launcher-asb-vm-dv-ocs-cephfs-pjlsx > virt-launcher-asb-vm-dv-ocs-cephfs-pjlsx-target

# oc get pod -o wide | grep virt-launcher
virt-launcher-asb-vm-dv-ocs-cephfs-pjlsx   0/1   Completed   0   4m9s    10.129.0.65   dell-per730-64.lab.eng.pek2.redhat.com   <none>   0/1
virt-launcher-asb-vm-dv-ocs-cephfs-tjs6l   1/1   Running     0   6m54s   10.128.1.72   dell-per730-63.lab.eng.pek2.redhat.com   <none>   0/1

# oc rsh virt-launcher-asb-vm-dv-ocs-cephfs-tjs6l
sh-4.4# virsh list --all
 Id   Name                                 State
---------------------------------------------------
 1    openshift-cnv_asb-vm-dv-ocs-cephfs   paused

# tail -f /var/log/libvirt/qemu/openshift-cnv_asb-vm-dv-ocs-cephfs.log
-device VGA,id=video0,vgamem_mb=16,bus=pcie.0,addr=0x1 \
-device virtio-balloon-pci-non-transitional,id=balloon0,bus=pci.5,addr=0x0 \
-object rng-random,id=objrng0,filename=/dev/urandom \
-device virtio-rng-pci-non-transitional,rng=objrng0,id=rng0,bus=pci.6,addr=0x0 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
2022-06-07 09:08:10.706+0000: Domain id=1 is tainted: custom-ga-command
2022-06-07 09:10:37.059+0000: initiating migration
2022-06-07T09:10:42.355174Z qemu-kvm: warning: Failed to unlock byte 201
2022-06-07T09:10:42.355267Z qemu-kvm: warning: Failed to unlock byte 201
Cephfs does not support read-write-many as a valid mode, so it's not surprising that this sequence caused an IOError. However, this invalid/conflicting configuration should likely have been caught during provisioning. With that in mind, changing the component to Storage for further evaluation. Please feel free to change the component if this appears to be in error.
Cephfs actually does support ReadWriteMany:
https://github.com/ceph/ceph-csi/blob/c85d03c79edcd46c0399dbd0fedd6a8be7703a58/examples/cephfs/pvc.yaml#L8

We even tried to bring it to our upstream CI at one point:
https://github.com/kubevirt/kubevirtci/pull/768

So it should be eligible for migration AFAIK. Could I also join Stu's request for manifests and ask for the PVC & DataVolume? @chhu
Stu, I believe we provided all the information from the storage side; it looks like a migration issue. Can we move it to Virt?
Thanks, Yan!
Hi Stu, Alex,

Please see the dv, pvc, and pv information in the attached files dv.yaml, pvc.yaml, and pv.yaml, thank you!

# oc get dv
NAME                PHASE       PROGRESS   RESTARTS   AGE
asb-dv-ocs-cephfs   Succeeded   100.0%                146m

# oc get dv asb-dv-ocs-cephfs -o yaml > dv.yaml

# oc get pvc | grep asb-dv-ocs-cephfs
asb-dv-ocs-cephfs   Bound   pvc-212aae52-7459-4d6b-bf6e-b9018bc56866   12Gi   RWX   ocs-storagecluster-cephfs   149m

# oc get pvc asb-dv-ocs-cephfs -o yaml > pvc.yaml

# oc get pv | grep asb-dv-ocs-cephfs
pvc-212aae52-7459-4d6b-bf6e-b9018bc56866   12Gi   RWX   Delete   Bound   openshift-cnv/asb-dv-ocs-cephfs   ocs-storagecluster-cephfs   150m

# oc get pv pvc-212aae52-7459-4d6b-bf6e-b9018bc56866 -o yaml > pv.yaml
Hi, I will add CephFS support to kubevirtci upstream and will try to reproduce the issue there.
Hi,

I managed to reproduce the issue, and what fixed it was the configuration of the CephFileSystem CRD.

Can you please provide us with the CephFileSystem CRD? I think it may be a misconfiguration of CephFS: the number of data and metadata replicas should be equal to the number of OSDs running on the cluster.

TIA,
Igor
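For reference, the replica setting in question lives in the CephFilesystem resource's pool specs. The sketch below is a hypothetical example (the resource name, pool name, and replica counts are illustrative, not taken from the reporter's cluster); the point is that replicated.size for the metadata and data pools should not exceed the number of OSDs available:

```yaml
# Hypothetical Rook/ODF CephFilesystem resource; names and counts are
# illustrative only. replicated.size for both the metadata pool and the
# data pools should match the number of OSDs running in the cluster.
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: ocs-storagecluster-cephfilesystem
  namespace: openshift-storage
spec:
  metadataPool:
    replicated:
      size: 3        # e.g. 3, for a cluster with 3 OSDs
  dataPools:
    - name: data0
      replicated:
        size: 3      # likewise equal to the OSD count
  metadataServer:
    activeCount: 1
    activeStandby: true
```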
Also, we suspect this issue is the root cause: https://github.com/ceph/ceph-csi/issues/3562
(In reply to Igor Bezukh from comment #17)
> Hi,
>
> I managed to reproduce the issue, but what fixed it is a configuration of
> CephFS CRD
>
> can you please provide us with the CephFileSystem CRD? I think it may be
> misconfiguration of CephFS.
>
> The number of data and metadata replicas should be equal to the number of
> OSDs that are running on the cluster.
>
> TIA
> Igor

Hi Igor,

I'll set up the env and provide the CephFileSystem CRD later, thank you!
(In reply to chhu from comment #20)
> (In reply to Igor Bezukh from comment #17)
> > Hi,
> >
> > I managed to reproduce the issue, but what fixed it is a configuration of
> > CephFS CRD
> >
> > can you please provide us with the CephFileSystem CRD? I think it may be
> > misconfiguration of CephFS.
> >
> > The number of data and metadata replicas should be equal to the number of
> > OSDs that are running on the cluster.
> >
> > TIA
> > Igor
>
> Hi Igor
>
> I'll setup the env and provide the CephFileSystem CRD later, thank you!

Hi Igor,

I reproduced it on my environment with the steps in the "Description" section. For the environment setup, I just installed ODF; I haven't done any configuration of the CephFileSystem CRD. Could you please take a look at my env? I sent the env information to you via gchat, thank you!
The issue that we see with live migration is a side effect of the original issue with CephFS RWX, as described here: https://github.com/ceph/ceph-csi/issues/3562

I will move this bug to the CNV Storage team for further investigation.
OCP storage team here: if it's really https://github.com/ceph/ceph-csi/issues/3562, i.e. two Pods with different SELinux contexts trying to use the same ReadWriteMany volume at the same time, then it's not a bug, but a feature of Kubernetes / OpenShift: it prevents data from a Pod "leaking" to a different Pod that uses a different SELinux context.

Please get the YAML of both Pods and check their pod.spec.securityContext.seLinuxOptions, and/or run "crictl inspect <container>", to confirm that this is really the case.

If two (or more) Pods want to share data on a volume, they must run with the same SELinux context (pod.spec.securityContext.seLinuxOptions, or spec.containers[*].securityContext.seLinuxOptions of all the Pod's containers that have the volume mounted). If the fields are missing or empty, the container runtime will assign a random one to each Pod!

In OpenShift, if the Pods are in the same namespace and their SCC has "SELinuxContext: type: MustRunAs" (e.g. the "restricted" SCC), OCP will assign the SELinux context to the Pods from namespace annotations, i.e. they should run with the same SELinux context and be able to share a volume. (If not, we have a bug somewhere.) However, if the Pods are in different namespaces *or* their SCCs have different "SELinuxContext" values, then their SELinux contexts are most probably different and they can't share data on a volume.

It's somewhat documented at https://docs.openshift.com/container-platform/4.12/authentication/managing-security-context-constraints.html

To sum up: if the "restricted" SCC is not enough for CNV, please use any other SCC that uses "SELinuxContext: type: MustRunAs", and all Pods in the same namespace will be able to share their volumes. There are other possible workarounds, but SCC would be the best.
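As an illustration of the matching-context requirement described above, two Pods sharing a ReadWriteMany PVC would both pin the same SELinux options. This is a hypothetical sketch: the Pod names, PVC name, image, and the MCS level "s0:c25,c0" are all invented for the example:

```yaml
# Hypothetical illustration: both Pods pin the same SELinux level so the
# container runtime does not assign each one a random MCS category pair.
# Names, image, and the level value are made up for this example.
apiVersion: v1
kind: Pod
metadata:
  name: writer-a
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c25,c0"   # identical in every Pod that mounts the volume
  containers:
    - name: app
      image: registry.access.redhat.com/ubi9/ubi
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: shared
          mountPath: /data
  volumes:
    - name: shared
      persistentVolumeClaim:
        claimName: shared-rwx-pvc   # an RWX CephFS-backed claim
---
# Second Pod: same seLinuxOptions.level, so both can read/write /data.
apiVersion: v1
kind: Pod
metadata:
  name: writer-b
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c25,c0"
  containers:
    - name: app
      image: registry.access.redhat.com/ubi9/ubi
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: shared
          mountPath: /data
  volumes:
    - name: shared
      persistentVolumeClaim:
        claimName: shared-rwx-pvc
```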
Stu, can you take a look at comment #23 from Jan regarding SELinux contexts? It seems that the migration destination Pod really should start with the same context as the source.
Matching the security context of the source on the target is what was done in this PR: https://github.com/kubevirt/kubevirt/pull/9246
Verified with build: CNV-v4.14.0.rhel9-1553

Steps:
1. Create a VM with CephFS storage:

...
      storage:
        resources:
          requests:
            storage: 30Gi
        storageClassName: ocs-storagecluster-cephfs
...

2. Start the VM and do a live migration:

$ oc get pods
NAME                            READY   STATUS      RESTARTS   AGE
virt-launcher-vm-fedora-jpcz4   1/1     Running     0          3m13s
virt-launcher-vm-fedora-l685m   0/1     Completed   0          8m5s

$ oc get virtualmachineinstancemigrations.kubevirt.io
NAME                        PHASE       VMI
vm-fedora-migration-uouwq   Succeeded   vm-fedora

$ oc get vm
NAME        AGE   STATUS    READY
vm-fedora   10m   Running   True

The migration succeeded; moving to verified.
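For reference, the live migration in step 2 can also be triggered from the CLI by creating a VirtualMachineInstanceMigration object (the metadata.name and namespace below are illustrative; only spec.vmiName must match the running VMI):

```yaml
# Hypothetical manifest equivalent to clicking "Migrate" in the console.
# The object name and namespace are made up; vmiName must name the VMI.
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  name: vm-fedora-migration
  namespace: openshift-cnv
spec:
  vmiName: vm-fedora
```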
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Virtualization 4.14.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:6817