Bug 2092271
| Field | Value |
|---|---|
| Summary | CephFS-based VM status changes to "paused" after migration |
| Product | Container Native Virtualization (CNV) |
| Component | Virtualization |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | medium |
| Version | 4.11.0 |
| Target Release | 4.14.0 |
| Hardware | x86_64 |
| OS | Linux |
| Whiteboard | libvirt_CNV_INT |
| Reporter | chhu |
| Assignee | Jed Lejosne <jlejosne> |
| QA Contact | zhe peng <zpeng> |
| CC | acardace, akalenyu, alitke, danken, fdeutsch, ibezukh, jlejosne, jsafrane, kbidarka, nashok, pelauter, sgott, yadu |
| Flags | ibezukh: needinfo+; ibezukh: needinfo? (alitke) |
| Doc Type | Bug Fix |
| Doc Text | When you use two pods with different SELinux contexts, VMs with the ocs-storagecluster-cephfs storage class no longer fail to migrate. (BZ#2092271) |
| Cloned To | 2174226 (view as bug list) |
| Bug Blocks | 2135381, 2174226 |
| Type | Bug |
| Last Closed | 2023-11-08 14:05:03 UTC |
Description (chhu, 2022-06-01 08:25:29 UTC)
Comment 3 (sgott):

Chenli, would you be able to re-test this scenario while using the RBD storage class? It might be that the issue here is IO related. It would be helpful if you could capture the related virt-launcher and virt-handler logs. Would you also be able to post the Pod and VMI manifests?

(In reply to sgott from comment #3)
> Chenli, would you be able to re-test this scenario while using the RBD
> storage class? It might be that the issue here could be IO related.

Stu, I re-tested this scenario with the ceph RBD storage class and migrated the VM from one node to another successfully. The issue only happens on a VM with the cephfs storage class. Please see the attached file asb-vm-dv-ocs-cephfs.yaml for the VMI manifest, and the described pod information in the files virt-launcher-*-tjs6l-source/target.

- Create the VM:

# oc create -f asb-vm-dv-ocs-cephfs.yaml
# oc get pod -o wide | grep virt-launcher
virt-launcher-asb-vm-dv-ocs-cephfs-tjs6l   1/1   Running   0   2m33s   10.128.1.72   dell-per730-63.lab.eng.pek2.redhat.com   <none>   1/1

- Migrate the VM in the web console:

# oc get pod -o wide | grep virt-launcher
virt-launcher-asb-vm-dv-ocs-cephfs-pjlsx   0/1   ContainerCreating   0   4s      <none>        dell-per730-64.lab.eng.pek2.redhat.com   <none>   0/1
virt-launcher-asb-vm-dv-ocs-cephfs-tjs6l   1/1   Running             0   2m49s   10.128.1.72   dell-per730-63.lab.eng.pek2.redhat.com   <none>   1/1

# oc get pod -o wide | grep virt-launcher
virt-launcher-asb-vm-dv-ocs-cephfs-pjlsx   1/1   Running   0   9s      10.129.0.65   dell-per730-64.lab.eng.pek2.redhat.com   <none>   0/1
virt-launcher-asb-vm-dv-ocs-cephfs-tjs6l   1/1   Running   0   2m54s   10.128.1.72   dell-per730-63.lab.eng.pek2.redhat.com   <none>   1/1

- Describe the pod information:

# oc describe pod virt-launcher-asb-vm-dv-ocs-cephfs-tjs6l > virt-launcher-asb-vm-dv-ocs-cephfs-tjs6l-source
# oc describe pod virt-launcher-asb-vm-dv-ocs-cephfs-pjlsx >
virt-launcher-asb-vm-dv-ocs-cephfs-pjlsx-target

# oc get pod -o wide | grep virt-launcher
virt-launcher-asb-vm-dv-ocs-cephfs-pjlsx   0/1   Completed   0   4m9s    10.129.0.65   dell-per730-64.lab.eng.pek2.redhat.com   <none>   0/1
virt-launcher-asb-vm-dv-ocs-cephfs-tjs6l   1/1   Running     0   6m54s   10.128.1.72   dell-per730-63.lab.eng.pek2.redhat.com   <none>   0/1

# oc rsh virt-launcher-asb-vm-dv-ocs-cephfs-tjs6l
sh-4.4# virsh list --all
 Id   Name                                 State
---------------------------------------------------
 1    openshift-cnv_asb-vm-dv-ocs-cephfs   paused

sh-4.4# tail -f /var/log/libvirt/qemu/openshift-cnv_asb-vm-dv-ocs-cephfs.log
-device VGA,id=video0,vgamem_mb=16,bus=pcie.0,addr=0x1 \
-device virtio-balloon-pci-non-transitional,id=balloon0,bus=pci.5,addr=0x0 \
-object rng-random,id=objrng0,filename=/dev/urandom \
-device virtio-rng-pci-non-transitional,rng=objrng0,id=rng0,bus=pci.6,addr=0x0 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
2022-06-07 09:08:10.706+0000: Domain id=1 is tainted: custom-ga-command
2022-06-07 09:10:37.059+0000: initiating migration
2022-06-07T09:10:42.355174Z qemu-kvm: warning: Failed to unlock byte 201
2022-06-07T09:10:42.355267Z qemu-kvm: warning: Failed to unlock byte 201

---

CephFS does not support ReadWriteMany as a valid mode, so it is not surprising that this sequence caused an IO error. However, this invalid configuration should likely have been caught during provisioning. With that in mind, I am changing the component to Storage for further evaluation. Please feel free to change the component if this appears to be in error.

---

CephFS actually does support ReadWriteMany:
https://github.com/ceph/ceph-csi/blob/c85d03c79edcd46c0399dbd0fedd6a8be7703a58/examples/cephfs/pvc.yaml#L8

Actually, we even tried to bring it to our upstream CI at one point:
https://github.com/kubevirt/kubevirtci/pull/768

So it should be eligible for migration AFAIK. Could I also join Stu's request for manifests and ask for the PVC & DataVolume?
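For reference, a CephFS-backed ReadWriteMany claim of the kind discussed here would look roughly like the following sketch (the claim name, namespace, and size mirror this bug's environment; they are illustrative, not the attached manifest):

```yaml
# Sketch of an RWX PVC on the OCS CephFS storage class.
# Names and size follow this bug report; treat them as illustrative.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: asb-dv-ocs-cephfs
  namespace: openshift-cnv
spec:
  accessModes:
    - ReadWriteMany              # CephFS supports RWX, which live migration needs
  resources:
    requests:
      storage: 12Gi
  storageClassName: ocs-storagecluster-cephfs
```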
@chhu Stu, I believe we provided all the information from the storage side; it looks like a migration issue. Can we move it to Virt?

---

Thanks, Yan! Hi Stu, Alex,

Please see the dv, pvc, pv information in the attached files dv.yaml, pvc.yaml, pv.yaml, thank you!

# oc get dv
NAME                PHASE       PROGRESS   RESTARTS   AGE
asb-dv-ocs-cephfs   Succeeded   100.0%                146m
# oc get dv asb-dv-ocs-cephfs -o yaml > dv.yaml
# oc get pvc | grep asb-dv-ocs-cephfs
asb-dv-ocs-cephfs   Bound   pvc-212aae52-7459-4d6b-bf6e-b9018bc56866   12Gi   RWX   ocs-storagecluster-cephfs   149m
# oc get pvc asb-dv-ocs-cephfs -o yaml > pvc.yaml
# oc get pv | grep asb-dv-ocs-cephfs
pvc-212aae52-7459-4d6b-bf6e-b9018bc56866   12Gi   RWX   Delete   Bound   openshift-cnv/asb-dv-ocs-cephfs   ocs-storagecluster-cephfs   150m
# oc get pv pvc-212aae52-7459-4d6b-bf6e-b9018bc56866 -o yaml > pv.yaml

---

Hi, I will add CephFS support in kubevirtci upstream and will try to reproduce it there.

Comment 17 (Igor Bezukh):

Hi,

I managed to reproduce the issue, but what fixed it was a configuration of the CephFS CRD. Can you please provide us with the CephFileSystem CRD? I think it may be a misconfiguration of CephFS: the number of data and metadata replicas should be equal to the number of OSDs running on the cluster.

TIA,
Igor

Also, we suspect this issue as the root cause: https://github.com/ceph/ceph-csi/issues/3562

Comment 20 (chhu, in reply to Igor Bezukh from comment #17):

Hi Igor,

I'll set up the env and provide the CephFileSystem CRD later, thank you!

(In reply to chhu from comment #20)
> (In reply to Igor Bezukh from comment #17)
> > I managed to reproduce the issue, but what fixed it is a configuration of
> > the CephFS CRD. Can you please provide us with the CephFileSystem CRD?
> > I think it may be a misconfiguration of CephFS. The number of data and
> > metadata replicas should be equal to the number of OSDs that are running
> > on the cluster.
>
> Hi Igor, I'll set up the env and provide the CephFileSystem CRD later, thank you!

Hi Igor,

I reproduced it on my environment with the steps in the "Description" part. For the environment setup I just installed ODF, and I haven't done any configuration of the CephFileSystem CRD. Would you please help to have a check on my env? I sent the env information to you by gchat, thank you!

---

The issue that we see with live migration is a side effect of the original issue with CephFS RWX, as described here: https://github.com/ceph/ceph-csi/issues/3562. I will move this bug to the CNV Storage team for further observation.

Comment 23 (Jan, OCP storage team):

OCP storage team here: if it is really https://github.com/ceph/ceph-csi/issues/3562, i.e. two Pods with different SELinux contexts trying to use the same ReadWriteMany volume at the same time, then it's not a bug but a feature of Kubernetes / OpenShift: it protects against data being "leaked" from a Pod to a different Pod that uses a different SELinux context. Please get the YAML of both Pods and check their pod.spec.securityContext.seLinuxOptions, and/or run "crictl inspect <container>", to confirm that this is really the case.

If two (or more) Pods want to share data on a volume, they must run with the same SELinux context (pod.spec.securityContext.seLinuxOptions, or spec.containers[*].securityContext.seLinuxOptions of all the Pod's containers that have the volume mounted). If the fields are missing or empty, the container runtime will assign a random one to each Pod! In OpenShift, if the Pods are in the same namespace and their SCC has "SELinuxContext: type: MustRunAs" (e.g. the "restricted" SCC), OCP will assign the SELinux context to the Pods from namespace annotations, i.e. they should run with the same SELinux context and be able to share a volume. (If not, we have a bug somewhere.)
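The sharing requirement described above can be sketched as two Pods that pin the identical SELinux context. This is a hypothetical illustration (the Pod names, image, and MCS level are invented; only the field paths come from the comment):

```yaml
# Hypothetical sketch: two Pods that can share one RWX volume because
# they declare the *same* seLinuxOptions. The level value is illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: migration-source          # hypothetical name
  namespace: openshift-cnv
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c25,c10"         # must match on every Pod mounting the volume
  containers:
    - name: compute
      image: example.io/virt-launcher:latest   # placeholder image
      volumeMounts:
        - name: disk
          mountPath: /disk
  volumes:
    - name: disk
      persistentVolumeClaim:
        claimName: asb-dv-ocs-cephfs
---
# The second Pod declares the identical level; with a different (or
# runtime-assigned random) level, the volume's labels would not match
# and shared access fails.
apiVersion: v1
kind: Pod
metadata:
  name: migration-target          # hypothetical name
  namespace: openshift-cnv
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c25,c10"         # same context as the first Pod
  containers:
    - name: compute
      image: example.io/virt-launcher:latest
      volumeMounts:
        - name: disk
          mountPath: /disk
  volumes:
    - name: disk
      persistentVolumeClaim:
        claimName: asb-dv-ocs-cephfs
```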
However, if the Pods are in different namespaces *or* their SCCs have different "SELinuxContext" values, then their SELinux contexts are most probably different and they can't share data on a volume. It's somewhat documented at https://docs.openshift.com/container-platform/4.12/authentication/managing-security-context-constraints.html

To sum up: if the "restricted" SCC is not enough for CNV, please use any other SCC that uses "SELinuxContext: type: MustRunAs", and all Pods in the same namespace will be able to share their volumes. Other workarounds are possible, but SCC would be the best.

---

Stu, can you take a look at comment #23 from Jan regarding SELinux contexts? It seems that the migration destination Pod really should start with the same context as the source.

---

Matching the security context of the source on the target is what was done in this PR: https://github.com/kubevirt/kubevirt/pull/9246

---

Verified with build CNV-v4.14.0.rhel9-1553.

Steps:

1. Create a VM with CephFS storage:

...
  storage:
    resources:
      requests:
        storage: 30Gi
    storageClassName: ocs-storagecluster-cephfs
...

2. Start the VM and do a live migration:

$ oc get pods
NAME                            READY   STATUS      RESTARTS   AGE
virt-launcher-vm-fedora-jpcz4   1/1     Running     0          3m13s
virt-launcher-vm-fedora-l685m   0/1     Completed   0          8m5s
$ oc get virtualmachineinstancemigrations.kubevirt.io
NAME                        PHASE       VMI
vm-fedora-migration-uouwq   Succeeded   vm-fedora
$ oc get vm
NAME        AGE   STATUS    READY
vm-fedora   10m   Running   True

The migration succeeded; moving to VERIFIED.

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Virtualization 4.14.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:6817
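As an appendix to the verification above: the object listed by `oc get virtualmachineinstancemigrations.kubevirt.io` can also be created directly instead of via the web console. A minimal sketch (the real object's name carried a generated suffix, here it is fixed for illustration):

```yaml
# Minimal VirtualMachineInstanceMigration sketch that triggers a live
# migration of the VM used in verification (metadata.name is illustrative).
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  name: vm-fedora-migration
  namespace: openshift-cnv
spec:
  vmiName: vm-fedora
```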