Bug 2044348
| Summary: | VM with ocs-storagecluster-cephfs sc keeps in CrashLoopBackOff | ||
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | Yan Du <yadu> |
| Component: | Storage | Assignee: | Alexander Wels <awels> |
| Status: | CLOSED ERRATA | QA Contact: | Yan Du <yadu> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 4.10.0 | CC: | akalenyu, alitke, awels, cnv-qe-bugs, ndevos, pelauter |
| Target Milestone: | --- | ||
| Target Release: | 4.10.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | virt-launcher v4.10.0-216, CNV v4.10.0-686 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-03-16 16:06:49 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Is this possibly related to a missing backport for bug 1976730?

If this is happening in 4.10 it is something else, because the fix for that bug should be in 4.10.

Hey Niels, do you know what the cephfs block size is? We currently use a 1M aligned size, but qemu doesn't seem to like that in this particular scenario.

It appears that the automatic resize code was not aligning properly. The linked PR in KubeVirt will fix it.

@alitke The problem is not the alignment in CDI; the problem is that the online resize in KubeVirt does not align. I did some testing: after the import from CDI, the virtual disk image size on a 10Gi PVC is 10146021376 bytes, which is 4k aligned (actually 1Mi aligned). I then attempted to start the VM and it failed. I checked the virtual size of the image again, and this time it was 10146860544 bytes, which is not 4k aligned (it looks like it is 512 aligned). I also found a PR that fixes exactly that in the online resize, which I have linked to this bugzilla.

Peter, this should be a blocker, as we may have problems creating VMs on any storage with a 4k block size.

Verified on CNV v4.10.0-686; the issue can no longer be reproduced.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.10.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0947
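The alignment claims in the comments can be checked with a few lines of arithmetic. The sketch below only uses the two sizes quoted in this bug. The helper names are illustrative, and treating 4096 bytes as the request alignment that qemu complains about is an assumption based on the "4k block size" remark, not something the report states outright.

```python
# Alignment units discussed in the comments.
SECTOR = 512         # 512-byte sector
BLOCK_4K = 4096      # assumed request alignment on this storage (4k block size)
MIB = 1024 * 1024    # 1Mi alignment used for the imported image


def is_aligned(size_bytes: int, alignment: int) -> bool:
    """Return True if size_bytes is an exact multiple of alignment."""
    return size_bytes % alignment == 0


def align_down(size_bytes: int, alignment: int) -> int:
    """Round size_bytes down to the nearest multiple of alignment."""
    return size_bytes - size_bytes % alignment


# Virtual sizes reported in the comments for the 10Gi PVC.
after_import = 10146021376   # right after the CDI import
after_resize = 10146860544   # after the failed VM start

for label, size in (("after import", after_import), ("after resize", after_resize)):
    print(
        f"{label}: {size} bytes, "
        f"512-aligned={is_aligned(size, SECTOR)}, "
        f"4k-aligned={is_aligned(size, BLOCK_4K)}, "
        f"1Mi-aligned={is_aligned(size, MIB)}"
    )

# A size the resize could have used instead: the largest 4k multiple
# that does not exceed the misaligned value.
print("4k-aligned candidate:", align_down(after_resize, BLOCK_4K))
```

The size after the import passes all three checks, while the size after the attempted start is a multiple of 512 only, which matches the "Image size is not a multiple of request alignment" error on storage with a 4k block size.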
Description of problem:

A VM whose disk uses the ocs-storagecluster-cephfs storage class stays in CrashLoopBackOff.

Version-Release number of selected component (if applicable):

CNV v4.10.0-605

How reproducible:

Always

Steps to Reproduce:

1. Create a DV with cephfs

    ---
    apiVersion: cdi.kubevirt.io/v1beta1
    kind: DataVolume
    metadata:
      name: simple-dv-cephfs
    spec:
      source:
        http:
          url: "http://.../files/cnv-tests/fedora-images/Fedora-Cloud-Base-34-1.2.x86_64.qcow2"
      pvc:
        storageClassName: ocs-storagecluster-cephfs
        volumeMode: Filesystem
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi

2. Create the VM

    ---
    apiVersion: kubevirt.io/v1
    kind: VirtualMachine
    metadata:
      name: simple-vm
    spec:
      running: true
      template:
        metadata:
          labels:
            kubevirt.io/domain: simple-vm
            kubevirt.io/vm: simple-vm
        spec:
          domain:
            devices:
              disks:
                - disk:
                    bus: virtio
                  name: dv-disk
                - disk:
                    bus: virtio
                  name: cloudinitdisk
            resources:
              requests:
                memory: 2048M
          volumes:
            - dataVolume:
                name: simple-dv-cephfs
              name: dv-disk
            - cloudInitNoCloud:
                userData: |
                  #cloud-config
                  password: fedora
                  chpasswd: { expire: False }
              name: cloudinitdisk

Actual results:

The VM stays in CrashLoopBackOff.

    $ oc get dv
    NAME               PHASE       PROGRESS   RESTARTS   AGE
    simple-dv-cephfs   Succeeded   100.0%                3h15m

    $ oc get vm
    NAME        AGE   STATUS             READY
    simple-vm   12m   CrashLoopBackOff   False

    Events:
      Type     Reason            Age   From                       Message
      ----     ------            ----  ----                       -------
      Normal   SuccessfulCreate  42s   virtualmachine-controller  Created virtual machine pod virt-launcher-simple-vm-jk2cx
      Warning  SyncFailed        31s   virt-handler               server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2022-01-24T07:13:50.375272Z qemu-kvm: -device virtio-blk-pci-non-transitional,bus=pci.4,addr=0x0,drive=libvirt-2-format,id=ua-dv-disk,bootindex=1,write-cache=on,werror=stop,rerror=stop: Cannot get 'write' permission without 'resize': Image size is not a multiple of request alignment')"
      Warning  SyncFailed        30s   virt-handler               server error. command SyncVMI failed: "LibvirtError(Code=1, Domain=10, Message='internal error: qemu unexpectedly closed the monitor: 2022-01-24T07:13:51.469326Z qemu-kvm: -device virtio-blk-pci-non-transitional,bus=pci.4,addr=0x0,drive=libvirt-2-format,id=ua-dv-disk,bootindex=1,write-cache=on,werror=stop,rerror=stop: Cannot get 'write' permission without 'resize': Image size is not a multiple of request alignment')"

Expected results:

The VM starts and runs normally.

Additional info:

Workaround: the VM runs if filesystemOverhead is disabled in HCO.
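The workaround suggests that the filesystem overhead reservation feeds into the misaligned size. As a rough sketch, assume a 5.5% overhead (the CDI default; the report does not state the value configured on this cluster) and assume plain round-down and round-up steps. The helpers below are hypothetical and are not KubeVirt or CDI code. Under those assumptions the arithmetic reproduces both image sizes quoted in the comments on this bug, and shows why a zero overhead would leave the size aligned.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
SECTOR = 512
BLOCK_4K = 4096

# Assumption: 5.5% filesystem overhead, kept as integer thousandths
# so the calculation avoids floating-point rounding.
OVERHEAD_PER_MILLE = 55


def usable_bytes(pvc_bytes: int, overhead_per_mille: int) -> int:
    """Bytes left for the disk image after reserving the overhead."""
    return pvc_bytes * (1000 - overhead_per_mille) // 1000


def align_down(value: int, alignment: int) -> int:
    """Round value down to a multiple of alignment."""
    return value - value % alignment


def align_up(value: int, alignment: int) -> int:
    """Round value up to a multiple of alignment."""
    return -(-value // alignment) * alignment


pvc_size = 10 * GIB  # the 10Gi PVC requested by the DataVolume
usable = usable_bytes(pvc_size, OVERHEAD_PER_MILLE)

import_size = align_down(usable, MIB)    # 1Mi-aligned, as after the import
resize_size = align_up(usable, SECTOR)   # only 512-aligned, as after the resize
no_overhead = usable_bytes(pvc_size, 0)  # overhead disabled (the workaround)

print("usable bytes:          ", usable)
print("rounded down to 1Mi:   ", import_size)
print("rounded up to 512:     ", resize_size, "4k-aligned:", resize_size % BLOCK_4K == 0)
print("with overhead disabled:", no_overhead, "4k-aligned:", no_overhead % BLOCK_4K == 0)
```

That the numbers match is consistent with the diagnosis in the comments (the online resize not aligning its target size), but it is not proof of how either component computes the value.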