Description of problem:
When trying to clone a DV with the hostpath-csi-pvc-block SC, the DV sometimes fails to be cloned.

Version-Release number of selected component (if applicable):
4.12.1

How reproducible:
Flaky

Steps to Reproduce:
1. Import a DV with the hostpath-csi-pvc-block SC (see the example DataVolume manifest after the importer log below)

Actual results:
DV fails to be created

Expected results:
DV created successfully

Additional info:
$ oc logs importer-dv-source
I0222 12:06:44.028076 1 importer.go:104] Starting importer
I0222 12:06:44.028126 1 importer.go:171] begin import process
I0222 12:06:44.040358 1 data-processor.go:379] Calculating available size
I0222 12:06:44.040377 1 data-processor.go:391] Checking out file system volume size.
I0222 12:06:44.040388 1 data-processor.go:399] Request image size not empty.
I0222 12:06:44.040395 1 data-processor.go:404] Target size 1136234735.
I0222 12:06:44.040785 1 nbdkit.go:303] Waiting for nbdkit PID.
I0222 12:06:44.541810 1 nbdkit.go:324] nbdkit ready.
I0222 12:06:44.541830 1 data-processor.go:282] New phase: Convert
I0222 12:06:44.541871 1 data-processor.go:288] Validating image
I0222 12:06:44.933526 1 qemu.go:259] 0.00
E0222 12:06:44.942566 1 prlimit.go:174] qemu-img failed output is:
E0222 12:06:44.942581 1 prlimit.go:175] (0.00/100%)
E0222 12:06:44.942587 1 prlimit.go:176] qemu-img: /data/disk.img: error while converting raw: Could not create '/data/disk.img': Permission denied
E0222 12:06:44.942649 1 data-processor.go:278] exit status 1
qemu-img execution failed
kubevirt.io/containerized-data-importer/pkg/system.executeWithLimits /remote-source/app/pkg/system/prlimit.go:178
kubevirt.io/containerized-data-importer/pkg/system.ExecWithLimits /remote-source/app/pkg/system/prlimit.go:111
kubevirt.io/containerized-data-importer/pkg/image.convertToRaw /remote-source/app/pkg/image/qemu.go:128
kubevirt.io/containerized-data-importer/pkg/image.(*qemuOperations).ConvertToRawStream /remote-source/app/pkg/image/qemu.go:146
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).convert /remote-source/app/pkg/importer/data-processor.go:303
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func6 /remote-source/app/pkg/importer/data-processor.go:240
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause /remote-source/app/pkg/importer/data-processor.go:275
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData /remote-source/app/pkg/importer/data-processor.go:184
main.handleImport /remote-source/app/cmd/cdi-importer/importer.go:177
main.main /remote-source/app/cmd/cdi-importer/importer.go:143
runtime.main /usr/lib/golang/src/runtime/proc.go:250
runtime.goexit /usr/lib/golang/src/runtime/asm_amd64.s:1594
could not convert image to raw
kubevirt.io/containerized-data-importer/pkg/image.convertToRaw /remote-source/app/pkg/image/qemu.go:136
kubevirt.io/containerized-data-importer/pkg/image.(*qemuOperations).ConvertToRawStream /remote-source/app/pkg/image/qemu.go:146
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).convert /remote-source/app/pkg/importer/data-processor.go:303
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func6 /remote-source/app/pkg/importer/data-processor.go:240
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause /remote-source/app/pkg/importer/data-processor.go:275
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData /remote-source/app/pkg/importer/data-processor.go:184
main.handleImport /remote-source/app/cmd/cdi-importer/importer.go:177
main.main /remote-source/app/cmd/cdi-importer/importer.go:143
runtime.main /usr/lib/golang/src/runtime/proc.go:250
runtime.goexit /usr/lib/golang/src/runtime/asm_amd64.s:1594
Conversion to Raw failed
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).convert /remote-source/app/pkg/importer/data-processor.go:305
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func6 /remote-source/app/pkg/importer/data-processor.go:240
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause /remote-source/app/pkg/importer/data-processor.go:275
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData /remote-source/app/pkg/importer/data-processor.go:184
main.handleImport /remote-source/app/cmd/cdi-importer/importer.go:177
main.main /remote-source/app/cmd/cdi-importer/importer.go:143
runtime.main /usr/lib/golang/src/runtime/proc.go:250
runtime.goexit /usr/lib/golang/src/runtime/asm_amd64.s:1594
Unable to convert source data to target format
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func6 /remote-source/app/pkg/importer/data-processor.go:242
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause /remote-source/app/pkg/importer/data-processor.go:275
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData /remote-source/app/pkg/importer/data-processor.go:184
main.handleImport /remote-source/app/cmd/cdi-importer/importer.go:177
main.main /remote-source/app/cmd/cdi-importer/importer.go:143
runtime.main /usr/lib/golang/src/runtime/proc.go:250
runtime.goexit /usr/lib/golang/src/runtime/asm_amd64.s:1594
E0222 12:06:44.942750 1 importer.go:180] exit status 1
qemu-img execution failed
kubevirt.io/containerized-data-importer/pkg/system.executeWithLimits /remote-source/app/pkg/system/prlimit.go:178
kubevirt.io/containerized-data-importer/pkg/system.ExecWithLimits /remote-source/app/pkg/system/prlimit.go:111
kubevirt.io/containerized-data-importer/pkg/image.convertToRaw /remote-source/app/pkg/image/qemu.go:128
kubevirt.io/containerized-data-importer/pkg/image.(*qemuOperations).ConvertToRawStream /remote-source/app/pkg/image/qemu.go:146
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).convert /remote-source/app/pkg/importer/data-processor.go:303
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func6 /remote-source/app/pkg/importer/data-processor.go:240
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause /remote-source/app/pkg/importer/data-processor.go:275
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData /remote-source/app/pkg/importer/data-processor.go:184
main.handleImport /remote-source/app/cmd/cdi-importer/importer.go:177
main.main /remote-source/app/cmd/cdi-importer/importer.go:143
runtime.main /usr/lib/golang/src/runtime/proc.go:250
runtime.goexit /usr/lib/golang/src/runtime/asm_amd64.s:1594
could not convert image to raw
kubevirt.io/containerized-data-importer/pkg/image.convertToRaw /remote-source/app/pkg/image/qemu.go:136
kubevirt.io/containerized-data-importer/pkg/image.(*qemuOperations).ConvertToRawStream /remote-source/app/pkg/image/qemu.go:146
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).convert /remote-source/app/pkg/importer/data-processor.go:303
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func6 /remote-source/app/pkg/importer/data-processor.go:240
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause /remote-source/app/pkg/importer/data-processor.go:275
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData /remote-source/app/pkg/importer/data-processor.go:184
main.handleImport /remote-source/app/cmd/cdi-importer/importer.go:177
main.main /remote-source/app/cmd/cdi-importer/importer.go:143
runtime.main /usr/lib/golang/src/runtime/proc.go:250
runtime.goexit /usr/lib/golang/src/runtime/asm_amd64.s:1594
Conversion to Raw failed
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).convert /remote-source/app/pkg/importer/data-processor.go:305
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func6 /remote-source/app/pkg/importer/data-processor.go:240
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause /remote-source/app/pkg/importer/data-processor.go:275
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData /remote-source/app/pkg/importer/data-processor.go:184
main.handleImport /remote-source/app/cmd/cdi-importer/importer.go:177
main.main /remote-source/app/cmd/cdi-importer/importer.go:143
runtime.main /usr/lib/golang/src/runtime/proc.go:250
runtime.goexit /usr/lib/golang/src/runtime/asm_amd64.s:1594
Unable to convert source data to target format
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func6 /remote-source/app/pkg/importer/data-processor.go:242
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause /remote-source/app/pkg/importer/data-processor.go:275
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData /remote-source/app/pkg/importer/data-processor.go:184
main.handleImport /remote-source/app/cmd/cdi-importer/importer.go:177
main.main /remote-source/app/cmd/cdi-importer/importer.go:143
runtime.main /usr/lib/golang/src/runtime/proc.go:250
runtime.goexit /usr/lib/golang/src/runtime/asm_amd64.s:1594

$ oc get dv -oyaml
  message: 'Unable to process data: Unable to convert source data to target format: Conversion to Raw failed: could not convert image to raw : qemu-img execution failed: exit
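For reference, this is roughly what the DV from the reproduction step looks like; a minimal sketch assuming a registry source — the DV name, source image, and requested size below are placeholders, not values from the affected cluster:

$ cat <<EOF | oc apply -f -
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: dv-source                     # placeholder name
spec:
  source:
    registry:
      url: "docker://quay.io/containerdisks/fedora:latest"   # placeholder source image
  storage:
    storageClassName: hostpath-csi-pvc-block
    resources:
      requests:
        storage: 2Gi                  # placeholder size
EOF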
Looking at the cluster where this initially occurred, it seems the fsGroup chmod was not applied to this particular PVC's path on the node, which would explain the permission error. On the node there are a number of errors about the Ceph block PVC backing the pool:

Error: UnmapDevice.TearDownDevice failed for volume "pvc-1c059266-431a-423e-820f-94efd0116410" (UniqueName: "kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com^0001-0011-openshift-storage-0000000000000001-6fb3d41e-5da1-489c-a05f-170ee3ad0be1") on node "cnv-qe-infra-18.cnvqe2.lab.eng.rdu2.redhat.com" : kubernetes.io/csi: blockMapper.unstageVolumeForBlock failed: rpc error: code = Internal desc = rbd: unmap for spec (ocs-storagecluster-cephblockpool/csi-vol-6fb3d41e-5da1-489c-a05f-170ee3ad0be1) failed (an error (exit status 16) occurred while running rbd args: [unmap ocs-storagecluster-cephblockpool/csi-vol-6fb3d41e-5da1-489c-a05f-170ee3ad0be1 --device-type krbd --options noudev]): (rbd: sysfs write failed
Feb 22 13:19:32 cnv-qe-infra-18 kubenswrapper[7753]: rbd: unmap failed: (16) Device or resource busy
Feb 22 13:19:32 cnv-qe-infra-18 kubenswrapper[7753]: )

This leads me to think we have an rbd issue here rather than an HPP one. Some digging yielded https://lore.kernel.org/all/20220923035826.GA1830185@onthe.net.au/T/
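For anyone who needs to re-check this on a node, a rough sketch of how to verify whether the fsGroup ownership change actually landed on the PVC path and whether an rbd device is still mapped (the node name, pod UID, and PVC name below are placeholders):

$ oc debug node/<node_name>
sh-5.1# chroot /host
# with fsGroup applied, the group of the mount path should match the pod's fsGroup
# and the setgid bit should be set
sh-5.1# ls -ldn /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/<pvc-name>/mount
# list rbd block devices still present on the node (relevant to the
# "unmap failed: (16) Device or resource busy" errors above)
sh-5.1# lsblk -o NAME,TYPE,MOUNTPOINT | grep rbd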
I'm also seeing this on bm04 in 4.13 with hostpath-csi-basic, which is local storage, NOT backed by a Ceph PVC.
We see this failure with hostpath-csi-basic on a PSI cluster as well (local storage, not backed by an HPP pool). Continuing to work with Alex on tackling the issue.
So the flow was:
1. Create a DV on HPP storage, which is WaitForFirstConsumer
2. Create a first consumer pod with fsGroup 2000
3. The importer pod gets created right away by CDI; its fsGroup is 107

akalenyu increased the kubelet log verbosity and found in the logs that the 2000 fsGroup gets applied after 107, which is why the importer pod doesn't have the right permissions:

$ oc debug node/<node_name>
sh-5.1# journalctl -u kubelet --no-pager | grep 'Perform recursive ownership change for directory" path="/var/lib/kubelet/pods/6e865590-e8d2-4c6d-8322-53b984ecc6af/volumes/kubernetes.io~csi/pvc-32afa320-9a92-4b1b-b409-724b20067dd4/mou' -A300 -B 300
May 15 16:37:12 c01-jp413-h-4vqdp-worker-0-fnw9h kubenswrapper[3809]: I0515 16:37:12.468991 3809 csi_mounter.go:344] kubernetes.io/csi: mounter.SetupAt fsGroup [107] applied successfully to pvc-32afa320-9a92-4b1b-b409-724b20067dd4
...
May 15 16:37:12 c01-jp413-h-4vqdp-worker-0-fnw9h kubenswrapper[3809]: I0515 16:37:12.563296 3809 csi_mounter.go:344] kubernetes.io/csi: mounter.SetupAt fsGroup [2000] applied successfully to pvc-32afa320-9a92-4b1b-b409-724b20067dd4

It is worth opening a k8s issue. From our side, considering that the virt-launcher pod has fsGroup 107 and it is the most likely first consumer for a DV, we will change the first consumer pod's fsGroup to 107 in our automation (see the sketch below).
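A minimal sketch of what the changed first consumer pod would look like with that workaround (the pod name, image, and PVC name are placeholders, not our actual automation code); the only part that matters is securityContext.fsGroup matching the 107 used by the importer/virt-launcher:

$ cat <<EOF | oc create -f -
apiVersion: v1
kind: Pod
metadata:
  name: first-consumer                # placeholder name
spec:
  securityContext:
    fsGroup: 107                      # match the importer/virt-launcher fsGroup
  containers:
  - name: consumer
    image: registry.access.redhat.com/ubi9/ubi-minimal   # placeholder image
    command: ["sleep", "3600"]
    volumeMounts:
    - name: dv-volume
      mountPath: /pvc
  volumes:
  - name: dv-volume
    persistentVolumeClaim:
      claimName: dv-source            # PVC backing the DV; placeholder name
EOF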
*** Bug 2192346 has been marked as a duplicate of this bug. ***
Changing the first consumer pod's fsGroup to 107 fixes the issue. But we still need to write a reproducer and open an issue for k8s.
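One possible shape for the upstream reproducer, independent of CDI — an untested sketch based on the sequence observed above, with all names as placeholders: create a WaitForFirstConsumer CSI PVC, then create two pods with different fsGroup values that bind to it back to back, and check which group ownership ends up on the volume path on the node.

$ cat <<EOF | oc create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: repro-pvc                     # placeholder name
spec:
  storageClassName: hostpath-csi-basic   # any WaitForFirstConsumer CSI storage class
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
EOF
# then create two pods mounting repro-pvc, one with fsGroup 107 and one with fsGroup 2000,
# right after each other, and check the group ownership of the mount path on the node
# (same journalctl / ls checks as in the earlier comments)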