Bug 2172853 - Fail to import a datavolume with hostpath-csi-pvc-block storage class
Summary: Fail to import a datavolume with hostpath-csi-pvc-block storage class
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Storage
Version: 4.12.0
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: ---
: 4.13.1
Assignee: Alex Kalenyuk
QA Contact: Jenia Peimer
URL:
Whiteboard:
: 2192346 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2023-02-23 10:13 UTC by dalia
Modified: 2023-05-30 14:22 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-05-30 14:22:01 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker CNV-26072 0 None None None 2023-02-23 10:14:29 UTC

Description dalia 2023-02-23 10:13:26 UTC
Description of problem:
When try to clone a DV with hostpath-csi-pvc-block SC sometimes the DV fail to be cloned.

Version-Release number of selected component (if applicable):
4.12.1

How reproducible:
flaky

Steps to Reproduce:
1. Import a DV with hostpath-csi-pvc-block SC

Actual results:
DV fail to be created

Expected results:
DV created successfully 

Additional info:
$ oc logs importer-dv-source
I0222 12:06:44.028076       1 importer.go:104] Starting importer
I0222 12:06:44.028126       1 importer.go:171] begin import process
I0222 12:06:44.040358       1 data-processor.go:379] Calculating available size
I0222 12:06:44.040377       1 data-processor.go:391] Checking out file system volume size.
I0222 12:06:44.040388       1 data-processor.go:399] Request image size not empty.
I0222 12:06:44.040395       1 data-processor.go:404] Target size 1136234735.
I0222 12:06:44.040785       1 nbdkit.go:303] Waiting for nbdkit PID.
I0222 12:06:44.541810       1 nbdkit.go:324] nbdkit ready.
I0222 12:06:44.541830       1 data-processor.go:282] New phase: Convert
I0222 12:06:44.541871       1 data-processor.go:288] Validating image
I0222 12:06:44.933526       1 qemu.go:259] 0.00
E0222 12:06:44.942566       1 prlimit.go:174] qemu-img failed output is:
E0222 12:06:44.942581       1 prlimit.go:175]     (0.00/100%)


E0222 12:06:44.942587       1 prlimit.go:176] qemu-img: /data/disk.img: error while converting raw: Could not create '/data/disk.img': Permission denied

E0222 12:06:44.942649       1 data-processor.go:278] exit status 1
qemu-img execution failed
kubevirt.io/containerized-data-importer/pkg/system.executeWithLimits
	/remote-source/app/pkg/system/prlimit.go:178
kubevirt.io/containerized-data-importer/pkg/system.ExecWithLimits
	/remote-source/app/pkg/system/prlimit.go:111
kubevirt.io/containerized-data-importer/pkg/image.convertToRaw
	/remote-source/app/pkg/image/qemu.go:128
kubevirt.io/containerized-data-importer/pkg/image.(*qemuOperations).ConvertToRawStream
	/remote-source/app/pkg/image/qemu.go:146
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).convert
	/remote-source/app/pkg/importer/data-processor.go:303
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func6
	/remote-source/app/pkg/importer/data-processor.go:240
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
	/remote-source/app/pkg/importer/data-processor.go:275
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
	/remote-source/app/pkg/importer/data-processor.go:184
main.handleImport
	/remote-source/app/cmd/cdi-importer/importer.go:177
main.main
	/remote-source/app/cmd/cdi-importer/importer.go:143
runtime.main
	/usr/lib/golang/src/runtime/proc.go:250
runtime.goexit
	/usr/lib/golang/src/runtime/asm_amd64.s:1594
could not convert image to raw 
kubevirt.io/containerized-data-importer/pkg/image.convertToRaw
	/remote-source/app/pkg/image/qemu.go:136
kubevirt.io/containerized-data-importer/pkg/image.(*qemuOperations).ConvertToRawStream
	/remote-source/app/pkg/image/qemu.go:146
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).convert
	/remote-source/app/pkg/importer/data-processor.go:303
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func6
	/remote-source/app/pkg/importer/data-processor.go:240
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
	/remote-source/app/pkg/importer/data-processor.go:275
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
	/remote-source/app/pkg/importer/data-processor.go:184
main.handleImport
	/remote-source/app/cmd/cdi-importer/importer.go:177
main.main
	/remote-source/app/cmd/cdi-importer/importer.go:143
runtime.main
	/usr/lib/golang/src/runtime/proc.go:250
runtime.goexit
	/usr/lib/golang/src/runtime/asm_amd64.s:1594
Conversion to Raw failed
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).convert
	/remote-source/app/pkg/importer/data-processor.go:305
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func6
	/remote-source/app/pkg/importer/data-processor.go:240
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
	/remote-source/app/pkg/importer/data-processor.go:275
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
	/remote-source/app/pkg/importer/data-processor.go:184
main.handleImport
	/remote-source/app/cmd/cdi-importer/importer.go:177
main.main
	/remote-source/app/cmd/cdi-importer/importer.go:143
runtime.main
	/usr/lib/golang/src/runtime/proc.go:250
runtime.goexit
	/usr/lib/golang/src/runtime/asm_amd64.s:1594
Unable to convert source data to target format
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func6
	/remote-source/app/pkg/importer/data-processor.go:242
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
	/remote-source/app/pkg/importer/data-processor.go:275
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
	/remote-source/app/pkg/importer/data-processor.go:184
main.handleImport
	/remote-source/app/cmd/cdi-importer/importer.go:177
main.main
	/remote-source/app/cmd/cdi-importer/importer.go:143
runtime.main
	/usr/lib/golang/src/runtime/proc.go:250
runtime.goexit
	/usr/lib/golang/src/runtime/asm_amd64.s:1594
E0222 12:06:44.942750       1 importer.go:180] exit status 1
qemu-img execution failed
kubevirt.io/containerized-data-importer/pkg/system.executeWithLimits
	/remote-source/app/pkg/system/prlimit.go:178
kubevirt.io/containerized-data-importer/pkg/system.ExecWithLimits
	/remote-source/app/pkg/system/prlimit.go:111
kubevirt.io/containerized-data-importer/pkg/image.convertToRaw
	/remote-source/app/pkg/image/qemu.go:128
kubevirt.io/containerized-data-importer/pkg/image.(*qemuOperations).ConvertToRawStream
	/remote-source/app/pkg/image/qemu.go:146
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).convert
	/remote-source/app/pkg/importer/data-processor.go:303
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func6
	/remote-source/app/pkg/importer/data-processor.go:240
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
	/remote-source/app/pkg/importer/data-processor.go:275
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
	/remote-source/app/pkg/importer/data-processor.go:184
main.handleImport
	/remote-source/app/cmd/cdi-importer/importer.go:177
main.main
	/remote-source/app/cmd/cdi-importer/importer.go:143
runtime.main
	/usr/lib/golang/src/runtime/proc.go:250
runtime.goexit
	/usr/lib/golang/src/runtime/asm_amd64.s:1594
could not convert image to raw 
kubevirt.io/containerized-data-importer/pkg/image.convertToRaw
	/remote-source/app/pkg/image/qemu.go:136
kubevirt.io/containerized-data-importer/pkg/image.(*qemuOperations).ConvertToRawStream
	/remote-source/app/pkg/image/qemu.go:146
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).convert
	/remote-source/app/pkg/importer/data-processor.go:303
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func6
	/remote-source/app/pkg/importer/data-processor.go:240
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
	/remote-source/app/pkg/importer/data-processor.go:275
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
	/remote-source/app/pkg/importer/data-processor.go:184
main.handleImport
	/remote-source/app/cmd/cdi-importer/importer.go:177
main.main
	/remote-source/app/cmd/cdi-importer/importer.go:143
runtime.main
	/usr/lib/golang/src/runtime/proc.go:250
runtime.goexit
	/usr/lib/golang/src/runtime/asm_amd64.s:1594
Conversion to Raw failed
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).convert
	/remote-source/app/pkg/importer/data-processor.go:305
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func6
	/remote-source/app/pkg/importer/data-processor.go:240
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
	/remote-source/app/pkg/importer/data-processor.go:275
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
	/remote-source/app/pkg/importer/data-processor.go:184
main.handleImport
	/remote-source/app/cmd/cdi-importer/importer.go:177
main.main
	/remote-source/app/cmd/cdi-importer/importer.go:143
runtime.main
	/usr/lib/golang/src/runtime/proc.go:250
runtime.goexit
	/usr/lib/golang/src/runtime/asm_amd64.s:1594
Unable to convert source data to target format
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).initDefaultPhases.func6
	/remote-source/app/pkg/importer/data-processor.go:242
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessDataWithPause
	/remote-source/app/pkg/importer/data-processor.go:275
kubevirt.io/containerized-data-importer/pkg/importer.(*DataProcessor).ProcessData
	/remote-source/app/pkg/importer/data-processor.go:184
main.handleImport
	/remote-source/app/cmd/cdi-importer/importer.go:177
main.main
	/remote-source/app/cmd/cdi-importer/importer.go:143
runtime.main
	/usr/lib/golang/src/runtime/proc.go:250
runtime.goexit
	/usr/lib/golang/src/runtime/asm_amd64.s:1594


$oc get dv -oyaml
      message: 'Unable to process data: Unable to convert source data to target format:
        Conversion to Raw failed: could not convert image to raw : qemu-img execution
        failed: exit

Comment 1 Alex Kalenyuk 2023-02-23 11:43:38 UTC
Looking at the cluster where it initially occurred, it seems that fsGroup chmod didn't happen for this particular PVC on the node path
so that would explain the permission error.

In the node, there are a bunch of errors about the ceph backing block pvc:
Error: UnmapDevice.TearDownDevice failed for volume "pvc-1c059266-431a-423e-820f-94efd0116410" (UniqueName: "kubernetes.io/csi/openshift-storage.rbd.csi.ceph.com^0001-0011-openshift-storage-0000000000000001-6fb3d41e-5da1-489c-a05f-170ee3ad0be1") on node "cnv-qe-infra-18.cnvqe2.lab.eng.rdu2.redhat.com" : kubernetes.io/csi: blockMapper.unstageVolumeForBlock failed: rpc error: code = Internal desc = rbd: unmap for spec (ocs-storagecluster-cephblockpool/csi-vol-6fb3d41e-5da1-489c-a05f-170ee3ad0be1) failed (an error (exit status 16) occurred while running rbd args: [unmap ocs-storagecluster-cephblockpool/csi-vol-6fb3d41e-5da1-489c-a05f-170ee3ad0be1 --device-type krbd --options noudev]): (rbd: sysfs write failed
Feb 22 13:19:32 cnv-qe-infra-18 kubenswrapper[7753]: rbd: unmap failed: (16) Device or resource busy
Feb 22 13:19:32 cnv-qe-infra-18 kubenswrapper[7753]: )

So this leads me to think we have some rbd issue here and not an HPP one
some digging yielded https://lore.kernel.org/all/20220923035826.GA1830185@onthe.net.au/T/

Comment 3 Jenia Peimer 2023-05-10 07:41:41 UTC
I'm also seeing this on bm04 in 4.13 with hostpath-csi-basic, which is local storage, NOT backed by Ceph PVC.

Comment 5 Jenia Peimer 2023-05-15 14:09:40 UTC
We see this failure on hostpath-csi-basic on PSI cluster as well (local storage not backed by hpp pool)
Continue to work with Alex on tackling the issue

Comment 6 Jenia Peimer 2023-05-17 08:45:00 UTC
So the flow was:
1. Create a DV on HPP storage, which is WaitForFirstConsumer
2. Create a first consumer pod with fsGroup 2000
3. The importer pod gets created right away by CDI, its fsGroup is 107

akalenyu increased a kubelet log verbosity and found in the logs that the 2000 fsGroup gets applied after 107, and that's why the importer pod doesn't have the right permissions:

   oc debug node/<node_name>

   sh-5.1# journalctl -u kubelet --no-pager | grep 'Perform recursive ownership change for directory" path="/var/lib/kubelet/pods/6e865590-e8d2-4c6d-8322-53b984ecc6af/volumes/kubernetes.io~csi/pvc-32afa320-9a92-4b1b-b409-724b20067dd4/mou' -A300 -B 300

   May 15 16:37:12 c01-jp413-h-4vqdp-worker-0-fnw9h kubenswrapper[3809]: I0515 16:37:12.468991    3809 csi_mounter.go:344] kubernetes.io/csi: mounter.SetupAt fsGroup [107] applied successfully to pvc-32afa320-9a92-4b1b-b409-724b20067dd4
   ...
   May 15 16:37:12 c01-jp413-h-4vqdp-worker-0-fnw9h kubenswrapper[3809]: I0515 16:37:12.563296    3809 csi_mounter.go:344] kubernetes.io/csi: mounter.SetupAt fsGroup [2000] applied successfully to pvc-32afa320-9a92-4b1b-b409-724b20067dd4

It is worth opening a k8s issue.

From our side - considering that the virt-launcher pod has fsGroup 107, and it is most likely to be the first consumer for a DV, we will change the first consumer pod's fsGroup to 107 in our automation.

Comment 8 Jenia Peimer 2023-05-18 11:00:17 UTC
*** Bug 2192346 has been marked as a duplicate of this bug. ***

Comment 9 Jenia Peimer 2023-05-30 14:22:01 UTC
Changing the first consumer pod's fsGroup to 107 fixes the issue.
But we still need to write a reproducer and open an issue for k8s.


Note You need to log in before you can comment on or make changes to this bug.