Bug 2180801

Summary: [4.12]cdi.kubevirt.io/storage.bind.immediate.requested is not propagated down to the DataVolume if set on an existing DataImportCronTemplate
Product: Container Native Virtualization (CNV) Reporter: Yan Du <yadu>
Component: StorageAssignee: Arnon Gilboa <agilboa>
Status: CLOSED ERRATA QA Contact: Yan Du <yadu>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.12.0CC: agilboa, awels, mhenriks, stirabos, yadu
Target Milestone: ---   
Target Release: 4.12.3   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: v4.12.3-1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 2166394 Environment:
Last Closed: 2023-05-23 22:31:22 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2166394    
Bug Blocks:    

Description Yan Du 2023-03-22 11:06:01 UTC
+++ This bug was initially created as a clone of Bug #2166394 +++

Description of problem:

Create a custom dataImportCronTemplate on HCO cr:

  dataImportCronTemplates:
  - metadata:
      name: my-centos-stream8-image-cron
    spec:
      garbageCollect: Outdated
      managedDataSource: my-centos-stream8
      schedule: 0 */12 * * *
      template:
        spec:
          source:
            registry:
              url: docker://quay.io/containerdisks/centos-stream:8
          storage:
            resources:
              requests:
                storage: 10Gi

without setting there:
  - metadata:
      annotations:
        cdi.kubevirt.io/storage.bind.immediate.requested: "true" 

if the storage classes of che cluster are configured with WaitForFirstConsumer,
the PVC for the golden image will never be bound waiting forever in WaitForFirstConsumer.

The workaround for this is explicitly setting:
  - metadata:
      annotations:
        cdi.kubevirt.io/storage.bind.immediate.requested: "true" 
on the dataImportCronTemplate

But this works only if done at the creation of the dataImportCronTemplate while if it's simply added later on it's not going to be propagated down do the DataVolume that is going to be stuck forever as the PVC and as the dataImportCron.

Currently the only option is explicitly deleting the DataVolume,
only in this case it will get recreated with:

 apiVersion: cdi.kubevirt.io/v1beta1
 kind: DataVolume
 metadata:
   annotations:
     cdi.kubevirt.io/storage.bind.immediate.requested: "true"
     cdi.kubevirt.io/storage.deleteAfterCompletion: "true"

and only now the PVC will be finally bound on the storage class with WaitForFirstConsumer.

Version-Release number of selected component (if applicable):
4.12.0

How reproducible:
100%

Steps to Reproduce:
1. deploy a cluster with a single storage class configured with VOLUMEBINDINGMODE=WaitForFirstConsumer
2. add a custom dataImportCronTemplate on HCO without explicitly setting the annotation cdi.kubevirt.io/storage.bind.immediate.requested=true
3. amend the dataImportCronTemplate setting cdi.kubevirt.io/storage.bind.immediate.requested=true

Actual results:
the DiC controller is not going to propagate the annotation down to the DataVolume that is going to be stuck forever in WaitForFirstConsumer

Expected results:
the DiC controller should amend or eventually recreate the DataVolume propagating there the cdi.kubevirt.io/storage.bind.immediate.requested=true annotation so that the PVC could be finally bound.

Additional info:

--- Additional comment from Arnon Gilboa on 2023-03-20 13:23:51 UTC ---

DIC controller should not care about the default storage class of the cluster, and whether it's WaitForFirstConsumer. Once it creates the DV, it won't update it later.
Moreover, even when DIC is not used at all, if you add the annotation to an existing WaitForFirstConsumer DV, the DV controller won't reconcile it.

Options to solve this issue:
(1) DIC controller can add the annotation to the DV on creation if SC is WaitForFirstConsumer
(2) DIC controller can delete and re-create the DV (if WaitForFirstConsumer + DICT updated?)
(3) DIC controller can update the DV, and DV controller should reconcile it

However, not sure it worth the effort and maybe it should just be documented.

--- Additional comment from Alexander Wels on 2023-03-20 14:34:32 UTC ---

In general DVs are immutable, but it is kind of hard to enforce for labels and annotations. The DV controller essentially ignores changes to annotations and labels on DVs but we allow them in case other controllers are interested these. So even if the DiC would propagate the annotation to the DV, the DV controller won't propagate it to the PVC. As you noted an easy work around is deleting the DV and it gets re-created with the right annotations on it. Also in general we are moving towards using populators (work is underway) and not DVs. So I don't know if it is worth investing time into solving this issue when the work around is pretty trivial.

--- Additional comment from Michael Henriksen on 2023-03-20 16:00:21 UTC ---

Why not always set cdi.kubevirt.io/storage.bind.immediate.requested: "true" for DataVolumes created for DataImportCron?

--- Additional comment from Arnon Gilboa on 2023-03-20 16:08:50 UTC ---

Michael, that was ~my suggestion in (1). So it's less clumsy to always set the annotation without checking for WFFC?

--- Additional comment from Michael Henriksen on 2023-03-20 16:33:59 UTC ---

> Michael, that was ~my suggestion in (1). So it's less clumsy to always set the annotation without checking for WFFC?

I think that's the best solution

--- Additional comment from Adam Litke on 2023-03-20 18:56:04 UTC ---

I agree with Michael that we can always request immediate binding for DataImportCrons since the created DV is not intended to be used by a traditional consumer anyway.

Comment 1 Yan Du 2023-04-19 02:47:50 UTC
Test on CNV-v4.12.3-27

The PVC can be Bound after adding custom dataImportCronTemplate on HCO cr without setting storage.bind.immediate.requested

$ oc get hco -n openshift-cnv
------------8<----------------
    dataImportCronTemplates:
    - metadata:
        name: custom-data-import-cron2
      spec:
        managedDataSource: custom-data-source2
        retentionPolicy: None
        schedule: '* * * * *'
        template:
          spec:
            source:
              registry:
                pullMethod: node
                url: docker://quay.io/containerdisks/fedora:latest
            storage:
              resources:
                requests:
                  storage: 10Gi
------------8<-----------------
$ oc get sc hostpath-csi-basic 
NAME                           PROVISIONER                        RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
hostpath-csi-basic (default)   kubevirt.io.hostpath-provisioner   Delete          WaitForFirstConsumer   false                  13h
$ oc get pvc custom-data-source2-56ccabc01cbe -n openshift-virtualization-os-images
NAME                               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS         AGE
custom-data-source2-56ccabc01cbe   Bound    pvc-ba201ab4-9a72-4fc5-b953-cdeaf1d658c1   149Gi      RWO            hostpath-csi-basic   3m16s

Comment 7 errata-xmlrpc 2023-05-23 22:31:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Virtualization 4.12.3 Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:3283