Bug 2237418

Summary: Cloned VM stuck in "WaitingForVolumeBinding" if the source VM has a hotplug disk and the SC binding mode is WaitForFirstConsumer
Product: Container Native Virtualization (CNV)
Reporter: nijin ashok <nashok>
Component: Virtualization
Assignee: Itamar Holder <iholder>
Status: CLOSED MIGRATED
QA Contact: Kedar Bidarkar <kbidarka>
Severity: high
Priority: high
Version: 4.13.3
CC: dholler
Target Release: 4.15.0
Hardware: All
OS: Linux
Last Closed: 2023-12-14 16:05:48 UTC
Type: Bug

Description nijin ashok 2023-09-05 12:13:13 UTC
Description of problem:

The StorageClass has volumeBindingMode set to WaitForFirstConsumer:

~~~
oc get sc ceph-dup
NAME       PROVISIONER                          RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
ceph-dup   openshift-storage.rbd.csi.ceph.com   Delete          WaitForFirstConsumer   true                   94m
~~~
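To find which StorageClasses are exposed to this behavior, one could filter the output of `oc get sc -o json`. The sketch below assumes the standard Kubernetes `StorageClassList` shape (`items[].metadata.name`, `items[].volumeBindingMode`). The function name `wffc_storage_classes` is hypothetical.

~~~python
def wffc_storage_classes(sc_list: dict) -> list[str]:
    """Return names of StorageClasses whose volumeBindingMode is WaitForFirstConsumer.

    sc_list is the parsed output of `oc get sc -o json` (a StorageClassList).
    An unset volumeBindingMode means Immediate, so such classes are not matched.
    """
    return [
        item["metadata"]["name"]
        for item in sc_list.get("items", [])
        if item.get("volumeBindingMode") == "WaitForFirstConsumer"
    ]
~~~

For example, if the block is saved as a hypothetical `wffc_sc.py`, it can be run as `oc get sc -o json | python3 -c 'import json, sys; from wffc_sc import wffc_storage_classes; print(wffc_storage_classes(json.load(sys.stdin)))'`.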

The source VM has a hotplug volume:

~~~
# oc get vm rhel8-xv62h6xonijl663c-clone -o yaml |yq '.spec.template.spec.volumes'
- dataVolume:
    name: rhel8-xv62h6xonijl663c-volume-clone
  name: rootdisk
- cloudInitNoCloud:
    userData: |-
      #cloud-config
      user: cloud-user
      password: ylb7-w1em-no0h
      chpasswd: { expire: False }
  name: cloudinitdisk
- dataVolume:
    hotpluggable: true                                          <===
    name: rhel8-xv62h6xonijl663c-disk-loyal-llama-volume-clone
  name: disk-loyal-llama
~~~ 
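The relevant flag is `hotpluggable: true` on a `dataVolume` volume. The sketch below lists those DataVolumes from the parsed output of `oc get vm <name> -o json`. It reads only the `spec.template.spec.volumes` path shown above and ignores other volume types. The function name is hypothetical.

~~~python
def hotpluggable_datavolumes(vm: dict) -> list[str]:
    """Return the DataVolume names that a VM's template marks as hotpluggable.

    vm is the parsed output of `oc get vm <name> -o json`. Only volumes of type
    dataVolume are considered, matching the spec shown in this report.
    """
    volumes = vm.get("spec", {}).get("template", {}).get("spec", {}).get("volumes", [])
    return [
        vol["dataVolume"]["name"]
        for vol in volumes
        # Non-dataVolume entries (e.g. cloudInitNoCloud) yield {} and are skipped.
        if vol.get("dataVolume", {}).get("hotpluggable") is True
    ]
~~~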

A VM cloned from this VM will end up in `WaitingForVolumeBinding` status:

~~~
# oc get vm rhel8-xv62h6xonijl663c-clone
NAME                           AGE   STATUS                    READY
rhel8-xv62h6xonijl663c-clone   50m   WaitingForVolumeBinding   False
~~~

The DV of the cloned hotplug volume is in the WaitForFirstConsumer phase and its PVC is Pending:

~~~
# oc get dv
NAME                                                   PHASE                  PROGRESS   RESTARTS   AGE
rhel8-xv62h6xonijl663c-disk-loyal-llama-volume-clone   WaitForFirstConsumer   N/A                   52m

# oc get pvc rhel8-xv62h6xonijl663c-disk-loyal-llama-volume-clone
NAME                                                   STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
rhel8-xv62h6xonijl663c-disk-loyal-llama-volume-clone   Pending                                      ceph-dup       52m
~~~
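This symptom can be detected by pairing each DataVolume still in the `WaitForFirstConsumer` phase with its PVC. The sketch below works on the parsed output of `oc get dv -o json` and `oc get pvc -o json` from one namespace. It assumes, as in the listing above, that each PVC has the same name as its DataVolume. The function name is hypothetical.

~~~python
def stuck_datavolumes(dv_list: dict, pvc_list: dict) -> list[str]:
    """Return DataVolumes in phase WaitForFirstConsumer whose same-named PVC is Pending.

    dv_list / pvc_list are parsed `oc get dv -o json` / `oc get pvc -o json` output
    from a single namespace.
    """
    # Map PVC name -> PVC phase (Pending, Bound or Lost).
    pvc_phase = {
        pvc["metadata"]["name"]: pvc.get("status", {}).get("phase")
        for pvc in pvc_list.get("items", [])
    }
    return [
        dv["metadata"]["name"]
        for dv in dv_list.get("items", [])
        if dv.get("status", {}).get("phase") == "WaitForFirstConsumer"
        and pvc_phase.get(dv["metadata"]["name"]) == "Pending"
    ]
~~~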

The temporary virt-launcher pod that is created to trigger binding of WaitForFirstConsumer DVs does not bind the hotplug volume, because hotplug volumes are not part of the virt-launcher pod. The pod runs to completion and leaves the hotplug volume's PVC Pending:

~~~
# oc get pod virt-launcher-rhel8-xv62h6xonijl663c-clone-n78wb
NAME                                               READY   STATUS      RESTARTS   AGE
virt-launcher-rhel8-xv62h6xonijl663c-clone-n78wb   0/1     Completed   0          64m
~~~
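The reasoning above can be modeled as a set difference. A WaitForFirstConsumer PVC binds only once some pod mounts it. The temporary pod mounts only the VM's non-hotpluggable volumes, so a hotpluggable DataVolume on a WaitForFirstConsumer StorageClass is never consumed. This is a simplified model of the behavior described in this report, not KubeVirt's actual code. The function name and its inputs are hypothetical, in particular the `dv_storage_class` mapping (DataVolume name to StorageClass name, e.g. taken from each PVC's `spec.storageClassName`).

~~~python
def never_consumed_dvs(
    volumes: list[dict],
    dv_storage_class: dict[str, str],
    wffc_classes: set[str],
) -> list[str]:
    """Model which WaitForFirstConsumer DataVolumes the temporary pod never mounts.

    volumes: the VM's spec.template.spec.volumes entries.
    dv_storage_class: DataVolume name -> StorageClass name (hypothetical input).
    wffc_classes: names of StorageClasses using WaitForFirstConsumer.
    """
    # Claims the temporary pod would mount: non-hotpluggable dataVolume volumes only.
    mounted = {
        v["dataVolume"]["name"]
        for v in volumes
        if "dataVolume" in v and not v["dataVolume"].get("hotpluggable", False)
    }
    all_dvs = [v["dataVolume"]["name"] for v in volumes if "dataVolume" in v]
    # A WFFC DataVolume that no pod mounts has no consumer, so its PVC stays Pending.
    return [
        name
        for name in all_dvs
        if name not in mounted and dv_storage_class.get(name) in wffc_classes
    ]
~~~

In this model the root disk binds because the temporary pod mounts it, while the hotplug DV is left out. This matches the Pending PVC and the `WaitingForVolumeBinding` status reported above.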


Version-Release number of selected component (if applicable):

OpenShift Virtualization   4.13.3

How reproducible:

100 %

Steps to Reproduce:

1. Create a VM with a disk on a StorageClass whose volumeBindingMode is WaitForFirstConsumer.
2. Hotplug a disk, backed by the same StorageClass, to the VM.
3. Shut down and clone the VM.
4. Start the cloned VM. It does not start: the VM is stuck in WaitingForVolumeBinding, and the DV of the hotplug volume stays in WaitForFirstConsumer.

Actual results:

Cloned VM stuck in "WaitingForVolumeBinding" if the source VM has a hotplug disk and the SC binding mode is WaitForFirstConsumer.

Expected results:

The cloned VM starts, and the DV and PVC of its cloned hotplug volume are bound.

Additional info: