Bug 1999571

Summary: NFS clone not progressing when clone sizes mismatch (target > source)
Product: Container Native Virtualization (CNV) Reporter: Alex Kalenyuk <akalenyu>
Component: Storage Assignee: Michael Henriksen <mhenriks>
Status: CLOSED ERRATA QA Contact: Jenia Peimer <jpeimer>
Severity: high Docs Contact:
Priority: high    
Version: 4.9.0 CC: alitke, cnv-qe-bugs, mhenriks, pelauter, yadu, ycui
Target Milestone: --- Keywords: Regression
Target Release: 4.9.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: v4.9.0-214 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-11-02 16:00:55 UTC Type: Bug

Description Alex Kalenyuk 2021-08-31 11:16:56 UTC
Description of problem:
When cloning an NFS DV with a target size that is larger than the source, the clone does not progress.

Version-Release number of selected component (if applicable):
CNV 4.9.0

How reproducible:
100%

Steps to Reproduce:
1. Clone nfs DV (manifests below)

Actual results:
NAMESPACE                            NAME        PHASE       PROGRESS   RESTARTS   AGE
default                              dv-target                                     3s

"level":"error","ts":1630407986.6950176,"logger":"controller-runtime.manager.controller.datavolume-controller","msg":"Reconciler error","name":"dv-target","namespace":"default","error":"source/target sizes not compatible"

Expected results:
DV Succeeded

Additional info:
- In 4.8 this operation succeeds.
- Are we OK with looping over https://github.com/kubevirt/containerized-data-importer/blob/main/pkg/controller/datavolume-controller.go#L1558-L1560 indefinitely (it is part of the advancedClonePossible check)? Shouldn't we just fail here and fall back to host-assisted cloning?
- The only indication of the error is in the cdi-deployment logs.
- When reproducing, keep in mind that the NFS pvc.Status.Capacity can be larger than the size requested in the DV.
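
The fallback suggested in the second point can be sketched roughly as follows. This is only an illustration of the proposed behavior, not CDI's actual code: the function names, the strategy labels, and the rule that equal sizes allow the advanced clone path are all assumptions made for the example, and the quantity parser handles binary suffixes only.

```python
# Hypothetical sketch: none of these names come from CDI itself.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_quantity(quantity: str) -> int:
    """Convert a quantity such as '25Gi' to bytes (binary suffixes only)."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: treat as a plain byte count


def choose_clone_strategy(source_capacity: str, target_request: str) -> str:
    """Pick a clone strategy once, instead of retrying on a size mismatch."""
    source_bytes = parse_quantity(source_capacity)
    target_bytes = parse_quantity(target_request)
    if target_bytes < source_bytes:
        # A smaller target cannot hold the source data at all.
        raise ValueError("target request is smaller than source capacity")
    if target_bytes == source_bytes:
        # Assumption: matching sizes are what the advanced path needs.
        return "advanced-clone"
    # Sizes differ (target > source): fall back rather than loop.
    return "host-assisted"
```

With the manifests in this report, `choose_clone_strategy("25Gi", "26Gi")` takes the fallback branch. Note that the first argument should be the source's pvc.Status.Capacity, which on NFS may be larger than the 25Gi request.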

[cnv-qe-jenkins@alex490-143-szbxl-executor clone-nfs-loop]$ cat dv_source.yaml 
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: dv-source
  namespace: openshift-virtualization-os-images
spec:
  source:
    http:
      url: "http://.../rhel-84.qcow2"
  pvc:
    storageClassName: nfs
    namespace: default
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 25Gi
[cnv-qe-jenkins@alex490-143-szbxl-executor clone-nfs-loop]$ cat dv_target.yaml 
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata: 
  name: dv-target
  namespace: default
spec: 
  pvc:
    storageClassName: nfs
    accessModes: 
      - ReadWriteOnce
    resources: 
      requests: 
        storage: 26Gi
  source: 
    pvc: 
      name: dv-source
      namespace: openshift-virtualization-os-images

Comment 1 Adam Litke 2021-09-14 16:28:28 UTC
I am going to propose blocker+ for this bug.  Michael, do we have a PR or plan to fix this?  Could you attach?

Comment 2 Yan Du 2021-09-27 10:25:32 UTC
Tested on the latest CNV build (CNV-v4.9.0-220); the issue has been fixed.

$ oc get dv
NAME        PHASE             PROGRESS   RESTARTS   AGE
dv-target   CloneInProgress   0.00%                 25s
$ oc get dv
NAME        PHASE       PROGRESS   RESTARTS   AGE
dv-target   Succeeded   100.0%                6m
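
For anyone scripting this check, a minimal sketch of reading the PHASE column from output like the above is shown here. It assumes the table is space-padded so that each value starts at the same offset as its header, which is what lets an empty PHASE (as in the original failure) be told apart from a shifted column. The function names are made up for the example and are not part of oc or CDI.

```python
import re


def parse_dv_table(text: str) -> list:
    """Parse space-aligned `oc get dv` style output into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    # Record each header name with the offset where its column starts.
    columns = [(m.group(), m.start()) for m in re.finditer(r"\S+", lines[0])]
    rows = []
    for line in lines[1:]:
        row = {}
        for i, (name, start) in enumerate(columns):
            # A column ends where the next one starts; the last runs to EOL.
            end = columns[i + 1][1] if i + 1 < len(columns) else len(line)
            row[name] = line[start:end].strip()
        rows.append(row)
    return rows


def dv_phase(rows: list, dv_name: str) -> str:
    """Return the PHASE of the named DV, or '' if it is empty or missing."""
    for row in rows:
        if row.get("NAME") == dv_name:
            return row.get("PHASE", "")
    return ""
```

An empty string from `dv_phase` corresponds to the stuck state in the description, while "Succeeded" corresponds to the fixed behavior.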

Comment 5 errata-xmlrpc 2021-11-02 16:00:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.9.0 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4104