Bug 1967086 - Cloning DataVolumes between namespaces fails while creating cdi-upload pod
Summary: Cloning DataVolumes between namespaces fails while creating cdi-upload pod
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Storage
Version: 2.5.5
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 2.6.6
Assignee: Alexander Wels
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard:
Depends On:
Blocks: 1982269
 
Reported: 2021-06-02 12:26 UTC by nijin ashok
Modified: 2024-12-20 20:10 UTC
CC List: 5 users

Fixed In Version: v2.6.6-37 registry-proxy.engineering.redhat.com/rh-osbs/iib:89865
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1982269
Environment:
Last Closed: 2021-08-10 17:33:37 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Github kubevirt containerized-data-importer pull 1842 0 None closed Set some reasonable requests/limits for workloads 2021-07-07 12:28:46 UTC
Github kubevirt containerized-data-importer pull 1855 0 None closed [release-v1.28] Set some reasonable requests/limits for workloads 2021-07-14 11:43:27 UTC
Red Hat Issue Tracker CNV-12262 0 None None None 2022-12-14 02:32:40 UTC
Red Hat Product Errata RHSA-2021:3119 0 None None None 2021-08-10 17:34:27 UTC

Internal Links: 2003652

Description nijin ashok 2021-06-02 12:26:13 UTC
Description of problem:

While cloning a DataVolume between namespaces, the clone is scheduled but never starts.

$ oc get dvs
NAME                   PHASE            PROGRESS   RESTARTS   AGE
dv-tests-cloning-001   CloneScheduled   N/A                   30s

The PVC status is "Bound".

$ oc get pvc
NAME                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                           AGE
dv-tests-cloning-001   Bound    pvc-326a53af-b8f7-4328-a30f-76ec1d21ee21   12Gi       RWO            ocs-external-storagecluster-ceph-rbd   33s

But there is no cdi-upload pod.

The cdi-deployment logs show the error "Pod \"cdi-upload-dv-tests-cloning-001\" is invalid: spec.containers[0].resources.requests: Invalid value: \"1m\": must be less than or equal to cpu limit".

===
{"level":"error","ts":1622451245.2904956,"logger":"controller","msg":"Reconciler error","controller":"upload-controller","name":"dv-tests-cloning-001","namespace":"tests-cloning","error":"Pod \"cdi-upload-dv-tests-cloning-001\" is invalid: spec.containers[0].resources.requests: Invalid value: \"1m\": must be less than or equal to cpu limit","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/kubevirt.io/containerized-data-importer/vendor/github.com/go-logr/zapr/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/kubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:237\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/kubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:209\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/src/kubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:188\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/src/kubevirt.io/containerized-data-importer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/src/kubevirt.io/containerized-data-importer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/kubevirt.io/containerized-data-importer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/kubevirt.io/containerized-data-importer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90"}
===
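
For context, the Kubernetes API server rejects any pod whose per-container resource requests exceed its limits, which is exactly the validation failing here. A minimal reproduction sketch (the pod name and image are placeholders, not taken from the customer environment):

$ cat <<EOF | oc create -f -
apiVersion: v1
kind: Pod
metadata:
  name: requests-vs-limits-test
spec:
  containers:
  - name: test
    image: registry.access.redhat.com/ubi8/ubi-minimal
    command: ["sleep", "infinity"]
    resources:
      limits:
        cpu: "0"
      requests:
        cpu: 1m
EOF
The Pod "requests-vs-limits-test" is invalid: spec.containers[0].resources.requests: Invalid value: "1m": must be less than or equal to cpu limit

The rejection happens at admission, before scheduling, which is why no cdi-upload pod object ever appears.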

As per my understanding, pod.Spec.Containers[0].Resources is taken from defaultPodResourceRequirements, which has the default values.

===
$ oc get cdiconfig -o yaml
apiVersion: v1
items:
- apiVersion: cdi.kubevirt.io/v1beta1
  kind: CDIConfig
  status:
    defaultPodResourceRequirements:
      limits:
        cpu: "0"
        memory: "0"
      requests:
        cpu: "0"
        memory: "0"
===
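
(Side note: CDI maintains a singleton CDIConfig, conventionally named config, so the effective defaults can also be read directly; a sketch, assuming that default object name:)

$ oc get cdiconfig config -o jsonpath='{.status.defaultPodResourceRequirements}{"\n"}'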

I cannot find a way to see the pod spec that the CDI controller sends when creating the pod, but it looks like it is sending requests greater than the limits. That doesn't make sense, however, since the CDIConfig has the default values.

There are no quotas or limits for the namespace.

The permissions are also mapped correctly.

Version-Release number of selected component (if applicable):

2.5.5

How reproducible:

Observed in a customer environment and not reproduced locally.

Steps to Reproduce:

1. The issue is observed when cloning a DataVolume between namespaces.

Actual results:

Cloning DataVolumes between namespaces fails while creating cdi-upload pod.

Expected results:

Cloning should work.

Additional info:

Comment 2 Alexander Wels 2021-06-04 17:31:48 UTC
Can you check if the target namespace has a LimitRange defined?
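
(For reference, a quick way to check; tests-cloning is the target namespace from the log above:)

$ oc get limitrange -n tests-cloning
$ oc describe limitrange -n tests-cloning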

Comment 3 nijin ashok 2021-06-07 02:53:40 UTC
(In reply to Alexander Wels from comment #2)
> Can you check if the target namespace has a LimitRange defined?

Target doesn't have LimitRange defined.

Comment 9 Alexander Wels 2021-06-28 11:43:55 UTC
So I triple-checked the code, and there is nothing we do that sets the limits or requests to anything other than what is specified in the defaultPodResourceRequirements in the CDIConfig object (which is set in the CDI CR). So there must be a mutating webhook somewhere that automatically modifies those values, and the usual suspect is a LimitRange for those fields. However, as we saw, the must-gather doesn't report anything about a LimitRange, and there is no cluster-wide LimitRange object in OpenShift.
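
(One way to hunt for such a webhook or a stray LimitRange; a generic sketch, not specific to this environment:)

$ oc get mutatingwebhookconfigurations
$ oc get limitrange --all-namespaces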

That being said, all 0s is probably not a great set of defaults. After some testing, the following values are reasonable, and we created a PR to apply them by default when nothing is specified:

CPULimit: 750m (3/4 of a CPU max)
MemLimit: 600M (600M of memory max)
CPURequest: 100m (1/10 of a CPU minimum)
MemRequest: 60M (60M of memory minimum)

These should be sufficient for most workloads. As a workaround, you can set those values in the CDI CR, and we can see if that lets them continue testing. The linked PR makes these the default values.
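
(A sketch of that workaround, assuming the CDI CR carries the default name cdi and exposes spec.config.podResourceRequirements as in the upstream CRD:)

$ oc patch cdi cdi --type merge -p '{"spec": {"config": {"podResourceRequirements": {"limits": {"cpu": "750m", "memory": "600M"}, "requests": {"cpu": "100m", "memory": "60M"}}}}}'

The CDI operator propagates these into the CDIConfig status, and newly created upload/clone pods pick them up.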

Comment 12 Yan Du 2021-07-07 12:31:23 UTC
Moving back to POST because we haven't modified the release branch yet. Please attach the cherry-pick PR to this bug.

Comment 25 errata-xmlrpc 2021-08-10 17:33:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 2.6.6 Images security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3119

