Bug 1802120 - [CNV-2.3] CDI operator is failing to complete deployment
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Storage
Version: 2.3.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 2.3.0
Assignee: Simone Tiraboschi
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-02-12 11:52 UTC by Lukas Bednar
Modified: 2020-05-04 19:10 UTC

Fixed In Version: hco-bundle-registry-container-v2.2.0-321
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-04 19:10:37 UTC
Target Upstream Version:
Embargoed:


Attachments
full cdi-operator log (95.72 KB, text/plain)
2020-02-12 11:52 UTC, Lukas Bednar


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHEA-2020:2011  0        None      None    None     2020-05-04 19:10:49 UTC

Description Lukas Bednar 2020-02-12 11:52:45 UTC
Created attachment 1662654
full cdi-operator log

Description of problem:

I see the following error in the CDI operator log:

{"level":"error","ts":1581507990.0780861,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"cdi-operator-controller","request":"/cdi-kubevirt-hyperconverged","error":"ClusterRole.rbac.authorization.k8s.io \"cdi-apiserver\" is invalid: [metadata.labels: Invalid value: \"sha256:2ca28952fb8da452da9fefc2db9a22bd813c9831ac5e93cd6f5489bbc795b32f\": must be no more than 63 characters, metadata.labels: Invalid value: \"sha256:2ca28952fb8da452da9fefc2db9a22bd813c9831ac5e93cd6f5489bbc795b32f\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')]","stacktrace":"kubevirt.io/containerized-data-importer/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/kubevirt.io/containerized-data-importer/vendor/github.com/go-logr/zapr/zapr.go:128\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/kubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:258\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/kubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:232\nkubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/src/kubevirt.io/containerized-data-importer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:211\nkubevirt.io/containerized-data-importer/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/kubevirt.io/containerized-data-importer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152\nkubevirt.io/containerized-data-importer/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/kubevirt.io/containerized-data-importer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153\nkubevirt.io/containerized-data-importer/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/kubevirt.io/containerized-data-importer/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
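
The two validation failures reported in the error above can be reproduced offline. The following is a minimal Python sketch, not the Kubernetes implementation: it applies the 63-character limit and the regex quoted in the error message to a candidate label value. The function name `label_value_errors` and the wording of the returned messages are hypothetical.

```python
import re
from typing import List

# Both rules are taken from the error message: a 63-character limit and the quoted regex.
MAX_LABEL_VALUE_LEN = 63
LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")


def label_value_errors(value: str) -> List[str]:
    """Return the reasons a string would be rejected as a label value (empty list if valid)."""
    errors = []
    if len(value) > MAX_LABEL_VALUE_LEN:
        errors.append("must be no more than 63 characters")
    # fullmatch is used because the whole value must satisfy the pattern.
    if LABEL_VALUE_RE.fullmatch(value) is None:
        errors.append("must be empty or alphanumeric, with only '-', '_' or '.' inside")
    return errors


if __name__ == "__main__":
    digest = "sha256:2ca28952fb8da452da9fefc2db9a22bd813c9831ac5e93cd6f5489bbc795b32f"
    print(label_value_errors(digest))    # both rules fail: 71 characters and a ':'
    print(label_value_errors("v2.3.0"))  # -> []
```

The digest fails twice for independent reasons: `sha256:` plus 64 hex characters is 71 characters long, and `:` is not in the allowed character set. A version string such as `v2.3.0` passes both checks.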


Version-Release number of selected component (if applicable):
virt-cdi-operator-container-v2.3.0-31


How reproducible: 100%


Steps to Reproduce:
1. Deploy cnv-2.3 from rh-osbs-operators

Actual results: CDI fails to deploy


Expected results: CDI is ready


Additional info:

Comment 1 Alexander Wels 2020-02-12 13:49:24 UTC
I did some digging, and it turns out we are passing the sha of the container image as part of the operator deployment. Here is the example from upstream:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cdi-operator
  namespace: cdi
spec:
  replicas: 1
  selector:
    matchLabels:
      name: cdi-operator
      operator.cdi.kubevirt.io: ""
  strategy: {}
  template:
    metadata:
      labels:
        name: cdi-operator
        operator.cdi.kubevirt.io: ""
    spec:
      containers:
      - env:
        - name: DEPLOY_CLUSTER_RESOURCES
          value: "true"
        - name: OPERATOR_VERSION
          value: v1.12.0  # <-- This is the one that contains the sha, and it shouldn't. I see how that happened, as the rest of the values are container images and the tags are the same. However, for CNV it should be v2.3.x.
        - name: CONTROLLER_IMAGE
          value: kubevirt/cdi-controller:v1.12.0
        - name: IMPORTER_IMAGE
          value: kubevirt/cdi-importer:v1.12.0
        - name: CLONER_IMAGE
          value: kubevirt/cdi-cloner:v1.12.0
        - name: APISERVER_IMAGE
          value: kubevirt/cdi-apiserver:v1.12.0
        - name: UPLOAD_SERVER_IMAGE
          value: kubevirt/cdi-uploadserver:v1.12.0
        - name: UPLOAD_PROXY_IMAGE
          value: kubevirt/cdi-uploadproxy:v1.12.0
        - name: VERBOSITY
          value: "1"
        - name: PULL_POLICY
          value: IfNotPresent
        image: kubevirt/cdi-operator:v1.12.0
        imagePullPolicy: IfNotPresent
        name: cdi-operator
        ports:
        - containerPort: 60000
          name: metrics
          protocol: TCP
        resources: {}
      securityContext:
        runAsNonRoot: true
      serviceAccountName: cdi-operator

Comment 3 Simone Tiraboschi 2020-02-12 14:33:41 UTC
(In reply to Alexander Wels from comment #1)
> I did some digging, and it turns out we are passing the sha of the container
> image as part of the operator deployment. Here is the example from upstream:

Yes, correct: we are supposed to use only sha digests and not plain tags.
This has been required by the mirroring mechanism for disconnected installs since OCP 4.3.1.

A very similar issue has already been fixed on the CNAO side, see: https://github.com/kubevirt/cluster-network-addons-operator/pull/288

Comment 5 Alexander Wels 2020-02-12 14:38:33 UTC
So this particular variable tells us which version the operator is. It needs to be semver compatible and simply cannot contain shas. It has NOTHING to do with container images; it's the version of the operator. We don't release a CNV version called sha256:something, do we? It should be the version of CNV.
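
As a rough illustration of the distinction made in this comment, the sketch below checks whether a string looks like a version rather than an image digest. The pattern (an optional `v` prefix followed by MAJOR.MINOR.PATCH and an optional pre-release suffix) and the name `is_semver_like` are assumptions for illustration; the parsing that CDI actually applies to OPERATOR_VERSION is not shown in this bug.

```python
import re

# Assumed pattern: optional "v", then MAJOR.MINOR.PATCH, then an optional "-prerelease" part.
VERSION_RE = re.compile(
    r"v?(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(-[0-9A-Za-z.-]+)?"
)


def is_semver_like(version: str) -> bool:
    """Return True if the whole string looks like a (possibly v-prefixed) semantic version."""
    return VERSION_RE.fullmatch(version) is not None


if __name__ == "__main__":
    for candidate in ("v2.3.0", "v1.12.0", "sha256:2ca28952fb8d"):
        print(candidate, is_semver_like(candidate))
```

With this pattern, `v2.3.0` and the upstream `v1.12.0` are accepted, while anything starting with `sha256:` is rejected.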

Comment 6 Simone Tiraboschi 2020-02-12 15:00:06 UTC
OK, in that case I can easily fix it by passing the CNV version to the CSV generator.

Comment 7 Adam Litke 2020-02-12 15:52:14 UTC
Simone, can you please link the PR that fixes this into the bug?

Comment 10 Lukas Bednar 2020-02-13 11:07:56 UTC
I can confirm that CDI was deployed and that the CDI operator logs don't contain any errors.

cdi-apiserver-6cbb564b9c-tt2nx                        1/1     Running             0          73s
cdi-deployment-6d78f4db86-cx54m                       1/1     Running             0          72s
cdi-operator-68466ff7f8-js8b5                         1/1     Running             0          2m8s
cdi-uploadproxy-675d59b8b-kw8gm                       1/1     Running             0          73s


    - name: DEPLOY_CLUSTER_RESOURCES
      value: "true"
    - name: OPERATOR_VERSION
      value: v2.3.0
    - name: CONTROLLER_IMAGE
      value: registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-virt-cdi-controller@sha256:7b37b5b3b664839f448b300f1293b3d1d5a1d8d8aa1c23d129917dbc40985e2b
    - name: IMPORTER_IMAGE
      value: registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-virt-cdi-importer@sha256:f7bd104d73331a4338075311030083b81a6fa9f697cf15280f35bc50c8a73d11
    - name: CLONER_IMAGE
      value: registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-virt-cdi-cloner@sha256:f28a7cf083a7f81bb6f538cb2415b5369f5280f5b12ea30c8764f9a8bd05ba67
    - name: APISERVER_IMAGE
      value: registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-virt-cdi-apiserver@sha256:efc69f852668ac9535718b68f59a9a89955834c973699cc61fe364fb4e7e7565
    - name: UPLOAD_SERVER_IMAGE
      value: registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-virt-cdi-uploadserver@sha256:ab094d96049cd3cbe42644f8022c543d9ac7aa584f82ba23587045d661a02f25
    - name: UPLOAD_PROXY_IMAGE
      value: registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-virt-cdi-uploadproxy@sha256:ddb4db6080442d0b87d7ed537f9b804e82f815ff59b33626709354dcfcabbfc0
    - name: VERBOSITY
      value: "1"
    - name: PULL_POLICY
      value: IfNotPresent
    image: registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-virt-cdi-operator@sha256:2ca28952fb8da452da9fefc2db9a22bd813c9831ac5e93cd6f5489bbc795b32f
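
A check along the following lines could catch a regression of this bug in a generated deployment. It is only a sketch: the helper `check_operator_env` is hypothetical, it takes the env entries as a plain name-to-value mapping, and it assumes (per comment 3) that every `*_IMAGE` value should be pinned by digest while OPERATOR_VERSION should not contain one.

```python
from typing import Dict, List


def check_operator_env(env: Dict[str, str]) -> List[str]:
    """Return a list of problems found in the operator's env entries (empty if none)."""
    problems = []
    for name, value in env.items():
        # Image references are expected to use the name@sha256:<digest> form, not a tag.
        if name.endswith("_IMAGE") and "@sha256:" not in value:
            problems.append(f"{name}: image is not pinned by digest")
    # The operator version must be a plain version string, never a digest.
    if "sha256:" in env.get("OPERATOR_VERSION", ""):
        problems.append("OPERATOR_VERSION: contains an image digest")
    return problems


if __name__ == "__main__":
    env = {
        "OPERATOR_VERSION": "v2.3.0",
        "CONTROLLER_IMAGE": "example.test/cdi-controller@sha256:" + "0" * 64,  # placeholder digest
        "VERBOSITY": "1",
    }
    print(check_operator_env(env))  # -> []
```

The env entries listed above would pass this check, whereas the originally failing deployment would be flagged for OPERATOR_VERSION.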

Comment 11 Alexander Wels 2020-03-03 13:50:17 UTC
So the problem seems to manifest itself when CNV is deployed in the openshift-operators namespace. The fact that this is possible at all is a separate issue that is being fixed elsewhere, but to verify that this bug is fixed we should test that deployment still works even when we deploy in openshift-operators.

Comment 14 errata-xmlrpc 2020-05-04 19:10:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:2011

