Description of problem:
This is happening on an OCP 4.7.0-0.nightly-2020-12-04-013308 IPI-installed cluster on AWS. When you deploy the Node Feature Discovery (NFD) operator from OperatorHub in a custom namespace, the operator pod goes into CrashLoopBackOff and the operator fails to deploy. Our automated install (Flexy) creates a new catalogsource, qe-app-registry, for the latest NFD images. The operator image it was trying to deploy was 4.7.0-202012082225.p0.

# oc get pods -n test-nfd
NAME                          READY   STATUS             RESTARTS   AGE
nfd-operator-fbc5d5dc-hdvrg   0/1     CrashLoopBackOff   20         82m

# oc describe -n test-nfd pod/nfd-operator-fbc5d5dc-hdvrg
Name:         nfd-operator-fbc5d5dc-hdvrg
Namespace:    test-nfd
Priority:     0
Node:         ip-10-0-186-113.us-east-2.compute.internal/10.0.186.113
Start Time:   Wed, 09 Dec 2020 13:50:15 +0000
Labels:       name=nfd-operator
              pod-template-hash=fbc5d5dc
Annotations:  alm-examples: [ { "apiVersion": "nfd.openshift.io/v1alpha1", "kind": "NodeFeatureDiscovery", "metadata": { "name": "nfd-master-server" }, "spec": { "namespace": "openshift-nfd" } } ]
              capabilities: Basic Install
              categories: Database
              certified: false
              containerImage:
              createdAt: 2019-05-30T00:00:00Z
              description: This software enables node feature discovery for Kubernetes. It detects hardware features available on each node in a Kubernetes cluster, ...
              k8s.v1.cni.cncf.io/network-status: [{ "name": "", "interface": "eth0", "ips": [ "10.129.0.78" ], "default": true, "dns": {} }]
              k8s.v1.cni.cncf.io/networks-status: [{ "name": "", "interface": "eth0", "ips": [ "10.129.0.78" ], "default": true, "dns": {} }]
              olm.operatorGroup: test-nfd-7l7ls
              olm.operatorNamespace: test-nfd
              olm.skipRange: >=4.2.0 <4.8.0
              olm.targetNamespaces: test-nfd
              openshift.io/scc: anyuid
              operatorframework.io/properties: {"properties":[{"type":"olm.gvk","value":{"group":"nfd.openshift.io","kind":"NodeFeatureDiscovery","version":"v1alpha1"}},{"type":"olm.pac...
              provider: Red Hat
              repository: https://github.com/openshift/cluster-nfd-operator
              support: Red Hat
Status:       Running
IP:           10.129.0.78
IPs:
  IP:           10.129.0.78
Controlled By:  ReplicaSet/nfd-operator-fbc5d5dc
Containers:
  nfd-operator:
    Container ID:  cri-o://a944b5addb6ebd2b3b72bc4aebb7fc747db1d4c2a9c8d961755e02be116fa7d9
    Image:         registry.redhat.io/openshift4/ose-cluster-nfd-operator@sha256:9561a4097697224b326a7e604d6eec80fcae14c1c3d57c14280e3ecee8f37741
    Image ID:      registry.redhat.io/openshift4/ose-cluster-nfd-operator@sha256:40f4a5acdccde5f58884d1e319e7fa7ece13c2f66b42671b1667e77096f5619e
    Port:          60000/TCP
    Host Port:     0/TCP
    Command:
      cluster-nfd-operator
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 09 Dec 2020 15:11:52 +0000
      Finished:     Wed, 09 Dec 2020 15:11:58 +0000
    Ready:          False
    Restart Count:  20
    Readiness:      exec [stat /tmp/operator-sdk-ready] delay=4s timeout=1s period=10s #success=1 #failure=1
    Environment:
      WATCH_NAMESPACE:               (v1:metadata.annotations['olm.targetNamespaces'])
      POD_NAME:                      nfd-operator-fbc5d5dc-hdvrg (v1:metadata.name)
      OPERATOR_NAME:                 cluster-nfd-operator
      NODE_FEATURE_DISCOVERY_IMAGE:  registry.redhat.io/openshift4/ose-node-feature-discovery@sha256:ebfaae813ecf890ced0a517649c02759454ca82b992e4e07111af464c3b3f30c
    Mounts:
      /tmp from tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from nfd-operator-token-qs9wb (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  nfd-operator-token-qs9wb:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  nfd-operator-token-qs9wb
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason          Age                    From                                                 Message
  ----     ------          ----                   ----                                                 -------
  Normal   Scheduled       <unknown>                                                                   Successfully assigned test-nfd/nfd-operator-fbc5d5dc-hdvrg to ip-10-0-186-113.us-east-2.compute.internal
  Normal   AddedInterface  84m                    multus                                               Add eth0 [10.129.0.78/23]
  Normal   Pulled          84m                    kubelet, ip-10-0-186-113.us-east-2.compute.internal  Successfully pulled image "registry.redhat.io/openshift4/ose-cluster-nfd-operator@sha256:9561a4097697224b326a7e604d6eec80fcae14c1c3d57c14280e3ecee8f37741" in 16.593631214s
  Normal   Pulled          84m                    kubelet, ip-10-0-186-113.us-east-2.compute.internal  Successfully pulled image "registry.redhat.io/openshift4/ose-cluster-nfd-operator@sha256:9561a4097697224b326a7e604d6eec80fcae14c1c3d57c14280e3ecee8f37741" in 4.07115016s
  Normal   Pulled          83m                    kubelet, ip-10-0-186-113.us-east-2.compute.internal  Successfully pulled image "registry.redhat.io/openshift4/ose-cluster-nfd-operator@sha256:9561a4097697224b326a7e604d6eec80fcae14c1c3d57c14280e3ecee8f37741" in 3.892592921s
  Normal   Started         83m (x4 over 84m)      kubelet, ip-10-0-186-113.us-east-2.compute.internal  Started container nfd-operator
  Normal   Pulled          83m                    kubelet, ip-10-0-186-113.us-east-2.compute.internal  Successfully pulled image "registry.redhat.io/openshift4/ose-cluster-nfd-operator@sha256:9561a4097697224b326a7e604d6eec80fcae14c1c3d57c14280e3ecee8f37741" in 4.400690076s
  Normal   Pulled          82m                    kubelet, ip-10-0-186-113.us-east-2.compute.internal  Successfully pulled image "registry.redhat.io/openshift4/ose-cluster-nfd-operator@sha256:9561a4097697224b326a7e604d6eec80fcae14c1c3d57c14280e3ecee8f37741" in 4.069757743s
  Normal   Created         82m (x5 over 84m)      kubelet, ip-10-0-186-113.us-east-2.compute.internal  Created container nfd-operator
  Normal   Pulling         24m (x17 over 84m)     kubelet, ip-10-0-186-113.us-east-2.compute.internal  Pulling image "registry.redhat.io/openshift4/ose-cluster-nfd-operator@sha256:9561a4097697224b326a7e604d6eec80fcae14c1c3d57c14280e3ecee8f37741"
  Warning  BackOff         4m26s (x350 over 83m)  kubelet, ip-10-0-186-113.us-east-2.compute.internal  Back-off restarting failed container

# oc logs -n test-nfd nfd-operator-fbc5d5dc-hdvrg
{"level":"info","ts":1607522234.9432366,"logger":"cmd","msg":"Go Version: go1.15.2"}
{"level":"info","ts":1607522234.9432628,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1607522234.9432683,"logger":"cmd","msg":"Version of operator-sdk: v0.4.0+git"}
{"level":"info","ts":1607522234.943933,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1607522235.0183256,"logger":"leader","msg":"Found existing lock with my name. I was likely restarted."}
{"level":"info","ts":1607522235.0199258,"logger":"leader","msg":"Continuing as the leader."}
I1209 13:57:16.070410 1 request.go:645] Throttling request took 1.044642767s, request: GET:https://172.30.0.1:443/apis/tuned.openshift.io/v1?timeout=32s
{"level":"info","ts":1607522237.5251663,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":1607522237.5255368,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1607522237.5260978,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1607522237.5262556,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
{"level":"info","ts":1607522237.5263515,"logger":"controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"error","ts":1607522241.5229845,"logger":"controller-runtime.source","msg":"if kind is a CRD, it should be installed before calling Start","kind":"NodeFeatureDiscovery.nfd.openshift.io","error":"no matches for kind \"NodeFeatureDiscovery\" in version \"nfd.openshift.io/v1\"","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/cluster-nfd-operator/vendor/github.com/go-logr/zapr/zapr.go:132\nsigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start\n\t/go/src/github.com/openshift/cluster-nfd-operator/vendor/sigs.k8s.io/controller-runtime/pkg/source/source.go:117\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1\n\t/go/src/github.com/openshift/cluster-nfd-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:143\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start\n\t/go/src/github.com/openshift/cluster-nfd-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:184\nsigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).startRunnable.func1\n\t/go/src/github.com/openshift/cluster-nfd-operator/vendor/sigs.k8s.io/controller-runtime/pkg/manager/internal.go:661"}
{"level":"error","ts":1607522241.5231156,"logger":"cmd","msg":"Manager exited non-zero","error":"no matches for kind \"NodeFeatureDiscovery\" in version \"nfd.openshift.io/v1\"","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/cluster-nfd-operator/vendor/github.com/go-logr/zapr/zapr.go:132\nmain.main\n\t/go/src/github.com/openshift/cluster-nfd-operator/cmd/manager/main.go:106\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:204"}

# oc get catalogsource -A
NAMESPACE               NAME                  DISPLAY               TYPE   PUBLISHER      AGE
openshift-marketplace   certified-operators   Certified Operators   grpc   Red Hat        68m
openshift-marketplace   community-operators   Community Operators   grpc   Red Hat        68m
openshift-marketplace   qe-app-registry       Production Operators  grpc   OpenShift QE   11h
openshift-marketplace   redhat-marketplace    Red Hat Marketplace   grpc   Red Hat        68m
openshift-marketplace   redhat-operators      Red Hat Operators     grpc   Red Hat        68m

# oc get packagemanifest -l catalog=qe-app-registry
NAME                                CATALOG                AGE
cluster-kube-descheduler-operator   Production Operators   11h
local-storage-operator              Production Operators   11h
compliance-operator                 Production Operators   11h
ptp-operator                        Production Operators   11h
elasticsearch-operator              Production Operators   11h
amq-streams                         Production Operators   11h
cluster-logging                     Production Operators   11h
nfd                                 Production Operators   11h
metering-ocp                        Production Operators   11h
sriov-network-operator              Production Operators   11h

Version-Release number of selected component (if applicable):
Server Version: 4.7.0-0.nightly-2020-12-04-013308
Kubernetes Version: v1.19.2+ad738ba

How reproducible:
Every time.

Steps to Reproduce:
1. Create an IPI OCP 4.7 cluster on AWS with 3 master and 3 worker nodes. Use our Flexy automation, which creates a catalogsource for qe-app-registry to pull the latest NFD images.
2. From the Console, create a new project called "test-nfd".
3. From the Console, go to Operators -> OperatorHub, search for the NFD operator, and install it in the test-nfd namespace just created (Update Channel: 4.7, Approval Strategy: Automatic).
4. Wait for the operator status to show Succeeded.
5. `oc get pods -n test-nfd` should show the nfd operator Running.
6. Normally you would now create an instance of Node Feature Discovery, but since the operator failed to deploy at this stage, I had to stop here.

Actual results:
nfd-operator-fbc5d5dc-hdvrg   0/1     CrashLoopBackOff   20         82m

Expected results:
The nfd operator is running, and the Status on the console shows Succeeded.

Additional info:
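For triage, the kind and API version that the operator failed to resolve can be pulled out of the operator log with a small filter. This is only a sketch: the function name missing_kinds is made up for illustration, and it assumes the JSON log format shown in this report, where the inner quotes of the error string are backslash-escaped.

```shell
#!/usr/bin/env bash

# missing_kinds (hypothetical helper): reads operator log lines on stdin and
# prints each distinct "<Kind> <group/version>" pair named in a
# "no matches for kind" error. Assumes the escaped-quote format seen above.
missing_kinds() {
  # Pull out just the error phrase, then reduce it to "Kind group/version".
  grep -o 'no matches for kind \\"[^\\]*\\" in version \\"[^\\]*\\"' \
    | sed -E 's/no matches for kind \\"([^\\]*)\\" in version \\"([^\\]*)\\"/\1 \2/' \
    | sort -u
}
```

Piping `oc logs -n test-nfd nfd-operator-fbc5d5dc-hdvrg` through missing_kinds should print the kind and version the operator asked for. That can then be compared with what the cluster actually serves for the group, for example with `oc api-resources --api-group=nfd.openshift.io`.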
The NFD operator can now be deployed successfully in the openshift-nfd namespace, using an image built from the master branch of https://github.com/openshift/cluster-nfd-operator.git:

OCP version: 4.7.0-0.nightly-2020-12-14-165231

git clone https://github.com/openshift/cluster-nfd-operator.git
cd cluster-nfd-operator
ORG=wabouham PULLPOLICY=Always make local-image
ORG=wabouham PULLPOLICY=Always make local-image-push
ORG=wabouham PULLPOLICY=Always make deploy

Waiting on https://bugzilla.redhat.com/show_bug.cgi?id=1908492 to be resolved before we can test deploying it from OperatorHub.
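One way to confirm that a deployment like the one above is healthy is to filter the `oc get pods` listing for anything that is not fully ready. This is a sketch only: unhealthy_pods is a hypothetical helper, and it assumes the default column layout (NAME READY STATUS RESTARTS AGE) with the header line suppressed.

```shell
#!/usr/bin/env bash

# unhealthy_pods (hypothetical helper): reads `oc get pods --no-headers`
# output on stdin and prints "<name> <status>" for every pod that is not
# Running or whose ready count differs from its container count.
# Note that pods in Completed state are reported too.
unhealthy_pods() {
  awk '{
    split($2, ready, "/")                 # READY column, e.g. "0/1"
    if ($3 != "Running" || ready[1] != ready[2])
      print $1, $3
  }'
}
```

For example, `oc get pods -n openshift-nfd --no-headers | unhealthy_pods` prints nothing when every pod is Running and ready.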
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 extras and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5635