Bug 1802744

Summary: OCP 4.3.2 UPI stage - Node Feature Discovery (NFD) nfd-operator fails to deploy from OperatorHub on OpenShift Console
Product: OpenShift Container Platform
Reporter: Walid A. <wabouham>
Component: Node Feature Discovery Operator
Assignee: Zvonko Kosic <zkosic>
Status: CLOSED ERRATA
QA Contact: Walid A. <wabouham>
Severity: high
Docs Contact:
Priority: high
Version: 4.3.z
CC: carangog, ematysek, joncp, mifiedle, sejug
Target Milestone: ---
Keywords: TestBlocker
Target Release: 4.3.z
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-06-17 20:27:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1797678
Bug Blocks:

Description Walid A. 2020-02-13 19:23:55 UTC
Description of problem:
This is on OCP UPI stage cluster release 4.3.2:
Release 4.3.2 was created from registry.svc.ci.openshift.org/ocp/release:4.3.0-0.nightly-2020-02-11-224725

Node Feature Discovery (NFD) nfd-operator fails to deploy from OperatorHub on OpenShift Console.
The nfd-operator pod is in CrashLoopBackOff:

$ oc get pods -n test-nfd
NAME                            READY   STATUS             RESTARTS   AGE
nfd-operator-785c858cdf-shpdp   0/1     CrashLoopBackOff   26         113m

$ oc logs -n test-nfd nfd-operator-785c858cdf-shpdp
{"level":"info","ts":1581620898.744573,"logger":"cmd","msg":"Go Version: go1.12.12"}
{"level":"info","ts":1581620898.7447064,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1581620898.7447116,"logger":"cmd","msg":"Version of operator-sdk: v0.4.0+git"}
{"level":"info","ts":1581620898.7456067,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1581620898.9379377,"logger":"leader","msg":"Found existing lock with my name. I was likely restarted."}
{"level":"info","ts":1581620898.9380538,"logger":"leader","msg":"Continuing as the leader."}
{"level":"info","ts":1581620899.0455976,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1581620899.0458324,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1581620899.046091,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1581620899.0461957,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1581620899.0463033,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1581620899.0464194,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"error","ts":1581620899.0464404,"logger":"cmd","msg":"","error":"no kind is registered for the type v1.SecurityContextConstraints in scheme \"k8s.io/client-go/kubernetes/scheme/register.go:60\"","stacktrace":"github.com/openshift/cluster-nfd-operator/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/cluster-nfd-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nmain.main\n\t/go/src/github.com/openshift/cluster-nfd-operator/cmd/manager/main.go:92\nruntime.main\n\t/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/proc.go:200"}
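For triage, the operator log above is one JSON object per line, so the failing entry can be picked out mechanically. The following is a minimal sketch, not part of the original report: the helper name `error_entries` and the `SAMPLE_LOG` constant are hypothetical, and the sample holds two lines copied from the output above with the stacktrace field dropped for brevity. It keeps only the entries whose level is "error" and converts the epoch `ts` field to UTC.

```python
import json
from datetime import datetime, timezone

# Two lines copied from the `oc logs` output above
# (the "stacktrace" field is omitted for brevity).
SAMPLE_LOG = r'''
{"level":"info","ts":1581620899.0455976,"logger":"cmd","msg":"Registering Components."}
{"level":"error","ts":1581620899.0464404,"logger":"cmd","msg":"","error":"no kind is registered for the type v1.SecurityContextConstraints in scheme \"k8s.io/client-go/kubernetes/scheme/register.go:60\""}
'''


def error_entries(log_text):
    """Return (UTC timestamp, logger, error text) for each error-level JSON line."""
    found = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank lines and any non-JSON output
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that only look like JSON
        if entry.get("level") != "error":
            continue
        # "ts" is seconds since the Unix epoch, with a fractional part.
        when = datetime.fromtimestamp(entry.get("ts", 0.0), tz=timezone.utc)
        # Fall back to "msg" when the entry carries no "error" field.
        text = entry.get("error") or entry.get("msg", "")
        found.append((when.strftime("%Y-%m-%d %H:%M:%S"), entry.get("logger", ""), text))
    return found


if __name__ == "__main__":
    for when, logger, text in error_entries(SAMPLE_LOG):
        print(f"{when} UTC [{logger}] {text}")
```

On the sample, the only error entry is the "no kind is registered for the type v1.SecurityContextConstraints" message, logged by "cmd" immediately after the controller's event sources start. To run it against a live pod, the same function could be fed the text of `oc logs -n test-nfd <pod-name>` instead of the embedded sample.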

$ oc describe pods -n test-nfd test-nfd nfd-operator-785c858cdf-shpdp
Name:         nfd-operator-785c858cdf-shpdp
Namespace:    test-nfd
Priority:     0
Node:         wduan432-stage1-cz9xq-control-plane-1/10.0.96.89
Start Time:   Thu, 13 Feb 2020 12:18:24 -0500
Labels:       name=nfd-operator
              pod-template-hash=785c858cdf
Annotations:  alm-examples:
                [
                  {
                    "apiVersion": "nfd.openshift.io/v1alpha1",
                    "kind": "NodeFeatureDiscovery",
                    "metadata": {
                      "name": "nfd-master-server"
                    },
                    "spec": {
                      "namespace": "openshift-nfd"
                    }
                  }
                ]
              capabilities: Basic Install
              categories: Database
              certified: false
              containerImage: 
              createdAt: 2019-05-30T00:00:00Z
              description:
                This software enables node feature discovery for Kubernetes. It detects hardware features available on each node in a Kubernetes cluster, ...
              k8s.v1.cni.cncf.io/networks-status:
                [{
                    "name": "openshift-sdn",
                    "interface": "eth0",
                    "ips": [
                        "10.129.0.37"
                    ],
                    "dns": {},
                    "default-route": [
                        "10.129.0.1"
                    ]
                }]
              olm.operatorGroup: test-nfd-z7kpg
              olm.operatorNamespace: test-nfd
              olm.targetNamespaces: test-nfd
              openshift.io/scc: anyuid
              provider: Red Hat
              repository: https://github.com/openshift/cluster-nfd-operator
              support: Red Hat
Status:       Running
IP:           10.129.0.37
IPs:
  IP:           10.129.0.37
Controlled By:  ReplicaSet/nfd-operator-785c858cdf
Containers:
  nfd-operator:
    Container ID:  cri-o://0a5b3ae9c85c6a8b73b7bdbd0a78df4928293383b0b0964d3d69278b19b65ed2
    Image:         registry.redhat.io/openshift4/ose-cluster-nfd-operator@sha256:6bcae58731a5d854028cfcebbd68f91473032d75dae3f2a11da8c2be0439554d
    Image ID:      registry.redhat.io/openshift4/ose-cluster-nfd-operator@sha256:0c4511f09904b3d5dc124069ad2f5ba5e19c8bf1221ad6c01d848f1068971c74
    Port:          60000/TCP
    Host Port:     0/TCP
    Command:
      cluster-nfd-operator
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 13 Feb 2020 14:08:18 -0500
      Finished:     Thu, 13 Feb 2020 14:08:19 -0500
    Ready:          False
    Restart Count:  26
    Readiness:      exec [stat /tmp/operator-sdk-ready] delay=4s timeout=1s period=10s #success=1 #failure=1
    Environment:
      WATCH_NAMESPACE:                (v1:metadata.annotations['olm.targetNamespaces'])
      POD_NAME:                      nfd-operator-785c858cdf-shpdp (v1:metadata.name)
      OPERATOR_NAME:                 cluster-nfd-operator
      NODE_FEATURE_DISCOVERY_IMAGE:  registry.redhat.io/openshift4/ose-node-feature-discovery@sha256:bb3d4fab088b57498ce95f9f712871514cf0b70ded0653c85e16368749d2ecab
    Mounts:
      /tmp from tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from nfd-operator-token-2nkg8 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  nfd-operator-token-2nkg8:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  nfd-operator-token-2nkg8
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                    From                                            Message
  ----     ------     ----                   ----                                            -------
  Normal   Scheduled  <unknown>              default-scheduler                               Successfully assigned test-nfd/nfd-operator-785c858cdf-shpdp to wduan432-stage1-cz9xq-control-plane-1
  Normal   Pulled     113m (x4 over 113m)    kubelet, wduan432-stage1-cz9xq-control-plane-1  Successfully pulled image "registry.redhat.io/openshift4/ose-cluster-nfd-operator@sha256:6bcae58731a5d854028cfcebbd68f91473032d75dae3f2a11da8c2be0439554d"
  Normal   Created    113m (x4 over 113m)    kubelet, wduan432-stage1-cz9xq-control-plane-1  Created container nfd-operator
  Normal   Started    113m (x4 over 113m)    kubelet, wduan432-stage1-cz9xq-control-plane-1  Started container nfd-operator
  Normal   Pulling    112m (x5 over 114m)    kubelet, wduan432-stage1-cz9xq-control-plane-1  Pulling image "registry.redhat.io/openshift4/ose-cluster-nfd-operator@sha256:6bcae58731a5d854028cfcebbd68f91473032d75dae3f2a11da8c2be0439554d"
  Warning  BackOff    4m3s (x504 over 113m)  kubelet, wduan432-stage1-cz9xq-control-plane-1  Back-off restarting failed container
Error from server (NotFound): pods "test-nfd" not found

Version-Release number of selected component (if applicable):

Server Version: 4.3.0-0.nightly-2020-02-11-224725
Kubernetes Version: v1.16.2


How reproducible:
Always

Steps to Reproduce:
1. Perform a UPI install of OCP 4.3.2 on OpenStack, with 2 worker nodes and 3 masters
2. Log in to the OpenShift console as kubeadmin, using the kubeadmin password from the install
3. Create a namespace called test-nfd
4. Go to Operators -> OperatorHub, search for the NFD operator, and click install
5. Keep the Update channel value at 4.3, select the test-nfd namespace to install into, and accept all other default selections
6. Create an instance of that operator


Actual results:
The NFD operator fails to deploy; the nfd-operator pod is in CrashLoopBackOff state


Expected results:
The NFD operator installs successfully


Additional info:
One workaround is to deploy NFD from the GitHub repo.  See:  https://bugzilla.redhat.com/show_bug.cgi?id=1789560

Comment 2 Mike Fiedler 2020-02-20 13:55:02 UTC
QA needs to perform verification

Comment 4 Walid A. 2020-03-05 19:23:11 UTC
Failed QA verification on OCP 4.3.5.  We can deploy NFD from OperatorHub on OCP 4.3.5 but only in the default namespace.  
The nfd-worker pods do not get deployed in the custom namespace.

We are hitting the issue described in this BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1808061

Comment 7 Walid A. 2020-05-26 20:12:49 UTC
Deployed successfully on OCP 4.3.32 staging cluster.

Comment 12 errata-xmlrpc 2020-06-17 20:27:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2436