Bug 1789560

Summary: OCP 4.3 - Node Feature Discovery (NFD) nfd-operator fails to deploy from CLI and github repo
Product: OpenShift Container Platform
Reporter: Walid A. <wabouham>
Component: Node Feature Discovery Operator
Assignee: Zvonko Kosic <zkosic>
Status: CLOSED ERRATA
QA Contact: Walid A. <wabouham>
Severity: high
Docs Contact:
Priority: high
Version: 4.3.0
CC: carangog, egallen, ematysek, joncp, mifiedle, scuppett, sejug, zkosic
Target Milestone: ---
Keywords: Regression, TestBlocker
Target Release: 4.3.z
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1797678
Environment:
Last Closed: 2020-05-04 11:23:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1797678, 1809806

Description Walid A. 2020-01-09 19:45:06 UTC
Description of problem:
The Node Feature Discovery operator (nfd-operator) fails to deploy from the CLI after cloning the https://github.com/openshift/cluster-nfd-operator.git repo, checking out the release-4.3 branch, and running make deploy.

This was observed on an IPI install on AWS of 4.3.0-0.nightly-2020-01-06-185654 and also on an IPI install on GCP of 4.3.0-0.nightly-2020-01-02-081435.

From the AWS cluster:
root@ip-172-31-40-229: ~ # oc get pods -n openshift-nfd-operator
NAME                            READY   STATUS             RESTARTS   AGE
nfd-operator-66b9dcb9f7-jzfgb   0/1     CrashLoopBackOff   189        16h
root@ip-172-31-40-229: ~ # 
root@ip-172-31-40-229: ~ # oc logs pods -n openshift-nfd-operator nfd-operator-66b9dcb9f7-jzfgb 
Error from server (NotFound): pods "pods" not found
root@ip-172-31-40-229: ~ # oc logs -n openshift-nfd-operator nfd-operator-66b9dcb9f7-jzfgb 
{"level":"info","ts":1578598259.5555265,"logger":"cmd","msg":"Go Version: go1.13.5"}
{"level":"info","ts":1578598259.555549,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1578598259.5555537,"logger":"cmd","msg":"Version of operator-sdk: v0.12.0"}
{"level":"info","ts":1578598259.555944,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1578598261.674846,"logger":"leader","msg":"Found existing lock with my name. I was likely restarted."}
{"level":"info","ts":1578598261.6748726,"logger":"leader","msg":"Continuing as the leader."}
{"level":"info","ts":1578598263.779333,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":1578598263.7795694,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1578598263.7797668,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1578598263.7799351,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1578598263.7802992,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1578598263.7805262,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1578598263.7807121,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"error","ts":1578598263.7807345,"logger":"cmd","msg":"","error":"no kind is registered for the type v1.SecurityContextConstraints in scheme \"k8s.io/client-go/kubernetes/scheme/register.go:65\"","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/cluster-nfd-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nmain.main\n\t/go/src/github.com/openshift/cluster-nfd-operator/cmd/manager/main.go:92\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}
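Note on the error above: a message of the form "no kind is registered for the type ... in scheme ..." is what a Kubernetes client returns when it is handed a typed object whose Go type was never added to the scheme it uses. The following is only a toy, stdlib-only model of that lookup, not the real k8s.io/apimachinery code. The names miniScheme, addKnownType and kindFor, and the empty stand-in structs, are hypothetical. This report does not say how the operator was actually fixed.

```go
package main

import (
	"fmt"
	"reflect"
)

// Hypothetical stand-ins for API types; the real types live in other packages.
type Pod struct{}
type SecurityContextConstraints struct{}

// miniScheme is a toy registry mapping Go types to kind names.
type miniScheme struct {
	name  string
	kinds map[reflect.Type]string
}

func newMiniScheme(name string) *miniScheme {
	return &miniScheme{name: name, kinds: map[reflect.Type]string{}}
}

// addKnownType registers the Go type of obj under the given kind.
func (s *miniScheme) addKnownType(kind string, obj interface{}) {
	s.kinds[reflect.TypeOf(obj)] = kind
}

// kindFor looks up the kind for obj and fails if its type was never registered.
func (s *miniScheme) kindFor(obj interface{}) (string, error) {
	t := reflect.TypeOf(obj)
	kind, ok := s.kinds[t]
	if !ok {
		return "", fmt.Errorf("no kind is registered for the type %v in scheme %q", t, s.name)
	}
	return kind, nil
}

func main() {
	s := newMiniScheme("toy")
	s.addKnownType("Pod", Pod{})

	// The type is unknown to the scheme, so the lookup fails.
	if _, err := s.kindFor(SecurityContextConstraints{}); err != nil {
		fmt.Println("before registration:", err)
	}

	// Once the type is registered, the same lookup succeeds.
	s.addKnownType("SecurityContextConstraints", SecurityContextConstraints{})
	kind, _ := s.kindFor(SecurityContextConstraints{})
	fmt.Println("after registration:", kind)
}
```

In the toy model the lookup starts working as soon as the missing type is registered, which is why errors of this kind usually point at a scheme that was built without the API group in question.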


Version-Release number of selected component (if applicable):
# oc version
Client Version: openshift-clients-4.3.0-201910250623-70-g0ed83003
Server Version: 4.3.0-0.nightly-2020-01-06-185654
Kubernetes Version: v1.16.2

# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.nightly-2020-01-06-185654   True        False         2d2h    Cluster version is 4.3.0-0.nightly-2020-01-06-185654


How reproducible:
Always.

Steps to Reproduce:
1.  IPI install of OCP 4.3.0-0.nightly-2020-01-06-185654 on AWS with 3 master and 3 worker nodes (m5.xlarge)
2.  deploy NFD operator:
   - export GOPATH=/root/go
   - cd $GOPATH/src/github.com/openshift
   - git clone https://github.com/openshift/cluster-nfd-operator.git
   - cd cluster-nfd-operator
   - git checkout release-4.3
   - make deploy
3. oc get pods -n openshift-nfd-operator

Actual results:
 # oc get pods -n openshift-nfd-operator
NAME                            READY   STATUS             RESTARTS   AGE
nfd-operator-66b9dcb9f7-jzfgb   0/1     CrashLoopBackOff   189        16h

Expected results:
nfd-operator running 1/1 in openshift-nfd-operator namespace with nfd pods deployed successfully for each node in openshift-nfd namespace

Additional info:
Links to the AWS cluster kubeconfig and oc adm must-gather logs are available in the next comment.

Comment 2 Stephen Cuppett 2020-01-13 12:29:22 UTC
Moving target release to the active development branch (4.4). For any needed fixes where backports are required/requested, BZ clones will be created targeting those specific z-stream releases.

Comment 7 Stephen Cuppett 2020-01-21 16:03:32 UTC
*** Bug 1793535 has been marked as a duplicate of this bug. ***

Comment 9 Jon 2020-01-29 23:15:58 UTC
Reproduced this with the stock NFD from the OperatorHub on an installer-deployed cluster in Azure.

Operator image:
   registry.redhat.io/openshift4/ose-cluster-nfd-operator@sha256:ff782d4c2a3c9436ea9d9713cd6e360337aebdf8c9d6f02d1c4d07ba305de847
 
Installer version info:
   openshift-install v4.3.0
   built from commit 2055609f95b19322ee6cfdd0bea73399297c4a3e
   release image quay.io/openshift-release-dev/ocp-release@sha256:3a516480dfd68e0f87f702b4d7bdd6f6a0acfdac5cd2e9767b838ceede34d70d

Comment 10 Carlos Eduardo Arango Gutierrez 2020-02-03 16:24:13 UTC
*** Bug 1797678 has been marked as a duplicate of this bug. ***

Comment 11 Carlos Eduardo Arango Gutierrez 2020-02-03 16:43:30 UTC
*** Bug 1797678 has been marked as a duplicate of this bug. ***

Comment 12 Walid A. 2020-02-03 19:56:58 UTC
Returning to ASSIGNED. We still have the same issue deploying NFD 4.3 from GitHub via the CLI:


cd $GOPATH/src/github.com/openshift
git clone https://github.com/openshift/cluster-nfd-operator.git
cd cluster-nfd-operator
git checkout release-4.3
PULLPOLICY=Always make deploy


# oc get pods -n openshift-nfd-operator
NAME                            READY   STATUS   RESTARTS   AGE
nfd-operator-66b9dcb9f7-vdrss   0/1     Error    2          59s

# oc get events -n openshift-nfd-operator
LAST SEEN   TYPE      REASON              OBJECT                               MESSAGE
<unknown>   Normal    Scheduled           pod/nfd-operator-66b9dcb9f7-vdrss    Successfully assigned openshift-nfd-operator/nfd-operator-66b9dcb9f7-vdrss to ip-10-0-149-159.us-west-2.compute.internal
27s         Normal    Pulling             pod/nfd-operator-66b9dcb9f7-vdrss    Pulling image "quay.io/zvonkok/cluster-nfd-operator:latest"
24s         Normal    Pulled              pod/nfd-operator-66b9dcb9f7-vdrss    Successfully pulled image "quay.io/zvonkok/cluster-nfd-operator:latest"
24s         Normal    Created             pod/nfd-operator-66b9dcb9f7-vdrss    Created container nfd-operator
24s         Normal    Started             pod/nfd-operator-66b9dcb9f7-vdrss    Started container nfd-operator
4s          Warning   BackOff             pod/nfd-operator-66b9dcb9f7-vdrss    Back-off restarting failed container
66s         Normal    SuccessfulCreate    replicaset/nfd-operator-66b9dcb9f7   Created pod: nfd-operator-66b9dcb9f7-vdrss
66s         Normal    ScalingReplicaSet   deployment/nfd-operator              Scaled up replica set nfd-operator-66b9dcb9f7 to 1

# oc logs -n openshift-nfd-operator nfd-operator-66b9dcb9f7-vdrss
{"level":"info","ts":1580759597.6013348,"logger":"cmd","msg":"Go Version: go1.13.5"}
{"level":"info","ts":1580759597.6013615,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1580759597.6013656,"logger":"cmd","msg":"Version of operator-sdk: v0.12.0"}
{"level":"info","ts":1580759597.6016638,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1580759599.7664626,"logger":"leader","msg":"Found existing lock with my name. I was likely restarted."}
{"level":"info","ts":1580759599.7664897,"logger":"leader","msg":"Continuing as the leader."}
{"level":"info","ts":1580759601.9206095,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":1580759601.9208539,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1580759601.9210787,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1580759601.9212415,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1580759601.9213212,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1580759601.9213808,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1580759601.9214342,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"error","ts":1580759601.9214447,"logger":"cmd","msg":"","error":"no kind is registered for the type v1.SecurityContextConstraints in scheme \"k8s.io/client-go/kubernetes/scheme/register.go:65\"","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/cluster-nfd-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nmain.main\n\t/go/src/github.com/openshift/cluster-nfd-operator/cmd/manager/main.go:92\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}

Comment 13 Walid A. 2020-02-04 13:28:24 UTC
Verified that the NFD operator can be deployed from the latest master branch:

cd $GOPATH/src/github.com/openshift/cluster-nfd-operator
git checkout master
git pull 
git reset --hard
PULLPOLICY=Always make deploy
oc get pods -n openshift-nfd-operator

# oc get pods -n openshift-nfd
NAME                           READY   STATUS    RESTARTS   AGE
nfd-master-r2lnf               1/1     Running   0          34m
nfd-master-r9qx2               1/1     Running   0          34m
nfd-master-zshtw               1/1     Running   0          34m
nfd-operator-775b746bb-mctcl   1/1     Running   0          34m
nfd-worker-b7ffn               1/1     Running   2          34m
nfd-worker-pd29g               1/1     Running   2          34m
nfd-worker-qfsvv               1/1     Running   2          34m
nfd-worker-spsrx               1/1     Running   2          34m

Comment 14 Erwan Gallen 2020-02-12 08:39:35 UTC
Bug reproduced on OCP 4.3 with Node Feature Discovery installed from the Console.

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0     True        False         11h     Cluster version is 4.3.0

$ oc get pods -n openshift-operators
NAME                           READY   STATUS             RESTARTS   AGE
nfd-operator-9bbc65476-vtg67   0/1     CrashLoopBackOff   7          15m

$ oc logs -p nfd-operator-9bbc65476-vtg67 --namespace openshift-operators
{"level":"info","ts":1581436592.3010235,"logger":"cmd","msg":"Go Version: go1.12.12"}
{"level":"info","ts":1581436592.3010638,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1581436592.3010707,"logger":"cmd","msg":"Version of operator-sdk: v0.4.0+git"}
{"level":"info","ts":1581436592.3015237,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1581436592.4333308,"logger":"leader","msg":"Found existing lock with my name. I was likely restarted."}
{"level":"info","ts":1581436592.4333572,"logger":"leader","msg":"Continuing as the leader."}
{"level":"info","ts":1581436592.5499945,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1581436592.5501857,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1581436592.5502975,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1581436592.5504038,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1581436592.550496,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1581436592.550563,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"error","ts":1581436592.550574,"logger":"cmd","msg":"","error":"no kind is registered for the type v1.SecurityContextConstraints in scheme \"k8s.io/client-go/kubernetes/scheme/register.go:60\"","stacktrace":"github.com/openshift/cluster-nfd-operator/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/cluster-nfd-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nmain.main\n\t/go/src/github.com/openshift/cluster-nfd-operator/cmd/manager/main.go:92\nruntime.main\n\t/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/proc.go:200"} 


I confirm that this patch solves the issue: https://github.com/openshift/cluster-nfd-operator/pull/53

$ git clone https://github.com/openshift/cluster-nfd-operator
$ cd cluster-nfd-operator
$ sed -i 's/openshift-nfd/openshift-operators/' Makefile
$ make deploy

$ oc get pods -n openshift-operators
NAME                            READY   STATUS    RESTARTS   AGE
nfd-master-9n4vz                1/1     Running   0          23s
nfd-master-cg5kq                1/1     Running   0          23s
nfd-master-lgkrc                1/1     Running   0          24s
nfd-operator-5f47ccf496-sxx82   1/1     Running   0          36s
nfd-worker-p2tft                1/1     Running   2          24s
nfd-worker-pzls4                1/1     Running   2          24s
nfd-worker-rlvd7                1/1     Running   2          24s
nfd-worker-sxbrq                1/1     Running   2          24s

Comment 16 errata-xmlrpc 2020-05-04 11:23:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581