1805427 – Unable to deploy NFD from github on release-4.2 branch

Summary: Unable to deploy NFD from github on release-4.2 branch

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Node Feature Discovery Operator
Sub Component:
Version:	4.2.z
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Target Release:	4.2.z
Assignee:	Zvonko Kosic
QA Contact:	Eric Matysek
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1805394
Depends On:	1797678
Blocks:

Reported:	2020-02-20 18:34 UTC by Eric Matysek
Modified:	2020-07-01 16:08 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-07-01 16:08:20 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-nfd-operator pull 77	0	None	closed	[release-4.2] Bug 1805427: Manual cherry-pick from PR #57	2021-02-04 19:21:55 UTC
Red Hat Product Errata	RHBA-2020:2589	0	None	None	None	2020-07-01 16:08:47 UTC

Description Eric Matysek 2020-02-20 18:34:49 UTC

Description of problem:
The NFD operator pod goes into CrashLoopBackOff on a 4.2.x cluster when deployed from the CLI using the release-4.2 branch on GitHub.

Version-Release number of selected component (if applicable):
4.2

How reproducible:
100%

Steps to Reproduce:
1. 4.2.18 cluster
2. Clone https://github.com/openshift/cluster-nfd-operator
3. Checkout release-4.2
4. PULLPOLICY=Always make deploy
5. oc get pods -n openshift-nfd-operator

Actual results:
[ematysek@jump cluster-nfd-operator]$ oc get pods -n openshift-nfd-operator
NAME                           READY   STATUS             RESTARTS   AGE
nfd-operator-b7f4fbff8-ks85l   0/1     CrashLoopBackOff   4          3m40s
[ematysek@jump cluster-nfd-operator]$ 

Expected results:
The NFD operator pod is running successfully.

Additional info:
Pod logs:
[ematysek@jump cluster-nfd-operator]$ oc logs nfd-operator-b7f4fbff8-ks85l
{"level":"info","ts":1582223530.2681565,"logger":"cmd","msg":"Go Version: go1.13.5"}
{"level":"info","ts":1582223530.2681808,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1582223530.2681844,"logger":"cmd","msg":"Version of operator-sdk: v0.12.0"}
{"level":"info","ts":1582223530.2684205,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1582223532.286605,"logger":"leader","msg":"Found existing lock with my name. I was likely restarted."}
{"level":"info","ts":1582223532.2866318,"logger":"leader","msg":"Continuing as the leader."}
{"level":"info","ts":1582223534.2899957,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"info","ts":1582223534.2903976,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1582223534.2910452,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1582223534.2912865,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1582223534.2914093,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1582223534.2915206,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1582223534.2916193,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodefeaturediscovery-controller","source":"kind source: /, Kind="}
{"level":"error","ts":1582223534.2916377,"logger":"cmd","msg":"","error":"no kind is registered for the type v1.SecurityContextConstraints in scheme \"k8s.io/client-go/kubernetes/scheme/register.go:65\"","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/openshift/cluster-nfd-operator/vendor/github.com/go-logr/zapr/zapr.go:128\nmain.main\n\t/go/src/github.com/openshift/cluster-nfd-operator/cmd/manager/main.go:92\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"}
[ematysek@jump cluster-nfd-operator]$

Comment 2 Zvonko Kosic 2020-03-06 14:30:51 UTC

*** Bug 1805394 has been marked as a duplicate of this bug. ***

Comment 5 Walid A. 2020-03-13 13:14:34 UTC

Failed to deploy the NFD operator from the release-4.2 branch of https://github.com/openshift/cluster-nfd-operator:


Server Version: 4.2.0-0.nightly-2020-03-09-194140
Kubernetes Version: v1.14.6-152-g117ba1f

$ oc get pods -n openshift-nfd
NAME                            READY   STATUS         RESTARTS   AGE
nfd-operator-5d5d64b769-vjcct   0/1     ErrImagePull   0          9s
MacBook-Pro:cluster-nfd-operator walid$ 
MacBook-Pro:cluster-nfd-operator walid$ oc get pods -n openshift-nfd
NAME                            READY   STATUS             RESTARTS   AGE
nfd-operator-5d5d64b769-vjcct   0/1     ImagePullBackOff   0          12s

$ oc get NodeFeatureDiscovery -A
NAMESPACE       NAME                AGE
openshift-nfd   nfd-master-server   25s

$ oc get pods -n openshift-nfd
NAME                            READY   STATUS         RESTARTS   AGE
nfd-operator-5d5d64b769-vjcct   0/1     ErrImagePull   0          45s

$ oc get events -n openshift-nfd
LAST SEEN   TYPE      REASON              OBJECT                               MESSAGE
59s         Normal    Scheduled           pod/nfd-operator-5d5d64b769-vjcct    Successfully assigned openshift-nfd/nfd-operator-5d5d64b769-vjcct to preserve-stage-42-6bscs-control-plane-2
10s         Normal    Pulling             pod/nfd-operator-5d5d64b769-vjcct    Pulling image "quay.io/zvonkok/cluster-nfd-operator:release-4.2"
9s          Warning   Failed              pod/nfd-operator-5d5d64b769-vjcct    Failed to pull image "quay.io/zvonkok/cluster-nfd-operator:release-4.2": rpc error: code = Unknown desc = Error reading manifest release-4.2 in quay.io/zvonkok/cluster-nfd-operator: manifest unknown: manifest unknown
9s          Warning   Failed              pod/nfd-operator-5d5d64b769-vjcct    Error: ErrImagePull
21s         Normal    BackOff             pod/nfd-operator-5d5d64b769-vjcct    Back-off pulling image "quay.io/zvonkok/cluster-nfd-operator:release-4.2"
21s         Warning   Failed              pod/nfd-operator-5d5d64b769-vjcct    Error: ImagePullBackOff
59s         Normal    SuccessfulCreate    replicaset/nfd-operator-5d5d64b769   Created pod: nfd-operator-5d5d64b769-vjcct
59s         Normal    ScalingReplicaSet   deployment/nfd-operator              Scaled up replica set nfd-operator-5d5d64b769 to 1

$ oc describe pods -n openshift-nfd nfd-operator-5d5d64b769-vjcct
Name:           nfd-operator-5d5d64b769-vjcct
Namespace:      openshift-nfd
Priority:       0
Node:           preserve-stage-42-6bscs-control-plane-2/10.0.97.253
Start Time:     Fri, 13 Mar 2020 09:06:40 -0400
Labels:         name=nfd-operator
                pod-template-hash=5d5d64b769
Annotations:    k8s.v1.cni.cncf.io/networks-status:
                  [{
                      "name": "openshift-sdn",
                      "interface": "eth0",
                      "ips": [
                          "10.130.0.81"
                      ],
                      "default": true,
                      "dns": {}
                  }]
                openshift.io/scc: anyuid
Status:         Pending
IP:             10.130.0.81
IPs:            <none>
Controlled By:  ReplicaSet/nfd-operator-5d5d64b769
Containers:
  nfd-operator:
    Container ID:  
    Image:         quay.io/zvonkok/cluster-nfd-operator:release-4.2
    Image ID:      
    Port:          60000/TCP
    Host Port:     0/TCP
    Command:
      cluster-nfd-operator
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Readiness:      exec [stat /tmp/operator-sdk-ready] delay=4s timeout=1s period=10s #success=1 #failure=1
    Environment:
      WATCH_NAMESPACE:               openshift-nfd (v1:metadata.namespace)
      POD_NAME:                      nfd-operator-5d5d64b769-vjcct (v1:metadata.name)
      OPERATOR_NAME:                 cluster-nfd-operator
      NODE_FEATURE_DISCOVERY_IMAGE:  quay.io/zvonkok/node-feature-discovery:v4.2
    Mounts:
      /tmp from tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from nfd-operator-token-gh8v8 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  nfd-operator-token-gh8v8:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  nfd-operator-token-gh8v8
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                From                                              Message
  ----     ------     ----               ----                                              -------
  Normal   Scheduled  97s                default-scheduler                                 Successfully assigned openshift-nfd/nfd-operator-5d5d64b769-vjcct to preserve-stage-42-6bscs-control-plane-2
  Normal   Pulling    48s (x3 over 89s)  kubelet, preserve-stage-42-6bscs-control-plane-2  Pulling image "quay.io/zvonkok/cluster-nfd-operator:release-4.2"
  Warning  Failed     47s (x3 over 89s)  kubelet, preserve-stage-42-6bscs-control-plane-2  Failed to pull image "quay.io/zvonkok/cluster-nfd-operator:release-4.2": rpc error: code = Unknown desc = Error reading manifest release-4.2 in quay.io/zvonkok/cluster-nfd-operator: manifest unknown: manifest unknown
  Warning  Failed     47s (x3 over 89s)  kubelet, preserve-stage-42-6bscs-control-plane-2  Error: ErrImagePull
  Normal   BackOff    9s (x6 over 88s)   kubelet, preserve-stage-42-6bscs-control-plane-2  Back-off pulling image "quay.io/zvonkok/cluster-nfd-operator:release-4.2"
  Warning  Failed     9s (x6 over 88s)   kubelet, preserve-stage-42-6bscs-control-plane-2  Error: ImagePullBackOff
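
The "manifest unknown" message in the events above is what a registry returns when the requested tag does not exist in the repository. In the Registry HTTP API v2, manifests are fetched from /v2/<name>/manifests/<reference>, and a missing tag yields a 404 response with the error code MANIFEST_UNKNOWN. The sketch below only builds that URL from an image reference so it can be checked by hand with any HTTP client. The function name manifestURL is hypothetical, the parser is deliberately simplified, and a registry may require an authentication token before it answers the request.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// manifestURL builds the Registry HTTP API v2 manifest URL for an image
// reference of the form host/repository:tag. This is a simplified parser:
// references by digest (@sha256:...) are rejected, and implicit defaults
// such as a missing registry host or a missing tag are treated as errors.
func manifestURL(image string) (string, error) {
	if strings.Contains(image, "@") {
		return "", fmt.Errorf("digest references are not supported: %q", image)
	}
	slash := strings.Index(image, "/")
	if slash <= 0 {
		return "", fmt.Errorf("no registry host in %q", image)
	}
	host, rest := image[:slash], image[slash+1:]
	colon := strings.LastIndex(rest, ":")
	if colon <= 0 || colon == len(rest)-1 {
		return "", fmt.Errorf("no repository or tag in %q", image)
	}
	repo, tag := rest[:colon], rest[colon+1:]
	return fmt.Sprintf("https://%s/v2/%s/manifests/%s", host, repo, tag), nil
}

func main() {
	// Image reference taken from the pod events above.
	u, err := manifestURL("quay.io/zvonkok/cluster-nfd-operator:release-4.2")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(u)
}
```

For the image in the events above, this prints https://quay.io/v2/zvonkok/cluster-nfd-operator/manifests/release-4.2, which is the resource the kubelet failed to read.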

Comment 7 Zvonko Kosic 2020-05-26 18:57:41 UTC

This should work now; a release-4.2 tag is available on quay.io (sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4).

Comment 8 Paige Rubendall 2020-06-02 16:23:41 UTC

I was able to deploy NFD successfully using the 4.2 release on a 4.2.x cluster.

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.34    True        False         71m     Cluster version is 4.2.34

$ oc get pods -n openshift-nfd
NAME                            READY   STATUS    RESTARTS   AGE
nfd-master-5br7b                1/1     Running   0          73s
nfd-master-c6zrd                1/1     Running   0          73s
nfd-master-r2xlb                1/1     Running   0          73s
nfd-operator-5d5d64b769-jshkc   1/1     Running   0          108s
nfd-worker-cfqt8                1/1     Running   2          74s
nfd-worker-mhvl6                1/1     Running   2          74s
nfd-worker-v48t8                1/1     Running   2          74s

$ oc get NodeFeatureDiscovery -A
NAMESPACE       NAME                AGE
openshift-nfd   nfd-master-server   97s

$ oc describe pod/nfd-operator-5d5d64b769-jshkc -n openshift-nfd
Name:           nfd-operator-5d5d64b769-jshkc
Namespace:      openshift-nfd
Priority:       0
Node:           ip-10-0-148-204.us-east-2.compute.internal/10.0.148.204
Start Time:     Tue, 02 Jun 2020 16:14:38 +0000
Labels:         name=nfd-operator
                pod-template-hash=5d5d64b769
Annotations:    k8s.v1.cni.cncf.io/networks-status:
                  [{
                      "name": "openshift-sdn",
                      "interface": "eth0",
                      "ips": [
                          "10.128.0.39"
                      ],
                      "default": true,
                      "dns": {}
                  }]
                openshift.io/scc: anyuid
Status:         Running
IP:             10.128.0.39
IPs:            <none>
Controlled By:  ReplicaSet/nfd-operator-5d5d64b769
Containers:
  nfd-operator:
    Container ID:  cri-o://536fd8a2e03afa24c2e4bc911fb8ae456b9fa31b70135dc12b06bd7bc11546b6
    Image:         quay.io/zvonkok/cluster-nfd-operator:release-4.2
    Image ID:      quay.io/zvonkok/cluster-nfd-operator@sha256:d54590cfb50c26c813ffd9d02dc157a725d1bf8fb75f748730dd8ae29a5c5476
    Port:          60000/TCP
    Host Port:     0/TCP
    Command:
      cluster-nfd-operator
    State:          Running
      Started:      Tue, 02 Jun 2020 16:15:08 +0000
    Ready:          True
    Restart Count:  0
    Readiness:      exec [stat /tmp/operator-sdk-ready] delay=4s timeout=1s period=10s #success=1 #failure=1
    Environment:
      WATCH_NAMESPACE:               openshift-nfd (v1:metadata.namespace)
      POD_NAME:                      nfd-operator-5d5d64b769-jshkc (v1:metadata.name)
      OPERATOR_NAME:                 cluster-nfd-operator
      NODE_FEATURE_DISCOVERY_IMAGE:  quay.io/zvonkok/node-feature-discovery:v4.2
    Mounts:
      /tmp from tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from nfd-operator-token-6htxs (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  nfd-operator-token-6htxs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  nfd-operator-token-6htxs
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age    From                                                 Message
  ----    ------     ----   ----                                                 -------
  Normal  Scheduled  2m44s  default-scheduler                                    Successfully assigned openshift-nfd/nfd-operator-5d5d64b769-jshkc to ip-10-0-148-204.us-east-2.compute.internal
  Normal  Pulling    2m35s  kubelet, ip-10-0-148-204.us-east-2.compute.internal  Pulling image "quay.io/zvonkok/cluster-nfd-operator:release-4.2"
  Normal  Pulled     2m14s  kubelet, ip-10-0-148-204.us-east-2.compute.internal  Successfully pulled image "quay.io/zvonkok/cluster-nfd-operator:release-4.2"
  Normal  Created    2m14s  kubelet, ip-10-0-148-204.us-east-2.compute.internal  Created container nfd-operator
  Normal  Started    2m14s  kubelet, ip-10-0-148-204.us-east-2.compute.internal  Started container nfd-operator

Comment 11 errata-xmlrpc 2020-07-01 16:08:20 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2589
