Bug 1822079 - nmstate-handler fails to start and keeps restarting
Summary: nmstate-handler fails to start and keeps restarting
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Networking
Version: 2.4.0
Hardware: Unspecified
OS: Unspecified
unspecified
urgent
Target Milestone: ---
: 2.4.0
Assignee: Quique Llorente
QA Contact: Meni Yakove
URL:
Whiteboard:
: 1832343 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-04-08 08:09 UTC by Lukas Bednar
Modified: 2020-07-28 19:09 UTC (History)
5 users (show)

Fixed In Version: kubernetes-nmstate-handler-container-v2.4.0-19
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-28 19:09:47 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2020:3194 0 None None None 2020-07-28 19:09:58 UTC

Description Lukas Bednar 2020-04-08 08:09:09 UTC
Description of problem:

nmstate-handler-58ntj                                 0/1     CrashLoopBackOff   127        15h

[cnv-qe-jenkins@cnv-executor-lbednar ~]$ oc logs -n openshift-cnv nmstate-handler-58ntj
{"level":"info","ts":1586332456.5840313,"logger":"cmd","msg":"Operator Version: 0.17.0"}
{"level":"info","ts":1586332456.5841203,"logger":"cmd","msg":"Go Version: go1.12.12"}
{"level":"info","ts":1586332456.584137,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1586332456.5841477,"logger":"cmd","msg":"Version of operator-sdk: v0.15.1"}
{"level":"info","ts":1586332459.2468967,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":"0.0.0.0:8383"}
{"level":"info","ts":1586332459.2476315,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1586332459.2479975,"logger":"controller-runtime.webhook","msg":"registering webhook","path":"/nodenetworkconfigurationpolicies-mutate"}
{"level":"info","ts":1586332459.2480335,"logger":"controller-runtime.webhook","msg":"registering webhook","path":"/nodenetworkconfigurationpolicies-status-mutate"}
{"level":"info","ts":1586332459.2480404,"logger":"controller-runtime.webhook","msg":"registering webhook","path":"/nodenetworkconfigurationpolicies-timestamp-mutate"}
{"level":"info","ts":1586332469.8886878,"logger":"cmd","msg":"Could not create metrics Service","error":"failed to initialize service object for metrics: &{{{%!w(string=) %!w(string=)} {%!w(string=) %!w(string=) %!w(string=) %!w(*int64=<nil>)} %!w(string=Failure) %!w(string=networkaddonsconfigs.networkaddonsoperator.network.kubevirt.io \"cluster\" is forbidden: User \"system:serviceaccount:openshift-cnv:nmstate-handler\" cannot get resource \"networkaddonsconfigs\" in API group \"networkaddonsoperator.network.kubevirt.io\" at the cluster scope) %!w(v1.StatusReason=Forbidden) %!w(*v1.StatusDetails=&{cluster networkaddonsoperator.network.kubevirt.io networkaddonsconfigs  [] 0}) %!w(int32=403)}}"}
{"level":"info","ts":1586332472.540327,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1586332472.5405352,"logger":"webhook/server","msg":"Starting nodenetworkconfigurationpolicy webhook server"}
{"level":"info","ts":1586332472.540551,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
{"level":"info","ts":1586332472.5405993,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodenetworkconfigurationpolicy-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1586332472.540617,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"node-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1586332472.54088,"logger":"controller-runtime.controller","msg":"Starting Controller","controller":"nodenetworkconfigurationpolicy-controller"}
{"level":"info","ts":1586332472.5410233,"logger":"controller-runtime.controller","msg":"Starting Controller","controller":"node-controller"}
{"level":"info","ts":1586332472.5408509,"logger":"webhook/server/certificate/manager","msg":"Updating CA bundle for webhook"}
{"level":"info","ts":1586332472.6411762,"logger":"controller-runtime.controller","msg":"Starting workers","controller":"nodenetworkconfigurationpolicy-controller","worker count":1}
{"level":"info","ts":1586332472.6448724,"logger":"controller-runtime.controller","msg":"Starting workers","controller":"node-controller","worker count":1}
{"level":"info","ts":1586332473.1615887,"logger":"webhook/server/certificate/manager","msg":"Starting cert manager"}
I0408 07:54:33.161704       1 certificate_manager.go:254] Certificate rotation is enabled.
{"level":"info","ts":1586332473.1617868,"logger":"webhook/server/certificate/manager","msg":"Wait for cert/key to be created"}
I0408 07:54:33.161907       1 certificate_manager.go:381] Rotating certificates
E0408 07:54:33.199075       1 certificate_manager.go:400] Failed while requesting a signed certificate from the master: cannot create certificate signing request: certificatesigningrequests.certificates.k8s.io "csr-4gtq9" is forbidden: user not permitted to approve requests with signerName "kubernetes.io/legacy-unknown"
I0408 07:54:35.218701       1 certificate_manager.go:381] Rotating certificates
E0408 07:54:35.230794       1 certificate_manager.go:400] Failed while requesting a signed certificate from the master: cannot create certificate signing request: certificatesigningrequests.certificates.k8s.io "csr-mzc54" is forbidden: user not permitted to approve requests with signerName "kubernetes.io/legacy-unknown"
I0408 07:54:39.351379       1 certificate_manager.go:381] Rotating certificates
E0408 07:54:39.361972       1 certificate_manager.go:400] Failed while requesting a signed certificate from the master: cannot create certificate signing request: certificatesigningrequests.certificates.k8s.io "csr-q9qbp" is forbidden: user not permitted to approve requests with signerName "kubernetes.io/legacy-unknown"
I0408 07:54:47.774323       1 certificate_manager.go:381] Rotating certificates
E0408 07:54:47.787471       1 certificate_manager.go:400] Failed while requesting a signed certificate from the master: cannot create certificate signing request: certificatesigningrequests.certificates.k8s.io "csr-lnpx4" is forbidden: user not permitted to approve requests with signerName "kubernetes.io/legacy-unknown"
I0408 07:55:05.089483       1 certificate_manager.go:381] Rotating certificates
E0408 07:55:05.099711       1 certificate_manager.go:400] Failed while requesting a signed certificate from the master: cannot create certificate signing request: certificatesigningrequests.certificates.k8s.io "csr-9tb79" is forbidden: user not permitted to approve requests with signerName "kubernetes.io/legacy-unknown"
E0408 07:55:05.099735       1 certificate_manager.go:290] Reached backoff limit, still unable to rotate certs: timed out waiting for the condition
I0408 07:55:37.101051       1 certificate_manager.go:381] Rotating certificates
E0408 07:55:37.112736       1 certificate_manager.go:400] Failed while requesting a signed certificate from the master: cannot create certificate signing request: certificatesigningrequests.certificates.k8s.io "csr-srqql" is forbidden: user not permitted to approve requests with signerName "kubernetes.io/legacy-unknown"
I0408 07:56:09.100993       1 certificate_manager.go:381] Rotating certificates
E0408 07:56:09.112697       1 certificate_manager.go:400] Failed while requesting a signed certificate from the master: cannot create certificate signing request: certificatesigningrequests.certificates.k8s.io "csr-rlrzh" is forbidden: user not permitted to approve requests with signerName "kubernetes.io/legacy-unknown"
{"level":"info","ts":1586332593.163694,"logger":"controller-runtime.controller","msg":"Stopping workers","controller":"node-controller"}
{"level":"info","ts":1586332593.1638722,"logger":"controller-runtime.controller","msg":"Stopping workers","controller":"nodenetworkconfigurationpolicy-controller"}
{"level":"error","ts":1586332593.1636398,"logger":"cmd","msg":"Manager exited non-zero","error":"failed starting webhook cert manager: failed creating webhook tls key/cert: timed out waiting for the condition","errorVerbose":"timed out waiting for the condition\nfailed creating webhook tls key/cert\ngithub.com/nmstate/kubernetes-nmstate/vendor/github.com/qinqon/kube-admission-webhook/pkg/webhook/server/certificate.manager.Start\n\t/go/src/github.com/nmstate/kubernetes-nmstate/vendor/github.com/qinqon/kube-admission-webhook/pkg/webhook/server/certificate/manager.go:156\ngithub.com/nmstate/kubernetes-nmstate/vendor/github.com/qinqon/kube-admission-webhook/pkg/webhook/server.(*Server).Start\n\t/go/src/github.com/nmstate/kubernetes-nmstate/vendor/github.com/qinqon/kube-admission-webhook/pkg/webhook/server/server.go:76\ngithub.com/nmstate/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).startNonLeaderElectionRunnables.func1\n\t/go/src/github.com/nmstate/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/manager/internal.go:455\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1337\nfailed starting webhook cert manager\ngithub.com/nmstate/kubernetes-nmstate/vendor/github.com/qinqon/kube-admission-webhook/pkg/webhook/server.(*Server).Start\n\t/go/src/github.com/nmstate/kubernetes-nmstate/vendor/github.com/qinqon/kube-admission-webhook/pkg/webhook/server/server.go:78\ngithub.com/nmstate/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).startNonLeaderElectionRunnables.func1\n\t/go/src/github.com/nmstate/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/manager/internal.go:455\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1337","stacktrace":"github.com/nmstate/kubernetes-nmstate/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/nmstate/kubernetes-nmstate/vendor/github.com/go-logr/zapr/zapr.go:128\nmain.main\n\t/go/src/github.com/nmstate/kubernetes-nmstate/cmd/manager/main.go:172\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:200"}



Version-Release number of selected component (if applicable):
cnv-2.4.0 budle=705.0.0
4.5.0-0.nightly-2020-04-07-082457


How reproducible: 100%


Steps to Reproduce:
1. Deploy CNV
2.
3.

Actual results: nmstate-handler fails to start


Expected results: nmstate-handler up and running


Additional info:

Comment 1 Quique Llorente 2020-04-08 10:23:38 UTC
There is a race between controller-runtime manager start and client-cache ready, to fix that there is a fix at kube-admission-webhook lib https://github.com/qinqon/kube-admission-webhook/commit/d7fd07e354ce4bbb91998530fbed612b07fd1145

This has being alredy updated at kubernetes-nmstate master so this will be fixed with the next release.

Also note that we may want at near future to use openshift cert manager mechanism instead of this one.

Comment 2 Lukas Bednar 2020-04-17 06:29:25 UTC
using CNV Bundle=736.0.0
registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-kubernetes-nmstate-handler-rhel8@sha256:c021fa81bf36011a4e02e2e67a4c7d82237fe91d7f04b8ac04e91f140662a5fb


openshift-cnv                                                 nmstate-handler-cfkc2                                    0/1     CrashLoopBackOff    7          31m
openshift-cnv                                                 nmstate-handler-hkh8h                                    0/1     CrashLoopBackOff    7          31m
openshift-cnv                                                 nmstate-handler-qxs79                                    0/1     CrashLoopBackOff    7          31m


[cnv-qe-jenkins@talayan-test ~]$ oc logs -n openshift-cnv nmstate-handler-cfkc2
{"level":"info","ts":1587104512.7005155,"logger":"cmd","msg":"Operator Version: 0.19.0"}
{"level":"info","ts":1587104512.7006233,"logger":"cmd","msg":"Go Version: go1.12.12"}
{"level":"info","ts":1587104512.7006283,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1587104512.7006314,"logger":"cmd","msg":"Version of operator-sdk: v0.15.1"}
{"level":"info","ts":1587104512.700636,"logger":"cmd","msg":"Try to take exclusive lock on file: /var/k8s_nmstate/handler_lock"}
{"level":"info","ts":1587104512.7012513,"logger":"cmd","msg":"Successfully took nmstate exclusive lock"}
{"level":"info","ts":1587104515.4101267,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":"0.0.0.0:8383"}
{"level":"info","ts":1587104515.4108043,"logger":"cmd","msg":"Registering Components."}
{"level":"info","ts":1587104515.4110224,"logger":"controller-runtime.webhook","msg":"registering webhook","path":"/nodenetworkconfigurationpolicies-mutate"}
{"level":"info","ts":1587104515.4110403,"logger":"controller-runtime.webhook","msg":"registering webhook","path":"/nodenetworkconfigurationpolicies-status-mutate"}
{"level":"info","ts":1587104515.411045,"logger":"controller-runtime.webhook","msg":"registering webhook","path":"/nodenetworkconfigurationpolicies-timestamp-mutate"}
{"level":"info","ts":1587104523.527616,"logger":"cmd","msg":"Could not generate and serve custom resource metrics","error":"discovering resource information failed for NMstate in nmstate.io/v1alpha1: &{%!w(string=apiVersion nmstate.io/v1alpha1 and kind NMstate not found available in Kubernetes cluster)}"}
{"level":"info","ts":1587104526.249069,"logger":"cmd","msg":"Could not create metrics Service","error":"failed to initialize service object for metrics: &{{{%!w(string=) %!w(string=)} {%!w(string=) %!w(string=) %!w(string=) %!w(*int64=<nil>)} %!w(string=Failure) %!w(string=networkaddonsconfigs.networkaddonsoperator.network.kubevirt.io \"cluster\" is forbidden: User \"system:serviceaccount:openshift-cnv:nmstate-handler\" cannot get resource \"networkaddonsconfigs\" in API group \"networkaddonsoperator.network.kubevirt.io\" at the cluster scope) %!w(v1.StatusReason=Forbidden) %!w(*v1.StatusDetails=&{cluster networkaddonsoperator.network.kubevirt.io networkaddonsconfigs  [] 0}) %!w(int32=403)}}"}
{"level":"info","ts":1587104528.9505768,"logger":"cmd","msg":"Starting the Cmd."}
{"level":"info","ts":1587104528.9508312,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"nodenetworkconfigurationpolicy-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1587104528.9510703,"logger":"controller-runtime.controller","msg":"Starting Controller","controller":"nodenetworkconfigurationpolicy-controller"}
{"level":"info","ts":1587104528.9511545,"logger":"controller-runtime.manager","msg":"starting metrics server","path":"/metrics"}
{"level":"info","ts":1587104528.951235,"logger":"webhook/server","msg":"Starting nodenetworkconfigurationpolicy webhook server"}
{"level":"info","ts":1587104528.9513505,"logger":"webhook/server/certificate/manager","msg":"Updating CA bundle for webhook"}
{"level":"info","ts":1587104528.951554,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"node-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1587104528.951669,"logger":"controller-runtime.controller","msg":"Starting Controller","controller":"node-controller"}
{"level":"info","ts":1587104529.0685575,"logger":"controller-runtime.controller","msg":"Starting workers","controller":"nodenetworkconfigurationpolicy-controller","worker count":1}
{"level":"info","ts":1587104529.5518105,"logger":"controller-runtime.controller","msg":"Starting workers","controller":"node-controller","worker count":1}
{"level":"info","ts":1587104534.0655336,"logger":"webhook/server/certificate/manager","msg":"Starting cert manager"}
I0417 06:22:14.065648       1 certificate_manager.go:254] Certificate rotation is enabled.
{"level":"info","ts":1587104534.0657487,"logger":"webhook/server/certificate/manager","msg":"Wait for cert/key to be created"}
I0417 06:22:14.066085       1 certificate_manager.go:381] Rotating certificates
E0417 06:22:14.111632       1 certificate_manager.go:400] Failed while requesting a signed certificate from the master: cannot create certificate signing request: certificatesigningrequests.certificates.k8s.io "csr-24wbh" is forbidden: user not permitted to approve requests with signerName "kubernetes.io/legacy-unknown"
I0417 06:22:16.143080       1 certificate_manager.go:381] Rotating certificates
E0417 06:22:16.158728       1 certificate_manager.go:400] Failed while requesting a signed certificate from the master: cannot create certificate signing request: certificatesigningrequests.certificates.k8s.io "csr-gwc25" is forbidden: user not permitted to approve requests with signerName "kubernetes.io/legacy-unknown"
I0417 06:22:20.197696       1 certificate_manager.go:381] Rotating certificates
E0417 06:22:20.211129       1 certificate_manager.go:400] Failed while requesting a signed certificate from the master: cannot create certificate signing request: certificatesigningrequests.certificates.k8s.io "csr-9kt46" is forbidden: user not permitted to approve requests with signerName "kubernetes.io/legacy-unknown"
I0417 06:22:28.451976       1 certificate_manager.go:381] Rotating certificates
E0417 06:22:28.463471       1 certificate_manager.go:400] Failed while requesting a signed certificate from the master: cannot create certificate signing request: certificatesigningrequests.certificates.k8s.io "csr-sfkmf" is forbidden: user not permitted to approve requests with signerName "kubernetes.io/legacy-unknown"
I0417 06:22:45.288081       1 certificate_manager.go:381] Rotating certificates
E0417 06:22:45.300786       1 certificate_manager.go:400] Failed while requesting a signed certificate from the master: cannot create certificate signing request: certificatesigningrequests.certificates.k8s.io "csr-jhwjb" is forbidden: user not permitted to approve requests with signerName "kubernetes.io/legacy-unknown"
E0417 06:22:45.300822       1 certificate_manager.go:290] Reached backoff limit, still unable to rotate certs: timed out waiting for the condition
I0417 06:23:17.302174       1 certificate_manager.go:381] Rotating certificates
E0417 06:23:17.314773       1 certificate_manager.go:400] Failed while requesting a signed certificate from the master: cannot create certificate signing request: certificatesigningrequests.certificates.k8s.io "csr-zwf42" is forbidden: user not permitted to approve requests with signerName "kubernetes.io/legacy-unknown"
I0417 06:23:49.302160       1 certificate_manager.go:381] Rotating certificates
E0417 06:23:49.316137       1 certificate_manager.go:400] Failed while requesting a signed certificate from the master: cannot create certificate signing request: certificatesigningrequests.certificates.k8s.io "csr-xwfmj" is forbidden: user not permitted to approve requests with signerName "kubernetes.io/legacy-unknown"
{"level":"error","ts":1587104654.0662465,"logger":"cmd","msg":"Manager exited non-zero","error":"failed starting webhook cert manager: failed creating webhook tls key/cert: timed out waiting for the condition","errorVerbose":"timed out waiting for the condition\nfailed creating webhook tls key/cert\ngithub.com/nmstate/kubernetes-nmstate/vendor/github.com/qinqon/kube-admission-webhook/pkg/webhook/server/certificate.manager.Start\n\t/go/src/github.com/nmstate/kubernetes-nmstate/vendor/github.com/qinqon/kube-admission-webhook/pkg/webhook/server/certificate/manager.go:156\ngithub.com/nmstate/kubernetes-nmstate/vendor/github.com/qinqon/kube-admission-webhook/pkg/webhook/server.(*Server).Start\n\t/go/src/github.com/nmstate/kubernetes-nmstate/vendor/github.com/qinqon/kube-admission-webhook/pkg/webhook/server/server.go:76\ngithub.com/nmstate/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).startNonLeaderElectionRunnables.func1\n\t/go/src/github.com/nmstate/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/manager/internal.go:455\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1337\nfailed starting webhook cert manager\ngithub.com/nmstate/kubernetes-nmstate/vendor/github.com/qinqon/kube-admission-webhook/pkg/webhook/server.(*Server).Start\n\t/go/src/github.com/nmstate/kubernetes-nmstate/vendor/github.com/qinqon/kube-admission-webhook/pkg/webhook/server/server.go:78\ngithub.com/nmstate/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).startNonLeaderElectionRunnables.func1\n\t/go/src/github.com/nmstate/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/manager/internal.go:455\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1337","stacktrace":"github.com/nmstate/kubernetes-nmstate/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/nmstate/kubernetes-nmstate/vendor/github.com/go-logr/zapr/zapr.go:128\nmain.main\n\t/go/src/github.com/nmstate/kubernetes-nmstate/cmd/manager/main.go:189\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:200"}

Comment 3 Petr Horáček 2020-05-13 10:25:37 UTC
*** Bug 1832343 has been marked as a duplicate of this bug. ***

Comment 4 Lukas Bednar 2020-05-14 14:32:38 UTC
Looks like we are over this.

registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-kubernetes-nmstate-handler-rhel8@sha256:094f1e4fb25bd98f522c381667a4d333a59da36356a162f709c6a3d3fb533156

```
nmstate-handler-dtfg8                                 1/1     Running   1          6h22m
nmstate-handler-k7xg5                                 1/1     Running   0          6h22m
nmstate-handler-nhwj8                                 1/1     Running   0          6h22m
nmstate-handler-pjg89                                 1/1     Running   2          6h22m
nmstate-handler-whnm8                                 1/1     Running   0          6h22m
nmstate-handler-zvwbx                                 1/1     Running   0          6h22m
```

Comment 7 errata-xmlrpc 2020-07-28 19:09:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:3194


Note You need to log in before you can comment on or make changes to this bug.