Description of problem:
After adding a HyperConverged CR in the openshift-cnv namespace, the deployed nmstate-handler-worker pods level off at half a core to a full core of CPU usage.

Version-Release number of selected component (if applicable):
OpenShift 4.4.7, CNV 2.3.0

How reproducible:
I have seen this on OpenShift on VMware and on bare-metal clusters.

Steps to Reproduce:
1. Deploy an OpenShift 4.4 cluster
2. Upgrade to 4.4.7
3. Subscribe to the Container-native virtualization 2.3.0 Operator
4. Create a HyperConverged CR with the defaults

Actual results:
The resulting nmstate-handler-worker pods show 0.500 to 1.000 CPU cores used in the web console.

Expected results:
I would expect the CPU usage for a service like this to be 0.050 cores or less.

Additional info:
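For reference, this is roughly how the CR was created and how the usage was observed. The apiVersion and CR name below are assumptions based on what the CNV 2.3 operator accepts by default, so adjust them if your installation differs:

  # create a default HyperConverged CR (sketch; apiVersion/name may vary per CNV version)
  oc apply -f - <<EOF
  apiVersion: hco.kubevirt.io/v1alpha1
  kind: HyperConverged
  metadata:
    name: kubevirt-hyperconverged
    namespace: openshift-cnv
  spec: {}
  EOF

  # watch per-pod CPU usage of the nmstate handlers
  oc adm top pods -n openshift-cnv | grep nmstate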
Created attachment 1696766 [details]
Console screenshot

Screenshot showing CPU usage.
In cnv.engineering (cnv-2.3) I see nmstate-handler taking ~0.5 core per master, and nmstate-handler-worker taking 0.3-0.5 core per worker. However, a closer look suggests a bigger etcdserver problem. An nmstate-handler pod is busy tight-looping with:

{"level":"error","ts":1592671549.9629877,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"nodenetworkstate-controller","request":"/cnv-x4lcc-master-2","error":"etcdserver: request timed out","stacktrace":"github.com/nmstate/kubernetes-nmstate/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/nmstate/kubernetes-nmstate/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/nmstate/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/nmstate/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:218\ngithub.com/nmstate/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/nmstate/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:192\ngithub.com/nmstate/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/src/github.com/nmstate/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:171\ngithub.com/nmstate/kubernetes-nmstate/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/nmstate/kubernetes-nmstate/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152\ngithub.com/nmstate/kubernetes-nmstate/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/nmstate/kubernetes-nmstate/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153\ngithub.com/nmstate/kubernetes-nmstate/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/nmstate/kubernetes-nmstate/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}

And an nmstate-handler-worker pod is in a similar condition:
{"level":"error","ts":1592671713.9723308,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"nodenetworkstate-controller","request":"/cnv-x4lcc-worker-0-gndlf","error":"etcdserver: request timed out","stacktrace":"github.com/nmstate/kubernetes-nmstate/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/nmstate/kubernetes-nmstate/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/nmstate/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/nmstate/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:218\ngithub.com/nmstate/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/nmstate/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:192\ngithub.com/nmstate/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/src/github.com/nmstate/kubernetes-nmstate/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:171\ngithub.com/nmstate/kubernetes-nmstate/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/github.com/nmstate/kubernetes-nmstate/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:152\ngithub.com/nmstate/kubernetes-nmstate/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/github.com/nmstate/kubernetes-nmstate/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:153\ngithub.com/nmstate/kubernetes-nmstate/vendor/k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/github.com/nmstate/kubernetes-nmstate/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88"}
I have started to verify this at github.com/nmstate/kubevirt using a kubevirtci deployment, and I see that every so often the handler goes to 45% of 6 cores, which means around 3 cores, although these cores are vCPUs, so I don't know whether this correlates to the issue here.
Tested with nmstate 0.3 and nmstate 0.4; both use around 40% of CPU every time nmstatectl show is called.
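This is roughly how I approximated that per-call cost (a sketch, run wherever nmstatectl is installed, e.g. inside the handler container):

  # CPU/wall time of a single state report
  time nmstatectl show > /dev/null

  # repeated calls, to mimic the handler's periodic refresh
  for i in $(seq 1 10); do nmstatectl show > /dev/null; done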
I have been playing around with the nmstatectl varlink implementation in nmstate 0.4; since nmstate then stays up and running as a daemon, the consumption drops to 6% when using it.
Jason reported that newer releases are better, so we can close this; in case it reappears we can reopen it.
Yes, you can close this BZ.