Bug 1774446 - CNV 2.2 unable to install: pods are not going into Ready state
Summary: CNV 2.2 unable to install: pods are not going into Ready state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Installation
Version: 2.2.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 2.2.0
Assignee: Karel Šimon
QA Contact: Irina Gulina
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-11-20 10:28 UTC by Tareq Alayan
Modified: 2020-01-30 16:27 UTC
CC: 8 users

Fixed In Version: kubevirt-ssp-operator:v2.2.0-11, hco-bundle-registry-container-v2.2.0-47
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-01-30 16:27:33 UTC
Target Upstream Version:
Embargoed:


Attachments
hco.log (11.71 KB, text/plain), 2019-11-20 10:28 UTC, Tareq Alayan
describe cdi-operator (13.55 KB, text/plain), 2019-11-20 10:37 UTC, Tareq Alayan


Links
Red Hat Product Errata RHEA-2020:0307, 2020-01-30 16:27:48 UTC

Description Tareq Alayan 2019-11-20 10:28:04 UTC
Created attachment 1638057 [details]
hco.log

Description of problem:
Pods keep cycling from Terminating to Pending to CrashLoopBackOff; the SSP operator, cdi-deployment, and cdi-operator pods are affected most.

NAME                                                  READY   STATUS             RESTARTS   AGE
bridge-marker-4kfns                                   1/1     Running            0          25m
bridge-marker-6pgb5                                   1/1     Running            0          25m
bridge-marker-8c4cp                                   1/1     Running            0          25m
bridge-marker-nhqqk                                   1/1     Running            0          25m
cdi-apiserver-779b7c455b-jw8fw                        1/1     Running            0          25m
cdi-deployment-59756855fc-27wf5                       0/1     CrashLoopBackOff   6          24m
cdi-operator-87ccdd97f-v8qv2                          0/1     CrashLoopBackOff   19         111m
cdi-uploadproxy-6456f9b5cb-9szg7                      1/1     Running            0          25m
cluster-network-addons-operator-57985b8b55-kgfzx      1/1     Running            0          111m
hco-operator-5d46d855bc-t9mgx                         0/1     Running            3          111m
kube-cni-linux-bridge-plugin-4mn7f                    1/1     Running            0          25m
kube-cni-linux-bridge-plugin-dt4qh                    1/1     Running            0          25m
kube-cni-linux-bridge-plugin-m6pzg                    1/1     Running            0          25m
kube-cni-linux-bridge-plugin-psfb7                    1/1     Running            0          25m
kubemacpool-mac-controller-manager-5965948866-c9rvd   1/1     Terminating        0          20m
kubemacpool-mac-controller-manager-5965948866-chhqw   1/1     Running            0          115s
kubemacpool-mac-controller-manager-5965948866-wkhv8   1/1     Running            0          115s
kubevirt-ssp-operator-64575cf47f-ffsr9                0/1     CrashLoopBackOff   11         111m
nmstate-handler-5p5xx                                 1/1     Running            0          8m35s
nmstate-handler-dzc66                                 1/1     Running            0          8m58s
nmstate-handler-jkxgq                                 0/1     Terminating        0          25m
nmstate-handler-xr8lc                                 1/1     Running            0          8m11s
node-maintenance-operator-b775cddfb-5h7tj             1/1     Running            0          111m
ovs-cni-amd64-b89vf                                   2/2     Running            0          25m
ovs-cni-amd64-cmh84                                   2/2     Running            0          25m
ovs-cni-amd64-qzmgr                                   2/2     Running            0          25m
ovs-cni-amd64-ztthr                                   2/2     Running            0          25m
virt-api-68f7857466-fctvh                             1/1     Running            0          20m
virt-api-68f7857466-mqq86                             1/1     Running            0          20m
virt-controller-59c4c6c84b-h2gsl                      0/1     CrashLoopBackOff   1          20m
virt-controller-59c4c6c84b-nfjtr                      1/1     Running            6          20m
virt-handler-6cnmh                                    1/1     Running            0          20m
virt-handler-frhcw                                    1/1     Running            0          20m
virt-operator-8b94c69b-rvlr5                          0/1     CrashLoopBackOff   14         111m
virt-operator-8b94c69b-zfrzq                          1/1     Running            13         111m


Version-Release number of selected component (if applicable):
Container ID:  cri-o://926e20d95b69b7f0998de57c4a8c3c0c5f80cba3e39795b223a0e4f9f1d02a62
    Image:         registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-hyperconverged-cluster-operator:v2.2.0-5
    Image ID:      registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-hyperconverged-cluster-operator@sha256:a74f005478e8e5f62b14805acf252fae8539ee358f87b7bfaf6415751a902d17

How reproducible:


Steps to Reproduce:
1. Deploy HCO from the rh-verified-operators catalog source (see the sketch after this list)
2.
3.
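
For reference, deploying HCO via OLM (step 1) boils down to creating an OperatorGroup and a Subscription pointing at the catalog source. A minimal sketch, assuming the package name kubevirt-hyperconverged and channel "2.2" (neither is stated in this report):

apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-cnv-group
  namespace: openshift-cnv
spec:
  targetNamespaces:
  - openshift-cnv
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: hco-operatorhub
  namespace: openshift-cnv
spec:
  source: rh-verified-operators        # catalog source named in step 1
  sourceNamespace: openshift-marketplace
  name: kubevirt-hyperconverged        # package name (assumed)
  channel: "2.2"                       # channel (assumed)
  installPlanApproval: Automatic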

Actual results:


Expected results:


Additional info:

Comment 1 Tareq Alayan 2019-11-20 10:37:41 UTC
Created attachment 1638070 [details]
describe cdi-operator

Comment 2 Simone Tiraboschi 2019-11-20 10:52:23 UTC
After a few minutes the same env got into this state:

[cnv-qe-jenkins@cnv-executor-a43 ~]$ oc get pods -n openshift-cnv
NAME                                                  READY   STATUS             RESTARTS   AGE
bridge-marker-4kfns                                   1/1     Running            0          52m
bridge-marker-6pgb5                                   1/1     Running            0          52m
bridge-marker-8c4cp                                   1/1     Running            0          52m
bridge-marker-nhqqk                                   1/1     Running            0          52m
cdi-apiserver-779b7c455b-jw8fw                        1/1     Running            0          52m
cdi-deployment-59756855fc-27wf5                       1/1     Running            7          51m
cdi-operator-87ccdd97f-ldnp4                          1/1     Running            0          24m
cdi-operator-87ccdd97f-v8qv2                          0/1     Terminating        19         139m
cdi-uploadproxy-6456f9b5cb-2jspg                      1/1     Running            0          24m
cdi-uploadproxy-6456f9b5cb-9szg7                      1/1     Terminating        0          52m
cluster-network-addons-operator-57985b8b55-kgfzx      1/1     Running            0          139m
hco-operator-5d46d855bc-dq76j                         1/1     Running            0          24m
hco-operator-5d46d855bc-t9mgx                         0/1     Terminating        3          139m
kube-cni-linux-bridge-plugin-4mn7f                    1/1     Running            0          52m
kube-cni-linux-bridge-plugin-dt4qh                    1/1     Running            0          52m
kube-cni-linux-bridge-plugin-m6pzg                    1/1     Running            0          52m
kube-cni-linux-bridge-plugin-psfb7                    1/1     Running            0          52m
kubemacpool-mac-controller-manager-5965948866-c9rvd   1/1     Terminating        0          47m
kubemacpool-mac-controller-manager-5965948866-chhqw   1/1     Running            0          29m
kubemacpool-mac-controller-manager-5965948866-wkhv8   1/1     Running            0          29m
kubevirt-ssp-operator-64575cf47f-ffsr9                0/1     CrashLoopBackOff   16         139m
nmstate-handler-5p5xx                                 1/1     Running            0          35m
nmstate-handler-dzc66                                 1/1     Running            0          36m
nmstate-handler-jkxgq                                 0/1     Terminating        0          52m
nmstate-handler-xr8lc                                 1/1     Running            0          35m
node-maintenance-operator-b775cddfb-5h7tj             1/1     Running            0          139m
ovs-cni-amd64-b89vf                                   2/2     Running            0          52m
ovs-cni-amd64-cmh84                                   2/2     Running            0          52m
ovs-cni-amd64-qzmgr                                   2/2     Running            0          52m
ovs-cni-amd64-ztthr                                   2/2     Running            0          52m
virt-api-68f7857466-6c472                             1/1     Running            0          24m
virt-api-68f7857466-fctvh                             1/1     Running            0          47m
virt-api-68f7857466-mqq86                             1/1     Terminating        0          47m
virt-controller-59c4c6c84b-h2gsl                      0/1     Terminating        1          47m
virt-controller-59c4c6c84b-nfjtr                      1/1     Running            6          47m
virt-controller-59c4c6c84b-xvwdk                      1/1     Running            0          24m
virt-handler-6cnmh                                    1/1     Running            0          47m
virt-handler-frhcw                                    1/1     Running            0          47m
virt-operator-8b94c69b-hrk7l                          1/1     Running            0          24m
virt-operator-8b94c69b-rvlr5                          0/1     Terminating        14         139m
virt-operator-8b94c69b-zfrzq                          1/1     Running            13         139m


SSP operator fails with:

{"level":"error","ts":1574247075.376807,"logger":"cmd","msg":"Exposing metrics port failed.","Namespace":"","error":"failed to create or get service for metrics: services \"kubevirt-ssp-operator-metrics\" is forbidden: User \"system:serviceaccount:openshift-cnv:kubevirt-ssp-operator\" cannot update resource \"services\" in API group \"\" in the namespace \"openshift-cnv\"","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tsrc/github.com/operator-framework/operator-sdk/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/operator-framework/operator-sdk/pkg/ansible.Run\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/run.go:153\ngithub.com/operator-framework/operator-sdk/cmd/operator-sdk/run.newRunAnsibleCmd.func1\n\tsrc/github.com/operator-framework/operator-sdk/cmd/operator-sdk/run/ansible.go:38\ngithub.com/spf13/cobra.(*Command).execute\n\tsrc/github.com/operator-framework/operator-sdk/vendor/github.com/spf13/cobra/command.go:826\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tsrc/github.com/operator-framework/operator-sdk/vendor/github.com/spf13/cobra/command.go:914\ngithub.com/spf13/cobra.(*Command).Execute\n\tsrc/github.com/operator-framework/operator-sdk/vendor/github.com/spf13/cobra/command.go:864\nmain.main\n\tsrc/github.com/operator-framework/operator-sdk/cmd/operator-sdk/main.go:84\nruntime.main\n\t/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/proc.go:200"}
Error: failed to create or get service for metrics: services "kubevirt-ssp-operator-metrics" is forbidden: User "system:serviceaccount:openshift-cnv:kubevirt-ssp-operator" cannot update resource "services" in API group "" in the namespace "openshift-cnv"
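
A quick way to confirm the missing permission from the cluster side (a generic check, not something run in this report) is to impersonate the service account:

oc auth can-i update services \
    -n openshift-cnv \
    --as=system:serviceaccount:openshift-cnv:kubevirt-ssp-operator

Given the error above, this would be expected to print "no", while the same check with the create or get verb would print "yes".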

Comment 3 Simone Tiraboschi 2019-11-20 10:55:31 UTC
It's trying to deploy registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-kubevirt-ssp-operator:v2.2.0-8

Comment 4 Martin Sivák 2019-11-20 12:06:47 UTC
I see the permission in our upstream manifest: https://github.com/MarSik/kubevirt-ssp-operator/blob/master/deploy/role.yaml#L66
And also in our manifest template for csv-generator: https://github.com/MarSik/kubevirt-ssp-operator/blob/master/manifests/generated/kubevirt-ssp-operator.vVERSION.clusterserviceversion.yaml#L76
Someone should check the HCO-generated file.
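
One way to dump the permissions section of the CSV that is actually deployed (generic commands, not from this report; the grep context size is arbitrary):

oc get csv -n openshift-cnv
oc get csv <hco-csv-name> -n openshift-cnv -o yaml \
    | grep -A 20 'serviceAccountName: kubevirt-ssp-operator'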

Comment 5 Simone Tiraboschi 2019-11-20 13:16:10 UTC
We have this in the HCO-generated CSV:

        - apiGroups:
          - ""
          resources:
          - serviceaccounts
          - configmaps
          - services
          verbs:
          - create
          - get
          - patch
          - list
          - watch

and also in the CSV deployed on that specific cluster.
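
Worth noting: the forbidden error is specifically about the update verb on services, and update is absent from the verb list above. Assuming the fix simply adds that verb (the linked PR is not quoted here), the corrected rule would look roughly like:

        - apiGroups:
          - ""
          resources:
          - serviceaccounts
          - configmaps
          - services
          verbs:
          - create
          - get
          - patch
          - update    # the verb the metrics-service helper is denied in the error above
          - list
          - watch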

Comment 6 Simone Tiraboschi 2019-11-20 14:20:42 UTC
This seems very close to https://bugzilla.redhat.com/1773905

Comment 8 Karel Šimon 2019-11-22 09:11:26 UTC
I am not able to reproduce this bug. I tried it locally on my OKD 4.3 cluster and HCO deployed SSP correctly. Tareq, can you please provide a testing env where this bug occurred?

Comment 9 Karel Šimon 2019-11-25 10:07:40 UTC
Should be fixed by this PR: https://github.com/kubevirt/hyperconverged-cluster-operator/pull/359

Comment 10 Tareq Alayan 2019-11-25 12:54:54 UTC
Env was provided; clearing the needinfo.

Comment 11 Tareq Alayan 2019-11-26 11:32:22 UTC
With kubevirt-ssp-operator:v2.2.0-10, the issue still seems to be there:

oc logs -n openshift-cnv kubevirt-ssp-operator-85c4cbcc96-lc524
{"level":"info","ts":1574766559.2150786,"logger":"cmd","msg":"Go Version: go1.12.12"}
{"level":"info","ts":1574766559.2151883,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1574766559.2152066,"logger":"cmd","msg":"Version of operator-sdk: v0.12.0+git"}
{"level":"info","ts":1574766559.215244,"logger":"cmd","msg":"Watching namespace.","Namespace":""}
{"level":"info","ts":1574766561.9840798,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":"0.0.0.0:8383"}
{"level":"info","ts":1574766561.985181,"logger":"watches","msg":"Failed to parse %v from environment. Using default %v","WORKER_KUBEVIRTCOMMONTEMPLATESBUNDLE_KUBEVIRT_IO":1}
{"level":"info","ts":1574766561.985221,"logger":"watches","msg":"Failed to parse %v from environment. Using default %v","ANSIBLE_VERBOSITY_KUBEVIRTCOMMONTEMPLATESBUNDLE_KUBEVIRT_IO":2}
{"level":"info","ts":1574766561.9852333,"logger":"watches","msg":"Failed to parse %v from environment. Using default %v","WORKER_KUBEVIRTTEMPLATEVALIDATOR_KUBEVIRT_IO":1}
{"level":"info","ts":1574766561.9852378,"logger":"watches","msg":"Failed to parse %v from environment. Using default %v","ANSIBLE_VERBOSITY_KUBEVIRTTEMPLATEVALIDATOR_KUBEVIRT_IO":2}
{"level":"info","ts":1574766561.985245,"logger":"watches","msg":"Failed to parse %v from environment. Using default %v","WORKER_KUBEVIRTNODELABELLERBUNDLE_KUBEVIRT_IO":1}
{"level":"info","ts":1574766561.9852488,"logger":"watches","msg":"Failed to parse %v from environment. Using default %v","ANSIBLE_VERBOSITY_KUBEVIRTNODELABELLERBUNDLE_KUBEVIRT_IO":2}
{"level":"info","ts":1574766561.9852564,"logger":"watches","msg":"Failed to parse %v from environment. Using default %v","WORKER_KUBEVIRTMETRICSAGGREGATION_KUBEVIRT_IO":1}
{"level":"info","ts":1574766561.9852605,"logger":"watches","msg":"Failed to parse %v from environment. Using default %v","ANSIBLE_VERBOSITY_KUBEVIRTMETRICSAGGREGATION_KUBEVIRT_IO":2}
{"level":"info","ts":1574766561.9853222,"logger":"ansible-controller","msg":"Watching resource","Options.Group":"kubevirt.io","Options.Version":"v1","Options.Kind":"KubevirtCommonTemplatesBundle"}
{"level":"info","ts":1574766561.9856575,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"kubevirtcommontemplatesbundle-controller","source":"kind source: kubevirt.io/v1, Kind=KubevirtCommonTemplatesBundle"}
{"level":"info","ts":1574766561.9858563,"logger":"ansible-controller","msg":"Watching resource","Options.Group":"kubevirt.io","Options.Version":"v1","Options.Kind":"KubevirtTemplateValidator"}
{"level":"info","ts":1574766561.9859648,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"kubevirttemplatevalidator-controller","source":"kind source: kubevirt.io/v1, Kind=KubevirtTemplateValidator"}
{"level":"info","ts":1574766561.9860878,"logger":"ansible-controller","msg":"Watching resource","Options.Group":"kubevirt.io","Options.Version":"v1","Options.Kind":"KubevirtNodeLabellerBundle"}
{"level":"info","ts":1574766561.9861827,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"kubevirtnodelabellerbundle-controller","source":"kind source: kubevirt.io/v1, Kind=KubevirtNodeLabellerBundle"}
{"level":"info","ts":1574766561.9862955,"logger":"ansible-controller","msg":"Watching resource","Options.Group":"kubevirt.io","Options.Version":"v1","Options.Kind":"KubevirtMetricsAggregation"}
{"level":"info","ts":1574766561.9863958,"logger":"controller-runtime.controller","msg":"Starting EventSource","controller":"kubevirtmetricsaggregation-controller","source":"kind source: kubevirt.io/v1, Kind=KubevirtMetricsAggregation"}
{"level":"info","ts":1574766561.9866958,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1574766564.7678726,"logger":"leader","msg":"Found existing lock with my name. I was likely restarted."}
{"level":"info","ts":1574766564.7679055,"logger":"leader","msg":"Continuing as the leader."}
{"level":"error","ts":1574766578.6226292,"logger":"cmd","msg":"Exposing metrics port failed.","Namespace":"","error":"failed to create or get service for metrics: services \"kubevirt-ssp-operator-metrics\" is forbidden: User \"system:serviceaccount:openshift-cnv:kubevirt-ssp-operator\" cannot update resource \"services\" in API group \"\" in the namespace \"openshift-cnv\"","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\tsrc/github.com/operator-framework/operator-sdk/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/operator-framework/operator-sdk/pkg/ansible.Run\n\tsrc/github.com/operator-framework/operator-sdk/pkg/ansible/run.go:153\ngithub.com/operator-framework/operator-sdk/cmd/operator-sdk/run.newRunAnsibleCmd.func1\n\tsrc/github.com/operator-framework/operator-sdk/cmd/operator-sdk/run/ansible.go:38\ngithub.com/spf13/cobra.(*Command).execute\n\tsrc/github.com/operator-framework/operator-sdk/vendor/github.com/spf13/cobra/command.go:826\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tsrc/github.com/operator-framework/operator-sdk/vendor/github.com/spf13/cobra/command.go:914\ngithub.com/spf13/cobra.(*Command).Execute\n\tsrc/github.com/operator-framework/operator-sdk/vendor/github.com/spf13/cobra/command.go:864\nmain.main\n\tsrc/github.com/operator-framework/operator-sdk/cmd/operator-sdk/main.go:84\nruntime.main\n\t/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/proc.go:200"}
Error: failed to create or get service for metrics: services "kubevirt-ssp-operator-metrics" is forbidden: User "system:serviceaccount:openshift-cnv:kubevirt-ssp-operator" cannot update resource "services" in API group "" in the namespace "openshift-cnv"
Usage:
  operator-sdk run ansible [flags]

Flags:
      --ansible-verbosity int            Ansible verbosity. Overridden by environment variable. (default 2)
  -h, --help                             help for ansible
      --inject-owner-ref                 The ansible operator will inject owner references unless this flag is false (default true)
      --max-workers int                  Maximum number of workers to use. Overridden by environment variable. (default 1)
      --reconcile-period duration        Default reconcile period for controllers (default 1m0s)
      --watches-file string              Path to the watches file to use (default "./watches.yaml")
      --zap-devel                        Enable zap development mode (changes defaults to console encoder, debug log level, and disables sampling)
      --zap-encoder encoder              Zap log encoding ('json' or 'console')
      --zap-level level                  Zap log level (one of 'debug', 'info', 'error' or any integer value > 0) (default info)
      --zap-sample sample                Enable zap log sampling. Sampling will be disabled for integer log levels > 1
      --zap-time-encoding timeEncoding   Sets the zap time format ('epoch', 'millis', 'nano', or 'iso8601') (default )

Global Flags:
      --verbose   Enable verbose logging
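
To double-check what the service account is actually granted on the cluster (role names below are illustrative, not taken from this report):

# list the namespaced roles and bindings OLM created for the SSP operator
oc get role,rolebinding -n openshift-cnv | grep ssp

# inspect the verbs granted on services in the matching role
oc get role <ssp-role-name> -n openshift-cnv -o yaml

If update is still missing from the services rule there, the CSV change has not reached the bundle that is actually installed.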

Comment 13 Gal Ben Haim 2019-11-27 17:53:43 UTC
The fix is in kubevirt-ssp-operator-container-v2.2.0-11

Comment 15 Irina Gulina 2019-12-09 11:48:19 UTC
All pods are up and running, and the SSP logs are clean. All logs are attached in the previous comment.

Comment 17 errata-xmlrpc 2020-01-30 16:27:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:0307

