Bug 1827489 - secret grpc-tls is missing in a fresh cluster
Summary: secret grpc-tls is missing in a fresh cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Pawel Krupa
QA Contact: Junqi Zhao
URL:
Whiteboard:
Duplicates: 1827123
Depends On:
Blocks: 1829974
 
Reported: 2020-04-24 03:42 UTC by Junqi Zhao
Modified: 2020-07-13 17:31 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1829974
Environment:
Last Closed: 2020-07-13 17:30:56 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-monitoring-operator pull 766 | 0 | None | closed | Bug 1827489: pkg/tasks: do not remove GRPC secret as it is used by querier | 2020-12-31 03:25:03 UTC
Red Hat Product Errata | RHBA-2020:2409 | 0 | None | None | None | 2020-07-13 17:31:19 UTC

Description Junqi Zhao 2020-04-24 03:42:19 UTC
Description of problem:
The secret grpc-tls is missing in a fresh cluster (techPreviewUserWorkload is disabled), which causes the monitoring cluster operator to become DEGRADED.
Workaround: enable techPreviewUserWorkload; the secret grpc-tls is then created and monitoring recovers.
# oc get co/monitoring
NAME         VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
monitoring   4.5.0-0.nightly-2020-04-23-202137   False       True          True       10m

# oc get co/monitoring -oyaml
...
  - lastTransitionTime: "2020-04-24T02:52:07Z"
    message: 'Failed to rollout the stack. Error: running task Updating Prometheus-k8s
      failed: waiting for Prometheus GRPC secret failed: waiting for secret grpc-tls:
      secrets "grpc-tls" not found'
    reason: UpdatingPrometheusK8SFailed
    status: "True"
    type: Degraded
...

# oc -n openshift-monitoring get secret grpc-tls
Error from server (NotFound): secrets "grpc-tls" not found

# oc -n openshift-monitoring logs thanos-querier-7fbf9ff6c-6x7lv -c thanos-query
level=info ts=2020-04-24T03:10:47.106024552Z caller=main.go:152 msg="Tracing will be disabled"
level=info ts=2020-04-24T03:10:47.106143629Z caller=client.go:54 msg="enabling client to server TLS"
level=info ts=2020-04-24T03:10:47.106341469Z caller=options.go:76 msg="TLS client using provided certificate pool"
level=info ts=2020-04-24T03:10:47.10656252Z caller=options.go:104 msg="TLS client authentication enabled"
level=info ts=2020-04-24T03:10:47.107998766Z caller=options.go:23 protocol=gRPC msg="disabled TLS, key and cert must be set to enable"
level=info ts=2020-04-24T03:10:47.108459858Z caller=query.go:401 msg="starting query node"
level=info ts=2020-04-24T03:10:47.11219967Z caller=intrumentation.go:48 msg="changing probe status" status=ready
level=info ts=2020-04-24T03:10:47.112400222Z caller=grpc.go:106 service=gRPC/server component=query msg="listening for StoreAPI gRPC" address=127.0.0.1:10901
level=info ts=2020-04-24T03:10:47.112469774Z caller=intrumentation.go:60 msg="changing probe status" status=healthy
level=info ts=2020-04-24T03:10:47.112485375Z caller=http.go:56 service=http/server component=query msg="listening for requests and metrics" address=127.0.0.1:9090
level=warn ts=2020-04-24T03:10:52.119682521Z caller=storeset.go:440 component=storeset msg="update of store node failed" err="getting metadata: fetching store info from 10.129.2.15:10901: rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority\"" address=10.129.2.15:10901
level=warn ts=2020-04-24T03:10:52.119961596Z caller=storeset.go:440 component=storeset msg="update of store node failed" err="getting metadata: fetching store info from 10.131.0.18:10901: rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority\"" address=10.131.0.18:10901
level=warn ts=2020-04-24T03:10:57.1203892Z caller=storeset.go:440 component=storeset msg="update of store node failed" err="getting metadata: fetching store info from 10.129.2.15:10901: rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority\"" address=10.129.2.15:10901
level=warn ts=2020-04-24T03:10:57.120423833Z caller=storeset.go:440 component=storeset msg="update of store node failed" err="getting metadata: fetching store info from 10.131.0.18:10901: rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority\"" address=10.131.0.18:10901
level=warn ts=2020-04-24T03:11:02.12090855Z caller=storeset.go:440 component=storeset msg="update of store node failed" err="getting metadata: fetching store info from 10.131.0.18:10901: rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority\"" address=10.131.0.18:10901
level=warn ts=2020-04-24T03:11:02.120979767Z caller=storeset.go:440 component=storeset msg="update of store node failed" err="getting metadata: fetching store info from 10.129.2.15:10901: rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority\"" address=10.129.2.15:10901
level=warn ts=2020-04-24T03:11:03.707892697Z caller=proxy.go:291 err="No StoreAPIs matched for this query" stores=
level=warn ts=2020-04-24T03:11:03.87547059Z caller=proxy.go:291 err="No StoreAPIs matched for this query" stores=
level=warn ts=2020-04-24T03:11:03.875608288Z caller=proxy.go:291 err="No StoreAPIs matched for this query" stores=
level=warn ts=2020-04-24T03:11:07.121372416Z caller=storeset.go:440 component=storeset msg="update of store node failed" err="getting metadata: fetching store info from 10.131.0.18:10901: rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority\"" address=10.131.0.18:10901
level=warn ts=2020-04-24T03:11:07.121372534Z caller=storeset.go:440 component=storeset msg="update of store node failed" err="getting metadata: fetching store info from 10.129.2.15:10901: rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority\"" address=10.129.2.15:10901
level=warn ts=2020-04-24T03:11:12.121743177Z caller=storeset.go:440 component=storeset msg="update of store node failed" err="getting metadata: fetching store info from 10.131.0.18:10901: rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority\"" address=10.131.0.18:10901
level=warn ts=2020-04-24T03:11:12.121747416Z caller=storeset.go:440 component=storeset msg="update of store node failed" err="getting metadata: fetching store info from 10.129.2.15:10901: rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority\"" address=10.129.2.15:10901
level=warn ts=2020-04-24T03:11:17.122143971Z caller=storeset.go:440 component=storeset msg="update of store node failed" err="getting metadata: fetching store info from 10.131.0.18:10901: rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority\"" address=10.131.0.18:10901
level=warn ts=2020-04-24T03:11:17.122143216Z caller=storeset.go:440 component=storeset msg="update of store node failed" err="getting metadata: fetching store info from 10.129.2.15:10901: rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority\"" address=10.129.2.15:10901
level=warn ts=2020-04-24T03:11:22.122539473Z caller=storeset.go:440 component=storeset msg="update of store node failed" err="getting metadata: fetching store info from 10.131.0.18:10901: rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority\"" address=10.131.0.18:10901
level=warn ts=2020-04-24T03:11:22.122595043Z caller=storeset.go:440 component=storeset msg="update of store node failed" err="getting metadata: fetching store info from 10.129.2.15:10901: rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority\"" address=10.129.2.15:10901
level=warn ts=2020-04-24T03:11:27.12300226Z caller=storeset.go:440 component=storeset msg="update of store node failed" err="getting metadata: fetching store info from 10.129.2.15:10901: rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority\"" address=10.129.2.15:10901
level=warn ts=2020-04-24T03:11:27.122992631Z caller=storeset.go:440 component=storeset msg="update of store node failed" err="getting metadata: fetching store info from 10.131.0.18:10901: rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority\"" address=10.131.0.18:10901
level=warn ts=2020-04-24T03:11:32.123431404Z caller=storeset.go:440 component=storeset msg="update of store node failed" err="getting metadata: fetching store info from 10.131.0.18:10901: rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority\"" address=10.131.0.18:10901
level=warn ts=2020-04-24T03:11:32.123446129Z caller=storeset.go:440 component=storeset msg="update of store node failed" err="getting metadata: fetching store info from 10.129.2.15:10901: rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority\"" address=10.129.2.15:10901
level=warn ts=2020-04-24T03:11:37.123824208Z caller=storeset.go:440 component=storeset msg="update of store node failed" err="getting metadata: fetching store info from 10.131.0.18:10901: rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority\"" address=10.131.0.18:10901
level=warn ts=2020-04-24T03:11:37.123820601Z caller=storeset.go:440 component=storeset msg="update of store node failed" err="getting metadata: fetching store info from 10.129.2.15:10901: rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority\"" address=10.129.2.15:10901
level=warn ts=2020-04-24T03:11:42.124246994Z caller=storeset.go:440 component=storeset msg="update of store node failed" err="getting metadata: fetching store info from 10.129.2.15:10901: rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: authentication handshake failed: x509: certificate signed by unknown authority\"" address=10.129.2.15:10901
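
The thanos-query log above repeats the same x509 handshake failure for each store address every few seconds. A minimal sketch for condensing such a log into one line per affected address follows; the function name summarize_handshake_failures is hypothetical, and it assumes the log format shown above, where each warning ends with an address= field.

```shell
# Hypothetical helper (not part of oc or Thanos): read thanos-query log
# lines on stdin and print "<store address> <failure count>" for every
# address that hit the x509 "unknown authority" handshake error.
summarize_handshake_failures() {
    grep 'x509: certificate signed by unknown authority' \
        | sed -n 's/.* address=\([^ ]*\)$/\1/p' \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}
```

It could be fed directly from the logs command used above, for example:
oc -n openshift-monitoring logs thanos-querier-7fbf9ff6c-6x7lv -c thanos-query | summarize_handshake_failures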

If techPreviewUserWorkload is enabled, the secret is created and monitoring recovers:
# oc -n openshift-monitoring get secret grpc-tls
NAME       TYPE     DATA   AGE
grpc-tls   Opaque   6      11m

# oc get co/monitoring
NAME         VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
monitoring   4.5.0-0.nightly-2020-04-23-202137   True        False         False      16m
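
The report does not show how techPreviewUserWorkload was enabled. A minimal sketch of one way to apply the workaround follows. It assumes the setting lives in the cluster-monitoring-config ConfigMap in the openshift-monitoring namespace, under the config.yaml key; neither name is taken from this bug. The function name render_workaround_configmap is hypothetical.

```shell
# Hypothetical helper: print a ConfigMap manifest that enables
# techPreviewUserWorkload. The ConfigMap name and the config.yaml layout
# are assumptions, not taken from this bug report.
render_workaround_configmap() {
    cat <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    techPreviewUserWorkload:
      enabled: true
EOF
}
```

The manifest could then be applied with: render_workaround_configmap | oc apply -f -
Applying it this way replaces any existing config.yaml in that ConfigMap, so on a cluster that already has monitoring configuration the setting would need to be merged by hand instead.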


Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-04-23-202137

How reproducible:
Always

Steps to Reproduce:
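1. Start with a fresh cluster (techPreviewUserWorkload is disabled).
2. Run: oc get co/monitoring
3. Run: oc -n openshift-monitoring get secret grpc-tls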

Actual results:
The monitoring cluster operator is DEGRADED and the secret grpc-tls is not found.

Expected results:
The secret grpc-tls exists and the monitoring cluster operator is not degraded.

Additional info:

Comment 1 Pawel Krupa 2020-04-24 07:27:03 UTC
*** Bug 1827123 has been marked as a duplicate of this bug. ***

Comment 5 Junqi Zhao 2020-04-26 01:42:54 UTC
The issue is fixed in 4.5.0-0.nightly-2020-04-25-170442.
Steps:
In a fresh cluster (techPreviewUserWorkload is disabled):
# oc -n openshift-user-workload-monitoring get pod
No resources found in openshift-user-workload-monitoring namespace.

# oc -n openshift-monitoring get secret grpc-tls
NAME       TYPE     DATA   AGE
grpc-tls   Opaque   6      69m
# oc get co/monitoring
NAME         VERSION                             AVAILABLE   PROGRESSING   DEGRADED   SINCE
monitoring   4.5.0-0.nightly-2020-04-25-170442   True        False         False      65m
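
The verification above reads the AVAILABLE, PROGRESSING and DEGRADED columns by eye. A minimal sketch of the same check as a script follows; the function name co_is_healthy is hypothetical, and it assumes the column order of the oc get co output shown above.

```shell
# Hypothetical helper: read `oc get co/<name>` table output on stdin and
# succeed only if every operator row reports AVAILABLE=True,
# PROGRESSING=False and DEGRADED=False. Assumes the column order
# NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE shown above.
co_is_healthy() {
    awk '
        $1 == "NAME" { next }
        NF >= 5 {
            seen = 1
            if ($3 != "True" || $4 != "False" || $5 != "False") bad = 1
        }
        END {
            if (seen && !bad) exit 0
            exit 1
        }
    '
}
```

For example: oc get co/monitoring | co_is_healthy && echo "monitoring is healthy"
Empty input is treated as unhealthy, so a failed oc call does not pass the check by accident.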

Comment 6 errata-xmlrpc 2020-07-13 17:30:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

