Bug 1651899

Summary: Readiness probe failed for grafana pod
Product: OpenShift Container Platform
Component: Monitoring
Version: 4.1.0
Target Release: 4.1.0
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Keywords: Regression
Reporter: Junqi Zhao <juzhao>
Assignee: Frederic Branczyk <fbranczy>
QA Contact: Junqi Zhao <juzhao>
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2019-06-04 10:41:02 UTC
Attachments:
  Readiness probe failed for grafana pod (attachment 1507562)

Description Junqi Zhao 2018-11-21 07:03:03 UTC
Created attachment 1507562 [details]
Readiness probe failed for grafana pod

Description of problem:
This bug is cloned from https://jira.coreos.com/browse/MON-475; filing it again here so the QE team can track the monitoring issue in Bugzilla.

After deploying cluster monitoring with the Next-Gen installer, the readiness probe fails for the grafana pod.

Describing the pod shows the error "Readiness probe failed: Get http://10.129.0.13:3000/api/health: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x15\x03\x01\x00\x02\x02""

It seems the readiness probe should use HTTPS rather than HTTP: the bytes \x15\x03\x01 at the start of the "malformed" response are the beginning of a TLS alert record, i.e. the grafana-proxy sidecar serves TLS on port 3000 while the probe speaks plain HTTP.
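
For reference, a minimal sketch of what the corrected probe could look like on the proxy container (the exact container layout of the operator-generated grafana deployment is an assumption here; the point is only the scheme):

  # hypothetical fragment of the grafana deployment spec, not the actual manifest
  readinessProbe:
    httpGet:
      path: /api/health
      port: 3000
      scheme: HTTPS   # grafana-proxy terminates TLS on :3000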


#oc -n openshift-monitoring get all
NAME                                              READY     STATUS    RESTARTS   AGE
pod/cluster-monitoring-operator-8fbbc8d47-mzl8k   1/1       Running   0          3h
pod/grafana-56567d86b-g5crx                       1/2       Running   0          3h
pod/prometheus-operator-57ddb7f5bb-ql6bw          1/1       Running   0          3h

NAME                                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/cluster-monitoring-operator   ClusterIP   None             <none>        8080/TCP   3h
service/grafana                       ClusterIP   172.30.231.107   <none>        3000/TCP   3h
service/prometheus-operator           ClusterIP   None             <none>        8080/TCP   3h

NAME                                          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cluster-monitoring-operator   1         1         1            1           4h
deployment.apps/grafana                       1         1         1            0           3h
deployment.apps/prometheus-operator           1         1         1            1           3h

NAME                                                    DESIRED   CURRENT   READY     AGE
replicaset.apps/cluster-monitoring-operator-8fbbc8d47   1         1         1         3h
replicaset.apps/grafana-56567d86b                       1         1         0         3h
replicaset.apps/prometheus-operator-57ddb7f5bb          1         1         1         3h

NAME                               HOST/PORT                                                   PATH      SERVICES   PORT      TERMINATION   WILDCARD
route.route.openshift.io/grafana   grafana-openshift-monitoring.apps.1121-1n5.qe.rhcloud.com             grafana    https     reencrypt     None

#oc -n openshift-monitoring describe pod grafana-56567d86b-g5crx

**********snipped*********

Events:
  Type     Reason     Age                 From                                   Message
  ----     ------     ----                ----                                   -------
  Warning  Unhealthy  2m (x1141 over 3h)  kubelet, ip-172-18-4-140.ec2.internal  Readiness probe failed: Get http://10.129.0.13:3000/api/health: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x15\x03\x01\x00\x02\x02"

 

#oc -n openshift-monitoring logs grafana-56567d86b-g5crx -c grafana-proxy

2018/11/21 03:27:16 oauthproxy.go:238: Cookie settings: name:_oauth_proxy secure(https):true httponly:true expiry:168h0m0s domain:<default> refresh:disabled
2018/11/21 03:27:16 http.go:96: HTTPS: listening on [::]:3000
2018/11/21 03:27:17 server.go:2753: http: TLS handshake error from 10.129.0.1:42980: tls: first record does not look like a TLS handshake
2018/11/21 03:27:27 server.go:2753: http: TLS handshake error from 10.129.0.1:43018: tls: first record does not look like a TLS handshake
2018/11/21 03:27:37 server.go:2753: http: TLS handshake error from 10.129.0.1:43052: tls: first record does not look like a TLS handshake
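
The handshake errors line up with the probe speaking plain HTTP to a TLS listener. As a quick manual check from any pod with access to the pod network (pod IP taken from the events above; assuming curl is available):

  $ curl -ks https://10.129.0.13:3000/api/health   # completes the TLS handshake and gets an HTTP response
  $ curl -s  http://10.129.0.13:3000/api/health    # plain HTTP against the TLS listener fails / returns garbage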

 

BTW: alertmanager-main and prometheus-k8s are not created either.

Version-Release number of selected component (if applicable):
quay.io/openshift/origin-cluster-monitoring-operator:v4.0
grafana/grafana:5.2.4
openshift/oauth-proxy:v1.1.0
quay.io/coreos/configmap-reload:v0.0.1
quay.io/coreos/prometheus-operator:v0.25.0

How reproducible:
Always

Steps to Reproduce:
1. Deploy cluster monitoring with the Next-Gen installer

Actual results:
Readiness probe failed for grafana pod

Expected results:
Readiness probe should pass for the grafana pod

Additional info:

Comment 1 Junqi Zhao 2018-12-13 06:09:39 UTC
All containers of the grafana pod are up now.

$ oc -n openshift-monitoring get pod | grep grafana
grafana-58456d859d-hcmj2                       2/2       Running   0          48m

used images
docker.io/grafana/grafana:5.2.4
docker.io/openshift/oauth-proxy:v1.1.0

$ oc version
oc v4.0.0-alpha.0+9d2874f-759
kubernetes v1.11.0+9d2874f
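
If useful for verification, the probe configuration the pod actually runs with can be checked on the deployment (just a generic oc/grep sketch):

  $ oc -n openshift-monitoring get deployment grafana -o yaml | grep -B2 -A5 readinessProbe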

Comment 4 errata-xmlrpc 2019-06-04 10:41:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758