Bug 1634680

Summary: Prometheus - apiserver with etcd metric and all values are 0 or NaN
Product: OpenShift Container Platform
Reporter: Vladislav Walek <vwalek>
Component: Monitoring
Assignee: Frederic Branczyk <fbranczy>
Status: CLOSED DUPLICATE
QA Contact: Junqi Zhao <juzhao>
Severity: medium
Priority: medium
Version: 3.9.0
CC: anavarro, aos-bugs, minden, oarribas
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Last Closed: 2018-10-04 09:20:48 UTC
Type: Bug

Description Vladislav Walek 2018-10-01 11:07:40 UTC
Description of problem:

When we look at the etcd metrics in Prometheus, every one of them has the value "0".
Prometheus appears to fetch the etcd metrics from the API server. If we search for those metrics at the API server endpoint (/metrics), they are all 0. If we search for the same metrics at the etcd endpoint, we see the correct values.

Version-Release number of selected component (if applicable):
OpenShift Container Platform 3.9
OpenShift Container Platform 3.10
Prometheus

How reproducible:
Create a fresh cluster (3.9.33, 3.9.41, or 3.10.34), then run the following commands on master-1, in the project where Prometheus is deployed:

export TOKEN=$(oc serviceaccounts get-token prometheus)

# incorrect values (API server)
curl -k -H "Authorization: Bearer $TOKEN" https://$(hostname):8443/metrics | grep -v "#" | grep "etcd_"

# correct values (on etcd peer)
curl -s --cert /etc/etcd/peer.crt --key /etc/etcd/peer.key --cacert /etc/etcd/ca.crt https://$(hostname -i):2379/metrics | grep -v "#" | grep "etcd_"



Actual results:
etcd_network_client_grpc_received_bytes_total 0
etcd_network_client_grpc_sent_bytes_total 0

Expected results:
etcd_network_client_grpc_received_bytes_total 5.929199954e+09
etcd_network_client_grpc_sent_bytes_total 8.0341137205e+10
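To list every affected series rather than spot-checking two of them, the output of the two curl commands can be saved to files and compared. The following is only a rough sketch: the function name compare_etcd_metrics and the dump file names are made up for illustration, and it assumes the dumps are in the plain Prometheus text format with no trailing timestamps, so that the last field on each line is the sample value.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc or etcd): print etcd_ series that are
# 0 or NaN in the API server dump but non-zero in the etcd dump.
# usage: compare_etcd_metrics APISERVER_DUMP ETCD_DUMP
compare_etcd_metrics() {
  awk '
    # Skip comment lines and anything that is not an etcd_ sample.
    /^#/ || !/^etcd_/ { next }
    {
      value = $NF                       # assumed: value is the last field
      series = $0
      sub(/ +[^ ]+$/, "", series)       # strip the value, keep name and labels
    }
    # First file: remember the value reported by the API server.
    FILENAME == ARGV[1] { api[series] = value; next }
    # Second file: report series that etcd counts but the API server does not.
    (series in api) && (api[series] == "NaN" || api[series] + 0 == 0) &&
      value != "NaN" && value + 0 != 0 {
      print series, "apiserver=" api[series], "etcd=" value
    }
  ' "$1" "$2"
}
```

Usage, assuming the curl output above was redirected to apiserver-metrics.txt and etcd-metrics.txt (both names are placeholders):

compare_etcd_metrics apiserver-metrics.txt etcd-metrics.txt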

Additional info:
Also, some of the metrics are not shown at all, whether they are queried directly from etcd or from the /metrics URL.

Comment 1 Vladislav Walek 2018-10-01 11:10:25 UTC
This is related to:
https://github.com/openshift/origin/issues/20194

Comment 2 minden 2018-10-04 09:20:48 UTC
Great catch Vladislav. Thanks for the detailed report.

https://bugzilla.redhat.com/show_bug.cgi?id=1631926 is related and will actually fix the issue. For more details you can take a look at https://github.com/coreos/prometheus-operator/pull/1959/.

I am closing here as a duplicate. Let me know if I am missing something.

*** This bug has been marked as a duplicate of bug 1631926 ***