Bug 1584415
Summary: Prometheus can't write to etcd during scraping

Product: OpenShift Container Platform
Component: Monitoring
Version: 3.10.0
Target Release: 3.10.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Reporter: Dan Mace <dmace>
Assignee: Dan Mace <dmace>
QA Contact: Junqi Zhao <juzhao>
CC: aos-bugs, fbranczy, spasquie, xtian
Doc Type: No Doc Update
Type: Bug
Last Closed: 2018-12-20 21:36:37 UTC
Could you try using non-emptyDir-based storage? It might be that the default emptyDir size is too small, so Prometheus immediately fills it up and then this issue occurs: https://github.com/prometheus/prometheus/issues/3283

You can configure storage in the cluster-monitoring-operator configuration. (Now that I look at the docs, it seems we have not documented it, but it works the same way as the Alertmanager volumeClaimTemplate: https://github.com/openshift/cluster-monitoring-operator/blob/master/Documentation/user-guides/configuring-cluster-monitoring.md#prometheusk8sconfig)

---

Working on this here: https://github.com/openshift/openshift-ansible/pull/8591

---

Tested with openshift-ansible-3.10.0-0.63.0.git.0.961c60d.el7.noarch.

prometheus-k8s now has 4 containers:
- prometheus
- prometheus-config-reloader
- alerting-rule-files-configmap-reloader
- prometheus-proxy

and there are no errors like the following:

    write /prometheus/wal/000001: file already closed
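The comments above reference the volumeClaimTemplate mechanism but don't show a concrete configuration. As a rough sketch only (the ConfigMap name `cluster-monitoring-config`, the `prometheusK8s.volumeClaimTemplate` layout, and the storage class/size values are assumptions; check them against the linked user guide for your release), a persistent-storage configuration might look like:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config      # assumed name; verify against the docs
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      volumeClaimTemplate:
        spec:
          storageClassName: gp2        # placeholder: use a class available in your cluster
          resources:
            requests:
              storage: 40Gi            # placeholder size; must exceed expected WAL + TSDB usage
```

With a PersistentVolumeClaim backing /prometheus instead of the default emptyDir, the WAL can no longer be truncated by the volume filling up, which is the failure mode suspected in this bug.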
Created attachment 1445983 [details]
Prometheus pod

Description of problem:
In free-stg, all scraping seems to be broken, with the following errors from Prometheus:

    WAL log samples: log series: write /prometheus/wal/000001: file already closed

The deployment is identical to free-int, so there must be some disparity in the cluster configuration that's revealing/masking another problem (potentially volume related; the data directory in this case is using emptyDir backed by XFS).

Version-Release number of selected component (if applicable):
v3.10.0-0.54.0