1584415 – Prometheus can't write to etcd during scraping

Summary: Prometheus can't write to etcd during scraping

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Monitoring
Sub Component:
Version:	3.10.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	3.10.0
Assignee:	Dan Mace
QA Contact:	Junqi Zhao
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-05-30 20:33 UTC by Dan Mace
Modified:	2018-12-20 21:46 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:	undefined
Clone Of:
Environment:
Last Closed:	2018-12-20 21:36:37 UTC
Target Upstream Version:
Embargoed:

Attachments
Prometheus pod (7.64 KB, text/plain) 2018-05-30 20:33 UTC, Dan Mace

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	https://github.com/openshift/openshift-ansible/pull/8591	0	None	None	None	2020-09-17 09:23:09 UTC

Description Dan Mace 2018-05-30 20:33:06 UTC

Created attachment 1445983: Prometheus pod

Description of problem:

In free-stg, all scraping seems to be broken with the following errors from Prometheus:

  WAL log samples: log series: write /prometheus/wal/000001: file already closed

The deployment is identical to free-int, so there must be some disparity in the cluster configuration that's revealing/masking another problem (potentially volume related; the data directory in this case is an emptyDir volume backed by XFS).

Version-Release number of selected component (if applicable):

v3.10.0-0.54.0

Comment 1 Frederic Branczyk 2018-05-31 07:41:57 UTC

Could you try using storage that is not emptyDir-based? It might be that the default emptyDir size is too small, so Prometheus fills it up immediately and then this issue occurs: https://github.com/prometheus/prometheus/issues/3283
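
One way to check the "volume fills up" theory is to look at how full the filesystem behind the data directory is. The following is only a minimal sketch, not part of the original report: the helper names `volume_usage` and `is_nearly_full` and the 95% threshold are assumptions made for illustration, and it presumes a Python interpreter is available wherever the volume is mounted.

```python
import shutil


def volume_usage(path):
    """Return (used_fraction, free_bytes) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    used_fraction = usage.used / usage.total if usage.total else 0.0
    return used_fraction, usage.free


def is_nearly_full(path, threshold=0.95):
    """True when the filesystem holding `path` is at or above `threshold`.

    The 0.95 default is an arbitrary illustrative cut-off, not a value
    taken from Prometheus or OpenShift.
    """
    used_fraction, _ = volume_usage(path)
    return used_fraction >= threshold
```

Pointed at the data directory (the error message suggests it is mounted at /prometheus), `is_nearly_full("/prometheus")` returning True would be consistent with the full-volume explanation, though it would not prove it.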

You can configure storage in the cluster-monitoring-operator configuration. (Now that I look at the docs, it seems we have not documented it, but it works the same way as the Alertmanager volumeClaimTemplate: https://github.com/openshift/cluster-monitoring-operator/blob/master/Documentation/user-guides/configuring-cluster-monitoring.md#prometheusk8sconfig)

Comment 2 Dan Mace 2018-05-31 21:18:02 UTC

Working on this here: https://github.com/openshift/openshift-ansible/pull/8591

Comment 4 Junqi Zhao 2018-06-07 08:46:09 UTC

Tested with openshift-ansible-3.10.0-0.63.0.git.0.961c60d.el7.noarch
prometheus-k8s now has 4 containers: prometheus, prometheus-config-reloader, alerting-rule-files-configmap-reloader, prometheus-proxy

and there are no errors like the following:
write /prometheus/wal/000001: file already closed
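
A check like this could be scripted against captured container logs. This is a rough sketch under stated assumptions: the function name `find_wal_errors` is hypothetical, and the match strings are taken from the error message quoted in this bug, not from any documented Prometheus log format.

```python
WAL_PATH_MARKER = "/prometheus/wal/"      # path prefix seen in the reported error
WAL_ERROR_MARKER = "file already closed"  # error text seen in the reported error


def find_wal_errors(log_lines):
    """Return the log lines that look like the WAL write failure from this bug."""
    return [
        line.rstrip("\n")
        for line in log_lines
        if WAL_PATH_MARKER in line and WAL_ERROR_MARKER in line
    ]
```

An empty result for the logs of the prometheus container would match the verification described above.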
