Bug 1508059
| Summary: | Prometheus and AlertManager volumes grow infinitely | | |
| --- | --- | --- | --- |
| Product: | OpenShift Container Platform | Reporter: | Scott Weiss <scweiss> |
| Component: | Hawkular | Assignee: | Paul Gier <pgier> |
| Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.6.1 | CC: | aos-bugs, bazulay, ccoleman, eparis, kgeorgie, pgier, pweil |
| Target Milestone: | --- | | |
| Target Release: | 3.7.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-11-28 22:20:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description Scott Weiss 2017-10-31 18:56:28 UTC
@pgier: is this something you should be looking into, or do we need to get someone from the OpenShift side to take this over?

I saw a similar phenomenon, but this time on the Prometheus PV on a 3.7 cluster.

[root@vm-49-57 exports]# oc version
oc v3.7.0-0.178.0
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://baz-ocp-3.7-master01.10.35.49.57.nip.io:8443
openshift v3.7.0-0.178.0
kubernetes v1.7.6+a08f5eeb62

Images in use by the prometheus pod:
image: openshift/oauth-proxy:v1.0.0
image: openshift/prometheus:v2.0.0-dev.3
image: openshift/oauth-proxy:v1.0.0
image: openshift/prometheus-alert-buffer:v0.0.2
image: openshift/prometheus-alertmanager:v0.9.1

I have noticed that the Prometheus PV grew to 29G after the cluster had been up for only 2 days.

pgier - this is currently on the 3.7 blocker list. The growth rate here looks pretty severe. Please take a look, and if this isn't something we need to block the release on, please update the target release to 3.8.

I started investigating Scott's issue with the alertmanager, but I'm not sure yet why the disk usage is growing so much. Tried upgrading alertmanager to 0.9.1 as suggested in the upstream issue (https://github.com/prometheus/alertmanager/issues/1074), but there didn't seem to be any improvement.

This might be useful to get insight on metric counts, storage issues, etc.: https://github.com/kausalco/public/tree/master/promvt

Tested; the Prometheus and AlertManager volumes do not grow infinitely now.

# openshift version
openshift v3.7.0-0.198.0
kubernetes v1.7.6+a08f5eeb62
etcd 3.2.8

images
prometheus-alert-buffer/images/v3.7.2-1
oauth-proxy/images/v3.7.2-1
prometheus-alertmanager/images/v3.7.2-1
prometheus/images/v3.7.2-1

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188
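For anyone hitting similar growth before moving to the fixed images, below is a minimal sketch of how to watch the two volumes and the numbers that drive Prometheus disk usage. The project name (openshift-metrics), pod name (prometheus-0), container names, mount paths, and metric names are assumptions about a default 3.7 Prometheus deployment with a Prometheus 2.0 TSDB, not something confirmed in this bug; adjust them to match your cluster.

# Sketch only: project, pod, container names, and mount paths are assumed
# defaults for an openshift_prometheus install; adjust to your deployment.
oc exec -n openshift-metrics prometheus-0 -c prometheus -- df -h /prometheus
oc exec -n openshift-metrics prometheus-0 -c alertmanager -- df -h /alertmanager

# Forward the Prometheus API locally, then check the head series count and
# sample ingest rate, which are the main drivers of TSDB disk growth.
oc port-forward -n openshift-metrics prometheus-0 9090:9090 &
curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=prometheus_tsdb_head_series'
curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=rate(prometheus_tsdb_head_samples_appended_total[5m])'

If growth needs to be bounded in the meantime, Prometheus 2.0's --storage.tsdb.retention flag (15d by default) caps how much history the TSDB keeps; whether the v2.0.0-dev.3 image in use here accepts the same flag name was not verified in this bug.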