Bug 1554921
| Summary: | prometheus deployment fails in OCP3.7 on AWS platform with EBS storage | | |
| --- | --- | --- | --- |
| Product: | OpenShift Container Platform | Reporter: | mmariyan |
| Component: | Installer | Assignee: | Paul Gier <pgier> |
| Status: | CLOSED ERRATA | QA Contact: | Junqi Zhao <juzhao> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 3.9.0 | CC: | aos-bugs, aos-storage-staff, avagarwa, bchilds, decarr, jdesousa, jokerman, juzhao, mmariyan, mmccomas, pgier, wmeng |
| Target Milestone: | --- | | |
| Target Release: | 3.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Known Issue |
| Story Points: | --- | | |
| Clone Of: | | | |
| : | 1621887 (view as bug list) | Environment: | |
| Last Closed: | 2018-10-11 07:19:09 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1621887 | | |
Doc Text:

Cause: Installing prometheus in a multi-zone/multi-region cluster with dynamic storage provisioning can leave the prometheus pod unschedulable.

Consequence: The prometheus pod requires three persistent volumes (PVs): one for the prometheus server, one for the alertmanager, and one for the alert-buffer. In a multi-zone cluster with dynamic storage, one or more of these volumes may be allocated in a different zone than the others. Because each node can only access PVs in its own zone, no single node can attach all three volumes, so the prometheus pod cannot be scheduled.

Workaround (if any): Create a storage class that restricts volumes to a single zone using the "zone:" parameter, and assign it to the prometheus volumes with the ansible installer inventory variable "openshift_prometheus_<COMPONENT>_storage_class=<zone_restricted_storage_class>" (a sketch of such a storage class follows below).

Result: All three volumes are created in the same zone, and the prometheus pod is automatically scheduled to a node in that zone.
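A zone-restricted class as described in the workaround might look like the following minimal sketch for AWS EBS; the class name and zone are illustrative, not values taken from this bug, so substitute ones that match your cluster.

```yaml
# Sketch of a StorageClass that pins dynamically provisioned EBS volumes
# to a single availability zone (name and zone are illustrative).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: prometheus-single-zone
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  zone: us-east-1d
```

The inventory would then point the prometheus server, alertmanager, and alert-buffer volumes at this class through the `openshift_prometheus_<COMPONENT>_storage_class` variables mentioned above; the exact `<COMPONENT>` names depend on the openshift-ansible version in use.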
Description
mmariyan
2018-03-13 15:03:05 UTC
This might be a configuration issue. Could you provide controller logs so we can see what is going on?

It also affects 3.9: same steps, same result. Looks like this issue: https://github.com/kubernetes/kubernetes/issues/39178

Hello, I am facing the same issue when deploying prometheus on OCP3.7 on Google Storage as well. The PVs are created, but they are not assigned to nodes, and the pod stays stuck in "Pending" indefinitely. Cheers, /JM

jmselmi, this issue is not specific to AWS; any cluster spanning multiple AZs whose storage class provisions storage bound to a single AZ can run into it. You can work around it by manually creating all the persistent volume claims in the same AZ (a sketch of such a claim appears at the end of this page).

*** Bug 1579607 has been marked as a duplicate of this bug. ***

See also Bug 1565405.

The workaround works: specify the zone in the StorageClass:

    parameters:
      type: gp2
      zone: us-east-1d

prometheus images version: v3.11.0-0.25.0
openshift-ansible-3.11.0-0.25.0.git.0.7497e69.el7.noarch

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652
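For the manual workaround mentioned above (creating the claims yourself so they all land in one AZ), a claim pinned to a zone-restricted class might look like the sketch below; the claim name, namespace, and size are hypothetical values for illustration, not ones taken from this bug.

```yaml
# Hypothetical PVC for one of the three prometheus volumes, bound to the
# zone-restricted StorageClass so that all claims land in the same AZ.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus            # assumed claim name
  namespace: openshift-metrics  # assumed prometheus namespace
spec:
  storageClassName: prometheus-single-zone
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
```

Repeating the same pattern for the alertmanager and alert-buffer claims keeps all three volumes, and therefore the pod, in one zone.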