Bug 1554921 - prometheus deployment fails in OCP3.7 on AWS platform with EBS storage
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.11.0
Assigned To: Paul Gier
QA Contact: Junqi Zhao
Duplicates: 1579607
Depends On:
Blocks: 1621887
Reported: 2018-03-13 11:03 EDT by mmariyan
Modified: 2018-10-11 03:20 EDT
CC: 12 users

See Also:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
Cause: Installing Prometheus in a multi-zone/multi-region cluster using dynamic storage provisioning can leave the Prometheus pod unschedulable. Consequence: The Prometheus pod requires three persistent volumes (PVs): one for the Prometheus server, one for the alertmanager, and one for the alert-buffer. In a multi-zone cluster with dynamic storage, one or more of these volumes may be allocated in a different zone than the others. Because each node in the cluster can only access PVs in its own zone, no single node can run the Prometheus pod and access all three PVs, so the pod becomes unschedulable. Workaround (if any): Create a storage class that restricts volumes to a single zone by using the "zone:" parameter, and assign this storage class to the Prometheus volumes with the Ansible installer inventory variable "openshift_prometheus_<COMPONENT>_storage_class=<zone_restricted_storage_class>". Result: All three volumes are created in the same zone/region, and the Prometheus pod is automatically scheduled to a node in that zone.
Story Points: ---
Clone Of:
Cloned to: 1621887
Environment:
Last Closed: 2018-10-11 03:19:09 EDT
Type: Bug




External Trackers:
Red Hat Product Errata RHBA-2018:2652 (Last Updated: 2018-10-11 03:20 EDT)

Description mmariyan 2018-03-13 11:03:05 EDT
Description of problem:
When deploying Prometheus on OCP 3.7 on the AWS platform with EBS storage, the prometheus playbook completes without any failure, but the prometheus pod stays in Pending state. The pod reports: "0/6 nodes are available: 3 CheckServiceAffinity, 3 MatchNodeSelector, 6 NoVolumeZoneConflict".


Version-Release number of selected component (if applicable):


How reproducible:
Prometheus deployment fails in OCP 3.7 on the AWS platform with EBS storage.

Steps to Reproduce:
1. Run the playbook (it always completes successfully):
# ansible-playbook -i <inventory-host> /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-prometheus.yml
2. Check the pod status.

Actual results:
Prometheus pods remain in Pending state.

Expected results:
Prometheus pods should run without any errors.

Additional info:
When tested with emptyDir for Prometheus storage, the Prometheus pod runs successfully.
Comment 1 Avesh Agarwal 2018-03-13 11:39:07 EDT
This might be a configuration issue. Could you provide controller logs to see what is going on?
Comment 5 Juan Luis de Sousa-Valadas 2018-04-20 09:32:18 EDT
It also affects 3.9, same steps, same result.

Looks like issue: https://github.com/kubernetes/kubernetes/issues/39178
Comment 14 jmselmi 2018-08-07 14:53:02 EDT
Hello, 

I am facing the same issue when deploying Prometheus on OCP 3.7 on Google Storage as well.
The PVs are created, but they are not assigned to nodes.

The pod stays stuck in "Pending" indefinitely.

Cheers,
/JM
Comment 17 Juan Luis de Sousa-Valadas 2018-08-09 06:58:45 EDT
jmselmi, this issue is not specific to AWS; any cluster spanning multiple AZs with a storage class that provisions storage bound to a single AZ is prone to this issue.

You can work around it by manually creating all the persistent volume claims in the same AZ.
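
For illustration only (not taken from this bug): one way to realize this is to pre-provision the backing PVs in a single AZ so that the three claims all bind there. A minimal sketch of one such PV follows; the name, size, zone labels, and EBS volume ID are placeholders, and the EBS volume is assumed to already exist in that AZ.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-data            # placeholder; one such PV per Prometheus volume
  labels:
    # zone/region labels checked by the NoVolumeZoneConflict scheduler predicate
    failure-domain.beta.kubernetes.io/zone: us-east-1d
    failure-domain.beta.kubernetes.io/region: us-east-1
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  awsElasticBlockStore:
    volumeID: aws://us-east-1d/vol-0123456789abcdef0   # placeholder EBS volume created in the same AZ
    fsType: ext4

With one PV like this per Prometheus volume (server, alertmanager, alert-buffer), all in the same AZ, the claims bind locally and the pod can be scheduled onto a node in that zone.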
Comment 24 Paul Gier 2018-08-23 12:56:49 EDT
*** Bug 1579607 has been marked as a duplicate of this bug. ***
Comment 25 Junqi Zhao 2018-08-26 20:59:58 EDT
See also Bug 1565405
Comment 26 Junqi Zhao 2018-08-29 07:10:10 EDT
workaround works

specify the zone in the StorageClass:
parameters:
  type: gp2
  zone: us-east-1d

Prometheus images version: v3.11.0-0.25.0

openshift-ansible-3.11.0-0.25.0.git.0.7497e69.el7.noarch
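
For completeness, a sketch of what the full zone-restricted StorageClass around those parameters might look like; the class name and zone below are examples, not taken from this bug:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp2-us-east-1d             # example name for the zone-restricted class
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  zone: us-east-1d                 # all dynamically provisioned volumes land in this AZ

The class is then assigned to the Prometheus volumes through the installer inventory, using the openshift_prometheus_<COMPONENT>_storage_class variables described in the Doc Text above, so the server, alertmanager, and alert-buffer volumes all end up in the same zone.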
Comment 28 errata-xmlrpc 2018-10-11 03:19:09 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652
