1568221 – OCP 3.9 Prometheus fails to deploy when nodes are not configured w/ the traditional region=infra node labels.


Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	3.9.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	3.9.z
Assignee:	Scott Dodson
QA Contact:	Johnny Liu
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-04-17 01:38 UTC by Nick Poyant - npoyant@redhat.com
Modified:	2018-04-20 14:22 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-04-20 14:22:08 UTC
Target Upstream Version:
Embargoed:


Description Nick Poyant - npoyant@redhat.com 2018-04-17 01:38:42 UTC

Description of problem:

Prometheus appears to fail to deploy when the nodes are not labeled with the traditional region=infra node label.


Version-Release number of selected component (if applicable):

[root@bastion ~]# yum info openshift-ansible
Loaded plugins: amazon-id, rhui-lb, search-disabled-repos
Installed Packages
Name        : openshift-ansible
Arch        : noarch
Version     : 3.9.14
Release     : 1.git.3.c62bc34.el7
Size        : 56 k
Repo        : installed
From repo   : rhel-7-server-ose-3.9-rpms
Summary     : Openshift and Atomic Enterprise Ansible
URL         : https://github.com/openshift/openshift-ansible
License     : ASL 2.0
Description : Openshift and Atomic Enterprise Ansible
            : 
            : This repo contains Ansible code and playbooks
            : for Openshift and Atomic Enterprise.


How reproducible:

Reproduced with openshift_hosted_prometheus_deploy=true set in the inventory.


Steps to Reproduce:
1. # ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml
2. # ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/openshift-prometheus/config.yml


Actual results:


The Prometheus playbook fails at the node-selector assertion shown after the inventory below. The inventory labels the infra nodes env=infra instead of region=infra.



[root@bastion ~]# cat /etc/ansible/hosts 
[OSEv3:vars]

###########################################################################
### Ansible Vars
###########################################################################
timeout=60
ansible_become=yes
ansible_ssh_user=ec2-user

openshift_release=v3.9
openshift_deployment_type=openshift-enterprise
openshift_master_cluster_method=native
openshift_master_cluster_hostname=loadbalancer.bb9f.example.opentlc.com
openshift_master_cluster_public_hostname=loadbalancer.bb9f.example.opentlc.com
openshift_master_default_subdomain=apps.bb9f.example.opentlc.com

osm_default_node_selector='env=app'
openshift_hosted_infra_selector="env=infra"


openshift_install_examples=true

openshift_master_ca_certificate={'certfile': '/root/intermediate_ca.crt', 'keyfile': '/root/intermediate_ca.key'}

openshift_hosted_router_selector='env=infra'
openshift_hosted_router_replicas=2

openshift_hosted_registry_selector='env=infra'
openshift_hosted_registry_replicas=2

openshift_hosted_registry_storage_kind=nfs
openshift_hosted_registry_storage_access_modes=['ReadWriteMany']
openshift_hosted_registry_storage_nfs_directory=/srv/nfs
openshift_hosted_registry_storage_nfs_options='*(rw,root_squash)'
openshift_hosted_registry_storage_volume_name=registry
openshift_hosted_registry_storage_size=10Gi

openshift_logging_install_logging=true
openshift_logging_storage_kind=nfs
openshift_logging_storage_access_modes=['ReadWriteOnce']
openshift_logging_storage_nfs_directory=/srv/nfs
openshift_logging_storage_nfs_options='*(rw,root_squash)'
openshift_logging_storage_volume_name=logging
openshift_logging_storage_volume_size=5Gi
openshift_logging_storage_labels={'storage': 'logging'}

openshift_logging_es_cluster_size=1
openshift_logging_es_nodeselector={"env":"infra"}
openshift_logging_kibana_nodeselector={"env":"infra"}
openshift_logging_curator_nodeselector={"env":"infra"}

openshift_metrics_install_metrics=true
openshift_metrics_storage_kind=nfs
openshift_metrics_storage_access_modes=['ReadWriteOnce']
openshift_metrics_storage_nfs_directory=/srv/nfs
openshift_metrics_storage_nfs_options='*(rw,root_squash)'
openshift_metrics_storage_volume_name=metrics
openshift_metrics_storage_volume_size=5Gi
openshift_metrics_storage_labels={'storage': 'metrics'}


openshift_hosted_prometheus_deploy=true


[OSEv3:children]
lb
masters
etcd
nodes
nfs

[lb]
loadbalancer1.bb9f.internal host_zone=us-east-1b

[masters]
master2.bb9f.internal host_zone=us-east-1b
master3.bb9f.internal host_zone=us-east-1b
master1.bb9f.internal host_zone=us-east-1b

[etcd]
master2.bb9f.internal host_zone=us-east-1b
master3.bb9f.internal host_zone=us-east-1b
master1.bb9f.internal host_zone=us-east-1b

[nodes]
## These are the masters
master2.bb9f.internal openshift_hostname=master2.bb9f.internal  openshift_node_labels="{'logging':'true','openshift_schedulable':'False','cluster': 'bb9f', 'zone': 'us-east-1b'}"
master3.bb9f.internal openshift_hostname=master3.bb9f.internal  openshift_node_labels="{'logging':'true','openshift_schedulable':'False','cluster': 'bb9f', 'zone': 'us-east-1b'}"
master1.bb9f.internal openshift_hostname=master1.bb9f.internal  openshift_node_labels="{'logging':'true','openshift_schedulable':'False','cluster': 'bb9f', 'zone': 'us-east-1b'}"

## These are infranodes
infranode1.bb9f.internal openshift_hostname=infranode1.bb9f.internal  openshift_node_labels="{'logging':'true','cluster': 'bb9f', 'env':'infra', 'zone': 'us-east-1b'}"
infranode2.bb9f.internal openshift_hostname=infranode2.bb9f.internal  openshift_node_labels="{'logging':'true','cluster': 'bb9f', 'env':'infra', 'zone': 'us-east-1b'}"

## These are regular nodes
node3.bb9f.internal openshift_hostname=node3.bb9f.internal  openshift_node_labels="{'logging':'true','cluster': 'bb9f', 'env':'app', 'zone': 'us-east-1b'}"
node2.bb9f.internal openshift_hostname=node2.bb9f.internal  openshift_node_labels="{'logging':'true','cluster': 'bb9f', 'env':'app', 'zone': 'us-east-1b'}"
node1.bb9f.internal openshift_hostname=node1.bb9f.internal  openshift_node_labels="{'logging':'true','cluster': 'bb9f', 'env':'app', 'zone': 'us-east-1b'}"

[nfs]
support1.bb9f.internal openshift_hostname=support1.bb9f.internal



======================================================================



TASK [openshift_master : Ensure that Prometheus has nodes to run on] **********************************************************************************************************************************************
fatal: [master2.bb9f.internal]: FAILED! => {
    "assertion": false, 
    "changed": false, 
    "evaluated_to": false, 
    "msg": "No schedulable nodes found matching node selector for Prometheus - 'region=infra'"
}
	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/openshift-prometheus/config.retry

PLAY RECAP ********************************************************************************************************************************************************************************************************
infranode1.bb9f.internal   : ok=0    changed=0    unreachable=0    failed=0   
infranode2.bb9f.internal   : ok=0    changed=0    unreachable=0    failed=0   
loadbalancer1.bb9f.internal : ok=4    changed=0    unreachable=0    failed=0   
localhost                  : ok=13   changed=0    unreachable=0    failed=0   
master1.bb9f.internal      : ok=19   changed=0    unreachable=0    failed=0   
master2.bb9f.internal      : ok=45   changed=0    unreachable=0    failed=1   
master3.bb9f.internal      : ok=19   changed=0    unreachable=0    failed=0   
node1.bb9f.internal        : ok=0    changed=0    unreachable=0    failed=0   
node2.bb9f.internal        : ok=0    changed=0    unreachable=0    failed=0   
node3.bb9f.internal        : ok=0    changed=0    unreachable=0    failed=0   
support1.bb9f.internal     : ok=1    changed=0    unreachable=0    failed=0   


INSTALLER STATUS **************************************************************************************************************************************************************************************************
Initialization             : Complete (0:00:17)
Prometheus Install         : In Progress (0:00:04)
	This phase can be restarted by running: playbooks/openshift-prometheus/config.yml

==========================================
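
The failed assertion can be illustrated with a minimal sketch of label-selector matching. This is not the openshift-ansible implementation; the helper names parse_selector and nodes_matching are hypothetical, and the labels are copied from the [nodes] section of the inventory above (masters are left out for brevity; they carry neither a region nor an env label).

```python
# Hypothetical helpers for illustration only; not taken from openshift-ansible.

def parse_selector(selector):
    """Turn 'key=value,key2=value2' into a dict of required labels."""
    pairs = (item.split("=", 1) for item in selector.split(",") if item)
    return {key.strip(): value.strip() for key, value in pairs}


def nodes_matching(selector, nodes):
    """Return the sorted names of nodes whose labels contain every selector pair."""
    wanted = parse_selector(selector)
    return sorted(
        name
        for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in wanted.items())
    )


# Labels as given in openshift_node_labels for the infra and app nodes.
NODES = {
    "infranode1.bb9f.internal": {"logging": "true", "cluster": "bb9f", "env": "infra", "zone": "us-east-1b"},
    "infranode2.bb9f.internal": {"logging": "true", "cluster": "bb9f", "env": "infra", "zone": "us-east-1b"},
    "node1.bb9f.internal": {"logging": "true", "cluster": "bb9f", "env": "app", "zone": "us-east-1b"},
    "node2.bb9f.internal": {"logging": "true", "cluster": "bb9f", "env": "app", "zone": "us-east-1b"},
    "node3.bb9f.internal": {"logging": "true", "cluster": "bb9f", "env": "app", "zone": "us-east-1b"},
}

# The selector named in the error message matches nothing,
# because no node carries a 'region' label.
print(nodes_matching("region=infra", NODES))

# The selector used elsewhere in this inventory matches both infra nodes.
print(nodes_matching("env=infra", NODES))
```

With these labels the first call returns an empty list and the second returns the two infranode hosts, which is consistent with the "No schedulable nodes found" message. The openshift-ansible 3.9 example inventory appears to list an openshift_prometheus_node_selector variable for overriding the default; treat that name as an assumption to verify against the installed version, since this report does not confirm it.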





Expected results:


Additional info:



Comment 1 Nick Poyant - npoyant@redhat.com 2018-04-17 01:43:46 UTC

I was not able to find a Prometheus variable for specifying an alternate node selector.
