1589629 – Install logging/metrics failed at TASK [Validate openshift_node_groups and openshift_node_group_name]

Keywords:
Status:	CLOSED DUPLICATE of bug 1569476
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	3.10.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	3.10.0
Assignee:	Michael Gugino
QA Contact:	Junqi Zhao
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-06-11 02:33 UTC by Junqi Zhao
Modified:	2018-06-12 18:11 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-06-12 18:11:34 UTC
Target Upstream Version:
Embargoed:

Attachments
inventory file (5.48 KB, text/plain) 2018-06-11 02:33 UTC, Junqi Zhao

Description Junqi Zhao 2018-06-11 02:33:49 UTC

Created attachment 1449842
inventory file

Description of problem:
Installing logging/metrics fails at TASK [Validate openshift_node_groups and openshift_node_group_name]:
TASK [Validate openshift_node_groups and openshift_node_group_name] ***********************************************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/playbooks/init/sanity_checks.yml:18
Sunday 10 June 2018  22:21:35 -0400 (0:00:01.829)       0:00:13.792 *********** 
fatal: [host-8-249-107.host.centralci.eng.rdu2.redhat.com]: FAILED! => {
    "msg": "last_checked_host: host-8-249-107.host.centralci.eng.rdu2.redhat.com, last_checked_var: openshift_node_group_name;openshift_node_group_name must be defined for all nodes"
}

Previously, we did not need to set openshift_node_group_name.

Version-Release number of the following components:
# rpm -qa | grep openshift-ansible
openshift-ansible-roles-3.10.0-0.65.0.git.45.1ea4d05.el7.noarch
openshift-ansible-docs-3.10.0-0.65.0.git.45.1ea4d05.el7.noarch
openshift-ansible-playbooks-3.10.0-0.65.0.git.45.1ea4d05.el7.noarch
openshift-ansible-3.10.0-0.65.0.git.45.1ea4d05.el7.noarch


How reproducible:
Always

Steps to Reproduce:
1. Deploy metrics 3.10 using the attached inventory file.

Actual results:
The installation fails at TASK [Validate openshift_node_groups and openshift_node_group_name].

Expected results:

Additional info:

Comment 1 Junqi Zhao 2018-06-11 02:35:16 UTC

This blocks metrics/logging installation.

Comment 2 Junqi Zhao 2018-06-11 02:36:36 UTC

Adding skip_sanity_checks=true works as a workaround.
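
A hedged sketch of how this workaround might look in the inventory. Only the skip_sanity_checks=true setting comes from this report; placing it under [OSEv3:vars] is an assumption based on the usual openshift-ansible inventory layout. It presumably bypasses the validation rather than supplying the missing variable, so it is a stopgap at best.

```ini
# Sketch only: the variable name and value are taken from comment 2.
# The [OSEv3:vars] section is assumed to be where cluster-wide
# variables are set in the attached inventory.
[OSEv3:vars]
skip_sanity_checks=true
```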

Comment 3 Michael Gugino 2018-06-11 14:58:23 UTC

We have refactored the installer a bit recently.  As a result, openshift_node_group_name must now be set for all nodes.
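
As an illustration of what this requirement means for an inventory, here is a minimal sketch of a [nodes] section. The hostnames are hypothetical. The group names assume the default node groups of the 3.10 installer (node-config-master, node-config-infra, node-config-compute); only node-config-infra is named elsewhere in this report.

```ini
# Sketch only: hostnames are placeholders, and the group names are
# assumed defaults. Each host in [nodes] carries its own
# openshift_node_group_name, which is what the failing check expects.
[nodes]
master1.example.com openshift_node_group_name='node-config-master'
infra1.example.com  openshift_node_group_name='node-config-infra'
node1.example.com   openshift_node_group_name='node-config-compute'
```

With a layout like this, no host in [nodes] is left without the variable, which is the condition named in the error message ("openshift_node_group_name must be defined for all nodes").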

Comment 4 liujia 2018-06-12 06:14:39 UTC

I hit this as well when upgrading with openshift-ansible-3.10.0-0.66.0.git.79.68197f9.el7.noarch.rpm.

Comment 5 Anping Li 2018-06-12 06:27:22 UTC

What is the purpose of this variable? Is openshift_node_group_name the name of one of the configmaps in the openshift-node project?

I guess it is used to schedule pods to the specified groups, but when I set openshift_node_group_name=node-config-infra, the pods weren't scheduled to those nodes. Is that correct?

Shall we update the other roles (openshift-logging, openshift-metrics, openshift_prometheus, openshift_web_console, openshift_hosted, upgrade, etc.) accordingly?

Comment 6 Vadim Rutkovsky 2018-06-12 10:12:25 UTC

(In reply to Anping Li from comment #5)
> What's purpose for this variable?  Is the openshift_node_group_name one of
> the configmap name in openshift-node project?

Yes. This configmap contains labels (and other config settings) that are applied to a node group. See https://bugzilla.redhat.com/show_bug.cgi?id=1571194

> 
> I guess it is used to schedule pods to the specified groups. But when I set
> openshift_node_group_name=node-config-infra. The pods weren't scheduled to
> those nodes.  Is that correct? 

Did the nodes get the required labels?

Comment 7 Anping Li 2018-06-12 10:30:21 UTC

Although there is a configmap named node-config-infra, no nodes are labelled with node-role.kubernetes.io/infra=true.

For prometheus, I found that the default node selector has been changed from region=infra to node-role.kubernetes.io/infra=true.

For logging and metrics, there isn't a default node selector; the logging and metrics pods can be scheduled to any node by default.

The node_selector is used to schedule pods. Do we really need openshift_node_group_name?

The OCP upgrade now asks for openshift_node_group_name as well. What should be provided to upgrade all nodes in one job?

Comment 8 Scott Dodson 2018-06-12 18:11:34 UTC

This is all related to the need to document node group configuration.

*** This bug has been marked as a duplicate of bug 1569476 ***