1547210 – 3.9.0-0.46.0 logging deploy fails when openshift_logging_es_nodeselector not specified

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	3.9.0
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	3.9.0
Assignee:	Vadim Rutkovsky
QA Contact:	Mike Fiedler
Docs Contact:
URL:
Whiteboard:
Duplicates (2):	1547375 1547972
Depends On:
Blocks:

Reported:	2018-02-20 18:15 UTC by Mike Fiedler
Modified:	2018-12-13 19:26 UTC
CC List:	9 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Cause: The default nodeselector for Elasticsearch was not set. Consequence: Installation with default settings and logging enabled failed. Fix: Correct handling of a default nodeselector was implemented. Result: Logging install completes without a nodeselector set.
Clone Of:
Environment:
Last Closed:	2018-12-13 19:26:51 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2018:3748	0	None	None	None	2018-12-13 19:26:58 UTC

Description Mike Fiedler 2018-02-20 18:15:32 UTC

Description of problem:

The latest logging installer requires openshift_logging_es_nodeselector, which was not required before. The default should be to allow ES to schedule normally. The value of this variable is also a JSON map, which would not be very friendly if this really is a required inventory variable.

TASK [openshift_logging_elasticsearch : Ensure that ElasticSearch has nodes to run on] *****************************************************
fatal: [ip-172-31-35-49]: FAILED! => {"msg": "The conditional check 'openshift_schedulable_node_labels | lib_utils_oo_has_no_matching_selector(openshift_logging_es_nodeselector)' failed. The error was: error while evaluating conditional (openshift_schedulable_node_labels | lib_utils_oo_has_no_matching_selector(openshift_logging_es_nodeselector)): {{ groups['oo_nodes_to_config'] | lib_utils_oo_get_node_labels(hostvars) }}: 'dict object' has no attribute 'oo_nodes_to_config'\n\nThe error appears to have been in '/usr/share/ansible/openshift-ansible/roles/openshift_logging_elasticsearch/tasks/main.yaml': line 2, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n---\n- name: Ensure that ElasticSearch has nodes to run on\n  ^ here\n"}



Version-Release number of selected component (if applicable): openshift-ansible and logging v3.9.0-0.46.0


How reproducible: Always


Steps to Reproduce:
1. With openshift-ansible 3.9.0-0.46.0, install logging with the inventory below.


Actual results:

Install fails with the error above.

Expected results:

Install succeeds as in past 3.9 builds (and past releases)

Additional info:

[OSEv3:children]
masters
etcd

[masters]
ip-172-31-35-49

[etcd]
ip-172-31-35-49

[OSEv3:vars]
deployment_type=openshift-enterprise

openshift_deployment_type=openshift-enterprise
openshift_release=v3.9
openshift_docker_additional_registries=registry.reg-aws.openshift.com


openshift_logging_install_logging=true
openshift_logging_master_url=https://ec2-54-201-71-102.us-west-2.compute.amazonaws.com:8443
openshift_logging_master_public_url=https://ec2-54-201-71-102.us-west-2.compute.amazonaws.com:8443
openshift_logging_kibana_hostname=kibana.apps.0220-y8n.qe.rhcloud.com
openshift_logging_namespace=logging
openshift_logging_image_prefix=registry.reg-aws.openshift.com:443/openshift3/
openshift_logging_image_version=v3.9.0-0.46.0
openshift_logging_es_cluster_size=3
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_pvc_size=50Gi
openshift_logging_es_pvc_storage_class_name=gp2
openshift_logging_fluentd_read_from_head=false
openshift_logging_use_mux=false

Comment 1 Jeff Cantrill 2018-02-20 20:42:54 UTC

@Vadim,

Who advised this change was required?  From the logging team perspective, it is unnecessary to define specific nodes upon which to run the logging pods.  Is this a requirement driven by someone else?

Comment 2 Scott Dodson 2018-02-20 20:59:57 UTC

Mike,

Do you have nodes defined or is that an incomplete inventory pasted in comment 0?

Comment 3 Mike Fiedler 2018-02-20 21:06:52 UTC

Apologies, incomplete inventory.   Full inventory:

[OSEv3:children]
masters
etcd

[masters]
ip-172-31-35-49

[etcd]
ip-172-31-35-49

[OSEv3:vars]
deployment_type=openshift-enterprise

openshift_deployment_type=openshift-enterprise
openshift_release=v3.9
openshift_docker_additional_registries=registry.reg-aws.openshift.com


openshift_logging_install_logging=true
openshift_logging_master_url=https://ec2-54-201-71-102.us-west-2.compute.amazonaws.com:8443
openshift_logging_master_public_url=https://ec2-54-201-71-102.us-west-2.compute.amazonaws.com:8443
openshift_logging_kibana_hostname=kibana.apps.0220-y8n.qe.rhcloud.com
openshift_logging_namespace=logging
openshift_logging_image_prefix=registry.reg-aws.openshift.com:443/openshift3/
openshift_logging_image_version=v3.9
openshift_logging_es_cluster_size=3
openshift_logging_es_pvc_dynamic=true
openshift_logging_es_pvc_size=50Gi
openshift_logging_es_pvc_storage_class_name=gp2
openshift_logging_fluentd_read_from_head=false
openshift_logging_use_mux=false

Comment 5 Mike Fiedler 2018-02-20 21:11:22 UTC

Sorry - disregard comment 3.  So I should have an actual [nodes] section now for logging install?   Trying that.

Comment 6 Mike Fiedler 2018-02-20 21:21:23 UTC

It works with a [nodes] section.
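
For reference, a minimal sketch of what the inventory change described in comment 6 might look like. This is an assumption about the shape of the workaround, not the reporter's actual inventory: the hostname is reused from comment 0, while the label key and value (region=infra) are hypothetical placeholders. openshift_node_labels is the standard 3.x inventory variable for declaring node labels; the nodeselector line is left commented out because the point of this bug is that it should not be required.

```ini
[OSEv3:children]
masters
etcd
nodes

[masters]
ip-172-31-35-49

[etcd]
ip-172-31-35-49

# Added group: gives the installer node hosts and labels to inspect.
# The label below is a hypothetical example, not from the report.
[nodes]
ip-172-31-35-49 openshift_node_labels="{'region': 'infra'}"

[OSEv3:vars]
# Optional. If set, the JSON map should match labels on at least one
# schedulable node; the value shown is a hypothetical example.
# openshift_logging_es_nodeselector={"region":"infra"}
```

The remaining [OSEv3:vars] entries from comment 0 would stay unchanged.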

Comment 7 Vadim Rutkovsky 2018-02-20 21:54:59 UTC

Sorry, I didn't expect this change to cause so many side effects. The idea behind it is to stop the component install before it hangs due to an incorrect nodeselector or unschedulable nodes. The code now reads the Ansible config to find out which labels the nodes have. This will change once https://github.com/openshift/openshift-ansible/pull/7172 is merged: openshift-ansible will then find out node labels dynamically, so there will be no need to add a '[nodes]' group and specify all node labels.

Comment 8 Mike Fiedler 2018-02-21 12:17:57 UTC

Re-opening and assigning to myself to verify once https://github.com/openshift/openshift-ansible/pull/7172 merges

Comment 9 Jeff Cantrill 2018-02-21 14:18:14 UTC

*** Bug 1547375 has been marked as a duplicate of this bug. ***

Comment 10 Jeff Cantrill 2018-02-22 15:12:00 UTC

*** Bug 1547972 has been marked as a duplicate of this bug. ***

Comment 11 Mike Fiedler 2018-02-22 19:13:10 UTC

I am assigning this back for the time being. This has had a major impact on the functional QE teams' automation and seems to be a breaking change after feature freeze. If the pull request in comment 7 fixes this, we can move it back to ON_QA. Is that targeted for 3.9.0?

Comment 12 Scott Dodson 2018-02-28 20:19:32 UTC

https://github.com/openshift/openshift-ansible/pull/7241 in openshift-ansible-3.9.1-1

Comment 13 Anping Li 2018-03-06 07:25:41 UTC

Logging can be deployed without openshift_logging_es_nodeselector using openshift3/ose-ansible/images/v3.9.2-1, so moving the bug to VERIFIED.

Comment 16 errata-xmlrpc 2018-12-13 19:26:51 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3748
