Bug 1517994

Summary: Installing CNS with osm_default_node_selector fails
Product: OpenShift Container Platform
Component: Installer
Installer sub component: openshift-installer
Version: 3.6.0
Target Release: 3.6.z
Hardware: Unspecified
OS: Unspecified
Reporter: Gerald Nunn <gnunn>
Assignee: Abhinav Dahiya <adahiya>
QA Contact: Johnny Liu <jialiu>
CC: aos-bugs, aos-storage-staff, bmchugh, gnunn, jokerman, mmccomas, vrutkovs
Flags: gnunn: needinfo-
Severity: unspecified
Priority: unspecified
Status: CLOSED ERRATA
Type: Bug
Doc Type: No Doc Update
Last Closed: 2018-04-12 05:59:59 UTC
Bug Blocks: 1526613, 1541443
Attachments:
  Inventory file
  install log
  glusterfs events

Description Gerald Nunn 2017-11-27 21:04:20 UTC
Created attachment 1359603 [details]
Inventory file

Description of problem:

I am installing a single master with dedicated gluster storage and app nodes. I want to ensure that user pods are only ever scheduled on the app nodes and do not run on the master or the gluster nodes. To do so, I add the following to my inventory:

osm_default_node_selector="region=primary"

However, this causes the playbook to fail while waiting for the GlusterFS daemonset to start, as shown in the attached install log. If I run the playbook without osm_default_node_selector, everything runs as expected.
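
For reference, the relevant parts of the inventory look roughly like this (host names, device paths and the omitted groups are placeholders; the full file is attached):

[OSEv3:vars]
# cluster-wide default node selector for pods that do not set their own
osm_default_node_selector="region=primary"

[glusterfs]
# dedicated storage nodes and their raw block devices
gluster1.example.com glusterfs_devices='["/dev/sdb"]'
gluster2.example.com glusterfs_devices='["/dev/sdb"]'
gluster3.example.com glusterfs_devices='["/dev/sdb"]'

[nodes]
# app nodes carry region=primary; the gluster nodes carry no region label
app1.example.com openshift_node_labels="{'region': 'primary'}"
app2.example.com openshift_node_labels="{'region': 'primary'}"
gluster1.example.com
gluster2.example.com
gluster3.example.com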

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

Comment 1 Gerald Nunn 2017-11-27 21:04:54 UTC
Created attachment 1359604 [details]
install log

Comment 2 Gerald Nunn 2017-11-27 21:08:29 UTC
Created attachment 1359605 [details]
glusterfs events

Comment 4 Gerald Nunn 2017-11-28 17:26:02 UTC
I'm using the package: openshift-ansible-3.6.173.0.75-1.git.0.0a44128.el7.noarch

Comment 5 Vadim Rutkovsky 2018-01-17 14:05:05 UTC
It seems both the 'openshift_storage_glusterfs_nodeselector' ("glusterfs": "storage-host") and 'osm_default_node_selector' ("region": "primary") selectors are applied to this pod, so the scheduler has to find a node that satisfies both clauses.
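
Roughly, every glusterfs pod then has to land on a node carrying both labels, something like this (illustrative, not the exact daemonset spec):

nodeSelector:
  glusterfs: storage-host   # from openshift_storage_glusterfs_nodeselector
  region: primary           # from osm_default_node_selector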

Which labels are set for your nodes?

Comment 7 Gerald Nunn 2018-01-18 03:13:37 UTC
(In reply to Vadim Rutkovsky from comment #5)
> It seems both the 'openshift_storage_glusterfs_nodeselector' ("glusterfs":
> "storage-host") and 'osm_default_node_selector' ("region": "primary")
> selectors are applied to this pod, so the scheduler has to find a node that
> satisfies both clauses.
> 
> Which labels are set for your nodes?

As per the attached inventory file, the gluster nodes have no labels, whereas the other nodes are labelled with region=primary.

Comment 8 Vadim Rutkovsky 2018-01-18 09:59:12 UTC
Are there any pods running in the glusterfs namespace?

Could you try labelling one of your nodes with both 'region=primary' and 'glusterfs=storage-host' and check whether that gets one of the pods scheduled?
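
Something along these lines should do it ('gluster1.example.com' is a placeholder for one of your gluster hosts):

# give one node both labels so it satisfies the combined selector
oc label node gluster1.example.com region=primary glusterfs=storage-host --overwrite
# then check whether a glusterfs pod gets scheduled onto it
oc get pods -n glusterfs -o wide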

Comment 9 Scott Dodson 2018-01-19 19:26:46 UTC
Jose backported those fixes in https://github.com/openshift/openshift-ansible/pull/6493

The fix should be in openshift-ansible-3.6.173.0.90-1 or newer.

Comment 10 Wenkai Shi 2018-01-23 05:50:53 UTC
Verified with openshift-ansible-3.6.173.0.96-1.git.0.2954b4a.el7. Installation with the default node selector set now succeeds.

Comment 11 Jose A. Rivera 2018-02-14 15:15:42 UTC
*** Bug 1526422 has been marked as a duplicate of this bug. ***

Comment 14 errata-xmlrpc 2018-04-12 05:59:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1106