Bug 1524219

Summary: template service broker installer assumes all the nodes are in the same region
Product: OpenShift Container Platform
Reporter: raffaele spazzoli <rspazzol>
Component: Installer
Assignee: Jim Minter <jminter>
Status: CLOSED ERRATA
QA Contact: Weihua Meng <wmeng>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 3.7.1
CC: aos-bugs, cbucur, gpei, jkaur, jminter, jokerman, mmccomas, snalawad, wmeng
Target Milestone: ---
Target Release: 3.7.z
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
  Cause: node selectors incorrectly set on template service broker daemonset object
  Consequence: looping failed deployment of template service broker pods and excessive cpu usage on master and nodes
  Fix: set node selectors correctly on template service broker daemonset object
  Result: template service broker pods now deploy correctly
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-01-23 17:59:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description raffaele spazzoli 2017-12-10 21:27:27 UTC
The template service broker is installed as a daemonset.
It creates a project without checking whether there is a default node selector and then tries to schedule the pods to all the nodes.
This fails if the cluster is installed with different regions and some of the regions are not schedulable with the default node selector.
This always happens in standard installs because there are at least two regions: infra nodes and worker nodes. Sometimes masters are also in a separate region.
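
The mismatch described above can be illustrated with a small, simplified model of label matching. This is only a sketch, not the actual scheduler or daemonset controller code: the node names, the label values, and the default selector region=primary are all hypothetical and were chosen for illustration.

```python
def matches(selector, labels):
    """Return True when every key/value pair in selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())

# Hypothetical cluster; these names and labels are not taken from the report.
nodes = {
    "master-1": {"region": "master"},
    "infra-1": {"region": "infra"},
    "worker-1": {"region": "primary"},
    "worker-2": {"region": "primary"},
}

# Assumed cluster-wide default node selector, inherited by a project that
# does not set its own.
default_project_selector = {"region": "primary"}

# A daemonset with no node selector of its own targets every node.
daemonset_selector = {}

targeted = [name for name, labels in nodes.items()
            if matches(daemonset_selector, labels)]
# Pods placed on nodes that do not satisfy the project selector cannot run there.
rejected = [name for name in targeted
            if not matches(default_project_selector, nodes[name])]

print("targeted:", targeted)
print("rejected:", rejected)
```

In this model the daemonset targets all four nodes, but the master and infra nodes fall outside the assumed project default selector, so pods aimed at them can never run.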

Comment 2 Jim Minter 2017-12-18 19:22:49 UTC
QE, please can you ensure that after install, for the TSB daemonset, there is one pod per infra node, in state Running, and that no other pods exist for the TSB daemonset.
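
The check requested above could be expressed roughly as follows. This is a sketch under stated assumptions: the function name check_tsb_pods, the data layout, and the sample node and pod names are hypothetical, and in practice the node and pod data would have to be collected from the cluster first.

```python
from collections import Counter

def check_tsb_pods(nodes, pods, selector):
    """Return a list of problems; an empty list means the check passes."""
    # Nodes whose labels satisfy the selector should each run exactly one pod.
    wanted = {name for name, labels in nodes.items()
              if all(labels.get(k) == v for k, v in selector.items())}
    per_node = Counter(pod["node"] for pod in pods)
    problems = []
    for node in sorted(wanted):
        if per_node[node] != 1:
            problems.append(f"{node}: expected 1 pod, found {per_node[node]}")
    for pod in pods:
        if pod["node"] not in wanted:
            problems.append(f"{pod['name']}: runs on non-selected node {pod['node']}")
        elif pod["phase"] != "Running":
            problems.append(f"{pod['name']}: phase is {pod['phase']}, not Running")
    return problems

# Hypothetical sample data, for illustration only.
nodes = {
    "infra-1": {"region": "infra"},
    "infra-2": {"region": "infra"},
    "worker-1": {"region": "primary"},
}
pods = [
    {"name": "apiserver-a", "node": "infra-1", "phase": "Running"},
    {"name": "apiserver-b", "node": "worker-1", "phase": "Running"},
]
for problem in check_tsb_pods(nodes, pods, {"region": "infra"}):
    print(problem)
```

With this sample data the check reports a missing pod on infra-2 and a stray pod on worker-1.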

Comment 4 Jim Minter 2017-12-19 16:04:40 UTC
The fix is in the installer, not in OCP. To test, you need to install OCP 3.7 with a version of the ansible installer that includes the PR.

Comment 5 Weihua Meng 2017-12-20 02:08:40 UTC
Not fixed with the latest OCP 3.7 build:
openshift-ansible-3.7.15-1.git.0.b20e6be.el7.noarch.rpm

Could you please indicate which build to use to verify the bug?

Thanks.

Comment 6 Jim Minter 2017-12-20 16:15:27 UTC
Yes, looks like the fix is not in 3.7.15.  Presumably it will be in the following release.

Comment 8 Weihua Meng 2018-01-04 11:14:57 UTC
Does not work as expected.

openshift-ansible-3.7.18-1.git.0.a01e769.el7.noarch.rpm

The PR is included.

# cat main.yml 
---
# placeholder file?
template_service_broker_remove: False
template_service_broker_install: True
openshift_template_service_broker_namespaces: ['openshift']
template_service_broker_selector: { "region": "infra" }


[root@host-172-16-120-8 ~]# oc get ds -n openshift-template-service-broker
NAME        DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE-SELECTOR   AGE
apiserver   5         5         5         5            5           role=node       24m

Comment 9 Jim Minter 2018-01-04 16:05:31 UTC
Weihua, please could you provide more detail on how you're running ansible / access to your environment?

I'm pretty sure that role=node must be being overridden somewhere by your config, but I'm not clear where.

Comment 10 Weihua Meng 2018-01-05 06:30:23 UTC
Thanks, Jim. You are right. 

This bug is fixed.
openshift-ansible-3.7.18-1.git.0.a01e769.el7.noarch.rpm

[root@host-172-16-120-49 ~]# oc get ds -n openshift-template-service-broker
NAME        DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE-SELECTOR   AGE
apiserver   0         0         0         0            0           region=infra    36m

Comment 13 errata-xmlrpc 2018-01-23 17:59:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0113