Bug 1538616
| Summary: | REGRESSION: Template Service Broker does no longer get installed on 3.7.23 | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Wolfgang Kulhanek <wkulhane> |
| Component: | Installer | Assignee: | Vadim Rutkovsky <vrutkovs> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | sheng.lao <shlao> |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.7.0 | CC: | aos-bugs, chezhang, dcaldwel, dmoessne, erjones, gucore, jcrumple, jiazha, jokerman, mmccomas, openshift-bugs-escalate, shlao, smulholland, vlaad, wkulhane, wmeng, xtian, zhsun, zitang |
| Target Milestone: | --- | Keywords: | Regression |
| Target Release: | 3.7.z | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1599905 (view as bug list) | Environment: | |
| Last Closed: | 2018-08-27 20:43:53 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1601378, 1603611, 1603612 | | |
| Bug Blocks: | 1599905 | | |
| Attachments: | | | |
|
Description
Wolfgang Kulhanek
2018-01-25 12:51:52 UTC
Wolfgang,

There was an error corrected where the TSB was previously deployed to all nodes because it wasn't setting a node selector. The behavior now is to deploy to your infrastructure nodes, which by default are those labeled with `region=infra`. Do you have any nodes labeled in that manner? This is in line with how the router and registry are deployed and will inherit the node selector you've set via the variable `openshift_hosted_infra_selector`.

Scott,

Ah - so that was a bug! I always clear the default node selector (env=app) (well, really rather set it to empty to prevent it from picking up the default) from the TSB project so that it would deploy on all nodes.... So it's only supposed to deploy on Infranodes. OK. Good to know.

We don't use "region=infra" but "env=infra". Is "region=infra" the expected convention these days? I seem to remember that the concept of an "Infranode" was always more of a convention than anything that was officially documented.

I had not seen the variable openshift_hosted_infra_selector before. Is this a new catch-all for router/registry/logging components/metrics components/TSB/etc?

I just ran the playbook again with

openshift_hosted_infra_selector='env=infra'

and it failed as well. The apiserver DaemonSet still had 'region=infra' in it. So something is still off.

One of my colleagues meanwhile figured out that template_service_broker_selector={"env":"infra"} seems to work...

I do see in the errata on docs.openshift.com now that this TSB fix is mentioned. But it doesn't mention how to set it up. So I think even if I had seen that (errata weren't live two days ago when 3.7.23 shipped) I would have completely missed it.

Running again with openshift_hosted_infra_selector={"env": "infra"} to see if that makes a difference...

Nope. That didn't do it either. So openshift_hosted_infra_selector is not the answer but it appears that template_service_broker_selector={"env":"infra"} is working.

Yeah, looks like that's correct.
I was on master branch when I looked that code up. In the context of this bug we'll fix the defaulting to work as I suggested and we'll make sure that we document both `openshift_hosted_infra_selector` and `template_service_broker_selector`.

Summarizing: In OCP 3.7 GA the TSB incorrectly deployed to all nodes. In 3.7.23 the code was updated to deploy to nodes that match the undocumented variable 'template_service_broker_selector', which defaults to '{"region":"infra"}'.

A workaround is to set template_service_broker_selector to a label which matches your infra nodes, i.e.:

template_service_broker_selector={"env":"infra"}

Created https://github.com/openshift/openshift-ansible/pull/8896 to document nodeselectors for hosted services and TSB in particular

1. Documentation for the two parameters, template_service_broker_selector and openshift_hosted_infra_selector, is not present and cannot be found by searching these websites:
   1) https://docs.openshift.org/3.7
   2) https://docs.openshift.com/container-platform/3.7

2. Verify the 'openshift_hosted_infra_selector' option
   1) Versions of playbooks:
      a) openshift-ansible-playbooks.noarch 3.7.56-1.git.31.91ec9c5.el7
      b) openshift-ansible-playbooks-3.7.23-1.git.0.bc406aa.el7.noarch.rpm
   2) Values of the configurable options in inventory:
      [OSEv3:vars]
      openshift_hosted_infra_selector="env=infra"
      [nodes]
      qe-shlao-yyyyyy.com openshift_node_labels="{'role': 'node', 'env' : 'infra'}"
   3) Result: Failed, the output messages:

TASK [template_service_broker : Verify that TSB is running] **************************************************************************************
FAILED - RETRYING: Verify that TSB is running (120 retries left).
... ...
FAILED - RETRYING: Verify that TSB is running (1 retries left).
fatal: [qe-shlao-yyyyyy.com]: FAILED! => {"attempts": 120, "changed": false, "cmd": ["curl", "-k", "https://apiserver.openshift-template-service-broker.svc/healthz"], "delta": "0:00:01.033899", "end": "2018-06-29 07:11:33.032692", "failed": true, "msg": "non-zero return code", "rc": 7, "start": "2018-06-29 07:11:31.998793", "stderr": " % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r 0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0curl: (7) Failed connect to apiserver.openshift-template-service-broker.svc:443; Connection refused", "stderr_lines": [" % Total % Received % Xferd Average Speed Time Time Time Current", " Dload Upload Total Spent Left Speed", "", " 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0", " 0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0curl: (7) Failed connect to apiserver.openshift-template-service-broker.svc:443; Connection refused"], "stdout": "", "stdout_lines": []}
        to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/config.retry

(In reply to sheng.lao from comment #13)
> 1. Documents about the two parameters : template_service_broker_selector and
> openshift_hosted_infra_selector, not present and can't search on the
> websites:
> 1) https://docs.openshift.org/3.7
> 2) https://docs.openshift.com/container-platform/3.7

Documentation needs to be updated in a separate bug.

> 2. Verify the 'openshift_hosted_infra_selector' option
> 3)result: Failed, the output messages:
>
> TASK [template_service_broker : Verify that TSB is running]
> *****************************************************************************
> *********
> FAILED - RETRYING: Verify that TSB is running (120 retries left).
> ... ...
> FAILED - RETRYING: Verify that TSB is running (1 retries left).
> fatal: [qe-shlao-yyyyyy.com]: FAILED! => {"attempts": 120, "changed": false,
> "cmd": ["curl", "-k",
> "https://apiserver.openshift-template-service-broker.svc/healthz"], "delta":
> "0:00:01.033899", "end": "2018-06-29 07:11:33.032692", "failed": true,
> "msg": "non-zero return code", "rc": 7, "start": "2018-06-29
> 07:11:31.998793", "stderr": " % Total % Received % Xferd Average Speed
> Time Time Time Current\n Dload
> Upload Total Spent Left Speed\n\r 0 0 0 0 0 0
> 0 0 --:--:-- --:--:-- --:--:-- 0\r 0 0 0 0 0 0
> 0 0 --:--:-- 0:00:01 --:--:-- 0curl: (7) Failed connect to
> apiserver.openshift-template-service-broker.svc:443; Connection refused",
> "stderr_lines": [" % Total % Received % Xferd Average Speed Time
> Time Time Current", " Dload Upload
> Total Spent Left Speed", "", " 0 0 0 0 0 0 0
> 0 --:--:-- --:--:-- --:--:-- 0", " 0 0 0 0 0 0 0
> 0 --:--:-- 0:00:01 --:--:-- 0curl: (7) Failed connect to
> apiserver.openshift-template-service-broker.svc:443; Connection refused"],
> "stdout": "", "stdout_lines": []}
> to retry, use: --limit
> @/usr/share/ansible/openshift-ansible/playbooks/byo/config.retry

ASB may not be up due to various reasons - e.g. etcd cannot mount PV as PVC didn't mount and so on. Did the pods get the correct nodeselector?

Vadim,

Based on comment 8, we need two PRs to fix the bug: one for the 3.7 documentation, another for openshift-ansible (template_service_broker_selector defaults to openshift_hosted_infra_selector).

https://github.com/openshift/openshift-ansible/pull/8896 didn't help at all because it lives in upstream openshift-ansible documentation only.

We can't find any related PRs for the bug yet. It would be extremely helpful if you could post the PRs here.

Thank you
Gan Huang

Right, default nodeselectors are not set to infra, however I assumed this was already implemented.
Created PR https://github.com/openshift/openshift-ansible/pull/9106 to fix this

(In reply to Gan Huang from comment #27)
> Vadim,
>
> Based on comment 8, we need two PRs to fix the bug, one is for 3.7
> documentation,

Created https://bugzilla.redhat.com/show_bug.cgi?id=1599905 to track this.

> another is for openshift-ansible (template_service_broker_selector defaults to
> openshift_hosted_infra_selector).
>
> https://github.com/openshift/openshift-ansible/pull/8896 didn't help at all
> because it lives in upstream openshift-ansible documentation only.

Fix is available in openshift-ansible-3.7.58-1

This would only update the default TSB nodeselector, so if the issue is still reproducible please attach the inventory and playbook logs (or just the link to the jenkins job)

The TSB bug is fixed in openshift-ansible-3.7.58, and I will change the status after the errata has dropped the item:
REGRESSION: Template Service Broker does no longer get installed on 3.7.23 Regression
(the released version openshift-ansible-3.7.57-1.git.33.cf01e48.el7 does not fix the bug)

Besides, I checked ASB and found that it does not seem to use the variable openshift_hosted_infra_selector.

> Besides, I check ASB and find : it seems that ASB not use the variable openshift_hosted_infra_selector.
Correct, in 3.7 ASB would run on the first master and apply the label to it. It's consistent with 3.9 and 3.10, where ASB runs on masters, so the infra selector is not used.
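
For readers following the selector discussion in this thread, the matching rule behind the failure can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from openshift-ansible: the helper names `parse_selector` and `selector_matches` are hypothetical, and the label values are the ones reported in the comments above.

```python
def parse_selector(selector):
    """Accept either 'env=infra' style strings or {"env": "infra"} style dicts.

    Hypothetical helper for illustration; not part of openshift-ansible.
    """
    if isinstance(selector, dict):
        return dict(selector)
    result = {}
    for item in selector.split(","):
        if "=" in item:
            key, value = item.split("=", 1)
            result[key.strip()] = value.strip()
    return result


def selector_matches(selector, node_labels):
    """A node is eligible only if it carries every key/value pair in the selector."""
    wanted = parse_selector(selector)
    return all(node_labels.get(key) == value for key, value in wanted.items())


# Labels of the infra node from the inventory quoted in this report.
node_labels = {"role": "node", "env": "infra"}

# Default reported for 3.7.23 versus the workaround value from the comments.
print(selector_matches({"region": "infra"}, node_labels))
print(selector_matches({"env": "infra"}, node_labels))
print(selector_matches("env=infra", node_labels))
# An empty selector places no constraint, so every node qualifies.
print(selector_matches({}, node_labels))
```

With nodes labeled only `env=infra`, the `region=infra` default matches nothing, which is consistent with the health check reporting "Connection refused"; the empty selector case corresponds to the 3.7 GA behavior of deploying to all nodes.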
1. Version to verify: openshift-ansible-3.7.58-1.git.37.6db1e6f.el7.noarch.rpm

2. Excerpt of the inventory:
   [OSEv3:vars]
   openshift_hosted_infra_selector="env=infra"
   [nodes]
   XXXX openshift_node_labels="{... , 'env':'infra'}" openshift_schedulable=true

3. Result: Passed
   1) Installation succeeded
   2) oc get ds -n openshift-template-service-broker
      NAME        DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE-SELECTOR   AGE
      apiserver   1         1         1         1            1           env=infra       48m
|
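
The verification step above can be repeated without eyeballing the table. The following is a minimal sketch in Python that reads the text output of `oc get ds` and reports the NODE-SELECTOR column per DaemonSet. The function name `node_selectors` is hypothetical, and the parsing assumes whitespace-separated columns with no spaces inside any value, as in the output pasted in this comment.

```python
def node_selectors(table_text):
    """Map DaemonSet name to its NODE-SELECTOR column from `oc get ds` text output.

    Assumes whitespace-separated columns and no spaces inside values.
    """
    lines = [line for line in table_text.splitlines() if line.strip()]
    header = lines[0].split()
    name_col = header.index("NAME")
    selector_col = header.index("NODE-SELECTOR")
    result = {}
    for line in lines[1:]:
        cells = line.split()
        result[cells[name_col]] = cells[selector_col]
    return result


# Output copied from the verification comment in this report.
sample = """\
NAME        DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE-SELECTOR   AGE
apiserver   1         1         1         1            1           env=infra       48m
"""

selectors = node_selectors(sample)
print(selectors)
# The fix is in effect when the apiserver DaemonSet carries the inventory's selector.
print(selectors["apiserver"] == "env=infra")
```

Comparing the reported selector against the value of `openshift_hosted_infra_selector` in the inventory gives a quick pass/fail signal for the defaulting fix.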