Bug 1483787
| Summary: | Unable to deploy service catalog as part of advanced installation |
|---|---|
| Product: | OpenShift Container Platform |
| Component: | Installer |
| Version: | 3.6.0 |
| Target Release: | 3.6.z |
| Hardware: | x86_64 |
| OS: | Linux |
| Status: | CLOSED ERRATA |
| Severity: | medium |
| Priority: | unspecified |
| Reporter: | Kent Hua <khua> |
| Assignee: | ewolinet |
| QA Contact: | Johnny Liu <jialiu> |
| CC: | aos-bugs, cbucur, gpei, jokerman, khua, mmccomas, rkharwar, sdodson, smunilla, wsun |
| Doc Type: | No Doc Update |
| Type: | Bug |
| Last Closed: | 2017-11-21 05:41:13 UTC |
QE cannot reproduce this issue; everything is working well. Could you run:

# oc describe po apiserver-ghv62
# oc describe po controller-manager-22mcf

Didn't we need to set a variable to specify the api server hosts?

@Johnny - I can't describe the pod because a pod never comes up.

@Scott - I didn't see any in the installation instructions. By default it seems to create the node label on the master node: openshift-infra=apiserver. So it does assign the pods to my master node. I have osm_default_node_selector="region=primary" set in my ansible hosts.

A fellow SA suggested the following:

oc edit namespace kube-service-catalog

and add the following annotation:

openshift.io/node-selector: ""

I haven't had a chance to try it yet. I will later today.

So I performed the suggestion to blank out the node-selector on the kube-service-catalog project. Does that mean something in the installer needs to be fixed?

[root@master ~]# oc get pods -o wide
NAME                       READY     STATUS    RESTARTS   AGE       IP           NODE
apiserver-3nfqd            1/1       Running   0          25m       10.131.0.3   master.example.com
controller-manager-9m2m6   1/1       Running   1          25m       10.131.0.4   master.example.com

[root@master ~]# oc get nodes --show-labels
NAME                 STATUS                     AGE       VERSION             LABELS
infra.example.com    Ready                      57m       v1.6.1+5115d708d7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=infra.example.com,logging-infra-fluentd=true,region=infra,zone=default
master.example.com   Ready,SchedulingDisabled   57m       v1.6.1+5115d708d7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=master.example.com,logging-infra-fluentd=true,openshift-infra=apiserver
node1.example.com    Ready                      57m       v1.6.1+5115d708d7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=node1.example.com,logging-infra-fluentd=true,region=primary,zone=east
node2.example.com    Ready                      57m       v1.6.1+5115d708d7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=node2.example.com,logging-infra-fluentd=true,region=primary,zone=west

Is there a default node-selector being set for projects that are created? By default we shouldn't be setting a selector on the service-catalog project... I'll try to see if I can recreate this using the same version of openshift-ansible. Could you also paste the full contents of your inventory file? I'm guessing you originally provided a truncated part of it.

Yes, I did set the default node selector: osm_default_node_selector='region=primary' in my hosts file. However, in the project the node_selector entry wasn't present, so I added openshift.io/node-selector: "" to force it to reference a specific node_selector.

Created attachment 1316808 [details]
ansible hosts file
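For reference, the oc edit workaround described above can also be applied non-interactively. A minimal sketch (the annotation name comes from the comments above; these are standard oc client commands, not part of the installer):

# oc annotate namespace kube-service-catalog openshift.io/node-selector="" --overwrite
# oc delete pods --all -n kube-service-catalog

The second command simply forces the daemonset pods to be recreated so they are rescheduled without the inherited default node selector.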
Removing the entry osm_default_node_selector="region=primary" solved the issue, but shouldn't the apiserver and controller-manager in kube-service-catalog be capable of handling that entry in the ansible hosts?

That's strange... I wonder if something has changed with OCP. In the role we are creating the project with a node selector of "", so that should have been able to overwrite the default node selector parameter.

https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_service_catalog/tasks/install.yml#L22

- name: Set Service Catalog namespace
  oc_project:
    state: present
    name: "kube-service-catalog"
    node_selector: ""

I see that line, but it doesn't appear to be setting it. So it'll work with installs that don't have a default node selector set.
[root@master ~]# oc export project kube-service-catalog
apiVersion: v1
kind: Project
metadata:
  annotations:
    openshift.io/description: ""
    openshift.io/display-name: ""
    openshift.io/sa.scc.mcs: s0:c8,c7
    openshift.io/sa.scc.supplemental-groups: 1000070000/10000
    openshift.io/sa.scc.uid-range: 1000070000/10000
  creationTimestamp: null
  name: kube-service-catalog
spec:
  finalizers:
  - openshift.io/origin
  - kubernetes
status:
  phase: Active
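For comparison, if the empty node selector from the installer task had taken effect, the project metadata above would be expected to carry the same annotation used in the manual workaround, roughly like this (a sketch, not output from this cluster):

metadata:
  annotations:
    openshift.io/node-selector: ""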
Could reproduce this bug with openshift v3.6.173.0.21 + openshift-ansible-3.6.173.0.21-2.git.0.44a4038.el7.noarch; this bug is very similar to BZ#1497047, with almost the same root cause.

Could also reproduce this bug with openshift v3.6.173.0.21 + openshift-ansible-3.6.173.0.49-1.git.0.7e8ae51.el7.noarch.
# oc get ns kube-service-catalog -o yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    openshift.io/description: ""
    openshift.io/display-name: ""
    openshift.io/sa.scc.mcs: s0:c8,c7
    openshift.io/sa.scc.supplemental-groups: 1000070000/10000
    openshift.io/sa.scc.uid-range: 1000070000/10000
  creationTimestamp: 2017-10-12T11:10:22Z
  name: kube-service-catalog
  resourceVersion: "2001"
  selfLink: /api/v1/namespaces/kube-service-catalog
  uid: f0af2f47-af3d-11e7-b1af-fa163eeaedc0
spec:
  finalizers:
  - openshift.io/origin
  - kubernetes
status:
  phase: Active
The PR is targeted to 3.6, while this bug is attached to the 3.7 advisory, so moving back to MODIFIED status. This bug is not reproduced in 3.7; even if it were reproduced there, it would be a duplicate of BZ#1497047. This bug should be specifically targeted to 3.6, so modifying the target release to 3.6.z.

Same comment as comment 17. @Scott, could you help remove this bug from the 3.7 errata and add it to the 3.6 openshift-ansible installer errata?

ok

Verified this bug with openshift-ansible-3.6.173.0.77-1.git.0.c63cec7.el7.noarch, and it passed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3255
Description of problem:
Trying to deploy the service catalog as part of OCP 3.6 installation.

Version-Release number of selected component (if applicable):
3.6

How reproducible:
Every time

Steps to Reproduce:
1. Run advanced install:
   $ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
2. Service catalog installation fails because the service catalog apiserver is not responding

Actual results:
apiserver and controller-manager pods do not start, going through Pending -> MatchNodeSelector -> Terminating

Expected results:
apiserver and controller-manager pods in project kube-service-catalog are up and running

Additional info:

[root@master ~]# cat /etc/ansible/hosts
...
# service catalog
openshift_enable_service_catalog=true

# ansible service broker storage
openshift_hosted_etcd_storage_kind=nfs
openshift_hosted_etcd_storage_nfs_directory=/var/export
openshift_hosted_etcd_storage_host=workstation.example.com
openshift_hosted_etcd_storage_volume_name=etcd
openshift_hosted_etcd_storage_access_modes=['ReadWriteOnce']
openshift_hosted_etcd_storage_volume_size=1Gi
openshift_hosted_etcd_storage_labels={'storage': 'etcd'}

openshift_template_service_broker_namespaces=['openshift']

# docker storage loopback check
openshift_disable_check=docker_storage

# host group for masters
[masters]
master.example.com

# host group for nodes, includes region info
[nodes]
master.example.com openshift_schedulable=false
infra.example.com openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
node1.example.com openshift_node_labels="{'region': 'primary', 'zone': 'east'}"
node2.example.com openshift_node_labels="{'region': 'primary', 'zone': 'west'}"

[root@master ~]# oc get nodes --show-labels
NAME                 STATUS                     AGE       VERSION             LABELS
infra.example.com    Ready                      43m       v1.6.1+5115d708d7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=infra.example.com,region=infra,zone=default
master.example.com   Ready,SchedulingDisabled   43m       v1.6.1+5115d708d7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=master.example.com,openshift-infra=apiserver
node1.example.com    Ready                      43m       v1.6.1+5115d708d7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=node1.example.com,region=primary,zone=east
node2.example.com    Ready                      43m       v1.6.1+5115d708d7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=node2.example.com,region=primary,zone=west

[root@master ~]# oc get pv
NAME          CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS      CLAIM     STORAGECLASS   REASON    AGE
etcd-volume   1Gi        RWO           Retain          Available                                      39m

[root@master ~]# oc get pods
NAME                       READY     STATUS              RESTARTS   AGE
apiserver-ghv62            0/1       MatchNodeSelector   0          0s
controller-manager-22mcf   0/1       MatchNodeSelector   0          0s

[root@master ~]# oc get pods -o wide
NAME                       READY     STATUS        RESTARTS   AGE       IP        NODE
apiserver-pz404            0/1       Terminating   0          0s        <none>    master.example.com
controller-manager-d03s7   0/1       Pending       0          0s        <none>    master.example.com

[root@master ~]# oc get ev
...
26m   26m   1      controller-manager   DaemonSet   Normal    SuccessfulCreate   daemon-set             Created pod: controller-manager-cfbr6
26m   26m   1      controller-manager   DaemonSet   Warning   FailedDaemonPod    daemonset-controller   Found failed daemon pod kube-service-catalog/master.example.com on node controller-manager-cfbr6, will try to kill it
26m   26m   1      controller-manager   DaemonSet   Normal    SuccessfulDelete   daemon-set             Deleted pod: controller-manager-cfbr6
26m   26m   4      controller-manager   DaemonSet   Normal    SuccessfulCreate   daemon-set             (combined from similar events): Created pod: controller-manager-tmq6f
1m    26m   9006   controller-manager   DaemonSet   Warning   FailedDaemonPod    daemonset-controller   (combined from similar events): Found failed daemon pod kube-service-catalog/master.example.com on node controller-manager-8cxb7, will try to kill it
1m    26m   9006   controller-manager   DaemonSet   Normal    SuccessfulDelete   daemon-set             (combined from similar events): Deleted pod: controller-manager-8cxb7

[root@master ~]# journalctl -f -u atomic-openshift-master
...
] Controller apiserver deleting pod kube-service-catalog/apiserver-5n1t5
Aug 16 13:53:31 master.example.com atomic-openshift-master[29069]: I0816 13:53:31.906323 29069 event.go:217] Event(v1.ObjectReference{Kind:"DaemonSet", Namespace:"kube-service-catalog", Name:"apiserver", UID:"bb750a4b-82a7-11e7-b3b6-2cc260000000", APIVersion:"extensions", ResourceVersion:"117167", FieldPath:""}): type: 'Warning' reason: 'FailedDaemonPod' Found failed daemon pod kube-service-catalog/master.example.com on node apiserver-5n1t5, will try to kill it
Aug 16 13:53:31 master.example.com atomic-openshift-master[29069]: E0816 13:53:31.932522 29069 daemoncontroller.go:233] kube-service-catalog/controller-manager failed with : deleted 1 failed pods of DaemonSet kube-service-catalog/controller-manager
Aug 16 13:53:31 master.example.com atomic-openshift-master[29069]: I0816 13:53:31.935503 29069 event.go:217] Event(v1.ObjectReference{Kind:"DaemonSet", Namespace:"kube-service-catalog", Name:"controller-manager", UID:"cb1e2bfa-82a7-11e7-b3b6-2cc260000000", APIVersion:"extensions", ResourceVersion:"117164", FieldPath:""}): type: 'Normal' reason: 'SuccessfulDelete' Deleted pod: controller-manager-rzf7d
Aug 16 13:53:32 master.example.com atomic-openshift-master[29069]: I0816 13:53:32.033591 29069 event.go:217] Event(v1.ObjectReference{Kind:"DaemonSet", Namespace:"kube-service-catalog", Name:"controller-manager", UID:"cb1e2bfa-82a7-11e7-b3b6-2cc260000000", APIVersion:"extensions", ResourceVersion:"117164", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: controller-manager-httc7
Aug 16 13:53:32 master.example.com atomic-openshift-master[29069]: E0816 13:53:32.039736 29069 daemoncontroller.go:233] kube-service-catalog/apiserver failed with : deleted 1 failed pods of DaemonSet kube-service-catalog/apiserver
Aug 16 13:53:32 master.example.com atomic-openshift-master[29069]: I0816 13:53:32.041701 29069 event.go:217] Event(v1.ObjectReference{Kind:"DaemonSet", Namespace:"kube-service-catalog", Name:"apiserver", UID:"bb750a4b-82a7-11e7-b3b6-2cc260000000", APIVersion:"extensions", ResourceVersion:"117167", FieldPath:""}): type: 'Normal' reason: 'SuccessfulDelete' Deleted pod: apiserver-5n1t5

[root@master ~]# journalctl -f -u atomic-openshift-node
...
Aug 16 13:54:18 master.example.com atomic-openshift-node[20758]: I0816 13:54:18.304726 20758 kubelet.go:1829] SyncLoop (DELETE, "api"): "apiserver-69w9h_kube-service-catalog(ecc976ec-82ab-11e7-b3b6-2cc260000000)"
Aug 16 13:54:18 master.example.com atomic-openshift-node[20758]: I0816 13:54:18.311128 20758 kubelet.go:1823] SyncLoop (REMOVE, "api"): "apiserver-69w9h_kube-service-catalog(ecc976ec-82ab-11e7-b3b6-2cc260000000)"
Aug 16 13:54:18 master.example.com atomic-openshift-node[20758]: I0816 13:54:18.311252 20758 kubelet.go:2007] Failed to delete pod "apiserver-69w9h_kube-service-catalog(ecc976ec-82ab-11e7-b3b6-2cc260000000)", err: pod not found
Aug 16 13:54:18 master.example.com atomic-openshift-node[20758]: W0816 13:54:18.315339 20758 status_manager.go:465] Failed to update status for pod "_()": Operation cannot be fulfilled on pods "apiserver-69w9h": StorageError: invalid object, Code: 4, Key: /kubernetes.io/pods/kube-service-catalog/apiserver-69w9h, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: ecc976ec-82ab-11e7-b3b6-2cc260000000, UID in object meta:
Aug 16 13:54:18 master.example.com atomic-openshift-node[20758]: I0816 13:54:18.341290 20758 kubelet.go:1813] SyncLoop (ADD, "api"): "controller-manager-mzx3g_kube-service-catalog(ece24a64-82ab-11e7-b3b6-2cc260000000)"
Aug 16 13:54:18 master.example.com atomic-openshift-node[20758]: I0816 13:54:18.341423 20758 predicate.go:106] Predicate failed on Pod: controller-manager-mzx3g_kube-service-catalog(ece24a64-82ab-11e7-b3b6-2cc260000000), for reason: Predicate MatchNodeSelector failed
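As a diagnostic aid for the MatchNodeSelector failures above, it can help to compare the DaemonSet node selectors, the project's node-selector annotation, and the master node's labels side by side. A sketch, assuming the DaemonSet and namespace names shown in the events and logs:

# oc get daemonset apiserver controller-manager -n kube-service-catalog -o yaml | grep -A2 nodeSelector
# oc get namespace kube-service-catalog -o yaml | grep node-selector
# oc get node master.example.com --show-labels

If the project inherits the cluster default selector (region=primary) while the master node only carries openshift-infra=apiserver, the MatchNodeSelector predicate rejects the pods, which matches the behavior reported in this bug.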