Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1483787

Summary: Unable to deploy service catalog as part of advanced installation
Product: OpenShift Container Platform
Component: Installer
Version: 3.6.0
Hardware: x86_64
OS: Linux
Reporter: Kent Hua <khua>
Assignee: ewolinet
QA Contact: Johnny Liu <jialiu>
Status: CLOSED ERRATA
Severity: medium
Priority: unspecified
Target Milestone: ---
Target Release: 3.6.z
CC: aos-bugs, cbucur, gpei, jokerman, khua, mmccomas, rkharwar, sdodson, smunilla, wsun
Doc Type: No Doc Update
Last Closed: 2017-11-21 05:41:13 UTC
Type: Bug
Attachments:
- ansible hosts file

Description Kent Hua 2017-08-22 01:01:12 UTC
Description of problem:
Trying to deploy the service catalog as part of OCP 3.6 installation

Version-Release number of selected component (if applicable):
3.6

How reproducible:
Every time

Steps to Reproduce:
1. Run advanced install:
$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
2. Service catalog installation fails because the service catalog apiserver is not responding

Actual results:
apiserver and controller-manager pods never start; they cycle through Pending -> MatchNodeSelector -> Terminating

Expected results:
apiserver and controller-manager pods in project kube-service-catalog are up and running

Additional info:
[root@master ~]# cat /etc/ansible/hosts
...
# service catalog
openshift_enable_service_catalog=true

# ansible service broker storage
openshift_hosted_etcd_storage_kind=nfs
openshift_hosted_etcd_storage_nfs_directory=/var/export
openshift_hosted_etcd_storage_host=workstation.example.com
openshift_hosted_etcd_storage_volume_name=etcd
openshift_hosted_etcd_storage_access_modes=['ReadWriteOnce']
openshift_hosted_etcd_storage_volume_size=1Gi
openshift_hosted_etcd_storage_labels={'storage': 'etcd'}

openshift_template_service_broker_namespaces=['openshift']

# docker storage loopback check
openshift_disable_check=docker_storage

# host group for masters
[masters]
master.example.com

# host group for nodes, includes region info
[nodes]
master.example.com openshift_schedulable=false
infra.example.com openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
node1.example.com openshift_node_labels="{'region': 'primary', 'zone': 'east'}"
node2.example.com openshift_node_labels="{'region': 'primary', 'zone': 'west'}"




[root@master ~]# oc get nodes --show-labels
NAME                 STATUS                     AGE       VERSION             LABELS
infra.example.com    Ready                      43m       v1.6.1+5115d708d7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=infra.example.com,region=infra,zone=default
master.example.com   Ready,SchedulingDisabled   43m       v1.6.1+5115d708d7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=master.example.com,openshift-infra=apiserver
node1.example.com    Ready                      43m       v1.6.1+5115d708d7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=node1.example.com,region=primary,zone=east
node2.example.com    Ready                      43m       v1.6.1+5115d708d7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=node2.example.com,region=primary,zone=west



[root@master ~]# oc get pv
NAME          CAPACITY   ACCESSMODES   RECLAIMPOLICY   STATUS      CLAIM     STORAGECLASS   REASON    AGE
etcd-volume   1Gi        RWO           Retain          Available                                      39m


[root@master ~]# oc get pods
NAME                       READY     STATUS              RESTARTS   AGE
apiserver-ghv62            0/1       MatchNodeSelector   0          0s
controller-manager-22mcf   0/1       MatchNodeSelector   0          0s


[root@master ~]# oc get pods -o wide
NAME                       READY     STATUS        RESTARTS   AGE       IP        NODE
apiserver-pz404            0/1       Terminating   0          0s        <none>    master.example.com
controller-manager-d03s7   0/1       Pending       0          0s        <none>    master.example.com



[root@master ~]# oc get ev
...
26m        26m         1         controller-manager         DaemonSet               Normal    SuccessfulCreate    daemon-set                    Created pod: controller-manager-cfbr6
26m        26m         1         controller-manager         DaemonSet               Warning   FailedDaemonPod     daemonset-controller          Found failed daemon pod kube-service-catalog/master.example.com on node controller-manager-cfbr6, will try to kill it
26m        26m         1         controller-manager         DaemonSet               Normal    SuccessfulDelete    daemon-set                    Deleted pod: controller-manager-cfbr6
26m        26m         4         controller-manager         DaemonSet               Normal    SuccessfulCreate    daemon-set                    (combined from similar events): Created pod: controller-manager-tmq6f
1m         26m         9006      controller-manager         DaemonSet               Warning   FailedDaemonPod     daemonset-controller          (combined from similar events): Found failed daemon pod kube-service-catalog/master.example.com on node controller-manager-8cxb7, will try to kill it
1m         26m         9006      controller-manager         DaemonSet               Normal    SuccessfulDelete    daemon-set                    (combined from similar events): Deleted pod: controller-manager-8cxb7



[root@master ~]# journalctl -f -u atomic-openshift-master
...
] Controller apiserver deleting pod kube-service-catalog/apiserver-5n1t5
Aug 16 13:53:31 master.example.com atomic-openshift-master[29069]: I0816 13:53:31.906323   29069 event.go:217] Event(v1.ObjectReference{Kind:"DaemonSet", Namespace:"kube-service-catalog", Name:"apiserver", UID:"bb750a4b-82a7-11e7-b3b6-2cc260000000", APIVersion:"extensions", ResourceVersion:"117167", FieldPath:""}): type: 'Warning' reason: 'FailedDaemonPod' Found failed daemon pod kube-service-catalog/master.example.com on node apiserver-5n1t5, will try to kill it
Aug 16 13:53:31 master.example.com atomic-openshift-master[29069]: E0816 13:53:31.932522   29069 daemoncontroller.go:233] kube-service-catalog/controller-manager failed with : deleted 1 failed pods of DaemonSet kube-service-catalog/controller-manager
Aug 16 13:53:31 master.example.com atomic-openshift-master[29069]: I0816 13:53:31.935503   29069 event.go:217] Event(v1.ObjectReference{Kind:"DaemonSet", Namespace:"kube-service-catalog", Name:"controller-manager", UID:"cb1e2bfa-82a7-11e7-b3b6-2cc260000000", APIVersion:"extensions", ResourceVersion:"117164", FieldPath:""}): type: 'Normal' reason: 'SuccessfulDelete' Deleted pod: controller-manager-rzf7d
Aug 16 13:53:32 master.example.com atomic-openshift-master[29069]: I0816 13:53:32.033591   29069 event.go:217] Event(v1.ObjectReference{Kind:"DaemonSet", Namespace:"kube-service-catalog", Name:"controller-manager", UID:"cb1e2bfa-82a7-11e7-b3b6-2cc260000000", APIVersion:"extensions", ResourceVersion:"117164", FieldPath:""}): type: 'Normal' reason: 'SuccessfulCreate' Created pod: controller-manager-httc7
Aug 16 13:53:32 master.example.com atomic-openshift-master[29069]: E0816 13:53:32.039736   29069 daemoncontroller.go:233] kube-service-catalog/apiserver failed with : deleted 1 failed pods of DaemonSet kube-service-catalog/apiserver
Aug 16 13:53:32 master.example.com atomic-openshift-master[29069]: I0816 13:53:32.041701   29069 event.go:217] Event(v1.ObjectReference{Kind:"DaemonSet", Namespace:"kube-service-catalog", Name:"apiserver", UID:"bb750a4b-82a7-11e7-b3b6-2cc260000000", APIVersion:"extensions", ResourceVersion:"117167", FieldPath:""}): type: 'Normal' reason: 'SuccessfulDelete' Deleted pod: apiserver-5n1t5



[root@master ~]# journalctl -f -u atomic-openshift-node
...
Aug 16 13:54:18 master.example.com atomic-openshift-node[20758]: I0816 13:54:18.304726   20758 kubelet.go:1829] SyncLoop (DELETE, "api"): "apiserver-69w9h_kube-service-catalog(ecc976ec-82ab-11e7-b3b6-2cc260000000)"
Aug 16 13:54:18 master.example.com atomic-openshift-node[20758]: I0816 13:54:18.311128   20758 kubelet.go:1823] SyncLoop (REMOVE, "api"): "apiserver-69w9h_kube-service-catalog(ecc976ec-82ab-11e7-b3b6-2cc260000000)"
Aug 16 13:54:18 master.example.com atomic-openshift-node[20758]: I0816 13:54:18.311252   20758 kubelet.go:2007] Failed to delete pod "apiserver-69w9h_kube-service-catalog(ecc976ec-82ab-11e7-b3b6-2cc260000000)", err: pod not found
Aug 16 13:54:18 master.example.com atomic-openshift-node[20758]: W0816 13:54:18.315339   20758 status_manager.go:465] Failed to update status for pod "_()": Operation cannot be fulfilled on pods "apiserver-69w9h": StorageError: invalid object, Code: 4, Key: /kubernetes.io/pods/kube-service-catalog/apiserver-69w9h, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: ecc976ec-82ab-11e7-b3b6-2cc260000000, UID in object meta:
Aug 16 13:54:18 master.example.com atomic-openshift-node[20758]: I0816 13:54:18.341290   20758 kubelet.go:1813] SyncLoop (ADD, "api"): "controller-manager-mzx3g_kube-service-catalog(ece24a64-82ab-11e7-b3b6-2cc260000000)"
Aug 16 13:54:18 master.example.com atomic-openshift-node[20758]: I0816 13:54:18.341423   20758 predicate.go:106] Predicate failed on Pod: controller-manager-mzx3g_kube-service-catalog(ece24a64-82ab-11e7-b3b6-2cc260000000), for reason: Predicate MatchNodeSelector failed
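
The node log above shows the scheduler-side predicate rejecting the pod. As a hedged diagnostic sketch (assuming a standard oc client; the jsonpath escaping for the dotted annotation key is the usual kubectl syntax), the mismatch can be confirmed by comparing the namespace's effective node selector with the labels on the targeted node:

```
# Print the project-level node selector that pods in the namespace inherit
# (empty output means no annotation; pods then fall back to the cluster default).
oc get namespace kube-service-catalog \
  -o jsonpath='{.metadata.annotations.openshift\.io/node-selector}'; echo

# Show the labels on the node the daemonsets target; in this report the
# master carries openshift-infra=apiserver but no region=primary label.
oc get node master.example.com --show-labels
```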


Comment 1 Johnny Liu 2017-08-22 09:06:40 UTC
QE cannot reproduce this issue; everything is working well.

Could you run:
# oc describe po apiserver-ghv62
# oc describe po controller-manager-22mcf

Comment 2 Scott Dodson 2017-08-22 13:26:54 UTC
Didn't we need to set a variable to specify the api server hosts?

Comment 3 Kent Hua 2017-08-22 13:47:46 UTC
@Johnny - I can't describe the pod because a pod never comes up.

@Scott - I didn't see any such variable in the installation instructions.  By default the installer seems to create the label openshift-infra=apiserver on the master node, so it does assign the pods to my master node.

I have my osm_default_node_selector="region=primary" set in my ansible hosts.

A fellow SA suggested the following:
oc edit namespace kube-service-catalog
Add the following annotation:
openshift.io/node-selector: ""

I haven't had a chance to try it yet.  I will later today.
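
For reference, the suggested workaround can also be applied non-interactively. This is a sketch assuming the standard oc client (either oc annotate with --overwrite, or an equivalent oc patch):

```
# Set an empty project node selector on the namespace so its pods are not
# constrained by the cluster-wide default node selector.
oc annotate namespace kube-service-catalog \
  openshift.io/node-selector="" --overwrite

# Equivalent patch form:
oc patch namespace kube-service-catalog \
  -p '{"metadata":{"annotations":{"openshift.io/node-selector":""}}}'
```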

Comment 4 Kent Hua 2017-08-22 17:35:29 UTC
I applied the suggestion and blanked out the node-selector on the kube-service-catalog project, and the pods now come up.  Does that mean something in the installer needs to be fixed?

[root@master ~]# oc get pods -o wide
NAME                       READY     STATUS    RESTARTS   AGE       IP           NODE
apiserver-3nfqd            1/1       Running   0          25m       10.131.0.3   master.example.com
controller-manager-9m2m6   1/1       Running   1          25m       10.131.0.4   master.example.com

[root@master ~]# oc get nodes --show-labels
NAME                 STATUS                     AGE       VERSION             LABELS
infra.example.com    Ready                      57m       v1.6.1+5115d708d7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=infra.example.com,logging-infra-fluentd=true,region=infra,zone=default
master.example.com   Ready,SchedulingDisabled   57m       v1.6.1+5115d708d7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=master.example.com,logging-infra-fluentd=true,openshift-infra=apiserver
node1.example.com    Ready                      57m       v1.6.1+5115d708d7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=node1.example.com,logging-infra-fluentd=true,region=primary,zone=east
node2.example.com    Ready                      57m       v1.6.1+5115d708d7   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=node2.example.com,logging-infra-fluentd=true,region=primary,zone=west

Comment 5 ewolinet 2017-08-22 20:40:53 UTC
Is there a default node-selector being set for projects that are created? 

By default we shouldn't be setting a selector on the service-catalog project... I'll try to see if I can recreate this using the same version of openshift-ansible.

Could you also paste the full contents of your inventory file? I'm guessing you originally provided a truncated part of it.

Comment 6 Kent Hua 2017-08-22 20:49:58 UTC
Yes I did set the default_node_selector.

osm_default_node_selector='region=primary' in my hosts file.  

However, the node-selector annotation wasn't present on the project, so I added openshift.io/node-selector: "" to force an explicit (empty) selector.

Comment 7 Kent Hua 2017-08-22 20:50:26 UTC
Created attachment 1316808 [details]
ansible hosts file

Comment 8 Kent Hua 2017-08-23 16:19:16 UTC
Removing the entry osm_default_node_selector="region=primary" solved the issue, but shouldn't the apiserver and controller-manager in kube-service-catalog be capable of handling that entry in the ansible hosts?

Comment 9 ewolinet 2017-08-23 18:18:25 UTC
That's strange... I wonder if something has changed with OCP. In the role we create the project with a node selector of "", which should override the default node selector parameter.

https://github.com/openshift/openshift-ansible/blob/master/roles/openshift_service_catalog/tasks/install.yml#L22

- name: Set Service Catalog namespace
  oc_project:
    state: present
    name: "kube-service-catalog"
    node_selector: ""
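
If that task took effect, the exported namespace would carry an empty node-selector annotation, roughly like the sketch below; compare with the oc export output in comment 10, where the annotation is absent entirely:

```
apiVersion: v1
kind: Project
metadata:
  name: kube-service-catalog
  annotations:
    openshift.io/node-selector: ""   # empty selector overrides the cluster default
```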

Comment 10 Kent Hua 2017-08-23 18:35:25 UTC
I see that line, but it doesn't appear to be setting the annotation.  So the role only works for installs that don't have a default node selector set.

[root@master ~]# oc export project kube-service-catalog
apiVersion: v1
kind: Project
metadata:
  annotations:
    openshift.io/description: ""
    openshift.io/display-name: ""
    openshift.io/sa.scc.mcs: s0:c8,c7
    openshift.io/sa.scc.supplemental-groups: 1000070000/10000
    openshift.io/sa.scc.uid-range: 1000070000/10000
  creationTimestamp: null
  name: kube-service-catalog
spec:
  finalizers:
  - openshift.io/origin
  - kubernetes
status:
  phase: Active

Comment 12 Johnny Liu 2017-10-12 10:26:02 UTC
Reproduced this bug with openshift v3.6.173.0.21 + openshift-ansible-3.6.173.0.21-2.git.0.44a4038.el7.noarch. It is very similar to BZ#1497047, with almost the same root cause.

Comment 13 Johnny Liu 2017-10-12 11:21:24 UTC
Could also reproduce this bug with openshift v3.6.173.0.21 + openshift-ansible-3.6.173.0.49-1.git.0.7e8ae51.el7.noarch

# oc get ns kube-service-catalog -o yaml
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    openshift.io/description: ""
    openshift.io/display-name: ""
    openshift.io/sa.scc.mcs: s0:c8,c7
    openshift.io/sa.scc.supplemental-groups: 1000070000/10000
    openshift.io/sa.scc.uid-range: 1000070000/10000
  creationTimestamp: 2017-10-12T11:10:22Z
  name: kube-service-catalog
  resourceVersion: "2001"
  selfLink: /api/v1/namespaces/kube-service-catalog
  uid: f0af2f47-af3d-11e7-b1af-fa163eeaedc0
spec:
  finalizers:
  - openshift.io/origin
  - kubernetes
status:
  phase: Active

Comment 17 Johnny Liu 2017-10-24 07:30:40 UTC
The PR targets 3.6, while this bug is attached to the 3.7 advisory, so moving it back to MODIFIED status.

Comment 18 Johnny Liu 2017-10-24 07:33:25 UTC
This bug does not reproduce in 3.7; even if it did, it would be a duplicate of BZ#1497047. This bug is specific to 3.6, so I am changing the target release to 3.6.z.

Comment 23 Johnny Liu 2017-11-17 02:01:01 UTC
Same situation as comment 17.

@Scott, could you help remove this bug from 3.7 errata, and add it into 3.6 openshift-ansible installer errata?

Comment 26 Scott Dodson 2017-11-17 19:07:35 UTC
ok

Comment 28 Johnny Liu 2017-11-20 10:11:17 UTC
Verified this bug with openshift-ansible-3.6.173.0.77-1.git.0.c63cec7.el7.noarch, and it passed.

Comment 31 errata-xmlrpc 2017-11-21 05:41:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3255