Description of problem:

**This BZ should be included in the next z-stream, as it blocks OCP 3.11 on OSP installation.** The fix has already been merged, but only after tag openshift-ansible-3.11.117-1.

The OCP 3.11 on OSP installation playbook fails due to the kuryr-controller pod crashing. It affects OCP 3.11 deployments on OSP 13 and OSP 14 when namespace isolation is enabled.

Version-Release number of the following components:

$ rpm -q openshift-ansible
openshift-ansible-3.11.117-1.git.0.add13ff.el7.noarch

$ rpm -q ansible
ansible-2.5.15-1.el7ae.noarch

$ ansible --version
ansible 2.5.15
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/cloud-user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Sep 12 2018, 05:31:16) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]

How reproducible:
Always

Steps to Reproduce:
1. Install OSP 13 or 14 with Octavia (in a hybrid environment).
2. Deploy the ansible-host and DNS server on the overcloud.
3. Enable kuryr and namespace isolation (inventory/group_vars/all.yml):

     openshift_kuryr_subnet_driver: namespace
     openshift_kuryr_sg_driver: namespace

4. Run the OpenShift-on-OpenStack playbooks from the ansible-host, with ansible 2.5:

   4.1. ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory /usr/share/ansible/openshift-ansible/playbooks/openstack/openshift-cluster/prerequisites.yml
   4.2. ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory /usr/share/ansible/openshift-ansible/playbooks/openstack/openshift-cluster/provision.yml
   4.3. ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory red-hat-ca.yml
   4.4. ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory repos.yml
   4.5. ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory /usr/share/ansible/openshift-ansible/playbooks/openstack/openshift-cluster/install.yml

Actual results:
The install.yml playbook fails:

TASK [openshift_cluster_monitoring_operator : Wait for the ServiceMonitor CRD to be created] ***
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (30 retries left).
...
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (1 retries left).
fatal: [master-0.openshift.example.com]: FAILED! => {"attempts": 30, "changed": true, "cmd": ["oc", "get", "crd", "servicemonitors.monitoring.coreos.com", "-n", "openshift-monitoring", "--config=/tmp/openshift-cluster-monitoring-ansible-vPTnlW/admin.kubeconfig"], "delta": "0:00:00.180127", "end": "2019-06-12 07:30:25.940036", "msg": "non-zero return code", "rc": 1, "start": "2019-06-12 07:30:25.759909", "stderr": "No resources found.\nError from server (NotFound): customresourcedefinitions.apiextensions.k8s.io \"servicemonitors.monitoring.coreos.com\" not found", "stderr_lines": ["No resources found.", "Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io \"servicemonitors.monitoring.coreos.com\" not found"], "stdout": "", "stdout_lines": []}

PLAY RECAP *********************************************************************
app-node-0.openshift.example.com   : ok=233  changed=87   unreachable=0  failed=0
app-node-1.openshift.example.com   : ok=211  changed=87   unreachable=0  failed=0
infra-node-0.openshift.example.com : ok=211  changed=87   unreachable=0  failed=0
localhost                          : ok=36   changed=0    unreachable=0  failed=0
master-0.openshift.example.com     : ok=715  changed=302  unreachable=0  failed=1

INSTALLER STATUS ***************************************************************
Initialization              : Complete (0:00:35)
Health Check                : Complete (0:00:02)
Node Bootstrap Preparation  : Complete (0:07:59)
etcd Install                : Complete (0:00:39)
Master Install              : Complete (0:06:59)
Master Additional Install   : Complete (0:01:08)
Node Join                   : Complete (0:00:40)
Hosted Install              : Complete (0:00:58)
Cluster Monitoring Operator : In Progress (0:16:12)
        This phase can be restarted by running: playbooks/openshift-monitoring/config.yml

Failure summary:

  1. Hosts:    master-0.openshift.example.com
     Play:     Configure Cluster Monitoring Operator
     Task:     Wait for the ServiceMonitor CRD to be created
     Message:  non-zero return code

Expected results:
Successful OCP cluster installation, with all the pods in Running status.

Additional info:

[openshift@master-0 ~]$ oc get pods --all-namespaces -o wide
NAMESPACE              NAME                                                READY  STATUS             RESTARTS  AGE  IP
default                docker-registry-1-deploy                            0/1    ContainerCreating  0         27m  <none>
default                router-1-deploy                                     0/1    ContainerCreating  0         27m  <none>
kube-system            master-api-master-0.openshift.example.com           1/1    Running            0         26m  192.168.99.13
kube-system            master-controllers-master-0.openshift.example.com   1/1    Running            0         26m  192.168.99.13
kube-system            master-etcd-master-0.openshift.example.com          1/1    Running            1         25m  192.168.99.13
kuryr                  kuryr-cni-ds-fqrz9                                  2/2    Running            0         28m  192.168.99.6
kuryr                  kuryr-cni-ds-vxll5                                  2/2    Running            4         28m  192.168.99.7
kuryr                  kuryr-cni-ds-xxw2p                                  2/2    Running            0         28m  192.168.99.10
kuryr                  kuryr-cni-ds-zfjvm                                  2/2    Running            0         31m  192.168.99.13
kuryr                  kuryr-controller-94bd45d99-29xbc                    0/1    CrashLoopBackOff   1         14s  192.168.99.7
openshift-monitoring   cluster-monitoring-operator-75c6b544dd-sdfhq        0/1    ContainerCreating  0         26m  <none>
openshift-node         sync-bqksw                                          1/1    Running            0         28m  192.168.99.7
openshift-node         sync-h5z4c                                          1/1    Running            0         30m  192.168.99.13
openshift-node         sync-xgdvn                                          1/1    Running            0         28m  192.168.99.10
openshift-node         sync-zcqc2                                          1/1    Running            0         28m  192.168.99.6

[openshift@master-0 ~]$ oc -n kuryr logs kuryr-controller-94bd45d99-29xbc
2019-06-12 11:41:20.544 1 INFO kuryr_kubernetes.config [-] Logging enabled!
2019-06-12 11:41:20.544 1 INFO kuryr_kubernetes.config [-] /usr/bin/kuryr-k8s-controller version 0.0.0
2019-06-12 11:41:20.711 1 INFO os_vif [-] Loaded VIF plugins: noop, sriov, ovs, linux_bridge, noop
2019-06-12 11:41:20.713 1 INFO kuryr_kubernetes.controller.service [-] Configured handlers: ['vif', 'lb', 'lbaasspec', 'namespace', 'kuryrnet']
2019-06-12 11:41:21.299 1 WARNING kuryr_kubernetes.controller.drivers.lbaasv2 [-] [neutron_defaults]resource_tags is set, but Octavia API 2.0 does not support resource tagging. Kuryr will put requested tags in the description field of Octavia resources.
2019-06-12 11:41:21.312 1 ERROR kuryr_kubernetes.controller.service [-] Handlers "set(['kuryrnet'])" were not found.: None
2019-06-12 11:41:21.312 1 ERROR kuryr_kubernetes.controller.service None
2019-06-12 11:41:21.312 1 ERROR kuryr_kubernetes.controller.service
2019-06-12 11:41:21.313 1 CRITICAL kuryr_kubernetes.controller.service [-] Handlers "set(['kuryrnet'])" were not found.

WORKAROUND:

1. Edit kuryr-config, removing the kuryrnet handler:

   [openshift@master-0 ~]$ oc -n kuryr edit cm kuryr-config
   -- enabled_handlers = vif,lb,lbaasspec,namespace,kuryrnet
   ++ enabled_handlers = vif,lb,lbaasspec,namespace

2. Delete the kuryr-controller pod:

   [openshift@master-0 ~]$ oc delete pod -n kuryr kuryr-controller-xxxx

3. Delete any kuryr-cni pod that has not recovered and is in CrashLoopBackOff status:

   [openshift@master-0 ~]$ oc delete pod -n kuryr kuryr-cni-xxxx

4. After some minutes, all pods should be in Running status.
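Workaround step 1 amounts to dropping the unavailable kuryrnet handler from the comma-separated enabled_handlers list. A minimal sketch of that edit as a plain string transformation (the sed invocation is illustrative only; on a real cluster the change is made via oc edit as shown above):

```shell
# The enabled_handlers line as written by openshift-ansible-3.11.117
# (matches the "Configured handlers" entry in the controller logs):
old='enabled_handlers = vif,lb,lbaasspec,namespace,kuryrnet'

# Drop the kuryrnet handler, which the playbooks enable even though the
# shipped kuryr-kubernetes build does not provide it:
new=$(printf '%s\n' "$old" | sed 's/,kuryrnet//')

echo "$new"   # -> enabled_handlers = vif,lb,lbaasspec,namespace
```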
Fixed In Version: v3.11.136
Verified in openshift-ansible-3.11.136 on top of OSP 13 2019-06-25.1 puddle.

(shiftstack) [cloud-user@ansible-host-0 ~]$ rpm -q openshift-ansible
openshift-ansible-3.11.136-1.git.0.b757272.el7.noarch

(shiftstack) [cloud-user@ansible-host-0 ~]$ rpm -q ansible
ansible-2.5.15-1.el7ae.noarch

Verification steps:
1. Install OSP 13 with Octavia (in a hybrid environment).
2. Deploy the ansible-host and DNS server on the overcloud.
3. Enable kuryr and namespace isolation (inventory/group_vars/all.yml):

     openshift_kuryr_subnet_driver: namespace
     openshift_kuryr_sg_driver: namespace

4. Run the OpenShift-on-OpenStack playbooks from the ansible-host, with ansible 2.5.
5. The installation ends successfully:

INSTALLER STATUS ***************************************************************
Initialization              : Complete (0:00:32)
Health Check                : Complete (0:00:03)
Node Bootstrap Preparation  : Complete (0:08:03)
etcd Install                : Complete (0:00:42)
Master Install              : Complete (0:07:09)
Master Additional Install   : Complete (0:01:16)
Node Join                   : Complete (0:00:49)
Hosted Install              : Complete (0:01:02)
Cluster Monitoring Operator : Complete (0:02:18)
Web Console Install         : Complete (0:00:56)
Console Install             : Complete (0:02:38)
metrics-server Install      : Complete (0:00:00)

6. All the pods are in Running status:

[openshift@master-0 ~]$ oc get pods --all-namespaces
NAMESPACE              NAME                                                READY  STATUS   RESTARTS  AGE
default                docker-registry-1-qbppk                             1/1    Running  0         5d
default                registry-console-1-2sdgn                            1/1    Running  0         5d
default                router-1-nxchk                                      1/1    Running  0         5d
kube-system            master-api-master-0.openshift.example.com           1/1    Running  0         5d
kube-system            master-controllers-master-0.openshift.example.com   1/1    Running  0         5d
kube-system            master-etcd-master-0.openshift.example.com          1/1    Running  0         5d
kuryr                  kuryr-cni-ds-8sh2q                                  2/2    Running  0         5d
kuryr                  kuryr-cni-ds-flsw2                                  2/2    Running  0         5d
kuryr                  kuryr-cni-ds-j54w4                                  2/2    Running  0         5d
kuryr                  kuryr-cni-ds-q52fq                                  2/2    Running  0         5d
kuryr                  kuryr-controller-7cf75d55c9-k6hhv                   1/1    Running  0         5d
openshift-console      console-58cf4f7886-xtkpg                            1/1    Running  0         5d
openshift-monitoring   alertmanager-main-0                                 3/3    Running  0         5d
openshift-monitoring   alertmanager-main-1                                 3/3    Running  0         5d
openshift-monitoring   alertmanager-main-2                                 3/3    Running  0         5d
openshift-monitoring   cluster-monitoring-operator-75c6b544dd-tvdmj        1/1    Running  0         5d
openshift-monitoring   grafana-c7d5bc87c-8pdjp                             2/2    Running  0         5d
openshift-monitoring   kube-state-metrics-5d6b7bb44f-t26bv                 3/3    Running  0         5d
openshift-monitoring   node-exporter-8f9hx                                 2/2    Running  0         5d
openshift-monitoring   node-exporter-jgpqc                                 2/2    Running  0         5d
openshift-monitoring   node-exporter-qm7hc                                 2/2    Running  0         5d
openshift-monitoring   node-exporter-tg8sr                                 2/2    Running  0         5d
openshift-monitoring   prometheus-k8s-0                                    4/4    Running  1         5d
openshift-monitoring   prometheus-k8s-1                                    4/4    Running  1         5d
openshift-monitoring   prometheus-operator-5b47ff445b-nxngh                1/1    Running  0         5d
openshift-node         sync-4ql68                                          1/1    Running  0         5d
openshift-node         sync-4sxqk                                          1/1    Running  0         5d
openshift-node         sync-66jd5                                          1/1    Running  0         5d
openshift-node         sync-mx8nz                                          1/1    Running  0         5d
openshift-web-console  webconsole-787f54c7f8-c77rd                         1/1    Running  0         5d

Additional check: the kuryrnet handler is no longer present in the enabled_handlers list:

[openshift@master-0 ~]$ oc -n kuryr get cm -o yaml | grep enabled_handlers
    enabled_handlers = vif,lb,lbaasspec,namespace
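The additional check can also be scripted; a small sketch that fails if kuryrnet is still enabled (the sample line mirrors the grep output above; against a live cluster it would instead be read from the kuryr-config ConfigMap via oc):

```shell
# Handler list as read back from the verified cluster (sample input; on a
# live cluster: oc -n kuryr get cm kuryr-config -o yaml | grep enabled_handlers)
line='enabled_handlers = vif,lb,lbaasspec,namespace'

# -w matches kuryrnet only as a whole word, so e.g. "namespace" cannot
# produce a false positive.
if printf '%s\n' "$line" | grep -qw 'kuryrnet'; then
    echo 'FAIL: kuryrnet handler still enabled'
else
    echo 'OK: kuryrnet handler not present'
fi
```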
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2816