Bug 1720172
| Summary: | Openshift-on-OpenStack installation playbook fails when namespace isolation is enabled | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jon Uriarte <juriarte> |
| Component: | Installer | Assignee: | Luis Tomas Bolivar <ltomasbo> |
| Installer sub component: | openshift-ansible | QA Contact: | Jon Uriarte <juriarte> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | urgent | CC: | itbrown, wmeng |
| Version: | 3.11.0 | | |
| Target Milestone: | --- | | |
| Target Release: | 3.11.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-09-24 08:08:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | v3.11.136 | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Verified in openshift-ansible-3.11.136 on top of the OSP 13 2019-06-25.1 puddle.
(shiftstack) [cloud-user@ansible-host-0 ~]$ rpm -q openshift-ansible
openshift-ansible-3.11.136-1.git.0.b757272.el7.noarch
(shiftstack) [cloud-user@ansible-host-0 ~]$ rpm -q ansible
ansible-2.5.15-1.el7ae.noarch
Verification steps:
1. Install OSP 13 with Octavia (in a hybrid environment)
2. Deploy the ansible-host and DNS server on the overcloud
3. Enable kuryr and namespace isolation in inventory/group_vars/all.yml (see the inventory sketch after the pod listing below):
openshift_kuryr_subnet_driver: namespace
openshift_kuryr_sg_driver: namespace
4. Run the OpenShift-on-OpenStack playbooks from the ansible-host, with Ansible 2.5
5. The installation ends successfully:
INSTALLER STATUS ***************************************************************
Initialization : Complete (0:00:32)
Health Check : Complete (0:00:03)
Node Bootstrap Preparation : Complete (0:08:03)
etcd Install : Complete (0:00:42)
Master Install : Complete (0:07:09)
Master Additional Install : Complete (0:01:16)
Node Join : Complete (0:00:49)
Hosted Install : Complete (0:01:02)
Cluster Monitoring Operator : Complete (0:02:18)
Web Console Install : Complete (0:00:56)
Console Install : Complete (0:02:38)
metrics-server Install : Complete (0:00:00)
6. All the pods are in Running status:
[openshift@master-0 ~]$ oc get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default docker-registry-1-qbppk 1/1 Running 0 5d
default registry-console-1-2sdgn 1/1 Running 0 5d
default router-1-nxchk 1/1 Running 0 5d
kube-system master-api-master-0.openshift.example.com 1/1 Running 0 5d
kube-system master-controllers-master-0.openshift.example.com 1/1 Running 0 5d
kube-system master-etcd-master-0.openshift.example.com 1/1 Running 0 5d
kuryr kuryr-cni-ds-8sh2q 2/2 Running 0 5d
kuryr kuryr-cni-ds-flsw2 2/2 Running 0 5d
kuryr kuryr-cni-ds-j54w4 2/2 Running 0 5d
kuryr kuryr-cni-ds-q52fq 2/2 Running 0 5d
kuryr kuryr-controller-7cf75d55c9-k6hhv 1/1 Running 0 5d
openshift-console console-58cf4f7886-xtkpg 1/1 Running 0 5d
openshift-monitoring alertmanager-main-0 3/3 Running 0 5d
openshift-monitoring alertmanager-main-1 3/3 Running 0 5d
openshift-monitoring alertmanager-main-2 3/3 Running 0 5d
openshift-monitoring cluster-monitoring-operator-75c6b544dd-tvdmj 1/1 Running 0 5d
openshift-monitoring grafana-c7d5bc87c-8pdjp 2/2 Running 0 5d
openshift-monitoring kube-state-metrics-5d6b7bb44f-t26bv 3/3 Running 0 5d
openshift-monitoring node-exporter-8f9hx 2/2 Running 0 5d
openshift-monitoring node-exporter-jgpqc 2/2 Running 0 5d
openshift-monitoring node-exporter-qm7hc 2/2 Running 0 5d
openshift-monitoring node-exporter-tg8sr 2/2 Running 0 5d
openshift-monitoring prometheus-k8s-0 4/4 Running 1 5d
openshift-monitoring prometheus-k8s-1 4/4 Running 1 5d
openshift-monitoring prometheus-operator-5b47ff445b-nxngh 1/1 Running 0 5d
openshift-node sync-4ql68 1/1 Running 0 5d
openshift-node sync-4sxqk 1/1 Running 0 5d
openshift-node sync-66jd5 1/1 Running 0 5d
openshift-node sync-mx8nz 1/1 Running 0 5d
openshift-web-console webconsole-787f54c7f8-c77rd 1/1 Running 0 5d
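For reference, a minimal sketch of the inventory/group_vars/all.yml settings used for this scenario. Only the two namespace-driver lines are quoted from this report; the remaining kuryr-enablement variables are taken from the openshift-ansible OpenStack provider documentation and should be treated as assumptions that may differ per environment:
# inventory/group_vars/all.yml (sketch, not a full inventory)
openshift_use_kuryr: True                # enable kuryr SDN (assumed variable)
openshift_use_openshift_sdn: False       # disable the default openshift-sdn (assumed variable)
os_sdn_network_plugin_name: cni          # kuryr is plugged in through CNI (assumed variable)
use_trunk_ports: True                    # kuryr relies on Neutron trunk ports (assumed variable)
# Namespace isolation settings from this report:
openshift_kuryr_subnet_driver: namespace
openshift_kuryr_sg_driver: namespace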
Additional check:
[openshift@master-0 ~]$ oc -n kuryr get cm -o yaml | grep enabled_handlers
enabled_handlers = vif,lb,lbaasspec,namespace
The kuryrnet handler is not present.
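The value checked above is carried in the kuryr.conf blob of the kuryr-config ConfigMap. Below is a minimal excerpt of the verified state, assuming the standard upstream kuryr.conf layout (the [kubernetes] section name and the kuryr.conf data key are assumptions, not quoted from this report):
apiVersion: v1
kind: ConfigMap
metadata:
  name: kuryr-config
  namespace: kuryr
data:
  kuryr.conf: |
    [kubernetes]
    # 'kuryrnet' is no longer listed here, since this kuryr-controller
    # build does not ship that handler.
    enabled_handlers = vif,lb,lbaasspec,namespace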
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2816
Description of problem:

**This BZ should be included in the next z-stream, as it blocks OCP 3.11 on OSP installation.** The fix has already been merged, but just after tag openshift-ansible-3.11.117-1.

The OCP 3.11 on OSP installation playbook fails due to a kuryr-controller pod crash. It affects OCP 3.11 deployments on OSP 13 and OSP 14 when namespace isolation is enabled.

Version-Release number of the following components:
$ rpm -q openshift-ansible
openshift-ansible-3.11.117-1.git.0.add13ff.el7.noarch
$ rpm -q ansible
ansible-2.5.15-1.el7ae.noarch
$ ansible --version
ansible 2.5.15
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/cloud-user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Sep 12 2018, 05:31:16) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]

How reproducible: always

Steps to Reproduce:
1. Install OSP 13 or 14 with Octavia (in a hybrid environment)
2. Deploy the ansible-host and DNS server on the overcloud
3. Enable kuryr and namespace isolation (inventory/group_vars/all.yml):
openshift_kuryr_subnet_driver: namespace
openshift_kuryr_sg_driver: namespace
4. Run the OpenShift-on-OpenStack playbooks from the ansible-host, with Ansible 2.5:
4.1. ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory /usr/share/ansible/openshift-ansible/playbooks/openstack/openshift-cluster/prerequisites.yml
4.2. ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory /usr/share/ansible/openshift-ansible/playbooks/openstack/openshift-cluster/provision.yml
4.3. ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory red-hat-ca.yml
4.4. ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory repos.yml
4.5. ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory /usr/share/ansible/openshift-ansible/playbooks/openstack/openshift-cluster/install.yml

Actual results:
install.yml playbook fails:

TASK [openshift_cluster_monitoring_operator : Wait for the ServiceMonitor CRD to be created] ***
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (30 retries left).
...
FAILED - RETRYING: Wait for the ServiceMonitor CRD to be created (1 retries left).
fatal: [master-0.openshift.example.com]: FAILED!
=> {"attempts": 30, "changed": true, "cmd": ["oc", "get", "crd", "servicemonitors.monitoring.coreos.com", "-n", "openshift-monitoring", "--config=/tmp/openshift-cluster-monitoring-ansible-vPTnlW/admin.kubeconfig"], "delta": "0:00:00.180127", "end": "2019-06-12 07:30:25.940036", "msg": "non-zero return code", "rc": 1, "start": "2019-06-12 07:30:25.759909", "stderr": "No resources found.\nError from server (NotFound): customresourcedefinitions.apiextensions.k8s.io \"servicemonitors.monitoring.coreos.com\" not found", "stderr_lines": ["No resources found.", "Error from server (NotFound): customresourcedefinitions.apiextensions.k8s.io \"servicemonitors.monitoring.coreos.com\" not found"], "stdout": "", "stdout_lines": []} PLAY RECAP ********************************************************************* app-node-0.openshift.example.com : ok=233 changed=87 unreachable=0 failed=0 app-node-1.openshift.example.com : ok=211 changed=87 unreachable=0 failed=0 infra-node-0.openshift.example.com : ok=211 changed=87 unreachable=0 failed=0 localhost : ok=36 changed=0 unreachable=0 failed=0 master-0.openshift.example.com : ok=715 changed=302 unreachable=0 failed=1 INSTALLER STATUS *************************************************************** Initialization : Complete (0:00:35) Health Check : Complete (0:00:02) Node Bootstrap Preparation : Complete (0:07:59) etcd Install : Complete (0:00:39) Master Install : Complete (0:06:59) Master Additional Install : Complete (0:01:08) Node Join : Complete (0:00:40) Hosted Install : Complete (0:00:58) Cluster Monitoring Operator : In Progress (0:16:12) This phase can be restarted by running: playbooks/openshift-monitoring/config.yml Failure summary: 1. Hosts: master-0.openshift.example.com Play: Configure Cluster Monitoring Operator Task: Wait for the ServiceMonitor CRD to be created Message: non-zero return code Expected results: Successful OCP cluster installation, with all the pods in Running status. Additional info: [openshift@master-0 ~]$ oc get pods --all-namespaces -o wide NAMESPACE NAME READY STATUS RESTARTS AGE IP default docker-registry-1-deploy 0/1 ContainerCreating 0 27m <none> default router-1-deploy 0/1 ContainerCreating 0 27m <none> kube-system master-api-master-0.openshift.example.com 1/1 Running 0 26m 192.168.99.13 kube-system master-controllers-master-0.openshift.example.com 1/1 Running 0 26m 192.168.99.13 kube-system master-etcd-master-0.openshift.example.com 1/1 Running 1 25m 192.168.99.13 kuryr kuryr-cni-ds-fqrz9 2/2 Running 0 28m 192.168.99.6 kuryr kuryr-cni-ds-vxll5 2/2 Running 4 28m 192.168.99.7 kuryr kuryr-cni-ds-xxw2p 2/2 Running 0 28m 192.168.99.10 kuryr kuryr-cni-ds-zfjvm 2/2 Running 0 31m 192.168.99.13 kuryr kuryr-controller-94bd45d99-29xbc 0/1 CrashLoopBackOff 1 14s 192.168.99.7 openshift-monitoring cluster-monitoring-operator-75c6b544dd-sdfhq 0/1 ContainerCreating 0 26m <none> openshift-node sync-bqksw 1/1 Running 0 28m 192.168.99.7 openshift-node sync-h5z4c 1/1 Running 0 30m 192.168.99.13 openshift-node sync-xgdvn 1/1 Running 0 28m 192.168.99.10 openshift-node sync-zcqc2 1/1 Running 0 28m 192.168.99.6 [openshift@master-0 ~]$ oc -n kuryr logs kuryr-controller-94bd45d99-29xbc 2019-06-12 11:41:20.544 1 INFO kuryr_kubernetes.config [-] Logging enabled! 
2019-06-12 11:41:20.544 1 INFO kuryr_kubernetes.config [-] /usr/bin/kuryr-k8s-controller version 0.0.0
2019-06-12 11:41:20.711 1 INFO os_vif [-] Loaded VIF plugins: noop, sriov, ovs, linux_bridge, noop
2019-06-12 11:41:20.713 1 INFO kuryr_kubernetes.controller.service [-] Configured handlers: ['vif', 'lb', 'lbaasspec', 'namespace', 'kuryrnet']
2019-06-12 11:41:21.299 1 WARNING kuryr_kubernetes.controller.drivers.lbaasv2 [-] [neutron_defaults]resource_tags is set, but Octavia API 2.0 does not support resource tagging. Kuryr will put requested tags in the description field of Octavia resources.
2019-06-12 11:41:21.312 1 ERROR kuryr_kubernetes.controller.service [-] Handlers "set(['kuryrnet'])" were not found.: None
2019-06-12 11:41:21.312 1 ERROR kuryr_kubernetes.controller.service None
2019-06-12 11:41:21.312 1 ERROR kuryr_kubernetes.controller.service
2019-06-12 11:41:21.313 1 CRITICAL kuryr_kubernetes.controller.service [-] Handlers "set(['kuryrnet'])" were not found.

WORKAROUND:
1. Edit the kuryr-config ConfigMap:
[openshift@master-0 ~]$ oc -n kuryr edit cm kuryr-config
-- enabled_handlers = vif,lb,lbaasspec,namespace,kuryrnet
++ enabled_handlers = vif,lb,lbaasspec,namespace
2. Delete the kuryr-controller pod:
[openshift@master-0 ~]$ oc delete pod -n kuryr kuryr-controller-xxxx
3. Delete any kuryr-cni pod that has not recovered and is in CrashLoopBackOff status:
[openshift@master-0 ~]$ oc delete pod -n kuryr kuryr-cni-xxxx
4. After a few minutes all pods should be in Running status.
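For convenience, here is a non-interactive variant of the same workaround (a sketch: the sed expression and the awk filters are assumptions layered on top of the documented steps, so review the resulting ConfigMap before relying on it):
# Remove the 'kuryrnet' handler from kuryr.conf inside the kuryr-config ConfigMap.
oc -n kuryr get cm kuryr-config -o yaml | sed 's/,kuryrnet//' | oc replace -f -
# Restart the controller so it picks up the reduced handler list.
oc -n kuryr get pods | awk '/^kuryr-controller/ {print $1}' | xargs -r oc -n kuryr delete pod
# Delete any kuryr-cni pod stuck in CrashLoopBackOff.
oc -n kuryr get pods | awk '/^kuryr-cni-/ && /CrashLoopBackOff/ {print $1}' | xargs -r oc -n kuryr delete pod
# After a few minutes all kuryr pods should be back in Running status.
oc -n kuryr get pods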