Created attachment 1451419 [details]
kuryr logs and more information

Description of problem:
When deploying a multi-master OpenShift cluster with the Kuryr SDN solution on top of OpenStack, the bootstrap-autoapprover pod remains in ContainerCreating status and the nodes never reach Ready status, so no application pods can be deployed.

Version-Release number of selected component (if applicable):
openstack-kuryr-kubernetes-controller-0.4.3-1.el7ost.noarch
openstack-kuryr-kubernetes-cni-0.4.2-0.20180404104924.985c387.el7ost.noarch

How reproducible:
Almost 100% when deploying a multi-master OpenShift cluster on top of OpenStack.

Steps to Reproduce:
1. Get the OCP openshift-ansible downstream rpm.
2. Configure the OSP (all.yml) and OCP (OSEv3.yml) inventory files:
   - Set 'openshift_openstack_num_masters: 3' in inventory/group_vars/all.yml
3. Run:
ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory /usr/share/ansible/openshift-ansible/playbooks/openstack/openshift-cluster/prerequisites.yml
ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory /usr/share/ansible/openshift-ansible/playbooks/openstack/openshift-cluster/provision.yml
ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory red-hat-ca.yml
ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory /usr/share/ansible/openshift-ansible/playbooks/openstack/openshift-cluster/repos.yml
ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory /usr/share/ansible/openshift-ansible/playbooks/openstack/openshift-cluster/install.yml

Actual results:
The deployed OpenShift multi-master cluster is not fully working. The nodes are NotReady and the bootstrap-autoapprover pod remains in ContainerCreating status.
[openshift@master-0 ~]$ oc get nodes
NAME                                 STATUS     ROLES           AGE       VERSION
app-node-0.openshift.example.com     NotReady   compute         54m       v1.10.0+b81c8f8
app-node-1.openshift.example.com     NotReady   compute         54m       v1.10.0+b81c8f8
infra-node-0.openshift.example.com   NotReady   compute,infra   54m       v1.10.0+b81c8f8
infra-node-1.openshift.example.com   NotReady   compute,infra   54m       v1.10.0+b81c8f8
master-0.openshift.example.com       Ready      master          59m       v1.10.0+b81c8f8
master-1.openshift.example.com       Ready      master          59m       v1.10.0+b81c8f8
master-2.openshift.example.com       Ready      master          59m       v1.10.0+b81c8f8

[openshift@master-0 ~]$ oc get pods --all-namespaces -o wide
NAMESPACE         NAME                                                READY     STATUS              RESTARTS   AGE       IP              NODE
default           router-1-sfsqt                                      1/1       Running             0          49m       192.168.99.10   infra-node-1.openshift.example.com
default           router-1-vvh6h                                      1/1       Running             0          49m       192.168.99.19   infra-node-0.openshift.example.com
kube-system       master-api-master-0.openshift.example.com           1/1       Running             1          58m       192.168.99.6    master-0.openshift.example.com
kube-system       master-api-master-1.openshift.example.com           1/1       Running             1          58m       192.168.99.15   master-1.openshift.example.com
kube-system       master-api-master-2.openshift.example.com           1/1       Running             1          58m       192.168.99.8    master-2.openshift.example.com
kube-system       master-controllers-master-0.openshift.example.com   1/1       Running             0          58m       192.168.99.6    master-0.openshift.example.com
kube-system       master-controllers-master-1.openshift.example.com   1/1       Running             0          58m       192.168.99.15   master-1.openshift.example.com
kube-system       master-controllers-master-2.openshift.example.com   1/1       Running             1          58m       192.168.99.8    master-2.openshift.example.com
kube-system       master-etcd-master-0.openshift.example.com          1/1       Running             1          58m       192.168.99.6    master-0.openshift.example.com
kube-system       master-etcd-master-1.openshift.example.com          1/1       Running             1          59m       192.168.99.15   master-1.openshift.example.com
kube-system       master-etcd-master-2.openshift.example.com          1/1       Running             1          59m       192.168.99.8    master-2.openshift.example.com
openshift-infra   bootstrap-autoapprover-0                            0/1       ContainerCreating   0          57m       <none>          master-1.openshift.example.com
openshift-infra   kuryr-cni-ds-67cxc                                  1/1       Running             0          54m       192.168.99.12   app-node-0.openshift.example.com
openshift-infra   kuryr-cni-ds-9dbsk                                  1/1       Running             0          53m       192.168.99.19   infra-node-0.openshift.example.com
openshift-infra   kuryr-cni-ds-bxnkz                                  1/1       Running             0          54m       192.168.99.4    app-node-1.openshift.example.com
openshift-infra   kuryr-cni-ds-cr5sc                                  1/1       Running             0          53m       192.168.99.15   master-1.openshift.example.com
openshift-infra   kuryr-cni-ds-gpnxp                                  1/1       Running             0          53m       192.168.99.10   infra-node-1.openshift.example.com
openshift-infra   kuryr-cni-ds-h2vsm                                  1/1       Running             0          53m       192.168.99.6    master-0.openshift.example.com
openshift-infra   kuryr-cni-ds-ztqms                                  1/1       Running             0          53m       192.168.99.8    master-2.openshift.example.com
openshift-infra   kuryr-controller-65c98f7444-qv7qh                   1/1       Running             0          57m       192.168.99.4    app-node-1.openshift.example.com
openshift-node    sync-ch285                                          1/1       Running             0          55m       192.168.99.10   infra-node-1.openshift.example.com
openshift-node    sync-d6r9f                                          1/1       Running             0          55m       192.168.99.19   infra-node-0.openshift.example.com
openshift-node    sync-h82jm                                          1/1       Running             0          57m       192.168.99.6    master-0.openshift.example.com
openshift-node    sync-jmwjx                                          1/1       Running             0          57m       192.168.99.8    master-2.openshift.example.com
openshift-node    sync-mhmfq                                          1/1       Running             0          54m       192.168.99.4    app-node-1.openshift.example.com
openshift-node    sync-nms7d                                          1/1       Running             0          54m       192.168.99.12   app-node-0.openshift.example.com
openshift-node    sync-qw9t8                                          1/1       Running             0          57m       192.168.99.15   master-1.openshift.example.com

Expected results:
Fully working OpenShift multi-master deployment, with the nodes in Ready status and the bootstrap-autoapprover pod in Running status.

Additional info:
Find attached the logs.
The Kuryr CNI executable works by placing a script on the host that runs:

    docker exec ID_of_the_kuryr_cni_container ...

The container ID is retrieved from the Kubernetes API. However, the CNI container can reach the API before the kubelet has updated the pod status field that carries the container ID. In that case the lookup returns null and the script generation misbehaves.
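One way to avoid accepting that premature null answer would be to retry the lookup until the kubelet has populated the field. A minimal sketch, not the actual kuryr-cni code; the function name `wait_for_id` and the example `oc` invocation in the trailing comment are illustrative assumptions:

```shell
# Sketch: poll a lookup command until it prints a real container ID,
# instead of trusting the first (possibly null) answer from the API.
wait_for_id() {
    attempts=$1; shift
    i=0
    while [ "$i" -lt "$attempts" ]; do
        id=$("$@")
        # Empty output or the literal "null" means the kubelet has not
        # filled in the pod's containerID field yet; back off and retry.
        if [ -n "$id" ] && [ "$id" != "null" ]; then
            printf '%s\n' "$id"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Assumed usage against the Kubernetes API (adjust to the environment):
#   wait_for_id 30 oc get pod kuryr-cni-ds-xxxxx -n openshift-infra \
#       -o jsonpath='{.status.containerStatuses[0].containerID}'
```

The bound on attempts matters: if the kubelet never reports the ID, the script generation should fail loudly rather than emit a broken docker exec wrapper.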
An alternative would be to not use the Kubernetes API to find the container to execute, but have the CNI script do the check itself, as in this example:

[openshift@app-node-1 ~]$ cat findit.sh
CNI_POD_NAME="$1"

read -r -d '' finder <<EOF
import json
import sys

containers = json.load(sys.stdin)
for container in containers:
    if ('Labels' in container and
            container['Labels'].get('io.kubernetes.pod.name') == '$CNI_POD_NAME' and
            container['Labels'].get('com.redhat.component') != 'openshift-enterprise-pod-container'):
        print(container['Id'])
EOF

curl --unix-socket /var/run/docker.sock http:/containers/json 2> /dev/null | python -c "$finder"

[openshift@app-node-1 ~]$ sudo sh findit.sh kuryr-cni-ds-sflpf
69d8bdec7033fb3d4267ebd8781fc57a9af4e5491ab0352e579c7cff1d3fa31a

The output of findit.sh (or an equivalent function) could be stored in some directory, and the CNI container could wipe that cache on startup so the ID is regenerated (that way upgrades keep working instead of pointing at the terminating CNI container).
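The caching idea in the last paragraph could look roughly like this; the cache path `/var/run/kuryr-cni-ids` and the wrapper name `cached_id` are assumptions for illustration, not part of the shipped fix:

```shell
# Hypothetical cache wrapper around findit.sh: resolve the kuryr-cni
# container ID once per pod, keep it on disk, and let the CNI container
# wipe the cache directory on startup so upgrades force a fresh lookup.
CACHE_DIR=${CACHE_DIR:-/var/run/kuryr-cni-ids}

cached_id() {
    pod=$1
    cache="$CACHE_DIR/$pod.id"
    if [ -s "$cache" ]; then
        # Cache hit: reuse the stored ID, no Docker socket query needed.
        cat "$cache"
    else
        # Cache miss: query Docker via findit.sh and remember the answer.
        mkdir -p "$CACHE_DIR"
        sh findit.sh "$pod" | tee "$cache"
    fi
}

# On CNI container start the cache would be cleared, e.g.:
#   rm -rf "$CACHE_DIR"
```

Wiping on startup is what keeps the generated host script from pointing at a terminating CNI container after an upgrade.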
Only the openstack-kuryr-cni container image changes for this fix.
Verified in the https://access.redhat.com/containers/#/registry.access.redhat.com/rhosp13/openstack-kuryr-cni/images/13.0-67 image. A multi-master cluster is successfully deployed with the new kuryr-cni image, all the pods are running and the OpenShift nodes are Ready.

Verification steps:
1. Get the OCP openshift-ansible downstream rpm.
2. Configure the OSP (all.yml) and OCP (OSEv3.yml) inventory files:
   - Set 'openshift_openstack_num_masters: 3' in inventory/group_vars/all.yml
3. Run:
ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory /usr/share/ansible/openshift-ansible/playbooks/openstack/openshift-cluster/prerequisites.yml
ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory /usr/share/ansible/openshift-ansible/playbooks/openstack/openshift-cluster/provision.yml
ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory red-hat-ca.yml
ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory /usr/share/ansible/openshift-ansible/playbooks/openstack/openshift-cluster/repos.yml
ansible-playbook --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory /usr/share/ansible/openshift-ansible/playbooks/openstack/openshift-cluster/install.yml
4. Check the installer finishes without errors.
5. Check the VMs deployed in the overcloud:

(overcloud) [cloud-user@ansible-host ~]$ openstack server list
+--------------------------------------+------------------------------------+--------+-------------------------------------------------------------------------+---------+-----------+
| ID                                   | Name                               | Status | Networks                                                                | Image   | Flavor    |
+--------------------------------------+------------------------------------+--------+-------------------------------------------------------------------------+---------+-----------+
| 214765eb-2028-44b4-ac68-97bf56b36586 | infra-node-1.openshift.example.com | ACTIVE | openshift-ansible-openshift.example.com-net=192.168.99.10, 172.20.0.235 | rhel75  | m1.node   |
| c8fc3cea-1371-42e7-88f0-634330a8db13 | infra-node-0.openshift.example.com | ACTIVE | openshift-ansible-openshift.example.com-net=192.168.99.20, 172.20.0.210 | rhel75  | m1.node   |
| 266d023c-537b-4116-8a05-250e5fca1c09 | master-2.openshift.example.com     | ACTIVE | openshift-ansible-openshift.example.com-net=192.168.99.14, 172.20.0.236 | rhel75  | m1.master |
| d4730b0f-5a79-45d6-9089-e1f3b5f92a72 | master-0.openshift.example.com     | ACTIVE | openshift-ansible-openshift.example.com-net=192.168.99.4, 172.20.0.234  | rhel75  | m1.master |
| 3a1e4b89-746d-46e8-9a97-1b9a627ad505 | master-1.openshift.example.com     | ACTIVE | openshift-ansible-openshift.example.com-net=192.168.99.5, 172.20.0.220  | rhel75  | m1.master |
| 6079fc25-bf35-4924-b89a-4c0afe92f7e6 | app-node-1.openshift.example.com   | ACTIVE | openshift-ansible-openshift.example.com-net=192.168.99.13, 172.20.0.223 | rhel75  | m1.node   |
| dd006f99-a61e-47a7-9a19-b9b36a001712 | app-node-0.openshift.example.com   | ACTIVE | openshift-ansible-openshift.example.com-net=192.168.99.6, 172.20.0.233  | rhel75  | m1.node   |
| 19ea6449-8fe3-41b7-bff8-ce973357ccfe | openshift-dns                      | ACTIVE | openshift-dns=192.168.23.3, 172.20.0.218                                | centos7 | m1.small  |
| 490056e5-a0b0-4af8-8cb2-f7b7321dd604 | ansible-host                       | ACTIVE | ansible-host=172.16.0.6, 172.20.0.212                                   | rhel75  | m1.small  |
+--------------------------------------+------------------------------------+--------+-------------------------------------------------------------------------+---------+-----------+

6. Check all the nodes are Ready:

[openshift@master-0 ~]$ oc get nodes
NAME                                 STATUS    ROLES     AGE       VERSION
app-node-0.openshift.example.com     Ready     compute   51m       v1.10.0+b81c8f8
app-node-1.openshift.example.com     Ready     compute   51m       v1.10.0+b81c8f8
infra-node-0.openshift.example.com   Ready     infra     51m       v1.10.0+b81c8f8
infra-node-1.openshift.example.com   Ready     infra     51m       v1.10.0+b81c8f8
master-0.openshift.example.com       Ready     master    55m       v1.10.0+b81c8f8
master-1.openshift.example.com       Ready     master    55m       v1.10.0+b81c8f8
master-2.openshift.example.com       Ready     master    55m       v1.10.0+b81c8f8

7. Check all the pods are Running:

[openshift@master-0 ~]$ oc get pods --all-namespaces -o wide
NAMESPACE         NAME                                                READY     STATUS    RESTARTS   AGE       IP              NODE
default           router-1-8tjvx                                      1/1       Running   0          46m       192.168.99.20   infra-node-0.openshift.example.com
default           router-1-gtkmg                                      1/1       Running   0          46m       192.168.99.10   infra-node-1.openshift.example.com
kube-system       master-api-master-0.openshift.example.com           1/1       Running   1          54m       192.168.99.4    master-0.openshift.example.com
kube-system       master-api-master-1.openshift.example.com           1/1       Running   1          54m       192.168.99.5    master-1.openshift.example.com
kube-system       master-api-master-2.openshift.example.com           1/1       Running   1          54m       192.168.99.14   master-2.openshift.example.com
kube-system       master-controllers-master-0.openshift.example.com   1/1       Running   1          54m       192.168.99.4    master-0.openshift.example.com
kube-system       master-controllers-master-1.openshift.example.com   1/1       Running   0          54m       192.168.99.5    master-1.openshift.example.com
kube-system       master-controllers-master-2.openshift.example.com   1/1       Running   2          54m       192.168.99.14   master-2.openshift.example.com
kube-system       master-etcd-master-0.openshift.example.com          1/1       Running   1          54m       192.168.99.4    master-0.openshift.example.com
kube-system       master-etcd-master-1.openshift.example.com          1/1       Running   1          54m       192.168.99.5    master-1.openshift.example.com
kube-system       master-etcd-master-2.openshift.example.com          1/1       Running   1          54m       192.168.99.14   master-2.openshift.example.com
openshift-infra   bootstrap-autoapprover-0                            1/1       Running   0          52m       10.11.0.50      master-2.openshift.example.com
openshift-infra   kuryr-cni-ds-5ddlw                                  1/1       Running   0          50m       192.168.99.6    app-node-0.openshift.example.com
openshift-infra   kuryr-cni-ds-8tcxv                                  1/1       Running   0          49m       192.168.99.4    master-0.openshift.example.com
openshift-infra   kuryr-cni-ds-ck8tn                                  1/1       Running   0          50m       192.168.99.13   app-node-1.openshift.example.com
openshift-infra   kuryr-cni-ds-f6mqr                                  1/1       Running   0          49m       192.168.99.5    master-1.openshift.example.com
openshift-infra   kuryr-cni-ds-gxrbp                                  1/1       Running   0          49m       192.168.99.10   infra-node-1.openshift.example.com
openshift-infra   kuryr-cni-ds-n84fs                                  1/1       Running   0          49m       192.168.99.14   master-2.openshift.example.com
openshift-infra   kuryr-cni-ds-tdxvr                                  1/1       Running   0          49m       192.168.99.20   infra-node-0.openshift.example.com
openshift-infra   kuryr-controller-65c98f7444-zpg47                   1/1       Running   1          53m       192.168.99.13   app-node-1.openshift.example.com
openshift-node    sync-2sqqx                                          1/1       Running   0          53m       192.168.99.14   master-2.openshift.example.com
openshift-node    sync-d5mj7                                          1/1       Running   0          53m       192.168.99.4    master-0.openshift.example.com
openshift-node    sync-djjmd                                          1/1       Running   0          50m       192.168.99.13   app-node-1.openshift.example.com
openshift-node    sync-jcxjv                                          1/1       Running   0          50m       192.168.99.20   infra-node-0.openshift.example.com
openshift-node    sync-jxwvf                                          1/1       Running   0          50m       192.168.99.6    app-node-0.openshift.example.com
openshift-node    sync-qhb7h                                          1/1       Running   0          53m       192.168.99.5    master-1.openshift.example.com
openshift-node    sync-x6mn8                                          1/1       Running   0          50m       192.168.99.10   infra-node-1.openshift.example.com

8. Deploy a dc with 8 replicas:

[openshift@master-0 ~]$ oc get pods -o wide
NAME           READY     STATUS    RESTARTS   AGE       IP           NODE
demo-1-29pg2   1/1       Running   0          27s       10.11.0.13   app-node-0.openshift.example.com
demo-1-5kj7s   1/1       Running   0          27s       10.11.0.10   app-node-1.openshift.example.com
demo-1-7x99s   1/1       Running   0          28s       10.11.0.2    app-node-0.openshift.example.com
demo-1-8qxdd   1/1       Running   0          28s       10.11.0.12   app-node-0.openshift.example.com
demo-1-97fmp   1/1       Running   0          27s       10.11.0.6    app-node-1.openshift.example.com
demo-1-cn2l7   1/1       Running   0          28s       10.11.0.14   app-node-1.openshift.example.com
demo-1-fcsnf   1/1       Running   0          5m        10.11.0.9    app-node-1.openshift.example.com
demo-1-hswj4   1/1       Running   0          27s       10.11.0.8    app-node-0.openshift.example.com
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2085