Bug 1571588

Summary: Kuryr node configuration is skipped
Product: OpenShift Container Platform    Reporter: Tomas Sedovic <tsedovic>
Component: Installer                     Assignee: Michał Dulko <mdulko>
Status: CLOSED ERRATA                    QA Contact: Jon Uriarte <juriarte>
Severity: high                           Docs Contact:
Priority: high
Version: 3.10.0                          CC: aos-bugs, jokerman, mdulko, mmccomas, racedoro, sdodson
Target Milestone: ---
Target Release: 3.10.0
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:                        Doc Type: Bug Fix
Doc Text:
In OpenShift 3.10, the installer changed how node SDN configuration is done and removed the per-node installation steps. As a result, the Kuryr CNI binary installation was no longer executed, and CNI failed with a missing 'loopback' binary. The fix introduces an init container image that installs the binaries from the official RPM source and copies them to the CNI binary location expected on the host.
Story Points: ---
Clone Of:                                Environment:
Last Closed: 2018-07-30 19:13:48 UTC     Type: Bug
Regression: ---                          Mount Type: ---
Documentation: ---                       CRM:
Verified Versions:                       Category: ---
oVirt Team: ---                          RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---                     Target Upstream Version:
Embargoed:

Description Tomas Sedovic 2018-04-25 07:36:48 UTC
Description of problem:

When deploying with the Kuryr CNI by setting:

openshift_use_kuryr: true
openshift_use_openshift_sdn: false
os_sdn_network_plugin_name: cni

etc. in the inventory, the Kuryr-specific tasks that are supposed to run on every OpenShift node are skipped.
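
For reference, a minimal YAML inventory snippet with these options might look like the following sketch (the file location follows the OpenStack playbook documentation linked in the reproduction steps; `openshift_node_proxy_mode` is taken from the verification in comment 7):

    # inventory/group_vars/all.yml (sketch)
    openshift_use_kuryr: true
    openshift_use_openshift_sdn: false
    os_sdn_network_plugin_name: cni
    openshift_node_proxy_mode: userspace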

The tasks are located here:

https://github.com/openshift/openshift-ansible/blob/master/roles/kuryr/tasks/node.yaml

And they're executed from this playbook:

https://github.com/openshift/openshift-ansible/blob/master/playbooks/openshift-node/private/additional_config.yml

But the `oo_nodes_use_kuryr` group ends up matching no hosts. This seems to be because the `oo_nodes_to_bootstrap` group contains all the nodes and that group is excluded from the play.

See: https://github.com/openshift/openshift-ansible/blob/6d4886384b4d3bd3ade3efd0683d2784c136c2ca/playbooks/openshift-node/private/additional_config.yml#L3
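
For context, this is the failure mode of a play whose host pattern excludes a group. A rough sketch of a play that behaves this way (the group names are taken from this report; the exact pattern in additional_config.yml may differ):

    # Sketch: if every node is also in oo_nodes_to_bootstrap, the ":!"
    # exclusion leaves this play with no hosts to run on.
    - name: Additional node config
      hosts: oo_nodes_use_kuryr:!oo_nodes_to_bootstrap
      tasks:
        - name: Run the Kuryr node tasks
          import_role:
            name: kuryr
            tasks_from: node.yaml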

Version-Release number of the following components:
rpm -q openshift-ansible: openshift-ansible-3.10.0-0.27.0.git.0.abed3b7.el7.noarch

rpm -q ansible: ansible-2.4.3.0-1.el7ae.noarch

ansible --version: 

ansible 2.4.3.0
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/cloud-user/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Feb  9 2018, 09:51:13) [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)]


How reproducible:

Steps to Reproduce:
1. Configure Kuryr in the inventory: https://github.com/openshift/openshift-ansible/blob/master/playbooks/openstack/configuration.md#necessary-kuryr-options
2. Deploy OCP (this should succeed)
3. Log into any of the OpenShift nodes
4. Look inside the `/opt/cni/bin` directory

Actual results:

The directory only contains the `kuryr-cni` executable.

In addition, the registry deploy pod fails because it cannot find the `loopback` executable.

Expected results:

`/opt/cni/bin` should contain several executables, including `loopback`

Comment 1 Tomas Sedovic 2018-04-25 07:38:29 UTC
Proposed fix: https://github.com/openshift/openshift-ansible/pull/8110

Comment 5 Tomas Sedovic 2018-05-04 14:45:50 UTC
Based on the discussions in the pull request: https://github.com/openshift/openshift-ansible/pull/8110

We'll want to move this config out of the playbooks and into the daemonset.
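
To illustrate the direction (not the actual manifest), here is a minimal sketch of the init-container pattern described in the Doc Text, with hypothetical image names and the `/opt/cni/bin` host path from this report:

    # Sketch of a DaemonSet whose init container installs the CNI binaries.
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: kuryr-cni-ds
    spec:
      selector:
        matchLabels:
          app: kuryr-cni
      template:
        metadata:
          labels:
            app: kuryr-cni
        spec:
          initContainers:
            # Copies the RPM-sourced CNI binaries (loopback, host-local, ...)
            # into the path where CNI expects them on the host.
            - name: install-cni-binaries
              image: example/cni-binaries:latest   # hypothetical image
              command: ["sh", "-c", "cp -a /usr/libexec/cni/. /host/opt/cni/bin/"]
              volumeMounts:
                - name: cni-bin
                  mountPath: /host/opt/cni/bin
          containers:
            - name: kuryr-cni
              image: example/kuryr-cni:latest      # hypothetical image
          volumes:
            - name: cni-bin
              hostPath:
                path: /opt/cni/bin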

Comment 6 Tomas Sedovic 2018-05-09 12:47:49 UTC
The previous pull request was closed without merging; this is its replacement:

https://github.com/openshift/openshift-ansible/pull/8309

Comment 7 Jon Uriarte 2018-05-17 08:12:32 UTC
Verified in openshift-ansible-3.10.0-0.46.0.git.0.85c3afd.el7.noarch and openshift-ansible-playbooks-3.10.0-0.46.0.git.0.85c3afd.el7.noarch.

Verification steps:
1. Run the OpenShift-on-OpenStack playbook with the Kuryr configuration:
    openshift_use_kuryr: True
    openshift_use_openshift_sdn: False
    os_sdn_network_plugin_name: cni
    openshift_node_proxy_mode: userspace

2. Check that the `/opt/cni/bin` directory contains the 'loopback' and 'kuryr-cni' executables on all the OpenShift nodes:
(overcloud) [cloud-user@ansible-host ~]$ ansible --user openshift -i /usr/share/ansible/openshift-ansible/playbooks/openstack/inventory.py -i inventory OSEv3 -m command -a "ls -ltr /opt/cni/bin/"                
 [WARNING]: Unable to parse /home/cloud-user/inventory/sample-inventory as an inventory source

 [WARNING]: Unable to parse /home/cloud-user/inventory as an inventory source

 [WARNING]: Found both group and host with same name: localhost

app-node-1.openshift.example.com | SUCCESS | rc=0 >>
total 12832
-rwxr-xr-x. 1 root root 3079040 May 16 09:43 host-local
-rwxr-xr-x. 1 root root 3131888 May 16 09:43 loopback
-rwxr-xr-x. 1 root root 6920336 May 16 09:43 openshift-sdn
-rwxr-xr-x. 1 root root     190 May 16 09:44 kuryr-cni

infra-node-0.openshift.example.com | SUCCESS | rc=0 >>
total 12832
-rwxr-xr-x. 1 root root 3079040 May 16 09:49 host-local
-rwxr-xr-x. 1 root root 3131888 May 16 09:49 loopback
-rwxr-xr-x. 1 root root 6920336 May 16 09:49 openshift-sdn
-rwxr-xr-x. 1 root root     190 May 16 09:49 kuryr-cni

app-node-0.openshift.example.com | SUCCESS | rc=0 >>
total 12832
-rwxr-xr-x. 1 root root 3079040 May 16 09:43 host-local
-rwxr-xr-x. 1 root root 3131888 May 16 09:43 loopback
-rwxr-xr-x. 1 root root 6920336 May 16 09:43 openshift-sdn
-rwxr-xr-x. 1 root root     190 May 16 09:44 kuryr-cni

master-0.openshift.example.com | SUCCESS | rc=0 >>
total 12832
-rwxr-xr-x. 1 root root 3079040 May 16 09:49 host-local
-rwxr-xr-x. 1 root root 3131888 May 16 09:49 loopback
-rwxr-xr-x. 1 root root 6920336 May 16 09:49 openshift-sdn
-rwxr-xr-x. 1 root root     190 May 16 09:49 kuryr-cni

3. oc new-project test
4. oc run --image kuryr/demo demo
5. oc scale dc/demo --replicas=6
6. oc expose dc/demo --port 80 --target-port 8080 --type LoadBalancer
    service "demo" exposed
7. oc get svc
    NAME      TYPE           CLUSTER-IP      EXTERNAL-IP                     PORT(S)        AGE
    demo      LoadBalancer   172.30.26.156   172.29.148.90,172.29.148.90   80:31846/TCP   24s
8. Check that a floating IP has been assigned to the load balancer:
    (overcloud) [root@undercloud stack]# openstack floating ip list | grep 172.30.26.156
    | 8b66d4d1-c875-463c-a4a3-6d530b2e8d18 | 172.20.0.216        | 172.30.26.156    | 5b05a262-5138-4592-b607-7a0f4863680b | dbba197f-d28e-49be-9905-fde1fa67cd52 | 6c07532860e641989bacc5583275080a |          
9. Check the floating IP is reachable:
   (overcloud) [cloud-user@ansible-host ~]$ curl 172.20.0.216
   demo-1-sp7l6: HELLO! I AM ALIVE!!!
   (overcloud) [cloud-user@ansible-host ~]$ curl 172.20.0.216
   demo-1-9s7g8: HELLO! I AM ALIVE!!!
   (overcloud) [cloud-user@ansible-host ~]$ curl 172.20.0.216
   demo-1-6p7gx: HELLO! I AM ALIVE!!!

Comment 9 errata-xmlrpc 2018-07-30 19:13:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816