Description of problem:
In OCP 4.1.x, CNI binaries could be copied into /opt/multus/bin, where they would be picked up by multus. This no longer works in 4.2. Previously, multus would search two directories for binaries:
- /var/lib/cni/bin
- /opt/multus/bin
In 4.2, it only finds binaries in /var/lib/cni/bin.

Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-09-14-171119

How reproducible:
Always

Steps to Reproduce:
1. Install the ServiceMesh operator in 4.2.
2. Deploy the control plane.
3. To circumvent the race condition (Bug 1732598), run this on every node:
   rm /etc/kubernetes/cni/net.d/80-openshift-network.conf && systemctl restart crio
4. Deploy services into the mesh.

Actual results:
Pods fail to start with errors similar to:
"Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox"

Expected results:
Pods start and traffic is routed through the sidecars.

Additional info:
If you don't perform step 3, pods will start, but traffic is not routed through the sidecars because the CNI binary is never executed.

Note that I couldn't find this path documented anywhere, but 4.1.x versions showed it in error messages. See this issue for reference: https://issues.jboss.org/browse/MAISTRA-582
Doug, Even if unintentional, it seems like an API change, which it would be nice to avoid. Do you think we can fix this for 4.2?
Hey guys, thanks for filing this and for the heads up. I'm starting an archaeological dig to figure out where and when we had `/opt/multus/bin`, and how it got removed from the CNI binary search paths.
Just a quick update: even after some searching, I can't find a reference to `/opt/multus/bin` in the Multus history, which isn't making this easier to unravel. (I have, however, found a reference to it in the debug logs in this GitHub comment: https://github.com/intel/multus-cni/issues/243 ). I did discover an issue where the "binDir" Multus configuration parameter (documented here: https://github.com/intel/multus-cni/blob/master/doc/configuration.md#multus-cni-configuration-reference) is not being picked up, and I'm wondering if it's related. I'll update once I know more.
My current plan is to ensure that the `binDir` configuration option in Multus works as intended, and to provide a way to use it to add `/opt/multus/bin` as a directory that is searched. (I cannot locate in the history how this worked before, which is a mystery to me, but I digress.)

I've figured out a way to make it work. However, I feel that my approach is somewhat ham-fisted, so I've asked my colleague Tomofumi to take a look at it. The upstream pull request is here: https://github.com/intel/multus-cni/pull/376

This will likely also require a downstream PR to Multus to bring these changes in (once we figure out the right method by which to accomplish this), as well as a companion change to the cluster-network-operator to express which `binDir` to use (which would be /opt/multus/bin) in addition to the regularly used /var/lib/cni/bin.
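A minimal sketch of the intended lookup behavior described above, not the code from the upstream PR. The hypothetical helper `search_dirs_from_config` reads a Multus-style JSON configuration and returns the regular directory followed by `binDir` when that option is set. The function name, the default value, and the ordering of the two directories are assumptions made for illustration only.

```python
import json
from typing import List

# Assumed default for illustration; the thread names this as the regularly used directory.
DEFAULT_BIN_DIR = "/var/lib/cni/bin"


def search_dirs_from_config(config_text: str, default_dir: str = DEFAULT_BIN_DIR) -> List[str]:
    """Return the directories to search for CNI binaries (hypothetical helper)."""
    config = json.loads(config_text)
    dirs = [default_dir]
    extra = config.get("binDir")
    # Only add binDir when it is set and not already in the list.
    if extra and extra not in dirs:
        dirs.append(extra)
    return dirs


# Example configuration; only "binDir" matters for this sketch.
example = '{"name": "multus-cni-network", "type": "multus", "binDir": "/opt/multus/bin"}'
print(search_dirs_from_config(example))
```

With `binDir` absent from the configuration, the helper returns only the default directory, which corresponds to the behavior reported for 4.2 in this bug.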
Thank you for the update Doug. I found that we write PATH into CNI_PATH: https://github.com/openshift/containernetworking-plugins/blob/ab8f244f28035eb06c0f5416bd1964c541b38a6e/pkg/testutils/cmd.go#L36 - maybe the PATH has changed?
Nevermind my last comment, it doesn't look like that is happening in multus-cni
Upstream PR #376 has been merged. However, I also noticed an oversight on my part: I need to expose the `binDir` CNI configuration option via the entrypoint (on a quick read-through I thought that was already an option, but I was looking at the wrong bin dir option): https://github.com/intel/multus-cni/pull/378

Once these two changes come downstream, we'll also need to make a PR to the CNO to add a `--additional-bin-dir=/opt/multus/bin` flag.
We have some pending changes with pull requests up:
* https://github.com/openshift/multus-cni/pull/30 -- brings the upstream changes previously referenced into the 4.2 release
* https://github.com/openshift/cluster-network-operator/pull/320 -- integrates those changes into the cluster-network-operator by setting --additional-bin-dir=/opt/multus/bin for the Multus entrypoint.
On a UPI vSphere cluster with one master and one worker, /opt/multus/bin is still not being created on 4.2.0-0.nightly-2019-09-20-090334

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-09-20-090334   True        False         5h17m   4.2.0-0.nightly-2019-09-20-090334

$ oc get pods -n openshift-multus
NAME                                READY   STATUS    RESTARTS   AGE
multus-admission-controller-c8kjf   1/1     Running   1          5h35m
multus-xsw8f                        1/1     Running   1          5h35m
multus-z2st6                        1/1     Running   1          5h35m

$ ll /opt/multus/bin
ls: cannot access /opt/multus/bin: No such file or directory

$ ll /var/lib/cni/bin
total 107904
-rwxr-xr-x. 1 root root  4016716 Sep 20 13:24 bandwidth
-rwxr-xr-x. 1 root root  4459109 Sep 20 13:24 bridge
-rwxr-xr-x. 1 root root 11188325 Sep 20 13:24 dhcp
-rwxr-xr-x. 1 root root  5706536 Sep 20 13:24 firewall
-rwxr-xr-x. 1 root root  2937585 Sep 20 13:24 flannel
-rwxr-xr-x. 1 root root  3982836 Sep 20 13:24 host-device
-rwxr-xr-x. 1 root root  3465116 Sep 20 13:24 host-local
-rwxr-xr-x. 1 root root  4141112 Sep 20 13:24 ipvlan
-rwxr-xr-x. 1 root root  3060578 Sep 20 13:24 loopback
-rwxr-xr-x. 1 root root  4212551 Sep 20 13:24 macvlan
-rwxr-xr-x. 1 root root 35627186 Sep 20 13:24 multus
-rwxr-xr-x. 1 root root  6005728 Sep 20 13:24 openshift-sdn
-rwxr-xr-x. 1 root root  3945238 Sep 20 13:24 portmap
-rwxr-xr-x. 1 root root  4389701 Sep 20 13:24 ptp
-rwxr-xr-x. 1 root root  3269265 Sep 20 13:24 sbr
-rwxr-xr-x. 1 root root  2761580 Sep 20 13:24 static
-rwxr-xr-x. 1 root root  3138879 Sep 20 13:24 tuning
-rwxr-xr-x. 1 root root  4141022 Sep 20 13:24 vlan
Not sure if ServiceMesh operator in 4.2 installation is mandatory to check this
Hrmmm, this change shouldn't actually create /opt/multus/bin -- it just enables you to put plugins there. Daniel -- did /opt/multus/bin already exist when you were putting plugins there originally, and is its existence a requirement? Or can that directory be created at the time you drop the binary in /opt/multus/bin? If it is a requirement, we can look into getting a PR into the machine config operator to have the directory created.
We're creating the directory as part of dropping the binary - if it doesn't exist. Not sure if it exists by default on 4.1
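The create-if-missing step described above can be sketched roughly as follows. This is an illustration only, not the installer referred to in this comment: the helper name `install_plugin` and the 0755 file mode are assumptions.

```python
import os
import shutil


def install_plugin(src: str, dest_dir: str) -> str:
    """Copy a CNI binary into dest_dir, creating the directory if needed (hypothetical helper)."""
    # exist_ok=True makes this a no-op when the directory is already there.
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, os.path.basename(src))
    shutil.copy2(src, dest)
    # Assumed mode: make sure the copied file is executable.
    os.chmod(dest, 0o755)
    return dest
```

On a node this would be called along the lines of `install_plugin("./my-cni-plugin", "/opt/multus/bin")`, where `my-cni-plugin` is a placeholder name, and it would need sufficient privileges to write under /opt.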
Yeah, I don't think this is a problem. It's OK if the directory doesn't exist, but we will look there if it exists. Back to you, Anurag. You'll need to make the directory as part of the test.
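To illustrate the "look there if it exists" behavior, here is a minimal sketch of a lookup that tolerates missing directories. It is not the Multus implementation; the function name `find_plugin` and the first-match-wins order are assumptions.

```python
import os
from typing import Iterable, Optional


def find_plugin(name: str, search_dirs: Iterable[str]) -> Optional[str]:
    """Return the first executable named `name` in search_dirs, or None (hypothetical helper)."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        # isfile() is simply False for a missing directory, so absent
        # directories are skipped instead of raising an error.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

Under these assumptions, `find_plugin("my-cni-plugin", ["/var/lib/cni/bin", "/opt/multus/bin"])` returns `None` while /opt/multus/bin is absent and the plugin is not in /var/lib/cni/bin, and starts returning the path once the directory and binary have been created.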
Okay, it seems that /opt/multus/bin doesn't exist in 4.1, but /opt/cni/bin is present, and /opt/multus/bin can be created. I will go ahead and check on 4.2. Thanks
Verifying this bug on 4.2.0-0.nightly-2019-09-22-222738. Daniel, please re-open this if it conflicts at your end.

I can create /opt/multus/bin on 4.2 now

[core@compute-0 ~]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-09-22-222738   True        False         5m56s   Cluster version is 4.2.0-0.nightly-2019-09-22-222738
[core@compute-0 ~]$ sudo mkdir /opt/multus/
[core@compute-0 ~]$ sudo mkdir /opt/multus/bin
[core@compute-0 ~]$ ll /opt/multus/bin
total 0

Thanks!
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922