Bug 1679036
Summary: | [Multus-cni] The multus-cni cannot call openshift-sdn to clean up the ipam file when the pod falls into failed status | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Meng Bo <bmeng>
Component: | Networking | Assignee: | Douglas Smith <dosmith>
Status: | CLOSED ERRATA | QA Contact: | Meng Bo <bmeng>
Severity: | medium | Docs Contact: |
Priority: | high | |
Version: | 4.1.0 | CC: | aos-bugs, cdc
Target Milestone: | --- | |
Target Release: | 4.1.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Last Closed: | 2019-06-04 10:44:14 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Description (Meng Bo, 2019-02-20 08:03:27 UTC)
Good catch. This is definitely a release blocker.

Meng Bo, thanks a bunch for the detailed instructions on replicating the issue. I'm able to replicate it with the given instructions. I hacked in my own logging, like so:

```
# cat /etc/kubernetes/cni/net.d/00-multus.conf
{
  "name": "multus-cni-network",
  "type": "multus",
  "logFile": "/var/log/multus.log",
  "logLevel": "debug",
  "namespaceIsolation": true,
  "kubeconfig": "/etc/kubernetes/cni/net.d/multus.d/multus.kubeconfig",
  "delegates": [
    {
      "cniVersion": "0.2.0",
      "name": "openshift-sdn",
      "type": "openshift-sdn"
    }
  ]
}
```

Additionally, I used a node label and node selector to assign the pod to a particular node; there is more detail in my investigation notes here: https://gist.github.com/dougbtv/31b53730afc11eeffee30f30907d1060

There were no logs on deletion. My next step is to look into how and why that happens, but it is almost as if Multus was never called.

FYI, you can stop the network operator and do your own customizations for development. The instructions are at https://github.com/openshift/cluster-network-operator#stopping-the-deployed-operators

I've also been able to replicate this in an upstream Kubernetes lab, and I've filed an upstream issue here: https://github.com/intel/multus-cni/issues/267

I've been able to isolate the issue: Multus returns too early in the `cmdDel` function when it cannot find the netns, and therefore never calls the delegated CNI plugin during delete. My fix simply logs a warning to the debug logs and continues, allowing the delegates to be called. Proposed fix: https://github.com/intel/multus-cni/pull/269

The pull request has landed upstream and has been merged downstream; it should be available in the next build of the downstream image.

Can this be marked as MODIFIED? Has this been brought downstream?

Thanks Casey, it has indeed been brought downstream; I've marked it as MODIFIED.
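The control-flow change described above can be illustrated with a minimal sketch. This is not the actual Multus source; `cmdDel`, `errNoNetns`, and the delegate plumbing here are invented stand-ins that only show the pattern: on a missing netns, warn and fall through so the delegate plugin still receives the DEL and can release its IPAM allocation.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// errNoNetns simulates the "cannot open netns" condition hit during a CNI DEL
// for a pod whose network namespace is already gone (e.g. a failed pod).
// Hypothetical value, for illustration only.
var errNoNetns = errors.New("failed to open netns")

// cmdDel sketches the fixed behavior: a netns error produces a warning instead
// of an early return, so every delegate (e.g. openshift-sdn) is still invoked.
// It returns the names of the delegates it invoked.
func cmdDel(netnsErr error, delegates []string) []string {
	if netnsErr != nil {
		// Before the fix, Multus returned here, skipping delegate cleanup
		// and leaking the IPAM file. After the fix: warn and continue.
		fmt.Fprintf(os.Stderr, "warning: %v; continuing with delegate DEL\n", netnsErr)
	}
	invoked := make([]string, 0, len(delegates))
	for _, name := range delegates {
		fmt.Printf("DEL delegated to %s\n", name)
		invoked = append(invoked, name)
	}
	return invoked
}

func main() {
	// Even though the netns lookup failed, openshift-sdn still gets the DEL.
	cmdDel(errNoNetns, []string{"openshift-sdn"})
}
```

This mirrors why the IPAM file was left behind: cleanup lives in the delegate, so any path that skips the delegate on DEL leaks the allocation.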
Tested on 4.0.0-0.nightly-2019-03-14-040908. The issue has been fixed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758