+++ This bug was initially created as a clone of Bug #1816746 +++

Description of problem:

I have specified an additionalNetwork containing "ipam": {"type": "static"}. When I try to add this network to a pod specifying a specific IP, the additional interface is silently ignored. The pod comes up with no additional network interfaces.

The full pod definition is:

===
apiVersion: v1
kind: Pod
metadata:
  name: busybox1
  labels:
    app: busybox1
  annotations:
    k8s.v1.cni.cncf.io/networks: '[ { "name": "osp-internalapi-static", "ips": "192.168.222.1/24" } ]'
spec:
  containers:
  - image: busybox
    command:
    - sleep
    - "3600"
    imagePullPolicy: IfNotPresent
    name: busybox
  restartPolicy: Always
===

Note that I am not able to specify ips as an array in the above because it is rejected as invalid input.

The full additionalNetworks stanza is:

===
- name: osp-internalapi-static
  namespace: default
  rawCNIConfig: '{ "cniVersion": "0.3.1", "type": "bridge", "bridge": "br-ospinfra", "vlan": 100, "capabilities": { "ips": true }, "ipam": { "type": "static" } }'
  type: Raw
===

Note that if I add the IP address to the CNI definition (in ipam.addresses) and remove it from the pod definition, then the network is created as expected.

In debugging, Tomofumi Hayashi asked me to try with the multus admission controller disabled, which I did with:

===
oc -n openshift-cluster-version scale --replicas=0 deploy/cluster-version-operator
oc -n openshift-multus delete daemonset/multus-admission-controller
===

Unfortunately this didn't help. Reproduced, with logs:

===
$ oc get pod busybox1 -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.ovn.org/pod-networks: '{"default":{"ip_address":"10.129.2.6/23","mac_address":"fa:30:ef:81:02:07","gateway_ip":"10.129.2.1"}}'
    k8s.v1.cni.cncf.io/networks: '[{"name":"osp-internalapi-static","namespace":"default","ips":"192.168.222.1/24","mac":"02:8c:0c:00:00:0d"}]'
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "ovn-kubernetes",
          "interface": "eth0",
          "ips": [
              "10.129.2.6"
          ],
          "mac": "fa:30:ef:81:02:07",
          "dns": {}
      }]
  creationTimestamp: "2020-03-24T15:58:32Z"
  labels:
    app: busybox1
  name: busybox1
  namespace: default
  resourceVersion: "4137705"
  selfLink: /api/v1/namespaces/default/pods/busybox1
  uid: bf61b98f-d979-4d61-8780-424f7a6dcb0c
spec:
  containers:
  - command:
    - sleep
    - "3600"
    image: busybox
    imagePullPolicy: IfNotPresent
    name: busybox
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-mw227
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: default-dockercfg-rxcjv
  nodeName: worker-2
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: default-token-mw227
    secret:
      defaultMode: 420
      secretName: default-token-mw227
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-03-24T15:58:32Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-03-24T15:58:35Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-03-24T15:58:35Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-03-24T15:58:32Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://b7bc1fc4408acee83e31e1a7b2fc5493704606e053c3f4b3a7e45b45f195af94
    image: docker.io/library/busybox:latest
    imageID: docker.io/library/busybox@sha256:afe605d272837ce1732f390966166c2afff5391208ddd57de10942748694049d
    lastState: {}
    name: busybox
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2020-03-24T15:58:34Z"
  hostIP: 192.168.111.25
  phase: Running
  podIP: 10.129.2.6
  podIPs:
  - ip: 10.129.2.6
  qosClass: BestEffort
  startTime: "2020-03-24T15:58:32Z"
===

===
$ oc get net-attach-def/osp-internalapi-static -o yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  creationTimestamp: "2020-03-23T16:53:58Z"
  generation: 9
  name: osp-internalapi-static
  namespace: default
  ownerReferences:
  - apiVersion: operator.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: Network
    name: cluster
    uid: a1738e2d-ddd8-43a2-bae9-0bfa68c635ac
  resourceVersion: "3997440"
  selfLink: /apis/k8s.cni.cncf.io/v1/namespaces/default/network-attachment-definitions/osp-internalapi-static
  uid: fc764ba7-d6f4-4a14-8700-17a5b8a3983f
spec:
  config: '{ "cniVersion": "0.3.1", "type": "bridge", "bridge": "br-ospinfra", "vlan": 100, "capabilities": { "ips": true }, "ipam": { "type": "static" } }'
===

The pod does not have a multus interface:

===
$ kubectl exec busybox1 ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0@if79: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1400 qdisc noqueue
    link/ether fa:30:ef:81:02:07 brd ff:ff:ff:ff:ff:ff
===

The logs from worker-2 during the above are attached to this BZ as kubelet-crio.log.

Version-Release number of selected component (if applicable):
4.5.0-0.nightly-2020-03-17-091701

How reproducible:
Always

--- Additional comment from Matthew Booth on 2020-03-25 09:54:14 GMT ---

It appears that the error was in the original pod definition (the `ips` value must be an array), but the valid array syntax was being rejected by two separate admission controllers: multus-admission-controller and kubemacpool-mac-controller-manager. The rejected input is:

===
apiVersion: v1
kind: Pod
metadata:
  name: busybox1
  labels:
    app: busybox1
  annotations:
    k8s.v1.cni.cncf.io/networks: '[ { "name": "osp-internalapi-static", "ips": [ "192.168.222.1/24" ] } ]'
spec:
  containers:
  - image: busybox
    command:
    - sleep
    - "3600"
    imagePullPolicy: IfNotPresent
    name: busybox
  restartPolicy: Always
===

The failure is:

===
$ oc create -f busybox.yaml
Error from server: error when creating "busybox.yaml": admission webhook "mutatepods.example.com" denied the request: parsePodNetworkAnnotation: failed to parse pod Network Attachment Selection Annotation JSON format: json: cannot unmarshal array into Go struct field NetworkSelectionElement.ips of type string
===

The workaround is to disable both admission controllers:

oc -n openshift-cluster-version scale --replicas=0 deploy/cluster-version-operator
oc -n openshift-network-operator scale --replicas=0 deploy/network-operator
oc -n openshift-cnv scale --replicas=0 deploy/kubemacpool-mac-controller-manager
oc -n openshift-multus delete daemonset/multus-admission-controller

With this in place the above pod is created successfully, and the additional multus interface is added with the correct static IP.
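To illustrate the parse failure quoted above: the Network Plumbing WG selection-annotation spec defines "ips" as a list of addresses, so unmarshalling it into a Go struct whose ips field is a plain string fails exactly as reported, while a []string field accepts it. The following is a minimal standalone sketch with illustrative type names, not the actual multus or kubemacpool source:

===
package main

import (
	"encoding/json"
	"fmt"
)

// Broken variant: "ips" declared as a single string (shape inferred
// from the error message quoted in this report).
type brokenSelectionElement struct {
	Name string `json:"name"`
	Ips  string `json:"ips,omitempty"`
}

// Fixed variant: "ips" as a list, matching the annotation spec.
type fixedSelectionElement struct {
	Name string   `json:"name"`
	Ips  []string `json:"ips,omitempty"`
}

func main() {
	annotation := []byte(`[{"name":"osp-internalapi-static","ips":["192.168.222.1/24"]}]`)

	var broken []brokenSelectionElement
	if err := json.Unmarshal(annotation, &broken); err != nil {
		// Prints: json: cannot unmarshal array into Go struct field
		// brokenSelectionElement.ips of type string
		fmt.Println("broken struct:", err)
	}

	var fixed []fixedSelectionElement
	if err := json.Unmarshal(annotation, &fixed); err == nil {
		fmt.Println("fixed struct parsed ips:", fixed[0].Ips)
	}
}
===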
Upstream PR: https://github.com/k8snetworkplumbingwg/net-attach-def-admission-controller/pull/39
(In reply to Tomofumi Hayashi from comment #1)
> Upstream PR:
> https://github.com/k8snetworkplumbingwg/net-attach-def-admission-controller/pull/39

Sorry, that PR is not for this BZ...
@Meni, could you please verify that we are able to create a VM with a secondary network on 2.3 with KubeMacPool involved? I'm afraid this may be a blocker for 2.3.
Just found out this issue happens only when IPAM is used for the secondary network. IPAM is not documented in our docs AFAIK; we recommend using basic L2 and handling addressing via a DHCP server running on the network.
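For reference, the recommended IPAM-less (basic L2) attachment would look like the additionalNetworks stanza from this report with the static IPAM dropped; an empty "ipam" dictionary is the bridge plugin's documented L2-only form, and the workload then obtains its address from whatever DHCP server runs on that segment. A minimal sketch (the name is hypothetical; bridge and VLAN values are reused from this report):

===
- name: osp-internalapi-l2
  namespace: default
  rawCNIConfig: '{ "cniVersion": "0.3.1", "type": "bridge", "bridge": "br-ospinfra", "vlan": 100, "ipam": {} }'
  type: Raw
===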
But since we reconcile Pods too, it affects all workloads. I suggest treating this as a blocker. We will resolve it simply by disabling the webhook on Pods.
Since we don't keep the webhook configuration in CNAO but it is generated by KMP, we need to change the sources of KMP itself and backport the fix.
Petr, all our secondary networks use the MAC pool (by default). We don't set IPAM.
Thanks. The issue seems to affect only Pods/VMs using IPAM. We have to fix it so it does not break secondary networks on Pods in OpenShift.
Since it is so late in the release cycle, we are disabling KMP in 2.3.
Docs impact is a Known Issue in the 2.3 Release Notes. PR: https://github.com/openshift/openshift-docs/pull/21431

@Nelly, can you please assign someone to QE review?
LGTM, but since I don't see the generated doc, I can't tell if the {CNVProductName} and {CNVVersion} params are working properly (no other known issues use them).
Meni, could you please test the upgrade from 2.2 to 2.3 and make sure that KMP is removed during it?
After upgrading the cluster from OCP 4.3 + CNV 2.2 to OCP 4.4 + CNV 2.3, KMP no longer exists.
Adjusted the doc text to make it clear that this issue only affects VMs that don't have an explicit MAC address set.
This bug was listed under Known Issues for the CNV 2.3 release. Since it was closed for the current release, I am deleting this write-up from the Known Issues section of the CNV 2.4 release notes. [lmandavi]