Bug 1780387 - openshift-sdn reports error trying to add network, pods and tests fail
Summary: openshift-sdn reports error trying to add network, pods and tests fail
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.4.0
Assignee: Phil Cameron
QA Contact: zhaozhanqi
URL:
Whiteboard:
Duplicates: 1781242
Depends On:
Blocks: 1782312
Reported: 2019-12-05 20:28 UTC by Clayton Coleman
Modified: 2020-05-04 11:19 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Clones: 1782312
Environment:
Last Closed: 2020-05-04 11:18:37 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift sdn pull 82 | 0 | None | closed | Bug 1780387: host-local plugin should be built and executed within container | 2021-02-07 17:35:57 UTC
Red Hat Product Errata | RHBA-2020:0581 | 0 | None | None | None | 2020-05-04 11:19:10 UTC

Description Clayton Coleman 2019-12-05 20:28:53 UTC
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-serial-4.4/213

Dec  5 17:08:33.496: INFO: At 2019-12-05 17:03:24 +0000 UTC - event for security-context-36fbcd66-e199-4993-981e-62f7a6288837: {kubelet ci-op-vnwdk-w-c-jwd8m.c.openshift-gce-devel-ci.internal} FailedCreatePodSandBox: Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_security-context-36fbcd66-e199-4993-981e-62f7a6288837_e2e-persistent-local-volumes-test-624_a838fc22-8f6e-4b77-bd1b-a25c2ae40b37_0(6df464f3fbea81157862da75950ecdf1732823e56d9b2aeda6a567763090b41d): Multus: error adding pod to network "openshift-sdn": delegateAdd: error invoking DelegateAdd - "openshift-sdn": error in getting result from AddNetwork: CNI request failed with status 400: 'failed to run IPAM for 6df464f3fbea81157862da75950ecdf1732823e56d9b2aeda6a567763090b41d: failed to run CNI IPAM ADD: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input

This has been happening fairly frequently over the last two weeks, in 4.3, 4.4, and upgrades from 4.2.

https://search.svc.ci.openshift.org/?search=failed+to+run+IPAM+for+&maxAge=336h&context=-1&type=build-log

Comment 1 Clayton Coleman 2019-12-05 20:55:31 UTC
host-local is coredumping

Dec 05 14:55:50 ip-10-0-135-76 systemd-coredump[6383]: Process 6372 (host-local) of user 0 dumped core.
                                                       Stack trace of thread 6372:
                                                       #0  0x000000000044381d n/a (/host/var/opt/cni/bin/host-local)
                                                       #1  0x00000000004437b4 n/a (/host/var/opt/cni/bin/host-local)
                                                       #2  0x000000000042cf58 n/a (/host/var/opt/cni/bin/host-local)
                                                       #3  0x0000000000451e0a n/a (/host/var/opt/cni/bin/host-local)

Also something from openshift-sdn

Fatal error: bad TinySizeClass
runtime: panic before malloc heap initialized
runtime stack:
runtime.throw(0x55e6c1, 0x11)
	/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/panic.go:617 +0x72 fp=0x7fff436c0198 sp=0x7fff436c0168 pc=0x42a542
runtime.mallocinit()
	/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/malloc.go:361 +0x244 fp=0x7fff436c01c8 sp=0x7fff436c0198 pc=0x40a2c4
runtime.schedinit()
	/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/proc.go:539 +0x62 fp=0x7fff436c0220 sp=0x7fff436c01c8 pc=0x42cf62
runtime.rt0_go(0x7fff436c0258, 0x1, 0x7fff436c0258, 0x0, 0x0, 0x1, 0x7fff436c1c55, 0x0, 0x7fff436c1c72, 0x7fff436c1c82, ...)
	/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/asm_amd64.s:195 +0x11a fp=0x7fff436c0228 sp=0x7fff436c0220 pc=0x451e0a
fatal error: bad TinySizeClass
runtime: panic before malloc heap initialized
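
A note on how the two symptoms may fit together: if host-local dies during Go runtime startup, it never writes a CNI result or error object to stdout, so the caller has nothing to parse. The sketch below is not the libcni code; `cniError` and `parseDiagnostic` are hypothetical names, and the struct is a simplified stand-in for a CNI error object. It only shows that decoding empty plugin output with encoding/json produces the "unexpected end of JSON input" text seen in the event above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cniError is a simplified, hypothetical stand-in for the error object
// a CNI plugin is expected to print on failure.
type cniError struct {
	Code    uint   `json:"code"`
	Msg     string `json:"msg"`
	Details string `json:"details,omitempty"`
}

// parseDiagnostic decodes whatever the plugin wrote to stdout.
// The wrapping message imitates the wording in the event above.
func parseDiagnostic(stdout []byte) (*cniError, error) {
	var e cniError
	if err := json.Unmarshal(stdout, &e); err != nil {
		return nil, fmt.Errorf("netplugin failed but error parsing its diagnostic message %q: %v", string(stdout), err)
	}
	return &e, nil
}

func main() {
	// A plugin that crashed before printing anything leaves stdout empty.
	_, err := parseDiagnostic(nil)
	fmt.Println(err)
}
```

So the JSON error is most likely a side effect of the crash rather than a separate problem.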

Comment 3 Casey Callendrello 2019-12-10 14:38:58 UTC
The problem is that openshift-sdn is executing the host-local plugin installed on the *host* by multus.

We should stop doing that. We need to package our own host-local and stop using the bind-mounted one.

Phil, can you get to this today?

Comment 4 Vadim Rutkovsky 2019-12-10 15:00:17 UTC
*** Bug 1781242 has been marked as a duplicate of this bug. ***

Comment 5 Douglas Smith 2019-12-10 16:29:45 UTC
Pull request to build host-local within the openshift-sdn build process and reference the container-local binary: https://github.com/openshift/sdn/pull/82

Comment 9 errata-xmlrpc 2020-05-04 11:18:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

