Bug 1782312

Summary: openshift-sdn reports error trying to add network, pods and tests fail
Product: OpenShift Container Platform
Reporter: Ben Bennett <bbennett>
Component: Networking
Assignee: Casey Callendrello <cdc>
Networking sub component: openshift-sdn
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: unspecified
CC: anusaxen, ccoleman, dosmith, nagrawal, pcameron, vrutkovs, zzhao
Version: 4.3.0
Target Milestone: ---
Target Release: 4.3.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1780387
Environment:
Last Closed: 2020-01-23 11:18:55 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1780387
Bug Blocks:

Description Ben Bennett 2019-12-11 14:17:56 UTC
+++ This bug was initially created as a clone of Bug #1780387 +++

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-gcp-serial-4.4/213

Dec  5 17:08:33.496: INFO: At 2019-12-05 17:03:24 +0000 UTC - event for security-context-36fbcd66-e199-4993-981e-62f7a6288837: {kubelet ci-op-vnwdk-w-c-jwd8m.c.openshift-gce-devel-ci.internal} FailedCreatePodSandBox: Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_security-context-36fbcd66-e199-4993-981e-62f7a6288837_e2e-persistent-local-volumes-test-624_a838fc22-8f6e-4b77-bd1b-a25c2ae40b37_0(6df464f3fbea81157862da75950ecdf1732823e56d9b2aeda6a567763090b41d): Multus: error adding pod to network "openshift-sdn": delegateAdd: error invoking DelegateAdd - "openshift-sdn": error in getting result from AddNetwork: CNI request failed with status 400: 'failed to run IPAM for 6df464f3fbea81157862da75950ecdf1732823e56d9b2aeda6a567763090b41d: failed to run CNI IPAM ADD: netplugin failed but error parsing its diagnostic message "": unexpected end of JSON input

Looks like this has been happening somewhat frequently over the last two weeks, in 4.3, 4.4, and in upgrades from 4.2:

https://search.svc.ci.openshift.org/?search=failed+to+run+IPAM+for+&maxAge=336h&context=-1&type=build-log
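
For context on the error chain above: the CNI ADD is handled by executing an IPAM binary (host-local), which is expected to print a JSON result, or a JSON error object, on stdout. If that process dies before writing anything, the caller's decode of the empty output fails with "unexpected end of JSON input", which is the message quoted in the event. Below is a minimal, hypothetical Go sketch of that invoke-and-parse step; the helper name, paths, and network config are illustrative only, not openshift-sdn's actual code.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os"
	"os/exec"
)

// runIPAM is a hypothetical helper, not openshift-sdn's real code. It runs a
// CNI IPAM binary the way a CNI caller would: network config on stdin, the
// standard CNI_* environment variables set, and the result read as JSON from
// stdout.
func runIPAM(pluginPath, containerID string, netconf []byte) (map[string]interface{}, error) {
	cmd := exec.Command(pluginPath)
	cmd.Stdin = bytes.NewReader(netconf)
	cmd.Env = append(os.Environ(),
		"CNI_COMMAND=ADD",
		"CNI_CONTAINERID="+containerID,
		"CNI_IFNAME=eth0",
		"CNI_NETNS=/var/run/netns/example", // illustrative netns path
		"CNI_PATH=/opt/cni/bin",            // illustrative plugin search path
	)
	var stdout bytes.Buffer
	cmd.Stdout = &stdout
	runErr := cmd.Run()

	// A plugin is expected to print a JSON result (or a JSON error object) on
	// stdout. If it crashed before writing anything -- as in the host-local
	// core dump below -- stdout is empty and decoding it fails with
	// "unexpected end of JSON input", which masks the real failure.
	var result map[string]interface{}
	if err := json.Unmarshal(stdout.Bytes(), &result); err != nil {
		return nil, fmt.Errorf("failed to run CNI IPAM ADD: netplugin error %v; error parsing its diagnostic message %q: %v",
			runErr, stdout.String(), err)
	}
	return result, runErr
}

func main() {
	conf := []byte(`{"cniVersion":"0.3.1","name":"example","type":"bridge","ipam":{"type":"host-local"}}`)
	_, err := runIPAM("/opt/cni/bin/host-local", "example-container", conf)
	fmt.Println(err)
}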

--- Additional comment from Clayton Coleman on 2019-12-05 15:55:31 EST ---

host-local is coredumping

Dec 05 14:55:50 ip-10-0-135-76 systemd-coredump[6383]: Process 6372 (host-local) of user 0 dumped core.
                                                       Stack trace of thread 6372:
                                                       #0  0x000000000044381d n/a (/host/var/opt/cni/bin/host-local)
                                                       #1  0x00000000004437b4 n/a (/host/var/opt/cni/bin/host-local)
                                                       #2  0x000000000042cf58 n/a (/host/var/opt/cni/bin/host-local)
                                                       #3  0x0000000000451e0a n/a (/host/var/opt/cni/bin/host-local)

Also, something from openshift-sdn:

Fatal error: bad TinySizeClass
runtime: panic before malloc heap initialized
runtime stack:
runtime.throw(0x55e6c1, 0x11)
	/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/panic.go:617 +0x72 fp=0x7fff436c0198 sp=0x7fff436c0168 pc=0x42a542
runtime.mallocinit()
	/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/malloc.go:361 +0x244 fp=0x7fff436c01c8 sp=0x7fff436c0198 pc=0x40a2c4
runtime.schedinit()
	/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/proc.go:539 +0x62 fp=0x7fff436c0220 sp=0x7fff436c01c8 pc=0x42cf62
runtime.rt0_go(0x7fff436c0258, 0x1, 0x7fff436c0258, 0x0, 0x0, 0x1, 0x7fff436c1c55, 0x0, 0x7fff436c1c72, 0x7fff436c1c82, ...)
	/opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/asm_amd64.s:195 +0x11a fp=0x7fff436c0228 sp=0x7fff436c0220 pc=0x451e0a
fatal error: bad TinySizeClass
runtime: panic before malloc heap initialized

--- Additional comment from Clayton Coleman on 2019-12-05 16:00:21 EST ---

That chunk was from https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/12105

--- Additional comment from Casey Callendrello on 2019-12-10 09:38:58 EST ---

The problem is that openshift-sdn is executing the host-local plugin installed on the *host* by multus.

We should stop doing that. We need to package our own host-local and stop using the bind-mounted one.

Phil, can you get to this today?
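
To illustrate the lookup problem (a hedged sketch, not the actual openshift-sdn source): CNI delegation resolves an IPAM plugin by name against a list of directories, normally taken from the CNI_PATH environment variable, so the generic name "host-local" can resolve to whichever copy appears first on that path -- here the bind-mounted host binary seen in the core dump (/host/var/opt/cni/bin/host-local) rather than one built for the sdn image. The directory names below come from the core dump above and the verification in comment 4 (/usr/bin/cni/osdn-host-local); the search order is an assumption for illustration.

package main

import (
	"fmt"

	"github.com/containernetworking/cni/pkg/invoke"
)

func main() {
	// Assumed search order: the host-mounted directory (populated by multus)
	// ahead of the image-local directory.
	searchPath := []string{"/host/var/opt/cni/bin", "/usr/bin/cni"}

	// Resolving the generic plugin name picks up the host's copy first.
	hostCopy, err := invoke.FindInPath("host-local", searchPath)
	fmt.Println(hostCopy, err)

	// Building and invoking a distinctly named copy shipped inside the image
	// (comment 4 below verifies /usr/bin/cni/osdn-host-local) can only ever
	// match the binary that was built with the image.
	ownCopy, err := invoke.FindInPath("osdn-host-local", []string{"/usr/bin/cni"})
	fmt.Println(ownCopy, err)
}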

--- Additional comment from Vadim Rutkovsky on 2019-12-10 10:00:17 EST ---



--- Additional comment from Douglas Smith on 2019-12-10 11:29:45 EST ---

Pull request to build host-local within the openshift-sdn build process and reference the container-local binary: https://github.com/openshift/sdn/pull/82

Comment 2 Anurag saxena 2019-12-16 19:15:14 UTC
Casey, I don't see many informer-related logs on the recent build (4.3.0-0.nightly-2019-12-13-180405) except for entries like k8s.io/client-go, which is okay as per the PR code. Let me know if there is a better way to test this.

I am skeptical about the "https://raw.githubusercontent.com/anuragthehatter/v3-testfiles/master/networking/list_for_pods.json" logs though; not sure whether these are expected.

$ oc logs sdn-gp6jt | grep -i informer
I1216 13:47:43.172149    2377 shared_informer.go:197] Waiting for caches to sync for service config
I1216 13:47:43.172204    2377 shared_informer.go:197] Waiting for caches to sync for endpoints config
I1216 13:47:43.272369    2377 shared_informer.go:204] Caches are synced for endpoints config 
I1216 13:47:43.272650    2377 shared_informer.go:204] Caches are synced for service config 
W1216 13:52:02.891270    2377 reflector.go:299] k8s.io/client-go/informers/factory.go:134: watch of *v1.Namespace ended with: too old resource version: 10781 (12298)
W1216 13:52:02.891358    2377 reflector.go:299] k8s.io/client-go/informers/factory.go:134: watch of *v1.NetworkPolicy ended with: too old resource version: 6151 (12311)
W1216 13:52:02.976055    2377 reflector.go:299] github.com/openshift/client-go/network/informers/externalversions/factory.go:101: watch of *v1.EgressNetworkPolicy ended with: too old resource version: 9255 (12643)
W1216 13:52:03.472013    2377 reflector.go:299] github.com/openshift/client-go/network/informers/externalversions/factory.go:101: watch of *v1.HostSubnet ended with: too old resource version: 10783 (15852)
W1216 13:52:03.525101    2377 reflector.go:299] github.com/openshift/client-go/network/informers/externalversions/factory.go:101: watch of *v1.NetNamespace ended with: too old resource version: 10764 (15852)
W1216 13:53:58.758647    2377 reflector.go:299] k8s.io/client-go/informers/factory.go:134: watch of *v1.NetworkPolicy ended with: too old resource version: 12311 (15938)
W1216 13:53:58.758794    2377 reflector.go:299] k8s.io/client-go/informers/factory.go:134: watch of *v1.Namespace ended with: too old resource version: 12298 (15930)
W1216 13:54:00.919699    2377 reflector.go:299] github.com/openshift/client-go/network/informers/externalversions/factory.go:101: watch of *v1.NetNamespace ended with: too old resource version: 15852 (16829)
W1216 13:54:00.919733    2377 reflector.go:299] github.com/openshift/client-go/network/informers/externalversions/factory.go:101: watch of *v1.EgressNetworkPolicy ended with: too old resource version: 12643 (16816)
W1216 13:54:01.533256    2377 reflector.go:299] github.com/openshift/client-go/network/informers/externalversions/factory.go:101: watch of *v1.HostSubnet ended with: too old resource version: 15852 (16827)
W1216 19:10:39.232325    2377 reflector.go:299] k8s.io/client-go/informers/factory.go:134: watch of *v1.NetworkPolicy ended with: too old resource version: 15938 (16840)
W1216 19:10:39.670314    2377 reflector.go:299] github.com/openshift/client-go/network/informers/externalversions/factory.go:101: watch of *v1.EgressNetworkPolicy ended with: too old resource version: 16816 (97402)
W1216 19:10:39.778000    2377 reflector.go:299] github.com/openshift/client-go/network/informers/externalversions/factory.go:101: watch of *v1.HostSubnet ended with: too old resource version: 16827 (97399)

Comment 3 Anurag saxena 2019-12-16 19:22:57 UTC
Sorry, disregard the above comment. It was supposed to be for another bug :)

Comment 4 zhaozhanqi 2019-12-19 07:04:12 UTC
Verified this bug on 4.3.0-0.nightly-2019-12-18-224446
sh-4.2# ls /usr/bin/cni/osdn-host-local 
/usr/bin/cni/osdn-host-local
sh-4.2# 

Did not find this issue in recent e2e testing.

Comment 6 errata-xmlrpc 2020-01-23 11:18:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062