Bug 2041830

Summary: CI: ovn-kubernetes-master-e2e-aws-ovn-windows is broken
Product: OpenShift Container Platform Reporter: Surya Seetharaman <surya>
Component: Networking Assignee: Surya Seetharaman <surya>
Networking sub component: ovn-kubernetes QA Contact: Mike Fiedler <mifiedle>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: mifiedle, trozet
Version: 4.10   
Target Milestone: ---   
Target Release: 4.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-03-10 16:40:33 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Surya Seetharaman 2022-01-18 10:51:21 UTC
Description of problem:
We are seeing panics in the CI runs:

2022-01-15T19:05:10.881566498Z I0115 19:05:10.881524       1 informer.go:294] Successfully synced 'ci-op-pg859vpf-ed916-p8l86-master-2'
2022-01-15T19:05:10.887132241Z I0115 19:05:10.887098       1 master.go:429] Created hybrid overlay logical route policy for node ci-op-pg859vpf-ed916-p8l86-worker-5slgm
2022-01-15T19:05:10.887274695Z E0115 19:05:10.887239       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
2022-01-15T19:05:10.887274695Z goroutine 856 [running]:
2022-01-15T19:05:10.887274695Z k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1bd5fc0, 0x2e22800)
2022-01-15T19:05:10.887274695Z 	/go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x95
2022-01-15T19:05:10.887274695Z k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
2022-01-15T19:05:10.887274695Z 	/go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x86
2022-01-15T19:05:10.887274695Z panic(0x1bd5fc0, 0x2e22800)
2022-01-15T19:05:10.887274695Z 	/usr/lib/golang/src/runtime/panic.go:965 +0x1b9
2022-01-15T19:05:10.887274695Z github.com/ovn-org/ovn-kubernetes/go-controller/pkg/libovsdbops.findDatapathByPredicate(0x0, 0x0, 0xc0033ab360, 0x0, 0x0, 0x0)
2022-01-15T19:05:10.887274695Z 	/go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/libovsdbops/datapath.go:17 +0xc0
2022-01-15T19:05:10.887274695Z github.com/ovn-org/ovn-kubernetes/go-controller/pkg/libovsdbops.FindDatapathByExternalIDs(0x0, 0x0, 0xc002d12ba0, 0x4, 0xc001ec8ce8, 0x2c)
2022-01-15T19:05:10.887274695Z 	/go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/libovsdbops/datapath.go:43 +0x6f
2022-01-15T19:05:10.887274695Z github.com/ovn-org/ovn-kubernetes/go-controller/pkg/util.CreateMACBinding(0x0, 0x0, 0xc0029f7590, 0x2c, 0x1e69f1b, 0x12, 0xc002691508, 0x6, 0x6, 0xc00269155c, ...)
2022-01-15T19:05:10.887274695Z 	/go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/util/ovn.go:21 +0xb9
2022-01-15T19:05:10.887274695Z github.com/ovn-org/ovn-kubernetes/go-controller/hybrid-overlay/pkg/controller.(*MasterController).setupHybridLRPolicySharedGw(0xc000cc9f00, 0xc000114758, 0x1, 0x1, 0xc002e1d410, 0x27, 0xc002691508, 0x6, 0x6, 0x0, ...)
2022-01-15T19:05:10.887274695Z 	/go/src/github.com/openshift/ovn-kubernetes/go-controller/hybrid-overlay/pkg/controller/master.go:433 +0xa5e
2022-01-15T19:05:10.887274695Z github.com/ovn-org/ovn-kubernetes/go-controller/hybrid-overlay/pkg/controller.(*MasterController).handleOverlayPort(0xc000cc9f00, 0xc002003b00, 0x2107510, 0xc002ca2480, 0x0, 0x7b3a227465477074)
2022-01-15T19:05:10.887274695Z 	/go/src/github.com/openshift/ovn-kubernetes/go-controller/hybrid-overlay/pkg/controller/master.go:277 +0xd85
2022-01-15T19:05:10.887274695Z github.com/ovn-org/ovn-kubernetes/go-controller/hybrid-overlay/pkg/controller.(*MasterController).AddNode(0xc000cc9f00, 0xc002003b00, 0xc001d1b920, 0x27)
2022-01-15T19:05:10.887274695Z 	/go/src/github.com/openshift/ovn-kubernetes/go-controller/hybrid-overlay/pkg/controller/master.go:330 +0x4ac
2022-01-15T19:05:10.887274695Z github.com/ovn-org/ovn-kubernetes/go-controller/hybrid-overlay/pkg/controller.NewMaster.func1(0x1e32460, 0xc002003b00, 0x27, 0x1e32460)
2022-01-15T19:05:10.887274695Z 	/go/src/github.com/openshift/ovn-kubernetes/go-controller/hybrid-overlay/pkg/controller/master.go:71 +0x49
2022-01-15T19:05:10.887274695Z github.com/ovn-org/ovn-kubernetes/go-controller/pkg/informer.(*eventHandler).syncHandler(0xc001420ae0, 0xc001d1b920, 0x27, 0xe00000002e62320, 0x5)
2022-01-15T19:05:10.887274695Z 	/go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/informer/informer.go:335 +0x33e
2022-01-15T19:05:10.887274695Z github.com/ovn-org/ovn-kubernetes/go-controller/pkg/informer.(*eventHandler).processNextWorkItem.func1(0xc001420ae0, 0x1b0fb40, 0xc0013ba610, 0x0, 0x0)
2022-01-15T19:05:10.887274695Z 	/go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/informer/informer.go:280 +0xea
2022-01-15T19:05:10.887274695Z github.com/ovn-org/ovn-kubernetes/go-controller/pkg/informer.(*eventHandler).processNextWorkItem(0xc001420ae0, 0x203000)
2022-01-15T19:05:10.887274695Z 	/go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/informer/informer.go:297 +0x49
2022-01-15T19:05:10.887274695Z github.com/ovn-org/ovn-kubernetes/go-controller/pkg/informer.(*eventHandler).runWorker(...)
2022-01-15T19:05:10.887274695Z 	/go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/informer/informer.go:248
2022-01-15T19:05:10.887274695Z k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc0029b6f98)

Version-Release number of selected component (if applicable):


How reproducible:
90% of CI runs:
https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-windows

We need to backport https://github.com/ovn-org/ovn-kubernetes/pull/2720/commits/51f3d5f669595a8a9efd2e2292faefe359e84543 to fix this issue.
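
Note on the trace: the first two argument words of findDatapathByPredicate, FindDatapathByExternalIDs and CreateMACBinding are all 0x0, 0x0. With this Go toolchain's stack-trace format that is consistent with a nil interface value (most likely the database client) being passed down from the hybrid overlay master controller, although the trace alone does not prove it. The sketch below is not the ovn-kubernetes code and does not describe what the referenced commit changes; it only illustrates, under that assumption, how a method call on a nil interface produces this panic and how a guard turns it into an ordinary error. All names in it (Client, fakeClient, findUnguarded, findGuarded, didPanic, errNilClient) are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Client is a hypothetical stand-in for a database client interface.
// It is NOT the real libovsdb client API.
type Client interface {
	List(result *[]string) error
}

// fakeClient is a trivial in-memory implementation used for illustration.
type fakeClient struct{ rows []string }

func (f *fakeClient) List(result *[]string) error {
	*result = append(*result, f.rows...)
	return nil
}

// errNilClient is returned by the guarded lookup when no client was supplied.
var errNilClient = errors.New("nil client")

// findUnguarded calls a method on c without checking it first.
// If c is a nil interface, the call panics with a nil pointer dereference.
func findUnguarded(c Client) ([]string, error) {
	var out []string
	err := c.List(&out)
	return out, err
}

// findGuarded performs the same lookup but rejects a nil client up front,
// so the caller gets an error instead of a crash.
func findGuarded(c Client) ([]string, error) {
	if c == nil {
		return nil, errNilClient
	}
	var out []string
	err := c.List(&out)
	return out, err
}

// didPanic runs fn and reports whether it panicked.
func didPanic(fn func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	fn()
	return false
}

func main() {
	var c Client // nil interface, as the 0x0, 0x0 argument words suggest

	fmt.Println("unguarded lookup panics:", didPanic(func() { _, _ = findUnguarded(c) }))

	_, err := findGuarded(c)
	fmt.Println("guarded lookup error:", err)
}
```

A guard like this only changes how the failure is reported; the caller still needs to be handed a properly initialized client for the lookup to succeed.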

Comment 5 Mike Fiedler 2022-01-26 17:46:45 UTC
Letting this soak longer in CI. General regression on a winc cluster was successful, but CI is showing a lot of recent failures for FAIL: TestWMCO/network/Pod_DNS_Resolution

Comment 8 Mike Fiedler 2022-02-03 13:23:34 UTC
Verified. The latest CI runs are passing ~70% of the time, and the remaining failures are different from this one.

Comment 10 errata-xmlrpc 2022-03-10 16:40:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056