Bug 1962886 - sigsegv ovnkube-master
Summary: sigsegv ovnkube-master
Keywords:
Status: CLOSED DUPLICATE of bug 1958958
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Ben Bennett
QA Contact: Anurag saxena
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-05-20 17:52 UTC by Alex Krzos
Modified: 2021-05-20 20:27 UTC
CC List: 1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-20 20:27:55 UTC
Target Upstream Version:
Embargoed:



Description Alex Krzos 2021-05-20 17:52:10 UTC
Description of problem:
While installing many SNO clusters via ACM through ZTP, one cluster failed to complete installation, apparently because the ovnkube-master pod is segfaulting.

Version-Release number of selected component (if applicable):
4.8.0-fc.3

How reproducible:
Rare - 1/72 clusters in this environment failed because of this.

Steps to Reproduce:
1.
2.
3.

Actual results:
I0520 15:42:13.515240       1 pods.go:325] LSP already exists for port: openshift-kube-apiserver_revision-pruner-4-sno00007
I0520 15:42:13.522093       1 pods.go:289] [openshift-operator-lifecycle-manager/olm-operator-7bbcc48779-ddjl2] addLogicalPort took 38.838486ms
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x17971e0]

goroutine 1070 [running]:
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn/address_set.(*ovnAddressSets).AddIPs(0x0, 0xc0016a8dc8, 0x1, 0x1, 0x0, 0x0)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/ovn/address_set/address_set.go:292 +0x60
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).addPodToNamespace(0xc0012c0000, 0xc001aaacf0, 0x24, 0xc002230700, 0x0, 0x0)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/ovn/namespace.go:69 +0x114
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).addLogicalPort(0xc0012c0000, 0xc001c03a98, 0x0, 0x0)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/ovn/pods.go:501 +0xf49
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).ensurePod(0xc0012c0000, 0x0, 0xc001c03a98, 0x1, 0x0)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/ovn/ovn.go:507 +0x510
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*Controller).WatchPods.func2(0x1b9f860, 0xc001c03a98)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/ovn/ovn.go:540 +0x5c
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd(...)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:231
k8s.io/client-go/tools/cache.FilteringResourceEventHandler.OnAdd(0xc001d6f340, 0x1e42660, 0xc001d6bfc8, 0x1b9f860, 0xc001c03a98)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:264 +0x6a
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*Handler).OnAdd(...)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/factory/handler.go:38
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.newQueuedInformer.func1.1(0xc002049d90, 0xc0020511d0, 0xc002066600)
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/factory/handler.go:367 +0xaf
created by github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.newQueuedInformer.func1
	/go/src/github.com/openshift/ovn-kubernetes/go-controller/pkg/factory/handler.go:360 +0x10a
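
The first argument shown in the AddIPs frame is 0x0, i.e. the method is being called on a nil *ovnAddressSets receiver, presumably because addPodToNamespace did not get a usable address set for the pod's namespace. A minimal, self-contained Go sketch of that failure mode (not the actual ovn-kubernetes code; the type and field names below are hypothetical):

package main

import "fmt"

// addressSets stands in for ovnAddressSets; the field is illustrative only.
type addressSets struct {
	name string
}

// AddIPs mirrors the shape of (*ovnAddressSets).AddIPs: a pointer-receiver
// method that reads receiver fields without a nil check.
func (as *addressSets) AddIPs(ips []string) error {
	// Dereferencing as.name panics when as == nil.
	fmt.Printf("adding %d IPs to address set %s\n", len(ips), as.name)
	return nil
}

func main() {
	var as *addressSets // nil, e.g. the namespace's address set was never created
	// The call itself is legal Go; the panic ("invalid memory address or nil
	// pointer dereference") happens inside the method when the nil receiver
	// is dereferenced, matching the trace above.
	_ = as.AddIPs([]string{"10.128.0.5"})
}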

Expected results:
No segfaults from the ovnkube-master container.

Additional info:

Tailing the logs of each crash loop shows the same log line immediately before the panic every time, which may help point to the root cause:
I0520 15:42:13.522093       1 pods.go:289] [openshift-operator-lifecycle-manager/olm-operator-7bbcc48779-ddjl2] addLogicalPort took 38.838486ms
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x17971e0]

I0520 17:46:26.315322       1 pods.go:289] [openshift-operator-lifecycle-manager/olm-operator-7bbcc48779-ddjl2] addLogicalPort took 20.307113ms
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x17971e0]
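
Given the frame order (addPodToNamespace -> AddIPs on a nil receiver), a guard at the call site or inside the method would turn this crashloop into a retriable error instead of killing the whole controller. Purely as an illustration of that class of fix (reusing the hypothetical addressSets type from the sketch above; this is not necessarily how bug 1958958 addresses it):

// addPodToNamespaceGuarded is a hypothetical call-site guard; the real
// addPodToNamespace in ovn-kubernetes differs.
func addPodToNamespaceGuarded(as *addressSets, podIPs []string) error {
	if as == nil {
		// Fail this pod-add event so it can be retried once the namespace's
		// address set exists, rather than dereferencing a nil receiver.
		return fmt.Errorf("no address set for namespace; cannot add IPs %v", podIPs)
	}
	return as.AddIPs(podIPs)
}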

Comment 1 Antonio Ojea 2021-05-20 20:27:55 UTC

*** This bug has been marked as a duplicate of bug 1958958 ***

