Bug 1781707 - openshift-sdn nil dereference in startup [NEEDINFO]
Summary: openshift-sdn nil dereference in startup
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.4.0
Assignee: Juan Luis de Sousa-Valadas
QA Contact: huirwang
URL:
Whiteboard:
Depends On:
Blocks: 1785675 1785728
Reported: 2019-12-10 12:44 UTC by Casey Callendrello
Modified: 2020-05-04 11:20 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The kubeconfig used by the kubelet (which we also use in the SDN) changed its path.
Consequence: The SDN hits a nil dereference while trying to parse the empty file.
Fix: Make the SDN able to handle both the old and the new paths.
Result: Bug fixed.
Clone Of:
Environment:
Last Closed: 2020-05-04 11:19:30 UTC
Target Upstream Version:
cdc: needinfo? (rvanderp)

Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:0581 None None None 2020-05-04 11:20:10 UTC

Description Casey Callendrello 2019-12-10 12:44:47 UTC
As seen in a training cluster:

    panic: runtime error: invalid memory address or nil pointer dereference
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x166faf9]
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: goroutine 1 [running]:
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: github.com/openshift/sdn/pkg/openshift-sdn.injectKubeAPIEnv(0x7ffd075a6cd8, 0x1a, 0x0, 0xc000705ba0)
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]:         /go/src/github.com/openshift/sdn/pkg/openshift-sdn/cmd.go:191 +0xa9
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: github.com/openshift/sdn/pkg/openshift-sdn.(*OpenShiftSDN).Run(0xc0000d0600, 0xc000676280, 0x1d424e0, 0xc0000b6010, 0xc000098480)
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]:         /go/src/github.com/openshift/sdn/pkg/openshift-sdn/cmd.go:77 +0x50
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: github.com/openshift/sdn/pkg/openshift-sdn.NewOpenShiftSDNCommand.func1.2(0xc000000008, 0x1b4f118)
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]:         /go/src/github.com/openshift/sdn/pkg/openshift-sdn/cmd.go:61 +0x4e
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: github.com/openshift/sdn/vendor/k8s.io/kubernetes/pkg/util/interrupt.(*Handler).Run(0xc00030f410, 0xc000705d50, 0x0, 0x0)
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]:         /go/src/github.com/openshift/sdn/vendor/k8s.io/kubernetes/pkg/util/interrupt/interrupt.go:103 +0xff
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: github.com/openshift/sdn/pkg/openshift-sdn.NewOpenShiftSDNCommand.func1(0xc000676280, 0xc00030f3b0, 0x0, 0x3)
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]:         /go/src/github.com/openshift/sdn/pkg/openshift-sdn/cmd.go:60 +0x154
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: github.com/openshift/sdn/vendor/github.com/spf13/cobra.(*Command).execute(0xc000676280, 0xc0000ba010, 0x3, 0x3, 0xc000676280, 0xc0000ba010)
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]:         /go/src/github.com/openshift/sdn/vendor/github.com/spf13/cobra/command.go:760 +0x2ae
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: github.com/openshift/sdn/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc000676280, 0xd, 0x1d424e0, 0xc0000b6010)
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]:         /go/src/github.com/openshift/sdn/vendor/github.com/spf13/cobra/command.go:846 +0x2ec
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: github.com/openshift/sdn/vendor/github.com/spf13/cobra.(*Command).Execute(...)
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]:         /go/src/github.com/openshift/sdn/vendor/github.com/spf13/cobra/command.go:794
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: main.main()
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]:         /go/src/github.com/openshift/sdn/cmd/openshift-sdn/openshift-sdn.go:28 +0x17b

Comment 1 Casey Callendrello 2019-12-10 12:50:46 UTC
Aha. The kubeconfig used by the kubelet, which we cheat and use in the sdn, is blank (because it seems to have moved around). The solution is to fix this in the network operator.
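To illustrate how a blank kubeconfig can end in a nil dereference, here is a minimal Go sketch. It is not the actual openshift-sdn code: the types `config`, `kubeContext`, `cluster` and the function `apiServerURL` are hypothetical, simplified stand-ins for a parsed kubeconfig. The point is only that looking up a context or cluster in an empty config yields nil, so the lookup has to be checked before its fields are read.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, simplified stand-ins for a parsed kubeconfig.
// These are assumptions for illustration, not the types used by openshift-sdn.
type cluster struct{ Server string }

type kubeContext struct{ Cluster string }

type config struct {
	CurrentContext string
	Contexts       map[string]*kubeContext
	Clusters       map[string]*cluster
}

// apiServerURL returns the API server address of the current context.
// Every lookup is checked, so an empty config produces an error instead
// of a nil pointer dereference.
func apiServerURL(cfg *config) (string, error) {
	if cfg == nil {
		return "", errors.New("kubeconfig is nil")
	}
	ctx, ok := cfg.Contexts[cfg.CurrentContext]
	if !ok || ctx == nil {
		return "", fmt.Errorf("kubeconfig has no context %q", cfg.CurrentContext)
	}
	cl, ok := cfg.Clusters[ctx.Cluster]
	if !ok || cl == nil {
		return "", fmt.Errorf("kubeconfig has no cluster %q", ctx.Cluster)
	}
	return cl.Server, nil
}

func main() {
	// A blank file parses to an empty config: no contexts, no clusters.
	if _, err := apiServerURL(&config{}); err != nil {
		fmt.Println("empty kubeconfig rejected:", err)
	}
}
```

Without the `ok`/nil checks, reading `ctx.Cluster` or `cl.Server` on the empty config would panic in the same way as the trace above.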

Comment 2 Casey Callendrello 2019-12-10 12:59:02 UTC
Actually, wait. I'm not sure of the right way to tell the network-operator about the location of the apiserver. Right now we're reading /etc/kubernetes/kubeconfig, but it seems that file has moved.

Need to ask the apiserver team how to solve this.

Comment 3 Casey Callendrello 2019-12-10 13:21:50 UTC
/usr/bin/hyperkube kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --rotate-certificates --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=rhcos --minimum-container-ttl-duration=6m0s --cloud-provider=aws --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --register-with-taints=node-role.kubernetes.io/master=:NoSchedule --v=3


is the new kubelet cmdline.

Comment 4 Casey Callendrello 2019-12-20 08:56:22 UTC
Once the PR merges, we should backport it to 4.3 and 4.2. Assigning to Juan to lead that process along.

Comment 5 Juan Luis de Sousa-Valadas 2019-12-20 15:41:20 UTC
For future reference: https://github.com/openshift/cluster-network-operator/pull/420

Comment 9 errata-xmlrpc 2020-05-04 11:19:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581