As seen in a training cluster:

panic: runtime error: invalid memory address or nil pointer dereference
Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x166faf9]
Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: goroutine 1 [running]:
Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: github.com/openshift/sdn/pkg/openshift-sdn.injectKubeAPIEnv(0x7ffd075a6cd8, 0x1a, 0x0, 0xc000705ba0)
Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: /go/src/github.com/openshift/sdn/pkg/openshift-sdn/cmd.go:191 +0xa9
Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: github.com/openshift/sdn/pkg/openshift-sdn.(*OpenShiftSDN).Run(0xc0000d0600, 0xc000676280, 0x1d424e0, 0xc0000b6010, 0xc000098480)
Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: /go/src/github.com/openshift/sdn/pkg/openshift-sdn/cmd.go:77 +0x50
Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: github.com/openshift/sdn/pkg/openshift-sdn.NewOpenShiftSDNCommand.func1.2(0xc000000008, 0x1b4f118)
Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: /go/src/github.com/openshift/sdn/pkg/openshift-sdn/cmd.go:61 +0x4e
Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: github.com/openshift/sdn/vendor/k8s.io/kubernetes/pkg/util/interrupt.(*Handler).Run(0xc00030f410, 0xc000705d50, 0x0, 0x0)
Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: /go/src/github.com/openshift/sdn/vendor/k8s.io/kubernetes/pkg/util/interrupt/interrupt.go:103 +0xff
Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: github.com/openshift/sdn/pkg/openshift-sdn.NewOpenShiftSDNCommand.func1(0xc000676280, 0xc00030f3b0, 0x0, 0x3)
Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: /go/src/github.com/openshift/sdn/pkg/openshift-sdn/cmd.go:60 +0x154
Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: github.com/openshift/sdn/vendor/github.com/spf13/cobra.(*Command).execute(0xc000676280, 0xc0000ba010, 0x3, 0x3, 0xc000676280, 0xc0000ba010)
Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: /go/src/github.com/openshift/sdn/vendor/github.com/spf13/cobra/command.go:760 +0x2ae
Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: github.com/openshift/sdn/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc000676280, 0xd, 0x1d424e0, 0xc0000b6010)
Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: /go/src/github.com/openshift/sdn/vendor/github.com/spf13/cobra/command.go:846 +0x2ec
Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: github.com/openshift/sdn/vendor/github.com/spf13/cobra.(*Command).Execute(...)
Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: /go/src/github.com/openshift/sdn/vendor/github.com/spf13/cobra/command.go:794
Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: main.main()
Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: /go/src/github.com/openshift/sdn/cmd/openshift-sdn/openshift-sdn.go:28 +0x17b
Aha. The kubeconfig used by the kubelet, which the sdn borrows as a shortcut, is blank (the file seems to have moved). The fix belongs in the network operator.
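For illustration, here is a minimal sketch of the kind of guard that would turn this panic into an error. It is not the actual code in cmd.go. The clusterInfo type and the apiEndpoint helper are hypothetical stand-ins for whatever the parsed kubeconfig returns, and the idea that injectKubeAPIEnv sets KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT is an assumption based on its name.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"os"
)

// clusterInfo is a hypothetical stand-in for the cluster entry of a
// parsed kubeconfig; only the server URL matters here.
type clusterInfo struct {
	Server string
}

// apiEndpoint extracts the API server host and port. A nil or empty
// cluster entry (what a blank kubeconfig would yield) produces an
// error instead of a nil pointer dereference.
func apiEndpoint(c *clusterInfo) (host, port string, err error) {
	if c == nil || c.Server == "" {
		return "", "", errors.New("kubeconfig has no cluster server URL")
	}
	u, err := url.Parse(c.Server)
	if err != nil {
		return "", "", fmt.Errorf("parsing server URL %q: %w", c.Server, err)
	}
	if u.Hostname() == "" {
		return "", "", fmt.Errorf("server URL %q has no host", c.Server)
	}
	port = u.Port()
	if port == "" {
		port = "443" // assumption: the apiserver is served over HTTPS
	}
	return u.Hostname(), port, nil
}

func main() {
	// nil simulates the blank kubeconfig from this bug.
	host, port, err := apiEndpoint(nil)
	if err != nil {
		fmt.Fprintln(os.Stderr, "not injecting apiserver env:", err)
		return
	}
	// Assumed behavior of the real function: export the endpoint.
	os.Setenv("KUBERNETES_SERVICE_HOST", host)
	os.Setenv("KUBERNETES_SERVICE_PORT", port)
}
```

A guard like this only makes the failure readable. The sdn still needs a kubeconfig that actually points at the apiserver.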
Actually, wait. I'm not sure of the right way to tell the network-operator the location of the apiserver. Right now we're reading /etc/kubernetes/kubeconfig, but it seems that file has moved. We need to ask the apiserver team how to solve this.
The new kubelet cmdline is:

/usr/bin/hyperkube kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --rotate-certificates --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=rhcos --minimum-container-ttl-duration=6m0s --cloud-provider=aws --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec --register-with-taints=node-role.kubernetes.io/master=:NoSchedule --v=3
Once the PR merges, we should backport it to 4.3 and 4.2. Assigning to Juan to lead the backports.
For future reference: https://github.com/openshift/cluster-network-operator/pull/420
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581