Bug 1781707

Summary: openshift-sdn nil dereference in startup
Product: OpenShift Container Platform
Reporter: Casey Callendrello <cdc>
Component: Networking
Assignee: Juan Luis de Sousa-Valadas <jdesousa>
Networking sub component: openshift-sdn
QA Contact: huirwang
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
CC: huirwang, rvanderp, wkulhane
Version: 4.2.0
Target Release: 4.4.0
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Doc Text:
Cause: The kubeconfig used by the kubelet, which openshift-sdn also reads, moved to a new path.
Consequence: openshift-sdn hits a nil pointer dereference while trying to parse the now-empty file.
Fix: openshift-sdn can now handle both the old and the new path.
Result: The bug is fixed.
Last Closed: 2020-05-04 11:19:30 UTC
Type: Bug
Bug Blocks: 1785675, 1785728    

Description Casey Callendrello 2019-12-10 12:44:47 UTC
As seen in a training cluster:

    panic: runtime error: invalid memory address or nil pointer dereference
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x166faf9]
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: goroutine 1 [running]:
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: github.com/openshift/sdn/pkg/openshift-sdn.injectKubeAPIEnv(0x7ffd075a6cd8, 0x1a, 0x0, 0xc000705ba0)
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]:         /go/src/github.com/openshift/sdn/pkg/openshift-sdn/cmd.go:191 +0xa9
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: github.com/openshift/sdn/pkg/openshift-sdn.(*OpenShiftSDN).Run(0xc0000d0600, 0xc000676280, 0x1d424e0, 0xc0000b6010, 0xc000098480)
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]:         /go/src/github.com/openshift/sdn/pkg/openshift-sdn/cmd.go:77 +0x50
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: github.com/openshift/sdn/pkg/openshift-sdn.NewOpenShiftSDNCommand.func1.2(0xc000000008, 0x1b4f118)
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]:         /go/src/github.com/openshift/sdn/pkg/openshift-sdn/cmd.go:61 +0x4e
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: github.com/openshift/sdn/vendor/k8s.io/kubernetes/pkg/util/interrupt.(*Handler).Run(0xc00030f410, 0xc000705d50, 0x0, 0x0)
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]:         /go/src/github.com/openshift/sdn/vendor/k8s.io/kubernetes/pkg/util/interrupt/interrupt.go:103 +0xff
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: github.com/openshift/sdn/pkg/openshift-sdn.NewOpenShiftSDNCommand.func1(0xc000676280, 0xc00030f3b0, 0x0, 0x3)
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]:         /go/src/github.com/openshift/sdn/pkg/openshift-sdn/cmd.go:60 +0x154
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: github.com/openshift/sdn/vendor/github.com/spf13/cobra.(*Command).execute(0xc000676280, 0xc0000ba010, 0x3, 0x3, 0xc000676280, 0xc0000ba010)
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]:         /go/src/github.com/openshift/sdn/vendor/github.com/spf13/cobra/command.go:760 +0x2ae
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: github.com/openshift/sdn/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc000676280, 0xd, 0x1d424e0, 0xc0000b6010)
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]:         /go/src/github.com/openshift/sdn/vendor/github.com/spf13/cobra/command.go:846 +0x2ec
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: github.com/openshift/sdn/vendor/github.com/spf13/cobra.(*Command).Execute(...)
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]:         /go/src/github.com/openshift/sdn/vendor/github.com/spf13/cobra/command.go:794
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]: main.main()
    Dec 09 14:39:33 ip-10-0-133-40 hyperkube[2027]:         /go/src/github.com/openshift/sdn/cmd/openshift-sdn/openshift-sdn.go:28 +0x17b

Comment 1 Casey Callendrello 2019-12-10 12:50:46 UTC
Aha. The kubeconfig used by the kubelet, which we reuse in the sdn as a shortcut, is blank (it seems to have moved). The fix belongs in the network operator.

Comment 2 Casey Callendrello 2019-12-10 12:59:02 UTC
Actually, wait. I'm not sure what the right way is to tell the network-operator where the apiserver is. Right now we're reading /etc/kubernetes/kubeconfig, but it seems that file has moved.

Need to ask the apiserver team how to solve this.

Comment 3 Casey Callendrello 2019-12-10 13:21:50 UTC
This is the new kubelet cmdline:

    /usr/bin/hyperkube kubelet \
        --config=/etc/kubernetes/kubelet.conf \
        --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
        --rotate-certificates \
        --kubeconfig=/var/lib/kubelet/kubeconfig \
        --container-runtime=remote \
        --container-runtime-endpoint=/var/run/crio/crio.sock \
        --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=rhcos \
        --minimum-container-ttl-duration=6m0s \
        --cloud-provider=aws \
        --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \
        --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
        --v=3
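
In this command line the old path, /etc/kubernetes/kubeconfig, is passed as `--bootstrap-kubeconfig`, while `--kubeconfig` points at /var/lib/kubelet/kubeconfig. A small Go sketch that pulls both values out of such a command line follows; `kubeletFlag` is a hypothetical helper, and it only handles the `--name=value` form used here, not `--name value`.

```go
package main

import (
	"fmt"
	"strings"
)

// kubeletFlag is a hypothetical helper that returns the value of a
// --name=value flag from a kubelet command line, and whether it was found.
func kubeletFlag(cmdline, name string) (string, bool) {
	prefix := "--" + name + "="
	for _, field := range strings.Fields(cmdline) {
		if strings.HasPrefix(field, prefix) {
			return strings.TrimPrefix(field, prefix), true
		}
	}
	return "", false
}

func main() {
	// Abbreviated copy of the kubelet command line from Comment 3.
	cmdline := "/usr/bin/hyperkube kubelet --config=/etc/kubernetes/kubelet.conf " +
		"--bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --rotate-certificates " +
		"--kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote"

	for _, name := range []string{"kubeconfig", "bootstrap-kubeconfig"} {
		if v, ok := kubeletFlag(cmdline, name); ok {
			fmt.Printf("%s -> %s\n", name, v)
		}
	}
}
```

Run as-is, it prints `kubeconfig -> /var/lib/kubelet/kubeconfig` followed by `bootstrap-kubeconfig -> /etc/kubernetes/kubeconfig`. Because the match is on the full `--kubeconfig=` prefix, the `--bootstrap-kubeconfig=` field is not mistaken for it.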

Comment 4 Casey Callendrello 2019-12-20 08:56:22 UTC
Once the PR merges, we should backport it to 4.3 and 4.2. Assigning to Juan to drive that process.

Comment 5 Juan Luis de Sousa-Valadas 2019-12-20 15:41:20 UTC
For future reference: https://github.com/openshift/cluster-network-operator/pull/420

Comment 9 errata-xmlrpc 2020-05-04 11:19:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

Comment 10 Red Hat Bugzilla 2023-09-14 05:48:28 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days