Bug 1859382 - check-endpoints panics on graceful shutdown
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.8.0
Assignee: Luis Sanchez
QA Contact: Ke Wang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-07-21 21:04 UTC by Luis Sanchez
Modified: 2021-07-27 22:32 UTC (History)
CC List: 5 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 22:32:27 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-kube-apiserver-operator pull 906 | 0 | None | closed | ponetworkconnectivitychecks not being updated | 2021-01-25 16:09:25 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 22:32:43 UTC

Description Luis Sanchez 2020-07-21 21:04:07 UTC
E0720 17:45:35.897478       1 shared_informer.go:226] unable to sync caches for check-endpoints
E0720 17:45:35.897585       1 runtime.go:76] Observed a panic: timeout waiting for informer cache
goroutine 1 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1e8f700, 0x2602550)
	k8s.io/apimachinery.3/pkg/util/runtime/runtime.go:74 +0xa3
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	k8s.io/apimachinery.3/pkg/util/runtime/runtime.go:48 +0x82
panic(0x1e8f700, 0x2602550)
	runtime/panic.go:679 +0x1b2
github.com/openshift/library-go/pkg/controller/factory.(*baseController).Run(0xc000cd2d80, 0x268b6a0, 0xc000098b40, 0x1)
	github.com/openshift/library-go.0-20200617100204-823c70af3f78/pkg/controller/factory/base_controller.go:66 +0x861
github.com/openshift/cluster-kube-apiserver-operator/pkg/cmd/checkendpoints.NewCheckEndpointsCommand.func1(0x268b6a0, 0xc000098b40, 0xc000b03020, 0x268b6a0, 0xc000098b40)
	github.com/openshift/cluster-kube-apiserver-operator@/pkg/cmd/checkendpoints/cmd.go:36 +0x4bc
github.com/openshift/library-go/pkg/controller/controllercmd.(*ControllerBuilder).Run(0xc0000d4900, 0x268b6a0, 0xc000098b40, 0x0, 0x0, 0xc0006b1b60)
	github.com/openshift/library-go.0-20200617100204-823c70af3f78/pkg/controller/controllercmd/builder.go:275 +0x683
github.com/openshift/library-go/pkg/controller/controllercmd.(*ControllerCommandConfig).StartController(0xc0006c12c0, 0x268b6a0, 0xc000364300, 0xc000364300, 0xc00041e7b0)
	github.com/openshift/library-go.0-20200617100204-823c70af3f78/pkg/controller/controllercmd/cmd.go:284 +0x4eb
github.com/openshift/library-go/pkg/controller/controllercmd.(*ControllerCommandConfig).NewCommandWithContext.func1(0xc0000d3b80, 0xc00047d480, 0x0, 0x8)
	github.com/openshift/library-go.0-20200617100204-823c70af3f78/pkg/controller/controllercmd/cmd.go:128 +0x67a
github.com/spf13/cobra.(*Command).execute(0xc0000d3b80, 0xc00047d180, 0x8, 0x8, 0xc0000d3b80, 0xc00047d180)
	github.com/spf13/cobra.5/command.go:830 +0x2aa
github.com/spf13/cobra.(*Command).ExecuteC(0xc000224000, 0xc000054040, 0xc000224000, 0x6)
	github.com/spf13/cobra.5/command.go:914 +0x2fb
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra.5/command.go:864
main.main()
	github.com/openshift/cluster-kube-apiserver-operator@/cmd/cluster-kube-apiserver-operator/main.go:42 +0x18e

Comment 1 Luis Sanchez 2020-07-21 21:11:02 UTC
Might be related to https://github.com/openshift/cluster-kube-apiserver-operator/pull/906 but can't confirm exact build. Will attempt to reproduce.

Comment 2 Luis Sanchez 2020-07-27 16:24:16 UTC
Already fixed by https://github.com/openshift/cluster-kube-apiserver-operator/pull/906.

Comment 3 Ke Wang 2020-08-03 13:56:40 UTC
Verified with OCP 4.6.0-0.nightly-2020-08-01-215144; steps below.

oc debug node/<master node>
Created the following test script, which gracefully shuts down the kube-apiserver several times:

sh-4.4# cat test.sh
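i=0
prev_pid=""
while [ "$i" -lt 8 ]; do
  # PID of the first running kube-apiserver process
  pid=$(ps aux | grep " kube-apiserver " | grep -v grep | awk 'NR==1 {print $2}')
  # Only signal a process that exists and was not already killed
  if [ -n "$pid" ] && [ "$pid" != "$prev_pid" ]; then
    echo "kill $pid $i times"
    kill "$pid"  # default SIGTERM requests a graceful shutdown
    i=$(( i + 1 ))
    prev_pid=$pid
  fi
  sleep 180
done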
sh-4.4# ./test.sh
kill 703095 0 times
kill 733488 1 times
kill 738292 2 times
kill 743032 3 times
kill 748310 4 times
kill 753881 5 times
kill 760332 6 times
kill 772610 7 times

sh-4.4# grep -nri 'panic' openshift-*
No results found.

No panic occurred after any of the graceful kube-apiserver shutdowns, so moving the bug to VERIFIED.
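
The grep check above can also be sketched as a small Go helper, for example if the same check were automated in a test. This is an illustrative sketch only: findPanicLines is a hypothetical name, and the sample input in main is made up for the demonstration (its second line borrows the panic message from the description).

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// findPanicLines returns the 1-based numbers of all lines in log that
// contain "panic", matched case-insensitively (similar to grep -ni 'panic').
func findPanicLines(log string) []int {
	var hits []int
	scanner := bufio.NewScanner(strings.NewReader(log))
	lineNo := 0
	for scanner.Scan() {
		lineNo++
		if strings.Contains(strings.ToLower(scanner.Text()), "panic") {
			hits = append(hits, lineNo)
		}
	}
	return hits
}

func main() {
	// Made-up sample input for illustration only.
	sample := "starting check-endpoints\n" +
		"Observed a panic: timeout waiting for informer cache\n" +
		"shutting down\n"
	fmt.Println(findPanicLines(sample))
}
```

An empty result corresponds to the "No results found" outcome recorded above. Note that bufio.Scanner has a default line-length limit, so very long log lines would need a larger buffer.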

Comment 6 errata-xmlrpc 2021-07-27 22:32:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

