Description of problem:
One of the three OCP master nodes went down in the customer environment. The customer could neither `oc login` nor reach the node over SSH. The node recovered after a reboot. The following two types of log entries were repeating; were they related to the outage?

Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: I0125 00:00:53.166475 1169 status_manager.go:382] Ignoring same status for pod "console-operator-566bf94bbd-lgqhl_openshift-console-operator(c6c80abf-1d46-11ea-973d-0a38d054022e)", status: {Phase:Running Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2019-12-13 01:20:40 +0000 UTC Reason: Message:} {Type:Ready Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2019-12-13 14:33:19 +0000 UTC Reason: Message:} {Type:ContainersReady Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2019-12-13 14:33:19 +0000 UTC Reason: Message:} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2019-12-13 01:20:40 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:10.0.158.144 PodIP:10.129.0.95 StartTime:2019-12-13 01:20:40 +0000 UTC InitContainerStatuses:[] ContainerStatuses:[{Name:console-operator State:{Waiting:nil Running:&ContainerStateRunning{StartedAt:2019-12-13 14:33:18 +0000 UTC,} Terminated:nil} LastTerminationState:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:2,Signal:0,Reason:Error,Message:c/github.com/openshift/console-operator/_output/local/go/src/github.com/openshift/console-operator/vendor/monis.app/go/openshift/controller/controller.go:62
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: /go/src/github.com/openshift/console-operator/_output/local/go/src/github.com/openshift/console-operator/vendor/monis.app/go/openshift/operator/operator.go:33
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: /opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/asm_amd64.s:1337
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: panic: (controller.die) (0x1a19ea0,0xc0001f3eb0) [recovered]
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: panic: Console: timed out waiting for caches to sync
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: goroutine 313 [running]:
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: github.com/openshift/console-operator/vendor/monis.app/go/openshift/controller.crash(0x1a19ea0, 0xc0001f3eb0)
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: /go/src/github.com/openshift/console-operator/_output/local/go/src/github.com/openshift/console-operator/vendor/monis.app/go/openshift/controller/die.go:8 +0x7a
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0xc000d60f30, 0x1, 0x1)
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: /go/src/github.com/openshift/console-operator/_output/local/go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:54 +0xc7
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: panic(0x1a19ea0, 0xc0001f3eb0)
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: /opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/panic.go:522 +0x1b5
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: github.com/openshift/console-operator/vendor/monis.app/go/openshift/controller.(*controller).Run(0xc00029a1c0, 0x1, 0xc0000b3b00)
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: /go/src/github.com/openshift/console-operator/_output/local/go/src/github.com/openshift/console-operator/vendor/monis.app/go/openshift/controller/controller.go:62 +0x3f6
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: github.com/openshift/console-operator/vendor/monis.app/go/openshift/operator.(*operator).Run(0xc00083a0c0, 0xc0000b3b00)
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: /go/src/github.com/openshift/console-operator/_output/local/go/src/github.com/openshift/console-operator/vendor/monis.app/go/openshift/operator/operator.go:33 +0x8f
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: created by github.com/openshift/console-operator/pkg/console/starter.RunOperator
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: /go/src/github.com/openshift/console-operator/_output/local/go/src/github.com/openshift/console-operator/pkg/console/starter/starter.go:212 +0x1247
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: ,StartedAt:2019-12-13 14:22:01 +0000 UTC,FinishedAt:2019-12-13 14:33:17 +0000 UTC,ContainerID:cri-o://47cf26720b84437c48d56fe896138ea5ddf81c45557f9e3f2e7111c60c5829d7,}} Ready:true RestartCount:2 Image:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:baa36e6f1137e46dc97819af23b1bc6e835912f00edf6c921340187e5d58190f ImageID:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:baa36e6f1137e46dc97819af23b1bc6e835912f00edf6c921340187e5d58190f ContainerID:cri-o://49bfb009eb94da0f872e31ead9f80290ed2e13455b4db85ccd08b5ff6fc1e70f}] QOSClass:Burstable}

Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: I0125 00:01:14.166506 1169 status_manager.go:382] Ignoring same status for pod "authentication-operator-59bd6dffb8-vv6vb_openshift-authentication-operator(c6ca9b12-1d46-11ea-973d-0a38d054022e)", status: {Phase:Running Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2019-12-13 01:20:41 +0000 UTC Reason: Message:} {Type:Ready Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2019-12-13 14:44:43 +0000 UTC Reason: Message:} {Type:ContainersReady Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2019-12-13 14:44:43 +0000 UTC Reason: Message:} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2019-12-13 01:20:40 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:10.0.158.144 PodIP:10.129.0.96 StartTime:2019-12-13 01:20:41 +0000 UTC InitContainerStatuses:[] ContainerStatuses:[{Name:operator State:{Waiting:nil Running:&ContainerStateRunning{StartedAt:2019-12-13 14:44:43 +0000 UTC,} Terminated:nil} LastTerminationState:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:2,Signal:0,Reason:Error,Message:time/runtime.go:65
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: /go/src/github.com/openshift/cluster-authentication-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: /opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/panic.go:522
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: /go/src/github.com/openshift/cluster-authentication-operator/vendor/monis.app/go/openshift/controller/controller.go:62
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: /go/src/github.com/openshift/cluster-authentication-operator/vendor/monis.app/go/openshift/operator/operator.go:33
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: /opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/asm_amd64.s:1337
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: panic: (controller.die) (0x196b3e0,0xc000513a60) [recovered]
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: panic: AuthenticationOperator2: timed out waiting for caches to sync
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: goroutine 329 [running]:
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: github.com/openshift/cluster-authentication-operator/vendor/monis.app/go/openshift/controller.crash(0x196b3e0, 0xc000513a60)
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: /go/src/github.com/openshift/cluster-authentication-operator/vendor/monis.app/go/openshift/controller/die.go:8 +0x7a
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: github.com/openshift/cluster-authentication-operator/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0xc000a9ff30, 0x1, 0x1)
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: /go/src/github.com/openshift/cluster-authentication-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:54 +0xc7
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: panic(0x196b3e0, 0xc000513a60)
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: /opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/panic.go:522 +0x1b5
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: github.com/openshift/cluster-authentication-operator/vendor/monis.app/go/openshift/controller.(*controller).Run(0xc000170cb0, 0x1, 0xc0000b48a0)
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: /go/src/github.com/openshift/cluster-authentication-operator/vendor/monis.app/go/openshift/controller/controller.go:62 +0x3f6
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: github.com/openshift/cluster-authentication-operator/vendor/monis.app/go/openshift/operator.(*operator).Run(0xc0006ea080, 0xc0000b48a0)
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: /go/src/github.com/openshift/cluster-authentication-operator/vendor/monis.app/go/openshift/operator/operator.go:33 +0x8f
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: created by github.com/openshift/cluster-authentication-operator/pkg/operator2.RunOperator
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: /go/src/github.com/openshift/cluster-authentication-operator/pkg/operator2/starter.go:215 +0x1395
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: ,StartedAt:2019-12-13 14:33:26 +0000 UTC,FinishedAt:2019-12-13 14:44:42 +0000 UTC,ContainerID:cri-o://9bd2862dca8e325d2f3139129d611263daf386fae7fba1f8ff4b977d2e0fea3c,}} Ready:true RestartCount:3 Image:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d9996c2242abde0a4280c97527ba65664050c3a7615435c803dedbaf3d5e9419 ImageID:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d9996c2242abde0a4280c97527ba65664050c3a7615435c803dedbaf3d5e9419 ContainerID:cri-o://42cfa36dbdc422bda23236f31e29475c840e850cd5c32cd1cf0209bd78348d55}] QOSClass:Burstable}

Version-Release number of selected component (if applicable):
openshift 4.2.0

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
The OCP master node went down.

Expected results:
The master node stays running.

Additional info:
The customer environment is the same as in BZ1783889.
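A note on reading these entries: the Go panic text is not produced by hyperkube itself. It is each operator container's last termination message, which the kubelet embeds in the pod status that status_manager.go keeps re-syncing. The same trace can be pulled from the API instead of the journal; below is a minimal sketch using client-go, assuming a recent Get signature, with the pod and namespace names taken from the logs above for illustration:

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Hypothetical kubeconfig path; adjust for the environment.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Pod and namespace from the journal entries above.
	pod, err := client.CoreV1().Pods("openshift-console-operator").
		Get(context.TODO(), "console-operator-566bf94bbd-lgqhl", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	for _, cs := range pod.Status.ContainerStatuses {
		// Terminated.Message holds the tail of the crashed container's
		// output (likely terminationMessagePolicy: FallbackToLogsOnError),
		// which is where the panic stacks in the journal come from.
		if t := cs.LastTerminationState.Terminated; t != nil {
			fmt.Printf("%s exited %d (%s):\n%s\n", cs.Name, t.ExitCode, t.Reason, t.Message)
		}
	}
}

`oc describe pod` surfaces the same text under "Last State: Terminated".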
From those logs, it appears that the `console-operator` and `authentication-operator` containers are panicking shortly after startup ("timed out waiting for caches to sync"), and hyperkube is repeatedly logging their last termination state. Not sure how that would impact the ability to SSH to the host, but it may affect the ability to `oc login`. Sending over to Node for additional investigation.
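Both panics come from the same vendored controller runtime (monis.app/go/openshift/controller, controller.go:62 in both traces): the operator waits for its informer caches to finish their initial sync against the apiserver and deliberately crashes when that wait gives up. A minimal sketch of the pattern, assuming the standard client-go cache.WaitForCacheSync; the helper names and the ten-second deadline are illustrative, not the vendored code:

package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/tools/cache"
)

// runController mirrors the gate seen at controller.go:62 in the traces:
// block until every informer cache has synced, and die if the wait is
// aborted first.
func runController(name string, stopCh <-chan struct{}, cacheSyncs ...cache.InformerSynced) {
	// WaitForCacheSync returns false if stopCh closes before all caches
	// report synced -- e.g. when the local apiserver never answers the
	// initial LIST/WATCH.
	if !cache.WaitForCacheSync(stopCh, cacheSyncs...) {
		panic(fmt.Errorf("%s: timed out waiting for caches to sync", name))
	}
	// ...start the worker loop here...
}

func main() {
	// Simulate an unreachable apiserver: a cache that never syncs plus a
	// deadline that gives up after ten seconds.
	stopCh := make(chan struct{})
	go func() {
		time.Sleep(10 * time.Second)
		close(stopCh)
	}()
	neverSynced := cache.InformerSynced(func() bool { return false })
	runController("Console", stopCh, neverSynced) // panics like the journal entries
}

Crashing here is intentional crash-only design: the kubelet restarts the container, which is consistent with the non-zero RestartCount in the status dumps. The real question is why the caches could not sync, i.e. why the apiserver on this master was not answering.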
This may be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1801826. The fix that merged there reserves more CPU and RAM for the kubelet and CRI-O. How much memory is available on the masters?
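For context, this kind of reservation is expressed through the kubelet's systemReserved/kubeReserved settings, which OpenShift manages via a KubeletConfig custom resource. A sketch of the shape of such a reservation; the resource name and values are illustrative, not the ones from the merged fix:

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: master-system-reserved        # hypothetical name
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/master: ""
  kubeletConfig:
    systemReserved:      # headroom carved out for host daemons such as CRI-O and sshd
      cpu: 500m
      memory: 1Gi

Reserving headroom this way keeps the kubelet, CRI-O, and sshd responsive under memory pressure, which would line up with the SSH symptom described above.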
Hi. Is your question about the master node spec? The master node has 16 GB of memory.

$ head sosreport-ip-10-0-158-144-02570287-2020-01-28-ckwjjxh/proc/meminfo
MemTotal:       16420432 kB
MemFree:         9454948 kB
MemAvailable:   13928356 kB
Buffers:             116 kB
Cached:          4213008 kB
SwapCached:            0 kB
Active:          3081624 kB
Inactive:        3248440 kB
Active(anon):    1693472 kB
Inactive(anon):      508 kB
There are various efforts to fix this... We have the previously mentioned fix going into 4.2, as well as other performance fixes.

*** This bug has been marked as a duplicate of bug 1801826 ***