Bug 1797828 - Can't oc login nor ssh access master node.
Summary: Can't oc login nor ssh access master node.
Keywords:
Status: CLOSED DUPLICATE of bug 1801826
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Ryan Phillips
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-02-04 00:59 UTC by ryoji noma
Modified: 2023-09-07 21:42 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-12 13:32:53 UTC
Target Upstream Version:
Embargoed:



Description ryoji noma 2020-02-04 00:59:00 UTC
Description of problem:

One of the three OCP master nodes went down in the customer environment. The customer could neither oc login nor access the node over ssh. The node recovered after a reboot.
The following two types of log entries were repeated. Are they related?


Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: I0125 00:00:53.166475    1169 status_manager.go:382] Ignoring same status for pod "console-operator-566bf94bbd-lgqhl_openshift-console-operator(c6c80abf-1d46-11ea-973d-0a38d054022e)", statu
s: {Phase:Running Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2019-12-13 01:20:40 +0000 UTC Reason: Message:} {Type:Ready Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC
 LastTransitionTime:2019-12-13 14:33:19 +0000 UTC Reason: Message:} {Type:ContainersReady Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2019-12-13 14:33:19 +0000 UTC Reason: Message:} {Type:PodScheduled Status
:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2019-12-13 01:20:40 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:10.0.158.144 PodIP:10.129.0.95 StartTime:2019-12-13 01:20:40 +0000 UTC InitCo
ntainerStatuses:[] ContainerStatuses:[{Name:console-operator State:{Waiting:nil Running:&ContainerStateRunning{StartedAt:2019-12-13 14:33:18 +0000 UTC,} Terminated:nil} LastTerminationState:{Waiting:nil Running:nil Terminated:&ContainerSt
ateTerminated{ExitCode:2,Signal:0,Reason:Error,Message:c/github.com/openshift/console-operator/_output/local/go/src/github.com/openshift/console-operator/vendor/monis.app/go/openshift/controller/controller.go:62
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: /go/src/github.com/openshift/console-operator/_output/local/go/src/github.com/openshift/console-operator/vendor/monis.app/go/openshift/operator/operator.go:33
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: /opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/asm_amd64.s:1337
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: panic: (controller.die) (0x1a19ea0,0xc0001f3eb0) [recovered]
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]:         panic: Console: timed out waiting for caches to sync
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: goroutine 313 [running]:
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: github.com/openshift/console-operator/vendor/monis.app/go/openshift/controller.crash(0x1a19ea0, 0xc0001f3eb0)
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]:         /go/src/github.com/openshift/console-operator/_output/local/go/src/github.com/openshift/console-operator/vendor/monis.app/go/openshift/controller/die.go:8 +0x7a
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0xc000d60f30, 0x1, 0x1)
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]:         /go/src/github.com/openshift/console-operator/_output/local/go/src/github.com/openshift/console-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:54 +0xc7
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: panic(0x1a19ea0, 0xc0001f3eb0)
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]:         /opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/panic.go:522 +0x1b5
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: github.com/openshift/console-operator/vendor/monis.app/go/openshift/controller.(*controller).Run(0xc00029a1c0, 0x1, 0xc0000b3b00)
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]:         /go/src/github.com/openshift/console-operator/_output/local/go/src/github.com/openshift/console-operator/vendor/monis.app/go/openshift/controller/controller.go:62 +0x3f6
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: github.com/openshift/console-operator/vendor/monis.app/go/openshift/operator.(*operator).Run(0xc00083a0c0, 0xc0000b3b00)
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]:         /go/src/github.com/openshift/console-operator/_output/local/go/src/github.com/openshift/console-operator/vendor/monis.app/go/openshift/operator/operator.go:33 +0x8f
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: created by github.com/openshift/console-operator/pkg/console/starter.RunOperator
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]:         /go/src/github.com/openshift/console-operator/_output/local/go/src/github.com/openshift/console-operator/pkg/console/starter/starter.go:212 +0x1247
Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: ,StartedAt:2019-12-13 14:22:01 +0000 UTC,FinishedAt:2019-12-13 14:33:17 +0000 UTC,ContainerID:cri-o://47cf26720b84437c48d56fe896138ea5ddf81c45557f9e3f2e7111c60c5829d7,}} Ready:true RestartC
ount:2 Image:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:baa36e6f1137e46dc97819af23b1bc6e835912f00edf6c921340187e5d58190f ImageID:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:baa36e6f1137e46dc97819af23b1bc6e835912f00edf
6c921340187e5d58190f ContainerID:cri-o://49bfb009eb94da0f872e31ead9f80290ed2e13455b4db85ccd08b5ff6fc1e70f}] QOSClass:Burstable}


Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: I0125 00:01:14.166506    1169 status_manager.go:382] Ignoring same status for pod "authentication-operator-59bd6dffb8-vv6vb_openshift-authentication-operator(c6ca9b12-1d46-11ea-973d-0a38d05
4022e)", status: {Phase:Running Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2019-12-13 01:20:41 +0000 UTC Reason: Message:} {Type:Ready Status:True LastProbeTime:0001-01-01 00:0
0:00 +0000 UTC LastTransitionTime:2019-12-13 14:44:43 +0000 UTC Reason: Message:} {Type:ContainersReady Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2019-12-13 14:44:43 +0000 UTC Reason: Message:} {Type:PodSc
heduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2019-12-13 01:20:40 +0000 UTC Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:10.0.158.144 PodIP:10.129.0.96 StartTime:2019-12-13 01:20:41 +0
000 UTC InitContainerStatuses:[] ContainerStatuses:[{Name:operator State:{Waiting:nil Running:&ContainerStateRunning{StartedAt:2019-12-13 14:44:43 +0000 UTC,} Terminated:nil} LastTerminationState:{Waiting:nil Running:nil Terminated:&Conta
inerStateTerminated{ExitCode:2,Signal:0,Reason:Error,Message:time/runtime.go:65
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: /go/src/github.com/openshift/cluster-authentication-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: /opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/panic.go:522
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: /go/src/github.com/openshift/cluster-authentication-operator/vendor/monis.app/go/openshift/controller/controller.go:62
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: /go/src/github.com/openshift/cluster-authentication-operator/vendor/monis.app/go/openshift/operator/operator.go:33
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: /opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/asm_amd64.s:1337
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: panic: (controller.die) (0x196b3e0,0xc000513a60) [recovered]
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]:         panic: AuthenticationOperator2: timed out waiting for caches to sync
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: goroutine 329 [running]:
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: github.com/openshift/cluster-authentication-operator/vendor/monis.app/go/openshift/controller.crash(0x196b3e0, 0xc000513a60)
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]:         /go/src/github.com/openshift/cluster-authentication-operator/vendor/monis.app/go/openshift/controller/die.go:8 +0x7a
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: github.com/openshift/cluster-authentication-operator/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0xc000a9ff30, 0x1, 0x1)
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]:         /go/src/github.com/openshift/cluster-authentication-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:54 +0xc7
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: panic(0x196b3e0, 0xc000513a60)
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]:         /opt/rh/go-toolset-1.12/root/usr/lib/go-toolset-1.12-golang/src/runtime/panic.go:522 +0x1b5
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: github.com/openshift/cluster-authentication-operator/vendor/monis.app/go/openshift/controller.(*controller).Run(0xc000170cb0, 0x1, 0xc0000b48a0)
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]:         /go/src/github.com/openshift/cluster-authentication-operator/vendor/monis.app/go/openshift/controller/controller.go:62 +0x3f6
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: github.com/openshift/cluster-authentication-operator/vendor/monis.app/go/openshift/operator.(*operator).Run(0xc0006ea080, 0xc0000b48a0)
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]:         /go/src/github.com/openshift/cluster-authentication-operator/vendor/monis.app/go/openshift/operator/operator.go:33 +0x8f
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: created by github.com/openshift/cluster-authentication-operator/pkg/operator2.RunOperator
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]:         /go/src/github.com/openshift/cluster-authentication-operator/pkg/operator2/starter.go:215 +0x1395
Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]: ,StartedAt:2019-12-13 14:33:26 +0000 UTC,FinishedAt:2019-12-13 14:44:42 +0000 UTC,ContainerID:cri-o://9bd2862dca8e325d2f3139129d611263daf386fae7fba1f8ff4b977d2e0fea3c,}} Ready:true RestartCount:3 Image:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d9996c2242abde0a4280c97527ba65664050c3a7615435c803dedbaf3d5e9419 ImageID:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d9996c2242abde0a4280c97527ba65664050c3a7615435c803dedbaf3d5e9419 ContainerID:cri-o://42cfa36dbdc422bda23236f31e29475c840e850cd5c32cd1cf0209bd78348d55}] QOSClass:Burstable}


Version-Release number of selected component (if applicable):
openshift.4.2.0

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

The OCP master node went down.

Expected results:

The master node is running.

Additional info:

The customer environment is the same as BZ1783889.

Comment 6 Micah Abbott 2020-02-12 17:42:01 UTC
From those logs, it appears that hyperkube is panicking trying to start the `console-operator` and `authentication-operator` pods.

I'm not sure how this would affect the ability to SSH to the host, but it may affect the ability to `oc login`.

Sending over to Node for additional investigation.
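
A minimal sketch, assuming journal lines in the same format as the excerpts in the description, of how the distinct panic messages could be pulled out of a larger log for triage. The function name `panic_messages` and the regular expression are illustrative assumptions, not part of any OpenShift tooling; the `[recovered]` wrapper lines are skipped so that only the underlying messages remain.

```python
import re

# Matches the panic text that follows the "hyperkube[PID]:" prefix in the
# journal lines quoted in the description (hypothetical pattern).
PANIC_RE = re.compile(r"hyperkube\[\d+\]:\s+panic: (?P<msg>.+)$")


def panic_messages(lines):
    """Return the distinct panic messages, in order of first appearance."""
    seen = []
    for line in lines:
        match = PANIC_RE.search(line.rstrip("\n"))
        if not match:
            continue
        msg = match.group("msg")
        # Skip the "(controller.die) (...) [recovered]" wrapper lines.
        if msg.endswith("[recovered]"):
            continue
        if msg not in seen:
            seen.append(msg)
    return seen


sample = [
    "Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]: panic: (controller.die) (0x1a19ea0,0xc0001f3eb0) [recovered]",
    "Jan 25 00:00:53 ip-10-0-158-144 hyperkube[1169]:         panic: Console: timed out waiting for caches to sync",
    "Jan 25 00:01:14 ip-10-0-158-144 hyperkube[1169]:         panic: AuthenticationOperator2: timed out waiting for caches to sync",
]
for message in panic_messages(sample):
    print(message)
```

On the two excerpts in the description this lists one "timed out waiting for caches to sync" message per operator.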

Comment 7 Ryan Phillips 2020-02-12 18:28:45 UTC
This may be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1801826. The fix that merged reserves more CPU and RAM for the kubelet and CRI-O. How much memory is available on the masters?

Comment 8 ryoji noma 2020-02-17 06:02:03 UTC
Hi.

Is your question about the master node spec?
The master node has 16 GB of memory.

$ head  sosreport-ip-10-0-158-144-02570287-2020-01-28-ckwjjxh/proc/meminfo 
MemTotal:       16420432 kB
MemFree:         9454948 kB
MemAvailable:   13928356 kB
Buffers:             116 kB
Cached:          4213008 kB
SwapCached:            0 kB
Active:          3081624 kB
Inactive:        3248440 kB
Active(anon):    1693472 kB
Inactive(anon):      508 kB
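
A minimal sketch, assuming the `/proc/meminfo` text shown above, of converting the reported values to GiB (the kernel's "kB" unit here is 1024 bytes). The helper names `parse_meminfo` and `kb_to_gib` are hypothetical and only serve this illustration.

```python
def parse_meminfo(text):
    """Return a dict mapping each /proc/meminfo field name to its value in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        # Skip lines that are not "Name: value" pairs (e.g. a shell prompt).
        if not sep or not rest.strip():
            continue
        fields[name.strip()] = int(rest.split()[0])
    return fields


def kb_to_gib(kb):
    """Convert a /proc/meminfo kB value (1024-byte units) to GiB."""
    return kb / (1024 * 1024)


sample = """MemTotal:       16420432 kB
MemFree:         9454948 kB
MemAvailable:   13928356 kB
"""
info = parse_meminfo(sample)
for key in ("MemTotal", "MemFree", "MemAvailable"):
    print(f"{key}: {kb_to_gib(info[key]):.2f} GiB")
```

For the values above this works out to roughly 15.66 GiB total, 9.02 GiB free, and 13.28 GiB available at the time the sosreport was taken.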

Comment 9 Ryan Phillips 2020-03-12 13:32:53 UTC
Several efforts to fix this are under way. The previously mentioned fix is going into 4.2, along with other performance fixes.

*** This bug has been marked as a duplicate of bug 1801826 ***

