Description of problem:
When CPU usage spikes on an OCP 4.1 master host on AWS, the following goroutine stack traces are recorded. What does this goroutine stack trace mean, and how should it be interpreted?

The following logs are attached:
1. Event logs
2. Failed node event logs
3. Operator status (False only)

1. Event logs using "oc get events"
~~~
openshift-marketplace 41m Warning Unhealthy pod/marketplace-operator-xxxxxxxxxx-dwvtm Readiness probe failed: Get http://10.130.0.68:8080/healthz: dial tcp 10.130.0.68:8080: connect: connection refused
openshift-service-catalog-apiserver-operator 21m Normal Created pod/openshift-service-catalog-apiserver-operator-xxxxxxxxxx-mzflx Created container operator
openshift-service-catalog-apiserver-operator 21m Normal Started pod/openshift-service-catalog-apiserver-operator-xxxxxxxxxx-mzflx Started container operator
openshift-apiserver 34m Warning Unhealthy pod/apiserver-aaaaa Readiness probe failed: HTTP probe failed with statuscode: 500
openshift-marketplace 41m Warning Unhealthy pod/marketplace-operator-xxxxxxxxxx-dwvtm Liveness probe failed: Get http://10.130.0.68:8080/healthz: dial tcp 10.130.0.68:8080: connect: connection refused
openshift-sdn 41m Warning Unhealthy pod/sdn-aaaaa Liveness probe failed: Get http://10.0.1.68:10256/healthz: dial tcp 10.0.1.68:10256: connect: connection refused
openshift-apiserver 41m Warning Unhealthy pod/apiserver-aaaaa Liveness probe failed: HTTP probe failed with statuscode: 500
openshift-machine-config-operator 41m Warning Unhealthy pod/etcd-quorum-guard-xxxxxxxxxx-aaaaa Readiness probe errored: rpc error: code = Unknown desc = command error: runtime/cgo: pthread_create failed: Resource temporarily unavailable
SIGABRT: abort
PC=0x7f17532e593f m=0 sigcode=18446744073709551610

goroutine 0 [idle]:
runtime: unknown pc 0x7f17532e593f
stack: frame={sp:0x7fffa5464480, fp:0x0} stack=[0x7fffa4c65978,0x7fffa54649a0)
00007fffa5464380: 00007fffa5464388 00007fffa54643d8
00007fffa5464390: 00007fffa54643d8 000055f120d679d1 <runtime.persistentalloc1+465>
00007fffa54643a0: 000055f121620990 0000000000000000
00007fffa54643b0: 0000000000010000 0000000000000008
00007fffa54643c0: 0000000000000120 000055f100000000
00007fffa54643d0: 0000000000000120 0000000000000000
00007fffa54643e0: 0000000000000130 0000000000000007
00007fffa54643f0: 0000000000000013 000055f122988098
00007fffa5464400: 000000000000011f 0000000000000150
00007fffa5464410: ffffffffffffffa8 0000000000000004
00007fffa5464420: 0000003400000013 0000000000000000
00007fffa5464430: 0000000000000000 0000006e0000005b
00007fffa5464440: 0000000000000000 0000007c00000077
00007fffa5464450: 0000000000000120 00007f175366cc60
00007fffa5464460: 0000000000000120 0000000000020a20
00007fffa5464470: 00007f1753671710 000055f1229885e0
00007fffa5464480: <0000000000000000 00007f1753334efa
00007fffa5464490: 00007f175009b700 00007fffa54644d0
00007fffa54644a0: 000055f120da3c47 <runtime.step+279> 000055f1214163a2
00007fffa54644b0: 00000000001ea699 00000000001ea699
00007fffa54644c0: 0000027500000002 00000000001ea699
00007fffa54644d0: 00007fffa5464578 000055f120da300f <runtime.pcvalue+319>
00007fffa54644e0: 000055f121434058 00000000001cc9e3
00007fffa54644f0: 00000000001cc9e3 00007fffa5464538
00007fffa5464500: fffffffe7fffffff ffffffffffffffff
00007fffa5464510: ffffffffffffffff ffffffffffffffff
00007fffa5464520: ffffffffffffffff ffffffffffffffff
00007fffa5464530: ffffffffffffffff ffffffffffffffff
00007fffa5464540: ffffffffffffffff ffffffffffffffff
00007fffa5464550: ffffffffffffffff ffffffffffffffff
00007fffa5464560: ffffffffffffffff ffffffffffffffff
00007fffa5464570: ffffffffffffffff ffffffffffffffff
runtime: unknown pc 0x7f17532e593f
stack: frame={sp:0x7fffa5464480, fp:0x0} stack=[0x7fffa4c65978,0x7fffa54649a0)
00007fffa5464380: 00007fffa5464388 00007fffa54643d8
00007fffa5464390: 00007fffa54643d8 000055f120d679d1 <runtime.persistentalloc1+465>
00007fffa54643a0: 000055f121620990 0000000000000000
00007fffa54643b0: 0000000000010000 0000000000000008
00007fffa54643c0: 0000000000000120 000055f100000000
00007fffa54643d0: 0000000000000120 0000000000000000
00007fffa54643e0: 0000000000000130 0000000000000007
00007fffa54643f0: 0000000000000013 000055f122988098
00007fffa5464400: 000000000000011f 0000000000000150
00007fffa5464410: ffffffffffffffa8 0000000000000004
00007fffa5464420: 0000003400000013 0000000000000000
00007fffa5464430: 0000000000000000 0000006e0000005b
00007fffa5464440: 0000000000000000 0000007c00000077
00007fffa5464450: 0000000000000120 00007f175366cc60
00007fffa5464460: 0000000000000120 0000000000020a20
00007fffa5464470: 00007f1753671710 000055f1229885e0
00007fffa5464480: <0000000000000000 00007f1753334efa
00007fffa5464490: 00007f175009b700 00007fffa54644d0
00007fffa54644a0: 000055f120da3c47 <runtime.step+279> 000055f1214163a2
00007fffa54644b0: 00000000001ea699 00000000001ea699
00007fffa54644c0: 0000027500000002 00000000001ea699
00007fffa54644d0: 00007fffa5464578 000055f120da300f <runtime.pcvalue+319>
00007fffa54644e0: 000055f121434058 00000000001cc9e3
00007fffa54644f0: 00000000001cc9e3 00007fffa5464538
00007fffa5464500: fffffffe7fffffff ffffffffffffffff
00007fffa5464510: ffffffffffffffff ffffffffffffffff
00007fffa5464520: ffffffffffffffff ffffffffffffffff
00007fffa5464530: ffffffffffffffff ffffffffffffffff
00007fffa5464540: ffffffffffffffff ffffffffffffffff
00007fffa5464550: ffffffffffffffff ffffffffffffffff
00007fffa5464560: ffffffffffffffff ffffffffffffffff
00007fffa5464570: ffffffffffffffff ffffffffffffffff

goroutine 1 [running, locked to thread]:
runtime.asmcgocall(0x55f12104a890, 0xc000028720)
    /usr/lib/golang/src/runtime/asm_amd64.s:622 +0x3f fp=0xc0000286f8 sp=0xc0000286f0 pc=0x55f120db2f3f
runtime.newm1(0xc00005e380)
    /usr/lib/golang/src/runtime/proc.go:1935 +0xc8 fp=0xc000028748 sp=0xc0000286f8 pc=0x55f120d8bd28
runtime.newm(0x55f1213cd040, 0x0)
    /usr/lib/golang/src/runtime/proc.go:1919 +0x9f fp=0xc000028778 sp=0xc000028748 pc=0x55f120d8bbaf
runtime.startTemplateThread()
    /usr/lib/golang/src/runtime/proc.go:1955 +0x51 fp=0xc000028798 sp=0xc000028778 pc=0x55f120d8bdf1
runtime.main()
    /usr/lib/golang/src/runtime/proc.go:184 +0x19c fp=0xc0000287e0 sp=0xc000028798 pc=0x55f120d8799c
runtime.goexit()

goroutine 1 [running, locked to thread]:
runtime.asmcgocall(0x55f12104a890, 0xc000028720)
    /usr/lib/golang/src/runtime/asm_amd64.s:622 +0x3f fp=0xc0000286f8 sp=0xc0000286f0 pc=0x55f120db2f3f
runtime.newm1(0xc00005e380)
    /usr/lib/golang/src/runtime/proc.go:1935 +0xc8 fp=0xc000028748 sp=0xc0000286f8 pc=0x55f120d8bd28
runtime.newm(0x55f1213cd040, 0x0)
    /usr/lib/golang/src/runtime/proc.go:1919 +0x9f fp=0xc000028778 sp=0xc000028748 pc=0x55f120d8bbaf
runtime.startTemplateThread()
    /usr/lib/golang/src/runtime/proc.go:1955 +0x51 fp=0xc000028798 sp=0xc000028778 pc=0x55f120d8bdf1
runtime.main()
    /usr/lib/golang/src/runtime/proc.go:184 +0x19c fp=0xc0000287e0 sp=0xc000028798 pc=0x55f120d8799c
runtime.goexit()
    /usr/lib/golang/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc0000287e8 sp=0xc0000287e0 pc=0x55f120db3801

goroutine 2 [force gc (idle)]:
runtime.gopark(0x55f1213ccf98, 0x55f121620970, 0x1410, 0x1)
    /usr/lib/golang/src/runtime/proc.go:302 +0xf1 fp=0xc000028f80 sp=0xc000028f60 pc=0x55f120d87e01
runtime.goparkunlock(0x55f121620970, 0x1410, 0x1)
    /usr/lib/golang/src/runtime/proc.go:308 +0x55 fp=0xc000028fb0 sp=0xc000028f80 pc=0x55f120d87eb5
runtime.forcegchelper()
    /usr/lib/golang/src/runtime/proc.go:251 +0xb7 fp=0xc000028fe0 sp=0xc000028fb0 pc=0x55f120d87c77
runtime.goexit()
    /usr/lib/golang/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc000028fe8 sp=0xc000028fe0 pc=0x55f120db3801
created by runtime.init.5
    /usr/lib/golang/src/runtime/proc.go:240 +0x37

goroutine 3 [GC sweep wait]:
runtime.gopark(0x55f1213ccf98, 0x55f121620b00, 0x55f120d7140c, 0x1)
    /usr/lib/golang/src/runtime/proc.go:302 +0xf1 fp=0xc000029780 sp=0xc000029760 pc=0x55f120d87e01
runtime.goparkunlock(0x55f121620b00, 0x55f1210a140c, 0x1)
    /usr/lib/golang/src/runtime/proc.go:308 +0x55 fp=0xc0000297b0 sp=0xc000029780 pc=0x55f120d87eb5
runtime.bgsweep(0xc000052000)
runtime.goexit()
    /usr/lib/golang/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc0000297e0 sp=0xc0000297d8 pc=0x55f120db3801
created by runtime.gcenable
    /usr/lib/golang/src/runtime/mgc.go:216 +0x5a

rax    0x0
rbx    0x6
rcx    0x7f17532e593f
rdx    0x0
rdi    0x2
rsi    0x7fffa5464480
rbp    0x55f1210ae632
rsp    0x7fffa5464480
r8     0x0
r9     0x7fffa5464480
r10    0x8
r11    0x246
r12    0x55f122988720
r13    0x11
r14    0x55f1210a12b8
r15    0x0
rip    0x7f17532e593f
rflags 0x246
cs     0x33
fs     0x0
gs     0x0

-----

SIGQUIT: quit
PC=0x55f120db5683 m=2 sigcode=0

goroutine 0 [idle]:
runtime.futex(0x55f121621230, 0x80, 0x7f175109cd50, 0x0, 0x7f1700000000, 0x55f120db52b8, 0x3c, 0x0, 0x7f175109cd68, 0x55f120d65970, ...)
    /usr/lib/golang/src/runtime/sys_linux_amd64.s:532 +0x23 fp=0x7f175109cd20 sp=0x7f175109cd18 pc=0x55f120db5683
runtime.futexsleep(0x55f121621230, 0x0, 0xdf8475800)
    /usr/lib/golang/src/runtime/os_linux.go:63 +0xc0 fp=0x7f175109cd70 sp=0x7f175109cd20 pc=0x55f120d83b10
runtime.notetsleep_internal(0x55f121621230, 0xdf8475800, 0x989680)
    /usr/lib/golang/src/runtime/lock_futex.go:193 +0x130 fp=0x7f175109cda8 sp=0x7f175109cd70 pc=0x55f120d65970
runtime.notetsleep(0x55f121621230, 0xdf8475800, 0x0)
    /usr/lib/golang/src/runtime/lock_futex.go:216 +0x58 fp=0x7f175109cdd0 sp=0x7f175109cda8 pc=0x55f120d65a48
runtime.sysmon()
    /usr/lib/golang/src/runtime/proc.go:4369 +0x175 fp=0x7f175109ce58 sp=0x7f175109cdd0 pc=0x55f120d92655
runtime.mstart1()
    /usr/lib/golang/src/runtime/proc.go:1282 +0xea fp=0x7f175109ce80 sp=0x7f175109ce58 pc=0x55f120d8ab7a
runtime.mstart()
    /usr/lib/golang/src/runtime/proc.go:1248 +0x70 fp=0x7f175109cea8 sp=0x7f175109ce80 pc=0x55f120d8aa70

rax    0xfffffffffffffffc
rbx    0x0
rcx    0x55f120db5683
rdx    0x0
rdi    0x55f121621230
rsi    0x80
rbp    0x7f175109cd60
rsp    0x7f175109cd18
r8     0x0
r9     0x0
r10    0x7f175109cd50
r11    0x246
r12    0x7fffa546462e
r13    0x7fffa546462f
r14    0x7fffa5464720
r15    0x7f175109cfc0
rip    0x55f120db5683
rflags 0x246
cs     0x33
fs     0x0
gs     0x0

-----

SIGQUIT: quit
PC=0x55f120db5681 m=3 sigcode=0

goroutine 0 [idle]:
runtime.futex(0xc00002c840, 0x80, 0x0, 0x0, 0x0, 0x55f120d83b5c, 0x0, 0x0, 0x, stdout: , stderr: , exit code -1
openshift-machine-api 16m Warning BackOff pod/cluster-autoscaler-operator-xxxxxxxxxx-aaaaa Back-off restarting failed container
openshift-operator-lifecycle-manager 19m Warning BackOff pod/catalog-operator-xxxxxxxxxx-xmpmm Back-off restarting failed container
openshift-authentication 41m Warning Unhealthy pod/oauth-openshift-xxxxxxxxxx-aaaaa Liveness probe failed: Get https://10.130.0.65:6443/healthz: dial tcp 10.130.0.65:6443: connect: connection refused
openshift-authentication 34m Warning Unhealthy pod/oauth-openshift-xxxxxxxxxx-aaaaa Readiness probe failed: Get https://10.130.0.65:6443/healthz: dial tcp 10.130.0.65:6443: connect: connection refused
openshift-apiserver 41m Warning Unhealthy pod/apiserver-aaaaa Readiness probe failed: Get https://10.130.0.11:8443/healthz: dial tcp 10.130.0.11:8443: connect: connection refused
openshift-marketplace 21m Warning BackOff pod/marketplace-operator-xxxxxxxxxx-dwvtm Back-off restarting failed container
openshift-machine-config-operator 41m Warning Unhealthy pod/etcd-quorum-guard-xxxxxxxxxx-aaaaa Readiness probe errored: rpc error: code = Unknown desc = command error: runtime/cgo: pthread_create failed: Resource temporarily unavailable
runtime/cgo: pthread_create failed: Resource temporarily unavailable
SIGABRT: abort
PC=0x7fc5a5c5e93f m=3 sigcode=18446744073709551610

goroutine 0 [idle]:
runtime: unknown pc 0x7fc5a5c5e93f
stack: frame={sp:0x7fc5a3214820, fp:0x0} stack=[0x7fc5a2a15288,0x7fc5a3214e88)
00007fc5a3214720: 0000000000000000 0000000000000000
:
~~~

2. Failed node event logs
~~~
:
openshift-machine-config-operator  etcd-quorum-guard-xxxxxxxxxx-aaaaa  10m (0%)  0 (0%)  5Mi (0%)  0 (0%)  66d
:
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests      Limits
  --------                    --------      ------
  cpu                         1720m (49%)   0 (0%)
  memory                      5315Mi (34%)  1748Mi (11%)
  ephemeral-storage           0 (0%)        0 (0%)
  attachable-volumes-aws-ebs  0             0
Events:
  Type     Reason        Age                From                                                   Message
  ----     ------        ----               ----                                                   -------
  Normal   NodeReady     31m (x3 over 16h)  kubelet, ip-10-0-1-68.ap-northeast-1.compute.internal  Node ip-10-0-1-68.ap-northeast-1.compute.internal status is now: NodeReady
  Normal   NodeNotReady  25m (x4 over 16h)  kubelet, ip-10-0-1-68.ap-northeast-1.compute.internal  Node ip-10-0-1-68.ap-northeast-1.compute.internal status is now: NodeNotReady
  Warning  SystemOOM     25m                kubelet, ip-10-0-1-68.ap-northeast-1.compute.internal  System OOM encountered
~~~

3. Operator status (False only)
~~~
machine-config  4.1.0  False  False  True  89d
monitoring      4.1.0  False  True   True  65d
~~~

Version-Release number of selected component (if applicable):
OCP 4.1.0

How reproducible:
N/A

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
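The "runtime/cgo: pthread_create failed: Resource temporarily unavailable" lines followed by SIGABRT in the probe messages mean the probe command's Go runtime received EAGAIN from pthread_create while trying to start a new OS thread, and aborted before it could answer the probe. The goroutines in the trace are ordinary runtime housekeeping (runtime.main/startTemplateThread, forcegchelper, bgsweep, sysmon), so the trace points at the node running short of threads/PIDs, consistent with the SystemOOM and NodeNotReady events above, rather than at a bug in the goroutines shown. Below is a purely illustrative, hypothetical Go sketch (not taken from this cluster) that exhausts OS threads in the same way, by keeping many goroutines blocked in raw syscalls so the scheduler must create one OS thread per blocked call:
~~~
// Hypothetical reproducer, for illustration only (not from the cluster).
// Every goroutine below blocks in a raw syscall, so the Go scheduler must
// back each one with its own OS thread. Once a limit is reached, thread
// creation fails: a cgo-enabled binary reports
// "runtime/cgo: pthread_create failed: Resource temporarily unavailable",
// while a pure-Go binary reports "runtime: failed to create new OS thread".
// Which limit trips first (Go's own 10000-thread cap, the container's pids
// cgroup limit, or kernel.threads-max) depends on the environment.
package main

import (
	"fmt"
	"syscall"
	"time"
)

func main() {
	for i := 1; ; i++ {
		go func() {
			// A long blocking syscall pins this goroutine to a dedicated OS thread.
			ts := syscall.Timespec{Sec: 3600}
			syscall.Nanosleep(&ts, nil)
		}()
		if i%500 == 0 {
			fmt.Printf("spawned %d goroutines blocked in syscalls\n", i)
		}
		time.Sleep(time.Millisecond) // slow the loop so the failure point is visible
	}
}
~~~
Run on a Linux host with a low thread limit (for example a reduced "ulimit -u" or a pids cgroup), a sketch like this aborts with the same "Resource temporarily unavailable" message seen in the events, which suggests investigating node-level thread/PID limits and the memory pressure rather than the crashing binaries themselves.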
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0062