With Go 1.18, kube-apiserver memory usage is high; see https://github.com/kubernetes/kubernetes/issues/108357. It looks like 'GOGC=63' brings memory usage back to the original level, but that result is from the upstream 5K-node test; see https://github.com/kubernetes/kubernetes/issues/108357#issuecomment-1056901991. Is 'GOGC=63' suitable for OpenShift? We need to do some perf testing specific to OpenShift, find a suitable value for GOGC, and set it appropriately when we build the OpenShift kube-apiserver.
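For context, GOGC is the Go runtime's garbage-collection target: a new collection starts once the heap has grown by GOGC percent over the live heap left by the previous cycle, so lowering it from the default 100 to 63 trades extra GC CPU for a lower steady-state heap. A minimal sketch of how the two values could be compared locally (the binary path and flags are illustrative; GODEBUG=gctrace=1 is a standard Go runtime option that prints one line per GC cycle to stderr):

$ GOGC=100 GODEBUG=gctrace=1 ./kube-apiserver <flags> 2>&1 | grep '^gc '
$ GOGC=63 GODEBUG=gctrace=1 ./kube-apiserver <flags> 2>&1 | grep '^gc '

With GOGC=63 the gctrace lines should show collections firing more often and at smaller heap goals, which is where the memory savings come from.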
I am fine if we go with 'GOGC=63', but I think we should run some sort of scale test with OpenShift and capture some data points in order for us to make this decision. One option is to wait and see what the scale test run by the perf team (equivalent to the upstream 5K-node test) shows with and without GOGC defined; a simple sampling loop like the sketch below could also capture comparable data points in the meantime.
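A hedged sketch of such a sampling loop, assuming the openshift-kube-apiserver namespace and the apiserver label selector used later in this bug; run it once with GOGC unset and once with GOGC=63, then compare the samples:

$ while true; do
>   date
>   oc adm top pods -n openshift-kube-apiserver -l apiserver
>   sleep 60
> done | tee gogc-memory-samples.txt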
I think we should address this for all three apiservers (kube-apiserver, openshift-apiserver, and oauth-apiserver).
Since the effect of the Go 1.18 changes is dependent on load characteristics and won't be uniform across all clusters, the current plan is to provide a knob for admins to tweak GOGC. There should also be a release note / upgrade checklist item so that admins can proactively tune their clusters if necessary.
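To make the shape of that knob concrete: GOGC is injected into the kube-apiserver static pod as a plain environment variable (the verification below greps exactly this fragment out of the rendered pod), so a minimal sketch of what the tunable amounts to is:

    env:
    - name: GOGC
      value: "63"   # default is "100"; lower values collect more often

How the operator ultimately exposes that value to admins is what this bug tracks.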
Verification steps:

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-23-153912   True        False         3h57m   Cluster version is 4.11.0-0.nightly-2022-06-23-153912

Check the default GOGC setting for kube-apiserver:
$ oc exec kube-apiserver-kewang-2411op2-dnk84-master-0 -n openshift-kube-apiserver -- printenv | grep -i gogc
GOGC=100
$ oc get pod kube-apiserver-kewang-2411op2-dnk84-master-0 -n openshift-kube-apiserver -oyaml | grep -iA1 gogc
    - name: GOGC
      value: "100"
--
    - name: GOGC
      value: "100"
--
    - name: GOGC
      value: "100"

Change the default GOGC setting for kube-apiserver in the first terminal:
$ oc get no
NAME                            STATUS   ROLES    AGE   VERSION
kewang-2411op2-dnk84-master-0   Ready    master   41m   v1.24.0+284d62a
kewang-2411op2-dnk84-master-1   Ready    master   41m   v1.24.0+284d62a
kewang-2411op2-dnk84-master-2   Ready    master   41m   v1.24.0+284d62a
...
$ oc debug node/kewang-2411op2-dnk84-master-0
...
sh-4.4# chroot /host
sh-4.4# cd /etc/kubernetes/manifests
sh-4.4# vi kube-apiserver-pod.yaml    # replace the GOGC value with 63 and save
sh-4.4# mv kube-apiserver-pod.yaml ..  # kubelet will shut down kube-apiserver

Open a second terminal and check the kube-apiserver pods:
$ oc get po -n openshift-kube-apiserver -l apiserver
NAME                                           READY   STATUS    RESTARTS   AGE
kube-apiserver-kewang-2411op2-dnk84-master-1   5/5     Running   0          3h42m
kube-apiserver-kewang-2411op2-dnk84-master-2   5/5     Running   0          3h45m

Move kube-apiserver-pod.yaml back to the manifests directory in the first terminal, then check the kube-apiserver pods in the second terminal after a while; the kube-apiserver pod starts up:
$ oc get po -n openshift-kube-apiserver -l apiserver
NAME                                           READY   STATUS    RESTARTS   AGE
kube-apiserver-kewang-2411op2-dnk84-master-0   3/5     Running   0          7s
kube-apiserver-kewang-2411op2-dnk84-master-1   5/5     Running   0          3h43m
kube-apiserver-kewang-2411op2-dnk84-master-2   5/5     Running   0          3h46m

Apply the same steps to the other kube-apiservers, then check the results:
$ oc exec kube-apiserver-kewang-2411op2-dnk84-master-0 -n openshift-kube-apiserver -- printenv | grep -i gogc
GOGC=63
$ oc exec kube-apiserver-kewang-2411op2-dnk84-master-1 -n openshift-kube-apiserver -- printenv | grep -i gogc
GOGC=63
$ oc exec kube-apiserver-kewang-2411op2-dnk84-master-2 -n openshift-kube-apiserver -- printenv | grep -i gogc
GOGC=63

Check the cluster operators:
$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.11.0-0.nightly-2022-06-23-153912   True        False         False      40m
baremetal                                  4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h41m
cloud-controller-manager                   4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h43m
cloud-credential                           4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h44m
cluster-autoscaler                         4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h41m
config-operator                            4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h42m
console                                    4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h21m
csi-snapshot-controller                    4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h42m
dns                                        4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h41m
etcd                                       4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h30m
image-registry                             4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h24m
ingress                                    4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h24m
insights                                   4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h28m
kube-apiserver                             4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h29m
kube-controller-manager                    4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h38m
kube-scheduler                             4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h37m
kube-storage-version-migrator              4.11.0-0.nightly-2022-06-23-153912   True        False         False      56m
machine-api                                4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h38m
machine-approver                           4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h41m
machine-config                             4.11.0-0.nightly-2022-06-23-153912   True        False         False      40m
marketplace                                4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h41m
monitoring                                 4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h22m
network                                    4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h42m
node-tuning                                4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h41m
openshift-apiserver                        4.11.0-0.nightly-2022-06-23-153912   True        False         False      58m
openshift-controller-manager               4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h37m
openshift-samples                          4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h25m
operator-lifecycle-manager                 4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h41m
operator-lifecycle-manager-catalog         4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h42m
operator-lifecycle-manager-packageserver   4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h26m
service-ca                                 4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h42m
storage                                    4.11.0-0.nightly-2022-06-23-153912   True        False         False      7h37m

$ oc adm top node
NAME                                  CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
kewang-2411op2-dnk84-master-0         1583m        21%    10589Mi         71%
kewang-2411op2-dnk84-master-1         1338m        17%    9318Mi          62%
kewang-2411op2-dnk84-master-2         688m         9%     4925Mi          33%
kewang-2411op2-dnk84-worker-0-j2vfq   702m         20%    4115Mi          60%
kewang-2411op2-dnk84-worker-0-v9vft   1724m        49%    4703Mi          68%

Based on the above, the cluster works well after the GOGC setting was changed, so moving the bug to VERIFIED.
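As a side note, the per-pod printenv checks above can be collapsed into one loop; a small sketch, assuming the same label selector (each kube-apiserver pod runs several containers, hence the explicit -c kube-apiserver):

$ for p in $(oc get po -n openshift-kube-apiserver -l apiserver -o name); do
>   oc exec -n openshift-kube-apiserver "$p" -c kube-apiserver -- printenv GOGC
> done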
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069