Bug 1782893

Summary: MCO should be updated based on the backport of reserved-cpus feature
Product: OpenShift Container Platform Reporter: Antonio Murdaca <amurdaca>
Component: Machine Config Operator    Assignee: Antonio Murdaca <amurdaca>
Status: CLOSED ERRATA QA Contact: Michael Nguyen <mnguyen>
Severity: high Docs Contact:
Priority: high    
Version: 4.4    CC: amurdaca, augol, dblack, eparis, fsimonce, ksinny, mnguyen, msivak, mvirgil, sgordon, skumari, smilner, vromanso
Target Milestone: ---   
Target Release: 4.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1779348 Environment:
Last Closed: 2020-05-04 11:19:53 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1775826    
Bug Blocks: 1771572, 1779348    

Description Antonio Murdaca 2019-12-12 15:18:26 UTC
+++ This bug was initially created as a clone of Bug #1779348 +++

Description of problem:

As the reserved-cpus feature is being backported to 4.3 (BZ1775826, PR[1]),
it is essential to update the MCO vendor directory to include the new kubelet schemas. This is needed for the MCO to be able to generate a correct kubelet config that includes the `ReservedSystemCPUs` field.


[1] https://github.com/openshift/origin/pull/24224
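
A quick way to see whether the vendored kubelet schema already carries the new field is sketched below; the field name comes from the description above, but the exact vendor path inside the MCO repository is an assumption:

$ # assumption: run from the root of a machine-config-operator checkout; the vendor path may differ
$ grep -rl ReservedSystemCPUs vendor/k8s.io/ || echo "ReservedSystemCPUs not in the vendored kubelet schema"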

--- Additional comment from Vladik Romanovsky on 2019-12-04 18:06:59 UTC ---

It appears that the MCO is following openshift/kubernetes/origin-4.3-kubernetes-1.16.0.
I've opened a backport[1] of the reserved-cpus[2] patch there as well.
Once this is merged, I'll be able to post a PR to the MCO to update its vendor directory.

[1] https://github.com/openshift/kubernetes/pull/101
[2] https://github.com/kubernetes/kubernetes/pull/83592

--- Additional comment from Vladik Romanovsky on 2019-12-05 15:09:43 UTC ---

After several discussions, I've been told that the correct path is to backport the reserved-cpus PR[1]
to the origin/master branch. This is the only way to update the openshift/kubernetes/origin-4.3-kubernetes-1.16.2 branch.
I've opened a PR to address this[2].

Once [2] is merged, I will need to push a PR to the MCO to retarget the openshift/kubernetes branch from 1.16.0 to 1.16.2 (a rough sketch of that step follows the links below).


[1] https://github.com/kubernetes/kubernetes/pull/83592
[2] https://github.com/openshift/origin/pull/24257
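
A rough sketch of the retargeting step, assuming the MCO pins its kubernetes dependency to the github.com/openshift/kubernetes module in go.mod (the exact module path and any replace directive may differ):

$ # assumption: run from a machine-config-operator checkout; the branch name resolves to a pseudo-version
$ go get github.com/openshift/kubernetes@origin-4.3-kubernetes-1.16.2
$ go mod tidy
$ go mod vendor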

--- Additional comment from Sinny Kumari on 2019-12-09 17:19:00 UTC ---

As far as I can see, the 4.3 branch of the Machine Config Operator vendors v1.16.0-beta.0.0.20190913145653 from the github.com/openshift/kubernetes repo: https://github.com/openshift/machine-config-operator/blob/release-4.3/go.mod#L107. How exactly do changes made to the openshift/origin master branch flow into the openshift/kubernetes/origin-4.3-kubernetes-1.16.2 branch? Is there any documentation that explains this and can be used as a reference?

Also, we may need to update the MCO vendor directory in the master branch first to make sure the changes work fine. The MCO master branch vendors kubernetes v1.16.0-beta.0.0.20190913145653 as well.

--- Additional comment from Vladik Romanovsky on 2019-12-09 19:07:13 UTC ---

(In reply to Sinny Kumari from comment #3)
> As far as I see in 4.3 branch of Machine Config Operator, it vendors
> v1.16.0-beta.0.0.20190913145653 from github.com/openshift/kubernetes repo
> https://github.com/openshift/machine-config-operator/blob/release-4.3/go.
> mod#L107. How exactly making changes to openshift/origin/master branch flows
> into openshift/kubernetes/origin-4.3-kubernetes-1.16.2 branch ? Is there any
> documentation which explains this and can be used as reference?
> 
> Also, we may need to update MCO vendor directory in master branch first to
> make sure changes are working fine. MCO master branch vendors kubernetes
> v1.16.0-beta.0.0.20190913145653 as well.

In BZ1775826 we backported a reserved-cpus PR to origin/release-4.3[1].
However, this code wasn't in origin/master and wasn't copied to the openshift/kubernetes branch.
Therefore, I've opened a backport[2] to the origin/master branch, and once it gets merged, [1] will be copied to openshift/kubernetes/origin-4.3-kubernetes-1.16.2.
Once this happens, we will need to make the 4.3 branch of the Machine Config Operator vendor the 1.16.2 branch from github.com/openshift/kubernetes.

I'm not aware of any documentation on the subject. The above steps have been taken following a conversation with @rphillips @sjenning @eparis

Vladik

[1] https://github.com/openshift/origin/pull/24224
[2] https://github.com/openshift/origin/pull/24257

--- Additional comment from Sinny Kumari on 2019-12-10 05:03:41 UTC ---

Being relatively new to the OpenShift process, it is a bit confusing to me why we need to update openshift/origin first instead of directly updating the required openshift/kubernetes branch.
Thanks, Vladik, for the explanation.

Comment 2 Michael Nguyen 2019-12-18 00:51:23 UTC
Verified on 4.4.0-0.nightly-2019-12-16-124946 (cluster version shown in comment 3).

$ cat kubeletconfig.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: enabled
  kubeletConfig:
    reservedSystemCPUs: 0,2
    cpuManagerPolicy: static
    cpuManagerReconcilePeriod: 5s
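
Note that the machineConfigPoolSelector above only matches pools carrying the custom-kubelet: enabled label; a minimal sketch of adding it, assuming the master pool is the target (the kubelet.conf inspected below is read from a master node):

$ oc label machineconfigpool master custom-kubelet=enabled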

$ oc apply -f kubeletconfig.yaml
$ oc get nodes
NAME                           STATUS                        ROLES    AGE     VERSION
ip-10-0-130-59.ec2.internal    Ready                         master   3h34m   v1.16.2
ip-10-0-133-136.ec2.internal   Ready                         worker   3h23m   v1.16.2
ip-10-0-154-85.ec2.internal    Ready                         worker   3h25m   v1.16.2
ip-10-0-157-203.ec2.internal   NotReady,SchedulingDisabled   master   3h34m   v1.16.2
ip-10-0-160-63.ec2.internal    Ready                         worker   3h24m   v1.16.2
ip-10-0-163-177.ec2.internal   Ready                         master   3h34m   v1.16.2
$ oc get nodes
NAME                           STATUS   ROLES    AGE     VERSION
ip-10-0-130-59.ec2.internal    Ready    master   3h51m   v1.16.2
ip-10-0-133-136.ec2.internal   Ready    worker   3h41m   v1.16.2
ip-10-0-154-85.ec2.internal    Ready    worker   3h42m   v1.16.2
ip-10-0-157-203.ec2.internal   Ready    master   3h51m   v1.16.2
ip-10-0-160-63.ec2.internal    Ready    worker   3h42m   v1.16.2
ip-10-0-163-177.ec2.internal   Ready    master   3h51m   v1.16.2
[mnguyen@pet30 openshift]$ oc debug node/ip-10-0-130-59.ec2.internal
Starting pod/ip-10-0-130-59ec2internal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
chroot /host
sh-4.4# cat /etc/kubernetes/kubelet.conf 
{"kind":"KubeletConfiguration","apiVersion":"kubelet.config.k8s.io/v1beta1","staticPodPath":"/etc/kubernetes/manifests","syncFrequency":"0s","fileCheckFrequency":"0s","httpCheckFrequency":"0s","rotateCertificates":true,"serverTLSBootstrap":true,"authentication":{"x509":{"clientCAFile":"/etc/kubernetes/kubelet-ca.crt"},"webhook":{"cacheTTL":"0s"},"anonymous":{"enabled":false}},"authorization":{"webhook":{"cacheAuthorizedTTL":"0s","cacheUnauthorizedTTL":"0s"}},"clusterDomain":"cluster.local","clusterDNS":["172.30.0.10"],"streamingConnectionIdleTimeout":"0s","nodeStatusUpdateFrequency":"0s","nodeStatusReportFrequency":"0s","imageMinimumGCAge":"0s","volumeStatsAggPeriod":"0s","cgroupDriver":"systemd","cpuManagerPolicy":"static","cpuManagerReconcilePeriod":"5s","runtimeRequestTimeout":"0s","maxPods":250,"kubeAPIQPS":50,"kubeAPIBurst":100,"serializeImagePulls":false,"evictionPressureTransitionPeriod":"0s","featureGates":{"LegacyNodeRoleBehavior":false,"NodeDisruptionExclusion":true,"RotateKubeletServerCertificate":true,"ServiceNodeExclusion":true,"SupportPodPidsLimit":true},"containerLogMaxSize":"50Mi","systemReserved":{"cpu":"500m","memory":"500Mi"},"reservedSystemCPUs":"0,2"}
sh-4.4# journalctl -t hyperkube | grep reserv
Dec 17 20:49:36 ip-10-0-130-59 hyperkube[2052]: I1217 20:49:36.671184    2052 flags.go:33] FLAG: --kube-reserved=""
Dec 17 20:49:36 ip-10-0-130-59 hyperkube[2052]: I1217 20:49:36.671193    2052 flags.go:33] FLAG: --kube-reserved-cgroup=""
Dec 17 20:49:36 ip-10-0-130-59 hyperkube[2052]: I1217 20:49:36.671553    2052 flags.go:33] FLAG: --qos-reserved=""
Dec 17 20:49:36 ip-10-0-130-59 hyperkube[2052]: I1217 20:49:36.671637    2052 flags.go:33] FLAG: --reserved-cpus=""
Dec 17 20:49:36 ip-10-0-130-59 hyperkube[2052]: I1217 20:49:36.671841    2052 flags.go:33] FLAG: --system-reserved=""
Dec 17 20:49:36 ip-10-0-130-59 hyperkube[2052]: I1217 20:49:36.671850    2052 flags.go:33] FLAG: --system-reserved-cgroup=""
Dec 17 22:49:30 ip-10-0-130-59 hyperkube[2294]: I1217 22:49:30.217739    2294 flags.go:33] FLAG: --kube-reserved=""
Dec 17 22:49:30 ip-10-0-130-59 hyperkube[2294]: I1217 22:49:30.217744    2294 flags.go:33] FLAG: --kube-reserved-cgroup=""
Dec 17 22:49:30 ip-10-0-130-59 hyperkube[2294]: I1217 22:49:30.218049    2294 flags.go:33] FLAG: --qos-reserved=""
Dec 17 22:49:30 ip-10-0-130-59 hyperkube[2294]: I1217 22:49:30.218100    2294 flags.go:33] FLAG: --reserved-cpus=""
Dec 17 22:49:30 ip-10-0-130-59 hyperkube[2294]: I1217 22:49:30.218227    2294 flags.go:33] FLAG: --system-reserved=""
Dec 17 22:49:30 ip-10-0-130-59 hyperkube[2294]: I1217 22:49:30.218235    2294 flags.go:33] FLAG: --system-reserved-cgroup=""
Dec 17 23:17:59 ip-10-0-130-59 hyperkube[2290]: I1217 23:17:59.700468    2290 flags.go:33] FLAG: --kube-reserved=""
Dec 17 23:17:59 ip-10-0-130-59 hyperkube[2290]: I1217 23:17:59.700473    2290 flags.go:33] FLAG: --kube-reserved-cgroup=""
Dec 17 23:17:59 ip-10-0-130-59 hyperkube[2290]: I1217 23:17:59.700690    2290 flags.go:33] FLAG: --qos-reserved=""
Dec 17 23:17:59 ip-10-0-130-59 hyperkube[2290]: I1217 23:17:59.700744    2290 flags.go:33] FLAG: --reserved-cpus=""
Dec 17 23:17:59 ip-10-0-130-59 hyperkube[2290]: I1217 23:17:59.700866    2290 flags.go:33] FLAG: --system-reserved=""
Dec 17 23:17:59 ip-10-0-130-59 hyperkube[2290]: I1217 23:17:59.700871    2290 flags.go:33] FLAG: --system-reserved-cgroup=""
Dec 17 23:17:59 ip-10-0-130-59 hyperkube[2290]: I1217 23:17:59.933467    2290 server.go:679] Option --reserved-cpus is specified, it will overwrite the cpu setting in KubeReserved="map[]", SystemReserved="map[cpu:500m memory:500Mi]".
Dec 18 00:35:33 ip-10-0-130-59 hyperkube[2282]: I1218 00:35:33.342761    2282 flags.go:33] FLAG: --kube-reserved=""
Dec 18 00:35:33 ip-10-0-130-59 hyperkube[2282]: I1218 00:35:33.342766    2282 flags.go:33] FLAG: --kube-reserved-cgroup=""
Dec 18 00:35:33 ip-10-0-130-59 hyperkube[2282]: I1218 00:35:33.342987    2282 flags.go:33] FLAG: --qos-reserved=""
Dec 18 00:35:33 ip-10-0-130-59 hyperkube[2282]: I1218 00:35:33.343036    2282 flags.go:33] FLAG: --reserved-cpus=""
Dec 18 00:35:33 ip-10-0-130-59 hyperkube[2282]: I1218 00:35:33.343174    2282 flags.go:33] FLAG: --system-reserved=""
Dec 18 00:35:33 ip-10-0-130-59 hyperkube[2282]: I1218 00:35:33.343179    2282 flags.go:33] FLAG: --system-reserved-cgroup=""
Dec 18 00:35:33 ip-10-0-130-59 hyperkube[2282]: I1218 00:35:33.659464    2282 server.go:679] Option --reserved-cpus is specified, it will overwrite the cpu setting in KubeReserved="map[]", SystemReserved="map[cpu:500m memory:500Mi]".
Dec 18 00:35:33 ip-10-0-130-59 hyperkube[2282]: I1218 00:35:33.663561    2282 policy_static.go:110] [cpumanager] reserved 2 CPUs ("0,2") not available for exclusive assignment
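
An optional follow-up check, not part of the original verification output: with two CPUs reserved via reservedSystemCPUs, the node's allocatable CPU reported by the API is expected to be two cores lower than its capacity (exact figures depend on the instance type):

$ oc describe node ip-10-0-130-59.ec2.internal | grep -A 6 -E 'Capacity:|Allocatable:'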

Comment 3 Michael Nguyen 2019-12-18 00:52:04 UTC
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2019-12-16-124946   True        False         3h38m   Cluster version is 4.4.0-0.nightly-2019-12-16-124946

Comment 5 errata-xmlrpc 2020-05-04 11:19:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581