Bug 1782893 - MCO should be updated based on the backport of reserved-cpus feature
Summary: MCO should be updated based on the backport of reserved-cpus feature
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.4.0
Assignee: Antonio Murdaca
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On: 1775826
Blocks: 1771572 1779348
 
Reported: 2019-12-12 15:18 UTC by Antonio Murdaca
Modified: 2020-05-04 11:20 UTC
CC: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1779348
Environment:
Last Closed: 2020-05-04 11:19:53 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Github openshift machine-config-operator pull 1159 0 None closed Bug 1777150: pkg/controller: allow kubelet config and runtime changes for custom pools 2020-04-28 12:07:38 UTC
Github openshift machine-config-operator pull 1331 0 None closed Bug 1782893: vendor: bring in ReservedSystemCPUs changes 2020-04-28 12:07:38 UTC
Red Hat Product Errata RHBA-2020:0581 0 None None None 2020-05-04 11:20:27 UTC

Description Antonio Murdaca 2019-12-12 15:18:26 UTC
+++ This bug was initially created as a clone of Bug #1779348 +++

Description of problem:

As the reserved-cpus feature is being backported to 4.3 (BZ1775826, PR [1]),
it is essential to update the MCO vendor directory to include the new kubelet schemas. This is needed for the MCO to be able to generate a correct kubelet config that includes the `ReservedSystemCPUs` field.


[1] https://github.com/openshift/origin/pull/24224
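
For reference, this is the shape of the field the updated schema has to support, as a minimal illustrative KubeletConfig fragment (the full verified example is in comment 2):

  spec:
    kubeletConfig:
      reservedSystemCPUs: 0,2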

--- Additional comment from Vladik Romanovsky on 2019-12-04 18:06:59 UTC ---

It appears that the MCO is following openshift/kubernetes/origin-4.3-kubernetes-1.16.0.
I've opened a backport [1] of the reserved-cpus patch [2] there as well.
Once this is merged, I'll be able to post a PR to the MCO to update its vendor dir.

[1] https://github.com/openshift/kubernetes/pull/101
[2] https://github.com/kubernetes/kubernetes/pull/83592

--- Additional comment from Vladik Romanovsky on 2019-12-05 15:09:43 UTC ---

After several discussions, I've been told that the correct path is to backport the reserved-cpus PR [1]
to the origin/master branch. This is the only way to update the openshift/kubernetes/origin-4.3-kubernetes-1.16.2 branch.
I've opened a PR to address this [2].

Once [2] is merged, I will need to push a PR to the MCO to retarget the openshift/kubernetes branch from 1.16.0 to 1.16.2.


[1] https://github.com/kubernetes/kubernetes/pull/83592
[2] https://github.com/openshift/origin/pull/24257

--- Additional comment from Sinny Kumari on 2019-12-09 17:19:00 UTC ---

As far as I can see, the 4.3 branch of the Machine Config Operator vendors v1.16.0-beta.0.0.20190913145653 from the github.com/openshift/kubernetes repo (https://github.com/openshift/machine-config-operator/blob/release-4.3/go.mod#L107). How exactly do changes to the openshift/origin master branch flow into the openshift/kubernetes/origin-4.3-kubernetes-1.16.2 branch? Is there any documentation which explains this and can be used as a reference?

Also, we may need to update the MCO vendor directory in the master branch first to make sure the changes are working fine. The MCO master branch vendors kubernetes v1.16.0-beta.0.0.20190913145653 as well.
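
As an illustrative aside, the currently vendored version can be checked from an MCO checkout with a plain grep:

$ grep -n 'openshift/kubernetes' go.mod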

--- Additional comment from Vladik Romanovsky on 2019-12-09 19:07:13 UTC ---

(In reply to Sinny Kumari from comment #3)
> As far as I can see, the 4.3 branch of the Machine Config Operator vendors
> v1.16.0-beta.0.0.20190913145653 from the github.com/openshift/kubernetes repo
> (https://github.com/openshift/machine-config-operator/blob/release-4.3/go.mod#L107).
> How exactly do changes to the openshift/origin master branch flow into the
> openshift/kubernetes/origin-4.3-kubernetes-1.16.2 branch? Is there any
> documentation which explains this and can be used as a reference?
> 
> Also, we may need to update the MCO vendor directory in the master branch
> first to make sure the changes are working fine. The MCO master branch
> vendors kubernetes v1.16.0-beta.0.0.20190913145653 as well.

In BZ1775826 we've backported a reserved-cpus PR to origin/release-4.3 [1].
However, this code wasn't in origin/master and wasn't copied to the openshift/kubernetes branch.
Therefore, I've opened a backport [2] to the origin/master branch, and once it gets merged, [1] will be copied to openshift/kubernetes/origin-4.3-kubernetes-1.16.2.
Once this happens, we will need to make the 4.3 branch of the Machine Config Operator vendor the 1.16.2 branch from github.com/openshift/kubernetes.

I'm not aware of any documentation on the subject. The above steps were taken following a conversation with @rphillips, @sjenning, and @eparis.

Vladik

[1] https://github.com/openshift/origin/pull/24224
[2] https://github.com/openshift/origin/pull/24257
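
The final MCO step would then look roughly like this (a sketch only: the branch name is taken from the comments above, but the exact go.mod wiring, i.e. require vs. replace directives, may differ):

$ go get github.com/openshift/kubernetes@origin-4.3-kubernetes-1.16.2
$ go mod tidy
$ go mod vendor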

--- Additional comment from Sinny Kumari on 2019-12-10 05:03:41 UTC ---

Being relatively new to the OpenShift process, it was a bit confusing to me why we need to update openshift/origin first instead of directly updating the required openshift/kubernetes branch.
Thanks, Vladik, for the explanation.

Comment 2 Michael Nguyen 2019-12-18 00:51:23 UTC
Verified on 4.4.0-0.nightly-2019-12-16-124946 (see the clusterversion output in comment 3).

$ cat kubeletconfig.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: enabled
  kubeletConfig:
    reservedSystemCPUs: 0,2
    cpuManagerPolicy: static
    cpuManagerReconcilePeriod: 5s
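
Note that the machineConfigPoolSelector above only matches MachineConfigPools that carry the custom-kubelet: enabled label, so the target pool has to be labeled first, e.g. (illustrative; the pool name depends on which nodes are being targeted):

$ oc label machineconfigpool master custom-kubelet=enabled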

$ oc apply -f kubeletconfig.yaml
$ oc get nodes
NAME                           STATUS                        ROLES    AGE     VERSION
ip-10-0-130-59.ec2.internal    Ready                         master   3h34m   v1.16.2
ip-10-0-133-136.ec2.internal   Ready                         worker   3h23m   v1.16.2
ip-10-0-154-85.ec2.internal    Ready                         worker   3h25m   v1.16.2
ip-10-0-157-203.ec2.internal   NotReady,SchedulingDisabled   master   3h34m   v1.16.2
ip-10-0-160-63.ec2.internal    Ready                         worker   3h24m   v1.16.2
ip-10-0-163-177.ec2.internal   Ready                         master   3h34m   v1.16.2
$ oc get nodes
NAME                           STATUS   ROLES    AGE     VERSION
ip-10-0-130-59.ec2.internal    Ready    master   3h51m   v1.16.2
ip-10-0-133-136.ec2.internal   Ready    worker   3h41m   v1.16.2
ip-10-0-154-85.ec2.internal    Ready    worker   3h42m   v1.16.2
ip-10-0-157-203.ec2.internal   Ready    master   3h51m   v1.16.2
ip-10-0-160-63.ec2.internal    Ready    worker   3h42m   v1.16.2
ip-10-0-163-177.ec2.internal   Ready    master   3h51m   v1.16.2
[mnguyen@pet30 openshift]$ oc debug node/ip-10-0-130-59.ec2.internal
Starting pod/ip-10-0-130-59ec2internal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
chroot /host
sh-4.4# cat /etc/kubernetes/kubelet.conf 
{"kind":"KubeletConfiguration","apiVersion":"kubelet.config.k8s.io/v1beta1","staticPodPath":"/etc/kubernetes/manifests","syncFrequency":"0s","fileCheckFrequency":"0s","httpCheckFrequency":"0s","rotateCertificates":true,"serverTLSBootstrap":true,"authentication":{"x509":{"clientCAFile":"/etc/kubernetes/kubelet-ca.crt"},"webhook":{"cacheTTL":"0s"},"anonymous":{"enabled":false}},"authorization":{"webhook":{"cacheAuthorizedTTL":"0s","cacheUnauthorizedTTL":"0s"}},"clusterDomain":"cluster.local","clusterDNS":["172.30.0.10"],"streamingConnectionIdleTimeout":"0s","nodeStatusUpdateFrequency":"0s","nodeStatusReportFrequency":"0s","imageMinimumGCAge":"0s","volumeStatsAggPeriod":"0s","cgroupDriver":"systemd","cpuManagerPolicy":"static","cpuManagerReconcilePeriod":"5s","runtimeRequestTimeout":"0s","maxPods":250,"kubeAPIQPS":50,"kubeAPIBurst":100,"serializeImagePulls":false,"evictionPressureTransitionPeriod":"0s","featureGates":{"LegacyNodeRoleBehavior":false,"NodeDisruptionExclusion":true,"RotateKubeletServerCertificate":true,"ServiceNodeExclusion":true,"SupportPodPidsLimit":true},"containerLogMaxSize":"50Mi","systemReserved":{"cpu":"500m","memory":"500Mi"},"reservedSystemCPUs":"0,2"}
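
Since the rendered config is a single JSON line, the relevant field can be pulled out directly (an illustrative check):

sh-4.4# grep -o '"reservedSystemCPUs":"[^"]*"' /etc/kubernetes/kubelet.conf

This should print the "reservedSystemCPUs":"0,2" value shown above.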
sh-4.4# journalctl -t hyperkube | grep reserv
Dec 17 20:49:36 ip-10-0-130-59 hyperkube[2052]: I1217 20:49:36.671184    2052 flags.go:33] FLAG: --kube-reserved=""
Dec 17 20:49:36 ip-10-0-130-59 hyperkube[2052]: I1217 20:49:36.671193    2052 flags.go:33] FLAG: --kube-reserved-cgroup=""
Dec 17 20:49:36 ip-10-0-130-59 hyperkube[2052]: I1217 20:49:36.671553    2052 flags.go:33] FLAG: --qos-reserved=""
Dec 17 20:49:36 ip-10-0-130-59 hyperkube[2052]: I1217 20:49:36.671637    2052 flags.go:33] FLAG: --reserved-cpus=""
Dec 17 20:49:36 ip-10-0-130-59 hyperkube[2052]: I1217 20:49:36.671841    2052 flags.go:33] FLAG: --system-reserved=""
Dec 17 20:49:36 ip-10-0-130-59 hyperkube[2052]: I1217 20:49:36.671850    2052 flags.go:33] FLAG: --system-reserved-cgroup=""
Dec 17 22:49:30 ip-10-0-130-59 hyperkube[2294]: I1217 22:49:30.217739    2294 flags.go:33] FLAG: --kube-reserved=""
Dec 17 22:49:30 ip-10-0-130-59 hyperkube[2294]: I1217 22:49:30.217744    2294 flags.go:33] FLAG: --kube-reserved-cgroup=""
Dec 17 22:49:30 ip-10-0-130-59 hyperkube[2294]: I1217 22:49:30.218049    2294 flags.go:33] FLAG: --qos-reserved=""
Dec 17 22:49:30 ip-10-0-130-59 hyperkube[2294]: I1217 22:49:30.218100    2294 flags.go:33] FLAG: --reserved-cpus=""
Dec 17 22:49:30 ip-10-0-130-59 hyperkube[2294]: I1217 22:49:30.218227    2294 flags.go:33] FLAG: --system-reserved=""
Dec 17 22:49:30 ip-10-0-130-59 hyperkube[2294]: I1217 22:49:30.218235    2294 flags.go:33] FLAG: --system-reserved-cgroup=""
Dec 17 23:17:59 ip-10-0-130-59 hyperkube[2290]: I1217 23:17:59.700468    2290 flags.go:33] FLAG: --kube-reserved=""
Dec 17 23:17:59 ip-10-0-130-59 hyperkube[2290]: I1217 23:17:59.700473    2290 flags.go:33] FLAG: --kube-reserved-cgroup=""
Dec 17 23:17:59 ip-10-0-130-59 hyperkube[2290]: I1217 23:17:59.700690    2290 flags.go:33] FLAG: --qos-reserved=""
Dec 17 23:17:59 ip-10-0-130-59 hyperkube[2290]: I1217 23:17:59.700744    2290 flags.go:33] FLAG: --reserved-cpus=""
Dec 17 23:17:59 ip-10-0-130-59 hyperkube[2290]: I1217 23:17:59.700866    2290 flags.go:33] FLAG: --system-reserved=""
Dec 17 23:17:59 ip-10-0-130-59 hyperkube[2290]: I1217 23:17:59.700871    2290 flags.go:33] FLAG: --system-reserved-cgroup=""
Dec 17 23:17:59 ip-10-0-130-59 hyperkube[2290]: I1217 23:17:59.933467    2290 server.go:679] Option --reserved-cpus is specified, it will overwrite the cpu setting in KubeReserved="map[]", SystemReserved="map[cpu:500m memory:500Mi]".
Dec 18 00:35:33 ip-10-0-130-59 hyperkube[2282]: I1218 00:35:33.342761    2282 flags.go:33] FLAG: --kube-reserved=""
Dec 18 00:35:33 ip-10-0-130-59 hyperkube[2282]: I1218 00:35:33.342766    2282 flags.go:33] FLAG: --kube-reserved-cgroup=""
Dec 18 00:35:33 ip-10-0-130-59 hyperkube[2282]: I1218 00:35:33.342987    2282 flags.go:33] FLAG: --qos-reserved=""
Dec 18 00:35:33 ip-10-0-130-59 hyperkube[2282]: I1218 00:35:33.343036    2282 flags.go:33] FLAG: --reserved-cpus=""
Dec 18 00:35:33 ip-10-0-130-59 hyperkube[2282]: I1218 00:35:33.343174    2282 flags.go:33] FLAG: --system-reserved=""
Dec 18 00:35:33 ip-10-0-130-59 hyperkube[2282]: I1218 00:35:33.343179    2282 flags.go:33] FLAG: --system-reserved-cgroup=""
Dec 18 00:35:33 ip-10-0-130-59 hyperkube[2282]: I1218 00:35:33.659464    2282 server.go:679] Option --reserved-cpus is specified, it will overwrite the cpu setting in KubeReserved="map[]", SystemReserved="map[cpu:500m memory:500Mi]".
Dec 18 00:35:33 ip-10-0-130-59 hyperkube[2282]: I1218 00:35:33.663561    2282 policy_static.go:110] [cpumanager] reserved 2 CPUs ("0,2") not available for exclusive assignment
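
As an additional check (illustrative; the path assumes the kubelet's default CPU manager state file location), the CPU manager's view of the reserved set can be inspected on the node, where the defaultCpuSet should exclude CPUs 0 and 2:

sh-4.4# cat /var/lib/kubelet/cpu_manager_state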

Comment 3 Michael Nguyen 2019-12-18 00:52:04 UTC
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2019-12-16-124946   True        False         3h38m   Cluster version is 4.4.0-0.nightly-2019-12-16-124946

Comment 5 errata-xmlrpc 2020-05-04 11:19:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

