Bug 1779348 - MCO should be updated based on the backport of reserved-cpus feature
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.3.0
Assignee: Sinny Kumari
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On: 1775826 1782893
Blocks: 1771572
Reported: 2019-12-03 19:42 UTC by Vladik Romanovsky
Modified: 2020-02-09 11:25 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1782893 (view as bug list)
Environment:
Last Closed: 2020-01-23 11:17:46 UTC
Target Upstream Version:
Embargoed:




Links:
- GitHub openshift/machine-config-operator pull 1332 (closed): Bug 1779348: [release-4.3] vendor: bring in ReservedSystemCPUs changes (last updated 2020-06-23 20:53:39 UTC)
- GitHub openshift/machine-config-operator pull 1338 (closed): Bug 1779348: vendor: correctly run make go-deps (last updated 2020-06-23 20:53:39 UTC)
- Red Hat Product Errata RHBA-2020:0062 (last updated 2020-01-23 11:18:04 UTC)

Description Vladik Romanovsky 2019-12-03 19:42:34 UTC
Description of problem:

As the reserved-cpus feature is being backported to 4.3 (BZ1775826, PR [1]), it is essential to update the MCO vendor directory to include the new kubelet schemas. This is needed for the MCO to be able to generate a correct kubelet config that includes the `ReservedSystemCPUs` field.


[1] https://github.com/openshift/origin/pull/24224
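
For illustration, a quick way to check whether the vendored schemas already carry the new field is to grep the vendor tree. This is a hypothetical check run from the MCO repo root, not a step taken from the linked PRs:

$ # Hypothetical: look for the new kubelet config field in the vendored code.
$ grep -rn "ReservedSystemCPUs" vendor/ | head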

Comment 1 Vladik Romanovsky 2019-12-04 18:06:59 UTC
It appears that the MCO is following openshift/kubernetes/origin-4.3-kubernetes-1.16.0.
I've opened a backport [1] of the reserved-cpus patch [2] there as well.
Once this is merged, I'll be able to post a PR to the MCO to update its vendor dir.

[1] https://github.com/openshift/kubernetes/pull/101
[2] https://github.com/kubernetes/kubernetes/pull/83592

Comment 2 Vladik Romanovsky 2019-12-05 15:09:43 UTC
After several discussions, I've been told that the correct path is to backport the reserved-cpus PR [1] to the origin/master branch. This is the only way to update the openshift/kubernetes/origin-4.3-kubernetes-1.16.2 branch.
I've opened a PR to address this [2].

Once [2] is merged, I will need to push a PR to the MCO to retarget the openshift/kubernetes branch from 1.16.0 to 1.16.2 (see the sketch after the links below).


[1] https://github.com/kubernetes/kubernetes/pull/83592
[2] https://github.com/openshift/origin/pull/24257
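
A sketch of what that retarget could look like, assuming the usual Go modules workflow. The commands and the pseudo-version resolution are illustrative; the actual change landed in the PRs linked from this bug, and `make go-deps` is the MCO's own re-vendoring target (per PR 1338 above):

$ # Illustrative: resolve the fork branch to a Go pseudo-version.
$ go list -m github.com/openshift/kubernetes@origin-4.3-kubernetes-1.16.2
$ # Update the kubernetes replace lines in go.mod to that pseudo-version,
$ # then re-vendor with the repo's own target:
$ make go-deps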

Comment 3 Sinny Kumari 2019-12-09 17:19:00 UTC
As far as I can see, the 4.3 branch of the Machine Config Operator vendors v1.16.0-beta.0.0.20190913145653 from the github.com/openshift/kubernetes repo (https://github.com/openshift/machine-config-operator/blob/release-4.3/go.mod#L107). How exactly do changes to the openshift/origin master branch flow into the openshift/kubernetes/origin-4.3-kubernetes-1.16.2 branch? Is there any documentation that explains this and can be used as a reference?

Also, we may need to update the MCO vendor directory in the master branch first to make sure the changes work fine. The MCO master branch vendors kubernetes v1.16.0-beta.0.0.20190913145653 as well.

Comment 4 Vladik Romanovsky 2019-12-09 19:07:13 UTC
(In reply to Sinny Kumari from comment #3)
> As far as I can see, the 4.3 branch of the Machine Config Operator vendors
> v1.16.0-beta.0.0.20190913145653 from the github.com/openshift/kubernetes repo
> (https://github.com/openshift/machine-config-operator/blob/release-4.3/go.mod#L107).
> How exactly do changes to the openshift/origin master branch flow into the
> openshift/kubernetes/origin-4.3-kubernetes-1.16.2 branch? Is there any
> documentation that explains this and can be used as a reference?
> 
> Also, we may need to update the MCO vendor directory in the master branch
> first to make sure the changes work fine. The MCO master branch vendors
> kubernetes v1.16.0-beta.0.0.20190913145653 as well.

In BZ1775826 we've backported a reserved-cpus PR to origin/release-4.3 [1].
However, this code wasn't in origin/master and wasn't copied to the openshift/kubernetes branch.
Therefore, I've opened a backport [2] to the origin/master branch; once it is merged, [1] will be copied to openshift/kubernetes/origin-4.3-kubernetes-1.16.2.
Once this happens, we will need to make the 4.3 branch of the Machine Config Operator vendor the 1.16.2 branch from github.com/openshift/kubernetes.

I'm not aware of any documentation on the subject. The above steps were taken following a conversation with @rphillips, @sjenning, and @eparis.

Vladik

[1] https://github.com/openshift/origin/pull/24224
[2] https://github.com/openshift/origin/pull/24257
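
For reference, the fork revision the MCO currently vendors can be read straight out of go.mod (illustrative command, run from the MCO repo root):

$ grep -n "openshift/kubernetes" go.mod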

Comment 5 Sinny Kumari 2019-12-10 05:03:41 UTC
Being relatively new to the OpenShift process, it was a bit confusing to me why we need to update openshift/origin first instead of directly updating the required openshift/kubernetes branch.
Thanks, Vladik, for the explanation.

Comment 11 Michael Nguyen 2019-12-18 19:15:07 UTC
Verified on a nightly build of 4.3.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.nightly-2019-12-18-145749   True        False         19m     Cluster version is 4.3.0-0.nightly-2019-12-18-145749

$ cat ../kubeletconfig.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: cpumanager-enabled
spec:
  machineConfigPoolSelector:
    matchLabels:
      custom-kubelet: enabled
  kubeletConfig:
    reservedSystemCPUs: 0,2
    cpuManagerPolicy: static
    cpuManagerReconcilePeriod: 5s

$ oc label mcp master custom-kubelet=enabled
machineconfigpool.machineconfiguration.openshift.io/master labeled
$ oc get mcp/master --show-labels
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   LABELS
master   rendered-master-38cf6c0a1cec739719f5bf7b7bebc7ad   True      False      False      3              3                   3                     0                      custom-kubelet=enabled,machineconfiguration.openshift.io/mco-built-in=,operator.machineconfiguration.openshift.io/required-for-upgrade=
$ oc apply -f ../kubeletconfig.yaml 
kubeletconfig.machineconfiguration.openshift.io/cpumanager-enabled created
$ oc get kubeletconfig
NAME                 AGE
cpumanager-enabled   5s
$ oc get nodes
NAME                           STATUS                     ROLES    AGE   VERSION
ip-10-0-130-157.ec2.internal   Ready                      worker   28m   v1.16.2
ip-10-0-141-196.ec2.internal   Ready,SchedulingDisabled   master   39m   v1.16.2
ip-10-0-147-46.ec2.internal    Ready                      worker   28m   v1.16.2
ip-10-0-155-125.ec2.internal   Ready                      master   39m   v1.16.2
ip-10-0-162-50.ec2.internal    Ready                      master   38m   v1.16.2
ip-10-0-169-122.ec2.internal   Ready                      worker   28m   v1.16.2
[mnguyen@pet30 4.3]$ oc get nodes
NAME                           STATUS                     ROLES    AGE   VERSION
ip-10-0-130-157.ec2.internal   Ready                      worker   30m   v1.16.2
ip-10-0-141-196.ec2.internal   Ready,SchedulingDisabled   master   41m   v1.16.2
ip-10-0-147-46.ec2.internal    Ready                      worker   30m   v1.16.2
ip-10-0-155-125.ec2.internal   Ready                      master   40m   v1.16.2
ip-10-0-162-50.ec2.internal    Ready                      master   40m   v1.16.2
ip-10-0-169-122.ec2.internal   Ready                      worker   30m   v1.16.2
[mnguyen@pet30 4.3]$ oc get nodes
NAME                           STATUS                     ROLES    AGE   VERSION
ip-10-0-130-157.ec2.internal   Ready                      worker   33m   v1.16.2
ip-10-0-141-196.ec2.internal   Ready                      master   45m   v1.16.2
ip-10-0-147-46.ec2.internal    Ready                      worker   33m   v1.16.2
ip-10-0-155-125.ec2.internal   Ready,SchedulingDisabled   master   44m   v1.16.2
ip-10-0-162-50.ec2.internal    Ready                      master   44m   v1.16.2
ip-10-0-169-122.ec2.internal   Ready                      worker   33m   v1.16.2
$ oc debug node/ip-10-0-141-196.ec2.internal
Starting pod/ip-10-0-141-196ec2internal-debug ...
To use host binaries, run `chroot /host`
If you don't see a command prompt, try pressing enter.
chroot /host
sh-4.4# cat /etc/kubernetes/kubelet.conf 
{"kind":"KubeletConfiguration","apiVersion":"kubelet.config.k8s.io/v1beta1","staticPodPath":"/etc/kubernetes/manifests","syncFrequency":"0s","fileCheckFrequency":"0s","httpCheckFrequency":"0s","rotateCertificates":true,"serverTLSBootstrap":true,"authentication":{"x509":{"clientCAFile":"/etc/kubernetes/kubelet-ca.crt"},"webhook":{"cacheTTL":"0s"},"anonymous":{"enabled":false}},"authorization":{"webhook":{"cacheAuthorizedTTL":"0s","cacheUnauthorizedTTL":"0s"}},"clusterDomain":"cluster.local","clusterDNS":["172.30.0.10"],"streamingConnectionIdleTimeout":"0s","nodeStatusUpdateFrequency":"0s","nodeStatusReportFrequency":"0s","imageMinimumGCAge":"0s","volumeStatsAggPeriod":"0s","cgroupDriver":"systemd","cpuManagerPolicy":"static","cpuManagerReconcilePeriod":"5s","runtimeRequestTimeout":"0s","maxPods":250,"kubeAPIQPS":50,"kubeAPIBurst":100,"serializeImagePulls":false,"evictionPressureTransitionPeriod":"0s","featureGates":{"LegacyNodeRoleBehavior":false,"NodeDisruptionExclusion":true,"RotateKubeletServerCertificate":true,"ServiceNodeExclusion":true,"SupportPodPidsLimit":true},"containerLogMaxSize":"50Mi","systemReserved":{"cpu":"500m","memory":"500Mi"},"reservedSystemCPUs":"0,2"}
sh-4.4# journalctl -t hyperkube | grep reserv
Dec 18 18:24:14 ip-10-0-141-196 hyperkube[2047]: I1218 18:24:14.381830    2047 flags.go:33] FLAG: --kube-reserved=""
Dec 18 18:24:14 ip-10-0-141-196 hyperkube[2047]: I1218 18:24:14.381839    2047 flags.go:33] FLAG: --kube-reserved-cgroup=""
Dec 18 18:24:14 ip-10-0-141-196 hyperkube[2047]: I1218 18:24:14.382209    2047 flags.go:33] FLAG: --qos-reserved=""
Dec 18 18:24:14 ip-10-0-141-196 hyperkube[2047]: I1218 18:24:14.382300    2047 flags.go:33] FLAG: --reserved-cpus=""
Dec 18 18:24:14 ip-10-0-141-196 hyperkube[2047]: I1218 18:24:14.382530    2047 flags.go:33] FLAG: --system-reserved=""
Dec 18 18:24:14 ip-10-0-141-196 hyperkube[2047]: I1218 18:24:14.382538    2047 flags.go:33] FLAG: --system-reserved-cgroup=""
Dec 18 19:08:19 ip-10-0-141-196 hyperkube[2302]: I1218 19:08:19.147085    2302 flags.go:33] FLAG: --kube-reserved=""
Dec 18 19:08:19 ip-10-0-141-196 hyperkube[2302]: I1218 19:08:19.147091    2302 flags.go:33] FLAG: --kube-reserved-cgroup=""
Dec 18 19:08:19 ip-10-0-141-196 hyperkube[2302]: I1218 19:08:19.147311    2302 flags.go:33] FLAG: --qos-reserved=""
Dec 18 19:08:19 ip-10-0-141-196 hyperkube[2302]: I1218 19:08:19.147363    2302 flags.go:33] FLAG: --reserved-cpus=""
Dec 18 19:08:19 ip-10-0-141-196 hyperkube[2302]: I1218 19:08:19.147494    2302 flags.go:33] FLAG: --system-reserved=""
Dec 18 19:08:19 ip-10-0-141-196 hyperkube[2302]: I1218 19:08:19.147500    2302 flags.go:33] FLAG: --system-reserved-cgroup=""
Dec 18 19:08:19 ip-10-0-141-196 hyperkube[2302]: I1218 19:08:19.452529    2302 server.go:679] Option --reserved-cpus is specified, it will overwrite the cpu setting in KubeReserved="map[]", SystemReserved="map[cpu:500m memory:500Mi]".
Dec 18 19:08:19 ip-10-0-141-196 hyperkube[2302]: I1218 19:08:19.462598    2302 policy_static.go:110] [cpumanager] reserved 2 CPUs ("0,2") not available for exclusive assignment
sh-4.4# exit
exit
sh-4.2# exit
exit

Removing debug pod ...
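
As an aside, the same check can be done non-interactively; a hypothetical one-liner against the node from the transcript above:

$ oc debug node/ip-10-0-141-196.ec2.internal -- chroot /host grep reservedSystemCPUs /etc/kubernetes/kubelet.conf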

Comment 13 errata-xmlrpc 2020-01-23 11:17:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0062

