Bug 1787641

Summary: 4.2 kubelets do not report ephemeral-storage capacity
Product: OpenShift Container Platform
Reporter: Neelesh Agrawal <nagrawal>
Component: Node
Assignee: Ryan Phillips <rphillips>
Status: CLOSED NOTABUG
QA Contact: Sunil Choudhary <schoudha>
Severity: unspecified
Priority: unspecified
Version: 4.2.z
CC: aos-bugs, ccoleman, jack.ottofaro, jokerman, lmohanty, rkrawitz, rphillips, schoudha, wking
Target Milestone: ---   
Target Release: 4.2.z   
Hardware: Unspecified   
OS: Unspecified   
Clone Of: 1787427
Last Closed: 2020-01-06 20:30:00 UTC
Bug Depends On: 1787427

Description Neelesh Agrawal 2020-01-03 19:19:46 UTC
+++ This bug was initially created as a clone of Bug #1787427 +++

Ephemeral storage reporting has been in beta since Kubernetes 1.10 [1], but for some reason 4.2 kubelets do not report it. For an example of a 4.2 cluster without ephemeral-storage capacity reporting, see this 4.2.10 -> 4.2.12 update test [2]:

  $ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/12620/artifacts/e2e-aws-upgrade/must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-a0dbe73b7831a8ddb9a2c58a560461d7c2c23a92231289a2104b93e7723c0eff/cluster-scoped-resources/core/nodes/ip-10-0-129-58.ec2.internal.yaml | yaml2json | jq .status.capacity | json2yaml
  attachable-volumes-aws-ebs: '39'
  cpu: '4'
  hugepages-1Gi: '0'
  hugepages-2Mi: '0'
  memory: 16419384Ki
  pods: '250'

Capacity reporting works in 4.3; see, for example, this 4.2.12 -> 4.3.0-0.nightly-2020-01-02-141332 update test [3]:

  $ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/13437/artifacts/e2e-aws-upgrade/must-gather/quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-c6c63e67c3d38a704c8695a40bb64b9975df2bda3f00c9379592cd5596126f2d/cluster-scoped-resources/core/nodes/ip-10-0-130-241.ec2.internal.yaml | yaml2json | jq .status.capacity | json2yaml
  attachable-volumes-aws-ebs: '39'
  cpu: '4'
  ephemeral-storage: 124768236Ki
  hugepages-1Gi: '0'
  hugepages-2Mi: '0'
  memory: 16419384Ki
  pods: '250'
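
The difference between the two capacity listings above can be checked mechanically. The following is a minimal sketch, assuming the `status.capacity` maps have already been extracted (for example with the `jq` pipeline shown above). The function name `missing_capacity_keys` is hypothetical; the values are copied from the two outputs above.

```python
def missing_capacity_keys(reference, candidate):
    """Return the capacity keys present in `reference` but absent from `candidate`, sorted."""
    return sorted(set(reference) - set(candidate))


# status.capacity of the 4.2 node ip-10-0-129-58.ec2.internal (from the first listing)
capacity_4_2 = {
    "attachable-volumes-aws-ebs": "39",
    "cpu": "4",
    "hugepages-1Gi": "0",
    "hugepages-2Mi": "0",
    "memory": "16419384Ki",
    "pods": "250",
}

# status.capacity of the 4.3 node ip-10-0-130-241.ec2.internal (from the second listing)
capacity_4_3 = {
    "attachable-volumes-aws-ebs": "39",
    "cpu": "4",
    "ephemeral-storage": "124768236Ki",
    "hugepages-1Gi": "0",
    "hugepages-2Mi": "0",
    "memory": "16419384Ki",
    "pods": "250",
}

# Keys the 4.3 node reports that the 4.2 node does not
print(missing_capacity_keys(capacity_4_3, capacity_4_2))  # ['ephemeral-storage']
```

For these two nodes the only resource missing from the 4.2 capacity is `ephemeral-storage`.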

In bug 1786315, we are taking the narrow fix of removing our reliance on the ephemeral-storage capacity reporting.  But there may be other consumers outside of the cluster-version operator who would like to have ephemeral-storage reporting in 4.2.z.

[1]: https://github.com/kubernetes/enhancements/issues/361
[2]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/12620
[3]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/13437

--- Additional comment from Clayton Coleman on 2020-01-02 15:29:52 EST ---

I suspect we didn’t enable the feature, and it wasn’t default on.  We should confirm the exact sequence of changes.

Comment 1 Robert Krawitz 2020-01-06 20:22:11 UTC
This is behaving as coded -- we explicitly turned the feature off and didn't enable it until 4.3.  From the node logs:

Jan 06 19:58:41 ip-10-0-137-223 hyperkube[7595]: I0106 19:58:41.216152    7595 feature_gate.go:226] feature gates: &{map[ExperimentalCriticalPodAnnotation:true LocalStorageCapacityIsolation:false RotateKubeletServerCertificate:true SupportPodPidsLimit:true]}
Jan 06 19:58:41 ip-10-0-137-223 hyperkube[7595]: I0106 19:58:41.216199    7595 feature_gate.go:226] feature gates: &{map[ExperimentalCriticalPodAnnotation:true LocalStorageCapacityIsolation:false RotateKubeletServerCertificate:true SupportPodPidsLimit:true]}

See https://github.com/openshift/origin/blob/release-4.2/vendor/github.com/openshift/api/config/v1/types_feature.go#L105

There is similar code in the MCO.
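
The feature gate state can be pulled out of a kubelet log line like the ones quoted above without reading it by eye. The following is a rough sketch, assuming the gates are always logged in the `feature gates: &{map[Name:bool ...]}` form seen in those lines. The helper name `parse_feature_gates` is hypothetical and not part of any kubelet tooling.

```python
import re


def parse_feature_gates(log_line):
    """Extract the Go-formatted feature gate map from a kubelet log line as a dict of bools.

    Returns an empty dict if the line contains no feature gate map.
    """
    match = re.search(r"feature gates: &\{map\[(.*?)\]\}", log_line)
    if match is None:
        return {}
    gates = {}
    # Entries are space-separated "Name:true" / "Name:false" pairs
    for pair in match.group(1).split():
        name, _, value = pair.partition(":")
        gates[name] = value == "true"
    return gates


# First log line quoted in this comment
line = (
    "Jan 06 19:58:41 ip-10-0-137-223 hyperkube[7595]: I0106 19:58:41.216152    "
    "7595 feature_gate.go:226] feature gates: &{map[ExperimentalCriticalPodAnnotation:true "
    "LocalStorageCapacityIsolation:false RotateKubeletServerCertificate:true "
    "SupportPodPidsLimit:true]}"
)

gates = parse_feature_gates(line)
print(gates["LocalStorageCapacityIsolation"])  # False
```

On the 4.2 node log above this reports `LocalStorageCapacityIsolation` as disabled, matching the explanation that the gate was explicitly turned off.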

Comment 2 Ryan Phillips 2020-01-06 20:30:00 UTC
Going to close this as NOTABUG since we explicitly turned the feature off, and the feature is available in later releases.