Bug 1730217

Summary: After enabling ephemeral storage and defining a quota that requests ephemeral storage, pods are evicted
Product: OpenShift Container Platform Reporter: Joel Rosental R. <jrosenta>
Component: Node Assignee: Robert Krawitz <rkrawitz>
Status: CLOSED WONTFIX QA Contact: Jianwei Hou <jhou>
Severity: high Docs Contact:
Priority: unspecified    
Version: 3.11.0 CC: aos-bugs, gblomqui, jokerman, mmccomas, nagrawal, rkrawitz, rphillips
Target Milestone: ---   
Target Release: 3.11.z   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-05-15 19:31:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Joel Rosental R. 2019-07-16 08:16:15 UTC
Description of problem:
After enabling ephemeral-storage on nodes by setting "LocalStorageCapacityIsolation=true" in the kubelet arguments and adding a ResourceQuota that requests ephemeral-storage, all pods that request ephemeral-storage are evicted with "OutOfephemeral-storage", even though `oc describe node` reports allocatable capacity available on the node.

status:
    message: 'Pod Node didn''t have enough resource: ephemeral-storage, requested:
      1073741824, used: 0, capacity: 0'
    phase: Failed
    reason: OutOfephemeral-storage
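
For reference, the figures in the status message can be pulled out programmatically. The following is a minimal Python sketch, assuming the message always follows the shape shown above; the regular expression and the helper name `parse_admission_failure` are illustrative and not part of any OpenShift tooling.

```python
import re

# Message text copied from the pod status above (line wrapping removed).
MESSAGE = (
    "Pod Node didn't have enough resource: ephemeral-storage, "
    "requested: 1073741824, used: 0, capacity: 0"
)

# Assumed message layout; adjust if the kubelet wording differs.
PATTERN = re.compile(
    r"enough resource: (?P<resource>[\w.-]+), "
    r"requested: (?P<requested>\d+), "
    r"used: (?P<used>\d+), "
    r"capacity: (?P<capacity>\d+)"
)


def parse_admission_failure(message):
    """Return the resource name and the three integer figures from the message."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("unrecognised message format")
    fields = match.groupdict()
    return {
        "resource": fields["resource"],
        "requested": int(fields["requested"]),
        "used": int(fields["used"]),
        "capacity": int(fields["capacity"]),
    }


parsed = parse_admission_failure(MESSAGE)
print("resource:", parsed["resource"])
print("requested (GiB):", parsed["requested"] / 2**30)
print("capacity reported in the message:", parsed["capacity"])
```

The request is exactly 1 GiB, while the capacity reported in the message is 0, which is the figure that conflicts with the node output.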


# oc describe node 
[...]
Capacity:
 cpu:                8
 ephemeral-storage:  535560704Ki
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             57532536Ki
 pods:               160
Allocatable:
 cpu:                8
 ephemeral-storage:  535560704Ki
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             52289656Ki
 pods:               160

[...]
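
To make the mismatch concrete, the allocatable value above can be converted to bytes and compared with the 1Gi request. This is a rough Python sketch that handles only binary suffixes (Ki, Mi, Gi, Ti); the helper `quantity_to_bytes` is hypothetical and does not cover the full Kubernetes quantity syntax.

```python
# Binary suffixes only; the full Kubernetes quantity syntax also allows
# decimal suffixes and exponents, which this sketch does not handle.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def quantity_to_bytes(quantity):
    """Convert a string such as '535560704Ki' or '1073741824' to bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    # No recognised suffix: treat the value as a plain byte count.
    return int(quantity)


allocatable = quantity_to_bytes("535560704Ki")  # from `oc describe node`
requested = quantity_to_bytes("1Gi")            # ephemeral-storage request per pod

print(f"allocatable: {allocatable} bytes")
print(f"requested:   {requested} bytes")
print("request fits in reported allocatable:", requested <= allocatable)
```

By this arithmetic the request is far below the allocatable value that the node reports, so the "capacity: 0" in the failure message does not match the node status.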

Version-Release number of selected component (if applicable):
OCP 3.11.88

How reproducible:
N/A

Steps to Reproduce:
1. Configure the feature gate on the master servers as described in https://docs.openshift.com/container-platform/3.10/install_config/configuring_ephemeral.html#ephemeral-storage-enabling-ephemeral-storage
2. Configure a ResourceQuota that accounts for ephemeral-storage (among other resources), e.g.:

apiVersion: v1
items:
- apiVersion: v1
  kind: ResourceQuota
  metadata:
    creationTimestamp: 2019-04-24T09:14:31Z
    name: all-resources
    namespace: test-access
    resourceVersion: "29344803"
    selfLink: /api/v1/namespaces/test-access/resourcequotas/all-resources
    uid: 5e4219bf-6671-11e9-a2db-000d3a265647
  spec:
    hard:
      ceph-infra-dynamic.storageclass.storage.k8s.io/persistentvolumeclaims: "0"
      ceph-infra-dynamic.storageclass.storage.k8s.io/requests.storage: "0"
      configmaps: "100"
      count/routes.route.openshift.io: "50"
      limits.cpu: "4"
      limits.ephemeral-storage: 16Gi
      limits.memory: 4Gi
      openshift.io/imagestreams: "50"
      persistentvolumeclaims: "10"
      pods: "50"
      replicationcontrollers: "50"
      requests.cpu: "2"
      requests.ephemeral-storage: 16Gi
      requests.memory: 2Gi
      requests.storage: 50Gi
      secrets: "100"
      services: "50"
      services.loadbalancers: "5"
      services.nodeports: "5"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

3. Launch some pods that request ephemeral-storage among other resources, e.g.:

resources:
  limits:
    cpu: "4"
    ephemeral-storage: 1Gi
    memory: 8000Mi
  requests:
    cpu: "1"
    ephemeral-storage: 1Gi
    memory: 2000Mi


4. Restart the node service: systemctl restart atomic-openshift-node.service

Actual results:

Pods that request ephemeral-storage under the quota are evicted with: OutOfephemeral-storage

Expected results:

Pods should not be evicted with OutOfephemeral-storage when allocatable space is available, as shown by `oc describe node`.

Additional info:

Comment 34 Red Hat Bugzilla 2023-09-18 00:16:44 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days