Bug 1886294 - Unable to schedule a pod due to Insufficient ephemeral-storage
Summary: Unable to schedule a pod due to Insufficient ephemeral-storage
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-scheduler
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Jan Chaloupka
QA Contact: RamaKasturi
URL:
Whiteboard:
Depends On:
Blocks: 1913263
Reported: 2020-10-08 06:38 UTC by Jonas Nordell
Modified: 2023-12-15 19:42 UTC (History)
5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: the ephemeral-storage resource computation was not feature gated. Consequence: ephemeral-storage was taken into account during scheduling even when the feature was disabled, causing pods to fail to be scheduled. Fix: the ephemeral-storage resource computation is now feature gated during scheduling. Result: ephemeral-storage is no longer taken into account when the feature is disabled.
Clone Of:
: 1913263 (view as bug list)
Environment:
Last Closed: 2021-02-24 15:23:52 UTC
Target Upstream Version:
Embargoed:
jnordell: needinfo-
joshisa: needinfo-
knarra: needinfo-




Links
System ID Private Priority Status Summary Last Updated
Github kubernetes kubernetes pull 96092 0 None closed Honor disabled LocalStorageCapacityIsolation in scheduling 2021-02-12 19:27:04 UTC
Github openshift kubernetes pull 471 0 None closed Bug 1907373: Rebase to kube v1.20.0 2021-02-12 19:27:06 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:24:25 UTC

Description Jonas Nordell 2020-10-08 06:38:07 UTC
Description of problem:

In a three-node OpenShift Container Platform 4.5.6_1505 cluster, the following pod fails to be scheduled due to insufficient ephemeral-storage:

---- Begin YAML Snippet ----
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    name: nginx
spec:
  schedulerName: default-scheduler
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
    resources:
      requests:
        ephemeral-storage: 4096M
      limits:
        ephemeral-storage: 4096M
  initContainers:
  - name: init-myservice
    image: busybox:1.28
    command: ['sh', '-c', "echo waiting for myservice; sleep 7;"]
    resources:
      requests:
        cpu: 500m
        ephemeral-storage: 2M
        memory: 1024M
---- End YAML Snippet ----

Error:
#######

Events:
  Type     Reason            Age        From               Message
  ----     ------            ----       ----               -------
  Warning  FailedScheduling  <unknown>  default-scheduler  0/3 nodes are available: 3 Insufficient ephemeral-storage.
  Warning  FailedScheduling  <unknown>  default-scheduler  0/3 nodes are available: 3 Insufficient ephemeral-storage.
-------
#######

From describe node

#######
Allocatable:
 cpu:                3910m
 ephemeral-storage:  100275095474
#######

This worked in OCP 4.3 and OCP 4.4.

It also works if the ephemeral-storage references are removed from the "initContainers" definition while keeping the ephemeral-storage reference on the regular container.
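The init-container dependency is consistent with how the scheduler computes a pod's effective request for each resource: the maximum of the largest init-container request and the sum of the regular containers' requests (init containers run sequentially). A minimal sketch in Python (hypothetical helper, quantities in bytes):

```python
def effective_request(init_requests, container_requests):
    # Scheduler-style effective request for one resource: the max of the
    # largest init-container request and the sum of the regular
    # containers' requests (init containers run one at a time).
    return max(max(init_requests, default=0), sum(container_requests))

# ephemeral-storage values from the pod above: 2M init, 4096M main container
print(effective_request([2_000_000], [4_096_000_000]))
```

With these values the init container does not raise the effective request, yet its mere presence is what triggers the failure, which is why the ephemeral-storage accounting path for init containers is suspect.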


Version-Release number of selected component (if applicable):

OCP 4.5.6_1505

How reproducible:

In customer environment, every time. 

Steps to Reproduce:
1. Create a pod with the above described YAML (oc create -f <file>)

Actual results:
The pod fails to be scheduled with "Insufficient ephemeral-storage".

Expected results:
The pod should be scheduled, since the nodes have ephemeral-storage available.

Additional info:

Comment 3 Jan Chaloupka 2020-10-08 12:29:04 UTC
I am not able to reproduce it. Tested on vanilla clusters of the following versions:
- 4.6.0-0.ci-2020-10-06-161728
- 4.5.14

```
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests          Limits
  --------                    --------          ------
  cpu                         844m (24%)        0 (0%)
  memory                      2313748480 (14%)  512Mi (3%)
  ephemeral-storage           4096M (3%)        4096M (3%)
  hugepages-1Gi               0 (0%)            0 (0%)
  hugepages-2Mi               0 (0%)            0 (0%)
  attachable-volumes-aws-ebs  0                 0
```

Can you reproduce it in your environment and share the cluster?

Comment 9 John McMeeking 2020-11-01 18:08:25 UTC
@jnordell  Can this be reopened?

To recreate this bug you need to have the LocalStorageCapacityIsolation feature-gate disabled and initContainers with ephemeral-storage resource requests or limits.

This problem was found on IBM Cloud Red Hat OpenShift, which apparently is out of sync with the Red Hat offering: the customer was using a feature which they thought was enabled but in fact was not. Even though the limits obviously won't be enforced, this presents a migration problem, since such pods are not schedulable in 4.5.

We have duplicated this behavior in Kubernetes 1.18 through 1.20. Kubernetes 1.17 does not have this behavior. Also opened https://github.com/kubernetes/kubernetes/issues/96083.

IBM Cloud Red Hat OpenShift Service Development
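The upstream fix amounts to gating the ephemeral-storage term out of the scheduler's node-fit check. A rough Python sketch of the idea (not the actual Go code; names and the simplified dict model are hypothetical):

```python
def insufficient_resources(requests, allocatable, isolation_enabled):
    # Return the resource names that do not fit on the node. With the
    # LocalStorageCapacityIsolation gate disabled, ephemeral-storage is
    # skipped entirely, mirroring kubernetes/kubernetes pull 96092.
    failing = []
    for name, requested in requests.items():
        if name == "ephemeral-storage" and not isolation_enabled:
            continue  # gate off: do not account for local storage at all
        if requested > allocatable.get(name, 0):
            failing.append(name)
    return failing

pod = {"cpu": 0.5, "ephemeral-storage": 4_096_000_000}
node = {"cpu": 3.91}  # node reports no allocatable ephemeral-storage

print(insufficient_resources(pod, node, isolation_enabled=True))
print(insufficient_resources(pod, node, isolation_enabled=False))
```

Pre-fix, the ephemeral-storage comparison ran unconditionally (the first call's result); post-fix, a disabled gate makes the request a no-op for scheduling (the second call).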

Comment 12 Sanjay Joshi 2020-11-19 16:43:07 UTC
@jnordell Following up on @jmcmeek.com's request: can we get this bugzilla reopened now that we know the exact recipe to reproduce it? A fix has already been merged into upstream K8s and backported to 1.18 and 1.19 with a PR tag of priority/critical-urgent. It will make its way into the K8s upstream 1.18.11 and 1.19.4 patch releases per this issue (https://github.com/p7t/actus/issues/265). I would like to get this reproduced and to understand how the fix can flow into the OpenShift 4.5 and 4.6 release streams once you are able to recreate the issue.

Comment 13 Jan Chaloupka 2020-11-20 10:45:58 UTC
The fix gets picked into 4.7 with the next rebase onto 1.20. Once that is done, this issue can be cloned for 4.6 and 4.5.

Comment 15 Jan Chaloupka 2021-01-05 12:15:36 UTC
Hi Jonas,

I see the customer case is closed. How severe is the issue for the customer? Would it be sufficient to backport the fix to 4.6, or is 4.5 still relevant?

Comment 16 Jonas Nordell 2021-01-05 12:47:36 UTC
I am not sure.

Maybe @jmcmeek.com or @joshisa.com could answer this?

Comment 18 Sanjay Joshi 2021-01-05 15:09:52 UTC
@jchaloup @jnordell: Since this fix enables our clients to use IBM Cloud Managed OpenShift, backports for both 4.5 and 4.6 are needed so that IBM Cloud can eventually adopt them.

Rationale: 4.5 will (tentatively) be supported on IBM Cloud until Aug 2021, and 4.6 has extended update support. This leaves a large exposure window for clients interested in IBM Cloud. Without backports, support on IBM Cloud has an awkward gap between 4.4 (which goes out of support in 1H 2021) and 4.7, which will impact solutioning and the client experience on IBM Cloud.

Thanks.

Comment 19 RamaKasturi 2021-01-06 13:06:14 UTC
Hi Jonas,

   I am trying to reproduce the bug on a 4.5 cluster but am unable to do so. Below are the steps I performed; do I need to perform any additional steps to reproduce the issue?

steps performed:
=================
1) Install 4.5 nightly build
2) oc create -f /tmp/ephermal.yaml

[knarra@knarra openshift-client-linux-4.5.0-0.nightly-2021-01-05-234719]$ ./oc describe node ip-10-0-217-111.us-east-2.compute.internal | grep "ephemeral-storage"
  ephemeral-storage:           125277164Ki
  ephemeral-storage:           114381692328
  ephemeral-storage           4096M (3%)        4096M (3%)

yaml definition:
==========================
[knarra@knarra openshift-client-linux-4.5.0-0.nightly-2021-01-05-234719]$ cat /tmp/ephermal.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    name: nginx
spec:
  schedulerName: default-scheduler
  containers:
  - name: nginx
    image: quay.io/openshifttest/nginx@sha256:3936fb3946790d711a68c58be93628e43cbca72439079e16d154b5db216b58da
    ports:
    - containerPort: 80
    resources:
      requests:
        ephemeral-storage: 4096M
      limits:
        ephemeral-storage: 4096M
  initContainers:
  - name: init-myservice
    image: quay.io/openshifttest/busybox@sha256:afe605d272837ce1732f390966166c2afff5391208ddd57de10942748694049d
    command: ['sh', '-c', "echo waiting for myservice; sleep 7;"]
    resources:
      requests:
        cpu: 500m
        ephemeral-storage: 2M
        memory: 1024M

Comment 20 Sanjay Joshi 2021-01-06 14:05:38 UTC
@knarra: You need to also make sure that you have the LocalStorageCapacityIsolation feature-gate disabled on the cluster.  See Comment9:  https://bugzilla.redhat.com/show_bug.cgi?id=1886294#c9 .  If that is disabled, then success can be validated by the successful scheduling and running of the ephermal.yaml pod that you have created.  Hope this helps.

Comment 21 RamaKasturi 2021-01-06 17:28:06 UTC
(In reply to Sanjay Joshi from comment #20)
> @knarra: You need to also make sure that you have the
> LocalStorageCapacityIsolation feature-gate disabled on the cluster.  See
> Comment9:  https://bugzilla.redhat.com/show_bug.cgi?id=1886294#c9 .  If that
> is disabled, then success can be validated by the successful scheduling and
> running of the ephermal.yaml pod that you have created.  Hope this helps.

Thanks for the quick reply, i will try to enable this feature-gate and try again, will clear needinfo on jonas as i have got the required input.

Comment 22 Sanjay Joshi 2021-01-06 18:01:49 UTC
(In reply to RamaKasturi from comment #21)
> (In reply to Sanjay Joshi from comment #20)
> > @knarra: You need to also make sure that you have the
> > LocalStorageCapacityIsolation feature-gate disabled on the cluster.  See
> > Comment9:  https://bugzilla.redhat.com/show_bug.cgi?id=1886294#c9 .  If that
> > is disabled, then success can be validated by the successful scheduling and
> > running of the ephermal.yaml pod that you have created.  Hope this helps.
> 
> Thanks for the quick reply, i will try to enable this feature-gate and try
> again, will clear needinfo on jonas as i have got the required input.

:-). Cool.  Just to confirm - the feature-gate needs to be "disabled" on the cluster.   When the feature-gate is enabled (which I think is the default on OpenShift clusters), things work fine and the K8s bug does not surface (e.g.  the example pod schedules and runs fine).  IBM Cloud has chosen to disable this feature-gate for long term update/maintenance considerations.

Comment 23 RamaKasturi 2021-01-07 10:57:02 UTC
Verified the bug with the payload below and no longer see the issue. Below are the steps followed to verify the bug.

Test steps followed to verify the bug:
========================================
1) Log in to one master node, edit /etc/kubernetes/manifests/kube-scheduler-pod.yaml, add the feature gate LocalStorageCapacityIsolation=false to the --feature-gates=<line> args, and wait for the kube-scheduler pod on that node to restart.
2) Repeat the step on every other master.
3) Create the pod by running oc create -f /tmp/ephermal.yaml

[knarra@knarra openshift-client-linux-4.5.0-0.nightly-2021-01-05-234719]$ cat /tmp/ephermal.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    name: nginx
spec:
  schedulerName: default-scheduler
  containers:
  - name: nginx
    image: quay.io/openshifttest/nginx@sha256:3936fb3946790d711a68c58be93628e43cbca72439079e16d154b5db216b58da
    ports:
    - containerPort: 80
    resources:
      requests:
        ephemeral-storage: 4096M
      limits:
        ephemeral-storage: 4096M
  initContainers:
  - name: init-myservice
    image: quay.io/openshifttest/busybox@sha256:afe605d272837ce1732f390966166c2afff5391208ddd57de10942748694049d
    command: ['sh', '-c', "echo waiting for myservice; sleep 7;"]
    resources:
      requests:
        cpu: 500m
        ephemeral-storage: 2M
        memory: 1024M
4) We can see that the pod is scheduled and running fine.

[knarra@knarra openshift-client-linux-4.7.0-0.nightly-2021-01-07-034013]$ ./oc get pods -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP            NODE                                        NOMINATED NODE   READINESS GATES
nginx   1/1     Running   0          12s   10.129.2.61   ip-10-0-142-44.us-east-2.compute.internal   <none>           <none>
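The feature-gate edit in step 1 above can be expressed as a small string transformation (sketch only; `add_feature_gate` is a hypothetical helper, and the real change is made by hand in the static pod manifest on every master):

```python
def add_feature_gate(arg, gate="LocalStorageCapacityIsolation=false"):
    # Append a gate to an existing --feature-gates argument, or add the
    # flag when it is missing entirely (hypothetical helper).
    prefix = "--feature-gates="
    if arg.startswith(prefix):
        existing = arg[len(prefix):]
        return prefix + (existing + "," + gate if existing else gate)
    return arg + " " + prefix + gate

print(add_feature_gate("--feature-gates=SomeGate=true"))
```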

I tried steps 1 to 4 on a 4.5 cluster and could reproduce the issue: the pod is stuck in the Pending state.

[knarra@knarra openshift-client-linux-4.5.0-0.nightly-2021-01-05-234719]$ ./oc describe pod nginx
Name:         nginx
Namespace:    default
Priority:     0
Node:         <none>
Labels:       name=nginx
Annotations:  <none>
Status:       Pending
IP:           
IPs:          <none>
Init Containers:
  init-myservice:
    Image:      quay.io/openshifttest/busybox@sha256:afe605d272837ce1732f390966166c2afff5391208ddd57de10942748694049d
    Port:       <none>
    Host Port:  <none>
    Command:
      sh
      -c
      echo waiting for myservice; sleep 7;
    Requests:
      cpu:                500m
      ephemeral-storage:  2M
      memory:             1024M
    Environment:          <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-lp7r9 (ro)
Containers:
  nginx:
    Image:      quay.io/openshifttest/nginx@sha256:3936fb3946790d711a68c58be93628e43cbca72439079e16d154b5db216b58da
    Port:       80/TCP
    Host Port:  0/TCP
    Limits:
      ephemeral-storage:  4096M
    Requests:
      ephemeral-storage:  4096M
    Environment:          <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-lp7r9 (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  default-token-lp7r9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-lp7r9
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  4h16m  default-scheduler  0/6 nodes are available: 6 Insufficient ephemeral-storage.

[knarra@knarra openshift-client-linux-4.5.0-0.nightly-2021-01-05-234719]$ ./oc get pods -o wide
NAME    READY   STATUS    RESTARTS   AGE     IP       NODE     NOMINATED NODE   READINESS GATES
nginx   0/1     Pending   0          4h17m   <none>   <none>   <none>           <none>

Based on the above, moving the bug to the VERIFIED state.

Comment 24 Jan Chaloupka 2021-01-19 10:42:00 UTC
Jonas, the fix is not going to be backported to 4.5. The 4.5 release is now in the maintenance phase, and only urgent/high-severity bugs and CVEs are fixed.

Comment 27 errata-xmlrpc 2021-02-24 15:23:52 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

