Bug 1608625

Summary: Number of NVMe disks attachable is lower than max predefined count for EBS
Product: OpenShift Container Platform
Reporter: Hemant Kumar <hekumar>
Component: Storage
Assignee: Hemant Kumar <hekumar>
Status: CLOSED ERRATA
QA Contact: Chao Yang <chaoyang>
Severity: high
Priority: unspecified
Version: 3.10.0
CC: aos-bugs, aos-storage-staff, apaladug, bbennett, bchilds, chaoyang, hekumar
Target Milestone: ---
Keywords: NeedsTestCase
Target Release: 3.10.z
Hardware: Unspecified
OS: Unspecified
Type: Bug
Clone Of: 1602054
Last Closed: 2018-09-22 04:55:14 UTC
Bug Depends On: 1602054, 1608626

Description Hemant Kumar 2018-07-26 01:15:28 UTC
+++ This bug was initially created as a clone of Bug #1602054 +++

Description of problem:

Version-Release number of selected component (if applicable):

How reproducible:

Every time.


Steps to Reproduce:
1. Set up an AWS M5 node
2. Attach more than 27 EBS volumes to pods

Actual results:

The volume attach operations fail.

Expected results:

Success.

Additional info:

See: https://github.com/kubernetes/kubernetes/issues/59015

Fixed in Kube 1.11 by: https://github.com/kubernetes/kubernetes/pull/64154

--- Additional comment from Hemant Kumar on 2018-07-17 14:38:40 EDT ---

It may be possible to fix this

--- Additional comment from Hemant Kumar on 2018-07-17 15:18:09 EDT ---

Sorry, my comment got submitted before I could finish typing. It is probably possible to fix this without needing the volume limit feature in 3.9. It is not super clean, I think, but the EC2 instance type is available as a label on the Node object, and hence the scheduler can potentially look at that label and deduce the volume attach limit from it rather than relying on hardcoded values.

I will try and open a PR for it.
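The approach described in the comment above can be sketched roughly as follows. This is an illustration, not the actual PR code: the per-family limits (25 for Nitro/NVMe families such as m5 and c5, 39 otherwise) are assumptions drawn from the discussion in this bug.

```shell
# Sketch: deduce the EBS attach limit from the EC2 instance type, which
# the cloud provider exposes on the Node object as the label
# beta.kubernetes.io/instance-type (readable with e.g.
#   oc get node NODE -o jsonpath='{.metadata.labels.beta\.kubernetes\.io/instance-type}')
deduce_ebs_limit() {
  case "${1%%.*}" in         # strip ".large", ".xlarge", ... to get the family
    m5|c5) echo 25 ;;        # Nitro-based families expose NVMe; lower limit
    *)     echo 39 ;;        # all other families keep the classic limit of 39
  esac
}

deduce_ebs_limit m5.large    # -> 25
deduce_ebs_limit r5.xlarge   # -> 39
```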

--- Additional comment from Hemant Kumar on 2018-07-19 15:38:03 EDT ---

upstream PR - https://github.com/kubernetes/kubernetes/pull/66397

Comment 2 Hemant Kumar 2018-08-21 14:34:09 UTC
PR for origin - https://github.com/openshift/origin/pull/20608

Comment 4 Chao Yang 2018-09-12 05:12:13 UTC
Passed on 
openshift v3.10.45
kubernetes v1.10.0+b81c8f8

Created 27 dynamic PVCs and pods; only 25 pods are running.

[root@ip-172-18-9-136 test]# oc describe pods mypod26
Name:         mypod26
Namespace:    test
Node:         <none>
Labels:       name=frontendhttp
Annotations:  openshift.io/scc=anyuid
Status:       Pending
IP:           
Containers:
  myfrontend:
    Image:        aosqe/hello-openshift
    Port:         80/TCP
    Host Port:    0/TCP
    Environment:  <none>
    Mounts:
      /tmp from aws (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-mqv98 (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  aws:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  ebs26
    ReadOnly:   false
  default-token-mqv98:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-mqv98
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/compute=true
Tolerations:     <none>
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  26s (x26 over 6m)  default-scheduler  0/3 nodes are available: 1 node(s) exceed max volume count, 2 node(s) didn't match node selector.

Comment 6 errata-xmlrpc 2018-09-22 04:55:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2660

Comment 7 Anand Paladugu 2020-06-24 02:26:51 UTC
@Hemant

I have an OCP 3.11 customer facing this issue. He is now running some pods with NVMe-type volumes, and it looks like OCP is still treating the attachment limit as 39; it is failing to attach volumes, and hence deployments are failing.

1. Are these changes in OCP 3.10 available in OCP 3.11?
2. Are the changes only valid for M5 nodes?

From the code changes, I see that a new default limit is added for M5 nodes.  My customer is running R5 and R4 instances in AWS.

Thanks

Anand

Comment 8 Hemant Kumar 2020-06-24 21:59:18 UTC
Yes in 3.11 out of box - new default limit of 25 were only added for M5 and C5 node types. All other node types including R5 and R4 still use instance limit of 39 and hence can have failing deployments. Customer could define `KUBE_MAX_PD_VOLS` environment variable in scheduler and set it to 25 which would globally change maximum attach limit for all nodes types to 25. Please let me know if that workaround works for the customer.