Bug 1608625
| Summary: | Number of NVMe disks attachable is lower than max predefined count for EBS | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Hemant Kumar <hekumar> |
| Component: | Storage | Assignee: | Hemant Kumar <hekumar> |
| Status: | CLOSED ERRATA | QA Contact: | Chao Yang <chaoyang> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 3.10.0 | CC: | aos-bugs, aos-storage-staff, apaladug, bbennett, bchilds, chaoyang, hekumar |
| Target Milestone: | --- | Keywords: | NeedsTestCase |
| Target Release: | 3.10.z | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1602054 | Environment: | |
| Last Closed: | 2018-09-22 04:55:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1602054, 1608626 | ||
| Bug Blocks: | |||
Description
Hemant Kumar
2018-07-26 01:15:28 UTC
PR for origin: https://github.com/openshift/origin/pull/20608

Passed on:
openshift v3.10.45
kubernetes v1.10.0+b81c8f8

Created 27 dynamic PVCs and pods; only 25 pods are running.
[root@ip-172-18-9-136 test]# oc describe pods mypod26
Name:           mypod26
Namespace:      test
Node:           <none>
Labels:         name=frontendhttp
Annotations:    openshift.io/scc=anyuid
Status:         Pending
IP:
Containers:
  myfrontend:
    Image:        aosqe/hello-openshift
    Port:         80/TCP
    Host Port:    0/TCP
    Environment:  <none>
    Mounts:
      /tmp from aws (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-mqv98 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  aws:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  ebs26
    ReadOnly:   false
  default-token-mqv98:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-mqv98
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/compute=true
Tolerations:     <none>
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  26s (x26 over 6m)  default-scheduler  0/3 nodes are available: 1 node(s) exceed max volume count, 2 node(s) didn't match node selector.
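The `FailedScheduling` event above comes from the scheduler's volume-count check: a node is rejected when the volumes already attached to it, plus the new volumes the pod needs, would exceed the node's attach limit. The following is a minimal Python sketch of that check. It is not the Kubernetes predicate code. The names `fits_volume_limit` and `schedule_pods` and the volume IDs are hypothetical. The scenario assumes a single eligible node with a limit of 25 and one new volume per pod.

```python
from __future__ import annotations


def fits_volume_limit(attached: set[str], requested: set[str], max_volumes: int) -> bool:
    """Return True if the node can take the pod's volumes without exceeding max_volumes."""
    # A requested volume that is already attached to the node is not counted twice.
    new_volumes = requested - attached
    return len(attached) + len(new_volumes) <= max_volumes


def schedule_pods(num_pods: int, max_volumes: int) -> int:
    """Place num_pods pods, each with its own new volume, on one node; return how many fit."""
    attached: set[str] = set()
    scheduled = 0
    for i in range(1, num_pods + 1):
        volume = f"vol-{i}"  # hypothetical volume ID, one per dynamically provisioned PVC
        if fits_volume_limit(attached, {volume}, max_volumes):
            attached.add(volume)
            scheduled += 1
    return scheduled


print(schedule_pods(num_pods=27, max_volumes=25))  # 25
print(schedule_pods(num_pods=27, max_volumes=39))  # 27
```

With a limit of 25, pods 26 and 27 stay `Pending`. This matches the 25-of-27 result reported above.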
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2660

@Hemant I have an OCP 3.11 customer facing this issue. He is now running some pods with NVMe-type volumes, and it looks like OCP still treats the attachment limit as 39. Volumes fail to attach, and as a result deployments fail.

1. Are these changes in OCP 3.10 available in OCP 3.11?
2. Are the changes only valid for M5 nodes? From the code changes, I see that a new default limit was added for M5 nodes. My customer is running R5 and R4 instances in AWS.

Thanks,
Anand

Yes. In 3.11, out of the box, the new default limit of 25 was added only for the M5 and C5 node types. All other node types, including R5 and R4, still use an instance limit of 39 and can therefore have failing deployments. The customer could define the `KUBE_MAX_PD_VOLS` environment variable in the scheduler and set it to 25. This changes the maximum attach limit to 25 globally, for all node types. Please let me know if that workaround works for the customer.
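The reply above implies a two-step rule. First, the scheduler picks a per-instance-type default: 25 for M5 and C5, and 39 for everything else, including R4 and R5. Second, `KUBE_MAX_PD_VOLS`, when set on the scheduler, replaces that default for every node type. The following Python sketch encodes the rule under those assumptions. It is not the OpenShift code. The function name `max_ebs_volumes`, the prefix match on instance-type strings such as `m5.large`, and the fallback to the default when the variable is not a positive integer are all illustrative assumptions.

```python
from __future__ import annotations

import os
from typing import Mapping, Optional

DEFAULT_MAX_EBS_VOLUMES = 39  # limit still used for most instance types, per the report
M5_C5_MAX_EBS_VOLUMES = 25    # lower default the fix added for M5 and C5, per the report


def max_ebs_volumes(instance_type: str, env: Optional[Mapping[str, str]] = None) -> int:
    """Return the attach limit for a node, honoring a KUBE_MAX_PD_VOLS override."""
    if env is None:
        env = os.environ
    # Step 1: per-instance-type default (assumed prefix match, e.g. "m5.xlarge").
    if instance_type.startswith(("m5.", "c5.")):
        default = M5_C5_MAX_EBS_VOLUMES
    else:
        default = DEFAULT_MAX_EBS_VOLUMES
    # Step 2: a positive integer in KUBE_MAX_PD_VOLS overrides the default for all types.
    raw = env.get("KUBE_MAX_PD_VOLS", "")
    if raw:
        try:
            value = int(raw)
        except ValueError:
            return default  # assumption: an unparsable override is ignored
        if value > 0:
            return value
    return default


print(max_ebs_volumes("m5.xlarge", env={}))                               # 25
print(max_ebs_volumes("r5.xlarge", env={}))                               # 39
print(max_ebs_volumes("r4.2xlarge", env={"KUBE_MAX_PD_VOLS": "25"}))      # 25
```

Because the override is global, setting it to 25 also caps the instance types that would otherwise be allowed 39 volumes. This is the trade-off of the workaround.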