Bug 1747377 - CSI: attachable-volumes-csi-ebs.csi.aws.com=39 does not work for csi ebs driver
Summary: CSI: attachable-volumes-csi-ebs.csi.aws.com=39 does not work for csi ebs driver
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.2.0
Assignee: Fabio Bertinatto
QA Contact: Chao Yang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-08-30 09:12 UTC by Chao Yang
Modified: 2019-10-16 06:39 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:39:13 UTC
Target Upstream Version:




Links
Github openshift/origin pull 23718 (last updated 2019-09-03 13:06:59 UTC)
Github openshift/origin pull 23756 (last updated 2019-09-10 09:19:53 UTC)
Red Hat Product Errata RHBA-2019:2922 (last updated 2019-10-16 06:39:23 UTC)

Description Chao Yang 2019-08-30 09:12:51 UTC
Description of problem:
attachable-volumes-csi-ebs.csi.aws.com=39 does not work for the CSI EBS driver; a user can still attach 52 volumes to one node.
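
The advertised limit can be read directly from the node object. A minimal sketch, assuming a placeholder node name; the dots in the resource key have to be escaped in jsonpath:

$ oc get node <node-name> \
    -o jsonpath='{.status.allocatable.attachable-volumes-csi-ebs\.csi\.aws\.com}'
39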
 
Version-Release number of selected component (if applicable):
4.2.0-0.nightly-2019-08-29-170426

How reproducible:
Always

Steps to Reproduce:
1. Create a StorageClass backed by the CSI EBS driver (a sketch of a matching manifest follows the steps below):
foo-encrypted   ebs.csi.aws.com         89m
2. Make only one worker node schedulable.
3. Create 54 pods that use the StorageClass above:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web0
spec:
  selector:
    matchLabels:
      app: nginx # has to match .spec.template.metadata.labels
  serviceName: "nginx"
  replicas: 54 # by default is 1
  template:
    metadata:
      labels:
        app: nginx # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: nginx
        image: aosqe/hello-openshift
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: ww0
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: ww0
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
      storageClassName: 'foo-encrypted'
4. Check the node info:
status:
  addresses:
  - address: 10.0.134.8
    type: InternalIP
  - address: ip-10-0-134-8.us-east-2.compute.internal
    type: InternalDNS
  - address: ip-10-0-134-8.us-east-2.compute.internal
    type: Hostname
  allocatable:
    attachable-volumes-aws-ebs: "39"
    attachable-volumes-csi-ebs.csi.aws.com: "39"
    cpu: 1500m
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 7548496Ki
    pods: "250"
  capacity:
    attachable-volumes-aws-ebs: "39"
    attachable-volumes-csi-ebs.csi.aws.com: "39"
    cpu: "2"
    hugepages-1Gi: "0"
    hugepages-2Mi: "0"
    memory: 8162896Ki
    pods: "250"
    

Actual results:
52 volumes attached to one node
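
One way to verify the count is via the cluster-scoped VolumeAttachment objects rather than by counting pods (a sketch; the node name is a placeholder):

$ oc get volumeattachment -o custom-columns=NODE:.spec.nodeName | grep -c <node-name>
52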

Expected results:
At most 39 volumes should be attached to one node
Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

Comment 1 Fabio Bertinatto 2019-08-30 12:14:18 UTC
> Expected results:
> Should be 39 pods attach to one node 

It's possible for 52 pods to be scheduled on a single node, as long as they don't use more than 39 volumes.

In this case there's only 1 volume shared among 52 pods, which is OK.

Comment 2 Chao Yang 2019-09-02 07:51:30 UTC
This StatefulSet creates 52 volumes and attaches them to the node.
But it should be at most 39 volumes, according to node.status.capacity.attachable-volumes-csi-ebs.csi.aws.com: "39".
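
Because volumeClaimTemplates provisions one PVC per replica, the claims can be counted directly. A sketch, relying on the standard <template>-<set>-<ordinal> PVC naming (ww0-web0-0, ww0-web0-1, ...):

$ oc get pvc --no-headers | grep -c '^ww0-web0-'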

Comment 8 Chao Yang 2019-09-10 03:19:51 UTC
It still fails on 4.2.0-0.nightly-2019-09-08-180038.
We can still attach 40 volumes to one node.

Comment 9 Fabio Bertinatto 2019-09-10 09:27:11 UTC
The last OpenShift rebase [0] reverted the patch that fixed this issue due to a potential bug in the publishing bot.

I re-submitted the patch at [1].

[0] https://github.com/openshift/origin/pull/23674
[1] https://github.com/openshift/origin/pull/23756

Comment 10 Fabio Bertinatto 2019-09-11 10:22:33 UTC
Tested with 4.2.0-0.okd-2019-09-11-080531. Volume limit was enforced correctly:

$ oc describe pod/web0-39 
(...)
Events:
  Type     Reason            Age               From               Message
  ----     ------            ----              ----               -------
  Warning  FailedScheduling  1s (x2 over 66s)  default-scheduler  0/6 nodes are available: 1 node(s) exceed max volume count, 2 node(s) were unschedulable, 3 node(s) had taints that the pod didn't tolerate.

Comment 12 Chao Yang 2019-09-16 05:40:47 UTC
It passes on 4.2.0-0.nightly-2019-09-15-052022:
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  88s (x9 over 13m)  default-scheduler  0/5 nodes are available: 1 node(s) exceed max volume count, 1 node(s) were unschedulable, 3 node(s) had taints that the pod didn't tolerate.

Comment 13 errata-xmlrpc 2019-10-16 06:39:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

