Bug 1876886 - Pod hang on volume attachment
Summary: Pod hang on volume attachment
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Documentation
Version: 4.6
Hardware: Unspecified
OS: Unspecified
unspecified
medium
Target Milestone: ---
: 4.6.0
Assignee: Bob Furu
QA Contact: Xiaoli Tian
Vikram Goyal
URL:
Whiteboard:
Depends On:
Blocks: 1876933
Reported: 2020-09-08 13:02 UTC by Qin Ping
Modified: 2020-09-11 18:26 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1876933 (view as bug list)
Environment:
Last Closed: 2020-09-11 18:26:14 UTC
Target Upstream Version:
Embargoed:


Description Qin Ping 2020-09-08 13:02:11 UTC
Description of problem:
The maximum number of attached EBS volumes is counted separately for the CSI driver and the in-tree plug-in. As a result, a pod can be scheduled successfully but then hang on volume attachment.

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-09-07-210458

How reproducible:
Always

Steps to Reproduce:
1. Launch a 4.6 cluster on OCP.
2. Create 25 PVCs provisioned by ebs.csi.aws.com.
3. Create a pod (pod1) that consumes these PVCs and schedule it to worker1.
4. After pod1 is running, create 25 PVCs provisioned by kubernetes.io/aws-ebs.
5. Create a pod (pod2) that consumes these PVCs and schedule it to worker2.

Actual results:
pod2 is scheduled successfully, but hangs on volume attachment.

The following event is reported repeatedly:
Warning  FailedMount             5m5s       kubelet, ip-10-0-74-88.ap-northeast-1.compute.internal  Unable to attach or mount volumes: unmounted volumes=[local20 local9 local10 local16 local5 local19 local21 local2 local3 local8 local25 local11 local1 local18 local22 local14 local6 local7 local23 local4 local13 local15 local12 local24], unattached volumes=[local20 local9 local10 local16 local5 local19 default-token-52hmc local21 local2 local3 local8 local25 local11 local1 local18 local22 local14 local6 local7 local23 local4 local13 local15 local17 local12 local24]: timed out waiting for the condition

Expected results:
pod2 cannot be scheduled.
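
The difference between the actual and expected results can be illustrated with a toy model. This is only a sketch of the counting behavior described above, not the scheduler's code: the limit of 39, the `Node` class, and both helper functions are hypothetical, and it assumes both sets of volumes are counted against a single node.

```python
from dataclasses import dataclass, field

# Hypothetical per-node attach limit, chosen only for illustration.
ASSUMED_NODE_LIMIT = 39


@dataclass
class Node:
    # Number of attached volumes, keyed by provisioner name.
    attached: dict = field(default_factory=dict)


def fits_counted_separately(node, provisioner, new_volumes, limit=ASSUMED_NODE_LIMIT):
    """Count only volumes from the same provisioner (the reported behavior)."""
    return node.attached.get(provisioner, 0) + new_volumes <= limit


def fits_counted_together(node, new_volumes, limit=ASSUMED_NODE_LIMIT):
    """Count every attached volume, regardless of provisioner."""
    return sum(node.attached.values()) + new_volumes <= limit


node = Node()
node.attached["ebs.csi.aws.com"] = 25  # volumes already used by pod1

# pod2 asks for 25 in-tree volumes.
separate = fits_counted_separately(node, "kubernetes.io/aws-ebs", 25)
together = fits_counted_together(node, 25)
print(separate, together)
```

Under these assumptions the separate count accepts pod2 (25 is within the limit), while the combined count rejects it (50 exceeds the limit), which matches the gap between scheduling and attachment reported here.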

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:

Comment 1 Jan Safranek 2020-09-08 14:35:14 UTC
This is a current limitation of OCP and the AWS EBS CSI driver. A cluster admin should use either in-tree volumes or CSI volumes, but not both at the same time. This should be documented as a limitation of the AWS EBS CSI driver in our docs.

Comment 2 Bob Furu 2020-09-08 21:12:03 UTC
Created https://github.com/openshift/openshift-docs/pull/25219
Moving to QE and SME review.

Comment 3 Bob Furu 2020-09-09 21:01:05 UTC
Feedback applied, awaiting second review by SME and QE before merge: https://github.com/openshift/openshift-docs/pull/25348#issuecomment-689819227

Comment 4 Bob Furu 2020-09-11 18:04:59 UTC
Docs live on 4.5, 4.6: https://docs.openshift.com/container-platform/4.5/storage/persistent_storage/persistent-storage-aws.html#maximum-number-of-ebs-volumes-on-a-node_persistent-storage-aws
Waiting for answer from SME on whether this denotes a support status change or not.

Also opened separate PRs for 4.3 and 4.4 that do not include the note about CSI, because CSI is not supported until 4.5:
- https://github.com/openshift/openshift-docs/pull/25413
- https://github.com/openshift/openshift-docs/pull/25411

Comment 5 Bob Furu 2020-09-11 18:26:14 UTC
Confirmed with the Storage team that we have not removed KUBE_MAX_PD_VOLS support for in-tree plug-ins. According to Hemant, configuring it is tricky: it might be possible by modifying the scheduler's pod spec to set the environment variable, but that would have to be supported by the scheduler operator.
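
As a rough illustration of what an environment-variable override of a volume limit looks like, here is a minimal sketch. It is not the scheduler's implementation: the function name, the default of 39, and the fallback rules for invalid values are assumptions made for this example.

```python
import os

# Assumption for illustration: the limit used when no valid override is set.
ASSUMED_DEFAULT_LIMIT = 39


def max_volume_limit(environ=os.environ, default=ASSUMED_DEFAULT_LIMIT):
    """Return KUBE_MAX_PD_VOLS if it is a positive integer, else the default."""
    raw = environ.get("KUBE_MAX_PD_VOLS", "")
    try:
        parsed = int(raw)
    except ValueError:
        # Unset, empty, or non-numeric value: ignore the override.
        return default
    return parsed if parsed > 0 else default


print(max_volume_limit({"KUBE_MAX_PD_VOLS": "20"}))
print(max_volume_limit({}))
```

The sketch takes the environment as a parameter so the override can be exercised without changing the real process environment.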

Closing BZ.
