Description of problem:

When creating OpenShift pods with the AWS EBS plugin, only devices /dev/xvdb - /dev/xvdp can be attached to an OpenShift node, so it is only possible to start as many pods as there are allocated EBS devices (/dev/xvdb - /dev/xvdp). The OpenShift pods use AWS EBS as persistent storage.

Version-Release number of selected component (if applicable):

# OpenShift packages installed
# rpm -qa | grep atomic-opensh
tuned-profiles-atomic-openshift-node-3.1.1.911-1.git.0.14f4c71.el7.x86_64
atomic-openshift-sdn-ovs-3.1.1.911-1.git.0.14f4c71.el7.x86_64
atomic-openshift-clients-3.1.1.911-1.git.0.14f4c71.el7.x86_64
atomic-openshift-node-3.1.1.911-1.git.0.14f4c71.el7.x86_64
atomic-openshift-master-3.1.1.911-1.git.0.14f4c71.el7.x86_64
atomic-openshift-3.1.1.911-1.git.0.14f4c71.el7.x86_64

root@ip-172-31-7-106: ~ # uname -a
Linux ip-172-31-7-106.us-west-2.compute.internal 3.10.0-327.el7.x86_64 #1 SMP Thu Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux

How reproducible:

On a functional OpenShift environment (I tested with two OpenShift nodes) with the above packages, run:

# python create_ebs_pod.py --volumesize=1 --image=r7perffio --tagprefix=openshift_test --minpod=1 --maxpod=44 --pvfile=pv.json --pvcfile=pvc.json --podfile=pod.json

This creates 43 pods, each with one 1 GB EBS device attached, so each OSE node should get approximately 20 pods. However, not all 20 pods will start; pods start only until the available devices (/dev/xvdb - /dev/xvdp) are exhausted.

create_ebs_pod.py, pv.json, pvc.json and pod.json can be found at: https://github.com/ekuric/openshift/tree/master/_aws

Steps to Reproduce:

Please see the steps above.

Actual results:

In tests with the awsElasticBlockStore plugin, the last device attached to the Amazon instance (acting as an OpenShift node) was /dev/xvdp, which means only devices /dev/xvdb - /dev/xvdp were available for pods to use as persistent storage. After this, no more pods could be started.
Expected results:

To be able to attach more EBS devices to the Amazon instance when using the awsElasticBlockStore plugin, and thus to be able to start more OpenShift pods.

Additional info:

The AWS cloud provider says:

// See: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/block-device-mapping-concepts.html
devices := []mountDevice{}
for c := 'f'; c <= 'p'; c++ {
    devices = append(devices, mountDevice(fmt.Sprintf("%c", c)))
}

pod file used: https://github.com/ekuric/openshift/blob/master/_aws/pod.json
pv file used: https://github.com/ekuric/openshift/blob/master/_aws/pv.json
pvc file used: https://github.com/ekuric/openshift/blob/master/_aws/pvc.json

The node logs show the errors below [1]:

[1]
Mar 8 08:24:23 ip-172-31-7-105 atomic-openshift-node: E0308 08:24:23.508726 12979 pod_workers.go:138] Error syncing pod 8f3bf98c-e529-11e5-b35d-028bb7d6e433, skipping: Could not attach EBS Disk "vol-9652ec60". Timeout waiting for mount paths to be created.
Mar 8 08:24:23 ip-172-31-7-105 atomic-openshift-node: W0308 08:24:23.656873 12979 aws_util.go:167] Retrying attach for EBS Disk "vol-9b53ed6d" (retry count=7).
Mar 8 08:24:23 ip-172-31-7-105 atomic-openshift-node: W0308 08:24:23.770064 12979 aws.go:950] Unexpected EBS DeviceName: "/dev/sda1"
Mar 8 08:24:23 ip-172-31-7-105 atomic-openshift-node: W0308 08:24:23.770114 12979 aws.go:983] Could not assign a mount device (all in use?). mappings=map[b:vol-6936fc9f f:vol-b954ea4f j:vol-4154eab7 l:vol-1b54eaed m:vol-9855eb6e p:vol-4155ebb7 a1:vol-a236fc54 g:vol-c654ea30 h:vol-2754ead1 i:vol-6e54ea98 k:vol-7a54ea8c n:vol-2c55ebda o:vol-ad55eb5b], valid=[f g h i j k l m n
This is the root cause: https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws.go#L903

Kubernetes uses only devices /dev/xvd[f-p] or /dev/sd[f-p], as suggested by http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html#available-ec2-device-names: "Recommended for EBS Volumes: ... /dev/sd[f-p]"

This leaves us with 11 EBS volumes, which is quite limiting. IMO we could use more, up to the 40 volumes supported by Amazon.

As a related bug, when Kubernetes runs out of device names, 'kubectl describe pod' shows a cryptic error: 'Value (/dev/xvd) for parameter device is invalid. /dev/xvd is not a valid EBS device name.' IMO we should print something like 'Too many devices attached, only 11 devices are supported by AWS.'

Note to self: look at https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws.go#L1054 and try to return an error there.
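To make the limit concrete, here is a small standalone sketch of the suffix enumeration quoted above (the real code lives in pkg/cloudprovider/providers/aws/aws.go; the function name here is mine, for illustration only):

```go
package main

import "fmt"

// ebsDeviceSuffixes mirrors the loop in the AWS cloud provider: only
// suffixes 'f' through 'p' are handed out, so at most 11 EBS volumes
// can be mounted per node before device allocation fails.
func ebsDeviceSuffixes() []string {
	suffixes := []string{}
	for c := 'f'; c <= 'p'; c++ {
		suffixes = append(suffixes, fmt.Sprintf("%c", c))
	}
	return suffixes
}

func main() {
	s := ebsDeviceSuffixes()
	fmt.Printf("%d usable devices: /dev/xvd%s .. /dev/xvd%s\n",
		len(s), s[0], s[len(s)-1])
}
```

Running it prints "11 usable devices: /dev/xvdf .. /dev/xvdp", which matches the exhaustion seen in the node logs above.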
Created PR https://github.com/kubernetes/kubernetes/pull/22942 and waiting for upstream feedback.
The first part, raising the limit to 39, has been merged into Kubernetes 1.2: https://github.com/kubernetes/kubernetes/pull/22942

Admins can adjust the limit by setting the env. variable "KUBE_MAX_PD_VOLS" in the scheduler process (openshift-master); however, kubelet will refuse to attach more than 39 volumes anyway. 'oc describe pod' will show a clear message that too many volumes are attached and the pod can't be started.

The second part, allowing kubelet to attach more than 39 volumes, is still open and I'm working on it. Tracked here: https://github.com/kubernetes/kubernetes/issues/22994
sorry, wrong bug, moving back to assigned
Experimental fix for the second part: https://github.com/kubernetes/kubernetes/pull/23254 Waiting for upstream feedback.
Fix merged upstream. Target this for OSE 3.2.1?
I'm marking it as 'POST', i.e. fixed in Kubernetes and waiting for OpenShift rebase.
AWS changed in the meantime and now actually enforces that devices are named according to https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/block-device-mapping-concepts.html

Only devices /dev/xvd[b-c][a-z] are usable now. This was merged into Kubernetes as https://github.com/kubernetes/kubernetes/pull/41455 and into OpenShift as bug #1422531 (see that bug for the releases where it was fixed).
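For comparison with the original /dev/xvd[f-p] range, the new two-letter scheme can be enumerated like this (a sketch of my own, not the upstream code; it only illustrates how many names /dev/xvd[b-c][a-z] yields):

```go
package main

import "fmt"

// twoLetterDeviceNames enumerates /dev/xvd[b-c][a-z]: two first letters
// ('b' and 'c') times 26 second letters, i.e. 52 device names in total,
// well above the 11 names available under the old /dev/xvd[f-p] scheme.
func twoLetterDeviceNames() []string {
	names := []string{}
	for first := 'b'; first <= 'c'; first++ {
		for second := 'a'; second <= 'z'; second++ {
			names = append(names, fmt.Sprintf("/dev/xvd%c%c", first, second))
		}
	}
	return names
}

func main() {
	names := twoLetterDeviceNames()
	fmt.Printf("%d names, from %s to %s\n",
		len(names), names[0], names[len(names)-1])
}
```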