Bug 1659442 - 3.11 Clarification on KUBE_MAX_PD_VOLS for OpenShift/OpenStack Integration
Summary: 3.11 Clarification on KUBE_MAX_PD_VOLS for OpenShift/OpenStack Integration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.11.z
Assignee: Hemant Kumar
QA Contact: Chao Yang
URL:
Whiteboard:
Depends On:
Blocks: 1668893 1669543 1669544
 
Reported: 2018-12-14 12:00 UTC by Christian Stark
Modified: 2019-07-11 09:54 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Cloned To: 1668893 1669543
Environment:
Last Closed: 2019-04-11 05:38:23 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0636 0 None None None 2019-04-11 05:38:38 UTC

Description Christian Stark 2018-12-14 12:00:42 UTC
Description of problem:

The customer is running into this restriction:

Bug 1583553 - [RFE] Not able to attach more than 26 virtio-scsi volumes

Recently, patches were proposed upstream to allow attaching more than 26 volumes to a single instance:
https://review.openstack.org/567472
http://lists.openstack.org/pipermail/openstack-dev/2018-June/131289.html
https://review.openstack.org/597306



Assuming this limit will be increased (which is out of scope for this bug), we wonder whether KUBE_MAX_PD_VOLS must also be adjusted to the same value:

https://github.com/openshift/origin/blob/release-3.11/vendor/k8s.io/kubernetes/pkg/scheduler/algorithm/predicates/predicates.go#L112

// KubeMaxPDVols defines the maximum number of PD Volumes per kubelet
KubeMaxPDVols = "KUBE_MAX_PD_VOLS"


If this is not adjusted, pods will fail: the scheduler selects a node for each pod, but the volume cannot be attached there.
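
For context, the vendored predicates.go resolves this limit with a small helper (getMaxVols). Below is a simplified, self-contained sketch of that logic, not the verbatim vendored code (the real helper logs through glog):

package main

import (
	"fmt"
	"log"
	"os"
	"strconv"
)

// KubeMaxPDVols defines the maximum number of PD Volumes per kubelet.
const KubeMaxPDVols = "KUBE_MAX_PD_VOLS"

// getMaxVols returns the value of KUBE_MAX_PD_VOLS when it is set to a
// positive integer; otherwise it falls back to the caller's default
// (each cloud's max-volume-count predicate supplies its own default).
func getMaxVols(defaultVal int) int {
	if rawMaxVols := os.Getenv(KubeMaxPDVols); rawMaxVols != "" {
		parsedMaxVols, err := strconv.Atoi(rawMaxVols)
		if err != nil {
			log.Printf("Unable to parse maximum PD volumes value, using default: %v", err)
		} else if parsedMaxVols <= 0 {
			log.Printf("Maximum PD volumes must be a positive value, using default")
		} else {
			return parsedMaxVols
		}
	}
	return defaultVal
}

func main() {
	os.Setenv(KubeMaxPDVols, "30")
	fmt.Println(getMaxVols(39)) // prints 30; with the variable unset it would print 39
}

Note that the variable is read by the scheduler, so it has to be exported in the master (controllers) environment rather than on the nodes.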



Question 1:
Is this understanding correct?

Question 2:
Can the fix be backported to 3.9?

Question 3:
If it cannot be backported, is there a workaround that achieves similar behaviour?




Comment 9 Hemant Kumar 2019-02-12 18:16:57 UTC
PR - https://bugzilla.redhat.com/show_bug.cgi?id=1659442

Comment 10 Hemant Kumar 2019-03-12 16:21:34 UTC
Oops, sorry for the incorrect link. The 3.11 PR is https://github.com/openshift/origin/pull/22024

Comment 12 Chao Yang 2019-03-18 06:45:42 UTC
This passed verification on:
oc v3.11.96
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://qe-chaoyang-master-etcd-nfs-1:8443
openshift v3.11.96
kubernetes v1.11.0+d4cacc0

1. Prepare the OpenShift environment on OpenStack.
2. Confirm the max-volume-count scheduler predicates are enabled:
[root@qe-chaoyang-master-etcd-nfs-1 sysconfig]# grep -ri max /etc/origin/master/scheduler.json 
            "name": "MaxEBSVolumeCount"
            "name": "MaxGCEPDVolumeCount"
            "name": "MaxCinderVolumeCount"
            "name": "MaxAzureDiskVolumeCount"
3. Set the limit in master.env (see the sketch after the scheduling events below):
grep -ri max /etc/origin/master/master.env 
export KUBE_MAX_PD_VOLS=3
4. Restart the api and controllers:
master-restart api
master-restart controllers
5. Create pods that each mount one PVC:
[root@qe-chaoyang-master-etcd-nfs-1 sysconfig]# oc get pods
NAME      READY     STATUS    RESTARTS   AGE
mypod1    1/1       Running   0          41m
mypod2    1/1       Running   0          40m
mypod3    1/1       Running   0          39m
mypod4    0/1       Pending   0          15s
6. [root@qe-chaoyang-master-etcd-nfs-1 sysconfig]# oc describe pods mypod4
Name:               mypod4
Namespace:          test
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             name=frontendhttp
Annotations:        openshift.io/scc=anyuid
Status:             Pending
IP:                 
Containers:
  myfrontend:
    Image:        jhou/hello-openshift
    Port:         80/TCP
    Host Port:    0/TCP
    Environment:  <none>
    Mounts:
      /tmp from aws (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4xf88 (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  aws:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  ebsc4
    ReadOnly:   false
  default-token-4xf88:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4xf88
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/compute=true
Tolerations:     <none>
Events:
  Type     Reason            Age               From               Message
  ----     ------            ----              ----               -------
  Warning  FailedScheduling  3m (x25 over 4m)  default-scheduler  0/4 nodes are available: 1 node(s) exceed max volume count, 1 node(s) were unschedulable, 2 node(s) didn't match node selector.
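
The Pending pod above is the predicate working as intended: three volumes are already attached to the node through mypod1-mypod3, and mypod4 would add a fourth, exceeding KUBE_MAX_PD_VOLS=3. Conceptually the per-node check reduces to the following (a simplified sketch, not the vendored implementation):

package main

import "fmt"

// exceedsMaxVolumeCount mirrors the decision the max-volume-count
// predicate makes for each node: volumes of the matching type already
// in use on the node, plus the new volumes the pod would add, compared
// against the configured limit.
func exceedsMaxVolumeCount(attachedVols, newPodVols, maxVols int) bool {
	return attachedVols+newPodVols > maxVols
}

func main() {
	// mypod1..mypod3 account for 3 volumes on the node; mypod4 adds a 4th.
	fmt.Println(exceedsMaxVolumeCount(3, 1, 3)) // true -> "exceed max volume count", pod stays Pending
}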

Comment 14 errata-xmlrpc 2019-04-11 05:38:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0636

