Bug 1668893 - 3.9 Clarification on KUBE_MAX_PD_VOLS for OpenShift/OpenStack Integration
Summary: 3.9 Clarification on KUBE_MAX_PD_VOLS for OpenShift/OpenStack Integration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.9.z
Assignee: Hemant Kumar
QA Contact: Liang Xia
URL:
Whiteboard:
Depends On: 1659442 1669543 1669544
Blocks:
 
Reported: 2019-01-23 20:08 UTC by Hemant Kumar
Modified: 2023-09-07 19:40 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1659442
Environment:
Last Closed: 2019-02-20 08:46:56 UTC
Target Upstream Version:
Embargoed:




Links:
Red Hat Product Errata RHBA-2019:0331 (last updated 2019-02-20 08:47:02 UTC)

Comment 4 Liang Xia 2019-02-12 06:25:21 UTC
QE checked the bug on version v3.9.69 with the steps below; the number of volumes that can be attached to a single OpenStack instance turned out to be limited to 26.

@Hemant Kumar, could you help confirm whether this is expected?


1. Update the nodes so that only one remains schedulable.
# oc adm manage-node --schedulable xxx
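For reference, a sketch of the full commands (node names here are placeholders, not the actual cluster's):
# oc adm manage-node <other-node-1> <other-node-2> --schedulable=false
# oc adm manage-node <test-node> --schedulable=true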

2. Enable the MaxCinderVolumeCount predicate in scheduler.json.
# grep -i cinder /etc/origin/master/scheduler.json  -A4 -B4
        {
            "name": "MaxAzureDiskVolumeCount"
        }, 
        {
            "name": "MaxCinderVolumeCount"
        }, 
        {
            "name": "MatchInterPodAffinity"
        }, 
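For context, the full policy file follows the 3.9 scheduler policy format; a minimal sketch (only the predicate names above are taken from this cluster, the rest is elided):
{
    "apiVersion": "v1",
    "kind": "Policy",
    "predicates": [
        ...,
        { "name": "MaxCinderVolumeCount" },
        ...
    ],
    "priorities": [ ... ]
}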

3. Restart the API and controller services.
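On a 3.9 RPM install the restart would typically be (service names assumed from the standard install; the controllers unit also appears in comment 6):
# systemctl restart atomic-openshift-master-api
# systemctl restart atomic-openshift-master-controllers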

4. Keep creating PVCs and pods (one PVC per pod), as sketched below.
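A loop along these lines reproduces the objects seen below (a sketch only: the 1Gi size is an assumption, and dynamic provisioning via a cinder-backed default StorageClass is assumed; names, image, and mount path match the describe output further down):

for i in $(seq -w 1 27); do
  oc create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc$i
spec:
  accessModes: [ReadWriteOnce]
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: mypod$i
spec:
  containers:
  - name: dynamic
    image: aosqe/hello-openshift
    ports:
    - containerPort: 80
    volumeMounts:
    - name: dynamic
      mountPath: /mnt/pv
  volumes:
  - name: dynamic
    persistentVolumeClaim:
      claimName: mypvc$i
EOF
done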
# oc get pods
NAME      READY     STATUS              RESTARTS   AGE
mypod01   1/1       Running             0          18m
mypod02   1/1       Running             0          18m
mypod03   1/1       Running             0          17m
mypod04   1/1       Running             0          17m
mypod05   1/1       Running             0          17m
mypod06   1/1       Running             0          17m
mypod07   1/1       Running             0          17m
mypod08   1/1       Running             0          17m
mypod09   1/1       Running             0          17m
mypod10   1/1       Running             0          17m
mypod11   1/1       Running             0          17m
mypod12   1/1       Running             0          17m
mypod13   1/1       Running             0          16m
mypod14   1/1       Running             0          16m
mypod15   1/1       Running             0          16m
mypod16   1/1       Running             0          16m
mypod17   1/1       Running             0          16m
mypod18   1/1       Running             0          16m
mypod19   1/1       Running             0          16m
mypod20   1/1       Running             0          16m
mypod21   1/1       Running             0          16m
mypod22   1/1       Running             0          16m
mypod23   1/1       Running             0          15m
mypod24   1/1       Running             0          15m
mypod25   1/1       Running             0          15m
mypod26   0/1       ContainerCreating   0          15m
mypod27   0/1       ContainerCreating   0          15m

# oc describe pod mypod26
Name:         mypod26
Namespace:    bz1668893
Node:         qe-chaoyang-node-registry-router-1/10.0.77.49
Start Time:   Tue, 12 Feb 2019 01:00:18 -0500
Labels:       <none>
Annotations:  openshift.io/scc=anyuid
Status:       Pending
IP:           
Containers:
  dynamic:
    Container ID:   
    Image:          aosqe/hello-openshift
    Image ID:       
    Port:           80/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /mnt/pv from dynamic (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-bs4hl (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  dynamic:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  mypvc26
    ReadOnly:   false
  default-token-bs4hl:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-bs4hl
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/compute=true
Tolerations:     <none>
Events:
  Type     Reason                 Age                From                                         Message
  ----     ------                 ----               ----                                         -------
  Normal   Scheduled              16m                default-scheduler                            Successfully assigned mypod26 to qe-chaoyang-node-registry-router-1
  Normal   SuccessfulMountVolume  16m                kubelet, qe-chaoyang-node-registry-router-1  MountVolume.SetUp succeeded for volume "default-token-bs4hl"
  Warning  FailedAttachVolume     1m (x15 over 16m)  attachdetach-controller                      AttachVolume.Attach failed for volume "pvc-795e2daf-2e8b-11e9-b868-fa163eb7596d" : failed to attach 66ece5e3-34c6-4408-8c95-4f2de0859adc volume to 27947252-a153-431d-9c27-e1bd4bc5ebbc compute: Internal Server Error
  Warning  FailedMount            30s (x7 over 14m)  kubelet, qe-chaoyang-node-registry-router-1  Unable to mount volumes for pod "mypod26_bz1668893(79a63ad0-2e8b-11e9-b868-fa163eb7596d)": timeout expired waiting for volumes to attach/mount for pod "bz1668893"/"mypod26". list of unattached/unmounted volumes=[dynamic]

Comment 6 Liang Xia 2019-02-13 02:26:33 UTC
QE tried again on version v3.9.69 with the steps below.

1. Update the nodes so that only one remains schedulable.
# oc adm manage-node --schedulable xxx
# oc get nodes
NAME                                STATUS                     ROLES     AGE       VERSION
qe-lxia-39-master-etcd-nfs-1        Ready,SchedulingDisabled   master    17m       v1.9.1+a0ce1bc657
qe-lxia-39-node-registry-router-1   Ready                      compute   17m       v1.9.1+a0ce1bc657

2. Enable the MaxCinderVolumeCount predicate in scheduler.json.
# grep -i cinder /etc/origin/master/scheduler.json  -A4 -B4
        {
            "name": "MaxAzureDiskVolumeCount"
        }, 
        {
            "name": "MaxCinderVolumeCount"
        }, 
        {
            "name": "MatchInterPodAffinity"
        }, 
3. Set KUBE_MAX_PD_VOLS=3 in the controllers' sysconfig file.
# grep -i vol /etc/sysconfig/atomic-openshift-master-controllers
KUBE_MAX_PD_VOLS=3
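One way this could be set (appending to the sysconfig file that the controllers unit reads; a sketch only, the restart in step 4 picks it up):
# echo "KUBE_MAX_PD_VOLS=3" >> /etc/sysconfig/atomic-openshift-master-controllers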

4. Restart the API and controller services.

5. Create 4 PVCs and pods.
# oc get pods mypod{1..4}
NAME      READY     STATUS    RESTARTS   AGE
mypod1    1/1       Running   0          7m
mypod2    1/1       Running   0          6m
mypod3    1/1       Running   0          6m
mypod4    0/1       Pending   0          5m

# oc describe pod mypod4
Name:         mypod4
Namespace:    default
Node:         <none>
Labels:       <none>
Annotations:  openshift.io/scc=anyuid
Status:       Pending
IP:           
Containers:
  dynamic:
    Image:        aosqe/hello-openshift
    Port:         80/TCP
    Environment:  <none>
    Mounts:
      /mnt/ocp_pv from dynamic (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-l9mpz (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  dynamic:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  mypvc4
    ReadOnly:   false
  default-token-l9mpz:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-l9mpz
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  14s (x22 over 5m)  default-scheduler  0/2 nodes are available: 1 MaxVolumeCount, 1 NodeUnschedulable.
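To double-check that the limit reached the running process, its environment can be inspected directly (plain /proc tooling; this should print KUBE_MAX_PD_VOLS=3 if the restart picked up the sysconfig change):
# tr '\0' '\n' < /proc/$(pgrep -of atomic-openshift-master-controllers)/environ | grep KUBE_MAX_PD_VOLS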

Comment 8 errata-xmlrpc 2019-02-20 08:46:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0331

