Bug 1668893

Summary: 3.9 Clarification on KUBE_MAX_PD_VOLS for OpenShift/OpenStack Integration
Product: OpenShift Container Platform
Component: Storage
Reporter: Hemant Kumar <hekumar>
Assignee: Hemant Kumar <hekumar>
Status: CLOSED ERRATA
QA Contact: Liang Xia <lxia>
Severity: unspecified
Priority: unspecified
Version: 3.9.0
CC: agogala, aos-bugs, aos-storage-staff, hekumar
Target Milestone: ---
Target Release: 3.9.z
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Clone Of: 1659442
Last Closed: 2019-02-20 08:46:56 UTC
Bug Depends On: 1659442, 1669543, 1669544

Comment 4 Liang Xia 2019-02-12 06:25:21 UTC
QE checked the bug on version v3.9.69 with the steps below;
the number of volumes allowed to attach to a single OpenStack instance is limited to 26.

@Hemant Kumar, could you help confirm whether this is expected?


1. Update the nodes so that only one remains schedulable.
# oc adm manage-node --schedulable xxx
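The remaining nodes can be cordoned so only one stays schedulable; a sketch, where <node-name> is a placeholder for each node to disable:
# oc adm manage-node <node-name> --schedulable=false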

2. Enable the MaxCinderVolumeCount predicate via scheduler.json
# grep -i cinder /etc/origin/master/scheduler.json  -A4 -B4
        {
            "name": "MaxAzureDiskVolumeCount"
        }, 
        {
            "name": "MaxCinderVolumeCount"
        }, 
        {
            "name": "MatchInterPodAffinity"
        }, 
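For context, a minimal sketch of where these entries sit in the scheduler policy file (kube scheduler Policy format; other predicates and the priorities list abbreviated with "..."):
# cat /etc/origin/master/scheduler.json
{
    "apiVersion": "v1",
    "kind": "Policy",
    "predicates": [
        ...
        {
            "name": "MaxCinderVolumeCount"
        },
        ...
    ],
    "priorities": [
        ...
    ]
}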

3. Restart the API and controller services.
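On a 3.9 RPM-based install the restart is typically done with the systemd units; a sketch, with unit names assumed to match the sysconfig file referenced in comment 6:
# systemctl restart atomic-openshift-master-api
# systemctl restart atomic-openshift-master-controllers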

4. Keep creating PVCs and pods (a creation sketch follows).
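A sketch of the kind of loop used: the pod/PVC names match the listing below, the storage class is assumed to be the default Cinder-backed class, and the image and mount path are taken from the pod description further down.
# for i in $(seq -w 1 27); do
  oc create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc$i
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
EOF
  oc create -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: mypod$i
spec:
  containers:
  - name: dynamic
    image: aosqe/hello-openshift
    ports:
    - containerPort: 80
    volumeMounts:
    - name: dynamic
      mountPath: /mnt/pv
  volumes:
  - name: dynamic
    persistentVolumeClaim:
      claimName: mypvc$i
EOF
done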
# oc get pods
NAME      READY     STATUS              RESTARTS   AGE
mypod01   1/1       Running             0          18m
mypod02   1/1       Running             0          18m
mypod03   1/1       Running             0          17m
mypod04   1/1       Running             0          17m
mypod05   1/1       Running             0          17m
mypod06   1/1       Running             0          17m
mypod07   1/1       Running             0          17m
mypod08   1/1       Running             0          17m
mypod09   1/1       Running             0          17m
mypod10   1/1       Running             0          17m
mypod11   1/1       Running             0          17m
mypod12   1/1       Running             0          17m
mypod13   1/1       Running             0          16m
mypod14   1/1       Running             0          16m
mypod15   1/1       Running             0          16m
mypod16   1/1       Running             0          16m
mypod17   1/1       Running             0          16m
mypod18   1/1       Running             0          16m
mypod19   1/1       Running             0          16m
mypod20   1/1       Running             0          16m
mypod21   1/1       Running             0          16m
mypod22   1/1       Running             0          16m
mypod23   1/1       Running             0          15m
mypod24   1/1       Running             0          15m
mypod25   1/1       Running             0          15m
mypod26   0/1       ContainerCreating   0          15m
mypod27   0/1       ContainerCreating   0          15m

# oc describe pod mypod26
Name:         mypod26
Namespace:    bz1668893
Node:         qe-chaoyang-node-registry-router-1/10.0.77.49
Start Time:   Tue, 12 Feb 2019 01:00:18 -0500
Labels:       <none>
Annotations:  openshift.io/scc=anyuid
Status:       Pending
IP:           
Containers:
  dynamic:
    Container ID:   
    Image:          aosqe/hello-openshift
    Image ID:       
    Port:           80/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /mnt/pv from dynamic (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-bs4hl (ro)
Conditions:
  Type           Status
  Initialized    True 
  Ready          False 
  PodScheduled   True 
Volumes:
  dynamic:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  mypvc26
    ReadOnly:   false
  default-token-bs4hl:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-bs4hl
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-role.kubernetes.io/compute=true
Tolerations:     <none>
Events:
  Type     Reason                 Age                From                                         Message
  ----     ------                 ----               ----                                         -------
  Normal   Scheduled              16m                default-scheduler                            Successfully assigned mypod26 to qe-chaoyang-node-registry-router-1
  Normal   SuccessfulMountVolume  16m                kubelet, qe-chaoyang-node-registry-router-1  MountVolume.SetUp succeeded for volume "default-token-bs4hl"
  Warning  FailedAttachVolume     1m (x15 over 16m)  attachdetach-controller                      AttachVolume.Attach failed for volume "pvc-795e2daf-2e8b-11e9-b868-fa163eb7596d" : failed to attach 66ece5e3-34c6-4408-8c95-4f2de0859adc volume to 27947252-a153-431d-9c27-e1bd4bc5ebbc compute: Internal Server Error
  Warning  FailedMount            30s (x7 over 14m)  kubelet, qe-chaoyang-node-registry-router-1  Unable to mount volumes for pod "mypod26_bz1668893(79a63ad0-2e8b-11e9-b868-fa163eb7596d)": timeout expired waiting for volumes to attach/mount for pod "bz1668893"/"mypod26". list of unattached/unmounted volumes=[dynamic]

Comment 6 Liang Xia 2019-02-13 02:26:33 UTC
QE tried again on version v3.9.69 with the steps below.

1. Update the nodes so that only one remains schedulable.
# oc adm manage-node --schedulable xxx
# oc get nodes
NAME                                STATUS                     ROLES     AGE       VERSION
qe-lxia-39-master-etcd-nfs-1        Ready,SchedulingDisabled   master    17m       v1.9.1+a0ce1bc657
qe-lxia-39-node-registry-router-1   Ready                      compute   17m       v1.9.1+a0ce1bc657

2. Enable the MaxCinderVolumeCount predicate via scheduler.json
# grep -i cinder /etc/origin/master/scheduler.json  -A4 -B4
        {
            "name": "MaxAzureDiskVolumeCount"
        }, 
        {
            "name": "MaxCinderVolumeCount"
        }, 
        {
            "name": "MatchInterPodAffinity"
        }, 
3. Set KUBE_MAX_PD_VOLS=3
# grep -i vol /etc/sysconfig/atomic-openshift-master-controllers
KUBE_MAX_PD_VOLS=3
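The variable is set in the controllers sysconfig file; a sketch of how it could be added (value as shown above):
# echo "KUBE_MAX_PD_VOLS=3" >> /etc/sysconfig/atomic-openshift-master-controllers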

4. Restart the API and controller services.

5. Create 4 PVCs/pods.
# oc get pods mypod{1..4}
NAME      READY     STATUS    RESTARTS   AGE
mypod1    1/1       Running   0          7m
mypod2    1/1       Running   0          6m
mypod3    1/1       Running   0          6m
mypod4    0/1       Pending   0          5m

# oc describe pod mypod4
Name:         mypod4
Namespace:    default
Node:         <none>
Labels:       <none>
Annotations:  openshift.io/scc=anyuid
Status:       Pending
IP:           
Containers:
  dynamic:
    Image:        aosqe/hello-openshift
    Port:         80/TCP
    Environment:  <none>
    Mounts:
      /mnt/ocp_pv from dynamic (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-l9mpz (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  dynamic:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  mypvc4
    ReadOnly:   false
  default-token-l9mpz:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-l9mpz
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  14s (x22 over 5m)  default-scheduler  0/2 nodes are available: 1 MaxVolumeCount, 1 NodeUnschedulable.

Comment 8 errata-xmlrpc 2019-02-20 08:46:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0331