Description of problem:

The OSD cluster doesn't respect pod anti-affinity configuration during scheduling. We have 4 sets of pods using pod anti-affinity, each set with its own unique app label:

1. 3 pods w/ mongo - all 3 pods were scheduled on the same infra node
2. 3 pods w/ mysql - all 3 pods were scheduled on the same infra node
3. 3 pods w/ mongo - 2 pods scheduled on one infra node, 1 pod on a 2nd infra node, and 0 pods on the 3rd infra node
4. 3 pods w/ mongo - all 3 pods scheduled on different compute nodes

In all 4 cases, there were 3 or 4 nodes which matched the node selector with plenty of resources available.

Version-Release number of selected component (if applicable):
OSD v3.7.23

How reproducible:
About 50% of the time

Steps to Reproduce:
1. Create a project with the annotation: openshift.io/node-selector: type=infra (see the sketch at the end of this report)
2. Create 3 deployment configs in the project, each with:
   - 1 replica
   - label app=fh-core-mongo
   - podAntiAffinity configured as:

     spec:
       affinity:
         podAntiAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
           - labelSelector:
               matchExpressions:
               - key: app
                 operator: In
                 values:
                 - fh-core-mongo
             topologyKey: kubernetes.io/hostname

3. Use 'oc get pods -n <project> -o wide' to see which node each of the 3 pods is running on.

Actual results:
All 3 pods end up running on the same node, like below:

mongodb-1-1-2bjpb   1/1   Running   0   1d   10.1.6.199   ip-172-31-28-26.eu-west-1.compute.internal
mongodb-2-1-hnvnb   1/1   Running   0   1d   10.1.6.201   ip-172-31-28-26.eu-west-1.compute.internal
mongodb-3-1-6l9rn   1/1   Running   0   1d   10.1.6.203   ip-172-31-28-26.eu-west-1.compute.internal

Expected results:
Each of the 3 pods should be scheduled onto a different node.

Additional info:
The same configuration worked in all 4 sets prior to OSD being upgraded from 3.6 to 3.7. The same configuration also works on multiple standard OpenShift clusters running v3.7.23.
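For reference, step 1 of the reproduction can be done with 'oc adm new-project' roughly as sketched below; the project name is just a placeholder, not the actual project used here.

    # Sketch of step 1: create a project whose pods are pinned to the infra nodes
    # via the project-level node selector ("anti-affinity-test" is a placeholder).
    oc adm new-project anti-affinity-test --node-selector='type=infra'

    # Confirm the annotation was applied to the namespace:
    oc get namespace anti-affinity-test -o yaml | grep node-selector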
Can you provide "oc get pod <pod-name> -o yaml" after the 3 pods have been deployed? Also, how many nodes are in the cluster? Could you provide oc describe for all nodes?
Sure, output of oc get pods: [root@rhm-eng-a-master-71c94 ~]# oc get pods/mongodb-1-1-2bjpb -o yaml apiVersion: v1 kind: Pod metadata: annotations: kubernetes.io/created-by: | {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"rhmap-core","name":"mongodb-1-1","uid":"6857a0f7-4359-11e8-b5c3-0aba80795ecd","apiVersion":"v1","resourceVersion":"5055451"}} openshift.io/deployment-config.latest-version: "1" openshift.io/deployment-config.name: mongodb-1 openshift.io/deployment.name: mongodb-1-1 openshift.io/generated-by: OpenShiftNewApp openshift.io/scc: restricted creationTimestamp: 2018-04-18T22:39:54Z generateName: mongodb-1-1- labels: app: fh-core-mongo deployment: mongodb-1-1 deploymentconfig: mongodb-1 name: mongodb-replica-1 name: mongodb-1-1-2bjpb namespace: rhmap-core ownerReferences: - apiVersion: v1 blockOwnerDeletion: true controller: true kind: ReplicationController name: mongodb-1-1 uid: 6857a0f7-4359-11e8-b5c3-0aba80795ecd resourceVersion: "5055518" selfLink: /api/v1/namespaces/rhmap-core/pods/mongodb-1-1-2bjpb uid: 6a12115d-4359-11e8-b5c3-0aba80795ecd spec: affinity: podAntiAffinity: requiredDuringSchedulingIgnoredDuringExecution: - labelSelector: matchExpressions: - key: app operator: In values: - fh-core-mongo topologyKey: kubernetes.io/hostname containers: - command: - run-mongod-replication env: - name: MONGODB_REPLICA_NAME value: rs0 - name: MONGODB_SERVICE_NAME value: mongodb - name: MONGODB_KEYFILE_VALUE valueFrom: configMapKeyRef: key: mongodb-keyfile-value name: mongodb-keys - name: MONGODB_ADMIN_PASSWORD valueFrom: configMapKeyRef: key: mongodb-admin-password name: mongodb-keys - name: MONGODB_FHAAA_USER valueFrom: configMapKeyRef: key: mongodb-fh-aaa-user name: mongodb-keys - name: MONGODB_FHAAA_PASSWORD valueFrom: configMapKeyRef: key: mongodb-fh-aaa-password name: mongodb-keys - name: MONGODB_FHAAA_DATABASE valueFrom: configMapKeyRef: key: mongodb-fh-aaa-database name: mongodb-keys - name: MONGODB_FHSUPERCORE_USER valueFrom: configMapKeyRef: key: mongodb-fh-supercore-user name: mongodb-keys - name: MONGODB_FHSUPERCORE_PASSWORD valueFrom: configMapKeyRef: key: mongodb-fh-supercore-password name: mongodb-keys - name: MONGODB_FHSUPERCORE_DATABASE valueFrom: configMapKeyRef: key: mongodb-fh-supercore-database name: mongodb-keys - name: MONGODB_FHREPORTING_USER valueFrom: configMapKeyRef: key: mongodb-fh-reporting-user name: mongodb-keys - name: MONGODB_FHREPORTING_PASSWORD valueFrom: configMapKeyRef: key: mongodb-fh-reporting-password name: mongodb-keys - name: MONGODB_FHREPORTING_DATABASE valueFrom: configMapKeyRef: key: mongodb-fh-reporting-database name: mongodb-keys image: rhmap46/mongodb:3.2-36 imagePullPolicy: IfNotPresent livenessProbe: failureThreshold: 2 initialDelaySeconds: 5 periodSeconds: 60 successThreshold: 1 tcpSocket: port: 27017 timeoutSeconds: 5 name: mongodb ports: - containerPort: 27017 protocol: TCP resources: limits: cpu: "1" memory: 1000Mi requests: cpu: 200m memory: 200Mi securityContext: capabilities: drop: - KILL - MKNOD - NET_RAW - SETGID - SETUID privileged: false runAsUser: 1003390000 seLinuxOptions: level: s0:c58,c42 terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /var/lib/mongodb/data name: mongodb-data-volume - mountPath: /var/run/secrets/kubernetes.io/serviceaccount name: default-token-jg9mx readOnly: true dnsPolicy: ClusterFirst imagePullSecrets: - name: default-dockercfg-svrdf nodeName: ip-172-31-28-26.eu-west-1.compute.internal 
nodeSelector: type: infra restartPolicy: Always schedulerName: default-scheduler securityContext: fsGroup: 1003390000 seLinuxOptions: level: s0:c58,c42 serviceAccount: default serviceAccountName: default terminationGracePeriodSeconds: 30 volumes: - name: mongodb-data-volume persistentVolumeClaim: claimName: mongodb-claim-1 - name: default-token-jg9mx secret: defaultMode: 420 secretName: default-token-jg9mx status: conditions: - lastProbeTime: null lastTransitionTime: 2018-04-18T22:39:54Z status: "True" type: Initialized - lastProbeTime: null lastTransitionTime: 2018-04-18T22:40:14Z status: "True" type: Ready - lastProbeTime: null lastTransitionTime: 2018-04-18T22:39:54Z status: "True" type: PodScheduled containerStatuses: - containerID: docker://4688c03e7841d87465c53467e08edc17e5792af936ba223fa8a29e812b0afd39 image: registry.rhm-eng-a.openshift.com:443/rhmap46/mongodb:3.2-36 imageID: docker-pullable://registry.rhm-eng-a.openshift.com:443/rhmap46/mongodb@sha256:bcfd94b74bfb049fc6c5649216d703f15fe22c2caf30121ade844760fdefc601 lastState: {} name: mongodb ready: true restartCount: 0 state: running: startedAt: 2018-04-18T22:40:13Z hostIP: 172.31.28.26 phase: Running podIP: 10.1.6.199 qosClass: Burstable startTime: 2018-04-18T22:39:54Z [root@rhm-eng-a-master-71c94 ~]# oc get pods/mongodb-2-1-hnvnb -o yaml apiVersion: v1 kind: Pod metadata: annotations: kubernetes.io/created-by: | {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"rhmap-core","name":"mongodb-2-1","uid":"7d447891-4359-11e8-b5c3-0aba80795ecd","apiVersion":"v1","resourceVersion":"5055658"}} openshift.io/deployment-config.latest-version: "1" openshift.io/deployment-config.name: mongodb-2 openshift.io/deployment.name: mongodb-2-1 openshift.io/generated-by: OpenShiftNewApp openshift.io/scc: restricted creationTimestamp: 2018-04-18T22:40:29Z generateName: mongodb-2-1- labels: app: fh-core-mongo deployment: mongodb-2-1 deploymentconfig: mongodb-2 name: mongodb-replica-2 name: mongodb-2-1-hnvnb namespace: rhmap-core ownerReferences: - apiVersion: v1 blockOwnerDeletion: true controller: true kind: ReplicationController name: mongodb-2-1 uid: 7d447891-4359-11e8-b5c3-0aba80795ecd resourceVersion: "5055743" selfLink: /api/v1/namespaces/rhmap-core/pods/mongodb-2-1-hnvnb uid: 7f2b1503-4359-11e8-b5c3-0aba80795ecd spec: affinity: podAntiAffinity: requiredDuringSchedulingIgnoredDuringExecution: - labelSelector: matchExpressions: - key: app operator: In values: - fh-core-mongo topologyKey: kubernetes.io/hostname containers: - command: - run-mongod-replication env: - name: MONGODB_REPLICA_NAME value: rs0 - name: MONGODB_SERVICE_NAME value: mongodb - name: MONGODB_KEYFILE_VALUE valueFrom: configMapKeyRef: key: mongodb-keyfile-value name: mongodb-keys - name: MONGODB_ADMIN_PASSWORD valueFrom: configMapKeyRef: key: mongodb-admin-password name: mongodb-keys - name: MONGODB_FHAAA_USER valueFrom: configMapKeyRef: key: mongodb-fh-aaa-user name: mongodb-keys - name: MONGODB_FHAAA_PASSWORD valueFrom: configMapKeyRef: key: mongodb-fh-aaa-password name: mongodb-keys - name: MONGODB_FHAAA_DATABASE valueFrom: configMapKeyRef: key: mongodb-fh-aaa-database name: mongodb-keys - name: MONGODB_FHSUPERCORE_USER valueFrom: configMapKeyRef: key: mongodb-fh-supercore-user name: mongodb-keys - name: MONGODB_FHSUPERCORE_PASSWORD valueFrom: configMapKeyRef: key: mongodb-fh-supercore-password name: mongodb-keys - name: MONGODB_FHSUPERCORE_DATABASE valueFrom: configMapKeyRef: key: mongodb-fh-supercore-database name: 
mongodb-keys - name: MONGODB_FHREPORTING_USER valueFrom: configMapKeyRef: key: mongodb-fh-reporting-user name: mongodb-keys - name: MONGODB_FHREPORTING_PASSWORD valueFrom: configMapKeyRef: key: mongodb-fh-reporting-password name: mongodb-keys - name: MONGODB_FHREPORTING_DATABASE valueFrom: configMapKeyRef: key: mongodb-fh-reporting-database name: mongodb-keys image: rhmap46/mongodb:3.2-36 imagePullPolicy: IfNotPresent livenessProbe: failureThreshold: 2 initialDelaySeconds: 5 periodSeconds: 60 successThreshold: 1 tcpSocket: port: 27017 timeoutSeconds: 5 name: mongodb ports: - containerPort: 27017 protocol: TCP resources: limits: cpu: "1" memory: 1000Mi requests: cpu: 200m memory: 200Mi securityContext: capabilities: drop: - KILL - MKNOD - NET_RAW - SETGID - SETUID privileged: false runAsUser: 1003390000 seLinuxOptions: level: s0:c58,c42 terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /var/lib/mongodb/data name: mongodb-data-volume - mountPath: /var/run/secrets/kubernetes.io/serviceaccount name: default-token-jg9mx readOnly: true dnsPolicy: ClusterFirst imagePullSecrets: - name: default-dockercfg-svrdf nodeName: ip-172-31-28-26.eu-west-1.compute.internal nodeSelector: type: infra restartPolicy: Always schedulerName: default-scheduler securityContext: fsGroup: 1003390000 seLinuxOptions: level: s0:c58,c42 serviceAccount: default serviceAccountName: default terminationGracePeriodSeconds: 30 volumes: - name: mongodb-data-volume persistentVolumeClaim: claimName: mongodb-claim-2 - name: default-token-jg9mx secret: defaultMode: 420 secretName: default-token-jg9mx status: conditions: - lastProbeTime: null lastTransitionTime: 2018-04-18T22:40:29Z status: "True" type: Initialized - lastProbeTime: null lastTransitionTime: 2018-04-18T22:40:49Z status: "True" type: Ready - lastProbeTime: null lastTransitionTime: 2018-04-18T22:40:29Z status: "True" type: PodScheduled containerStatuses: - containerID: docker://f4005d898dbc2b7486eec987f05453626ee658e73b5bbf5944b4a687458ed30d image: registry.rhm-eng-a.openshift.com:443/rhmap46/mongodb:3.2-36 imageID: docker-pullable://registry.rhm-eng-a.openshift.com:443/rhmap46/mongodb@sha256:bcfd94b74bfb049fc6c5649216d703f15fe22c2caf30121ade844760fdefc601 lastState: {} name: mongodb ready: true restartCount: 0 state: running: startedAt: 2018-04-18T22:40:49Z hostIP: 172.31.28.26 phase: Running podIP: 10.1.6.201 qosClass: Burstable startTime: 2018-04-18T22:40:29Z [root@rhm-eng-a-master-71c94 ~]# oc get pods/mongodb-3-1-6l9rn -o yaml apiVersion: v1 kind: Pod metadata: annotations: kubernetes.io/created-by: | {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"rhmap-core","name":"mongodb-3-1","uid":"92d84cb1-4359-11e8-b5c3-0aba80795ecd","apiVersion":"v1","resourceVersion":"5055825"}} openshift.io/deployment-config.latest-version: "1" openshift.io/deployment-config.name: mongodb-3 openshift.io/deployment.name: mongodb-3-1 openshift.io/generated-by: OpenShiftNewApp openshift.io/scc: restricted creationTimestamp: 2018-04-18T22:41:05Z generateName: mongodb-3-1- labels: app: fh-core-mongo deployment: mongodb-3-1 deploymentconfig: mongodb-3 name: mongodb-replica-3 name: mongodb-3-1-6l9rn namespace: rhmap-core ownerReferences: - apiVersion: v1 blockOwnerDeletion: true controller: true kind: ReplicationController name: mongodb-3-1 uid: 92d84cb1-4359-11e8-b5c3-0aba80795ecd resourceVersion: "5055887" selfLink: /api/v1/namespaces/rhmap-core/pods/mongodb-3-1-6l9rn uid: 
947c17b1-4359-11e8-b5c3-0aba80795ecd spec: affinity: podAntiAffinity: requiredDuringSchedulingIgnoredDuringExecution: - labelSelector: matchExpressions: - key: app operator: In values: - fh-core-mongo topologyKey: kubernetes.io/hostname containers: - command: - run-mongod-replication env: - name: MONGODB_REPLICA_NAME value: rs0 - name: MONGODB_SERVICE_NAME value: mongodb - name: MONGODB_KEYFILE_VALUE valueFrom: configMapKeyRef: key: mongodb-keyfile-value name: mongodb-keys - name: MONGODB_ADMIN_PASSWORD valueFrom: configMapKeyRef: key: mongodb-admin-password name: mongodb-keys - name: MONGODB_FHAAA_USER valueFrom: configMapKeyRef: key: mongodb-fh-aaa-user name: mongodb-keys - name: MONGODB_FHAAA_PASSWORD valueFrom: configMapKeyRef: key: mongodb-fh-aaa-password name: mongodb-keys - name: MONGODB_FHAAA_DATABASE valueFrom: configMapKeyRef: key: mongodb-fh-aaa-database name: mongodb-keys - name: MONGODB_FHSUPERCORE_USER valueFrom: configMapKeyRef: key: mongodb-fh-supercore-user name: mongodb-keys - name: MONGODB_FHSUPERCORE_PASSWORD valueFrom: configMapKeyRef: key: mongodb-fh-supercore-password name: mongodb-keys - name: MONGODB_FHSUPERCORE_DATABASE valueFrom: configMapKeyRef: key: mongodb-fh-supercore-database name: mongodb-keys - name: MONGODB_FHREPORTING_USER valueFrom: configMapKeyRef: key: mongodb-fh-reporting-user name: mongodb-keys - name: MONGODB_FHREPORTING_PASSWORD valueFrom: configMapKeyRef: key: mongodb-fh-reporting-password name: mongodb-keys - name: MONGODB_FHREPORTING_DATABASE valueFrom: configMapKeyRef: key: mongodb-fh-reporting-database name: mongodb-keys image: rhmap46/mongodb:3.2-36 imagePullPolicy: IfNotPresent livenessProbe: failureThreshold: 2 initialDelaySeconds: 5 periodSeconds: 60 successThreshold: 1 tcpSocket: port: 27017 timeoutSeconds: 5 name: mongodb ports: - containerPort: 27017 protocol: TCP resources: limits: cpu: "1" memory: 1000Mi requests: cpu: 200m memory: 200Mi securityContext: capabilities: drop: - KILL - MKNOD - NET_RAW - SETGID - SETUID privileged: false runAsUser: 1003390000 seLinuxOptions: level: s0:c58,c42 terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /var/lib/mongodb/data name: mongodb-data-volume - mountPath: /var/run/secrets/kubernetes.io/serviceaccount name: default-token-jg9mx readOnly: true dnsPolicy: ClusterFirst imagePullSecrets: - name: default-dockercfg-svrdf nodeName: ip-172-31-28-26.eu-west-1.compute.internal nodeSelector: type: infra restartPolicy: Always schedulerName: default-scheduler securityContext: fsGroup: 1003390000 seLinuxOptions: level: s0:c58,c42 serviceAccount: default serviceAccountName: default terminationGracePeriodSeconds: 30 volumes: - name: mongodb-data-volume persistentVolumeClaim: claimName: mongodb-claim-3 - name: default-token-jg9mx secret: defaultMode: 420 secretName: default-token-jg9mx status: conditions: - lastProbeTime: null lastTransitionTime: 2018-04-18T22:41:05Z status: "True" type: Initialized - lastProbeTime: null lastTransitionTime: 2018-04-18T22:41:24Z status: "True" type: Ready - lastProbeTime: null lastTransitionTime: 2018-04-18T22:41:05Z status: "True" type: PodScheduled containerStatuses: - containerID: docker://06a4bd72fe8d2d7bc2116ee7e0e83a3b51a6b5f26d37bb0ae7ff7dd65e9deb34 image: registry.rhm-eng-a.openshift.com:443/rhmap46/mongodb:3.2-36 imageID: docker-pullable://registry.rhm-eng-a.openshift.com:443/rhmap46/mongodb@sha256:bcfd94b74bfb049fc6c5649216d703f15fe22c2caf30121ade844760fdefc601 lastState: {} name: mongodb ready: true restartCount: 0 
state: running: startedAt: 2018-04-18T22:41:24Z hostIP: 172.31.28.26 phase: Running podIP: 10.1.6.203 qosClass: Burstable startTime: 2018-04-18T22:41:05Z This is a standard build of OSD, so 3 masters, 3 infra, 4 compute. Since we are providing a hosted service for RHMAP, this set of pods runs on the 3 infra nodes. Output of oc describe for all nodes: Name: ip-172-31-17-34.eu-west-1.compute.internal Role: Labels: beta.kubernetes.io/arch=amd64 beta.kubernetes.io/instance-type=m4.xlarge beta.kubernetes.io/os=linux failure-domain.beta.kubernetes.io/region=eu-west-1 failure-domain.beta.kubernetes.io/zone=eu-west-1a hostname=rhm-eng-a-node-compute-b1a7d kubernetes.io/hostname=ip-172-31-17-34.eu-west-1.compute.internal logging-infra-fluentd=true ops_node=new region=eu-west-1 type=compute Annotations: volumes.kubernetes.io/controller-managed-attach-detach=true Taints: <none> CreationTimestamp: Wed, 04 Apr 2018 03:52:09 -0400 Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- DiskPressure False Mon, 23 Apr 2018 14:42:19 -0400 Wed, 04 Apr 2018 06:05:21 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure MemoryPressure False Mon, 23 Apr 2018 14:42:19 -0400 Wed, 04 Apr 2018 06:05:21 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available OutOfDisk False Mon, 23 Apr 2018 14:42:19 -0400 Wed, 04 Apr 2018 06:05:21 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available Ready True Mon, 23 Apr 2018 14:42:19 -0400 Wed, 04 Apr 2018 06:05:31 -0400 KubeletReady kubelet is posting ready status Addresses: InternalIP: 172.31.17.34 ExternalIP: 34.244.185.160 InternalDNS: ip-172-31-17-34.eu-west-1.compute.internal ExternalDNS: ec2-34-244-185-160.eu-west-1.compute.amazonaws.com Hostname: ip-172-31-17-34.eu-west-1.compute.internal Capacity: cpu: 4 memory: 16266532Ki pods: 40 Allocatable: cpu: 3 memory: 15115556Ki pods: 40 System Info: Machine ID: 0307f3889c4e4ab49e0d409c90f6062e System UUID: EC28668C-8A2E-FEFF-4DA5-D90B78190807 Boot ID: d0c375c4-ac0d-4b94-8212-44d4897ea2ad Kernel Version: 3.10.0-693.21.1.el7.x86_64 OS Image: Employee SKU Operating System: linux Architecture: amd64 Container Runtime Version: docker://1.12.6 Kubelet Version: v1.7.6+a08f5eeb62 Kube-Proxy Version: v1.7.6+a08f5eeb62 ExternalID: i-07772bbe384b0920a Non-terminated Pods: (10 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits --------- ---- ------------ ---------- --------------- ------------- logging logging-fluentd-vsjsq 100m (3%) 0 (0%) 512Mi (3%) 512Mi (3%) rhmap-rhmap-ci-ocp4-e redis-1524226262828gwkn-1-j9mql 100m (3%) 500m (16%) 100Mi (0%) 500Mi (3%) rhmap-rhmap-dev nodejs-cloudappdevmp2s-1-fsfpc 100m (3%) 500m (16%) 90Mi (0%) 250Mi (1%) rhmap-rhmap-dev nodejs-cloudappdevpw6i-1-sz82q 100m (3%) 500m (16%) 90Mi (0%) 250Mi (1%) rhmap-rhmap-dev nodejs-testingcloudappdev3jyn-1-qsmk5 100m (3%) 500m (16%) 90Mi (0%) 250Mi (1%) rhmap-rhmap-dev redis-1524148095926pw6i-1-jp44m 100m (3%) 500m (16%) 100Mi (0%) 500Mi (3%) rhmap-rhmap-dev redis-1524226029129xhpb-1-w4j6c 100m (3%) 500m (16%) 100Mi (0%) 500Mi (3%) rhmap-rhmap-dev redis-15242266930537pvi-1-56mkw 100m (3%) 500m (16%) 100Mi (0%) 500Mi (3%) rhmap-rhmap-dev redis-1524226968390ihdm-1-db4j6 100m (3%) 500m (16%) 100Mi (0%) 500Mi (3%) rhmap-rhmap-dev redis-1524236984850rmof-1-spmxm 100m (3%) 500m (16%) 100Mi (0%) 500Mi (3%) Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) 
CPU Requests CPU Limits Memory Requests Memory Limits ------------ ---------- --------------- ------------- 1 (33%) 4500m (150%) 1382Mi (9%) 4262Mi (28%) Events: <none> Name: ip-172-31-21-124.eu-west-1.compute.internal Role: Labels: beta.kubernetes.io/arch=amd64 beta.kubernetes.io/instance-type=m4.2xlarge beta.kubernetes.io/os=linux failure-domain.beta.kubernetes.io/region=eu-west-1 failure-domain.beta.kubernetes.io/zone=eu-west-1a hostname=rhm-eng-a-node-infra-7205d kubernetes.io/hostname=ip-172-31-21-124.eu-west-1.compute.internal logging-infra-fluentd=true ops_node=new region=eu-west-1 type=infra Annotations: volumes.kubernetes.io/controller-managed-attach-detach=true Taints: <none> CreationTimestamp: Tue, 13 Mar 2018 06:38:51 -0400 Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- DiskPressure False Mon, 23 Apr 2018 14:42:24 -0400 Wed, 04 Apr 2018 06:08:52 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure MemoryPressure False Mon, 23 Apr 2018 14:42:24 -0400 Wed, 04 Apr 2018 06:08:52 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available OutOfDisk False Mon, 23 Apr 2018 14:42:24 -0400 Wed, 04 Apr 2018 06:08:52 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available Ready True Mon, 23 Apr 2018 14:42:24 -0400 Wed, 04 Apr 2018 06:09:02 -0400 KubeletReady kubelet is posting ready status Addresses: InternalIP: 172.31.21.124 ExternalIP: 34.245.191.60 InternalDNS: ip-172-31-21-124.eu-west-1.compute.internal ExternalDNS: ec2-34-245-191-60.eu-west-1.compute.amazonaws.com Hostname: ip-172-31-21-124.eu-west-1.compute.internal Capacity: cpu: 8 memory: 32780604Ki pods: 80 Allocatable: cpu: 7 memory: 31629628Ki pods: 80 System Info: Machine ID: d52c597d0f1a42aeb01b5a7d71e63f24 System UUID: EC22BD3C-F2F3-CB12-D6B0-022D9E23F985 Boot ID: fd7333d0-9ed2-43be-99c8-b10e900da4b9 Kernel Version: 3.10.0-693.11.6.el7.x86_64 OS Image: Red Hat Enterprise Linux Operating System: linux Architecture: amd64 Container Runtime Version: docker://1.12.6 Kubelet Version: v1.7.6+a08f5eeb62 Kube-Proxy Version: v1.7.6+a08f5eeb62 ExternalID: i-008d6ef81fc43b760 Non-terminated Pods: (19 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits --------- ---- ------------ ---------- --------------- ------------- default docker-registry-26-87s9f 0 (0%) 0 (0%) 1G (3%) 2G (6%) default router-674-cvll2 100m (1%) 0 (0%) 256Mi (0%) 0 (0%) logging logging-es-data-master-x5uwa0st-2-qdj5r 475m (6%) 0 (0%) 12544Mi (40%) 12544Mi (40%) logging logging-fluentd-md7sk 100m (1%) 0 (0%) 512Mi (1%) 512Mi (1%) logging logging-kibana-7-rsjlj 50m (0%) 0 (0%) 1280Mi (4%) 1280Mi (4%) openshift-infra hawkular-cassandra-1-q2vhc 375m (5%) 0 (0%) 4Gi (13%) 4Gi (13%) openshift-infra hawkular-metrics-lkz4p 100m (1%) 0 (0%) 3Gi (9%) 3Gi (9%) openshift-infra heapster-9hxff 100m (1%) 0 (0%) 3840Mi (12%) 3840Mi (12%) rhmap-3-node-mbaas fh-mbaas-2-8q6zr 200m (2%) 800m (11%) 200Mi (0%) 800Mi (2%) rhmap-3-node-mbaas fh-messaging-1-wzdrk 200m (2%) 400m (5%) 200Mi (0%) 400Mi (1%) rhmap-3-node-mbaas fh-metrics-1-rmmjd 200m (2%) 400m (5%) 200Mi (0%) 400Mi (1%) rhmap-3-node-mbaas fh-statsd-1-2lwtn 200m (2%) 400m (5%) 200Mi (0%) 400Mi (1%) rhmap-3-node-mbaas mongodb-3-1-2zwtl 200m (2%) 1 (14%) 200Mi (0%) 1000Mi (3%) rhmap-core fh-aaa-1-957q2 20m (0%) 800m (11%) 100Mi (0%) 800Mi (2%) rhmap-core fh-appstore-1-fk6sj 1m (0%) 800m (11%) 50Mi (0%) 800Mi (2%) rhmap-core fh-messaging-1-m97ms 200m (2%) 400m (5%) 200Mi (0%) 
400Mi (1%) rhmap-core fh-metrics-1-2rhdj 200m (2%) 400m (5%) 200Mi (0%) 400Mi (1%) rhmap-core fh-scm-1-n4kdp 12m (0%) 800m (11%) 70Mi (0%) 800Mi (2%) rhmap-core fh-supercore-1-m85rx 20m (0%) 800m (11%) 200Mi (0%) 800Mi (2%) Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) CPU Requests CPU Limits Memory Requests Memory Limits ------------ ---------- --------------- ------------- 2753m (39%) 7 (100%) 29751953920 (91%) 35915142144 (110%) Events: <none> Name: ip-172-31-23-102.eu-west-1.compute.internal Role: Labels: beta.kubernetes.io/arch=amd64 beta.kubernetes.io/instance-type=m4.2xlarge beta.kubernetes.io/os=linux failure-domain.beta.kubernetes.io/region=eu-west-1 failure-domain.beta.kubernetes.io/zone=eu-west-1a hostname=rhm-eng-a-node-infra-4117d kubernetes.io/hostname=ip-172-31-23-102.eu-west-1.compute.internal logging-infra-fluentd=true ops_node=new region=eu-west-1 type=infra Annotations: volumes.kubernetes.io/controller-managed-attach-detach=true Taints: <none> CreationTimestamp: Tue, 13 Mar 2018 06:33:52 -0400 Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- DiskPressure False Mon, 23 Apr 2018 14:42:19 -0400 Wed, 04 Apr 2018 05:36:22 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure MemoryPressure False Mon, 23 Apr 2018 14:42:19 -0400 Wed, 04 Apr 2018 05:36:22 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available OutOfDisk False Mon, 23 Apr 2018 14:42:19 -0400 Wed, 04 Apr 2018 05:36:22 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available Ready True Mon, 23 Apr 2018 14:42:19 -0400 Wed, 04 Apr 2018 05:36:32 -0400 KubeletReady kubelet is posting ready status Addresses: InternalIP: 172.31.23.102 ExternalIP: 34.243.193.29 InternalDNS: ip-172-31-23-102.eu-west-1.compute.internal ExternalDNS: ec2-34-243-193-29.eu-west-1.compute.amazonaws.com Hostname: ip-172-31-23-102.eu-west-1.compute.internal Capacity: cpu: 8 memory: 32780604Ki pods: 80 Allocatable: cpu: 7 memory: 31629628Ki pods: 80 System Info: Machine ID: d52c597d0f1a42aeb01b5a7d71e63f24 System UUID: EC2742D1-50F5-B783-1E57-B78585E70FD2 Boot ID: d9ba8d84-5648-4232-bde1-cb5014d5311b Kernel Version: 3.10.0-693.11.6.el7.x86_64 OS Image: Red Hat Enterprise Linux Operating System: linux Architecture: amd64 Container Runtime Version: docker://1.12.6 Kubelet Version: v1.7.6+a08f5eeb62 Kube-Proxy Version: v1.7.6+a08f5eeb62 ExternalID: i-07feb12a94b083610 Non-terminated Pods: (17 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits --------- ---- ------------ ---------- --------------- ------------- default docker-registry-26-b4gx7 0 (0%) 0 (0%) 1G (3%) 2G (6%) default docker-registry-26-gn9vs 0 (0%) 0 (0%) 1G (3%) 2G (6%) default oso-rhel7-zagg-web-2-6cz9c 500m (7%) 1 (14%) 1536Mi (4%) 1536Mi (4%) default oso-rhel7-zagg-web-2-x8lbg 500m (7%) 1 (14%) 1536Mi (4%) 1536Mi (4%) default router-674-2682r 100m (1%) 0 (0%) 256Mi (0%) 0 (0%) logging logging-curator-6-rkwrn 25m (0%) 0 (0%) 512Mi (1%) 512Mi (1%) logging logging-es-lorgc43d-9-gdm48 475m (6%) 0 (0%) 12544Mi (40%) 12544Mi (40%) logging logging-fluentd-jcmwl 100m (1%) 0 (0%) 512Mi (1%) 512Mi (1%) openshift-infra hawkular-cassandra-2-pcfgg 375m (5%) 0 (0%) 4Gi (13%) 4Gi (13%) rhmap-3-node-mbaas fh-mbaas-2-fcp9c 200m (2%) 800m (11%) 200Mi (0%) 800Mi (2%) rhmap-3-node-mbaas fh-messaging-1-5nw95 200m (2%) 400m (5%) 200Mi (0%) 400Mi (1%) rhmap-3-node-mbaas fh-metrics-1-v6sgd 200m (2%) 400m (5%) 200Mi 
(0%) 400Mi (1%) rhmap-core fh-metrics-1-dhc6d 200m (2%) 400m (5%) 200Mi (0%) 400Mi (1%) rhmap-core fh-ngui-1-nlhzk 10m (0%) 800m (11%) 250Mi (0%) 800Mi (2%) rhmap-core gitlab-shell-1-2g28x 20m (0%) 1600m (22%) 100Mi (0%) 1600Mi (5%) rhmap-core millicore-1-6ppr2 1011m (14%) 3600m (51%) 1560Mi (5%) 5100Mi (16%) rhmap-core redis-1-qknsq 100m (1%) 500m (7%) 100Mi (0%) 500Mi (1%) Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) CPU Requests CPU Limits Memory Requests Memory Limits ------------ ---------- --------------- ------------- 4016m (57%) 10500m (150%) 26958205952 (83%) 36229031936 (111%) Events: <none> Name: ip-172-31-23-59.eu-west-1.compute.internal Role: Labels: beta.kubernetes.io/arch=amd64 beta.kubernetes.io/instance-type=m4.xlarge beta.kubernetes.io/os=linux failure-domain.beta.kubernetes.io/region=eu-west-1 failure-domain.beta.kubernetes.io/zone=eu-west-1a hostname=rhm-eng-a-node-compute-4f00c kubernetes.io/hostname=ip-172-31-23-59.eu-west-1.compute.internal logging-infra-fluentd=true ops_node=new region=eu-west-1 type=compute Annotations: volumes.kubernetes.io/controller-managed-attach-detach=true Taints: <none> CreationTimestamp: Wed, 04 Apr 2018 03:52:08 -0400 Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- DiskPressure False Mon, 23 Apr 2018 14:42:15 -0400 Wed, 04 Apr 2018 05:58:06 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure MemoryPressure False Mon, 23 Apr 2018 14:42:15 -0400 Wed, 04 Apr 2018 05:58:06 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available OutOfDisk False Mon, 23 Apr 2018 14:42:15 -0400 Wed, 04 Apr 2018 05:58:06 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available Ready True Mon, 23 Apr 2018 14:42:15 -0400 Wed, 04 Apr 2018 05:58:16 -0400 KubeletReady kubelet is posting ready status Addresses: InternalIP: 172.31.23.59 ExternalIP: 34.243.136.167 InternalDNS: ip-172-31-23-59.eu-west-1.compute.internal ExternalDNS: ec2-34-243-136-167.eu-west-1.compute.amazonaws.com Hostname: ip-172-31-23-59.eu-west-1.compute.internal Capacity: cpu: 4 memory: 16266532Ki pods: 40 Allocatable: cpu: 3 memory: 15115556Ki pods: 40 System Info: Machine ID: 0307f3889c4e4ab49e0d409c90f6062e System UUID: EC2E9508-C920-94C1-C118-EAE69C4E0835 Boot ID: f9c6af19-c987-43a6-9ff6-45db7b89997e Kernel Version: 3.10.0-693.21.1.el7.x86_64 OS Image: Employee SKU Operating System: linux Architecture: amd64 Container Runtime Version: docker://1.12.6 Kubelet Version: v1.7.6+a08f5eeb62 Kube-Proxy Version: v1.7.6+a08f5eeb62 ExternalID: i-0d7ff2ea60614c42f Non-terminated Pods: (6 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits --------- ---- ------------ ---------- --------------- ------------- logging logging-fluentd-k62js 100m (3%) 0 (0%) 512Mi (3%) 512Mi (3%) ops-health-monitoring pull-04051430z-tv-1-rznk7 0 (0%) 0 (0%) 0 (0%) 0 (0%) rhmap-rhmap-dev nodejs-appart15242369785devrmof-3-cmjs5 100m (3%) 500m (16%) 90Mi (0%) 250Mi (1%) rhmap-rhmap-dev redis-15242271345913jyn-1-v5jgf 100m (3%) 500m (16%) 100Mi (0%) 500Mi (3%) rhmap-rhmap-dev redis-15242416697974hay-1-r9dbb 100m (3%) 500m (16%) 100Mi (0%) 500Mi (3%) rhmap-user-data mongodb-2-1-9pt57 200m (6%) 1 (33%) 200Mi (1%) 1000Mi (6%) Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) 
CPU Requests CPU Limits Memory Requests Memory Limits ------------ ---------- --------------- ------------- 600m (20%) 2500m (83%) 1002Mi (6%) 2762Mi (18%) Events: <none> Name: ip-172-31-27-184.eu-west-1.compute.internal Role: Labels: beta.kubernetes.io/arch=amd64 beta.kubernetes.io/instance-type=m4.xlarge beta.kubernetes.io/os=linux failure-domain.beta.kubernetes.io/region=eu-west-1 failure-domain.beta.kubernetes.io/zone=eu-west-1a hostname=rhm-eng-a-node-compute-61e52 kubernetes.io/hostname=ip-172-31-27-184.eu-west-1.compute.internal logging-infra-fluentd=true ops_node=new region=eu-west-1 type=compute Annotations: volumes.kubernetes.io/controller-managed-attach-detach=true Taints: <none> CreationTimestamp: Wed, 04 Apr 2018 03:52:10 -0400 Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- DiskPressure False Mon, 23 Apr 2018 14:42:20 -0400 Fri, 13 Apr 2018 04:49:01 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure MemoryPressure False Mon, 23 Apr 2018 14:42:20 -0400 Fri, 13 Apr 2018 04:49:01 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available OutOfDisk False Mon, 23 Apr 2018 14:42:20 -0400 Fri, 13 Apr 2018 04:49:01 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available Ready True Mon, 23 Apr 2018 14:42:20 -0400 Fri, 13 Apr 2018 04:49:11 -0400 KubeletReady kubelet is posting ready status Addresses: InternalIP: 172.31.27.184 ExternalIP: 52.48.69.46 InternalDNS: ip-172-31-27-184.eu-west-1.compute.internal ExternalDNS: ec2-52-48-69-46.eu-west-1.compute.amazonaws.com Hostname: ip-172-31-27-184.eu-west-1.compute.internal Capacity: cpu: 4 memory: 16266532Ki pods: 40 Allocatable: cpu: 3 memory: 15115556Ki pods: 40 System Info: Machine ID: 0307f3889c4e4ab49e0d409c90f6062e System UUID: EC2E55DB-A652-6E0E-0DB5-949F3EA610A8 Boot ID: 4636d560-1bbc-4f0b-8cd7-9b874eceffa3 Kernel Version: 3.10.0-693.21.1.el7.x86_64 OS Image: Employee SKU Operating System: linux Architecture: amd64 Container Runtime Version: docker://1.12.6 Kubelet Version: v1.7.6+a08f5eeb62 Kube-Proxy Version: v1.7.6+a08f5eeb62 ExternalID: i-0d0d55ad3d3bb0e1d Non-terminated Pods: (9 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits --------- ---- ------------ ---------- --------------- ------------- logging logging-fluentd-gwrx2 100m (3%) 0 (0%) 512Mi (3%) 512Mi (3%) rhmap-rhmap-ci-ocp4-e nodejs-testingcloudappciocgwkn-1-bz76w 100m (3%) 500m (16%) 90Mi (0%) 250Mi (1%) rhmap-rhmap-dev nodejs-ciocp4serviceeditdevihdm-1-hmx2k 100m (3%) 500m (16%) 90Mi (0%) 250Mi (1%) rhmap-rhmap-dev nodejs-cloudappdev4hay-2-bx9hj 100m (3%) 500m (16%) 90Mi (0%) 250Mi (1%) rhmap-rhmap-dev nodejs-cloudappdevxhpb-1-4nn4q 100m (3%) 500m (16%) 90Mi (0%) 250Mi (1%) rhmap-rhmap-dev nodejs-testingcloudappdev7pvi-1-fwqt9 100m (3%) 500m (16%) 90Mi (0%) 250Mi (1%) rhmap-rhmap-dev redis-1524226482906mp2s-1-54z6r 100m (3%) 500m (16%) 100Mi (0%) 500Mi (3%) rhmap-rhmap-dev redis-1524476222268fsjm-1-r47rq 100m (3%) 500m (16%) 100Mi (0%) 500Mi (3%) rhmap-user-data mongodb-1-1-bqcvs 200m (6%) 1 (33%) 200Mi (1%) 1000Mi (6%) Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) 
CPU Requests CPU Limits Memory Requests Memory Limits ------------ ---------- --------------- ------------- 1 (33%) 4500m (150%) 1362Mi (9%) 3762Mi (25%) Events: <none> Name: ip-172-31-28-26.eu-west-1.compute.internal Role: Labels: beta.kubernetes.io/arch=amd64 beta.kubernetes.io/instance-type=m4.2xlarge beta.kubernetes.io/os=linux failure-domain.beta.kubernetes.io/region=eu-west-1 failure-domain.beta.kubernetes.io/zone=eu-west-1a hostname=rhm-eng-a-node-infra-4c268 kubernetes.io/hostname=ip-172-31-28-26.eu-west-1.compute.internal logging-infra-fluentd=true ops_node=new region=eu-west-1 type=infra Annotations: volumes.kubernetes.io/controller-managed-attach-detach=true Taints: <none> CreationTimestamp: Tue, 13 Mar 2018 06:28:24 -0400 Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- DiskPressure False Mon, 23 Apr 2018 14:42:20 -0400 Fri, 13 Apr 2018 04:50:17 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure MemoryPressure False Mon, 23 Apr 2018 14:42:20 -0400 Fri, 13 Apr 2018 04:50:17 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available OutOfDisk False Mon, 23 Apr 2018 14:42:20 -0400 Fri, 13 Apr 2018 04:50:17 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available Ready True Mon, 23 Apr 2018 14:42:20 -0400 Fri, 13 Apr 2018 04:50:17 -0400 KubeletReady kubelet is posting ready status Addresses: InternalIP: 172.31.28.26 ExternalIP: 34.245.57.92 InternalDNS: ip-172-31-28-26.eu-west-1.compute.internal ExternalDNS: ec2-34-245-57-92.eu-west-1.compute.amazonaws.com Hostname: ip-172-31-28-26.eu-west-1.compute.internal Capacity: cpu: 8 memory: 32780604Ki pods: 80 Allocatable: cpu: 7 memory: 31629628Ki pods: 80 System Info: Machine ID: d52c597d0f1a42aeb01b5a7d71e63f24 System UUID: EC204444-95C3-B208-11A0-224CD7735A9B Boot ID: 393988dc-93af-4cb1-8da6-31e8140d8695 Kernel Version: 3.10.0-693.11.6.el7.x86_64 OS Image: Red Hat Enterprise Linux Operating System: linux Architecture: amd64 Container Runtime Version: docker://1.12.6 Kubelet Version: v1.7.6+a08f5eeb62 Kube-Proxy Version: v1.7.6+a08f5eeb62 ExternalID: i-021d17d2f4724546b Non-terminated Pods: (23 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits --------- ---- ------------ ---------- --------------- ------------- logging logging-es-j7sg158e-9-qd6d6 475m (6%) 0 (0%) 12544Mi (40%) 12544Mi (40%) logging logging-fluentd-hnmwl 100m (1%) 0 (0%) 512Mi (1%) 512Mi (1%) openshift-infra hawkular-cassandra-3-7tb5b 375m (5%) 0 (0%) 4Gi (13%) 4Gi (13%) rhmap-3-node-mbaas fh-mbaas-2-cwphc 200m (2%) 800m (11%) 200Mi (0%) 800Mi (2%) rhmap-3-node-mbaas fh-messaging-1-8jt4t 200m (2%) 400m (5%) 200Mi (0%) 400Mi (1%) rhmap-3-node-mbaas fh-metrics-1-4qzfb 200m (2%) 400m (5%) 200Mi (0%) 400Mi (1%) rhmap-3-node-mbaas mongodb-1-1-dghrn 200m (2%) 1 (14%) 200Mi (0%) 1000Mi (3%) rhmap-3-node-mbaas mongodb-2-1-fz8r7 200m (2%) 1 (14%) 200Mi (0%) 1000Mi (3%) rhmap-3-node-mbaas nagios-1-xsj9d 200m (2%) 800m (11%) 200Mi (0%) 800Mi (2%) rhmap-core fh-aaa-1-8vnkl 20m (0%) 800m (11%) 100Mi (0%) 800Mi (2%) rhmap-core fh-messaging-1-6wr4j 200m (2%) 400m (5%) 200Mi (0%) 400Mi (1%) rhmap-core fh-ngui-1-t5hb2 10m (0%) 800m (11%) 250Mi (0%) 800Mi (2%) rhmap-core fh-supercore-1-99wzm 20m (0%) 800m (11%) 200Mi (0%) 800Mi (2%) rhmap-core memcached-1-7shkf 10m (0%) 800m (11%) 30Mi (0%) 500M (1%) rhmap-core millicore-1-xc6nx 1011m (14%) 3600m (51%) 1560Mi (5%) 5100Mi (16%) rhmap-core mongodb-1-1-2bjpb 200m (2%) 1 (14%) 200Mi 
(0%) 1000Mi (3%) rhmap-core mongodb-2-1-hnvnb 200m (2%) 1 (14%) 200Mi (0%) 1000Mi (3%) rhmap-core mongodb-3-1-6l9rn 200m (2%) 1 (14%) 200Mi (0%) 1000Mi (3%) rhmap-core mysql-1-pqj7h 100m (1%) 3200m (45%) 700Mi (2%) 1Gi (3%) rhmap-core mysql-2-1-mnftk 100m (1%) 3200m (45%) 700Mi (2%) 1Gi (3%) rhmap-core mysql-3-1-7hj8l 100m (1%) 3200m (45%) 700Mi (2%) 1Gi (3%) rhmap-core nagios-1-x5dg4 200m (2%) 800m (11%) 200Mi (0%) 800Mi (2%) rhmap-core ups-1-nbwmq 400m (5%) 2 (28%) 900Mi (2%) 5000Mi (16%) Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) CPU Requests CPU Limits Memory Requests Memory Limits ------------ ---------- --------------- ------------- 4921m (70%) 27 (385%) 24492Mi (79%) 43831354624 (135%) Events: <none> Name: ip-172-31-29-232.eu-west-1.compute.internal Role: Labels: hostname=rhm-eng-a-master-03f3e kubernetes.io/hostname=ip-172-31-29-232.eu-west-1.compute.internal region=eu-west-1 type=master Annotations: volumes.kubernetes.io/controller-managed-attach-detach=true Taints: <none> CreationTimestamp: Wed, 02 Mar 2016 15:56:36 -0500 Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- Ready True Mon, 23 Apr 2018 14:42:24 -0400 Thu, 19 Apr 2018 21:11:36 -0400 KubeletReady kubelet is posting ready status OutOfDisk False Mon, 23 Apr 2018 14:42:24 -0400 Wed, 04 Apr 2018 05:08:37 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available MemoryPressure False Mon, 23 Apr 2018 14:42:24 -0400 Wed, 04 Apr 2018 05:08:37 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available DiskPressure False Mon, 23 Apr 2018 14:42:24 -0400 Mon, 23 Apr 2018 03:40:05 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure Addresses: InternalIP: 172.31.29.232 ExternalIP: 52.48.129.40 InternalDNS: ip-172-31-29-232.eu-west-1.compute.internal ExternalDNS: ec2-52-48-129-40.eu-west-1.compute.amazonaws.com Hostname: ip-172-31-29-232.eu-west-1.compute.internal Capacity: alpha.kubernetes.io/nvidia-gpu: 0 cpu: 4 memory: 16266564Ki pods: 40 Allocatable: alpha.kubernetes.io/nvidia-gpu: 0 cpu: 3 memory: 15115588Ki pods: 40 System Info: Machine ID: f9370ed252a14f73b014c1301a9b6d1b System UUID: EC2F7CF1-76C0-6A6C-EE62-0885492B3414 Boot ID: 392a5b53-3894-4d50-b566-de75d5409be5 Kernel Version: 3.10.0-693.11.6.el7.x86_64 OS Image: Unknown Operating System: linux Architecture: amd64 Container Runtime Version: docker://1.12.6 Kubelet Version: v1.7.6+a08f5eeb62 Kube-Proxy Version: v1.7.6+a08f5eeb62 ExternalID: i-c5816049 Non-terminated Pods: (0 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits --------- ---- ------------ ---------- --------------- ------------- Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) 
CPU Requests CPU Limits Memory Requests Memory Limits ------------ ---------- --------------- ------------- 0 (0%) 0 (0%) 0 (0%) 0 (0%) Events: FirstSeen LastSeen Count From SubObjectPath Type Reason Message --------- -------- ----- ---- ------------- -------- ------ ------- 41d 28d 2 kubelet, ip-172-31-29-232.eu-west-1.compute.internal Normal NodeHasSufficientDisk Node ip-172-31-29-232.eu-west-1.compute.internal status is now: NodeHasSufficientDisk 41d 28d 2 kubelet, ip-172-31-29-232.eu-west-1.compute.internal Normal NodeHasSufficientMemory Node ip-172-31-29-232.eu-west-1.compute.internal status is now: NodeHasSufficientMemory 41d 28d 2 kubelet, ip-172-31-29-232.eu-west-1.compute.internal Normal NodeHasNoDiskPressure Node ip-172-31-29-232.eu-west-1.compute.internal status is now: NodeHasNoDiskPressure 41d 28d 2 kubelet, ip-172-31-29-232.eu-west-1.compute.internal Normal NodeReady Node ip-172-31-29-232.eu-west-1.compute.internal status is now: NodeReady Name: ip-172-31-29-233.eu-west-1.compute.internal Role: Labels: beta.kubernetes.io/arch=amd64 beta.kubernetes.io/instance-type=m4.xlarge beta.kubernetes.io/os=linux failure-domain.beta.kubernetes.io/region=eu-west-1 failure-domain.beta.kubernetes.io/zone=eu-west-1a hostname=rhm-eng-a-master-5c646 kubernetes.io/hostname=ip-172-31-29-233.eu-west-1.compute.internal region=eu-west-1 type=master Annotations: volumes.kubernetes.io/controller-managed-attach-detach=true Taints: <none> CreationTimestamp: Fri, 09 Feb 2018 13:09:41 -0500 Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- OutOfDisk False Mon, 23 Apr 2018 14:42:26 -0400 Fri, 13 Apr 2018 04:50:43 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available MemoryPressure False Mon, 23 Apr 2018 14:42:26 -0400 Fri, 13 Apr 2018 04:50:43 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available DiskPressure False Mon, 23 Apr 2018 14:42:26 -0400 Fri, 13 Apr 2018 04:50:43 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure Ready True Mon, 23 Apr 2018 14:42:26 -0400 Fri, 13 Apr 2018 04:50:43 -0400 KubeletReady kubelet is posting ready status Addresses: InternalIP: 172.31.29.233 ExternalIP: 34.244.70.245 InternalDNS: ip-172-31-29-233.eu-west-1.compute.internal ExternalDNS: ec2-34-244-70-245.eu-west-1.compute.amazonaws.com Hostname: ip-172-31-29-233.eu-west-1.compute.internal Capacity: cpu: 4 memory: 16266564Ki pods: 40 Allocatable: cpu: 3 memory: 15115588Ki pods: 40 System Info: Machine ID: f9370ed252a14f73b014c1301a9b6d1b System UUID: EC2E0928-DEBC-7B7E-77E2-A5E6B27C36AD Boot ID: 67322ed3-98df-4767-bde0-d0065646a09e Kernel Version: 3.10.0-693.11.6.el7.x86_64 OS Image: Red Hat Enterprise Linux Operating System: linux Architecture: amd64 Container Runtime Version: docker://1.12.6 Kubelet Version: v1.7.6+a08f5eeb62 Kube-Proxy Version: v1.7.6+a08f5eeb62 ExternalID: i-c681604a Non-terminated Pods: (0 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits --------- ---- ------------ ---------- --------------- ------------- Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) 
CPU Requests CPU Limits Memory Requests Memory Limits ------------ ---------- --------------- ------------- 0 (0%) 0 (0%) 0 (0%) 0 (0%) Events: <none> Name: ip-172-31-29-234.eu-west-1.compute.internal Role: Labels: hostname=rhm-eng-a-master-71c94 kubernetes.io/hostname=ip-172-31-29-234.eu-west-1.compute.internal region=eu-west-1 type=master Annotations: volumes.kubernetes.io/controller-managed-attach-detach=true Taints: <none> CreationTimestamp: Wed, 17 May 2017 01:10:59 -0400 Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- OutOfDisk False Mon, 23 Apr 2018 14:42:25 -0400 Thu, 05 Apr 2018 11:40:44 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available MemoryPressure False Mon, 23 Apr 2018 14:42:25 -0400 Thu, 05 Apr 2018 11:40:44 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available DiskPressure False Mon, 23 Apr 2018 14:42:25 -0400 Thu, 05 Apr 2018 11:40:44 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure Ready True Mon, 23 Apr 2018 14:42:25 -0400 Fri, 06 Apr 2018 09:47:45 -0400 KubeletReady kubelet is posting ready status Addresses: InternalIP: 172.31.29.234 ExternalIP: 34.250.56.107 InternalDNS: ip-172-31-29-234.eu-west-1.compute.internal ExternalDNS: ec2-34-250-56-107.eu-west-1.compute.amazonaws.com Hostname: ip-172-31-29-234.eu-west-1.compute.internal Capacity: alpha.kubernetes.io/nvidia-gpu: 0 cpu: 4 memory: 16266564Ki pods: 40 Allocatable: alpha.kubernetes.io/nvidia-gpu: 0 cpu: 3 memory: 15115588Ki pods: 40 System Info: Machine ID: f9370ed252a14f73b014c1301a9b6d1b System UUID: EC205367-9042-3F7C-2DED-58D9E521FCF9 Boot ID: 52380979-fc28-47a8-beda-852b2671c567 Kernel Version: 3.10.0-693.11.6.el7.x86_64 OS Image: Red Hat Enterprise Linux Operating System: linux Architecture: amd64 Container Runtime Version: docker://1.12.6 Kubelet Version: v1.7.6+a08f5eeb62 Kube-Proxy Version: v1.7.6+a08f5eeb62 ExternalID: i-c781604b Non-terminated Pods: (0 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits --------- ---- ------------ ---------- --------------- ------------- Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) 
CPU Requests CPU Limits Memory Requests Memory Limits ------------ ---------- --------------- ------------- 0 (0%) 0 (0%) 0 (0%) 0 (0%) Events: <none> Name: ip-172-31-31-152.eu-west-1.compute.internal Role: Labels: beta.kubernetes.io/arch=amd64 beta.kubernetes.io/instance-type=m4.xlarge beta.kubernetes.io/os=linux failure-domain.beta.kubernetes.io/region=eu-west-1 failure-domain.beta.kubernetes.io/zone=eu-west-1a hostname=rhm-eng-a-node-compute-b3bb0 kubernetes.io/hostname=ip-172-31-31-152.eu-west-1.compute.internal logging-infra-fluentd=true ops_node=new region=eu-west-1 type=compute Annotations: volumes.kubernetes.io/controller-managed-attach-detach=true Taints: <none> CreationTimestamp: Wed, 04 Apr 2018 03:52:08 -0400 Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- DiskPressure False Mon, 23 Apr 2018 14:42:25 -0400 Wed, 04 Apr 2018 05:26:07 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure MemoryPressure False Mon, 23 Apr 2018 14:42:25 -0400 Wed, 04 Apr 2018 05:26:07 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available OutOfDisk False Mon, 23 Apr 2018 14:42:25 -0400 Wed, 04 Apr 2018 05:26:07 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available Ready True Mon, 23 Apr 2018 14:42:25 -0400 Wed, 04 Apr 2018 05:26:17 -0400 KubeletReady kubelet is posting ready status Addresses: InternalIP: 172.31.31.152 ExternalIP: 34.243.189.38 InternalDNS: ip-172-31-31-152.eu-west-1.compute.internal ExternalDNS: ec2-34-243-189-38.eu-west-1.compute.amazonaws.com Hostname: ip-172-31-31-152.eu-west-1.compute.internal Capacity: cpu: 4 memory: 16266532Ki pods: 40 Allocatable: cpu: 3 memory: 15115556Ki pods: 40 System Info: Machine ID: 0307f3889c4e4ab49e0d409c90f6062e System UUID: EC25E5E3-C131-91DB-FE85-E7B2F74476F7 Boot ID: 6d911651-1316-4b4d-8f09-a870aad79e66 Kernel Version: 3.10.0-693.21.1.el7.x86_64 OS Image: Employee SKU Operating System: linux Architecture: amd64 Container Runtime Version: docker://1.12.6 Kubelet Version: v1.7.6+a08f5eeb62 Kube-Proxy Version: v1.7.6+a08f5eeb62 ExternalID: i-04322315f55b94b7d Non-terminated Pods: (6 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits --------- ---- ------------ ---------- --------------- ------------- logging logging-fluentd-lqb9n 100m (3%) 0 (0%) 512Mi (3%) 512Mi (3%) nodejs-examples node10-3-ljwn4 0 (0%) 0 (0%) 0 (0%) 0 (0%) rhmap-rhmap-dev nodejs-cloudappdevfsjm-1-xxwnf 100m (3%) 500m (16%) 90Mi (0%) 250Mi (1%) rhmap-rhmap-dev nodejs-samlclouddev3ctp-2-cjz5g 100m (3%) 500m (16%) 90Mi (0%) 250Mi (1%) rhmap-rhmap-dev redis-15242368745743ctp-1-xmsr9 100m (3%) 500m (16%) 100Mi (0%) 500Mi (3%) rhmap-user-data mongodb-3-1-wfdtg 200m (6%) 1 (33%) 200Mi (1%) 1000Mi (6%) Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) CPU Requests CPU Limits Memory Requests Memory Limits ------------ ---------- --------------- ------------- 600m (20%) 2500m (83%) 992Mi (6%) 2512Mi (17%) Events: <none> Let me know if there is anything else I can provide.
What is the name of this dedicated cluster? As I have tiered access to starter clusters, I am going to try to get the same access to this cluster too.

Some other questions based on the following:

"Additional info: The same configuration worked in all 4 sets prior to OSD being upgraded from 3.6 to 3.7. The same configuration also works on multiple standard OpenShift clusters running v3.7.23."

1. You said this was working in 3.6 - were the pod specs the same? That is, was the anti-affinity configuration part of the pod spec in 3.6 as well, rather than set via annotations? I want to double check, as this is the biggest suspect I have so far. I am wondering whether something related to the upgrade persisted after the upgrade.

2. This issue is only happening in this dedicated cluster, not in any other clusters, right?

Meanwhile I will see if I can get some more logs myself from this cluster.
This is the rhm-eng-a OSD cluster. Answers to your questions:

"1. You said this was working in 3.6 - were the pod specs the same? That is, was the anti-affinity configuration part of the pod spec in 3.6 as well, rather than set via annotations?"

The pod specs were identical. In fact, we installed our components while the cluster was running OpenShift 3.6 and left them running so we could test availability of our components while the OpenShift SRE team was upgrading OSD to 3.7. Prior to the upgrade, the 3 pods were on separate nodes. From what I understand, when the first infra node was drained to be upgraded, the pod running on it was doubled up onto one of the other two infra nodes (understandable). When the 2nd node was drained for the upgrade, its pods moved to the 3rd infra node still running 3.6. When the final node was drained to be upgraded, all 3 pods moved to just one of the nodes running 3.7.

We didn't know if it was something specific to the upgrade of OpenShift itself, so we cleaned off all of our stuff and reinstalled again using the same templates we had used prior to the upgrade. Again, all 3 pods ended up running on just one of the infra nodes. We've never used annotations for anti-affinity; it has always been through the pod spec podAntiAffinity.

"2. This issue is only happening in this dedicated cluster, not in any other clusters, right?"

We only have one dedicated cluster, so we haven't been able to test with another OSD cluster. The other clusters we have to test with are running OpenShift 3.7 installed by us, so they aren't the same as OSD. We have yet to encounter the same problem on any of the OpenShift clusters we created/manage ourselves.
Since 3 masters are being run, I am checking whether there is an issue with leader election.
Hi Jesse, could you provide controller logs from all 3 masters when the issue happens? Or, since per the previous info the mongo pods were created on Apr 18, could you provide controller logs from all 3 masters from Apr 18 (or around Apr 17-19)?
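In case it helps, something like the following should be enough (a sketch assuming the masters run the standard atomic-openshift-master-controllers systemd unit; the unit name and dates may need adjusting for this cluster):

    # Run on each of the 3 masters; the unit name is an assumption about this
    # environment and may differ on OSD.
    journalctl -u atomic-openshift-master-controllers \
      --since "2018-04-17" --until "2018-04-20" > controllers-$(hostname).log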
Meanwhile I am checking if https://github.com/kubernetes/kubernetes/pull/60526 should be picked in 3.7.
(In reply to Avesh Agarwal from comment #7)
> Meanwhile I am checking if
> https://github.com/kubernetes/kubernetes/pull/60526 should be picked in 3.7.

Maybe not, as that issue seems to affect only 1.9.
Hi Jesse, also, could you try increasing the logging level, just for the scheduler, to something higher like 10 for a short time, so we can really see what the scheduler is doing?
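For example, something along these lines should work (a sketch assuming an RPM-based 3.7 master where the scheduler runs inside the controllers process; the file and unit names are assumptions and may differ on OSD):

    # On each master, raise the verbosity of the controllers (which include the
    # scheduler) by editing /etc/sysconfig/atomic-openshift-master-controllers
    # and changing the OPTIONS line, e.g.:
    #   OPTIONS=--loglevel=10
    # then restart the controllers and reproduce the scheduling:
    systemctl restart atomic-openshift-master-controllers
    # Remember to revert OPTIONS to its previous value afterwards.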
This cluster has been heavily used for final testing before today's GA of RHMAP 4.x Hosted. I'll try to borrow the cluster tomorrow to try the recommended actions. We should have a second dedicated cluster coming soon, which would help us confirm whether it's somehow an issue specific to this cluster or something more general to all Dedicated clusters.
We have finally gotten our second OSD cluster and the same problem has occurred on it as well. In fact, it appears that there may be an issue with the scheduler in general as we had 23 of 25 pods within one project ending up on the same node. I'm turning up the logging and later today RHMAP will be wiped and re-deployed from scratch which should gather the detailed logs we are looking for.
(In reply to Jesse Sarnovsky from comment #11)
> We have finally gotten our second OSD cluster and the same problem has
> occurred on it as well. In fact, it appears that there may be an issue with
> the scheduler in general as we had 23 of 25 pods within one project ending
> up on the same node.
>
> I'm turning up the logging and later today RHMAP will be wiped and
> re-deployed from scratch which should gather the detailed logs we are
> looking for.

Hi Jesse, yes, I would really like to see what is going on, so detailed logs (ideally with the scheduler at log level 10) would be helpful. I am also wondering if I can get access to this 2nd OSD cluster; that would make it much easier to see directly what is going on. In fact I can work with you on it if possible.
It took me a while to find out how to modify the logging level for just the scheduler; however, during that time, I ended up finding the root cause of the issue. The default scheduler configurations on the two clusters are very different and explain the problems each has.

On the new cluster (named 'mobile'), anti-affinity DOES work properly, distributing the mongos across the infra nodes for all 3 projects. This is because there are 3 separate services, one for each pod. All of the other multi-instance pods have a single service for the group, and they are ending up mostly scheduled to the same node. Both of these behaviors are due to the default scheduler config, which:
- DOES NOT have ServiceSpreadingPriority set (which would have spread the pods within each service to multiple nodes)
- DOES have InterPodAffinityPriority set (which is why the pod anti-affinity is working)

In contrast, the scheduler config for the previous rhm-eng-a cluster:
- DOES NOT have InterPodAffinityPriority set (which makes it ignore the pod anti-affinity)
- DOES have ServiceSpreadingPriority set (which is spreading the pods within each service to multiple nodes)

Here are the full default scheduler configurations for reference purposes. It is easy to see that the presence or absence of the ServiceSpreadingPriority and InterPodAffinityPriority options is what causes the behaviors we are experiencing.

Scheduler config for mobile cluster:

    {
        "apiVersion": "v1",
        "kind": "Policy",
        "predicates": [
            { "name": "NoVolumeZoneConflict" },
            { "name": "MaxEBSVolumeCount" },
            { "name": "MaxGCEPDVolumeCount" },
            { "name": "MaxAzureDiskVolumeCount" },
            { "name": "MatchInterPodAffinity" },
            { "name": "NoDiskConflict" },
            { "name": "GeneralPredicates" },
            { "name": "PodToleratesNodeTaints" },
            { "name": "CheckNodeMemoryPressure" },
            { "name": "CheckNodeDiskPressure" },
            { "name": "NoVolumeNodeConflict" },
            {
                "argument": {
                    "serviceAffinity": {
                        "labels": [ "region" ]
                    }
                },
                "name": "Region"
            }
        ],
        "priorities": [
            { "name": "SelectorSpreadPriority", "weight": 1 },
            { "name": "InterPodAffinityPriority", "weight": 1 },
            { "name": "LeastRequestedPriority", "weight": 1 },
            { "name": "BalancedResourceAllocation", "weight": 1 },
            { "name": "NodePreferAvoidPodsPriority", "weight": 10000 },
            { "name": "NodeAffinityPriority", "weight": 1 },
            { "name": "TaintTolerationPriority", "weight": 1 },
            {
                "argument": {
                    "serviceAntiAffinity": { "label": "zone" }
                },
                "name": "Zone",
                "weight": 2
            }
        ]
    }

Scheduler config for rhm-eng-a cluster:

    {
        "apiVersion": "v1",
        "kind": "Policy",
        "predicates": [
            { "name": "MatchNodeSelector" },
            { "name": "PodFitsResources" },
            { "name": "PodFitsPorts" },
            { "name": "NoDiskConflict" },
            { "name": "MaxEBSVolumeCount" },
            { "name": "NoVolumeZoneConflict" }
        ],
        "priorities": [
            { "name": "LeastRequestedPriority", "weight": 1 },
            { "name": "ServiceSpreadingPriority", "weight": 1 }
        ]
    }
Hi Jesse, thanks for providing this information. I agree your investigation is correct; that is why I was so surprised that pod anti-affinity did not work, as I have not been able to reproduce it on my 3.7 cluster. Since the original issue was due to a scheduler misconfiguration (InterPodAffinityPriority not enabled), my question now is: is it OK to close this BZ, or are you still looking for some other help?
I do still think it's a bug with OSD provisioning and/or the config loop, since it fails to manage the default scheduler configuration consistently. Also, I don't think either of the two configs is the right setup to use by default. I've reached out to the OSD guys who asked me to open this Bugzilla to see how they want to capture the remaining work needed.
(In reply to Jesse Sarnovsky from comment #15)
> I do still think it's a bug with OSD provisioning and/or the config loop,
> since it fails to manage the default scheduler configuration consistently.

That sounds good. Though I'd suggest that if it's about OSD provisioning and/or the config loop, assigning it to the right people would be better to get their attention.

> Also, I don't think either of the two configs is the right setup to use by
> default.

I can surely help with that. Here is my suggestion:

1. Just use the default scheduler configuration (in other words, the default scheduler config file).
2. Don't configure the serviceAntiAffinity priority function:

       {
           "argument": {
               "serviceAntiAffinity": { "label": "zone" }
           },
           "name": "Zone",
           "weight": 2
       }

   The SelectorSpreadPriority function is enabled by default and already works on the labels failure-domain.beta.kubernetes.io/region and failure-domain.beta.kubernetes.io/zone.

And in case those labels are not being configured on nodes in OSD clusters, I would like to know why.

> I've reached out to the OSD guys who asked me to open this Bugzilla to see
> how they want to capture the remaining work needed.
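For reference, a rough sketch of where this is wired up on a 3.7 master; the file paths here are assumptions about how OSD lays out the master configuration:

    # /etc/origin/master/master-config.yaml (assumed path) points the scheduler
    # at its policy file; replacing that file's contents with the shipped default
    # policy (which includes InterPodAffinityPriority) and restarting the
    # controllers would make both clusters behave consistently.
    kubernetesMasterConfig:
      schedulerConfigFile: /etc/origin/master/scheduler.json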
Thomas will be creating a card to track resolving the issue on their side. Thanks for your help, Avesh.