Description of problem:

The OSD cluster doesn't respect pod anti-affinity configuration during scheduling. We have 4 sets of pods using pod anti-affinity, each set with its own unique app label:

1. 3 pods w/ mongo - all 3 pods were scheduled on the same infra node
2. 3 pods w/ mysql - all 3 pods were scheduled on the same infra node
3. 3 pods w/ mongo - 2 pods scheduled on one infra node, 1 pod on a 2nd infra node, and 0 pods on the 3rd infra node
4. 3 pods w/ mongo - all 3 pods scheduled on different compute nodes

In all 4 cases, there were 3 or 4 nodes which matched the node selector with plenty of resources available.

Version-Release number of selected component (if applicable):
OSD v3.7.23

How reproducible:
About 50% of the time

Steps to Reproduce:
1. Create a project with the annotation: openshift.io/node-selector: type=infra (see the sketch at the end of this report)
2. Create 3 deployment configs in the project, each with:
   - 1 replica
   - label app=fh-core-mongo
   - podAntiAffinity configured as:

     spec:
       affinity:
         podAntiAffinity:
           requiredDuringSchedulingIgnoredDuringExecution:
           - labelSelector:
               matchExpressions:
               - key: app
                 operator: In
                 values:
                 - fh-core-mongo
             topologyKey: kubernetes.io/hostname

3. Use 'oc get pods -n <project> -o wide' to see which node each of the 3 pods is running on.

Actual results:
All 3 pods end up running on the same node, like below:

mongodb-1-1-2bjpb   1/1   Running   0   1d   10.1.6.199   ip-172-31-28-26.eu-west-1.compute.internal
mongodb-2-1-hnvnb   1/1   Running   0   1d   10.1.6.201   ip-172-31-28-26.eu-west-1.compute.internal
mongodb-3-1-6l9rn   1/1   Running   0   1d   10.1.6.203   ip-172-31-28-26.eu-west-1.compute.internal

Expected results:
Each of the 3 pods should be scheduled onto a different node.

Additional info:
The same configuration worked in all 4 sets prior to OSD being upgraded from 3.6 to 3.7. The same configuration also works on multiple standard OpenShift clusters running v3.7.23.
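For reference, step 1 of the reproduction can be done with 'oc adm new-project' roughly as sketched below; the project name is just a placeholder, not the actual project used here.

    # Sketch of step 1: create a project whose pods are pinned to the infra nodes
    # via the project-level node selector ("anti-affinity-test" is a placeholder).
    oc adm new-project anti-affinity-test --node-selector='type=infra'

    # Confirm the annotation was applied to the namespace:
    oc get namespace anti-affinity-test -o yaml | grep node-selector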
Can you provide "oc get pod <pod-name> -o yaml" after the 3 pods have been deployed? Also, how many nodes are in the cluster? Could you provide oc describe for all nodes?
Sure, output of oc get pods: [root@rhm-eng-a-master-71c94 ~]# oc get pods/mongodb-1-1-2bjpb -o yaml apiVersion: v1 kind: Pod metadata: annotations: kubernetes.io/created-by: | {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"rhmap-core","name":"mongodb-1-1","uid":"6857a0f7-4359-11e8-b5c3-0aba80795ecd","apiVersion":"v1","resourceVersion":"5055451"}} openshift.io/deployment-config.latest-version: "1" openshift.io/deployment-config.name: mongodb-1 openshift.io/deployment.name: mongodb-1-1 openshift.io/generated-by: OpenShiftNewApp openshift.io/scc: restricted creationTimestamp: 2018-04-18T22:39:54Z generateName: mongodb-1-1- labels: app: fh-core-mongo deployment: mongodb-1-1 deploymentconfig: mongodb-1 name: mongodb-replica-1 name: mongodb-1-1-2bjpb namespace: rhmap-core ownerReferences: - apiVersion: v1 blockOwnerDeletion: true controller: true kind: ReplicationController name: mongodb-1-1 uid: 6857a0f7-4359-11e8-b5c3-0aba80795ecd resourceVersion: "5055518" selfLink: /api/v1/namespaces/rhmap-core/pods/mongodb-1-1-2bjpb uid: 6a12115d-4359-11e8-b5c3-0aba80795ecd spec: affinity: podAntiAffinity: requiredDuringSchedulingIgnoredDuringExecution: - labelSelector: matchExpressions: - key: app operator: In values: - fh-core-mongo topologyKey: kubernetes.io/hostname containers: - command: - run-mongod-replication env: - name: MONGODB_REPLICA_NAME value: rs0 - name: MONGODB_SERVICE_NAME value: mongodb - name: MONGODB_KEYFILE_VALUE valueFrom: configMapKeyRef: key: mongodb-keyfile-value name: mongodb-keys - name: MONGODB_ADMIN_PASSWORD valueFrom: configMapKeyRef: key: mongodb-admin-password name: mongodb-keys - name: MONGODB_FHAAA_USER valueFrom: configMapKeyRef: key: mongodb-fh-aaa-user name: mongodb-keys - name: MONGODB_FHAAA_PASSWORD valueFrom: configMapKeyRef: key: mongodb-fh-aaa-password name: mongodb-keys - name: MONGODB_FHAAA_DATABASE valueFrom: configMapKeyRef: key: mongodb-fh-aaa-database name: mongodb-keys - name: MONGODB_FHSUPERCORE_USER valueFrom: configMapKeyRef: key: mongodb-fh-supercore-user name: mongodb-keys - name: MONGODB_FHSUPERCORE_PASSWORD valueFrom: configMapKeyRef: key: mongodb-fh-supercore-password name: mongodb-keys - name: MONGODB_FHSUPERCORE_DATABASE valueFrom: configMapKeyRef: key: mongodb-fh-supercore-database name: mongodb-keys - name: MONGODB_FHREPORTING_USER valueFrom: configMapKeyRef: key: mongodb-fh-reporting-user name: mongodb-keys - name: MONGODB_FHREPORTING_PASSWORD valueFrom: configMapKeyRef: key: mongodb-fh-reporting-password name: mongodb-keys - name: MONGODB_FHREPORTING_DATABASE valueFrom: configMapKeyRef: key: mongodb-fh-reporting-database name: mongodb-keys image: rhmap46/mongodb:3.2-36 imagePullPolicy: IfNotPresent livenessProbe: failureThreshold: 2 initialDelaySeconds: 5 periodSeconds: 60 successThreshold: 1 tcpSocket: port: 27017 timeoutSeconds: 5 name: mongodb ports: - containerPort: 27017 protocol: TCP resources: limits: cpu: "1" memory: 1000Mi requests: cpu: 200m memory: 200Mi securityContext: capabilities: drop: - KILL - MKNOD - NET_RAW - SETGID - SETUID privileged: false runAsUser: 1003390000 seLinuxOptions: level: s0:c58,c42 terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /var/lib/mongodb/data name: mongodb-data-volume - mountPath: /var/run/secrets/kubernetes.io/serviceaccount name: default-token-jg9mx readOnly: true dnsPolicy: ClusterFirst imagePullSecrets: - name: default-dockercfg-svrdf nodeName: ip-172-31-28-26.eu-west-1.compute.internal 
nodeSelector: type: infra restartPolicy: Always schedulerName: default-scheduler securityContext: fsGroup: 1003390000 seLinuxOptions: level: s0:c58,c42 serviceAccount: default serviceAccountName: default terminationGracePeriodSeconds: 30 volumes: - name: mongodb-data-volume persistentVolumeClaim: claimName: mongodb-claim-1 - name: default-token-jg9mx secret: defaultMode: 420 secretName: default-token-jg9mx status: conditions: - lastProbeTime: null lastTransitionTime: 2018-04-18T22:39:54Z status: "True" type: Initialized - lastProbeTime: null lastTransitionTime: 2018-04-18T22:40:14Z status: "True" type: Ready - lastProbeTime: null lastTransitionTime: 2018-04-18T22:39:54Z status: "True" type: PodScheduled containerStatuses: - containerID: docker://4688c03e7841d87465c53467e08edc17e5792af936ba223fa8a29e812b0afd39 image: registry.rhm-eng-a.openshift.com:443/rhmap46/mongodb:3.2-36 imageID: docker-pullable://registry.rhm-eng-a.openshift.com:443/rhmap46/mongodb@sha256:bcfd94b74bfb049fc6c5649216d703f15fe22c2caf30121ade844760fdefc601 lastState: {} name: mongodb ready: true restartCount: 0 state: running: startedAt: 2018-04-18T22:40:13Z hostIP: 172.31.28.26 phase: Running podIP: 10.1.6.199 qosClass: Burstable startTime: 2018-04-18T22:39:54Z [root@rhm-eng-a-master-71c94 ~]# oc get pods/mongodb-2-1-hnvnb -o yaml apiVersion: v1 kind: Pod metadata: annotations: kubernetes.io/created-by: | {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"rhmap-core","name":"mongodb-2-1","uid":"7d447891-4359-11e8-b5c3-0aba80795ecd","apiVersion":"v1","resourceVersion":"5055658"}} openshift.io/deployment-config.latest-version: "1" openshift.io/deployment-config.name: mongodb-2 openshift.io/deployment.name: mongodb-2-1 openshift.io/generated-by: OpenShiftNewApp openshift.io/scc: restricted creationTimestamp: 2018-04-18T22:40:29Z generateName: mongodb-2-1- labels: app: fh-core-mongo deployment: mongodb-2-1 deploymentconfig: mongodb-2 name: mongodb-replica-2 name: mongodb-2-1-hnvnb namespace: rhmap-core ownerReferences: - apiVersion: v1 blockOwnerDeletion: true controller: true kind: ReplicationController name: mongodb-2-1 uid: 7d447891-4359-11e8-b5c3-0aba80795ecd resourceVersion: "5055743" selfLink: /api/v1/namespaces/rhmap-core/pods/mongodb-2-1-hnvnb uid: 7f2b1503-4359-11e8-b5c3-0aba80795ecd spec: affinity: podAntiAffinity: requiredDuringSchedulingIgnoredDuringExecution: - labelSelector: matchExpressions: - key: app operator: In values: - fh-core-mongo topologyKey: kubernetes.io/hostname containers: - command: - run-mongod-replication env: - name: MONGODB_REPLICA_NAME value: rs0 - name: MONGODB_SERVICE_NAME value: mongodb - name: MONGODB_KEYFILE_VALUE valueFrom: configMapKeyRef: key: mongodb-keyfile-value name: mongodb-keys - name: MONGODB_ADMIN_PASSWORD valueFrom: configMapKeyRef: key: mongodb-admin-password name: mongodb-keys - name: MONGODB_FHAAA_USER valueFrom: configMapKeyRef: key: mongodb-fh-aaa-user name: mongodb-keys - name: MONGODB_FHAAA_PASSWORD valueFrom: configMapKeyRef: key: mongodb-fh-aaa-password name: mongodb-keys - name: MONGODB_FHAAA_DATABASE valueFrom: configMapKeyRef: key: mongodb-fh-aaa-database name: mongodb-keys - name: MONGODB_FHSUPERCORE_USER valueFrom: configMapKeyRef: key: mongodb-fh-supercore-user name: mongodb-keys - name: MONGODB_FHSUPERCORE_PASSWORD valueFrom: configMapKeyRef: key: mongodb-fh-supercore-password name: mongodb-keys - name: MONGODB_FHSUPERCORE_DATABASE valueFrom: configMapKeyRef: key: mongodb-fh-supercore-database name: 
mongodb-keys - name: MONGODB_FHREPORTING_USER valueFrom: configMapKeyRef: key: mongodb-fh-reporting-user name: mongodb-keys - name: MONGODB_FHREPORTING_PASSWORD valueFrom: configMapKeyRef: key: mongodb-fh-reporting-password name: mongodb-keys - name: MONGODB_FHREPORTING_DATABASE valueFrom: configMapKeyRef: key: mongodb-fh-reporting-database name: mongodb-keys image: rhmap46/mongodb:3.2-36 imagePullPolicy: IfNotPresent livenessProbe: failureThreshold: 2 initialDelaySeconds: 5 periodSeconds: 60 successThreshold: 1 tcpSocket: port: 27017 timeoutSeconds: 5 name: mongodb ports: - containerPort: 27017 protocol: TCP resources: limits: cpu: "1" memory: 1000Mi requests: cpu: 200m memory: 200Mi securityContext: capabilities: drop: - KILL - MKNOD - NET_RAW - SETGID - SETUID privileged: false runAsUser: 1003390000 seLinuxOptions: level: s0:c58,c42 terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /var/lib/mongodb/data name: mongodb-data-volume - mountPath: /var/run/secrets/kubernetes.io/serviceaccount name: default-token-jg9mx readOnly: true dnsPolicy: ClusterFirst imagePullSecrets: - name: default-dockercfg-svrdf nodeName: ip-172-31-28-26.eu-west-1.compute.internal nodeSelector: type: infra restartPolicy: Always schedulerName: default-scheduler securityContext: fsGroup: 1003390000 seLinuxOptions: level: s0:c58,c42 serviceAccount: default serviceAccountName: default terminationGracePeriodSeconds: 30 volumes: - name: mongodb-data-volume persistentVolumeClaim: claimName: mongodb-claim-2 - name: default-token-jg9mx secret: defaultMode: 420 secretName: default-token-jg9mx status: conditions: - lastProbeTime: null lastTransitionTime: 2018-04-18T22:40:29Z status: "True" type: Initialized - lastProbeTime: null lastTransitionTime: 2018-04-18T22:40:49Z status: "True" type: Ready - lastProbeTime: null lastTransitionTime: 2018-04-18T22:40:29Z status: "True" type: PodScheduled containerStatuses: - containerID: docker://f4005d898dbc2b7486eec987f05453626ee658e73b5bbf5944b4a687458ed30d image: registry.rhm-eng-a.openshift.com:443/rhmap46/mongodb:3.2-36 imageID: docker-pullable://registry.rhm-eng-a.openshift.com:443/rhmap46/mongodb@sha256:bcfd94b74bfb049fc6c5649216d703f15fe22c2caf30121ade844760fdefc601 lastState: {} name: mongodb ready: true restartCount: 0 state: running: startedAt: 2018-04-18T22:40:49Z hostIP: 172.31.28.26 phase: Running podIP: 10.1.6.201 qosClass: Burstable startTime: 2018-04-18T22:40:29Z [root@rhm-eng-a-master-71c94 ~]# oc get pods/mongodb-3-1-6l9rn -o yaml apiVersion: v1 kind: Pod metadata: annotations: kubernetes.io/created-by: | {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"rhmap-core","name":"mongodb-3-1","uid":"92d84cb1-4359-11e8-b5c3-0aba80795ecd","apiVersion":"v1","resourceVersion":"5055825"}} openshift.io/deployment-config.latest-version: "1" openshift.io/deployment-config.name: mongodb-3 openshift.io/deployment.name: mongodb-3-1 openshift.io/generated-by: OpenShiftNewApp openshift.io/scc: restricted creationTimestamp: 2018-04-18T22:41:05Z generateName: mongodb-3-1- labels: app: fh-core-mongo deployment: mongodb-3-1 deploymentconfig: mongodb-3 name: mongodb-replica-3 name: mongodb-3-1-6l9rn namespace: rhmap-core ownerReferences: - apiVersion: v1 blockOwnerDeletion: true controller: true kind: ReplicationController name: mongodb-3-1 uid: 92d84cb1-4359-11e8-b5c3-0aba80795ecd resourceVersion: "5055887" selfLink: /api/v1/namespaces/rhmap-core/pods/mongodb-3-1-6l9rn uid: 
947c17b1-4359-11e8-b5c3-0aba80795ecd spec: affinity: podAntiAffinity: requiredDuringSchedulingIgnoredDuringExecution: - labelSelector: matchExpressions: - key: app operator: In values: - fh-core-mongo topologyKey: kubernetes.io/hostname containers: - command: - run-mongod-replication env: - name: MONGODB_REPLICA_NAME value: rs0 - name: MONGODB_SERVICE_NAME value: mongodb - name: MONGODB_KEYFILE_VALUE valueFrom: configMapKeyRef: key: mongodb-keyfile-value name: mongodb-keys - name: MONGODB_ADMIN_PASSWORD valueFrom: configMapKeyRef: key: mongodb-admin-password name: mongodb-keys - name: MONGODB_FHAAA_USER valueFrom: configMapKeyRef: key: mongodb-fh-aaa-user name: mongodb-keys - name: MONGODB_FHAAA_PASSWORD valueFrom: configMapKeyRef: key: mongodb-fh-aaa-password name: mongodb-keys - name: MONGODB_FHAAA_DATABASE valueFrom: configMapKeyRef: key: mongodb-fh-aaa-database name: mongodb-keys - name: MONGODB_FHSUPERCORE_USER valueFrom: configMapKeyRef: key: mongodb-fh-supercore-user name: mongodb-keys - name: MONGODB_FHSUPERCORE_PASSWORD valueFrom: configMapKeyRef: key: mongodb-fh-supercore-password name: mongodb-keys - name: MONGODB_FHSUPERCORE_DATABASE valueFrom: configMapKeyRef: key: mongodb-fh-supercore-database name: mongodb-keys - name: MONGODB_FHREPORTING_USER valueFrom: configMapKeyRef: key: mongodb-fh-reporting-user name: mongodb-keys - name: MONGODB_FHREPORTING_PASSWORD valueFrom: configMapKeyRef: key: mongodb-fh-reporting-password name: mongodb-keys - name: MONGODB_FHREPORTING_DATABASE valueFrom: configMapKeyRef: key: mongodb-fh-reporting-database name: mongodb-keys image: rhmap46/mongodb:3.2-36 imagePullPolicy: IfNotPresent livenessProbe: failureThreshold: 2 initialDelaySeconds: 5 periodSeconds: 60 successThreshold: 1 tcpSocket: port: 27017 timeoutSeconds: 5 name: mongodb ports: - containerPort: 27017 protocol: TCP resources: limits: cpu: "1" memory: 1000Mi requests: cpu: 200m memory: 200Mi securityContext: capabilities: drop: - KILL - MKNOD - NET_RAW - SETGID - SETUID privileged: false runAsUser: 1003390000 seLinuxOptions: level: s0:c58,c42 terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /var/lib/mongodb/data name: mongodb-data-volume - mountPath: /var/run/secrets/kubernetes.io/serviceaccount name: default-token-jg9mx readOnly: true dnsPolicy: ClusterFirst imagePullSecrets: - name: default-dockercfg-svrdf nodeName: ip-172-31-28-26.eu-west-1.compute.internal nodeSelector: type: infra restartPolicy: Always schedulerName: default-scheduler securityContext: fsGroup: 1003390000 seLinuxOptions: level: s0:c58,c42 serviceAccount: default serviceAccountName: default terminationGracePeriodSeconds: 30 volumes: - name: mongodb-data-volume persistentVolumeClaim: claimName: mongodb-claim-3 - name: default-token-jg9mx secret: defaultMode: 420 secretName: default-token-jg9mx status: conditions: - lastProbeTime: null lastTransitionTime: 2018-04-18T22:41:05Z status: "True" type: Initialized - lastProbeTime: null lastTransitionTime: 2018-04-18T22:41:24Z status: "True" type: Ready - lastProbeTime: null lastTransitionTime: 2018-04-18T22:41:05Z status: "True" type: PodScheduled containerStatuses: - containerID: docker://06a4bd72fe8d2d7bc2116ee7e0e83a3b51a6b5f26d37bb0ae7ff7dd65e9deb34 image: registry.rhm-eng-a.openshift.com:443/rhmap46/mongodb:3.2-36 imageID: docker-pullable://registry.rhm-eng-a.openshift.com:443/rhmap46/mongodb@sha256:bcfd94b74bfb049fc6c5649216d703f15fe22c2caf30121ade844760fdefc601 lastState: {} name: mongodb ready: true restartCount: 0 
state: running: startedAt: 2018-04-18T22:41:24Z hostIP: 172.31.28.26 phase: Running podIP: 10.1.6.203 qosClass: Burstable startTime: 2018-04-18T22:41:05Z This is a standard build of OSD, so 3 masters, 3 infra, 4 compute. Since we are providing a hosted service for RHMAP, this set of pods runs on the 3 infra nodes. Output of oc describe for all nodes: Name: ip-172-31-17-34.eu-west-1.compute.internal Role: Labels: beta.kubernetes.io/arch=amd64 beta.kubernetes.io/instance-type=m4.xlarge beta.kubernetes.io/os=linux failure-domain.beta.kubernetes.io/region=eu-west-1 failure-domain.beta.kubernetes.io/zone=eu-west-1a hostname=rhm-eng-a-node-compute-b1a7d kubernetes.io/hostname=ip-172-31-17-34.eu-west-1.compute.internal logging-infra-fluentd=true ops_node=new region=eu-west-1 type=compute Annotations: volumes.kubernetes.io/controller-managed-attach-detach=true Taints: <none> CreationTimestamp: Wed, 04 Apr 2018 03:52:09 -0400 Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- DiskPressure False Mon, 23 Apr 2018 14:42:19 -0400 Wed, 04 Apr 2018 06:05:21 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure MemoryPressure False Mon, 23 Apr 2018 14:42:19 -0400 Wed, 04 Apr 2018 06:05:21 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available OutOfDisk False Mon, 23 Apr 2018 14:42:19 -0400 Wed, 04 Apr 2018 06:05:21 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available Ready True Mon, 23 Apr 2018 14:42:19 -0400 Wed, 04 Apr 2018 06:05:31 -0400 KubeletReady kubelet is posting ready status Addresses: InternalIP: 172.31.17.34 ExternalIP: 34.244.185.160 InternalDNS: ip-172-31-17-34.eu-west-1.compute.internal ExternalDNS: ec2-34-244-185-160.eu-west-1.compute.amazonaws.com Hostname: ip-172-31-17-34.eu-west-1.compute.internal Capacity: cpu: 4 memory: 16266532Ki pods: 40 Allocatable: cpu: 3 memory: 15115556Ki pods: 40 System Info: Machine ID: 0307f3889c4e4ab49e0d409c90f6062e System UUID: EC28668C-8A2E-FEFF-4DA5-D90B78190807 Boot ID: d0c375c4-ac0d-4b94-8212-44d4897ea2ad Kernel Version: 3.10.0-693.21.1.el7.x86_64 OS Image: Employee SKU Operating System: linux Architecture: amd64 Container Runtime Version: docker://1.12.6 Kubelet Version: v1.7.6+a08f5eeb62 Kube-Proxy Version: v1.7.6+a08f5eeb62 ExternalID: i-07772bbe384b0920a Non-terminated Pods: (10 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits --------- ---- ------------ ---------- --------------- ------------- logging logging-fluentd-vsjsq 100m (3%) 0 (0%) 512Mi (3%) 512Mi (3%) rhmap-rhmap-ci-ocp4-e redis-1524226262828gwkn-1-j9mql 100m (3%) 500m (16%) 100Mi (0%) 500Mi (3%) rhmap-rhmap-dev nodejs-cloudappdevmp2s-1-fsfpc 100m (3%) 500m (16%) 90Mi (0%) 250Mi (1%) rhmap-rhmap-dev nodejs-cloudappdevpw6i-1-sz82q 100m (3%) 500m (16%) 90Mi (0%) 250Mi (1%) rhmap-rhmap-dev nodejs-testingcloudappdev3jyn-1-qsmk5 100m (3%) 500m (16%) 90Mi (0%) 250Mi (1%) rhmap-rhmap-dev redis-1524148095926pw6i-1-jp44m 100m (3%) 500m (16%) 100Mi (0%) 500Mi (3%) rhmap-rhmap-dev redis-1524226029129xhpb-1-w4j6c 100m (3%) 500m (16%) 100Mi (0%) 500Mi (3%) rhmap-rhmap-dev redis-15242266930537pvi-1-56mkw 100m (3%) 500m (16%) 100Mi (0%) 500Mi (3%) rhmap-rhmap-dev redis-1524226968390ihdm-1-db4j6 100m (3%) 500m (16%) 100Mi (0%) 500Mi (3%) rhmap-rhmap-dev redis-1524236984850rmof-1-spmxm 100m (3%) 500m (16%) 100Mi (0%) 500Mi (3%) Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) 
CPU Requests CPU Limits Memory Requests Memory Limits ------------ ---------- --------------- ------------- 1 (33%) 4500m (150%) 1382Mi (9%) 4262Mi (28%) Events: <none> Name: ip-172-31-21-124.eu-west-1.compute.internal Role: Labels: beta.kubernetes.io/arch=amd64 beta.kubernetes.io/instance-type=m4.2xlarge beta.kubernetes.io/os=linux failure-domain.beta.kubernetes.io/region=eu-west-1 failure-domain.beta.kubernetes.io/zone=eu-west-1a hostname=rhm-eng-a-node-infra-7205d kubernetes.io/hostname=ip-172-31-21-124.eu-west-1.compute.internal logging-infra-fluentd=true ops_node=new region=eu-west-1 type=infra Annotations: volumes.kubernetes.io/controller-managed-attach-detach=true Taints: <none> CreationTimestamp: Tue, 13 Mar 2018 06:38:51 -0400 Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- DiskPressure False Mon, 23 Apr 2018 14:42:24 -0400 Wed, 04 Apr 2018 06:08:52 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure MemoryPressure False Mon, 23 Apr 2018 14:42:24 -0400 Wed, 04 Apr 2018 06:08:52 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available OutOfDisk False Mon, 23 Apr 2018 14:42:24 -0400 Wed, 04 Apr 2018 06:08:52 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available Ready True Mon, 23 Apr 2018 14:42:24 -0400 Wed, 04 Apr 2018 06:09:02 -0400 KubeletReady kubelet is posting ready status Addresses: InternalIP: 172.31.21.124 ExternalIP: 34.245.191.60 InternalDNS: ip-172-31-21-124.eu-west-1.compute.internal ExternalDNS: ec2-34-245-191-60.eu-west-1.compute.amazonaws.com Hostname: ip-172-31-21-124.eu-west-1.compute.internal Capacity: cpu: 8 memory: 32780604Ki pods: 80 Allocatable: cpu: 7 memory: 31629628Ki pods: 80 System Info: Machine ID: d52c597d0f1a42aeb01b5a7d71e63f24 System UUID: EC22BD3C-F2F3-CB12-D6B0-022D9E23F985 Boot ID: fd7333d0-9ed2-43be-99c8-b10e900da4b9 Kernel Version: 3.10.0-693.11.6.el7.x86_64 OS Image: Red Hat Enterprise Linux Operating System: linux Architecture: amd64 Container Runtime Version: docker://1.12.6 Kubelet Version: v1.7.6+a08f5eeb62 Kube-Proxy Version: v1.7.6+a08f5eeb62 ExternalID: i-008d6ef81fc43b760 Non-terminated Pods: (19 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits --------- ---- ------------ ---------- --------------- ------------- default docker-registry-26-87s9f 0 (0%) 0 (0%) 1G (3%) 2G (6%) default router-674-cvll2 100m (1%) 0 (0%) 256Mi (0%) 0 (0%) logging logging-es-data-master-x5uwa0st-2-qdj5r 475m (6%) 0 (0%) 12544Mi (40%) 12544Mi (40%) logging logging-fluentd-md7sk 100m (1%) 0 (0%) 512Mi (1%) 512Mi (1%) logging logging-kibana-7-rsjlj 50m (0%) 0 (0%) 1280Mi (4%) 1280Mi (4%) openshift-infra hawkular-cassandra-1-q2vhc 375m (5%) 0 (0%) 4Gi (13%) 4Gi (13%) openshift-infra hawkular-metrics-lkz4p 100m (1%) 0 (0%) 3Gi (9%) 3Gi (9%) openshift-infra heapster-9hxff 100m (1%) 0 (0%) 3840Mi (12%) 3840Mi (12%) rhmap-3-node-mbaas fh-mbaas-2-8q6zr 200m (2%) 800m (11%) 200Mi (0%) 800Mi (2%) rhmap-3-node-mbaas fh-messaging-1-wzdrk 200m (2%) 400m (5%) 200Mi (0%) 400Mi (1%) rhmap-3-node-mbaas fh-metrics-1-rmmjd 200m (2%) 400m (5%) 200Mi (0%) 400Mi (1%) rhmap-3-node-mbaas fh-statsd-1-2lwtn 200m (2%) 400m (5%) 200Mi (0%) 400Mi (1%) rhmap-3-node-mbaas mongodb-3-1-2zwtl 200m (2%) 1 (14%) 200Mi (0%) 1000Mi (3%) rhmap-core fh-aaa-1-957q2 20m (0%) 800m (11%) 100Mi (0%) 800Mi (2%) rhmap-core fh-appstore-1-fk6sj 1m (0%) 800m (11%) 50Mi (0%) 800Mi (2%) rhmap-core fh-messaging-1-m97ms 200m (2%) 400m (5%) 200Mi (0%) 
400Mi (1%) rhmap-core fh-metrics-1-2rhdj 200m (2%) 400m (5%) 200Mi (0%) 400Mi (1%) rhmap-core fh-scm-1-n4kdp 12m (0%) 800m (11%) 70Mi (0%) 800Mi (2%) rhmap-core fh-supercore-1-m85rx 20m (0%) 800m (11%) 200Mi (0%) 800Mi (2%) Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) CPU Requests CPU Limits Memory Requests Memory Limits ------------ ---------- --------------- ------------- 2753m (39%) 7 (100%) 29751953920 (91%) 35915142144 (110%) Events: <none> Name: ip-172-31-23-102.eu-west-1.compute.internal Role: Labels: beta.kubernetes.io/arch=amd64 beta.kubernetes.io/instance-type=m4.2xlarge beta.kubernetes.io/os=linux failure-domain.beta.kubernetes.io/region=eu-west-1 failure-domain.beta.kubernetes.io/zone=eu-west-1a hostname=rhm-eng-a-node-infra-4117d kubernetes.io/hostname=ip-172-31-23-102.eu-west-1.compute.internal logging-infra-fluentd=true ops_node=new region=eu-west-1 type=infra Annotations: volumes.kubernetes.io/controller-managed-attach-detach=true Taints: <none> CreationTimestamp: Tue, 13 Mar 2018 06:33:52 -0400 Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- DiskPressure False Mon, 23 Apr 2018 14:42:19 -0400 Wed, 04 Apr 2018 05:36:22 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure MemoryPressure False Mon, 23 Apr 2018 14:42:19 -0400 Wed, 04 Apr 2018 05:36:22 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available OutOfDisk False Mon, 23 Apr 2018 14:42:19 -0400 Wed, 04 Apr 2018 05:36:22 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available Ready True Mon, 23 Apr 2018 14:42:19 -0400 Wed, 04 Apr 2018 05:36:32 -0400 KubeletReady kubelet is posting ready status Addresses: InternalIP: 172.31.23.102 ExternalIP: 34.243.193.29 InternalDNS: ip-172-31-23-102.eu-west-1.compute.internal ExternalDNS: ec2-34-243-193-29.eu-west-1.compute.amazonaws.com Hostname: ip-172-31-23-102.eu-west-1.compute.internal Capacity: cpu: 8 memory: 32780604Ki pods: 80 Allocatable: cpu: 7 memory: 31629628Ki pods: 80 System Info: Machine ID: d52c597d0f1a42aeb01b5a7d71e63f24 System UUID: EC2742D1-50F5-B783-1E57-B78585E70FD2 Boot ID: d9ba8d84-5648-4232-bde1-cb5014d5311b Kernel Version: 3.10.0-693.11.6.el7.x86_64 OS Image: Red Hat Enterprise Linux Operating System: linux Architecture: amd64 Container Runtime Version: docker://1.12.6 Kubelet Version: v1.7.6+a08f5eeb62 Kube-Proxy Version: v1.7.6+a08f5eeb62 ExternalID: i-07feb12a94b083610 Non-terminated Pods: (17 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits --------- ---- ------------ ---------- --------------- ------------- default docker-registry-26-b4gx7 0 (0%) 0 (0%) 1G (3%) 2G (6%) default docker-registry-26-gn9vs 0 (0%) 0 (0%) 1G (3%) 2G (6%) default oso-rhel7-zagg-web-2-6cz9c 500m (7%) 1 (14%) 1536Mi (4%) 1536Mi (4%) default oso-rhel7-zagg-web-2-x8lbg 500m (7%) 1 (14%) 1536Mi (4%) 1536Mi (4%) default router-674-2682r 100m (1%) 0 (0%) 256Mi (0%) 0 (0%) logging logging-curator-6-rkwrn 25m (0%) 0 (0%) 512Mi (1%) 512Mi (1%) logging logging-es-lorgc43d-9-gdm48 475m (6%) 0 (0%) 12544Mi (40%) 12544Mi (40%) logging logging-fluentd-jcmwl 100m (1%) 0 (0%) 512Mi (1%) 512Mi (1%) openshift-infra hawkular-cassandra-2-pcfgg 375m (5%) 0 (0%) 4Gi (13%) 4Gi (13%) rhmap-3-node-mbaas fh-mbaas-2-fcp9c 200m (2%) 800m (11%) 200Mi (0%) 800Mi (2%) rhmap-3-node-mbaas fh-messaging-1-5nw95 200m (2%) 400m (5%) 200Mi (0%) 400Mi (1%) rhmap-3-node-mbaas fh-metrics-1-v6sgd 200m (2%) 400m (5%) 200Mi 
(0%) 400Mi (1%) rhmap-core fh-metrics-1-dhc6d 200m (2%) 400m (5%) 200Mi (0%) 400Mi (1%) rhmap-core fh-ngui-1-nlhzk 10m (0%) 800m (11%) 250Mi (0%) 800Mi (2%) rhmap-core gitlab-shell-1-2g28x 20m (0%) 1600m (22%) 100Mi (0%) 1600Mi (5%) rhmap-core millicore-1-6ppr2 1011m (14%) 3600m (51%) 1560Mi (5%) 5100Mi (16%) rhmap-core redis-1-qknsq 100m (1%) 500m (7%) 100Mi (0%) 500Mi (1%) Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) CPU Requests CPU Limits Memory Requests Memory Limits ------------ ---------- --------------- ------------- 4016m (57%) 10500m (150%) 26958205952 (83%) 36229031936 (111%) Events: <none> Name: ip-172-31-23-59.eu-west-1.compute.internal Role: Labels: beta.kubernetes.io/arch=amd64 beta.kubernetes.io/instance-type=m4.xlarge beta.kubernetes.io/os=linux failure-domain.beta.kubernetes.io/region=eu-west-1 failure-domain.beta.kubernetes.io/zone=eu-west-1a hostname=rhm-eng-a-node-compute-4f00c kubernetes.io/hostname=ip-172-31-23-59.eu-west-1.compute.internal logging-infra-fluentd=true ops_node=new region=eu-west-1 type=compute Annotations: volumes.kubernetes.io/controller-managed-attach-detach=true Taints: <none> CreationTimestamp: Wed, 04 Apr 2018 03:52:08 -0400 Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- DiskPressure False Mon, 23 Apr 2018 14:42:15 -0400 Wed, 04 Apr 2018 05:58:06 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure MemoryPressure False Mon, 23 Apr 2018 14:42:15 -0400 Wed, 04 Apr 2018 05:58:06 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available OutOfDisk False Mon, 23 Apr 2018 14:42:15 -0400 Wed, 04 Apr 2018 05:58:06 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available Ready True Mon, 23 Apr 2018 14:42:15 -0400 Wed, 04 Apr 2018 05:58:16 -0400 KubeletReady kubelet is posting ready status Addresses: InternalIP: 172.31.23.59 ExternalIP: 34.243.136.167 InternalDNS: ip-172-31-23-59.eu-west-1.compute.internal ExternalDNS: ec2-34-243-136-167.eu-west-1.compute.amazonaws.com Hostname: ip-172-31-23-59.eu-west-1.compute.internal Capacity: cpu: 4 memory: 16266532Ki pods: 40 Allocatable: cpu: 3 memory: 15115556Ki pods: 40 System Info: Machine ID: 0307f3889c4e4ab49e0d409c90f6062e System UUID: EC2E9508-C920-94C1-C118-EAE69C4E0835 Boot ID: f9c6af19-c987-43a6-9ff6-45db7b89997e Kernel Version: 3.10.0-693.21.1.el7.x86_64 OS Image: Employee SKU Operating System: linux Architecture: amd64 Container Runtime Version: docker://1.12.6 Kubelet Version: v1.7.6+a08f5eeb62 Kube-Proxy Version: v1.7.6+a08f5eeb62 ExternalID: i-0d7ff2ea60614c42f Non-terminated Pods: (6 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits --------- ---- ------------ ---------- --------------- ------------- logging logging-fluentd-k62js 100m (3%) 0 (0%) 512Mi (3%) 512Mi (3%) ops-health-monitoring pull-04051430z-tv-1-rznk7 0 (0%) 0 (0%) 0 (0%) 0 (0%) rhmap-rhmap-dev nodejs-appart15242369785devrmof-3-cmjs5 100m (3%) 500m (16%) 90Mi (0%) 250Mi (1%) rhmap-rhmap-dev redis-15242271345913jyn-1-v5jgf 100m (3%) 500m (16%) 100Mi (0%) 500Mi (3%) rhmap-rhmap-dev redis-15242416697974hay-1-r9dbb 100m (3%) 500m (16%) 100Mi (0%) 500Mi (3%) rhmap-user-data mongodb-2-1-9pt57 200m (6%) 1 (33%) 200Mi (1%) 1000Mi (6%) Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) 
CPU Requests CPU Limits Memory Requests Memory Limits ------------ ---------- --------------- ------------- 600m (20%) 2500m (83%) 1002Mi (6%) 2762Mi (18%) Events: <none> Name: ip-172-31-27-184.eu-west-1.compute.internal Role: Labels: beta.kubernetes.io/arch=amd64 beta.kubernetes.io/instance-type=m4.xlarge beta.kubernetes.io/os=linux failure-domain.beta.kubernetes.io/region=eu-west-1 failure-domain.beta.kubernetes.io/zone=eu-west-1a hostname=rhm-eng-a-node-compute-61e52 kubernetes.io/hostname=ip-172-31-27-184.eu-west-1.compute.internal logging-infra-fluentd=true ops_node=new region=eu-west-1 type=compute Annotations: volumes.kubernetes.io/controller-managed-attach-detach=true Taints: <none> CreationTimestamp: Wed, 04 Apr 2018 03:52:10 -0400 Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- DiskPressure False Mon, 23 Apr 2018 14:42:20 -0400 Fri, 13 Apr 2018 04:49:01 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure MemoryPressure False Mon, 23 Apr 2018 14:42:20 -0400 Fri, 13 Apr 2018 04:49:01 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available OutOfDisk False Mon, 23 Apr 2018 14:42:20 -0400 Fri, 13 Apr 2018 04:49:01 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available Ready True Mon, 23 Apr 2018 14:42:20 -0400 Fri, 13 Apr 2018 04:49:11 -0400 KubeletReady kubelet is posting ready status Addresses: InternalIP: 172.31.27.184 ExternalIP: 52.48.69.46 InternalDNS: ip-172-31-27-184.eu-west-1.compute.internal ExternalDNS: ec2-52-48-69-46.eu-west-1.compute.amazonaws.com Hostname: ip-172-31-27-184.eu-west-1.compute.internal Capacity: cpu: 4 memory: 16266532Ki pods: 40 Allocatable: cpu: 3 memory: 15115556Ki pods: 40 System Info: Machine ID: 0307f3889c4e4ab49e0d409c90f6062e System UUID: EC2E55DB-A652-6E0E-0DB5-949F3EA610A8 Boot ID: 4636d560-1bbc-4f0b-8cd7-9b874eceffa3 Kernel Version: 3.10.0-693.21.1.el7.x86_64 OS Image: Employee SKU Operating System: linux Architecture: amd64 Container Runtime Version: docker://1.12.6 Kubelet Version: v1.7.6+a08f5eeb62 Kube-Proxy Version: v1.7.6+a08f5eeb62 ExternalID: i-0d0d55ad3d3bb0e1d Non-terminated Pods: (9 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits --------- ---- ------------ ---------- --------------- ------------- logging logging-fluentd-gwrx2 100m (3%) 0 (0%) 512Mi (3%) 512Mi (3%) rhmap-rhmap-ci-ocp4-e nodejs-testingcloudappciocgwkn-1-bz76w 100m (3%) 500m (16%) 90Mi (0%) 250Mi (1%) rhmap-rhmap-dev nodejs-ciocp4serviceeditdevihdm-1-hmx2k 100m (3%) 500m (16%) 90Mi (0%) 250Mi (1%) rhmap-rhmap-dev nodejs-cloudappdev4hay-2-bx9hj 100m (3%) 500m (16%) 90Mi (0%) 250Mi (1%) rhmap-rhmap-dev nodejs-cloudappdevxhpb-1-4nn4q 100m (3%) 500m (16%) 90Mi (0%) 250Mi (1%) rhmap-rhmap-dev nodejs-testingcloudappdev7pvi-1-fwqt9 100m (3%) 500m (16%) 90Mi (0%) 250Mi (1%) rhmap-rhmap-dev redis-1524226482906mp2s-1-54z6r 100m (3%) 500m (16%) 100Mi (0%) 500Mi (3%) rhmap-rhmap-dev redis-1524476222268fsjm-1-r47rq 100m (3%) 500m (16%) 100Mi (0%) 500Mi (3%) rhmap-user-data mongodb-1-1-bqcvs 200m (6%) 1 (33%) 200Mi (1%) 1000Mi (6%) Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) 
CPU Requests CPU Limits Memory Requests Memory Limits ------------ ---------- --------------- ------------- 1 (33%) 4500m (150%) 1362Mi (9%) 3762Mi (25%) Events: <none> Name: ip-172-31-28-26.eu-west-1.compute.internal Role: Labels: beta.kubernetes.io/arch=amd64 beta.kubernetes.io/instance-type=m4.2xlarge beta.kubernetes.io/os=linux failure-domain.beta.kubernetes.io/region=eu-west-1 failure-domain.beta.kubernetes.io/zone=eu-west-1a hostname=rhm-eng-a-node-infra-4c268 kubernetes.io/hostname=ip-172-31-28-26.eu-west-1.compute.internal logging-infra-fluentd=true ops_node=new region=eu-west-1 type=infra Annotations: volumes.kubernetes.io/controller-managed-attach-detach=true Taints: <none> CreationTimestamp: Tue, 13 Mar 2018 06:28:24 -0400 Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- DiskPressure False Mon, 23 Apr 2018 14:42:20 -0400 Fri, 13 Apr 2018 04:50:17 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure MemoryPressure False Mon, 23 Apr 2018 14:42:20 -0400 Fri, 13 Apr 2018 04:50:17 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available OutOfDisk False Mon, 23 Apr 2018 14:42:20 -0400 Fri, 13 Apr 2018 04:50:17 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available Ready True Mon, 23 Apr 2018 14:42:20 -0400 Fri, 13 Apr 2018 04:50:17 -0400 KubeletReady kubelet is posting ready status Addresses: InternalIP: 172.31.28.26 ExternalIP: 34.245.57.92 InternalDNS: ip-172-31-28-26.eu-west-1.compute.internal ExternalDNS: ec2-34-245-57-92.eu-west-1.compute.amazonaws.com Hostname: ip-172-31-28-26.eu-west-1.compute.internal Capacity: cpu: 8 memory: 32780604Ki pods: 80 Allocatable: cpu: 7 memory: 31629628Ki pods: 80 System Info: Machine ID: d52c597d0f1a42aeb01b5a7d71e63f24 System UUID: EC204444-95C3-B208-11A0-224CD7735A9B Boot ID: 393988dc-93af-4cb1-8da6-31e8140d8695 Kernel Version: 3.10.0-693.11.6.el7.x86_64 OS Image: Red Hat Enterprise Linux Operating System: linux Architecture: amd64 Container Runtime Version: docker://1.12.6 Kubelet Version: v1.7.6+a08f5eeb62 Kube-Proxy Version: v1.7.6+a08f5eeb62 ExternalID: i-021d17d2f4724546b Non-terminated Pods: (23 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits --------- ---- ------------ ---------- --------------- ------------- logging logging-es-j7sg158e-9-qd6d6 475m (6%) 0 (0%) 12544Mi (40%) 12544Mi (40%) logging logging-fluentd-hnmwl 100m (1%) 0 (0%) 512Mi (1%) 512Mi (1%) openshift-infra hawkular-cassandra-3-7tb5b 375m (5%) 0 (0%) 4Gi (13%) 4Gi (13%) rhmap-3-node-mbaas fh-mbaas-2-cwphc 200m (2%) 800m (11%) 200Mi (0%) 800Mi (2%) rhmap-3-node-mbaas fh-messaging-1-8jt4t 200m (2%) 400m (5%) 200Mi (0%) 400Mi (1%) rhmap-3-node-mbaas fh-metrics-1-4qzfb 200m (2%) 400m (5%) 200Mi (0%) 400Mi (1%) rhmap-3-node-mbaas mongodb-1-1-dghrn 200m (2%) 1 (14%) 200Mi (0%) 1000Mi (3%) rhmap-3-node-mbaas mongodb-2-1-fz8r7 200m (2%) 1 (14%) 200Mi (0%) 1000Mi (3%) rhmap-3-node-mbaas nagios-1-xsj9d 200m (2%) 800m (11%) 200Mi (0%) 800Mi (2%) rhmap-core fh-aaa-1-8vnkl 20m (0%) 800m (11%) 100Mi (0%) 800Mi (2%) rhmap-core fh-messaging-1-6wr4j 200m (2%) 400m (5%) 200Mi (0%) 400Mi (1%) rhmap-core fh-ngui-1-t5hb2 10m (0%) 800m (11%) 250Mi (0%) 800Mi (2%) rhmap-core fh-supercore-1-99wzm 20m (0%) 800m (11%) 200Mi (0%) 800Mi (2%) rhmap-core memcached-1-7shkf 10m (0%) 800m (11%) 30Mi (0%) 500M (1%) rhmap-core millicore-1-xc6nx 1011m (14%) 3600m (51%) 1560Mi (5%) 5100Mi (16%) rhmap-core mongodb-1-1-2bjpb 200m (2%) 1 (14%) 200Mi 
(0%) 1000Mi (3%) rhmap-core mongodb-2-1-hnvnb 200m (2%) 1 (14%) 200Mi (0%) 1000Mi (3%) rhmap-core mongodb-3-1-6l9rn 200m (2%) 1 (14%) 200Mi (0%) 1000Mi (3%) rhmap-core mysql-1-pqj7h 100m (1%) 3200m (45%) 700Mi (2%) 1Gi (3%) rhmap-core mysql-2-1-mnftk 100m (1%) 3200m (45%) 700Mi (2%) 1Gi (3%) rhmap-core mysql-3-1-7hj8l 100m (1%) 3200m (45%) 700Mi (2%) 1Gi (3%) rhmap-core nagios-1-x5dg4 200m (2%) 800m (11%) 200Mi (0%) 800Mi (2%) rhmap-core ups-1-nbwmq 400m (5%) 2 (28%) 900Mi (2%) 5000Mi (16%) Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) CPU Requests CPU Limits Memory Requests Memory Limits ------------ ---------- --------------- ------------- 4921m (70%) 27 (385%) 24492Mi (79%) 43831354624 (135%) Events: <none> Name: ip-172-31-29-232.eu-west-1.compute.internal Role: Labels: hostname=rhm-eng-a-master-03f3e kubernetes.io/hostname=ip-172-31-29-232.eu-west-1.compute.internal region=eu-west-1 type=master Annotations: volumes.kubernetes.io/controller-managed-attach-detach=true Taints: <none> CreationTimestamp: Wed, 02 Mar 2016 15:56:36 -0500 Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- Ready True Mon, 23 Apr 2018 14:42:24 -0400 Thu, 19 Apr 2018 21:11:36 -0400 KubeletReady kubelet is posting ready status OutOfDisk False Mon, 23 Apr 2018 14:42:24 -0400 Wed, 04 Apr 2018 05:08:37 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available MemoryPressure False Mon, 23 Apr 2018 14:42:24 -0400 Wed, 04 Apr 2018 05:08:37 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available DiskPressure False Mon, 23 Apr 2018 14:42:24 -0400 Mon, 23 Apr 2018 03:40:05 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure Addresses: InternalIP: 172.31.29.232 ExternalIP: 52.48.129.40 InternalDNS: ip-172-31-29-232.eu-west-1.compute.internal ExternalDNS: ec2-52-48-129-40.eu-west-1.compute.amazonaws.com Hostname: ip-172-31-29-232.eu-west-1.compute.internal Capacity: alpha.kubernetes.io/nvidia-gpu: 0 cpu: 4 memory: 16266564Ki pods: 40 Allocatable: alpha.kubernetes.io/nvidia-gpu: 0 cpu: 3 memory: 15115588Ki pods: 40 System Info: Machine ID: f9370ed252a14f73b014c1301a9b6d1b System UUID: EC2F7CF1-76C0-6A6C-EE62-0885492B3414 Boot ID: 392a5b53-3894-4d50-b566-de75d5409be5 Kernel Version: 3.10.0-693.11.6.el7.x86_64 OS Image: Unknown Operating System: linux Architecture: amd64 Container Runtime Version: docker://1.12.6 Kubelet Version: v1.7.6+a08f5eeb62 Kube-Proxy Version: v1.7.6+a08f5eeb62 ExternalID: i-c5816049 Non-terminated Pods: (0 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits --------- ---- ------------ ---------- --------------- ------------- Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) 
CPU Requests CPU Limits Memory Requests Memory Limits ------------ ---------- --------------- ------------- 0 (0%) 0 (0%) 0 (0%) 0 (0%) Events: FirstSeen LastSeen Count From SubObjectPath Type Reason Message --------- -------- ----- ---- ------------- -------- ------ ------- 41d 28d 2 kubelet, ip-172-31-29-232.eu-west-1.compute.internal Normal NodeHasSufficientDisk Node ip-172-31-29-232.eu-west-1.compute.internal status is now: NodeHasSufficientDisk 41d 28d 2 kubelet, ip-172-31-29-232.eu-west-1.compute.internal Normal NodeHasSufficientMemory Node ip-172-31-29-232.eu-west-1.compute.internal status is now: NodeHasSufficientMemory 41d 28d 2 kubelet, ip-172-31-29-232.eu-west-1.compute.internal Normal NodeHasNoDiskPressure Node ip-172-31-29-232.eu-west-1.compute.internal status is now: NodeHasNoDiskPressure 41d 28d 2 kubelet, ip-172-31-29-232.eu-west-1.compute.internal Normal NodeReady Node ip-172-31-29-232.eu-west-1.compute.internal status is now: NodeReady Name: ip-172-31-29-233.eu-west-1.compute.internal Role: Labels: beta.kubernetes.io/arch=amd64 beta.kubernetes.io/instance-type=m4.xlarge beta.kubernetes.io/os=linux failure-domain.beta.kubernetes.io/region=eu-west-1 failure-domain.beta.kubernetes.io/zone=eu-west-1a hostname=rhm-eng-a-master-5c646 kubernetes.io/hostname=ip-172-31-29-233.eu-west-1.compute.internal region=eu-west-1 type=master Annotations: volumes.kubernetes.io/controller-managed-attach-detach=true Taints: <none> CreationTimestamp: Fri, 09 Feb 2018 13:09:41 -0500 Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- OutOfDisk False Mon, 23 Apr 2018 14:42:26 -0400 Fri, 13 Apr 2018 04:50:43 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available MemoryPressure False Mon, 23 Apr 2018 14:42:26 -0400 Fri, 13 Apr 2018 04:50:43 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available DiskPressure False Mon, 23 Apr 2018 14:42:26 -0400 Fri, 13 Apr 2018 04:50:43 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure Ready True Mon, 23 Apr 2018 14:42:26 -0400 Fri, 13 Apr 2018 04:50:43 -0400 KubeletReady kubelet is posting ready status Addresses: InternalIP: 172.31.29.233 ExternalIP: 34.244.70.245 InternalDNS: ip-172-31-29-233.eu-west-1.compute.internal ExternalDNS: ec2-34-244-70-245.eu-west-1.compute.amazonaws.com Hostname: ip-172-31-29-233.eu-west-1.compute.internal Capacity: cpu: 4 memory: 16266564Ki pods: 40 Allocatable: cpu: 3 memory: 15115588Ki pods: 40 System Info: Machine ID: f9370ed252a14f73b014c1301a9b6d1b System UUID: EC2E0928-DEBC-7B7E-77E2-A5E6B27C36AD Boot ID: 67322ed3-98df-4767-bde0-d0065646a09e Kernel Version: 3.10.0-693.11.6.el7.x86_64 OS Image: Red Hat Enterprise Linux Operating System: linux Architecture: amd64 Container Runtime Version: docker://1.12.6 Kubelet Version: v1.7.6+a08f5eeb62 Kube-Proxy Version: v1.7.6+a08f5eeb62 ExternalID: i-c681604a Non-terminated Pods: (0 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits --------- ---- ------------ ---------- --------------- ------------- Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) 
CPU Requests CPU Limits Memory Requests Memory Limits ------------ ---------- --------------- ------------- 0 (0%) 0 (0%) 0 (0%) 0 (0%) Events: <none> Name: ip-172-31-29-234.eu-west-1.compute.internal Role: Labels: hostname=rhm-eng-a-master-71c94 kubernetes.io/hostname=ip-172-31-29-234.eu-west-1.compute.internal region=eu-west-1 type=master Annotations: volumes.kubernetes.io/controller-managed-attach-detach=true Taints: <none> CreationTimestamp: Wed, 17 May 2017 01:10:59 -0400 Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- OutOfDisk False Mon, 23 Apr 2018 14:42:25 -0400 Thu, 05 Apr 2018 11:40:44 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available MemoryPressure False Mon, 23 Apr 2018 14:42:25 -0400 Thu, 05 Apr 2018 11:40:44 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available DiskPressure False Mon, 23 Apr 2018 14:42:25 -0400 Thu, 05 Apr 2018 11:40:44 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure Ready True Mon, 23 Apr 2018 14:42:25 -0400 Fri, 06 Apr 2018 09:47:45 -0400 KubeletReady kubelet is posting ready status Addresses: InternalIP: 172.31.29.234 ExternalIP: 34.250.56.107 InternalDNS: ip-172-31-29-234.eu-west-1.compute.internal ExternalDNS: ec2-34-250-56-107.eu-west-1.compute.amazonaws.com Hostname: ip-172-31-29-234.eu-west-1.compute.internal Capacity: alpha.kubernetes.io/nvidia-gpu: 0 cpu: 4 memory: 16266564Ki pods: 40 Allocatable: alpha.kubernetes.io/nvidia-gpu: 0 cpu: 3 memory: 15115588Ki pods: 40 System Info: Machine ID: f9370ed252a14f73b014c1301a9b6d1b System UUID: EC205367-9042-3F7C-2DED-58D9E521FCF9 Boot ID: 52380979-fc28-47a8-beda-852b2671c567 Kernel Version: 3.10.0-693.11.6.el7.x86_64 OS Image: Red Hat Enterprise Linux Operating System: linux Architecture: amd64 Container Runtime Version: docker://1.12.6 Kubelet Version: v1.7.6+a08f5eeb62 Kube-Proxy Version: v1.7.6+a08f5eeb62 ExternalID: i-c781604b Non-terminated Pods: (0 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits --------- ---- ------------ ---------- --------------- ------------- Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) 
CPU Requests CPU Limits Memory Requests Memory Limits ------------ ---------- --------------- ------------- 0 (0%) 0 (0%) 0 (0%) 0 (0%) Events: <none> Name: ip-172-31-31-152.eu-west-1.compute.internal Role: Labels: beta.kubernetes.io/arch=amd64 beta.kubernetes.io/instance-type=m4.xlarge beta.kubernetes.io/os=linux failure-domain.beta.kubernetes.io/region=eu-west-1 failure-domain.beta.kubernetes.io/zone=eu-west-1a hostname=rhm-eng-a-node-compute-b3bb0 kubernetes.io/hostname=ip-172-31-31-152.eu-west-1.compute.internal logging-infra-fluentd=true ops_node=new region=eu-west-1 type=compute Annotations: volumes.kubernetes.io/controller-managed-attach-detach=true Taints: <none> CreationTimestamp: Wed, 04 Apr 2018 03:52:08 -0400 Conditions: Type Status LastHeartbeatTime LastTransitionTime Reason Message ---- ------ ----------------- ------------------ ------ ------- DiskPressure False Mon, 23 Apr 2018 14:42:25 -0400 Wed, 04 Apr 2018 05:26:07 -0400 KubeletHasNoDiskPressure kubelet has no disk pressure MemoryPressure False Mon, 23 Apr 2018 14:42:25 -0400 Wed, 04 Apr 2018 05:26:07 -0400 KubeletHasSufficientMemory kubelet has sufficient memory available OutOfDisk False Mon, 23 Apr 2018 14:42:25 -0400 Wed, 04 Apr 2018 05:26:07 -0400 KubeletHasSufficientDisk kubelet has sufficient disk space available Ready True Mon, 23 Apr 2018 14:42:25 -0400 Wed, 04 Apr 2018 05:26:17 -0400 KubeletReady kubelet is posting ready status Addresses: InternalIP: 172.31.31.152 ExternalIP: 34.243.189.38 InternalDNS: ip-172-31-31-152.eu-west-1.compute.internal ExternalDNS: ec2-34-243-189-38.eu-west-1.compute.amazonaws.com Hostname: ip-172-31-31-152.eu-west-1.compute.internal Capacity: cpu: 4 memory: 16266532Ki pods: 40 Allocatable: cpu: 3 memory: 15115556Ki pods: 40 System Info: Machine ID: 0307f3889c4e4ab49e0d409c90f6062e System UUID: EC25E5E3-C131-91DB-FE85-E7B2F74476F7 Boot ID: 6d911651-1316-4b4d-8f09-a870aad79e66 Kernel Version: 3.10.0-693.21.1.el7.x86_64 OS Image: Employee SKU Operating System: linux Architecture: amd64 Container Runtime Version: docker://1.12.6 Kubelet Version: v1.7.6+a08f5eeb62 Kube-Proxy Version: v1.7.6+a08f5eeb62 ExternalID: i-04322315f55b94b7d Non-terminated Pods: (6 in total) Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits --------- ---- ------------ ---------- --------------- ------------- logging logging-fluentd-lqb9n 100m (3%) 0 (0%) 512Mi (3%) 512Mi (3%) nodejs-examples node10-3-ljwn4 0 (0%) 0 (0%) 0 (0%) 0 (0%) rhmap-rhmap-dev nodejs-cloudappdevfsjm-1-xxwnf 100m (3%) 500m (16%) 90Mi (0%) 250Mi (1%) rhmap-rhmap-dev nodejs-samlclouddev3ctp-2-cjz5g 100m (3%) 500m (16%) 90Mi (0%) 250Mi (1%) rhmap-rhmap-dev redis-15242368745743ctp-1-xmsr9 100m (3%) 500m (16%) 100Mi (0%) 500Mi (3%) rhmap-user-data mongodb-3-1-wfdtg 200m (6%) 1 (33%) 200Mi (1%) 1000Mi (6%) Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) CPU Requests CPU Limits Memory Requests Memory Limits ------------ ---------- --------------- ------------- 600m (20%) 2500m (83%) 992Mi (6%) 2512Mi (17%) Events: <none> Let me know if there is anything else I can provide.
What is the name of this dedicated cluster? As I have tiered access to starter clusters, I am going to try to get the same access to this cluster too.

Some other questions based on the following:

"Additional info: The same configuration worked in all 4 sets prior to OSD being upgraded from 3.6 to 3.7. The same configuration also works on multiple standard OpenShift clusters running v3.7.23."

1. You said this was working in 3.6 - were the pod specs the same? That is, was the anti-affinity configuration part of the pod spec in 3.6 as well, rather than set via annotations? I want to double check, as this is the biggest suspect I have so far. I am wondering whether something related to the upgrade persisted after the upgrade.

2. This issue is only happening in this dedicated cluster, not in any other clusters, right?

Meanwhile I will see if I can get some more logs myself from this cluster.
This is the rhm-eng-a OSD cluster. Answers to your questions:

"1. You said this was working in 3.6 - were the pod specs the same? That is, was the anti-affinity configuration part of the pod spec in 3.6 as well, rather than set via annotations?"

The pod specs were identical. In fact, we installed our components while the cluster was running OpenShift 3.6 and left them running so we could test availability of our components while the OpenShift SRE team was upgrading OSD to 3.7. Prior to the upgrade, the 3 pods were on separate nodes. From what I understand, when the first infra node was drained to be upgraded, the pod running on it was doubled up onto one of the other two infra nodes (understandable). When the 2nd node was drained for the upgrade, its pods moved to the 3rd infra node still running 3.6. When the final node was drained to be upgraded, all 3 pods moved to just one of the nodes running 3.7.

We didn't know if it was something specific to the upgrade of OpenShift itself, so we cleaned off all of our stuff and reinstalled again using the same templates we had used prior to the upgrade. Again, all 3 pods ended up running on just one of the infra nodes. We've never used annotations for anti-affinity; it has always been through the pod spec podAntiAffinity.

"2. This issue is only happening in this dedicated cluster, not in any other clusters, right?"

We only have one dedicated cluster, so we haven't been able to test with another OSD cluster. The other clusters we have to test with are running OpenShift 3.7 installed by us, so they aren't the same as OSD. We have yet to encounter the same problem on any of the OpenShift clusters we created/manage ourselves.
Since 3 masters are being run, I am checking whether there is an issue with leader election.
Hi Jesse, could you provide controller logs from all 3 masters when the issue happens? Or, since per the previous info the mongo pods were created on Apr 18, could you provide controller logs from all 3 masters from Apr 18 (or around Apr 17-19)?
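In case it helps, something like the following should be enough (a sketch assuming the masters run the standard atomic-openshift-master-controllers systemd unit; the unit name and dates may need adjusting for this cluster):

    # Run on each of the 3 masters; the unit name is an assumption about this
    # environment and may differ on OSD.
    journalctl -u atomic-openshift-master-controllers \
      --since "2018-04-17" --until "2018-04-20" > controllers-$(hostname).log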
Meanwhile I am checking if https://github.com/kubernetes/kubernetes/pull/60526 should be picked in 3.7.
(In reply to Avesh Agarwal from comment #7)
> Meanwhile I am checking if
> https://github.com/kubernetes/kubernetes/pull/60526 should be picked in 3.7.

Maybe not, as that issue seems to affect only 1.9.
Hi Jesse, also, could you try increasing the logging level, just for the scheduler, to something higher like 10 for a short time, so we can really see what the scheduler is doing?
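For example, something along these lines should work (a sketch assuming an RPM-based 3.7 master where the scheduler runs inside the controllers process; the file and unit names are assumptions and may differ on OSD):

    # On each master, raise the verbosity of the controllers (which include the
    # scheduler) by editing /etc/sysconfig/atomic-openshift-master-controllers
    # and changing the OPTIONS line, e.g.:
    #   OPTIONS=--loglevel=10
    # then restart the controllers and reproduce the scheduling:
    systemctl restart atomic-openshift-master-controllers
    # Remember to revert OPTIONS to its previous value afterwards.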
This cluster has been heavily used for final testing before today's GA of RHMAP 4.x Hosted. I'll try to borrow the cluster tomorrow to try the recommended actions. We should have a second dedicated cluster coming soon, which would help us confirm whether it's somehow an issue specific to this cluster or something more general to all Dedicated clusters.
We have finally gotten our second OSD cluster and the same problem has occurred on it as well. In fact, it appears that there may be an issue with the scheduler in general as we had 23 of 25 pods within one project ending up on the same node. I'm turning up the logging and later today RHMAP will be wiped and re-deployed from scratch which should gather the detailed logs we are looking for.
(In reply to Jesse Sarnovsky from comment #11)
> We have finally gotten our second OSD cluster and the same problem has
> occurred on it as well. In fact, it appears that there may be an issue with
> the scheduler in general as we had 23 of 25 pods within one project ending
> up on the same node.
>
> I'm turning up the logging and later today RHMAP will be wiped and
> re-deployed from scratch which should gather the detailed logs we are
> looking for.

Hi Jesse, yes, I would really like to see what is going on, so detailed logs (ideally with the scheduler at log level 10) would be helpful. I am also wondering if I can get access to this 2nd OSD cluster; that would make it much easier to see directly what is going on. In fact I can work with you on it if possible.
It took me a while to find out how to modify the logging level for just the scheduler; however, during that time, I ended up finding the root cause of the issue. The default scheduler configurations on the two clusters are very different and explain the problems each has.

On the new cluster (named 'mobile'), anti-affinity DOES work properly, distributing the mongos across the infra nodes for all 3 projects. This is because there are 3 separate services, one for each pod. All of the other multi-instance pods have a single service for the group, and they are ending up mostly scheduled to the same node. Both of these behaviors are due to the default scheduler config, which:
- DOES NOT have ServiceSpreadingPriority set (which would have spread the pods within each service to multiple nodes)
- DOES have InterPodAffinityPriority set (which is why the pod anti-affinity is working)

In contrast, the scheduler config for the previous rhm-eng-a cluster:
- DOES NOT have InterPodAffinityPriority set (which makes it ignore the pod anti-affinity)
- DOES have ServiceSpreadingPriority set (which is spreading the pods within each service to multiple nodes)

Here are the full default scheduler configurations for reference purposes. It is easy to see that the presence or absence of the ServiceSpreadingPriority and InterPodAffinityPriority options is what causes the behaviors we are experiencing.

Scheduler config for mobile cluster:

    {
        "apiVersion": "v1",
        "kind": "Policy",
        "predicates": [
            { "name": "NoVolumeZoneConflict" },
            { "name": "MaxEBSVolumeCount" },
            { "name": "MaxGCEPDVolumeCount" },
            { "name": "MaxAzureDiskVolumeCount" },
            { "name": "MatchInterPodAffinity" },
            { "name": "NoDiskConflict" },
            { "name": "GeneralPredicates" },
            { "name": "PodToleratesNodeTaints" },
            { "name": "CheckNodeMemoryPressure" },
            { "name": "CheckNodeDiskPressure" },
            { "name": "NoVolumeNodeConflict" },
            {
                "argument": {
                    "serviceAffinity": {
                        "labels": [ "region" ]
                    }
                },
                "name": "Region"
            }
        ],
        "priorities": [
            { "name": "SelectorSpreadPriority", "weight": 1 },
            { "name": "InterPodAffinityPriority", "weight": 1 },
            { "name": "LeastRequestedPriority", "weight": 1 },
            { "name": "BalancedResourceAllocation", "weight": 1 },
            { "name": "NodePreferAvoidPodsPriority", "weight": 10000 },
            { "name": "NodeAffinityPriority", "weight": 1 },
            { "name": "TaintTolerationPriority", "weight": 1 },
            {
                "argument": {
                    "serviceAntiAffinity": { "label": "zone" }
                },
                "name": "Zone",
                "weight": 2
            }
        ]
    }

Scheduler config for rhm-eng-a cluster:

    {
        "apiVersion": "v1",
        "kind": "Policy",
        "predicates": [
            { "name": "MatchNodeSelector" },
            { "name": "PodFitsResources" },
            { "name": "PodFitsPorts" },
            { "name": "NoDiskConflict" },
            { "name": "MaxEBSVolumeCount" },
            { "name": "NoVolumeZoneConflict" }
        ],
        "priorities": [
            { "name": "LeastRequestedPriority", "weight": 1 },
            { "name": "ServiceSpreadingPriority", "weight": 1 }
        ]
    }
Hi Jesse, thanks for providing this information. I agree your investigation is correct; that is why I was so surprised that pod anti-affinity did not work, as I have not been able to reproduce it on my 3.7 cluster. Since the original issue was due to a scheduler misconfiguration (InterPodAffinityPriority not enabled), my question now is: is it OK to close this BZ, or are you still looking for some other help?
I do still think it's a bug with OSD provisioning and/or the config loop, since it fails to manage the default scheduler configuration consistently. Also, I don't think either of the two configs is the right setup to use by default. I've reached out to the OSD guys who asked me to open this Bugzilla to see how they want to capture the remaining work needed.
(In reply to Jesse Sarnovsky from comment #15)
> I do still think it's a bug with OSD provisioning and/or the config loop,
> since it fails to manage the default scheduler configuration consistently.

That sounds good. Though I'd suggest that if it's about OSD provisioning and/or the config loop, assigning it to the right people would be better to get their attention.

> Also, I don't think either of the two configs is the right setup to use by
> default.

I can surely help with that. Here is my suggestion:

1. Just use the default scheduler configuration (in other words, the default scheduler config file).
2. Don't configure the serviceAntiAffinity priority function:

       {
           "argument": {
               "serviceAntiAffinity": { "label": "zone" }
           },
           "name": "Zone",
           "weight": 2
       }

   The SelectorSpreadPriority function is enabled by default and already works on the labels failure-domain.beta.kubernetes.io/region and failure-domain.beta.kubernetes.io/zone.

And in case those labels are not being configured on nodes in OSD clusters, I would like to know why.

> I've reached out to the OSD guys who asked me to open this Bugzilla to see
> how they want to capture the remaining work needed.
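For reference, a rough sketch of where this is wired up on a 3.7 master; the file paths here are assumptions about how OSD lays out the master configuration:

    # /etc/origin/master/master-config.yaml (assumed path) points the scheduler
    # at its policy file; replacing that file's contents with the shipped default
    # policy (which includes InterPodAffinityPriority) and restarting the
    # controllers would make both clusters behave consistently.
    kubernetesMasterConfig:
      schedulerConfigFile: /etc/origin/master/scheduler.json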
Thomas will be creating a card to track resolving the issue on their side. Thanks for your help, Avesh.