Description of problem:
If a cluster admin has set the default node selector in the inventory, CNS deployment fails.

Version-Release number of the following components:
openshift-ansible-3.6.173.0.5-3.git.0.522a92a.el7.noarch
ansible-2.3.1.0-3.el7.noarch

How reproducible:
Every time osm_default_node_selector is set. Commenting it out makes the deployment succeed.

Steps to Reproduce:
1. Set osm_default_node_selector='region=compute' or similar
2. Configure for CNS
3.

Actual results:
TASK [openshift_storage_glusterfs : Wait for GlusterFS pods] *************************************************************************************************
Tuesday 22 August 2017 11:26:51 +0200 (0:00:02.222) 0:22:39.540 ********
FAILED - RETRYING: Wait for GlusterFS pods (30 retries left).
FAILED - RETRYING: Wait for GlusterFS pods (29 retries left).
FAILED - RETRYING: Wait for GlusterFS pods (28 retries left).
FAILED - RETRYING: Wait for GlusterFS pods (27 retries left).
FAILED - RETRYING: Wait for GlusterFS pods (26 retries left).
FAILED - RETRYING: Wait for GlusterFS pods (25 retries left).
FAILED - RETRYING: Wait for GlusterFS pods (24 retries left).
FAILED - RETRYING: Wait for GlusterFS pods (23 retries left).
FAILED - RETRYING: Wait for GlusterFS pods (22 retries left).
FAILED - RETRYING: Wait for GlusterFS pods (21 retries left).
FAILED - RETRYING: Wait for GlusterFS pods (20 retries left).
FAILED - RETRYING: Wait for GlusterFS pods (19 retries left).
FAILED - RETRYING: Wait for GlusterFS pods (18 retries left).
FAILED - RETRYING: Wait for GlusterFS pods (17 retries left).
FAILED - RETRYING: Wait for GlusterFS pods (16 retries left).
FAILED - RETRYING: Wait for GlusterFS pods (15 retries left).
FAILED - RETRYING: Wait for GlusterFS pods (14 retries left).
FAILED - RETRYING: Wait for GlusterFS pods (13 retries left).
FAILED - RETRYING: Wait for GlusterFS pods (12 retries left).
FAILED - RETRYING: Wait for GlusterFS pods (11 retries left).
FAILED - RETRYING: Wait for GlusterFS pods (10 retries left). FAILED - RETRYING: Wait for GlusterFS pods (9 retries left). FAILED - RETRYING: Wait for GlusterFS pods (8 retries left). FAILED - RETRYING: Wait for GlusterFS pods (7 retries left). FAILED - RETRYING: Wait for GlusterFS pods (6 retries left). FAILED - RETRYING: Wait for GlusterFS pods (5 retries left). FAILED - RETRYING: Wait for GlusterFS pods (4 retries left). FAILED - RETRYING: Wait for GlusterFS pods (3 retries left). FAILED - RETRYING: Wait for GlusterFS pods (2 retries left). FAILED - RETRYING: Wait for GlusterFS pods (1 retries left). fatal: [master1.example.com]: FAILED! => { "attempts": 30, "changed": false, "failed": true, "results": { "cmd": "/usr/local/bin/oc get pod --selector=glusterfs=storage-pod -o json -n glusterfs", "results": [ { "apiVersion": "v1", "items": [ { "apiVersion": "v1", "kind": "Pod", "metadata": { "annotations": { "kubernetes.io/created-by": "{\"kind\":\"SerializedReference\",\"apiVersion\":\"v1\",\"reference\":{\"kind\":\"DaemonSet\",\"namespace\":\"glusterfs\",\"name\":\"glusterfs-storage\",\"uid\":\"07266f0c-871c-11e7-9415-525400448a7a\",\"apiVersion\":\"extensions\",\"resourceVersion\":\"18143\"}}\n", "openshift.io/scc": "privileged" }, "creationTimestamp": "2017-08-22T09:32:09Z", "generateName": "glusterfs-storage-", "labels": { "glusterfs": "storage-pod", "glusterfs-node": "pod", "pod-template-generation": "1" }, "name": "glusterfs-storage-50r9z", "namespace": "glusterfs", "ownerReferences": [ { "apiVersion": "extensions/v1beta1", "blockOwnerDeletion": true, "controller": true, "kind": "DaemonSet", "name": "glusterfs-storage", "uid": "07266f0c-871c-11e7-9415-525400448a7a" } ], "resourceVersion": "18168", "selfLink": "/api/v1/namespaces/glusterfs/pods/glusterfs-storage-50r9z", "uid": "c52e0c6e-871c-11e7-9415-525400448a7a" }, "spec": { "containers": [ { "image": "rhgs3/rhgs-server-rhel7:latest", "imagePullPolicy": "IfNotPresent", "livenessProbe": { "exec": { 
"command": [ "/bin/bash", "-c", "systemctl status glusterd.service" ] }, "failureThreshold": 15, "initialDelaySeconds": 40, "periodSeconds": 25, "successThreshold": 1, "timeoutSeconds": 3 }, "name": "glusterfs", "readinessProbe": { "exec": { "command": [ "/bin/bash", "-c", "systemctl status glusterd.service" ] }, "failureThreshold": 15, "initialDelaySeconds": 40, "periodSeconds": 25, "successThreshold": 1, "timeoutSeconds": 3 }, "resources": {}, "securityContext": { "privileged": true }, "terminationMessagePath": "/dev/termination-log", "terminationMessagePolicy": "File", "volumeMounts": [ { "mountPath": "/var/lib/heketi", "name": "glusterfs-heketi" }, { "mountPath": "/run", "name": "glusterfs-run" }, { "mountPath": "/run/lvm", "name": "glusterfs-lvm" }, { "mountPath": "/etc/glusterfs", "name": "glusterfs-etc" }, { "mountPath": "/var/log/glusterfs", "name": "glusterfs-logs" }, { "mountPath": "/var/lib/glusterd", "name": "glusterfs-config" }, { "mountPath": "/dev", "name": "glusterfs-dev" }, { "mountPath": "/var/lib/misc/glusterfsd", "name": "glusterfs-misc" }, { "mountPath": "/sys/fs/cgroup", "name": "glusterfs-cgroup", "readOnly": true }, { "mountPath": "/etc/ssl", "name": "glusterfs-ssl", "readOnly": true }, { "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount", "name": "default-token-kbm4z", "readOnly": true } ] } ], "dnsPolicy": "ClusterFirst", "hostNetwork": true, "imagePullSecrets": [ { "name": "default-dockercfg-bl4qd" } ], "nodeName": "infra1.example.com", "nodeSelector": { "glusterfs": "storage-host", "region": "compute" }, "restartPolicy": "Always", "schedulerName": "default-scheduler", "securityContext": {}, "serviceAccount": "default", "serviceAccountName": "default", "terminationGracePeriodSeconds": 30, "tolerations": [ { "effect": "NoExecute", "key": "node.alpha.kubernetes.io/notReady", "operator": "Exists" }, { "effect": "NoExecute", "key": "node.alpha.kubernetes.io/unreachable", "operator": "Exists" } ], "volumes": [ { "hostPath": { "path": 
"/var/lib/heketi" }, "name": "glusterfs-heketi" }, { "emptyDir": {}, "name": "glusterfs-run" }, { "hostPath": { "path": "/run/lvm" }, "name": "glusterfs-lvm" }, { "hostPath": { "path": "/etc/glusterfs" }, "name": "glusterfs-etc" }, { "hostPath": { "path": "/var/log/glusterfs" }, "name": "glusterfs-logs" }, { "hostPath": { "path": "/var/lib/glusterd" }, "name": "glusterfs-config" }, { "hostPath": { "path": "/dev" }, "name": "glusterfs-dev" }, { "hostPath": { "path": "/var/lib/misc/glusterfsd" }, "name": "glusterfs-misc" }, { "hostPath": { "path": "/sys/fs/cgroup" }, "name": "glusterfs-cgroup" }, { "hostPath": { "path": "/etc/ssl" }, "name": "glusterfs-ssl" }, { "name": "default-token-kbm4z", "secret": { "defaultMode": 420, "secretName": "default-token-kbm4z" } } ] }, "status": { "message": "Pod Predicate MatchNodeSelector failed", "phase": "Failed", "reason": "MatchNodeSelector", "startTime": "2017-08-22T09:32:09Z" } }, { "apiVersion": "v1", "kind": "Pod", "metadata": { "annotations": { "kubernetes.io/created-by": "{\"kind\":\"SerializedReference\",\"apiVersion\":\"v1\",\"reference\":{\"kind\":\"DaemonSet\",\"namespace\":\"glusterfs\",\"name\":\"glusterfs-storage\",\"uid\":\"07266f0c-871c-11e7-9415-525400448a7a\",\"apiVersion\":\"extensions\",\"resourceVersion\":\"18143\"}}\n", "openshift.io/scc": "privileged" }, "creationTimestamp": "2017-08-22T09:32:09Z", "generateName": "glusterfs-storage-", "labels": { "glusterfs": "storage-pod", "glusterfs-node": "pod", "pod-template-generation": "1" }, "name": "glusterfs-storage-btxsk", "namespace": "glusterfs", "ownerReferences": [ { "apiVersion": "extensions/v1beta1", "blockOwnerDeletion": true, "controller": true, "kind": "DaemonSet", "name": "glusterfs-storage", "uid": "07266f0c-871c-11e7-9415-525400448a7a" } ], "resourceVersion": "18178", "selfLink": "/api/v1/namespaces/glusterfs/pods/glusterfs-storage-btxsk", "uid": "c549bea4-871c-11e7-9415-525400448a7a" }, "spec": { "containers": [ { "image": 
"rhgs3/rhgs-server-rhel7:latest", "imagePullPolicy": "IfNotPresent", "livenessProbe": { "exec": { "command": [ "/bin/bash", "-c", "systemctl status glusterd.service" ] }, "failureThreshold": 15, "initialDelaySeconds": 40, "periodSeconds": 25, "successThreshold": 1, "timeoutSeconds": 3 }, "name": "glusterfs", "readinessProbe": { "exec": { "command": [ "/bin/bash", "-c", "systemctl status glusterd.service" ] }, "failureThreshold": 15, "initialDelaySeconds": 40, "periodSeconds": 25, "successThreshold": 1, "timeoutSeconds": 3 }, "resources": {}, "securityContext": { "privileged": true }, "terminationMessagePath": "/dev/termination-log", "terminationMessagePolicy": "File", "volumeMounts": [ { "mountPath": "/var/lib/heketi", "name": "glusterfs-heketi" }, { "mountPath": "/run", "name": "glusterfs-run" }, { "mountPath": "/run/lvm", "name": "glusterfs-lvm" }, { "mountPath": "/etc/glusterfs", "name": "glusterfs-etc" }, { "mountPath": "/var/log/glusterfs", "name": "glusterfs-logs" }, { "mountPath": "/var/lib/glusterd", "name": "glusterfs-config" }, { "mountPath": "/dev", "name": "glusterfs-dev" }, { "mountPath": "/var/lib/misc/glusterfsd", "name": "glusterfs-misc" }, { "mountPath": "/sys/fs/cgroup", "name": "glusterfs-cgroup", "readOnly": true }, { "mountPath": "/etc/ssl", "name": "glusterfs-ssl", "readOnly": true }, { "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount", "name": "default-token-kbm4z", "readOnly": true } ] } ], "dnsPolicy": "ClusterFirst", "hostNetwork": true, "imagePullSecrets": [ { "name": "default-dockercfg-bl4qd" } ], "nodeName": "infra3.example.com", "nodeSelector": { "glusterfs": "storage-host", "region": "compute" }, "restartPolicy": "Always", "schedulerName": "default-scheduler", "securityContext": {}, "serviceAccount": "default", "serviceAccountName": "default", "terminationGracePeriodSeconds": 30, "tolerations": [ { "effect": "NoExecute", "key": "node.alpha.kubernetes.io/notReady", "operator": "Exists" }, { "effect": "NoExecute", "key": 
"node.alpha.kubernetes.io/unreachable", "operator": "Exists" } ], "volumes": [ { "hostPath": { "path": "/var/lib/heketi" }, "name": "glusterfs-heketi" }, { "emptyDir": {}, "name": "glusterfs-run" }, { "hostPath": { "path": "/run/lvm" }, "name": "glusterfs-lvm" }, { "hostPath": { "path": "/etc/glusterfs" }, "name": "glusterfs-etc" }, { "hostPath": { "path": "/var/log/glusterfs" }, "name": "glusterfs-logs" }, { "hostPath": { "path": "/var/lib/glusterd" }, "name": "glusterfs-config" }, { "hostPath": { "path": "/dev" }, "name": "glusterfs-dev" }, { "hostPath": { "path": "/var/lib/misc/glusterfsd" }, "name": "glusterfs-misc" }, { "hostPath": { "path": "/sys/fs/cgroup" }, "name": "glusterfs-cgroup" }, { "hostPath": { "path": "/etc/ssl" }, "name": "glusterfs-ssl" }, { "name": "default-token-kbm4z", "secret": { "defaultMode": 420, "secretName": "default-token-kbm4z" } } ] }, "status": { "message": "Pod Predicate MatchNodeSelector failed", "phase": "Failed", "reason": "MatchNodeSelector", "startTime": "2017-08-22T09:32:10Z" } }, { "apiVersion": "v1", "kind": "Pod", "metadata": { "annotations": { "kubernetes.io/created-by": "{\"kind\":\"SerializedReference\",\"apiVersion\":\"v1\",\"reference\":{\"kind\":\"DaemonSet\",\"namespace\":\"glusterfs\",\"name\":\"glusterfs-storage\",\"uid\":\"07266f0c-871c-11e7-9415-525400448a7a\",\"apiVersion\":\"extensions\",\"resourceVersion\":\"18143\"}}\n", "openshift.io/scc": "privileged" }, "creationTimestamp": "2017-08-22T09:32:09Z", "generateName": "glusterfs-storage-", "labels": { "glusterfs": "storage-pod", "glusterfs-node": "pod", "pod-template-generation": "1" }, "name": "glusterfs-storage-ltxl7", "namespace": "glusterfs", "ownerReferences": [ { "apiVersion": "extensions/v1beta1", "blockOwnerDeletion": true, "controller": true, "kind": "DaemonSet", "name": "glusterfs-storage", "uid": "07266f0c-871c-11e7-9415-525400448a7a" } ], "resourceVersion": "18171", "selfLink": "/api/v1/namespaces/glusterfs/pods/glusterfs-storage-ltxl7", "uid": 
"c52e241f-871c-11e7-9415-525400448a7a" }, "spec": { "containers": [ { "image": "rhgs3/rhgs-server-rhel7:latest", "imagePullPolicy": "IfNotPresent", "livenessProbe": { "exec": { "command": [ "/bin/bash", "-c", "systemctl status glusterd.service" ] }, "failureThreshold": 15, "initialDelaySeconds": 40, "periodSeconds": 25, "successThreshold": 1, "timeoutSeconds": 3 }, "name": "glusterfs", "readinessProbe": { "exec": { "command": [ "/bin/bash", "-c", "systemctl status glusterd.service" ] }, "failureThreshold": 15, "initialDelaySeconds": 40, "periodSeconds": 25, "successThreshold": 1, "timeoutSeconds": 3 }, "resources": {}, "securityContext": { "privileged": true }, "terminationMessagePath": "/dev/termination-log", "terminationMessagePolicy": "File", "volumeMounts": [ { "mountPath": "/var/lib/heketi", "name": "glusterfs-heketi" }, { "mountPath": "/run", "name": "glusterfs-run" }, { "mountPath": "/run/lvm", "name": "glusterfs-lvm" }, { "mountPath": "/etc/glusterfs", "name": "glusterfs-etc" }, { "mountPath": "/var/log/glusterfs", "name": "glusterfs-logs" }, { "mountPath": "/var/lib/glusterd", "name": "glusterfs-config" }, { "mountPath": "/dev", "name": "glusterfs-dev" }, { "mountPath": "/var/lib/misc/glusterfsd", "name": "glusterfs-misc" }, { "mountPath": "/sys/fs/cgroup", "name": "glusterfs-cgroup", "readOnly": true }, { "mountPath": "/etc/ssl", "name": "glusterfs-ssl", "readOnly": true }, { "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount", "name": "default-token-kbm4z", "readOnly": true } ] } ], "dnsPolicy": "ClusterFirst", "hostNetwork": true, "imagePullSecrets": [ { "name": "default-dockercfg-bl4qd" } ], "nodeName": "infra2.example.com", "nodeSelector": { "glusterfs": "storage-host", "region": "compute" }, "restartPolicy": "Always", "schedulerName": "default-scheduler", "securityContext": {}, "serviceAccount": "default", "serviceAccountName": "default", "terminationGracePeriodSeconds": 30, "tolerations": [ { "effect": "NoExecute", "key": 
"node.alpha.kubernetes.io/notReady", "operator": "Exists" }, { "effect": "NoExecute", "key": "node.alpha.kubernetes.io/unreachable", "operator": "Exists" } ], "volumes": [ { "hostPath": { "path": "/var/lib/heketi" }, "name": "glusterfs-heketi" }, { "emptyDir": {}, "name": "glusterfs-run" }, { "hostPath": { "path": "/run/lvm" }, "name": "glusterfs-lvm" }, { "hostPath": { "path": "/etc/glusterfs" }, "name": "glusterfs-etc" }, { "hostPath": { "path": "/var/log/glusterfs" }, "name": "glusterfs-logs" }, { "hostPath": { "path": "/var/lib/glusterd" }, "name": "glusterfs-config" }, { "hostPath": { "path": "/dev" }, "name": "glusterfs-dev" }, { "hostPath": { "path": "/var/lib/misc/glusterfsd" }, "name": "glusterfs-misc" }, { "hostPath": { "path": "/sys/fs/cgroup" }, "name": "glusterfs-cgroup" }, { "hostPath": { "path": "/etc/ssl" }, "name": "glusterfs-ssl" }, { "name": "default-token-kbm4z", "secret": { "defaultMode": 420, "secretName": "default-token-kbm4z" } } ] }, "status": { "message": "Pod Predicate MatchNodeSelector failed", "phase": "Failed", "reason": "MatchNodeSelector", "startTime": "2017-08-22T09:32:09Z" } } ], "kind": "List", "metadata": {}, "resourceVersion": "", "selfLink": "" } ], "returncode": 0 }, "state": "list" } Expected results: Installation should succeed.
What's the "oc describe" output for the GlusterFS daemonset and the cluster nodes?
Offhand, I imagine the problem is that the DaemonSet is looking for a node with both the GlusterFS label and the osm_default_node_selector label. I don't think it would be a good idea to automatically label all GlusterFS nodes with the default label, since that opens up the possibility of other, potentially unwanted, pods running there. Scott, is there any way to ignore osm_default_node_selector?
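If that hypothesis is right, the effective scheduling constraint would look something like the following (a sketch, assuming the glusterfs project carries no openshift.io/node-selector annotation of its own, so the cluster default from osm_default_node_selector gets merged into every pod in it):

```yaml
# Sketch (not actual cluster output): the labels a node would need to carry
# for the GlusterFS pods to schedule.
#
# From the DaemonSet pod template:
#   glusterfs: storage-host
# Merged in from the cluster default node selector (osm_default_node_selector):
#   region: compute
#
# Effective nodeSelector on each pod (this matches the dump in the report):
nodeSelector:
  glusterfs: storage-host
  region: compute
# A node lacking either label fails the scheduler's MatchNodeSelector
# predicate, which is exactly the "Pod Predicate MatchNodeSelector failed"
# status shown above.
```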
Created attachment 1316679 [details] nodes
Created attachment 1316680 [details] daemonset
Uploaded the information as requested. Also, in my case, the default node selector does not match any of the gluster nodes.
PR for this is upstream: https://github.com/openshift/openshift-ansible/pull/5316
PR is merged
No openshift-ansible build is attached to the errata and there is no errata puddle; moving this to "MODIFIED".
Failed to verify in version openshift-ansible-3.6.173.0.35-1.git.0.6c318bc.el7. Installation failed when osm_default_node_selector='region=compute' was set, because there were not three nodes with the label "region=compute". It succeeded with osm_default_node_selector='role=node', because the glusterfs nodes all have the label "role=node". Judging by the task "Verify target namespace exists", there seems to be no difference between openshift-ansible-3.6.173.0.35-1.git.0.6c318bc.el7 and openshift-ansible-3.6.173.0.5-3.git.0.522a92a.el7.

# cat roles/openshift_storage_glusterfs/tasks/glusterfs_common.yml
...
- name: Verify target namespace exists
  oc_project:
    state: present
    name: "{{ glusterfs_namespace }}"
    node_selector: "{% if glusterfs_use_default_selector %}{{ omit }}{% endif %}"
  when: glusterfs_is_native or glusterfs_heketi_is_native or glusterfs_storageclass
...
I mean there is no difference between those versions' output. The code did change.

# cat roles/openshift_storage_glusterfs/tasks/glusterfs_common.yml
...
node_selector: "{% if glusterfs_use_default_selector %}{{ omit }}{% endif %}"
...
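To spell out my reading of that expression (an interpretation of the snippet, not verified against the oc_project module source): when glusterfs_use_default_selector is True, the Jinja2 template renders Ansible's special omit value, so oc_project is called without a node_selector and the new namespace inherits the cluster-wide default; when it is False, the template renders the empty string, which should override the default with no selector at all. Roughly:

```yaml
# Sketch of the two rendered outcomes of the task in glusterfs_common.yml.

# glusterfs_use_default_selector: True
- oc_project:
    state: present
    name: glusterfs
    # node_selector omitted -> namespace inherits osm_default_node_selector

# glusterfs_use_default_selector: False (the default)
- oc_project:
    state: present
    name: glusterfs
    node_selector: ""   # empty selector, overriding the cluster default
```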
And you didn't set openshift_storage_glusterfs_use_default_selector=True?
(In reply to Jose A. Rivera from comment #14)
> And you didn't set openshift_storage_glusterfs_use_default_selector=True?

Correct, I did not set it. It seems it should be False by default:

# grep -nir "openshift_storage_glusterfs_use_default_selector" .
./roles/openshift_storage_glusterfs/defaults/main.yml:6:openshift_storage_glusterfs_use_default_selector: False
The following PR should resolve the issue: https://github.com/openshift/openshift-ansible/pull/5608
PR is merged.
Verified with version openshift-ansible-3.7.0-0.159.0.git.0.0cf8cf6.el7: set osm_default_node_selector='region=compute', where there were not three nodes with the label "region=compute". Installation succeeded.
Would this work as a workaround: if osm_default_node_selector='region=compute', you can set a node selector manually, e.g. 'region=gluster', on the project itself (oc edit namespace or something similar) or on the DaemonSet. A conflict can be averted as long as you override the defaults at a finer granularity.
So, the cluster at large will follow the default node selector, but the gluster pods can follow a different node selector if you architect properly:

node1 region=infra, zone=apps
node2 region=infra, zone=internal
node3 region=compute, zone=apps
node4 region=compute, zone=internal

If the default node selector is region=infra, then you can set the node selector in the DaemonSet to go against this with region=compute. Or, if you don't want to force it to the compute nodes, but rather to zone=internal, you can set (in the namespace definition):

metadata:
  annotations:
    openshift.io/node-selector: ""

And in the DaemonSet set the node selector to zone=internal.
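Put together as manifests, that combination might look like this (a sketch with illustrative names; the annotation key openshift.io/node-selector is the documented project-level override):

```yaml
# Namespace with an empty node selector, so the cluster default
# (e.g. region=infra) is NOT merged into pods in this project:
apiVersion: v1
kind: Namespace
metadata:
  name: glusterfs
  annotations:
    openshift.io/node-selector: ""
---
# DaemonSet pod template fragment -- only the relevant field is shown.
# The pods then follow this selector alone:
spec:
  template:
    spec:
      nodeSelector:
        zone: internal
```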
For me, setting the node selector on the DaemonSet had no influence; it has to be set on the namespace. I also tried creating the glusterfs project via openshift_additional_projects and setting either no node selector or the same one the role sets by default (glusterfs=storage-host), but that didn't help. So as of now it seems the only option is to manually set a node selector on the namespace, unless someone else has another idea.
Yes, you have to manually set a node selector on the entire namespace. Note that this selector can be "" (the empty string), which overwrites the default node selector with nothing thus imposing no additional node selector on pods in that namespace.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188