Description of problem:
A pod using local storage is not scheduled to the node where its local storage exists.

Version-Release number of selected component (if applicable):
openshift v3.7.0-0.126.4
kubernetes v1.7.0+80709908fd

How reproducible:
Always

Steps to Reproduce:
1. Create an OCP cluster with 2 nodes.
2. On each node, create a directory (/mnt/disks/vol1).
3. Create PVCs, making sure they use PVs from different nodes.
4. Create pods using the PVCs created in the previous step.
5. All pods should be scheduled to the correct nodes and be running.

Actual results:
One pod's status is "ContainerCreating", and it reports:
FailedMount  Storage node affinity check failed for volume "local-pv-dc7ed566" : NodeSelectorTerm [{Key:kubernetes.io/hostname Operator:In Values:[host-8-241-73.host.centralci.eng.rdu2.redhat.com]}] does not match node labels

Expected results:
Pod's status is "Running"

Master Log:

Node Log (of failed PODs):
Sep 20 03:50:18 host-8-241-40 atomic-openshift-node: E0920 03:50:18.903901 10056 reconciler.go:253] operationExecutor.MountVolume failed (controllerAttachDetachEnabled true) for volume "local-pv-ae231eff" (UniqueName: "kubernetes.io/local-volume/88c2a7b0-9dd5-11e7-b4ca-fa163e501ea4-local-pv-ae231eff") pod "local-volume-pod-2" (UID: "88c2a7b0-9dd5-11e7-b4ca-fa163e501ea4") : Storage node affinity check failed for volume "local-pv-ae231eff" (UniqueName: "kubernetes.io/local-volume/88c2a7b0-9dd5-11e7-b4ca-fa163e501ea4-local-pv-ae231eff") pod "local-volume-pod-2" (UID: "88c2a7b0-9dd5-11e7-b4ca-fa163e501ea4") : NodeSelectorTerm [{Key:kubernetes.io/hostname Operator:In Values:[host-8-241-73.host.centralci.eng.rdu2.redhat.com]}] does not match node labels

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
$ oc describe pod local-volume-pod-2
Name:              local-volume-pod-2
Namespace:         mytest
Security Policy:   restricted
Node:              host-8-241-40.host.centralci.eng.rdu2.redhat.com/172.16.120.57
Start Time:        Wed, 20 Sep 2017 15:30:10 +0800
Labels:            <none>
Status:            Pending
IP:
Controllers:       <none>
Containers:
  myfront:
    Container ID:
    Image:           aosqe/hello-openshift
    Image ID:
    Port:            80/TCP
    State:           Waiting
      Reason:        ContainerCreating
    Ready:           False
    Restart Count:   0
    Volume Mounts:
      /mnt/local from pvol (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-9sr9w (ro)
    Environment Variables:   <none>
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  pvol:
    Type:        PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:   localstorageclaim-2
    ReadOnly:    false
  default-token-9sr9w:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-9sr9w
QoS Class:       BestEffort
Tolerations:     <none>
Events:
  FirstSeen  LastSeen  Count  From                                                        SubObjectPath  Type     Reason                 Message
  ---------  --------  -----  ----                                                        -------------  -------  ------                 -------
  15m        15m       1      {default-scheduler }                                                       Normal   Scheduled              Successfully assigned local-volume-pod-2 to host-8-241-40.host.centralci.eng.rdu2.redhat.com
  15m        15m       1      {kubelet host-8-241-40.host.centralci.eng.rdu2.redhat.com}                 Normal   SuccessfulMountVolume  MountVolume.SetUp succeeded for volume "default-token-9sr9w"
  15m        57s       8970   {kubelet host-8-241-40.host.centralci.eng.rdu2.redhat.com}                 Warning  FailedMount            Storage node affinity check failed for volume "local-pv-ae231eff" : NodeSelectorTerm [{Key:kubernetes.io/hostname Operator:In Values:[host-8-241-73.host.centralci.eng.rdu2.redhat.com]}] does not match node labels
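The kubelet's affinity check compares the PV's node-affinity selector against the labels of the node the pod actually landed on, so the failure above means the scheduler placed the pod without considering the PV's node constraint. Since the PV Dump section was left empty, here is a rough sketch of what a local PV looks like in this setup; the capacity, storage class name, and PV name are illustrative assumptions, not values taken from this cluster (in Kubernetes 1.7 local volumes are alpha, and node affinity is carried in an annotation rather than a first-class spec field):

```yaml
# Hypothetical local PV for this reproduction (values are illustrative).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-ae231eff
  annotations:
    # Alpha-annotation form of node affinity used by local volumes in k8s 1.7.
    "volume.alpha.kubernetes.io/node-affinity": |
      {
        "requiredDuringSchedulingIgnoredDuringExecution": {
          "nodeSelectorTerms": [
            { "matchExpressions": [
                { "key": "kubernetes.io/hostname",
                  "operator": "In",
                  "values": ["host-8-241-73.host.centralci.eng.rdu2.redhat.com"] }
            ]}
          ]
        }
      }
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/vol1
```

With a PV like this bound to the pod's PVC, a scheduler that honors the volume's node affinity should only place the pod on host-8-241-73, not host-8-241-40.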
In the controller-manager logs I can see that the scheduler is started with these predicates and priorities:

Sep 19 22:29:11 host-8-241-62.host.centralci.eng.rdu2.redhat.com atomic-openshift-master-controllers[18383]: I0919 22:29:11.935972 18383 factory.go:351] Creating scheduler from configuration: {{ }[ {NoVolumeZoneConflict <nil>} {MaxEBSVolumeCount <nil>} {MaxGCEPDVolumeCount <nil>} {MatchInterPodAffinity <nil>} {NoDiskConflict <nil>} {GeneralPredicates <nil>} {PodToleratesNodeTaints <nil>} {CheckNodeMemoryPressure <nil>} {CheckNodeDiskPressure <nil>} {Region 0xc4210392d0}] [{SelectorSpreadPriority 1 <nil>} {InterPodAffinityPriority 1 <nil>} {LeastRequestedPriority 1 <nil>} {BalancedResourceAllocation 1 <nil>} {NodePreferAvoidPodsPriority 10000 <nil>} {NodeAffinityPriority 1 <nil>} {TaintTolerationPriority 1 <nil>} {Zone 2 0xc421039990}] [] 0}
(edited for readability)

"NoVolumeNodeConflict" is missing there for some reason.
The NoVolumeNodeConflict predicate is not configured in /etc/origin/master/scheduler.json; it must be added there by the installer. As a workaround, you can add it manually and restart atomic-openshift-master-controllers:

...
"predicates": [
    {
        "name": "NoVolumeNodeConflict"
    },
...

Note that the predicate is enabled when running a local cluster with a plain "openshift start", i.e. unconfigured OpenShift works out of the box.
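For context, a scheduler policy file with the workaround applied has roughly the following shape. This is an abridged sketch, not the full installer-generated default: the predicate names are taken from the controller-manager log quoted earlier, but the priority list is truncated and the exact contents of your scheduler.json may differ:

```json
{
    "kind": "Policy",
    "apiVersion": "v1",
    "predicates": [
        { "name": "NoVolumeZoneConflict" },
        { "name": "MaxEBSVolumeCount" },
        { "name": "MaxGCEPDVolumeCount" },
        { "name": "MatchInterPodAffinity" },
        { "name": "NoDiskConflict" },
        { "name": "GeneralPredicates" },
        { "name": "PodToleratesNodeTaints" },
        { "name": "NoVolumeNodeConflict" }
    ],
    "priorities": [
        { "name": "LeastRequestedPriority", "weight": 1 },
        { "name": "BalancedResourceAllocation", "weight": 1 },
        { "name": "NodeAffinityPriority", "weight": 1 }
    ]
}
```

After editing the file, the masters pick up the change only on a restart of atomic-openshift-master-controllers.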
I'll try to fix it on my own; I need to get familiar with openshift-ansible anyway, and this looks like a trivial template modification.
Ansible PR: https://github.com/openshift/openshift-ansible/pull/5492
Verified the workaround in the following version:
openshift v3.7.0-0.126.4
kubernetes v1.7.0+80709908fd

The pod can be scheduled to the correct node.
Verified that the "NoVolumeNodeConflict" and "MaxAzureDiskVolumeCount" predicates are installed in the following version:
oc v3.7.0-0.131.0
kubernetes v1.7.0+80709908fd
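For anyone re-verifying on another cluster, a quick grep of the policy file tells whether the installer wrote the predicate. This is a sketch; the path is the OCP 3.7 default and the POLICY variable is introduced here for illustration:

```shell
# Check whether the scheduler policy file lists NoVolumeNodeConflict.
# Default path on an OCP 3.7 master; override POLICY to test another file.
POLICY="${POLICY:-/etc/origin/master/scheduler.json}"
if grep -q '"NoVolumeNodeConflict"' "$POLICY"; then
    echo "predicate present"
else
    echo "predicate missing"
fi
```

The running configuration can also be confirmed from the "Creating scheduler from configuration" line in the atomic-openshift-master-controllers journal, as quoted earlier in this bug.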
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188