Bug 1493432
| Summary: | Pod scheduled failed when it uses a local storage | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Qin Ping <piqin> |
| Component: | Installer | Assignee: | Jan Safranek <jsafrane> |
| Status: | CLOSED ERRATA | QA Contact: | Qin Ping <piqin> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.7.0 | CC: | aos-bugs, aos-storage-staff, jokerman, jsafrane, mmccomas |
| Target Milestone: | --- | | |
| Target Release: | 3.7.0 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-11-28 22:12:03 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
In the controller-manager logs I can see that the scheduler is started with these predicates and priorities:
zář 19 22:29:11 host-8-241-62.host.centralci.eng.rdu2.redhat.com atomic-openshift-master-controllers[18383]: I0919 22:29:11.935972 18383 factory.go:351] Creating scheduler from configuration: {{ }[
{NoVolumeZoneConflict <nil>}
{MaxEBSVolumeCount <nil>}
{MaxGCEPDVolumeCount <nil>}
{MatchInterPodAffinity <nil>}
{NoDiskConflict <nil>}
{GeneralPredicates <nil>}
{PodToleratesNodeTaints <nil>}
{CheckNodeMemoryPressure <nil>}
{CheckNodeDiskPressure <nil>}
{Region 0xc4210392d0}]
[{SelectorSpreadPriority 1 <nil>}
{InterPodAffinityPriority 1 <nil>}
{LeastRequestedPriority 1 <nil>}
{BalancedResourceAllocation 1 <nil>}
{NodePreferAvoidPodsPriority 10000 <nil>}
{NodeAffinityPriority 1 <nil>}
{TaintTolerationPriority 1 <nil>}
{Zone 2 0xc421039990}] [] 0}
(edited for readability)
"NoVolumeNodeConflict" is missing there for some reason.
The NoVolumeNodeConflict predicate is not configured in /etc/origin/master/scheduler.json. It must be added there by the installer.
As a workaround, you can manually add it there and restart atomic-openshift-master-controllers:
...
"predicates": [
{
"name": "NoVolumeNodeConflict"
},
...
Note that the predicate is enabled when running a local cluster with a plain "openshift start", i.e. an unconfigured OpenShift works out of the box.
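The manual workaround above could be scripted roughly as follows. This is only a sketch, not what the installer does: it assumes scheduler.json is a JSON object with a top-level "predicates" list, as in the fragment above, and the helper name `ensure_predicate` is made up for illustration. Restarting atomic-openshift-master-controllers afterwards is still a manual step.

```python
import json

# Path taken from the comment above; adjust if the master config lives elsewhere.
SCHEDULER_JSON = "/etc/origin/master/scheduler.json"


def ensure_predicate(policy, name="NoVolumeNodeConflict"):
    """Add {"name": name} to policy["predicates"] unless it is already there.

    Hypothetical helper. Returns True if the policy was modified.
    """
    predicates = policy.setdefault("predicates", [])
    if any(p.get("name") == name for p in predicates):
        return False
    predicates.append({"name": name})
    return True


if __name__ == "__main__":
    with open(SCHEDULER_JSON) as f:
        policy = json.load(f)
    if ensure_predicate(policy):
        # Rewrite the file only when something actually changed.
        with open(SCHEDULER_JSON, "w") as f:
            json.dump(policy, f, indent=4)
        print("added NoVolumeNodeConflict; restart atomic-openshift-master-controllers")
    else:
        print("NoVolumeNodeConflict already configured")
```

The check is idempotent, so running it twice leaves a single entry for the predicate.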
I'll try to fix it on my own; I need to get familiar with openshift-ansible anyway, and this looks like a trivial template modification.

Verified the workaround in the following version:
openshift v3.7.0-0.126.4
kubernetes v1.7.0+80709908fd
The Pod can be scheduled to the correct node.

Verified that the "NoVolumeNodeConflict" and "MaxAzureDiskVolumeCount" predicates were installed in the following version:
oc v3.7.0-0.131.0
kubernetes v1.7.0+80709908fd

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2017:3188
Description of problem:
A Pod using local storage was not scheduled to the node where the local storage exists.

Version-Release number of selected component (if applicable):
openshift v3.7.0-0.126.4
kubernetes v1.7.0+80709908fd

How reproducible:
Always

Steps to Reproduce:
1. Create an OCP cluster with 2 nodes.
2. On each node, create directories (/mnt/disks/vol1).
3. Create PVCs, making sure they use PVs from different nodes.
4. Create Pods that use the PVCs created in the previous step.
5. All Pods should be scheduled to the correct node and running.

Actual results:
One Pod's status is "ContainerCreating", and it reports:
FailedMount Storage node affinity check failed for volume "local-pv-dc7ed566" : NodeSelectorTerm [{Key:kubernetes.io/hostname Operator:In Values:[host-8-241-73.host.centralci.eng.rdu2.redhat.com]}] does not match node labels

Expected results:
Pod's status is "Running"

Master Log:

Node Log (of failed PODs):
Sep 20 03:50:18 host-8-241-40 atomic-openshift-node: E0920 03:50:18.903901 10056 reconciler.go:253] operationExecutor.MountVolume failed (controllerAttachDetachEnabled true) for volume "local-pv-ae231eff" (UniqueName: "kubernetes.io/local-volume/88c2a7b0-9dd5-11e7-b4ca-fa163e501ea4-local-pv-ae231eff") pod "local-volume-pod-2" (UID: "88c2a7b0-9dd5-11e7-b4ca-fa163e501ea4") : Storage node affinity check failed for volume "local-pv-ae231eff" (UniqueName: "kubernetes.io/local-volume/88c2a7b0-9dd5-11e7-b4ca-fa163e501ea4-local-pv-ae231eff") pod "local-volume-pod-2" (UID: "88c2a7b0-9dd5-11e7-b4ca-fa163e501ea4") : NodeSelectorTerm [{Key:kubernetes.io/hostname Operator:In Values:[host-8-241-73.host.centralci.eng.rdu2.redhat.com]}] does not match node labels

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
$ oc describe pod local-volume-pod-2
Name:            local-volume-pod-2
Namespace:       mytest
Security Policy: restricted
Node:            host-8-241-40.host.centralci.eng.rdu2.redhat.com/172.16.120.57
Start Time:      Wed, 20 Sep 2017 15:30:10 +0800
Labels:          <none>
Status:          Pending
IP:
Controllers:     <none>
Containers:
  myfront:
    Container ID:
    Image:          aosqe/hello-openshift
    Image ID:
    Port:           80/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Volume Mounts:
      /mnt/local from pvol (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-9sr9w (ro)
    Environment Variables: <none>
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  pvol:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  localstorageclaim-2
    ReadOnly:   false
  default-token-9sr9w:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-9sr9w
QoS Class:      BestEffort
Tolerations:    <none>
Events:
  FirstSeen  LastSeen  Count  From  SubObjectPath  Type  Reason  Message
  ---------  --------  -----  ----  -------------  --------  ------  -------
  15m  15m  1  {default-scheduler }  Normal  Scheduled  Successfully assigned local-volume-pod-2 to host-8-241-40.host.centralci.eng.rdu2.redhat.com
  15m  15m  1  {kubelet host-8-241-40.host.centralci.eng.rdu2.redhat.com}  Normal  SuccessfulMountVolume  MountVolume.SetUp succeeded for volume "default-token-9sr9w"
  15m  57s  8970  {kubelet host-8-241-40.host.centralci.eng.rdu2.redhat.com}  Warning  FailedMount  Storage node affinity check failed for volume "local-pv-ae231eff" : NodeSelectorTerm [{Key:kubernetes.io/hostname Operator:In Values:[host-8-241-73.host.centralci.eng.rdu2.redhat.com]}] does not match node labels
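
To illustrate why the kubelet rejects the mount, here is a simplified sketch of how a NodeSelectorTerm like the one in the FailedMount event is evaluated against a node's labels. It is not the Kubernetes implementation: it covers only the In, NotIn, Exists and DoesNotExist operators, the function name `term_matches` is made up, and the kubernetes.io/hostname label values are assumed to equal the node names shown above.

```python
def term_matches(term, node_labels):
    """Return True if every requirement in the term matches the node labels.

    `term` is a list of dicts with "key", "operator" and optional "values",
    mirroring the {Key, Operator, Values} triples in the error message.
    """
    for req in term:
        key = req["key"]
        op = req["operator"]
        values = req.get("values", [])
        if op == "In":
            ok = key in node_labels and node_labels[key] in values
        elif op == "NotIn":
            ok = node_labels.get(key) not in values
        elif op == "Exists":
            ok = key in node_labels
        elif op == "DoesNotExist":
            ok = key not in node_labels
        else:
            raise ValueError("operator not handled in this sketch: %s" % op)
        if not ok:
            return False
    return True


if __name__ == "__main__":
    # The PV is pinned to host-8-241-73, as in the event message.
    pv_term = [{
        "key": "kubernetes.io/hostname",
        "operator": "In",
        "values": ["host-8-241-73.host.centralci.eng.rdu2.redhat.com"],
    }]
    # Assumed labels of the node the Pod was actually assigned to.
    node_40 = {"kubernetes.io/hostname": "host-8-241-40.host.centralci.eng.rdu2.redhat.com"}
    # Assumed labels of the node that holds the local volume.
    node_73 = {"kubernetes.io/hostname": "host-8-241-73.host.centralci.eng.rdu2.redhat.com"}
    print(term_matches(pv_term, node_40))
    print(term_matches(pv_term, node_73))
```

Without NoVolumeNodeConflict in the scheduler policy, nothing performs this kind of check at scheduling time, so the mismatch only surfaces when the kubelet tries to mount the volume.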