Bug 1670241
| Summary: | How do gp2 PVs choose a zone? | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Hongkai Liu <hongkliu> |
| Component: | Storage | Assignee: | Hemant Kumar <hekumar> |
| Status: | CLOSED WONTFIX | QA Contact: | Liang Xia <lxia> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.1.0 | CC: | aos-bugs, aos-storage-staff, hongkliu, jsafrane, mifiedle |
| Target Milestone: | --- | | |
| Target Release: | 4.1.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-02-18 20:35:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
75 projects, each has one StatefulSet with 2 replicas, and each pod has a PVC generated by `volumeClaimTemplates` (the resulting PVC naming is sketched after the version output below). Tried with k8s 1.12; the problem is still there.
# oc get clusterversion version -o json | jq .status.desired
{
"image": "registry.svc.ci.openshift.org/ocp/release@sha256:d03ce0ef85540a1fff8bfc1c408253404aaecb2b958d7c3f24896f3597c3715b",
"version": "4.0.0-0.nightly-2019-01-30-145955"
}
# oc version
oc v4.0.0-0.150.0
kubernetes v1.12.4+f39ab668d3
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://hongkliu28-api.qe.devcluster.openshift.com:6443
kubernetes v1.12.4+f39ab668d3
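For reference, a PVC created from a StatefulSet's `volumeClaimTemplates` entry is named `<template>-<statefulset>-<ordinal>`. The sketch below only illustrates that convention (it is not Kubernetes code), using the `www`/`web0` names that appear in the pod dump later in this report:

```go
// Minimal sketch of the StatefulSet PVC naming convention:
// <volumeClaimTemplate name>-<StatefulSet name>-<pod ordinal>.
package main

import "fmt"

// pvcName is an illustrative helper, not a Kubernetes API: claim template
// "www" on pod "web0-1" yields the PVC name "www-web0-1".
func pvcName(template, statefulSet string, ordinal int) string {
	return fmt.Sprintf("%s-%s-%d", template, statefulSet, ordinal)
}

func main() {
	// One StatefulSet ("web0") with 2 replicas, as in this report.
	for ordinal := 0; ordinal < 2; ordinal++ {
		fmt.Println(pvcName("www", "web0", ordinal)) // www-web0-0, www-web0-1
	}
}
```

If the same template and StatefulSet names are reused in every namespace, the generated claim names repeat across namespaces and only the namespace differs, which matters for the zone selection described in the next comment.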
Placement of dynamically provisioned volumes is based *only* on the PVC name. The name is hashed, the hash is divided by the number of zones, and the remainder is used as the index of the zone (see the sketch after the console output below). This works well as long as each PVC has a different name: the hashes differ and PVs are provisioned roughly equally among zones. If the PVCs have the same names (in different namespaces), they have the same hash and the PVs are provisioned in the same zones. Bug #1663012 tries to fix that, but changing the hashing algorithm on a Kubernetes update looks like a significant behavior change.

Can you use different StatefulSet names in each namespace? That should help with this issue. And since this is 4.0, setting `volumeBindingMode: WaitForFirstConsumer` in the storage class should fix it too, even with the same PVC names in all namespaces: with delayed binding, the volume is provisioned only after the pod is scheduled, so the zone comes from the chosen node rather than from the claim-name hash.

It works with `volumeBindingMode: WaitForFirstConsumer`:
# cat ~/gp2b.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
annotations:
creationTimestamp: 2019-02-18T14:16:17Z
labels:
cluster.storage.openshift.io/owner-name: cluster-config-v1
cluster.storage.openshift.io/owner-namespace: kube-system
name: gp2b
resourceVersion: "9640"
selfLink: /apis/storage.k8s.io/v1/storageclasses/gp2
uid: c1904b3f-3387-11e9-9c73-0ac06c3388a2
parameters:
type: gp2
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
# oc get clusterversion version -o json | jq -r .status.desired
{
"image": "registry.svc.ci.openshift.org/ocp/release@sha256:9f37d93acf2e7442e5bf74f06ca253e37ba299e89bbb66fb30b2cafda6c3d217",
"version": "4.0.0-0.ci-2019-02-18-105238"
}
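A minimal sketch of the zone selection described in the comment above: hash the PVC name and use the remainder after dividing by the number of zones as the zone index. The FNV hash and the `chooseZone` helper are stand-ins chosen for illustration; the in-tree provisioner's actual hash differs, so only the behaviour (identical claim names always map to the same zone) is representative, not the exact zone assignments:

```go
// Illustration of name-hash-based zone selection for dynamically
// provisioned volumes. Not the actual Kubernetes implementation.
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// chooseZone hashes the claim name and picks a zone by taking the hash
// modulo the number of zones.
func chooseZone(pvcName string, zones []string) string {
	sort.Strings(zones) // deterministic zone ordering
	h := fnv.New32a()
	h.Write([]byte(pvcName))
	return zones[h.Sum32()%uint32(len(zones))]
}

func main() {
	zones := []string{"us-east-2a", "us-east-2b", "us-east-2c"}

	// The same claim name in two different namespaces produces the same
	// hash, so both PVs land in the same zone.
	fmt.Println(chooseZone("www-web0-0", zones))
	fmt.Println(chooseZone("www-web0-0", zones))

	// A differently named claim may land in a different zone, which is why
	// distinct PVC names spread roughly evenly across zones.
	fmt.Println(chooseZone("www-web1-0", zones))
}
```

With `volumeBindingMode: WaitForFirstConsumer`, as in the `gp2b` storage class above, this name-based choice no longer decides placement: provisioning waits until the pod is scheduled and the volume is created in the selected node's zone.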
Description of problem:
3 masters and 6 workers (m5.4xlarge)
6 workers: 2 in us-east-2a, 2 in us-east-2b, 2 in us-east-2c

75 StatefulSets behave differently when they are in ONE project and when they are in 75 projects.

How does a gp2 PV choose its availability zone when it gets created?
What is the difference between one project and 75 projects?

Version-Release number of selected component (if applicable):
# oc get clusterversion version -o json | jq -r .status.desired
{
  "image": "registry.svc.ci.openshift.org/ocp/release@sha256:ef5a60a10812f2fa1e4c93a5042c1520ca55675f9b4085b08579510d71031047",
  "version": "4.0.0-0.nightly-2019-01-25-205123"
}

How reproducible:
3/3

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:

StorageClass Dump (if StorageClass used by PV/PVC):

Additional info:
75 projects, each has one statefulset with 2 replicas and each pod has a PVC generated by `volumeClaimTemplates`

# oc get pv --no-headers | wc -l
150
# oc get pod --all-namespaces | grep clusterb | grep Running | wc -l
150
# oc describe pv | grep zone | grep 2a | wc -l
50
# oc describe pv | grep zone | grep 2b | wc -l
50
# oc describe pv | grep zone | grep 2c | wc -l
50

However, if 1 project has all 75 statefulsets:
# oc describe pv | grep zone | grep 2a | wc -l
50
# oc describe pv | grep zone | grep 2b | wc -l
0
# oc describe pv | grep zone | grep 2c | wc -l
75

# oc describe pod -n clusteraproject37 web0-0
Name:               web0-0
Namespace:          clusteraproject37
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             app=server0
                    controller-revision-hash=web0-7c994b8d69
                    statefulset.kubernetes.io/pod-name=web0-0
Annotations:        openshift.io/scc=restricted
Status:             Pending
IP:
Controlled By:      StatefulSet/web0
Containers:
  server:
    Image:      openshift/hello-openshift
    Port:       8080/TCP
    Host Port:  0/TCP
    Limits:
      cpu:     1
      memory:  256Mi
    Requests:
      cpu:     500m
      memory:  128Mi
    Environment:  <none>
    Mounts:
      /mydata from www (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-z6hvb (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  www:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  www-web0-0
    ReadOnly:   false
  default-token-z6hvb:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-z6hvb
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  14m (x25 over 14m)  default-scheduler  pod has unbound PersistentVolumeClaims (repeated 6 times)
  Warning  FailedScheduling  4m (x356 over 12m)  default-scheduler  0/9 nodes are available: 2 node(s) had no available volume zone, 3 node(s) had taints that the pod didn't tolerate, 4 node(s) exceed max volume count.