Bug 1711688
Summary: | Should access volume from different nodes storage e2e test is flaky/failing for aws/nfs, others | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Clayton Coleman <ccoleman> |
Component: | Storage | Assignee: | Tomas Smetana <tsmetana> |
Status: | CLOSED ERRATA | QA Contact: | Chao Yang <chaoyang> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.1.0 | CC: | aos-bugs, aos-storage-staff, chaoyang, jsafrane, trankin |
Target Milestone: | --- | ||
Target Release: | 4.2.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | No Doc Update
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2019-10-16 06:29:13 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | |
Description
Clayton Coleman
2019-05-19 16:10:40 UTC
The test creates a writer pod that stores some data on the NFS volume, gets the Node name of the writer pod's node, and then starts a second pod with NodeSelectorTerms: {kubernetes.io/hostname NotIn [<node name of the writer pod>]}. Obviously, if the "kubernetes.io/hostname" label differs from the Node name, the second pod gets scheduled on the same node and the test fails. And yes, I think this logic is broken: we need to use the hostname label from the writer pod's node in the NodeSelector, not the Node name.

Fixed upstream: https://github.com/kubernetes/kubernetes/pull/74693. The patch is rather large, though. The upstream test is broken too, it just adds more abstraction layers. Will fix it in both places.

Tomas, can you please re-enable the "provisioning should access volume from different nodes" tests that were disabled because of this bug?

The tests should be enabled by https://github.com/openshift/origin/pull/22935.

There is a bigger problem with the AWS multi-node tests: Origin uses multi-AZ clusters and the tests don't account for that. I will have to prepare a fix for it upstream first, since it might not be as simple. I filed bug #1716311 to track the multi-AZ fix.

It passed on:

```
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-07-14-223254   True        False         23h     Cluster version is 4.2.0-0.nightly-2019-07-14-223254
```

1. Create the nfs-provisioner.
2. Create an NFS PVC.
3. Create a pod using the above PVC.
4. The pod is running:

   ```
   oc get pods -o wide
   NAME    READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
   mypod   1/1     Running   0          11s   10.128.2.18   ip-10-0-133-135.us-east-2.compute.internal   <none>           <none>
   ```

5. Delete the pod and create another pod:

   ```yaml
   kind: Pod
   apiVersion: v1
   metadata:
     name: mypod20
     labels:
       name: frontendhttp
   spec:
     affinity:
       nodeAffinity:
         requiredDuringSchedulingIgnoredDuringExecution:
           nodeSelectorTerms:
           - matchExpressions:
             - key: kubernetes.io/hostname
               operator: NotIn
               values:
               - ip-10-0-133-135
     containers:
     - name: myfrontend
       image: aosqe/hello-openshift
       ports:
       - containerPort: 80
         name: "http-server"
       volumeMounts:
       - mountPath: "/mypod"
         name: aws
     volumes:
     - name: aws
       persistentVolumeClaim:
         claimName: nfs
   ```

6. The pod is running on the other node:

   ```
   oc get pods -o wide
   NAME      READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
   mypod20   1/1     Running   0          19s   10.131.0.24   ip-10-0-144-124.us-east-2.compute.internal   <none>           <none>
   ```

7. Check the data in the pod.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922
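For reference, below is a minimal sketch of the kind of change the analysis above calls for: building the second pod's required node anti-affinity from the writer node's "kubernetes.io/hostname" label value instead of from the Node name. This is an illustration written against current client-go, not the actual upstream patch from PR 74693; the helper name setAntiAffinityToWriterNode and its signature are assumptions made for this sketch.

```go
// A minimal sketch, assuming current client-go; setAntiAffinityToWriterNode is
// a hypothetical helper, not the actual change from the upstream PR.
package e2esketch

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// setAntiAffinityToWriterNode looks up the node the writer pod landed on and
// adds a required NotIn node-affinity term keyed on the node's
// "kubernetes.io/hostname" label, so the second pod cannot land on that node.
func setAntiAffinityToWriterNode(ctx context.Context, c kubernetes.Interface, writer, reader *v1.Pod) error {
	node, err := c.CoreV1().Nodes().Get(ctx, writer.Spec.NodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	// Use the label value, not node.Name: on some clusters the Node name and
	// the hostname label differ (e.g. FQDN vs. short host name on AWS).
	hostname := node.Labels[v1.LabelHostname]
	reader.Spec.Affinity = &v1.Affinity{
		NodeAffinity: &v1.NodeAffinity{
			RequiredDuringSchedulingIgnoredDuringExecution: &v1.NodeSelector{
				NodeSelectorTerms: []v1.NodeSelectorTerm{{
					MatchExpressions: []v1.NodeSelectorRequirement{{
						Key:      v1.LabelHostname,
						Operator: v1.NodeSelectorOpNotIn,
						Values:   []string{hostname},
					}},
				}},
			},
		},
	}
	return nil
}
```

The verification above illustrates the same point: the writer pod's Node name is ip-10-0-133-135.us-east-2.compute.internal, while the NotIn value used in the pod manifest is the short name ip-10-0-133-135, which suggests the hostname label on that cluster does not equal the Node name.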