Bug 1711688
| Summary: | Should access volume from different nodes storage e2e test is flaky/failing for aws/nfs, others | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Clayton Coleman <ccoleman> |
| Component: | Storage | Assignee: | Tomas Smetana <tsmetana> |
| Status: | CLOSED ERRATA | QA Contact: | Chao Yang <chaoyang> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.1.0 | CC: | aos-bugs, aos-storage-staff, chaoyang, jsafrane, trankin |
| Target Milestone: | --- | | |
| Target Release: | 4.2.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-10-16 06:29:13 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
|
Description
Clayton Coleman
2019-05-19 16:10:40 UTC
The test creates a writer pod that stores some data on the NFS volume, gets the Node name of the writer pod's node, and then starts a reader pod with NodeSelectorTerms: {kubernetes.io/hostname NotIn [<node name of the writer pod>]}. Obviously, if the "kubernetes.io/hostname" label differs from the Node name, the reader pod gets scheduled on the same node and the test fails. And yes, I think this logic is broken: we need to use the label value from the writer pod's node in the NodeSelector, not the Node name.
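The fixed selection logic can be sketched as follows (a minimal illustration, not the actual e2e test code; the `anti_affinity_for` helper name and the example label value are hypothetical, and the label value is assumed to come from the writer node's labels, e.g. `node.metadata.labels["kubernetes.io/hostname"]`):

```python
# Sketch of the fixed logic: the NotIn value must be the node's
# "kubernetes.io/hostname" LABEL value, which is not guaranteed to equal the
# Node object's name (on AWS the name is the full internal DNS name).
def anti_affinity_for(hostname_label_value):
    """Build a nodeAffinity that keeps the reader pod off the writer's node."""
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": "kubernetes.io/hostname",
                                "operator": "NotIn",
                                # Label value read from the writer's node,
                                # NOT node.Name.
                                "values": [hostname_label_value],
                            }
                        ]
                    }
                ]
            }
        }
    }

affinity = anti_affinity_for("ip-10-0-133-135")
```

The resulting dict can be dropped into a pod spec under `spec.affinity`, mirroring the manifest used in the verification steps below.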
Fixed upstream: https://github.com/kubernetes/kubernetes/pull/74693. The patch is rather huge, though. The upstream test is broken too: it just adds more abstraction layers. Will fix it in both places.

Tomas, can you please re-enable the "provisioning should access volume from different nodes" tests that were disabled because of this bug? The tests should be enabled by https://github.com/openshift/origin/pull/22935.

There is a bigger problem with the AWS multi-node tests: Origin uses multi-AZ clusters and the tests don't account for that. I will have to prepare a fix for it upstream first since it might not be as simple. I filed bug #1716311 to track the multi-AZ fix.

It is passed on
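The multi-AZ problem can be illustrated with a small sketch (hypothetical helper and illustrative node data, not taken from a real cluster; the zone label shown is the one in common use at the time). For a zonal volume such as AWS EBS, the second node must be picked from the writer node's availability zone, otherwise the volume cannot attach there:

```python
# Zone-aware node selection: in a multi-AZ cluster, a test using a zonal
# volume must pick the second node from the same AZ as the writer's node.
ZONE_LABEL = "failure-domain.beta.kubernetes.io/zone"

def candidate_nodes(nodes, writer_node):
    """Return nodes other than writer_node that share its zone."""
    zone = nodes[writer_node][ZONE_LABEL]
    return sorted(
        name for name, labels in nodes.items()
        if name != writer_node and labels[ZONE_LABEL] == zone
    )

# Illustrative cluster: two nodes in us-east-2a, one in us-east-2b.
nodes = {
    "node-a": {ZONE_LABEL: "us-east-2a"},
    "node-b": {ZONE_LABEL: "us-east-2a"},
    "node-c": {ZONE_LABEL: "us-east-2b"},
}
```

With this data, only "node-b" is a valid second node for a writer on "node-a"; a writer on "node-c" has no valid second node, which is why such tests must skip rather than fail in that situation.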
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.2.0-0.nightly-2019-07-14-223254 True False 23h Cluster version is 4.2.0-0.nightly-2019-07-14-223254
1. Create nfs-provisioner
2. Create nfs pvc
3. Create a pod using the above pvc
4. Pod is running:
oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
mypod 1/1 Running 0 11s 10.128.2.18 ip-10-0-133-135.us-east-2.compute.internal <none> <none>
5. Delete the pod, create another pod:
kind: Pod
apiVersion: v1
metadata:
  name: mypod20
  labels:
    name: frontendhttp
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn
            values:
            - ip-10-0-133-135
  containers:
  - name: myfrontend
    image: aosqe/hello-openshift
    ports:
    - containerPort: 80
      name: "http-server"
    volumeMounts:
    - mountPath: "/mypod"
      name: aws
  volumes:
  - name: aws
    persistentVolumeClaim:
      claimName: nfs
6. Pod is running on the other node:
oc get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
mypod20 1/1 Running 0 19s 10.131.0.24 ip-10-0-144-124.us-east-2.compute.internal <none> <none>
7. Check the data in the pod
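As a quick sanity check of step 6, the two `oc get pods -o wide` listings above can be parsed to confirm the pods landed on different nodes (a hypothetical helper for illustration; NODE is the 7th whitespace-separated field in the output):

```python
# Extract the NODE column (7th field) from one data line of
# "oc get pods -o wide" and compare the two pods' nodes.
def node_of(pod_line):
    """Return the NODE field of a single pod line."""
    return pod_line.split()[6]

writer = "mypod 1/1 Running 0 11s 10.128.2.18 ip-10-0-133-135.us-east-2.compute.internal <none> <none>"
reader = "mypod20 1/1 Running 0 19s 10.131.0.24 ip-10-0-144-124.us-east-2.compute.internal <none> <none>"

# The NotIn affinity kept the second pod off the writer's node.
assert node_of(writer) != node_of(reader)
```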
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922