Bug 1711688 - "Should access volume from different nodes" storage e2e test is flaky/failing for aws/nfs, others
Summary: "Should access volume from different nodes" storage e2e test is flaky/failing for aws/nfs, others
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.2.0
Assignee: Tomas Smetana
QA Contact: Chao Yang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-05-19 16:10 UTC by Clayton Coleman
Modified: 2019-10-16 06:29 UTC
CC List: 5 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-10-16 06:29:13 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Product Errata RHBA-2019:2922 (Last Updated: 2019-10-16 06:29:33 UTC)

Description Clayton Coleman 2019-05-19 16:10:40 UTC
https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/22858/pull-ci-openshift-origin-master-e2e-aws/8796/#openshift-tests-sig-storage-in-tree-volumes-driver-nfs-testpattern-dynamic-pv-default-fs-provisioning-should-access-volume-from-different-nodes-suiteopenshiftconformanceparallel-suitek8s

I think the test itself is broken on OpenShift; it needs to be investigated.

fail [k8s.io/kubernetes/test/e2e/storage/testsuites/provisioning.go:482]: second pod should have run on a different node
Expected
    <string>: ip-10-0-137-111.ec2.internal
not to equal
    <string>: ip-10-0-137-111.ec2.internal

Comment 2 Tomas Smetana 2019-05-21 08:35:58 UTC
The test creates a writer pod that stores some data on the NFS volume, gets the Node name of the writer pod's node, then starts a second (reader) pod with NodeSelectorTerms:{kubernetes.io/hostname NotIn [<node name of the writer pod>]}. Obviously, if the "kubernetes.io/hostname" label is different from the Node name, the exclusion matches nothing, the pod can get scheduled on the same node, and the test fails. And yes, I think this logic is broken. We need to put the kubernetes.io/hostname label value of the writer pod's node into the NodeSelector, not the Node name.
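For illustration only (a sketch of the selector the test ends up with, not the actual test code; the commented-out label value is an assumption based on this cluster's naming pattern):

# Broken: the NotIn value is the Node name, which on this AWS cluster is the
# full internal DNS name, so it never matches the kubernetes.io/hostname label
# and the term excludes nothing.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: NotIn
          values:
          - ip-10-0-137-111.ec2.internal   # Node name (what the test uses today)
          # - ip-10-0-137-111              # hostname label value (what the fix should use; assumed)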

Comment 3 Tomas Smetana 2019-05-21 08:46:05 UTC
Fixed upstream: https://github.com/kubernetes/kubernetes/pull/74693

The patch is rather huge though...

Comment 4 Tomas Smetana 2019-05-22 09:24:41 UTC
The upstream test is broken too: the patch referenced above just adds more abstraction layers around the same logic. I will fix it in both places (upstream and in origin).

Comment 5 Tomas Smetana 2019-05-22 10:39:24 UTC
https://github.com/openshift/origin/pull/22886

Comment 6 Jan Safranek 2019-05-30 13:03:05 UTC
Tomas, can you please re-enable "provisioning should access volume from different nodes" tests that were disabled because of this bug?

Comment 7 Tomas Smetana 2019-06-03 08:03:00 UTC
The tests should be re-enabled by https://github.com/openshift/origin/pull/22935. There is a bigger problem with the AWS multi-node tests: Origin uses multi-AZ clusters and the tests don't account for that. I will have to prepare a fix for it upstream first, since it might not be that simple.

Comment 8 Tomas Smetana 2019-06-03 08:09:44 UTC
I filed bug #1716311 to track the multi-AZ fix.

Comment 10 Chao Yang 2019-07-18 08:37:53 UTC
It passed on:
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-07-14-223254   True        False         23h     Cluster version is 4.2.0-0.nightly-2019-07-14-223254

1. Create the nfs-provisioner
2. Create an nfs PVC (a minimal PVC sketch follows after step 7)
3. Create a pod using the above PVC
4. The pod is running:
oc get pods -o wide
NAME                               READY   STATUS             RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
mypod                              1/1     Running            0          11s     10.128.2.18   ip-10-0-133-135.us-east-2.compute.internal   <none>           <none>
5. Delete the pod, then create another pod:
kind: Pod
apiVersion: v1
metadata:
  name: mypod20
  labels:
    name: frontendhttp
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn
            values:
            - ip-10-0-133-135
  containers:
    - name: myfrontend
      image: aosqe/hello-openshift
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
      - mountPath: "/mypod"
        name: aws
  volumes:
    - name: aws
      persistentVolumeClaim:
        claimName: nfs
6. The pod is running on the other node:
oc get pods -o wide
NAME                               READY   STATUS             RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
mypod20                            1/1     Running            0          19s     10.131.0.24   ip-10-0-144-124.us-east-2.compute.internal   <none>           <none>
7. Check the data in the pod.
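For reference, a minimal sketch of the PVC from step 2; the storage class name, access mode and size are assumptions, and the storage class has to match whatever the nfs-provisioner from step 1 registers:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nfs                 # matches claimName: nfs in the pod spec above
spec:
  accessModes:
    - ReadWriteMany         # assumed; NFS volumes support RWX
  resources:
    requests:
      storage: 1Gi          # assumed size
  storageClassName: nfs     # assumed; must match the nfs-provisioner's StorageClass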

Comment 12 errata-xmlrpc 2019-10-16 06:29:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

