Bug 1711688

Summary: Should access volume from different nodes storage e2e test is flaky/failing for aws/nfs, others

Product: OpenShift Container Platform
Component: Storage
Version: 4.1.0
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Type: Bug
Doc Type: No Doc Update
Reporter: Clayton Coleman <ccoleman>
Assignee: Tomas Smetana <tsmetana>
QA Contact: Chao Yang <chaoyang>
CC: aos-bugs, aos-storage-staff, chaoyang, jsafrane, trankin
Last Closed: 2019-10-16 06:29:13 UTC

Description Clayton Coleman 2019-05-19 16:10:40 UTC
https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/22858/pull-ci-openshift-origin-master-e2e-aws/8796/#openshift-tests-sig-storage-in-tree-volumes-driver-nfs-testpattern-dynamic-pv-default-fs-provisioning-should-access-volume-from-different-nodes-suiteopenshiftconformanceparallel-suitek8s

I think the test itself is broken on OpenShift; this needs to be investigated.

fail [k8s.io/kubernetes/test/e2e/storage/testsuites/provisioning.go:482]: second pod should have run on a different node
Expected
    <string>: ip-10-0-137-111.ec2.internal
not to equal
    <string>: ip-10-0-137-111.ec2.internal

Comment 2 Tomas Smetana 2019-05-21 08:35:58 UTC
The test creates a writer pod that stores some data on the NFS volume, records the name of the node the writer pod ran on, and then starts a second pod with NodeSelectorTerms:{kubernetes.io/hostname NotIn [<node name of the writer pod's node>]}. If the "kubernetes.io/hostname" label value differs from the Node name, the NotIn expression does not exclude the writer's node, so the second pod can get scheduled on the same node and the test fails. And yes, I think this logic is broken. We need to put the kubernetes.io/hostname label value of the writer's node into the NodeSelector, not the Node name.
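
For illustration, a minimal sketch of the corrected logic, assuming a recent client-go; the helper name antiAffinityAwayFromNode and the surrounding code are hypothetical and are not the actual upstream patch. The point is to look up the Node object and feed its kubernetes.io/hostname label value, rather than the Node name, into the NotIn expression:

package example

import (
    "context"
    "fmt"

    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// antiAffinityAwayFromNode (hypothetical helper) builds node affinity that keeps
// a pod off the given node by excluding the node's kubernetes.io/hostname label
// value, which is not guaranteed to be identical to the Node object's name.
func antiAffinityAwayFromNode(ctx context.Context, cs kubernetes.Interface, nodeName string) (*v1.Affinity, error) {
    node, err := cs.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
    if err != nil {
        return nil, err
    }
    hostname, ok := node.Labels["kubernetes.io/hostname"]
    if !ok {
        return nil, fmt.Errorf("node %s has no kubernetes.io/hostname label", nodeName)
    }
    return &v1.Affinity{
        NodeAffinity: &v1.NodeAffinity{
            RequiredDuringSchedulingIgnoredDuringExecution: &v1.NodeSelector{
                NodeSelectorTerms: []v1.NodeSelectorTerm{{
                    MatchExpressions: []v1.NodeSelectorRequirement{{
                        Key:      "kubernetes.io/hostname",
                        Operator: v1.NodeSelectorOpNotIn,
                        Values:   []string{hostname}, // the label value, not the Node name
                    }},
                }},
            },
        },
    }, nil
}

Using the label value keeps the NotIn expression consistent with what the scheduler actually compares against, since node affinity matches on node labels, not on object names.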

Comment 3 Tomas Smetana 2019-05-21 08:46:05 UTC
Fixed upstream: https://github.com/kubernetes/kubernetes/pull/74693

The patch is rather huge though...

Comment 4 Tomas Smetana 2019-05-22 09:24:41 UTC
The upstream test is broken too; it just adds more abstraction layers. I will fix it in both places (upstream and in Origin).

Comment 5 Tomas Smetana 2019-05-22 10:39:24 UTC
https://github.com/openshift/origin/pull/22886

Comment 6 Jan Safranek 2019-05-30 13:03:05 UTC
Tomas, can you please re-enable the "provisioning should access volume from different nodes" tests that were disabled because of this bug?

Comment 7 Tomas Smetana 2019-06-03 08:03:00 UTC
The tests should be re-enabled by https://github.com/openshift/origin/pull/22935. There is a bigger problem with the AWS multi-node tests: Origin uses multi-AZ clusters and the tests don't take that into account. I will have to prepare a fix for it upstream first, since it might not be that simple.
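
As a purely hypothetical sketch of what taking multi-AZ into account could look like (this is not the fix tracked in bug #1716311): the second pod would need to avoid the writer's node while staying in the same availability zone as the provisioned volume, for example by also matching the node's zone label.

package example

import v1 "k8s.io/api/core/v1"

// sameZoneDifferentNode (hypothetical helper) returns match expressions that
// exclude the writer's node but pin the pod to that node's availability zone,
// so that a zonal volume provisioned there stays reachable.
func sameZoneDifferentNode(node *v1.Node) []v1.NodeSelectorRequirement {
    return []v1.NodeSelectorRequirement{
        {
            Key:      "kubernetes.io/hostname",
            Operator: v1.NodeSelectorOpNotIn,
            Values:   []string{node.Labels["kubernetes.io/hostname"]},
        },
        {
            // Zone label used by clusters of this era; newer releases
            // use topology.kubernetes.io/zone instead.
            Key:      "failure-domain.beta.kubernetes.io/zone",
            Operator: v1.NodeSelectorOpIn,
            Values:   []string{node.Labels["failure-domain.beta.kubernetes.io/zone"]},
        },
    }
}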

Comment 8 Tomas Smetana 2019-06-03 08:09:44 UTC
I filed bug #1716311 to track the multi-AZ fix.

Comment 10 Chao Yang 2019-07-18 08:37:53 UTC
It passed on
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-07-14-223254   True        False         23h     Cluster version is 4.2.0-0.nightly-2019-07-14-223254

1. Create nfs-provisioner
2. Create nfs pvc
3. Create a pod using the above pvc
4. Pod is running
oc get pods -o wide
NAME                               READY   STATUS             RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
mypod                              1/1     Running            0          11s     10.128.2.18   ip-10-0-133-135.us-east-2.compute.internal   <none>           <none>
5. Delete the pod, then create another pod:
kind: Pod
apiVersion: v1
metadata:
  name: mypod20
  labels:
    name: frontendhttp
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: NotIn
            values:
            - ip-10-0-133-135
  containers:
    - name: myfrontend
      image: aosqe/hello-openshift
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
      - mountPath: "/mypod"
        name: aws
  volumes:
    - name: aws
      persistentVolumeClaim:
        claimName: nfs
6. Pod is running on a different node
oc get pods -o wide
NAME                               READY   STATUS             RESTARTS   AGE     IP            NODE                                         NOMINATED NODE   READINESS GATES
mypod20                            1/1     Running            0          19s     10.131.0.24   ip-10-0-144-124.us-east-2.compute.internal   <none>           <none>
7. Check the data in the pod

Comment 12 errata-xmlrpc 2019-10-16 06:29:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922