It looks like some of the NFS-specific tests are failing on bare metal clusters. For example:

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_installer/1621/pull-ci-openshift-installer-master-e2e-metal/4/?log#log

I debugged this and found that the NFS server is unable to come up when started by e2e:

Apr 15 17:44:56 worker-0.ci-op-w6lfbtli-d3e37.origin-ci-int-aws.dev.rhcloud.com kernel: NFSD: attempt to initialize umh client tracking in a container ignored.
Apr 15 17:44:56 worker-0.ci-op-w6lfbtli-d3e37.origin-ci-int-aws.dev.rhcloud.com kernel: NFSD: attempt to initialize legacy client tracking in a container ignored.
Apr 15 17:44:56 worker-0.ci-op-w6lfbtli-d3e37.origin-ci-int-aws.dev.rhcloud.com kernel: NFSD: Unable to initialize client recovery tracking! (-22)
Apr 15 17:44:56 worker-0.ci-op-w6lfbtli-d3e37.origin-ci-int-aws.dev.rhcloud.com kernel: NFSD: starting 10-second grace period (net f0000498)

The closest reference we have for this is https://github.com/kubernetes/kubernetes/issues/33447, which was caused by the node not having the NFS tools installed.
There are many NFS tests that passed in the same test run, so it's not about missing NFS utils. The kernel logs listed above are IMO harmless. I ran the test manually on a bare metal cluster and it passed.

In addition, the test creates a PVC + PV and checks that they get bound together. NFS is not involved at this point yet; it would be used later, if they were Bound.

The PV and PVC are (from the test teardown):

Apr 15 23:10:25.120: INFO: Deleting PersistentVolumeClaim "pvc-dvmt7"
Apr 15 23:10:25.142: INFO: Deleting PersistentVolume "nfs-8wxmm"

The controller-manager log shows that the PVC can't find its PV:

I0415 23:08:42.998947 1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"e2e-tests-pv-4tk65", Name:"pvc-dvmt7", UID:"3b64829e-5fd3-11e9-b3fc-0cc47a18ab96", APIVersion:"v1", ResourceVersion:"61900", FieldPath:""}): type: 'Normal' reason: 'FailedBinding' no persistent volumes available for this claim and no storage class is set

And the PV is nowhere to be found, because it got bound to a PVC from a different test:

I0415 23:07:28.009082 1 pv_controller.go:874] claim "e2e-tests-statefulset-946xl/datadir-ss-0" bound to volume "nfs-8wxmm"
I0415 23:07:28.012629 1 pv_controller.go:824] volume "nfs-8wxmm" entered phase "Bound"
I0415 23:07:28.012648 1 pv_controller.go:963] volume "nfs-8wxmm" bound to claim "e2e-tests-statefulset-946xl/datadir-ss-0"

The test apparently races with the StatefulSet test "[It] should perform rolling updates and roll backs of template modifications with PVCs [Suite:openshift/conformance/parallel] [Suite:k8s]". That one expects that there is a default storage class and that its PV will be provisioned dynamically. Instead, its PVC steals the PV from the other test.

Even if there was a default storage class + dynamic provisioning, there would still be a (short) window of opportunity:

1. NFS test creates a PV.
2. StatefulSet test creates a PVC.
3. PV controller sees the available PV from 1. and binds it to the PVC from 2. instead of dynamically provisioning a new PV for the StatefulSet test.
These two tests should use a different storage class.
> Even if there was a default storage class + dynamic provisioning, there
> would still be a (short) window of opportunity:
>
> 1. NFS test creates a PV.
> 2. StatefulSet test creates a PVC.
> 3. PV controller sees the available PV from 1. and binds it to the PVC
>    from 2. instead of dynamically provisioning a new PV for the
>    StatefulSet test.
>
> These two tests should use a different storage class.

False alarm, they *do* use a different storage class (when there is one). The NFS PV tests explicitly set StorageClassName: "" in their PVCs, so they don't get the default class assigned by our default storage class admission plugin:

https://github.com/kubernetes/kubernetes/blob/252cabf155308b43c8c612f482855dc0cfa2e29c/test/e2e/storage/persistent_volumes.go#L140

I think that skipping the tests that need a default storage class (bug #1700076) would be enough to also fix these NFS flakes.

*** This bug has been marked as a duplicate of bug 1700076 ***