Bug 1700120 - e2e test [sig-apps] StatefulSet [k8s.io] Basic StatefulSet functionality [StatefulSetBasic] is failing on UPI metal based installations
Summary: e2e test [sig-apps] StatefulSet [k8s.io] Basic StatefulSet functionality [Sta...
Keywords:
Status: CLOSED DUPLICATE of bug 1700076
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.1.0
Assignee: Bradley Childs
QA Contact: Liang Xia
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-04-15 22:16 UTC by Abhinav Dahiya
Modified: 2019-04-16 13:50 UTC (History)
6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-04-16 13:50:55 UTC
Target Upstream Version:
Embargoed:



Description Abhinav Dahiya 2019-04-15 22:16:21 UTC
The e2e-metal CI job, which tests UPI bare-metal installation, is consistently failing the `[sig-apps] StatefulSet [k8s.io] Basic StatefulSet functionality [StatefulSetBasic]` e2e tests.

For example:
https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_installer/1621/pull-ci-openshift-installer-master-e2e-metal/5

The following tests are failing:

```
[sig-apps] StatefulSet [k8s.io] Basic StatefulSet functionality [StatefulSetBasic] should adopt matching orphans and release non-matching pods [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-apps] StatefulSet [k8s.io] Basic StatefulSet functionality [StatefulSetBasic] should not deadlock when a pod's predecessor fails [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-apps] StatefulSet [k8s.io] Basic StatefulSet functionality [StatefulSetBasic] should perform rolling updates and roll backs of template modifications with PVCs [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-apps] StatefulSet [k8s.io] Basic StatefulSet functionality [StatefulSetBasic] should provide basic identity [Suite:openshift/conformance/parallel] [Suite:k8s]
```

On further debugging, it looks like these tests use PVCs:

```
openshift-tests run-test "[sig-apps] StatefulSet [k8s.io] Basic StatefulSet functionality [StatefulSetBasic] should adopt matching orphans and release non-matching pods [Suite:openshift/conformance/parallel] [Suite:k8s]"

Apr 15 14:50:13.185: INFO: Overriding default scale value of zero to 1
Apr 15 14:50:13.185: INFO: Overriding default milliseconds value of zero to 5000
Apr 15 14:50:13.577: INFO: The --provider flag is not set.  Treating as a conformance test.  Some tests may not be run.
Apr 15 14:50:13.585: INFO: >>> kubeConfig: /home/adahiya/go/src/github.com/openshift/installer/upi/metal/dev-metal-0/auth/kubeconfig
Apr 15 14:50:13.587: INFO: Waiting up to 30m0s for all (but 100) nodes to be schedulable
Apr 15 14:50:14.158: INFO: Waiting up to 10m0s for all pods (need at least 0) in namespace 'kube-system' to be running and ready
Apr 15 14:50:14.427: INFO: 3 / 3 pods in namespace 'kube-system' are running and ready (0 seconds elapsed)
Apr 15 14:50:14.427: INFO: expected 0 pod replicas in namespace 'kube-system', 0 are Running and Ready.
Apr 15 14:50:14.427: INFO: Waiting up to 5m0s for all daemonsets in namespace 'kube-system' to start
Apr 15 14:50:14.512: INFO: e2e test version: v1.13.4+fbfe4e9397
Apr 15 14:50:14.585: INFO: kube-apiserver version: v1.13.4+af45cda
[BeforeEach] [Top Level]
  /home/adahiya/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/test/extended/util/test.go:69
[BeforeEach] [sig-apps] StatefulSet
  /home/adahiya/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:153
STEP: Creating a kubernetes client
Apr 15 14:50:14.589: INFO: >>> kubeConfig: /home/adahiya/go/src/github.com/openshift/installer/upi/metal/dev-metal-0/auth/kubeconfig
STEP: Building a namespace api object, basename statefulset
Apr 15 14:50:19.080: INFO: About to run a Kube e2e test, ensuring namespace is privileged
Apr 15 14:50:19.994: INFO: No PodSecurityPolicies found; assuming PodSecurityPolicy is disabled.
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] [sig-apps] StatefulSet
  /home/adahiya/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/apps/statefulset.go:59
[BeforeEach] [k8s.io] Basic StatefulSet functionality [StatefulSetBasic]
  /home/adahiya/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/apps/statefulset.go:74
STEP: Creating service test in namespace e2e-tests-statefulset-dv5bj
[It] should adopt matching orphans and release non-matching pods [Suite:openshift/conformance/parallel] [Suite:k8s]
  /home/adahiya/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/apps/statefulset.go:134
STEP: Creating statefulset ss in namespace e2e-tests-statefulset-dv5bj
STEP: Saturating stateful set ss

Apr 15 14:50:20.219: INFO: Waiting for stateful pod at index 0 to enter Running
Apr 15 14:50:20.293: INFO: Waiting for pod ss-0 to enter Running - Ready=false, currently Pending - Ready=false
Apr 15 14:50:30.368: INFO: Waiting for pod ss-0 to enter Running - Ready=false, currently Pending - Ready=false
[... the same "Waiting for pod ss-0" message repeats every ~10s until the 10-minute timeout ...]
Apr 15 15:00:20.370: INFO: Waiting for pod ss-0 to enter Running - Ready=false, currently Pending - Ready=false
Apr 15 15:00:20.444: INFO: Waiting for pod ss-0 to enter Running - Ready=false, currently Pending - Ready=false
Apr 15 15:00:20.444: INFO: Failed waiting for pods to enter running: timed out waiting for the condition

[AfterEach] [k8s.io] Basic StatefulSet functionality [StatefulSetBasic]
  /home/adahiya/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/apps/statefulset.go:85
Apr 15 15:00:20.520: INFO: Running '/usr/local/bin/kubectl --server=https://api.dev-metal-0.devcluster.openshift.com:6443 --kubeconfig=/home/adahiya/go/src/github.com/openshift/installer/upi/metal/dev-metal-0/auth/kubeconfig describe po ss-0 --namespace=e2e-tests-statefulset-dv5bj'
Apr 15 15:00:21.921: INFO: stderr: ""

Apr 15 15:00:21.921: INFO: stdout: "Name:               ss-0\nNamespace:          e2e-tests-statefulset-dv5bj\nPriority:           0\nPriorityClassName:  <none>\nNode:               <none>\nLabels:             baz=blah\n                    controller-revision-hash=ss-7b659b6b55\n                    foo=bar\n                    statefulset.kubernetes.io/pod-name=ss-0\nAnnotations:        openshift.io/scc=hostmount-anyuid\nStatus:             Pending\nIP:                 \nControlled By:      StatefulSet/ss\nContainers:\n  nginx:\n    Image:        docker.io/library/nginx:1.14-alpine\n    Port:         <none>\n    Host Port:    <none>\n    Readiness:    exec [test -f /data/statefulset-continue] delay=0s timeout=1s period=1s #success=1 #failure=1\n    Environment:  <none>\n    Mounts:\n      /data/ from datadir (rw)\n      /home from home (rw)\n      /var/run/secrets/kubernetes.io/serviceaccount from default-token-dcrn5 (ro)\nConditions:\n  Type           Status\n  PodScheduled   False \nVolumes:\n  datadir:\n    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)\n    ClaimName:  datadir-ss-0\n    ReadOnly:   false\n  home:\n    Type:          HostPath (bare host directory volume)\n    Path:          /tmp/home\n    HostPathType:  \n  default-token-dcrn5:\n    Type:        Secret (a volume populated by a Secret)\n    SecretName:  default-token-dcrn5\n    Optional:    false\nQoS Class:       BestEffort\nNode-Selectors:  <none>\nTolerations:     node.kubernetes.io/not-ready:NoExecute for 300s\n                 node.kubernetes.io/unreachable:NoExecute for 300s\nEvents:\n  Type     Reason            Age                From               Message\n  ----     ------            ----               ----               -------\n  Warning  FailedScheduling  4m (x41 over 10m)  default-scheduler  pod has unbound immediate PersistentVolumeClaims (repeated 3 times)\n"

Apr 15 15:00:21.921: INFO:
Output of kubectl describe ss-0:
Name:               ss-0
Namespace:          e2e-tests-statefulset-dv5bj
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             baz=blah
                    controller-revision-hash=ss-7b659b6b55
                    foo=bar
                    statefulset.kubernetes.io/pod-name=ss-0
Annotations:        openshift.io/scc=hostmount-anyuid
Status:             Pending
IP:
Controlled By:      StatefulSet/ss
Containers:
  nginx:
    Image:        docker.io/library/nginx:1.14-alpine
    Port:         <none>
    Host Port:    <none>
    Readiness:    exec [test -f /data/statefulset-continue] delay=0s timeout=1s period=1s #success=1 #failure=1
    Environment:  <none>
    Mounts:
      /data/ from datadir (rw)
      /home from home (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-dcrn5 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  datadir:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  datadir-ss-0
    ReadOnly:   false
  home:
    Type:          HostPath (bare host directory volume)
    Path:          /tmp/home
    HostPathType:
  default-token-dcrn5:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-dcrn5
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  4m (x41 over 10m)  default-scheduler  pod has unbound immediate PersistentVolumeClaims (repeated 3 times)



Apr 15 15:00:21.921: INFO: Running '/usr/local/bin/kubectl --server=https://api.dev-metal-0.devcluster.openshift.com:6443 --kubeconfig=/home/adahiya/go/src/github.com/openshift/installer/upi/metal/dev-metal-0/auth/kubeconfig logs ss-0 --namespace=e2e-tests-statefulset-dv5bj --tail=100'
Apr 15 15:00:22.556: INFO: stderr: ""
Apr 15 15:00:22.556: INFO: stdout: ""
Apr 15 15:00:22.556: INFO:
Last 100 log lines of ss-0:

Apr 15 15:00:22.556: INFO: Deleting all statefulset in ns e2e-tests-statefulset-dv5bj
Apr 15 15:00:22.631: INFO: Scaling statefulset ss to 0
Apr 15 15:00:22.855: INFO: Waiting for statefulset status.replicas updated to 0
Apr 15 15:00:22.929: INFO: Deleting statefulset ss
Apr 15 15:00:23.081: INFO: Deleting pvc: datadir-ss-0 with volume
[AfterEach] [sig-apps] StatefulSet
  /home/adahiya/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/test/e2e/framework/framework.go:154
STEP: Collecting events from namespace "e2e-tests-statefulset-dv5bj".
STEP: Found 6 events.
Apr 15 15:00:23.305: INFO: At 2019-04-15 14:50:20 -0700 PDT - event for datadir-ss-0: {persistentvolume-controller } FailedBinding: no persistent volumes available for this claim and no storage class is set
Apr 15 15:00:23.305: INFO: At 2019-04-15 14:50:20 -0700 PDT - event for ss: {statefulset-controller } SuccessfulCreate: create Claim datadir-ss-0 Pod ss-0 in StatefulSet ss success
Apr 15 15:00:23.305: INFO: At 2019-04-15 14:50:20 -0700 PDT - event for ss: {statefulset-controller } SuccessfulCreate: create Pod ss-0 in StatefulSet ss successful
Apr 15 15:00:23.305: INFO: At 2019-04-15 14:50:20 -0700 PDT - event for ss-0: {default-scheduler } FailedScheduling: pod has unbound immediate PersistentVolumeClaims (repeated 3 times)
Apr 15 15:00:23.305: INFO: At 2019-04-15 15:00:22 -0700 PDT - event for ss: {statefulset-controller } SuccessfulDelete: delete Pod ss-0 in StatefulSet ss successful
Apr 15 15:00:23.305: INFO: At 2019-04-15 15:00:22 -0700 PDT - event for ss-0: {default-scheduler } FailedScheduling: skip schedule deleting pod: e2e-tests-statefulset-dv5bj/ss-0
Apr 15 15:00:23.453: INFO: skipping dumping cluster info - cluster too large
Apr 15 15:00:23.453: INFO: Waiting up to 3m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-tests-statefulset-dv5bj" for this suite.
Apr 15 15:00:29.758: INFO: Waiting up to 30s for server preferred namespaced resources to be successfully discovered
Apr 15 15:00:35.105: INFO: namespace e2e-tests-statefulset-dv5bj deletion completed in 11.574859515s
Apr 15 15:00:35.114: INFO: Running AfterSuite actions on all nodes
Apr 15 15:00:35.114: INFO: Running AfterSuite actions on node 1
fail [k8s.io/kubernetes/test/e2e/framework/statefulset_utils.go:323]: Apr 15 15:00:20.444: Failed waiting for pods to enter running: timed out waiting for the condition
```
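The unbound `datadir-ss-0` claim seen above comes from the StatefulSet's volumeClaimTemplates, which create one PVC per replica without a `storageClassName`. A minimal sketch of that shape (names and sizes are illustrative, not the exact test spec):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ss
spec:
  serviceName: test
  replicas: 1
  selector:
    matchLabels:
      foo: bar
  template:
    metadata:
      labels:
        foo: bar
    spec:
      containers:
      - name: nginx
        image: docker.io/library/nginx:1.14-alpine
        volumeMounts:
        - name: datadir
          mountPath: /data/
  volumeClaimTemplates:
  - metadata:
      name: datadir          # yields PVC datadir-ss-0 for pod ss-0
    spec:
      accessModes: ["ReadWriteOnce"]
      # No storageClassName: binding falls back to the cluster's default
      # StorageClass; with none present the claim stays Pending indefinitely.
      resources:
        requests:
          storage: 1Gi
```

With immediate volume binding (the default), the scheduler refuses to place the pod until the claim binds, which matches the "pod has unbound immediate PersistentVolumeClaims" events in the log.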

UPI bare-metal installations do not have a storage class configured, so any test that depends on PVs must either be skipped or moved to a non-PV way of testing.
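A quick way to confirm the missing piece is to look for a StorageClass carrying the default-class annotation. This sketch assumes `oc` and cluster access for the real check; a canned snippet stands in for the cluster output so the annotation test itself runs anywhere:

```shell
# On a live cluster, the real input would be:
#   oc get storageclass -o yaml
# (assumes oc and a kubeconfig for the cluster). Here a canned snippet stands
# in for that output.
sc_dump='kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"'

# The default StorageClass is the one carrying this annotation set to "true".
if printf '%s\n' "$sc_dump" | grep -q 'storageclass.kubernetes.io/is-default-class: "true"'; then
  status="default StorageClass present"
else
  status="no default StorageClass: PVCs without an explicit class stay Pending"
fi
echo "$status"
```

On a bare-metal UPI cluster the check would report no default class, which is exactly why `datadir-ss-0` never binds.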

Expected:

The `[sig-apps] StatefulSet [k8s.io] Basic StatefulSet functionality [StatefulSetBasic]` tests should not fail.

Comment 1 Michal Fojtik 2019-04-16 08:27:02 UTC
The culprit seems to be:

Warning  FailedScheduling  4m (x41 over 10m)  default-scheduler  pod has unbound immediate PersistentVolumeClaims (repeated 3 times)

which suggests this is a scheduler or storage issue.

Comment 2 Tomáš Nožička 2019-04-16 08:35:37 UTC
Sounds like storage; from a very quick look, this might be related to:
- https://github.com/kubernetes/kubernetes/issues/71928
- https://github.com/kubernetes/kubernetes/pull/72045

Comment 3 Jan Safranek 2019-04-16 13:50:55 UTC
The default storage class is missing on bare metal; these tests should be skipped.
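Until the suite skips these automatically, one workaround might be filtering the test list before running it. The `openshift-tests run --dry-run` / `-f -` flags are assumed from the origin test tooling of this era and not verified here; the filter itself is demonstrated on sample test names:

```shell
# Hypothetical full pipeline on a cluster (flags assumed, not verified):
#   openshift-tests run openshift/conformance/parallel --dry-run \
#     | grep -v '\[StatefulSetBasic\]' \
#     | openshift-tests run -f -
# Demonstration of the filter on two sample test names:
kept=$(printf '%s\n' \
  '[sig-apps] StatefulSet [k8s.io] Basic StatefulSet functionality [StatefulSetBasic] should provide basic identity' \
  '[sig-apps] Deployment deployment should support rollover' \
  | grep -v '\[StatefulSetBasic\]')
echo "$kept"
```

Only the non-StatefulSetBasic test survives the filter, so PV-dependent cases never run on clusters without a default StorageClass.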

*** This bug has been marked as a duplicate of bug 1700076 ***

