Bug 1915494
Summary: | Frequent taint-related test failures | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Fabian von Feilitzsch <fabian> |
Component: | Node | Assignee: | Elana Hashman <ehashman> |
Node sub component: | Kubelet | QA Contact: | Sunil Choudhary <schoudha> |
Status: | CLOSED DUPLICATE | Docs Contact: | |
Severity: | high | ||
Priority: | unspecified | CC: | adduarte, aos-bugs, ehashman, wking |
Version: | 4.7 | ||
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | TechnicalReleaseBlocker | ||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-01-12 18:31:01 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Fabian von Feilitzsch
2021-01-12 18:05:42 UTC
Specific failing job link: https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-serial-4.7/1348509976322117632

*** This bug has been marked as a duplicate of bug 1908880 ***

Several of the taint test failures on the OpenStack platform seem to show errors about mounting volumes, like so:

https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-serial-4.7/1347785226629156864
https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-serial-4.7/1348509976322117632

"[]VolumeDevice{},StartupProbe:nil,} start failed in pod taint-eviction-1_e2e-taint-single-pod-7551(76ddbe1e-163b-4be3-9473-371500d53b85): CreateContainerConfigError: cannot find volume "default-token-9xw5g" to mount into container "pause""

In other taint failures (like this one: https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-serial-4.7/1345792152164110336) it is not clear why the pod is not evacuated. Something also seems amiss with the container in the pod, which seems to be in a "paused" state, and the test fails waiting for it to be deleted/evacuated.
Jan 3 19:15:41.445: INFO: At 2021-01-03 19:13:34 +0000 UTC - event for taint-eviction-3: {kubelet zj8bt86c-a9c3a-92s9q-worker-0-6dqq5} Created: Created container pause
Jan 3 19:15:41.445: INFO: At 2021-01-03 19:13:34 +0000 UTC - event for taint-eviction-3: {kubelet zj8bt86c-a9c3a-92s9q-worker-0-6dqq5} Started: Started container pause
Jan 3 19:15:41.445: INFO: At 2021-01-03 19:14:41 +0000 UTC - event for taint-eviction-3: {taint-controller } TaintManagerEviction: Marking for deletion Pod e2e-taint-single-pod-2191/taint-eviction-3
Jan 3 19:15:41.445: INFO: At 2021-01-03 19:14:41 +0000 UTC - event for taint-eviction-3: {kubelet zj8bt86c-a9c3a-92s9q-worker-0-6dqq5} Killing: Stopping container pause
Jan 3 19:15:41.480: INFO: POD NODE PHASE GRACE CONDITIONS
Jan 3 19:15:41.480: INFO: taint-eviction-3 zj8bt86c-a9c3a-92s9q-worker-0-6dqq5 Running 30s [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2021-01-03 19:13:31 +0000 UTC } {Ready False 0001-01-01 00:00:00 +0000 UTC 2021-01-03 19:14:42 +0000 UTC ContainersNotReady containers with unready status: [pause]} {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2021-01-03 19:14:42 +0000 UTC ContainersNotReady containers with unready status: [pause]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2021-01-03 19:13:31 +0000 UTC }]
Jan 3 19:15:41.480: INFO:
Jan 3 19:15:41.480: INFO: taint-eviction-3[e2e-taint-single-pod-2191].container[pause]=The container could not be located when the pod was deleted. The container used to be Running
Jan 3 19:15:41.544: INFO: skipping dumping cluster info - cluster too large
Jan 3 19:15:41.544: INFO: Waiting up to 7m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-taint-single-pod-2191" for this suite.