Bug 1915494 - Frequent taint-related test failures
Summary: Frequent taint-related test failures
Keywords:
Status: CLOSED DUPLICATE of bug 1908880
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Elana Hashman
QA Contact: Sunil Choudhary
URL:
Whiteboard: TechnicalReleaseBlocker
Depends On:
Blocks:
 
Reported: 2021-01-12 18:05 UTC by Fabian von Feilitzsch
Modified: 2021-01-12 23:03 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-01-12 18:31:01 UTC
Target Upstream Version:
Embargoed:



Comment 2 Elana Hashman 2021-01-12 18:31:01 UTC

*** This bug has been marked as a duplicate of bug 1908880 ***

Comment 3 Adolfo Duarte 2021-01-12 23:03:32 UTC
Several of the taint test failures on the OpenStack platform seem to show errors about mounting volumes, like so:

https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-serial-4.7/1347785226629156864
https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-serial-4.7/1348509976322117632

"[]VolumeDevice{},StartupProbe:nil,} start failed in pod taint-eviction-1_e2e-taint-single-pod-7551(76ddbe1e-163b-4be3-9473-371500d53b85): CreateContainerConfigError: cannot find volume "default-token-9xw5g" to mount into container "pause""

In other taint failures (like this one: https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-serial-4.7/1345792152164110336) it is not clear why the pod is not evicted. Something also seems to be amiss with the container in the pod: it appears to be in a "paused" state, and the test fails waiting for the pod to be deleted/evicted.

Jan  3 19:15:41.445: INFO: At 2021-01-03 19:13:34 +0000 UTC - event for taint-eviction-3: {kubelet zj8bt86c-a9c3a-92s9q-worker-0-6dqq5} Created: Created container pause
Jan  3 19:15:41.445: INFO: At 2021-01-03 19:13:34 +0000 UTC - event for taint-eviction-3: {kubelet zj8bt86c-a9c3a-92s9q-worker-0-6dqq5} Started: Started container pause
Jan  3 19:15:41.445: INFO: At 2021-01-03 19:14:41 +0000 UTC - event for taint-eviction-3: {taint-controller } TaintManagerEviction: Marking for deletion Pod e2e-taint-single-pod-2191/taint-eviction-3
Jan  3 19:15:41.445: INFO: At 2021-01-03 19:14:41 +0000 UTC - event for taint-eviction-3: {kubelet zj8bt86c-a9c3a-92s9q-worker-0-6dqq5} Killing: Stopping container pause
Jan  3 19:15:41.480: INFO: POD               NODE                                 PHASE    GRACE  CONDITIONS
Jan  3 19:15:41.480: INFO: taint-eviction-3  zj8bt86c-a9c3a-92s9q-worker-0-6dqq5  Running  30s    [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2021-01-03 19:13:31 +0000 UTC  } {Ready False 0001-01-01 00:00:00 +0000 UTC 2021-01-03 19:14:42 +0000 UTC ContainersNotReady containers with unready status: [pause]} {ContainersReady False 0001-01-01 00:00:00 +0000 UTC 2021-01-03 19:14:42 +0000 UTC ContainersNotReady containers with unready status: [pause]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2021-01-03 19:13:31 +0000 UTC  }]
Jan  3 19:15:41.480: INFO: 
Jan  3 19:15:41.480: INFO: taint-eviction-3[e2e-taint-single-pod-2191].container[pause]=The container could not be located when the pod was deleted.  The container used to be Running
Jan  3 19:15:41.544: INFO: skipping dumping cluster info - cluster too large
Jan  3 19:15:41.544: INFO: Waiting up to 7m0s for all (but 100) nodes to be ready
STEP: Destroying namespace "e2e-taint-single-pod-2191" for this suite.
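
For anyone triaging similar runs, here is a rough sketch of pulling the event timeline out of the e2e output above and measuring how long the pod had been marked for deletion when the log was dumped. The names parse_events and seconds_since_eviction are hypothetical helpers, not part of any OpenShift or Kubernetes tooling, and the regex is only assumed from the four event lines quoted in this comment. The dump time is passed in by hand because the "Jan  3 19:15:41" log prefix carries no year.

```python
import re
from datetime import datetime

# Pattern assumed from the event lines quoted above; other e2e output may differ.
EVENT_RE = re.compile(
    r"At (?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \+0000 UTC - "
    r"event for (?P<obj>[^:]+): \{(?P<source>[^}]*)\} "
    r"(?P<reason>\w+): (?P<message>.*)"
)

LOG_LINES = [
    "Jan  3 19:15:41.445: INFO: At 2021-01-03 19:13:34 +0000 UTC - event for taint-eviction-3: {kubelet zj8bt86c-a9c3a-92s9q-worker-0-6dqq5} Created: Created container pause",
    "Jan  3 19:15:41.445: INFO: At 2021-01-03 19:13:34 +0000 UTC - event for taint-eviction-3: {kubelet zj8bt86c-a9c3a-92s9q-worker-0-6dqq5} Started: Started container pause",
    "Jan  3 19:15:41.445: INFO: At 2021-01-03 19:14:41 +0000 UTC - event for taint-eviction-3: {taint-controller } TaintManagerEviction: Marking for deletion Pod e2e-taint-single-pod-2191/taint-eviction-3",
    "Jan  3 19:15:41.445: INFO: At 2021-01-03 19:14:41 +0000 UTC - event for taint-eviction-3: {kubelet zj8bt86c-a9c3a-92s9q-worker-0-6dqq5} Killing: Stopping container pause",
]


def parse_events(lines):
    """Return (timestamp, source, reason, message) for each matching event line."""
    events = []
    for line in lines:
        m = EVENT_RE.search(line)
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
            events.append((ts, m["source"].strip(), m["reason"], m["message"]))
    return events


def seconds_since_eviction(lines, dump_time):
    """Seconds between the first TaintManagerEviction event and dump_time, or None."""
    for ts, _source, reason, _message in parse_events(lines):
        if reason == "TaintManagerEviction":
            return (dump_time - ts).total_seconds()
    return None


# Dump time taken from the "Jan  3 19:15:41" prefix; the year is assumed from the events.
dump_time = datetime(2021, 1, 3, 19, 15, 41)
print(seconds_since_eviction(LOG_LINES, dump_time))  # prints 60.0
```

On the excerpt above this gives 60 seconds between the TaintManagerEviction event and the log dump, during which the pod is still listed as Running with a 30s grace period.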

