Bug 1845295 - e2e tests are failing: secret mounts fail due to etcd timeouts
Summary: e2e tests are failing: secret mounts fail due to etcd timeouts
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd
Version: 4.5
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Sam Batschelet
QA Contact: ge liu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-06-08 20:35 UTC by Ben Howard
Modified: 2020-06-20 13:05 UTC
CC List: 0 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-06-20 13:05:51 UTC
Target Upstream Version:
Embargoed:



Description Ben Howard 2020-06-08 20:35:29 UTC
e2e tests for AWS are intermittently failing due to etcd timeouts when mounting secrets. 

In the test logs we see:
-ip-10-0-147-159.us-west-2.compute.internal-hqfrn node/ip-10-0-147-159.us-west-2.compute.internal reason/Scheduled
Jun 08 13:23:20.154 I ns/openshift-kube-apiserver pod/kube-apiserver-ip-10-0-175-158.us-west-2.compute.internal node/ip-10-0-175-158 reason/TerminationGracefulTerminationFinished All pending requests processed
Jun 08 13:23:20.747 I ns/e2e-var-expansion-5399 pod/var-expansion-6aebeb8b-579c-4ca9-b77b-67e252cbd3bf node/ reason/Created
Jun 08 13:23:20.841 I ns/e2e-var-expansion-5399 pod/var-expansion-6aebeb8b-579c-4ca9-b77b-67e252cbd3bf node/ip-10-0-147-159.us-west-2.compute.internal reason/Scheduled
Jun 08 13:23:21.134 W ns/e2e-volume-1086 pod/hostexec-ip-10-0-147-159.us-west-2.compute.internal-hqfrn node/ip-10-0-147-159.us-west-2.compute.internal reason/FailedMount MountVolume.SetUp failed for volume "default-token-6hqf9" : failed to sync secret cache: timed out waiting for the condition
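
The "failed to sync secret cache: timed out waiting for the condition" text in the FailedMount event above is the generic apimachinery wait timeout surfacing through the kubelet's volume setup. As a rough illustration only (not the kubelet's actual code or settings), a minimal sketch using k8s.io/apimachinery/pkg/util/wait shows how a cache-sync wait that never completes produces exactly that message; the poll interval, timeout, and cacheSynced placeholder are assumptions for the sketch:

package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

func main() {
	// Placeholder condition: in the real kubelet this would check whether the
	// secret has been observed in the node-local cache recently enough.
	cacheSynced := func() (bool, error) {
		return false, nil
	}

	// Illustrative interval/timeout, not the kubelet's real values.
	err := wait.PollImmediate(100*time.Millisecond, time.Second, cacheSynced)
	if err != nil {
		// wait.ErrWaitTimeout stringifies to "timed out waiting for the condition".
		fmt.Printf("failed to sync secret cache: %v\n", err)
	}
}

If etcd (and therefore the apiserver watch path) is slow, the condition never becomes true within the timeout and the mount is retried.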


From https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/2371/artifacts/e2e-aws/pods/openshift-etcd_etcd-ip-10-0-148-41.us-west-2.compute.internal_etcd.log:

2020-06-08 13:30:42.090034 W | etcdserver: request "header:<ID:13622951903028520052 username:\"etcd\" auth_revision:1 > txn:<compare:<target:MOD key:\"/kubernetes.io/serviceaccounts/e2e-csi-mock-volumes-4286/csi-resizer\" mod_revision:0 > success:<request_put:<key:\"/kubernetes.io/serviceaccounts/e2e-csi-mock-volumes-4286/csi-resizer\" value_size:132 >> failure:<>>" with result "size:18" took too long (134.091386ms) to execute
2020-06-08 13:30:42.094598 W | etcdserver: read-only range request "key:\"/kubernetes.io/serviceaccounts/e2e-persistent-local-volumes-test-3269/default\" " with result "range_response_count:1 size:322" took too long (129.390587ms) to execute
2020-06-08 13:30:42.095279 W | etcdserver: read-only range request "key:\"/kubernetes.io/events/e2e-test-cli-deployment-4qbk8/deployment-simple-3-5p6bt.161694afaf120831\" " with result "range_response_count:1 size:792" took too long (125.386409ms) to execute
2020-06-08 13:30:42.095532 W | etcdserver: read-only range request "key:\"/kubernetes.io/serviceaccounts/e2e-container-runtime-3388/builder\" " with result "range_response_count:1 size:298" took too long (125.130044ms) to execute
2020-06-08 13:30:42.097448 W | etcdserver: read-only range request "key:\"/kubernetes.io/rolebindings/e2e-projected-7417/\" range_end:\"/kubernetes.io/rolebindings/e2e-projected-74170\" " with result "range_response_count:0 size:6" took too long (127.255195ms) to execute
2020-06-08 13:30:42.101432 W | etcdserver: read-only range request "key:\"/kubernetes.io/rolebindings/e2e-provisioning-633/\" range_end:\"/kubernetes.io/rolebindings/e2e-provisioning-6330\" " with result "range_response_count:0 size:6" took too long (127.320764ms) to execute
2020-06-08 13:30:42.101605 W | etcdserver: read-only range request "key:\"/kubernetes.io/whereabouts.cni.cncf.io/ippools/e2e-volume-vsan-policy-9906/\" range_end:\"/kubernetes.io/whereabouts.cni.cncf.io/ippools/e2e-volume-vsan-policy-99060\" " with result "range_response_count:0 size:6" took too long (127.7028ms) to execute
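
etcd emits these "took too long" warnings when a request exceeds its slow-request warning threshold (100ms by default in the etcd 3.x series), so the lines above show requests in the ~125-135ms range rather than hard failures. For reference, a minimal client-side sketch of timing the same kind of read-only, count-only range request with clientv3; the endpoint, key prefix, and import path are illustrative assumptions and a real OpenShift etcd would additionally require client TLS certificates:

package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/clientv3"
)

func main() {
	// Placeholder endpoint for a local/test etcd.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// Same shape as the logged requests: a count-only range over a prefix.
	start := time.Now()
	resp, err := cli.Get(ctx, "/kubernetes.io/serviceaccounts/",
		clientv3.WithPrefix(), clientv3.WithCountOnly())
	if err != nil {
		fmt.Println("range request failed:", err)
		return
	}
	fmt.Printf("range request returned count=%d in %v\n", resp.Count, time.Since(start))
}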

From another node https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-aws-4.5/2371/artifacts/e2e-aws/pods/openshift-etcd_etcd-ip-10-0-175-158.us-west-2.compute.internal_etcd.log: 
2020-06-08 13:11:04.398739 W | etcdserver: read-only range request "key:\"/kubernetes.io/configmaps/openshift-apiserver-operator/openshift-apiserver-operator-lock\" " with result "range_response_count:1 size:657" took too long (140.730254ms) to execute
2020-06-08 13:11:04.398945 W | etcdserver: read-only range request "key:\"/kubernetes.io/serviceaccounts/openshift-kube-scheduler/installer-sa\" " with result "range_response_count:1 size:264" took too long (148.545757ms) to execute
2020-06-08 13:11:04.513100 W | etcdserver: read-only range request "key:\"/kubernetes.io/configmaps/openshift-authentication/v4-0-config-system-console-config\" " with result "range_response_count:0 size:5" took too long (125.361581ms) to execute
2020-06-08 13:11:04.513254 W | etcdserver: read-only range request "key:\"/kubernetes.io/pods/openshift-etcd/etcd-ip-10-0-175-158.us-west-2.compute.internal\" " with result "range_response_count:1 size:30636" took too long (142.538111ms) to execute
2020-06-08 13:11:04.513411 W | etcdserver: read-only range request "key:\"/kubernetes.io/secrets/openshift-apiserver/encryption-config-0\" " with result "range_response_count:0 size:5" took too long (134.753448ms) to execute
2020-06-08 13:11:04.513563 W | etcdserver: read-only range request "key:\"/kubernetes.io/podsecuritypolicy\" range_end:\"/kubernetes.io/podsecuritypolicz\" count_only:true " with result "range_response_count:0 size:5" took too long (138.147667ms) to execute
2020-06-08 13:11:04.513726 W | etcdserver: read-only range request "key:\"/kubernetes.io/clusterroles/basic-user\" " with result "range_response_count:1 size:976" took too long (139.67547ms) to execute
2020-06-08 13:11:04.513815 W | etcdserver: read-only range request "key:\"/kubernetes.io/clusterrolebindings/storage-version-migration-migrator\" " with result "range_response_count:1 size:591" took too long (143.075002ms) to execute
2020-06-08 13:11:04.581914 W | etcdserver: read-only range request "key:\"/kubernetes.io/configmaps/openshift-kube-apiserver-operator/kube-apiserver-operator-lock\" " with result "range_response_count:1 size:652" took too long (109.783474ms) to execute
2020-06-08 13:11:04.582030 W | etcdserver: read-only range request "key:\"/kubernetes.io/pods/openshift-kube-storage-version-migrator/migrator-dff484648-ldb7n\" " with result "range_response_count:1 size:2474" took too long (170.020658ms) to execute
2020-06-08 13:11:04.582122 W | etcdserver: read-only range request "key:\"/kubernetes.io/clusterrolebindings/system:openshift:operator:openshift-kube-scheduler-installer\" " with result "range_response_count:1 size:593" took too long (180.928568ms) to execute
2020-06-08 13:11:04.673961 W | etcdserver: read-only range request "key:\"/kubernetes.io/clusterrolebindings/\" range_end:\"/kubernetes.io/clusterrolebindings0\" " with result "range_response_count:156 size:104996" took too long (102.938249ms) to execute
2020-06-08 13:11:04.674490 W | etcdserver: read-only range request "key:\"/kubernetes.io/pods/openshift-etcd/etcd-ip-10-0-225-176.us-west-2.compute.internal\" " with result "range_response_count:1 size:30636" took too long (104.198162ms) to execute
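
Both logs follow the same pattern, so a quick way to gauge how widespread the slowness is would be to scan a whole etcd log for these warnings. A small triage sketch (not part of any existing tooling for this bug) that reads an etcd log on stdin and reports the number of "took too long (...)" warnings and the slowest one observed:

package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"time"
)

func main() {
	// Matches the duration inside messages like:
	//   ... took too long (134.091386ms) to execute
	re := regexp.MustCompile(`took too long \(([0-9.]+[a-zµ]+)\)`)

	var count int
	var worst time.Duration

	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // etcd request dumps can be long lines
	for sc.Scan() {
		m := re.FindStringSubmatch(sc.Text())
		if m == nil {
			continue
		}
		if d, err := time.ParseDuration(m[1]); err == nil {
			count++
			if d > worst {
				worst = d
			}
		}
	}
	fmt.Printf("slow requests: %d, worst: %v\n", count, worst)
}

Running it with one of the downloaded etcd pod logs on stdin gives a rough picture of whether the 100ms+ requests are isolated blips or sustained throughout the run.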

Comment 1 Sam Batschelet 2020-06-10 15:33:19 UTC

*** This bug has been marked as a duplicate of bug 1845792 ***

Comment 2 Sam Batschelet 2020-06-11 16:50:48 UTC
Still seeing this in 4.6 after the fix for bug 1845792 merged.

