Bug 1737660

Summary: e2e-azure - etcd server overloaded
Product: OpenShift Container Platform Reporter: Kirsten Garrison <kgarriso>
Component: Etcd Assignee: Sam Batschelet <sbatsche>
Status: CLOSED ERRATA QA Contact: ge liu <geliu>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.2.0 CC: mfojtik, sbatsche, wking
Target Milestone: ---   
Target Release: 4.2.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-10-16 06:34:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Kirsten Garrison 2019-08-06 01:22:52 UTC
Description of problem:

When running the new e2e-azure tests, I frequently see
"etcdserver: server is likely overloaded" messages in the etcd logs.

Version-Release number of selected component (if applicable):
Current master branches of installer and release

How reproducible:
- Run e2e-azure against master.
- If an e2e test fails, check the etcd-member.log files; they often contain a significant number (~30-140) of `etcdserver: server is likely overloaded` messages.
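
The log check in the steps above can be scripted. The following is a minimal sketch, not part of the CI tooling: it assumes the etcd-member log artifact has already been downloaded to a local file, and the function names and the threshold of 7 (the upper end of the "small handful" in this report) are illustrative assumptions.

```python
# Sketch: count etcd overload warnings in a downloaded etcd-member log.
# Function names and the default threshold are assumptions for illustration.

OVERLOAD_MARKER = "etcdserver: server is likely overloaded"


def count_overload_messages(lines, marker=OVERLOAD_MARKER):
    """Return the number of lines that contain the overload warning."""
    return sum(1 for line in lines if marker in line)


def is_within_expected(count, threshold=7):
    """True if the count is at or below the assumed acceptable threshold."""
    return count <= threshold


def report(path, threshold=7):
    """Read a local log file, print a one-line summary, and return the count."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, encoding="utf-8", errors="replace") as log_file:
        count = count_overload_messages(log_file)
    status = "ok" if is_within_expected(count, threshold) else "too many"
    print(f"{path}: {count} overload messages ({status})")
    return count
```

For example, `report("etcd-member.log")` would scan a local copy of one of the log artifacts; the file name here is a placeholder for wherever the artifact was saved.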

Actual results:
Many `etcdserver: server is likely overloaded` in logs

Expected results:
0 to a small handful (5-7) of `etcdserver: server is likely overloaded` in logs

Additional info:
Example failed runs:
https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/4582/rehearse-4582-pull-ci-openshift-cluster-ingress-operator-master-e2e-azure/2/artifacts/e2e-azure/pods/openshift-etcd_etcd-member-ci-op-3zqg2wdp-cb8da-5kszg-master-0_etcd-member.log

https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_release/4582/rehearse-4582-pull-ci-openshift-cluster-ingress-operator-master-e2e-azure/3/artifacts/e2e-azure/pods/openshift-etcd_etcd-member-ci-op-9i9cd5vv-cb8da-cmm9v-master-0_etcd-member.log

For open PRs with e2e-azure runs see: 
https://github.com/openshift/release/pull/4582 
https://github.com/openshift/installer/pull/2123

Comment 5 Sam Batschelet 2019-08-08 20:24:56 UTC
After further investigation by the team, we found that setting the cache to ReadOnly[1] for Azure considerably improved disk I/O.

[1] https://github.com/openshift/installer/pull/2186

Comment 7 ge liu 2019-09-03 07:04:30 UTC
Hi Sam, do you have any suggestions for how to verify this? Thanks in advance!

Comment 8 ge liu 2019-09-11 02:41:43 UTC
I can't open the e2e test URLs, so I could not check how often the message is reported, but I assume it is fixed, since it has already passed many rounds of regression testing.

Comment 9 errata-xmlrpc 2019-10-16 06:34:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922