Bug 1847313
Summary: | Tests are failing due to constant etcd leader elections changes | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Maysa Macedo <mdemaced>
Component: | Networking | Assignee: | Maysa Macedo <mdemaced>
Networking sub component: | kuryr | QA Contact: | GenadiC <gcheresh>
Status: | CLOSED ERRATA | Docs Contact: |
Severity: | high | |
Priority: | urgent | CC: | ltomasbo, rlobillo
Version: | 4.5 | |
Target Milestone: | --- | |
Target Release: | 4.6.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | No Doc Update
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2020-10-27 16:07:13 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1849051 | |
Attachments: | | |
Description
Maysa Macedo
2020-06-16 08:00:13 UTC
Created attachment 1702042 [details]
NP test results
Created attachment 1702043 [details]
ETCD metrics during test execution
Verified on OCP4.6.0-0.nightly-2020-07-21-004949 with OSP16.1 (RHOS-16.1-RHEL-8-20200714.n.0) with OVN.

NP tests were run with parallelism set to 2, with expected results.

- No etcd leader change observed during test execution (on day 2020-07-22):

[stack@undercloud-0 ~]$ date && for i in $(oc get pods -n openshift-etcd -l app=etcd -o NAME); do echo "# $i"; oc logs $i -n openshift-etcd -c etcd |grep 'became leader'; done
Wed Jul 22 04:41:33 EDT 2020
# pod/etcd-ostest-tzdfc-master-0
# pod/etcd-ostest-tzdfc-master-1
raft2020/07/21 16:00:39 INFO: f56b8ef5cf671236 became leader at term 8
# pod/etcd-ostest-tzdfc-master-2

- 4 timeouts on port 2380 during test execution on master-1, but it recovered successfully (on day 2020-07-22):

[stack@undercloud-0 ~]$ date && for i in $(oc get pods -n openshift-etcd -l app=etcd -o NAME); do echo "# $i"; oc logs $i -n openshift-etcd -c etcd |grep 'timeout'; done
Wed Jul 22 04:42:04 EDT 2020
# pod/etcd-ostest-tzdfc-master-0
# pod/etcd-ostest-tzdfc-master-1
2020-07-22 05:46:36.727875 W | etcdserver: failed to send out heartbeat on time (exceeded the 100ms timeout for 2.559533ms, to fbb05cfa50510a87)
2020-07-22 05:46:36.727982 W | etcdserver: failed to send out heartbeat on time (exceeded the 100ms timeout for 2.697912ms, to c0e6832f3d3c32b7)
2020-07-22 07:51:15.350022 W | etcdserver: failed to send out heartbeat on time (exceeded the 100ms timeout for 3.185218ms, to fbb05cfa50510a87)
2020-07-22 07:51:15.350080 W | etcdserver: failed to send out heartbeat on time (exceeded the 100ms timeout for 3.255695ms, to c0e6832f3d3c32b7)
# pod/etcd-ostest-tzdfc-master-2

Furthermore, etcd metrics show stable behaviour.

Attached test logs and metrics: attachment 1702042 [details] & attachment 1702043 [details].

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
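The two log checks in the verification comment (grepping the etcd container logs for 'became leader' and for heartbeat timeouts) could be wrapped in small helpers that report counts instead of raw lines. The following is only a sketch: the function names `count_leader_elections` and `count_heartbeat_timeouts` are hypothetical and do not come from this report, and the sample input reuses three log lines quoted above.

```shell
#!/usr/bin/env bash
# Sketch only: the helper names below are hypothetical, not part of the bug report.

# Count log lines reporting that a member became leader (log text on stdin).
# grep -c exits non-zero when the count is 0, so "|| true" keeps the status clean.
count_leader_elections() {
  grep -c 'became leader' || true
}

# Count heartbeat warnings emitted by etcdserver (log text on stdin).
count_heartbeat_timeouts() {
  grep -c 'failed to send out heartbeat on time' || true
}

# Sample input: log lines copied from the verification comment.
sample_log='raft2020/07/21 16:00:39 INFO: f56b8ef5cf671236 became leader at term 8
2020-07-22 05:46:36.727875 W | etcdserver: failed to send out heartbeat on time (exceeded the 100ms timeout for 2.559533ms, to fbb05cfa50510a87)
2020-07-22 05:46:36.727982 W | etcdserver: failed to send out heartbeat on time (exceeded the 100ms timeout for 2.697912ms, to c0e6832f3d3c32b7)'

echo "leader elections: $(printf '%s\n' "$sample_log" | count_leader_elections)"
echo "heartbeat timeouts: $(printf '%s\n' "$sample_log" | count_heartbeat_timeouts)"
```

Against a live cluster, the output of `oc logs $i -n openshift-etcd -c etcd` from the loop used in the verification would be piped into these helpers in place of the sample text. A count alone does not show whether an election happened during the test run. The timestamps still have to be compared with the test window, as was done above for the election logged on 2020/07/21.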