Bug 1849051
Summary: | Tests are failing due to constant etcd leader elections changes
---|---
Product: | OpenShift Container Platform
Reporter: | OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: | Networking
Networking sub component: | kuryr
Assignee: | Maysa Macedo <mdemaced>
QA Contact: | GenadiC <gcheresh>
Status: | CLOSED ERRATA
Severity: | high
Priority: | urgent
CC: | ltomasbo, rlobillo
Version: | 4.5
Target Release: | 4.5.0
Hardware: | Unspecified
OS: | Unspecified
Doc Type: | No Doc Update
Last Closed: | 2020-07-13 17:44:38 UTC
Bug Depends On: | 1847313
Bug Blocks: | 1849540
Description
OpenShift BugZilla Robot
2020-06-19 14:15:55 UTC
Created attachment 1698598: NP test results
Created attachment 1698599: ETCD metrics during test execution
Verified on OCP4.5.0-0.nightly-2020-06-23-075004 with OSP16.1 (RHOS-16.1-RHEL-8-20200623.n.0) with OVN. NP tests were run with parallelism set to 3 and gave the expected results.

No etcd leader change was observed during test execution (on 2020-06-24); the only "became leader" event in the logs dates from 2020-06-23:

    [stack@undercloud-0 ~]$ date && for i in $(oc get pods -n openshift-etcd -l app=etcd -o NAME); do echo "# $i"; oc logs $i -n openshift-etcd -c etcd |grep 'became leader'; done
    Wed Jun 24 08:30:46 EDT 2020
    # pod/etcd-ostest-rl79c-master-0
    # pod/etcd-ostest-rl79c-master-1
    raft2020/06/23 19:20:32 INFO: 95db74b7d4920873 became leader at term 4
    # pod/etcd-ostest-rl79c-master-2

No timeouts on port 2380 during test execution (on 2020-06-24). The heartbeat warnings reported by master-1 are all timestamped between 2020-06-23 19:22 and 2020-06-24 01:50:

    [stack@undercloud-0 ~]$ date && for i in $(oc get pods -n openshift-etcd -l app=etcd -o NAME); do echo "# $i"; oc logs $i -n openshift-etcd -c etcd |grep 'timeout'; done
    Wed Jun 24 08:32:24 EDT 2020
    # pod/etcd-ostest-rl79c-master-0
    # pod/etcd-ostest-rl79c-master-1
    2020-06-23 19:22:07.725074 W | etcdserver: failed to send out heartbeat on time (exceeded the 100ms timeout for 99.146027ms, to 669c7d0c57a3d244)
    2020-06-23 19:22:07.725270 W | etcdserver: failed to send out heartbeat on time (exceeded the 100ms timeout for 99.345401ms, to 498ed5c98fdb1ab8)
    2020-06-23 19:36:22.023038 W | etcdserver: failed to send out heartbeat on time (exceeded the 100ms timeout for 5.072443ms, to 669c7d0c57a3d244)
    2020-06-23 19:36:22.023217 W | etcdserver: failed to send out heartbeat on time (exceeded the 100ms timeout for 5.25579ms, to 498ed5c98fdb1ab8)
    2020-06-23 19:36:39.306596 W | etcdserver: failed to send out heartbeat on time (exceeded the 100ms timeout for 225.467102ms, to 669c7d0c57a3d244)
    2020-06-23 19:36:39.306625 W | etcdserver: failed to send out heartbeat on time (exceeded the 100ms timeout for 225.499258ms, to 498ed5c98fdb1ab8)
    2020-06-23 19:37:21.550861 W | etcdserver: failed to send out heartbeat on time (exceeded the 100ms timeout for 276.735925ms, to 669c7d0c57a3d244)
    2020-06-23 19:37:21.551163 W | etcdserver: failed to send out heartbeat on time (exceeded the 100ms timeout for 277.040716ms, to 498ed5c98fdb1ab8)
    2020-06-24 01:50:08.688456 W | etcdserver: failed to send out heartbeat on time (exceeded the 100ms timeout for 13.412946ms, to 669c7d0c57a3d244)
    2020-06-24 01:50:08.688518 W | etcdserver: failed to send out heartbeat on time (exceeded the 100ms timeout for 13.482507ms, to 498ed5c98fdb1ab8)
    # pod/etcd-ostest-rl79c-master-2

Furthermore, etcd metrics show stable behaviour. Test logs and metrics are attached as attachment 1698598 and attachment 1698599.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
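The verification in this report comes down to two log checks: counting raft "became leader" events per etcd pod, and looking for "failed to send out heartbeat on time" warnings inside the window of a test run. A minimal sketch of those checks follows, assuming bash with POSIX `grep`, `sed`, `awk`, and `sort`, and assuming etcd log lines start with a `YYYY-MM-DD HH:MM:SS.ffffff` timestamp in the same clock as the window you pass in. The function names `count_leader_changes`, `heartbeat_warnings_between`, and `max_heartbeat_overshoot` are hypothetical and are not part of `oc` or etcd; each reads log text on stdin so it can be fed from `oc logs`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the two checks in this report. None of these
# function names come from oc or etcd; each reads etcd container log text on
# stdin, so it can be fed from `oc logs <pod> -n openshift-etcd -c etcd`.

# Print the number of raft "became leader" events in the log (0 if none).
count_leader_changes() {
  # grep -c prints 0 but exits 1 when nothing matches; keep the exit status 0.
  grep -c 'became leader' || true
}

# Print heartbeat warnings whose timestamp lies in [START, END].
# Assumes lines start with "YYYY-MM-DD HH:MM:SS.ffffff" and that START/END
# use the same clock as the log. The format is fixed-width and ordered from
# year down to microsecond, so a plain string comparison orders timestamps.
heartbeat_warnings_between() {
  awk -v start="$1" -v end="$2" '
    /failed to send out heartbeat on time/ {
      ts = $1 " " $2
      if (ts >= start && ts <= end) print
    }'
}

# For each peer ID, print the largest "exceeded ... timeout for Xms" value.
max_heartbeat_overshoot() {
  sed -n 's/.*timeout for \([0-9.]*\)ms, to \([0-9a-f]*\)).*/\2 \1/p' |
    awk '{
      v = $2 + 0                      # numeric value for the comparison
      if (!($1 in max) || v > max[$1]) {
        max[$1] = v
        txt[$1] = $2 ""               # keep the original text for printing
      }
    }
    END { for (p in txt) print p, txt[p] }' |
    sort
}
```

For example, replacing `grep 'became leader'` in the report's `for` loop with `count_leader_changes` gives one count per pod, and piping the same `oc logs` output into `heartbeat_warnings_between` with the start and end of a test run (values you supply) prints only the warnings from that run; empty output corresponds to a "no timeouts during test execution" result. Note that an END bound without fractional seconds excludes warnings logged during that final second.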