Bug 1851338 - Tests are failing due to constant etcd leader elections changes
Summary: Tests are failing due to constant etcd leader elections changes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 4.3.z
Assignee: Maysa Macedo
QA Contact: GenadiC
URL:
Whiteboard:
Duplicates: 1852990
Depends On: 1849540
Blocks:
Reported: 2020-06-26 08:35 UTC by Maysa Macedo
Modified: 2020-07-14 16:12 UTC
CC List: 5 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of: 1849540
Environment:
Last Closed: 2020-07-14 16:11:52 UTC
Target Upstream Version:


Attachments
NP test results (1016.28 KB, application/gzip)
2020-07-07 10:08 UTC, rlobillo
ETCD metrics during test execution (408.57 KB, application/pdf)
2020-07-07 10:11 UTC, rlobillo


Links
System ID Priority Status Summary Last Updated
Github openshift cluster-network-operator pull 680 None closed Bug 1851338: Split etcd sg rule ports range into different sg rules 2020-07-07 05:13:32 UTC
Red Hat Product Errata RHBA-2020:2872 None None None 2020-07-14 16:12:06 UTC

Comment 3 Jan Safranek 2020-07-02 07:59:59 UTC
*** Bug 1852990 has been marked as a duplicate of this bug. ***

Comment 6 rlobillo 2020-07-07 10:08:58 UTC
Created attachment 1700134 [details]
NP test results

Comment 7 rlobillo 2020-07-07 10:11:58 UTC
Created attachment 1700135 [details]
ETCD metrics during test execution

Comment 8 rlobillo 2020-07-07 10:13:00 UTC
Verified on OCP4.3.0-0.nightly-2020-07-06-074036 with OSP16.1
(RHOS-16.1-RHEL-8-20200701.n.0) with OVN.

Ingress rules to etcd are now split into two single-port rules instead of one rule covering a port range:

(shiftstack) [stack@undercloud-0 ~]$ openstack security group show ostest-h5nsm-master |
grep 10.196.0.0 | grep -e 2379 -e 2380
| | created_at='2020-07-06T14:19:41Z', direction='ingress', ethertype='IPv4',
id='45689162-6486-4a62-988e-7fc75f3b9178', port_range_max='2379', port_range_min='2379',
protocol='tcp', remote_ip_prefix='10.196.0.0/16', updated_at='2020-07-06T14:19:41Z' |
| | created_at='2020-07-06T14:19:41Z', direction='ingress', ethertype='IPv4',
id='b7230eda-b467-4ea7-8b1e-1aa48fae8818', port_range_max='2380',
port_range_min='2380', protocol='tcp', remote_ip_prefix='10.196.0.0/16',
updated_at='2020-07-06T14:19:41Z' |
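
For illustration only, the effect of the fix (one rule per etcd port rather than a
single rule spanning a range) can be sketched as below. This is not the code from
cluster-network-operator pull 680: the SGRule type and the split_port_range helper
are hypothetical, and the original 2379-2380 range is an assumption inferred from
the two rules shown above.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class SGRule:
    """Hypothetical, simplified security group rule (not an OpenStack API type)."""
    direction: str
    ethertype: str
    protocol: str
    remote_ip_prefix: str
    port_range_min: int
    port_range_max: int


def split_port_range(rule: SGRule) -> List[SGRule]:
    """Return one single-port rule for every port covered by `rule`."""
    return [
        replace(rule, port_range_min=port, port_range_max=port)
        for port in range(rule.port_range_min, rule.port_range_max + 1)
    ]


# Assumed pre-fix rule: a single ingress rule covering both etcd ports.
etcd_range = SGRule("ingress", "IPv4", "tcp", "10.196.0.0/16", 2379, 2380)

for single in split_port_range(etcd_range):
    print(single.protocol, single.remote_ip_prefix,
          single.port_range_min, single.port_range_max)
```

Each resulting rule has port_range_min equal to port_range_max, which matches the
shape of the two rules listed in the security group output.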

NP tests were run with parallelism set to 2 and gave the expected results.

No etcd leader change was observed during test execution (2020-07-06, from 17:00 onwards):

(overcloud) [stack@undercloud-0 ~]$ for i in $(oc get pods -n openshift-etcd -l k8s-app=etcd -o NAME); do
    echo "# $i"; oc logs $i -n openshift-etcd -c etcd-member | grep 'became leader'; done
# pod/etcd-member-ostest-h5nsm-master-0
2020-07-06 14:17:22.082454 I | raft: 7e92ed1f2b132c63 became leader at term 8
# pod/etcd-member-ostest-h5nsm-master-1
# pod/etcd-member-ostest-h5nsm-master-2
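
The manual check above (no 'became leader' entry after the test start) could also
be scripted. The following is a rough sketch, assuming the etcd log lines keep the
timestamp format shown in the output above and that the test start time is given in
the same timezone as the log; the leader_changes_since function is a hypothetical
helper, not part of oc or etcd.

```python
import re
from datetime import datetime
from typing import Iterable, List, Tuple

# Matches lines such as:
# 2020-07-06 14:17:22.082454 I | raft: 7e92ed1f2b132c63 became leader at term 8
LEADER_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) .*"
    r"raft: (\w+) became leader at term (\d+)"
)


def leader_changes_since(
    lines: Iterable[str], start: datetime
) -> List[Tuple[datetime, str, int]]:
    """Return (timestamp, member id, term) for each election at or after `start`."""
    changes = []
    for line in lines:
        match = LEADER_RE.match(line)
        if match is None:
            continue  # not a leader election line
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if stamp >= start:
            changes.append((stamp, match.group(2), int(match.group(3))))
    return changes


# The only election line in the logs above predates the test run.
sample_log = [
    "2020-07-06 14:17:22.082454 I | raft: 7e92ed1f2b132c63 became leader at term 8",
]
test_start = datetime(2020, 7, 6, 17, 0)
print(leader_changes_since(sample_log, test_start))
```

An empty result for the test window corresponds to the "no leader change" outcome
reported here.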

No timeouts on port 2380 during test execution:

(overcloud) [stack@undercloud-0 ~]$ for i in $(oc get pods -n openshift-etcd -l k8s-app=etcd -o NAME); do
    echo "# $i"; oc logs $i -n openshift-etcd -c etcd-member | grep 'timeout'; done
# pod/etcd-member-ostest-h5nsm-master-0
# pod/etcd-member-ostest-h5nsm-master-1
# pod/etcd-member-ostest-h5nsm-master-2

Furthermore, the etcd metrics show stable behaviour during the same period (see the attached PDF).

Comment 10 errata-xmlrpc 2020-07-14 16:11:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2872

