Bug 1819907
Summary: | etcdserver timeouts and degraded cluster after network partition on Azure | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Ross Brattain <rbrattai> |
Component: | Etcd | Assignee: | Sam Batschelet <sbatsche> |
Status: | CLOSED NOTABUG | QA Contact: | ge liu <geliu> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4.4 | CC: | anusaxen, danw, dcbw, sbatsche, skolicha |
Target Milestone: | --- | ||
Target Release: | 4.5.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Last Closed: | 2020-05-20 14:06:15 UTC | Type: | Bug |
Description
Ross Brattain
2020-04-01 20:29:23 UTC
> Steps to Reproduce:
> 1. identify OVNKubernetes leader
That part seems irrelevant? You're blocking *all* traffic to one of the masters, and then seeing etcd problems. Nothing to do with OVN-Kubernetes...
(But maybe that's why you reproduced the problem on Azure and GCP but not AWS? Maybe the ovn-kube leader happened to be on the same master as the etcd leader when you tested on Azure and GCP, but not when you tested on AWS.)
> Maybe the ovn-kube leader happened to be on the same master as the etcd leader
hmm, `oc exec <etcd-master-pod> -n openshift-etcd -- etcdctl endpoint status` might shed more light on this
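For anyone repeating the check above, here is a minimal sketch of how the etcd leader could be picked out of the command's JSON output. It assumes the command is run with `--cluster -w json` (flags not given in the comment above) and that the output has the shape etcdctl 3.4 emits, where a member is the leader when `Status.leader` equals `Status.header.member_id`. The endpoints and IDs in `SAMPLE` are made up for illustration and are not taken from this cluster.

```python
import json
from typing import List

# Illustrative data only: endpoints and IDs are hypothetical, and the field
# names are assumed to match `etcdctl endpoint status --cluster -w json`.
SAMPLE = json.dumps([
    {"Endpoint": "https://10.0.0.5:2379",
     "Status": {"header": {"member_id": 111}, "leader": 111}},
    {"Endpoint": "https://10.0.0.6:2379",
     "Status": {"header": {"member_id": 222}, "leader": 111}},
    {"Endpoint": "https://10.0.0.7:2379",
     "Status": {"header": {"member_id": 333}, "leader": 111}},
])


def find_leader(status_json: str) -> List[str]:
    """Return the endpoints whose own member ID equals the reported leader ID."""
    leaders = []
    for entry in json.loads(status_json):
        status = entry["Status"]
        if status["leader"] == status["header"]["member_id"]:
            leaders.append(entry["Endpoint"])
    return leaders


if __name__ == "__main__":
    print(find_leader(SAMPLE))
```

Comparing the returned endpoint with the node hosting the ovn-kube leader would show whether the two leaders were co-located during the Azure and GCP runs.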
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity. If you have further information on the current state of the bug, please update it, otherwise this bug will be automatically closed in 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.
The choice of network plugin has no influence on etcd-to-etcd or apiserver-to-etcd traffic. If you are confident that etcd does not break in a real network partition then I would say to just close this bug; my guess would be that the command the OP was using to simulate a network partition was incorrect and had unexpected additional side effects.
Closing per https://bugzilla.redhat.com/show_bug.cgi?id=1819907#c8
We are working on a periodic networking partition test for the cluster but etcd is partition tolerant and tested extensively upstream by etcd CI.
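As a rough illustration of the partition-tolerance point: etcd uses Raft, which stays available as long as a majority of members can reach each other. The sketch below assumes a three-member etcd cluster (the member count is not stated in this bug) and shows that isolating a single master leaves the remaining two with quorum, while the isolated member alone has none. The function names are hypothetical helpers, not part of any etcd API.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with the given number of members."""
    return members // 2 + 1


def has_quorum(members: int, reachable: int) -> bool:
    """True if a group of mutually reachable members forms a majority."""
    return reachable >= quorum(members)


if __name__ == "__main__":
    total = 3  # assumption: three etcd members, one per master
    print("majority side keeps quorum:", has_quorum(total, total - 1))
    print("isolated member keeps quorum:", has_quorum(total, 1))
```

Under that assumption, cutting one master off from the network should cost at most a leader election, not a degraded cluster, which is consistent with the suspicion that the partition command had side effects beyond isolating a single node.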