Bug 2071114 - divergent etcd revisions go undetected
Summary: divergent etcd revisions go undetected
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: W. Trevor King
QA Contact: ge liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-04-01 20:55 UTC by W. Trevor King
Modified: 2022-10-14 10:15 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2068601
Environment:
Last Closed: 2022-09-08 14:07:30 UTC
Target Upstream Version:
Embargoed:


Description W. Trevor King 2022-04-01 20:55:21 UTC
This series tracks alerting, which complements bug 2068601's init-time corruption detection but differs from it in several ways:

* The initial corruption check keeps the corrupted member from coming up, preventing split-braining. Alerting just lets you know if something bad is happening; it doesn't block anything.
* The current initial corruption check seems to ignore large divergence [1]. Alerting will complain about any divergence, regardless of size.
* Adding a new alert PrometheusRule to existing clusters is an easier mitigating patch than adjusting the etcd-launching configuration.

[1]: https://github.com/etcd-io/etcd/issues/13766#issuecomment-1083033017
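As a rough illustration of the "complain about any divergence" behavior described above, here is a minimal Python sketch. It is not the alert expression itself, which would live in a PrometheusRule and be written in PromQL. The function name divergent_members, the member names, and the tolerance parameter are all hypothetical, and the sketch assumes a per-member revision number has already been collected from somewhere.

```python
from typing import Dict, List, Tuple


def divergent_members(
    revisions: Dict[str, int], tolerance: int = 0
) -> List[Tuple[str, int]]:
    """Return (member, lag) pairs for members trailing the newest revision.

    revisions maps a member name to its reported revision (hypothetical
    input; how the numbers are gathered is outside this sketch).
    tolerance is the largest lag that is still treated as healthy. The
    default of 0 flags any divergence, regardless of size.
    """
    if not revisions:
        return []
    newest = max(revisions.values())
    # Keep only members whose lag exceeds the tolerance, sorted by name.
    return sorted(
        (member, newest - revision)
        for member, revision in revisions.items()
        if newest - revision > tolerance
    )


if __name__ == "__main__":
    # Hypothetical three-member cluster where one member has fallen behind.
    sample = {"etcd-0": 1000, "etcd-1": 1000, "etcd-2": 400}
    for member, lag in divergent_members(sample):
        print(f"{member} trails the newest revision by {lag}")
```

Unlike the init-time check, a comparison like this only reports; it does not stop a member from starting. A real alert would presumably also need to hold for some duration before firing, so that ordinary replication lag does not trigger it, but that is an assumption and not something stated in this bug.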

