Bug 1801237

Summary: etcd: raft can stop before purge loop exits resulting in wal: file not found
Product: OpenShift Container Platform Reporter: Sam Batschelet <sbatsche>
Component: Etcd Assignee: Sam Batschelet <sbatsche>
Status: CLOSED ERRATA QA Contact: ge liu <geliu>
Severity: high Docs Contact:
Priority: high    
Version: 4.4 CC: geliu
Target Milestone: ---   
Target Release: 4.4.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1801196
: 1801379 1815634 1815646 Environment:
Last Closed: 2020-05-13 21:57:23 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1801196    
Bug Blocks: 1801379, 1815634, 1815646    

Description Sam Batschelet 2020-02-10 13:42:15 UTC
+++ This bug was initially created as a clone of Bug #1801196 +++

Description of problem: In some circumstances, raft can stop before the purge loop exits. As a result, etcd can remove WAL files that are still needed to replay state, so when etcd is restarted it fails with a catastrophic error:

C | etcdserver: open wal error: wal: file not found.


https://github.com/etcd-io/etcd/pull/11308

Version-Release number of selected component (if applicable):


How reproducible: rare


Steps to Reproduce:
1.
2.
3.

Actual results: catastrophic error


Expected results: etcd does not fail with an unrecoverable catastrophic error


Additional info:

Comment 2 ge liu 2020-03-09 08:22:48 UTC
No regression issue.

Comment 4 errata-xmlrpc 2020-05-13 21:57:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581