Bug 1799006

Summary: open wal error: wal: file not found
Product: Red Hat Enterprise Linux 7 Reporter: Siddhant More <simore>
Component: etcd Assignee: Jan Chaloupka <jchaloup>
Status: CLOSED ERRATA QA Contact: atomic-bugs <atomic-bugs>
Severity: high Docs Contact:
Priority: urgent    
Version: 7.8 CC: alpatel, dornelas, jchaloup, leiwang, lsm5, mfojtik, mirollin, openshift-bugs-escalate, rbost, rjaiswal, rsunog, sbatsche, sburke, skolicha, weshen
Target Milestone: rc Keywords: Extras
Target Release: 7.8 Flags: simore: needinfo-, sbatsche: needinfo-
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: etcd-3.2.28-1.el7_8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1828433 (view as bug list) Environment:
Last Closed: 2020-05-12 19:50:31 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1186913, 1810426, 1828433    

Description Siddhant More 2020-02-06 13:10:25 UTC
Description of problem:

The etcd pod goes into CrashLoopBackOff with the above error. A possible cause is a corrupted or deleted data directory; however, the WAL files seem intact.
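
For anyone who wants to go beyond eyeballing the directory, here is a minimal sketch of a name-level sanity check on the WAL files. It assumes etcd's `<seq>-<index>.wal` naming (two 16-digit hex fields) and that a covering file must start at or before the latest snapshot index. The function names and the `snapshot_index` input are hypothetical, and this is an illustration only, not etcd's own logic; it does not read file contents.

```python
import re
from pathlib import Path

# Assumed WAL file naming: <seq>-<index>.wal, both fields 16 hex digits.
WAL_NAME = re.compile(r"^([0-9a-f]{16})-([0-9a-f]{16})\.wal$")


def parse_wal_names(names):
    """Return sorted (seq, index, name) tuples for names that look like WAL files."""
    parsed = []
    for name in names:
        m = WAL_NAME.match(name)
        if m:
            parsed.append((int(m.group(1), 16), int(m.group(2), 16), name))
    return sorted(parsed)


def check_wal_names(names, snapshot_index):
    """Check whether the WAL file names alone could cover snapshot_index.

    Returns a list of human-readable problems; an empty list means the
    names look consistent. File contents are not inspected.
    """
    wals = parse_wal_names(names)
    if not wals:
        return ["no WAL files found"]
    # Some file must start at or before the snapshot index.
    candidates = [w for w in wals if w[1] <= snapshot_index]
    if not candidates:
        return [f"no WAL file starts at or before index {snapshot_index}"]
    start = candidates[-1]
    # From that file onward, sequence numbers should increase by exactly one.
    tail = [w for w in wals if w[0] >= start[0]]
    problems = []
    for prev, cur in zip(tail, tail[1:]):
        if cur[0] != prev[0] + 1:
            problems.append(f"sequence gap between {prev[2]} and {cur[2]}")
    return problems


def check_wal_dir(wal_dir, snapshot_index):
    """Convenience wrapper: run the name check on a directory on disk."""
    return check_wal_names([p.name for p in Path(wal_dir).iterdir()], snapshot_index)
```

Calling `check_wal_dir` on the member's WAL directory with the index of the newest snapshot returns an empty list when the names are consistent. Both the directory path and the snapshot index have to be supplied by the operator. A clean result only means the names line up; it says nothing about the records inside the files.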


Version-Release number of selected component (if applicable):
etcd: 3.2.26

How reproducible:
Happens randomly in the environment and affects production.

Steps to Reproduce:
Unable to find a reproducer. We have enabled audit logs on the data directory to identify whether any process is corrupting the data.

Additional info:
Removing the failed etcd member and adding it back resolves the problem.
However, the customer is pushing us to find the root cause so that the issue can be mitigated.
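
As a rough sketch of the remove-and-re-add workaround, the helper below only builds the `etcdctl` command lines (v3 API, so `ETCDCTL_API=3` would need to be set for etcd 3.2) and runs nothing. The function name, member ID, member name, and URLs are hypothetical placeholders. The TLS flags and the handling of the member's old data directory are left out on purpose and depend on the environment.

```python
def member_replace_commands(member_id, member_name, peer_url, endpoints):
    """Build etcdctl (v3 API) command lines for removing and re-adding a member.

    Nothing is executed here; the caller decides how and where to run them.
    TLS options (--cacert, --cert, --key) are intentionally omitted.
    """
    base = ["etcdctl", f"--endpoints={endpoints}"]
    return [
        base + ["member", "list"],                 # find the failed member's ID
        base + ["member", "remove", member_id],    # drop it from the cluster
        base + ["member", "add", member_name, f"--peer-urls={peer_url}"],
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    for cmd in member_replace_commands(
        "8e9e05c52164694d",
        "master-1",
        "https://master-1.example.com:2380",
        "https://master-0.example.com:2379",
    ):
        print(" ".join(cmd))
```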

Comment 1 Sam Batschelet 2020-02-10 11:49:05 UTC
This bug was resolved upstream, and the fix is available in 3.2.28. We will work to make it available for 3.11.

[1] https://github.com/etcd-io/etcd/pull/11308
[2] https://github.com/etcd-io/etcd/blob/master/CHANGELOG-3.2.md

Comment 27 errata-xmlrpc 2020-05-12 19:50:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2115