Bug 1520794

Summary: etcdserver: mvcc: database space exceeded issue after upgrade from OCP 3.6 to 3.7
Product: OpenShift Container Platform Reporter: Miheer Salunke <misalunk>
Component: Master Assignee: Stefan Schimanski <sttts>
Status: CLOSED NOTABUG QA Contact: Wang Haoran <haowang>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 3.7.0 CC: aos-bugs, erich, jokerman, mfojtik, misalunk, mmccomas, rhowe, vwalek
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2018-01-08 22:30:01 UTC Type: Bug
Bug Depends On: 1514612    
Bug Blocks:    

Comment 4 Michal Fojtik 2017-12-05 09:33:50 UTC
Have you tried the defrag procedure for etcd? The procedure is roughly explained in this GitHub comment: https://github.com/kubernetes/kops/issues/4005#issuecomment-349048006

Comment 5 Ryan Howe 2017-12-05 16:48:07 UTC
Yes, the following steps were run:

~~~
# export ETCDCTL_API=3
# source /etc/etcd/etcd.conf
# rev=$(etcdctl3 --cert=$ETCD_PEER_CERT_FILE --key=$ETCD_PEER_KEY_FILE --cacert=$ETCD_TRUSTED_CA_FILE --endpoints=$ETCD_LISTEN_CLIENT_URLS endpoint status --write-out="json" |  egrep -o '"revision":[0-9]*' | egrep -o '[0-9]*' -m1)
# etcdctl --cert=$ETCD_PEER_CERT_FILE --key=$ETCD_PEER_KEY_FILE --cacert=$ETCD_TRUSTED_CA_FILE --endpoints=$ETCD_LISTEN_CLIENT_URLS compact $rev
# etcdctl --cert=$ETCD_PEER_CERT_FILE --key=$ETCD_PEER_KEY_FILE --cacert=$ETCD_TRUSTED_CA_FILE --endpoints=$ETCD_LISTEN_CLIENT_URLS defrag
# etcdctl3 --cert=$ETCD_PEER_CERT_FILE --key=$ETCD_PEER_KEY_FILE --cacert=$ETCD_TRUSTED_CA_FILE --endpoints=$ETCD_LISTEN_CLIENT_URLS alarm disarm
~~~
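
The same sequence can be wrapped so that it is easy to repeat against each etcd member. The following is only a sketch, not a supported procedure: `extract_revision`, `etcdctl_on` and `compact_and_defrag` are hypothetical helper names, and it assumes `/etc/etcd/etcd.conf` has already been sourced so that the certificate variables used above are set.

```shell
#!/bin/sh
# Sketch only: compact one etcd member to its current revision, defragment it,
# then clear the alarm. Helper names are illustrative, not part of etcd.
export ETCDCTL_API=3

# Read `endpoint status --write-out=json` output on stdin and print the
# first "revision" value found in it.
extract_revision() {
    grep -Eo '"revision":[0-9]+' | grep -Eo '[0-9]+' | head -n 1
}

# Run etcdctl against a single endpoint ($1) using the peer certificates
# from etcd.conf (assumed to be present in the environment).
etcdctl_on() {
    endpoint=$1; shift
    etcdctl --cert="$ETCD_PEER_CERT_FILE" --key="$ETCD_PEER_KEY_FILE" \
            --cacert="$ETCD_TRUSTED_CA_FILE" --endpoints="$endpoint" "$@"
}

# Full sequence for one member: read revision, compact, defrag, disarm.
compact_and_defrag() {
    endpoint=$1
    rev=$(etcdctl_on "$endpoint" endpoint status --write-out=json | extract_revision)
    [ -n "$rev" ] || { echo "could not read revision from $endpoint" >&2; return 1; }
    etcdctl_on "$endpoint" compact "$rev" &&
    etcdctl_on "$endpoint" defrag &&
    etcdctl_on "$endpoint" alarm disarm
}
```

Defragmentation only affects the members named in `--endpoints`, so under these assumptions the helper would be called once per member, for example `compact_and_defrag https://<member>:2379`.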

I think the following issue is being hit: 

https://github.com/kubernetes/kubernetes/issues/45037
https://github.com/coreos/etcd/issues/8009
https://github.com/coreos/etcd/pull/8210

Comment 10 Eric Rich 2018-01-08 22:30:01 UTC
Ultimately this was diagnosed as a configuration issue: one etcd host was not configured with the recommended 4GB quota limit and was therefore defaulting to the 2GB quota limit.

If this is still an issue, please file a new BZ capturing the issue.
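
A quick way to spot this kind of mismatch is to compare the quota configured on every etcd host. The sketch below assumes the quota is set through `ETCD_QUOTA_BACKEND_BYTES` in `/etc/etcd/etcd.conf` (the environment form of etcd's `--quota-backend-bytes` option); `quota_from_conf` and `check_quota` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: report whether an etcd.conf-style file sets the backend quota
# to at least the recommended 4GB. Helper names are illustrative.

RECOMMENDED_BYTES=4294967296   # 4 * 1024^3 bytes

# Print the configured quota in bytes, or "unset" when the key is missing
# or commented out (etcd then falls back to its default quota).
quota_from_conf() {
    conf_file=$1
    value=$(sed -n 's/^[[:space:]]*ETCD_QUOTA_BACKEND_BYTES=[^0-9]*\([0-9][0-9]*\).*/\1/p' "$conf_file" | tail -n 1)
    echo "${value:-unset}"
}

# Print "ok" when the quota meets the recommendation, otherwise "low".
check_quota() {
    case $1 in
        unset) echo "low" ;;
        *) if [ "$1" -ge "$RECOMMENDED_BYTES" ]; then echo "ok"; else echo "low"; fi ;;
    esac
}
```

Running `check_quota "$(quota_from_conf /etc/etcd/etcd.conf)"` on each etcd host would flag any member still running with the default quota.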

Comment 11 Red Hat Bugzilla 2023-09-15 00:05:30 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days