Created attachment 1559333 [details]
etcd member logs
Description of problem:
We ran a scale test that created around 9,500 namespaces, each with a number of objects, on a 250-node cluster, and found that etcd complained about exceeding its database space, raised an alarm, and put the cluster into maintenance mode, in which it accepted only key reads and deletes. To get the cluster back to a functional state, we had to run defragmentation to release the compacted space back to the DB and then disarm the alarm.
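The recovery described above can be sketched with etcdctl. This is a minimal sketch, assuming the v3 API and that endpoint/certificate flags appropriate to the cluster are already set in the environment; it is not the exact command sequence we ran.

```shell
# etcd 3.3's etcdctl still defaults to the v2 API, so select v3 explicitly.
export ETCDCTL_API=3

# Compact away superseded revisions first (using the current revision from
# endpoint status), then defragment to return the freed space to the backend.
rev=$(etcdctl endpoint status --write-out=json \
      | python -c 'import json,sys; print(json.load(sys.stdin)[0]["Status"]["header"]["revision"])')
etcdctl compaction "$rev"
etcdctl defrag

# Once the DB is back under quota, clear the NOSPACE alarm.
etcdctl alarm disarm
```

Defragmentation must be run against every member; it blocks the member while it rewrites the backend, so members should be defragmented one at a time.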
Version-Release number of selected component (if applicable):
Etcd Version: 3.3.10
OCP: 4.1 beta4/4.1.0-0.nightly-2019-04-22-005054
We encountered this issue for the first time, but I think it can be easily reproduced if etcd defragmentation is not done regularly and the space quota is not large enough for a large-scale cluster running a lot of objects.
Steps to Reproduce:
1. Install a large scale cluster using the default space quotas for etcd.
2. Load the cluster with a large number of objects.
3. Check the etcd component status, endpoints and alarm status.
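The checks in step 3 can be done with etcdctl. A sketch, again assuming the v3 API and that endpoint/certificate flags are set in the environment:

```shell
export ETCDCTL_API=3

# Per-member health, and status including the current DB size column.
etcdctl endpoint health
etcdctl endpoint status --write-out=table

# Any active alarms; a full DB raises a NOSPACE alarm on the affected member.
etcdctl alarm list
```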
Actual results:
- etcd component status was unhealthy.
- The etcd server got overloaded.
- The controller and etcd logs reported "etcdserver: mvcc: database space exceeded". The DB size was 2.2G when we hit the issue.

Expected results:
- The default space quota should be at least 4GB, and the cluster should remain functional.
- Components, including etcd, report a healthy state.
- A Prometheus rule alerts users to run defragmentation when needed.
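For the last point, a minimal sketch of such a Prometheus alerting rule. The metric names are those exposed by recent etcd releases (in 3.3 the size metric may still be published under the `etcd_debugging_*` prefix), and the 80% threshold and rule name are illustrative choices, not an existing OCP rule:

```yaml
groups:
- name: etcd
  rules:
  - alert: EtcdDatabaseNearQuota
    # Fire when the backend DB has consumed 80% of the configured quota,
    # leaving time to compact/defragment before a NOSPACE alarm is raised.
    expr: (etcd_mvcc_db_total_size_in_bytes / etcd_server_quota_backend_bytes) > 0.80
    for: 10m
    labels:
      severity: warning
    annotations:
      message: etcd member {{ $labels.instance }} DB is over 80% of quota; compact and defragment.
```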
Created attachment 1559345 [details]
Is there any technical issue with making the default etcd DB size large enough to handle the largest cluster that we support? This way we never have to tune it?
> Is there any technical issue with making the default etcd DB size large enough to handle the largest cluster that we support? This way we never have to tune it?
Yes. The main issue is that the DB size a cluster can support depends heavily not only on the workload and number of nodes but also on the hardware: disk speed, dedicated vs. colocated data-dir partitions, and RAM/CPU all play a part in this equation as well. While hitting the alarm is not what we want, we also don't want a customer with an 8GB etcd DB that we can never stabilize without new hardware. So we are not yet in a set-it-and-forget-it situation. At scale, we will need to tune properly to optimize performance.
I feel setting a sane default such as 4GB is reasonable for now. In the future, we can look at auto-tuning based on the environment. We need more tools to help folks maintain etcd; that should improve. If we explicitly know all the variables of the deployment hardware, we can revisit the default based on common deployments.
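The quota discussed here corresponds to etcd's `--quota-backend-bytes` server flag (on OCP it is managed by the operator rather than set by hand; shown as a raw flag purely for illustration):

```shell
# 4 GiB backend quota. When the flag is unset, etcd defaults to ~2 GiB,
# and the etcd documentation suggests ~8 GiB as the practical maximum.
etcd --quota-backend-bytes=4294967296
```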
Based on the long comment list, this issue is not a simple one, and we may open an RFE to discuss it in depth if necessary. Currently, the etcd space quota is 4GB.
Verified with the Beta 5 final build (4.1.0-rc.1).