Bug 1983131
Summary: | Defragmenting an etcd member doesn't reduce the DB size (7.5GB) on a setup with ~1000 spoke clusters | ||
---|---|---|---|
Product: | Red Hat Advanced Cluster Management for Kubernetes | Reporter: | Alexander Chuzhoy <sasha> |
Component: | Cluster Lifecycle | Assignee: | Jian Qiu <jqiu> |
Status: | CLOSED ERRATA | QA Contact: | Derek Ho <dho> |
Severity: | high | Docs Contact: | Christopher Dawson <cdawson> |
Priority: | high | ||
Version: | rhacm-2.3 | CC: | geliu, hanzhang, keyoung, leyan, mdelder, ohadlevy, ohochman, rfreiman, sbatsche, yuhe |
Target Milestone: | --- | Keywords: | TestBlocker |
Target Release: | rhacm-2.3 | Flags: | ming: rhacm-2.3+ |
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | EmergencyRequest | ||
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2021-08-06 00:52:39 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Alexander Chuzhoy
2021-07-16 15:23:33 UTC
Running the following commands helped reduce the size:

sh-4.4# etcdctl compact $(etcdctl endpoint status --write-out="json" | egrep -o '"revision":[0-9]*' | egrep -o '[0-9]*' -m1)
compacted revision 158774421

sh-4.4# etcdctl endpoint status -w json
[{"Endpoint":"127.0.0.1:2379","Status":{"header":{"cluster_id":15048564470140409604,"member_id":14288348196171437800,"revision":158774421,"raft_term":16},"version":"3.4.14","dbSize":7475937280,"leader":8730137547783813330,"raftIndex":171897193,"raftTerm":16,"raftAppliedIndex":171897193,"errors":["memberID:14288348196171437800 alarm:NOSPACE ","memberID:8730137547783813330 alarm:NOSPACE ","memberID:7377309355919782153 alarm:NOSPACE "],"dbSizeInUse":826146816}}]

sh-4.4# etcdctl --command-timeout=120s --endpoints=https://localhost:2379 defrag
Finished defragmenting etcd member[https://localhost:2379]

sh-4.4# etcdctl endpoint status -w table
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------+
| ENDPOINT       | ID               | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS                                                                                                                           |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------+
| 127.0.0.1:2379 | c64a68fb43e456e8 | 3.4.14  | 577 MB  | false     | false      | 16        | 171903945  | 171903945          | memberID:14288348196171437800 alarm:NOSPACE, memberID:8730137547783813330 alarm:NOSPACE, memberID:7377309355919782153 alarm:NOSPACE |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------+
sh-4.4#

** A NOTE ABOUT USING URGENT **

This BZ has been set to urgent severity and priority.
When a BZ is marked urgent priority, engineers are asked to stop whatever they are doing, putting everything else on hold. Please be prepared to have reasonable justification ready to discuss, and ensure your own and engineering management are aware and agree this BZ is urgent. Keep in mind, urgent bugs are very expensive and have maximal management visibility.

NOTE: This bug was automatically assigned to an engineering manager with the severity reset to *unspecified* until the emergency is vetted and confirmed. Please do not manually override the severity.

After review I found that a surge in manifestworks objects (over 2k per second) put the etcd store over the `ETCD_QUOTA_BACKEND_BYTES` quota of 7GB in under 10 minutes. Because many of these resources were actually mutations of existing keys, the mitigation for growth would be compaction. As compaction currently runs every 10 minutes, it was a perfect storm. While I think it's important to understand the huge growth of manifestworks objects and to improve our ability to compact the store without dependencies, I think this would be a very low-probability failure case, so I am marking it a non-blocker for 4.8. The etcd team will outline a more complete RCA soon.

Moving to ACM team to triage state growth.

After defragmenting all etcd instances and disarming the alarms, the issue went away. I then added 10 SNOs in bulk.
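The recovery steps used in this report (compact up to the current revision, defragment every member, then disarm the NOSPACE alarms) can be sketched as follows. The revision extraction is demonstrated against the endpoint-status JSON captured earlier in this report; the live `etcdctl` calls are left commented out, since they require a running member and the credentials already set up inside the etcd pod.

```shell
# Extract the current revision from `etcdctl endpoint status -w json`.
# The sample JSON below is taken from the session in this report.
status='[{"Endpoint":"127.0.0.1:2379","Status":{"header":{"revision":158774421},"dbSize":7475937280,"dbSizeInUse":826146816}}]'
rev=$(echo "$status" | grep -Eo '"revision":[0-9]+' | head -n1 | grep -Eo '[0-9]+')
echo "compacting up to revision $rev"

# On a live member (run inside the etcd pod; defrag each member in turn):
#   etcdctl compact "$rev"                  # drop superseded revisions
#   etcdctl --command-timeout=120s defrag   # reclaim the freed pages on disk
#   etcdctl alarm disarm                    # clear NOSPACE so writes resume
#   etcdctl alarm list                      # verify no alarms remain
```

Note that compaction and defragmentation alone are not enough: as the table above shows, the NOSPACE alarms persist until they are explicitly disarmed.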
The DB size jumped to:

sh-4.4# etcdctl endpoint status -w table
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT                  | ID               | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://[2000::ace1]:2379 | 7927add86ff438d2 | 3.4.14  | 4.3 GB  | false     | false      | 17        | 191609426  | 191609426          |        |
| https://[2000::ace3]:2379 | c64a68fb43e456e8 | 3.4.14  | 4.3 GB  | true      | false      | 17        | 191609428  | 191609428          |        |
| https://[2000::ace5]:2379 | 666177c5c5c3b109 | 3.4.14  | 4.3 GB  | false     | false      | 17        | 191609431  | 191609431          |        |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

Then I edited the cluster's proxy (oc edit proxy cluster) to exclude an additional subnet from being proxied. A machineconfig update was triggered and the nodes were rebooted.
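The surge numbers quoted earlier (2k manifestwork updates per second against a 7GB `ETCD_QUOTA_BACKEND_BYTES` quota, with compaction every 10 minutes) can be sanity-checked with back-of-envelope arithmetic. Only the update rate and compaction interval come from this report; the average per-revision size below is a hypothetical assumption.

```shell
# Hypothetical back-of-envelope: how much MVCC history accumulates
# between compactions at the reported update rate. The per-revision
# size (~8 KiB) is an assumption, not a figure from the report.
updates_per_sec=2000
compaction_interval_sec=600              # compaction runs every 10 minutes
avg_bytes_per_revision=$((8 * 1024))     # assumed ~8 KiB per update
growth=$((updates_per_sec * compaction_interval_sec * avg_bytes_per_revision))
echo "~$((growth / 1024 / 1024 / 1024)) GiB of history per compaction window"
```

Under that assumption the store accumulates more than the 7GB quota inside a single 10-minute compaction window, which is consistent with the "perfect storm" described above.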
One master doesn't return:

[root@e24-h01-000-r640 f29-h01-000-r640]# oc get node
NAME                                        STATUS                        ROLES    AGE   VERSION
e24-h02-000-r640.rdu2.scalelab.redhat.com   NotReady,SchedulingDisabled   master   20d   v1.21.0-rc.0+766a5fe
e24-h03-000-r640.rdu2.scalelab.redhat.com   Ready                         worker   20d   v1.21.0-rc.0+766a5fe
e24-h05-000-r640.rdu2.scalelab.redhat.com   Ready                         master   20d   v1.21.0-rc.0+766a5fe
e24-h06-000-r640.rdu2.scalelab.redhat.com   Ready                         worker   20d   v1.21.0-rc.0+766a5fe
e24-h07-000-r640.rdu2.scalelab.redhat.com   Ready                         master   20d   v1.21.0-rc.0+766a5fe

and the issue is back:

+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------+
| ENDPOINT                  | ID               | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS                                                                                                                           |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------+
| https://[2000::ace5]:2379 | 666177c5c5c3b109 | 3.4.14  | 7.5 GB  | false     | false      | 20        | 195391046  | 195391046          | memberID:7377309355919782153 alarm:NOSPACE, memberID:14288348196171437800 alarm:NOSPACE, memberID:8730137547783813330 alarm:NOSPACE |
| https://[2000::ace1]:2379 | 7927add86ff438d2 | 3.4.14  | 7.5 GB  | false     | false      | 20        | 195391348  | 195391348          | memberID:7377309355919782153 alarm:NOSPACE, memberID:14288348196171437800 alarm:NOSPACE, memberID:8730137547783813330 alarm:NOSPACE |
| https://[2000::ace3]:2379 | c64a68fb43e456e8 | 3.4.14  | 7.5 GB  | true      | false      | 20        | 195391594  | 195391594          | memberID:8730137547783813330 alarm:NOSPACE, memberID:7377309355919782153 alarm:NOSPACE, memberID:14288348196171437800 alarm:NOSPACE |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------+

sh-4.4# etcdctl endpoint status -w json
[{"Endpoint":"https://[2000::ace1]:2379","Status":{"header":{"cluster_id":15048564470140409604,"member_id":8730137547783813330,"revision":163718978,"raft_term":20},"version":"3.4.14","dbSize":7515844608,"leader":14288348196171437800,"raftIndex":195412664,"raftTerm":20,"raftAppliedIndex":195412664,"errors":["memberID:8730137547783813330 alarm:NOSPACE ","memberID:7377309355919782153 alarm:NOSPACE ","memberID:14288348196171437800 alarm:NOSPACE "],"dbSizeInUse":7510331392}},{"Endpoint":"https://[2000::ace3]:2379","Status":{"header":{"cluster_id":15048564470140409604,"member_id":14288348196171437800,"revision":163718978,"raft_term":20},"version":"3.4.14","dbSize":7515844608,"leader":14288348196171437800,"raftIndex":195412664,"raftTerm":20,"raftAppliedIndex":195412664,"errors":["memberID:7377309355919782153 alarm:NOSPACE ","memberID:14288348196171437800 alarm:NOSPACE ","memberID:8730137547783813330 alarm:NOSPACE "],"dbSizeInUse":7510331392}},{"Endpoint":"https://[2000::ace5]:2379","Status":{"header":{"cluster_id":15048564470140409604,"member_id":7377309355919782153,"revision":163718978,"raft_term":20},"version":"3.4.14","dbSize":7516975104,"leader":14288348196171437800,"raftIndex":195412664,"raftTerm":20,"raftAppliedIndex":195412664,"errors":["memberID:7377309355919782153 alarm:NOSPACE ","memberID:14288348196171437800 alarm:NOSPACE ","memberID:8730137547783813330 alarm:NOSPACE "],"dbSizeInUse":7510298624}}]
sh-4.4#

After defragmenting the DB, I noticed that the size gets bigger without any changes on my end.
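The `dbSize` vs `dbSizeInUse` fields in the JSON above explain why defragmentation stopped helping: in the first episode only ~826MB of the 7.5GB file was live data (so defrag shrank it to 577MB), while here nearly all 7.5GB is live data. A small sketch over the two status samples captured in this report:

```shell
# Compare total file size with live data for the two episodes in this
# report. When dbSizeInUse is close to dbSize, defrag has almost nothing
# to reclaim; the growth is real keyspace data, not fragmentation.
reclaimable() {
  local db_size=$1 in_use=$2
  echo $(( (db_size - in_use) / 1024 / 1024 ))   # MB reclaimable by defrag
}

# First episode (dbSize 7475937280, dbSizeInUse 826146816): mostly
# fragmentation, so defrag recovered gigabytes.
first=$(reclaimable 7475937280 826146816)

# Second episode (dbSize 7515844608, dbSizeInUse 7510331392): almost
# nothing reclaimable, so defrag cannot help here.
second=$(reclaimable 7515844608 7510331392)

echo "episode 1: ~${first} MB reclaimable; episode 2: ~${second} MB reclaimable"
```

This distinction is what points the investigation away from etcd housekeeping and toward whatever is writing live data, i.e. the manifestwork churn.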
In ~20 minutes the size was already:

+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT                  | ID               | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://[2000::ace5]:2379 | 666177c5c5c3b109 | 3.4.14  | 7.0 GB  | false     | false      | 20        | 196212752  | 196212752          |        |
| https://[2000::ace1]:2379 | 7927add86ff438d2 | 3.4.14  | 7.1 GB  | false     | false      | 20        | 196217519  | 196217479          |        |
| https://[2000::ace3]:2379 | c64a68fb43e456e8 | 3.4.14  | 7.1 GB  | true      | false      | 20        | 196222207  | 196222170          |        |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

and the issue is back:

+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------+
| ENDPOINT                  | ID               | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS                                                                                                                           |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------+
| https://[2000::ace5]:2379 | 666177c5c5c3b109 | 3.4.14  | 7.5 GB  | false     | false      | 20        | 196469377  | 196469377          | memberID:7377309355919782153 alarm:NOSPACE, memberID:8730137547783813330 alarm:NOSPACE, memberID:14288348196171437800 alarm:NOSPACE |
| https://[2000::ace1]:2379 | 7927add86ff438d2 | 3.4.14  | 7.5 GB  | false     | false      | 20        | 196469630  | 196469630          | memberID:7377309355919782153 alarm:NOSPACE, memberID:8730137547783813330 alarm:NOSPACE, memberID:14288348196171437800 alarm:NOSPACE |
| https://[2000::ace3]:2379 | c64a68fb43e456e8 | 3.4.14  | 7.5 GB  | true      | false      | 20        | 196469893  | 196469893          | memberID:7377309355919782153 alarm:NOSPACE, memberID:8730137547783813330 alarm:NOSPACE, memberID:14288348196171437800 alarm:NOSPACE |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------+

(In reply to Sam Batschelet from comment #4)
> After review I found that a surge in manifestworks objects (over 2k per
> second) put the etcd store over the `ETCD_QUOTA_BACKEND_BYTES` quota of 7GB
> in under 10 minutes. Because many of these resources were actually mutations
> of existing keys, the mitigation for growth would be compaction. As
> compaction currently runs every 10 minutes, it was a perfect storm. While I
> think it's important to understand the huge growth of manifestworks objects
> and to improve our ability to compact the store without dependencies, I
> think this would be a very low-probability failure case, so I am marking it
> a non-blocker for 4.8. The etcd team will outline a more complete RCA soon.

I'm not sure it should be considered a blocker on the OCP 4.8 etcd side, but I still think it should be tracked as a blocker for the Telco release, specifically around the SE for VZ in 4.8.

This issue has been fixed. The root cause is that manifestworks containing two versions of the same CRD result in update conflicts and trigger extra resyncs, which creates additional requests to the apiserver. More details can be found at: https://github.com/open-cluster-management/backlog/issues/14345

Verified with ACM 2.3.0-DOWNSTREAM-2021-07-26-18-43-26.
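When triaging this kind of state growth, one way to see which resource type dominates the keyspace is to count keys per `/registry/<group>` prefix. The pipeline below is a sketch: on a live hub you would feed it `etcdctl get /registry --prefix --keys-only`, and the sample key paths used here are hypothetical stand-ins, not keys captured from this cluster.

```shell
# Group etcd keys by their /registry/<group-or-resource> component to
# see which type dominates. The sample keys are hypothetical.
keys='/registry/work.open-cluster-management.io/manifestworks/cluster1/mw-a
/registry/work.open-cluster-management.io/manifestworks/cluster2/mw-b
/registry/pods/openshift-etcd/etcd-master-0'

echo "$keys" \
  | awk -F/ 'NF > 2 { count[$3]++ } END { for (p in count) print count[p], p }' \
  | sort -rn
```

On the hub described in this report, a count like this would be expected to show the manifestwork keyspace far ahead of everything else.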
With 2k SNOs, we applied a machineconfig to make the hub restart, and we are not seeing an increase in manifestwork update requests, and the etcd DB size doesn't grow to 7.5GB. We had two machineconfig update events. During the first we did see around 300MB of etcd DB size growth, which may be related to the applied machineconfig updating the status of some of the SNOs. The second machineconfig update happened one day after the first, and we didn't see any etcd DB size growth.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat Advanced Cluster Management for Kubernetes version 2.3), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3016