Bug 2090628

Summary: sometimes a voting member is not added to the etcd-endpoints configmap
Product: OpenShift Container Platform Reporter: Lukasz Szaszkiewicz <lszaszki>
Component: Etcd Assignee: Allen Ray <alray>
Status: CLOSED DUPLICATE QA Contact: ge liu <geliu>
Severity: high Docs Contact:
Priority: high    
Version: 4.11 CC: alray, dwest, emoss, htariq, melbeher, tjungblu, wking, wlewis
Target Milestone: --- Flags: alray: needinfo-
alray: needinfo-
Target Release: 4.11.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-07-20 11:22:31 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Lukasz Szaszkiewicz 2022-05-26 07:29:55 UTC
I don't know how common this issue is. It was caught by the scaling test we added. A newly created machine was promoted to a voting member but never made it into the etcd-endpoints configmap.

Timeline:

At 19:37:48 CEO successfully promoted learner member https://10.0.0.7:2380
At ~19:42:33 the newly promoted member (ID: 9fc4382989977f7e) was elected leader at term 8
At ~19:44:22 the test deleted the machine

The machine was never actually removed because the removal controller reads its data from the etcd-endpoints configmap, which did not list the new member and therefore indicated no excess machines.
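
The mismatch described above can be illustrated with a small check that compares the voting members etcd reports against the addresses recorded in the configmap. This is only a sketch, not the operator's code: the function names, the (peer URL, learner flag) tuple shape, and the idea that the configmap reduces to a plain set of IP addresses are all assumptions made for illustration. Only the peer URL https://10.0.0.7:2380 comes from this report; the other addresses are made up.

```python
from urllib.parse import urlparse


def host_of(peer_url):
    """Return the host part of a peer URL such as https://10.0.0.7:2380."""
    return urlparse(peer_url).hostname or ""


def missing_voting_members(members, configmap_ips):
    """Return peer URLs of voting members whose IP is absent from the configmap.

    members: iterable of (peer_url, is_learner) tuples (hypothetical shape).
    configmap_ips: iterable of IP strings (assumed view of the configmap data).
    """
    known = set(configmap_ips)
    return [
        url
        for url, is_learner in members
        if not is_learner and host_of(url) not in known
    ]


# Hypothetical cluster state: three original members plus the promoted one.
members = [
    ("https://10.0.0.3:2380", False),  # made-up address
    ("https://10.0.0.4:2380", False),  # made-up address
    ("https://10.0.0.5:2380", False),  # made-up address
    ("https://10.0.0.7:2380", False),  # promoted member from this report
]
configmap_ips = ["10.0.0.3", "10.0.0.4", "10.0.0.5"]

stale = missing_voting_members(members, configmap_ips)
```

With this input, stale holds only the promoted member's peer URL. A controller that takes the configmap as its source of truth would not see that member at all, which matches the behavior reported here.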


Link to CI run: https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.11-e2e-gcp-fips-serial/1527713140295340032

Comment 6 Allen Ray 2022-06-22 13:52:44 UTC
After discussing with @tjungblu and @htariq, we decided that this shouldn't be a blocker+ because it isn't perma-failing, currently has no reproducer, and we haven't heard anything about it from the field.