Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1614479

Summary: Stopping master host causes another master controller to restart
Product: OpenShift Container Platform Reporter: Vikas Laad <vlaad>
Component: kube-apiserverAssignee: Stefan Schimanski <sttts>
Status: CLOSED WONTFIX QA Contact: Xingxing Xia <xxia>
Severity: low Docs Contact:
Priority: low    
Version: 3.11.0CC: aos-bugs, jokerman, mfojtik, mifiedle, mmccomas, yinzhou
Target Milestone: ---Keywords: NeedsTestCase
Target Release: 3.11.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-05-05 15:29:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
master controller logs none

Description Vikas Laad 2018-08-09 17:20:52 UTC
Description of problem:
On a HA cluster if I stop one of the master instances sometimes it causes other master controller pod to restart. I think it happens because master api on that node also goes down and it restarts the other controller due to that. I think there should be retry so that it connects to other api and should not restart. Here is the error in controller log.

E0809 15:38:23.099658       1 controllermanager.go:480] Error starting "resourcequota"
F0809 15:38:23.099677       1 controllermanager.go:185] error starting controllers: failed to discover resources: unable to retrieve the complete list of server APIs: servicecatalog.k8s.io/v1beta1: the server is currently unable to handle the request


Version-Release number of selected component (if applicable):
openshift v3.11.0-0.11.0

Steps to Reproduce:
1. Create HA cluster with 3 masters
2. Shutdown/Stop of the master host
3. watch other pods in kube-system project

Actual results:
One of the other 2 masters also get restarted.

Expected results:
No other master should restart.

Additional info:
Attaching full logs from the master controller container which restarted.

Comment 1 Vikas Laad 2018-08-09 17:23:01 UTC
Created attachment 1474781 [details]
master controller logs

Comment 2 zhou ying 2018-08-10 09:23:55 UTC
I could not reproduce the issue with my HA env on aws. 

But, We have met the same error when tested the destructive case with low memory instance. After change to large instance , then restart the master works.