Bug 1732583

Summary:	bootkube.sh is not re-entrant
Product:	OpenShift Container Platform	Reporter:	Eric Rich <erich>
Component:	Installer	Assignee:	Abhinav Dahiya <adahiya>
Installer sub component:	openshift-installer	QA Contact:	Johnny Liu <jialiu>
Status:	CLOSED NOTABUG	Docs Contact:
Severity:	high
Priority:	low	CC:	mat.sylvia, sdodson
Version:	4.1.z
Target Milestone:	---
Target Release:	4.3.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2019-09-30 17:20:49 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Eric Rich 2019-07-23 19:27:29 UTC

Description of problem: bootkube.sh is not re-entrant (and will not run properly) if it was stopped midway through running. 

> Jul 23 18:02:39 bootstrap systemd[1]: Stopped Bootstrap a Kubernetes cluster.
> Jul 23 18:02:46 bootstrap systemd[1]: Started Bootstrap a Kubernetes cluster.
> Jul 23 18:03:01 bootstrap bootkube.sh[8114]: Starting etcd certificate signer...
> Jul 23 18:03:02 bootstrap bootkube.sh[8114]: error creating container storage: the container name "etcd-signer" is already in use by "01b3ece5e73cee6a197fdc641cd362a05e40c26611a9b2a230700ab26e614af1". You have to remove that container to be able to reuse that name.: that name is already in use
> Jul 23 18:03:02 bootstrap bootkube.sh[8114]: 01b3ece5e73cee6a197fdc641cd362a05e40c26611a9b2a230700ab26e614af1
> Jul 23 18:03:02 bootstrap systemd[1]: bootkube.service: Main process exited, code=exited, status=125/n/a
> Jul 23 18:03:02 bootstrap systemd[1]: bootkube.service: Failed with result 'exit-code'.

Version-Release number of selected component (if applicable): 4.1.x 


How reproducible: 100% 


Steps to Reproduce:
1. Start UPI install 
2. SSH to bootstrap system
3. sudo systemctl stop bootkube.service 
4. ### Debug install issue
5. sudo systemctl start bootkube.service

Actual results: See error above. 

Expected results:

When started, the bootkube.sh service should either clean up any previous invocation and restart, or skip the steps it has already completed (which it does in some cases).

Alternatively, a bootkube_cleanup.sh script should be provided to run between stopping and starting the service, and the user should be told about it if bootkube.sh fails.
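
As a rough illustration of the cleanup step described above, here is a minimal sketch. It assumes the leftover container was created with podman, as the "error creating container storage" message suggests. The function name remove_stale_container and the RUNTIME variable are hypothetical and are not part of bootkube.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of bootkube.sh as shipped.
# RUNTIME is an assumed variable so the runtime can be swapped out; it defaults to podman.
RUNTIME="${RUNTIME:-podman}"

# Remove a leftover container by name, if one exists, so that a later
# "run --name NAME" does not fail with "that name is already in use".
remove_stale_container() {
    local name="$1"
    # "podman container exists" exits 0 only when the container is present.
    if "$RUNTIME" container exists "$name"; then
        echo "Removing stale container ${name}"
        "$RUNTIME" rm --force "$name" >/dev/null
    fi
}
```

With this sourced on the bootstrap host (as root), running `remove_stale_container etcd-signer` between step 3 and step 5 would clear the name that the log above complains about. Other containers left behind by an interrupted run may need the same treatment.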

Additional info:

Comment 1 Abhinav Dahiya 2019-07-25 23:05:49 UTC

If the user wants to configure the bootstrap host, the recommended mechanism is through Ignition. But I think we will try to make bootkube re-entrant. Updating the priority because a better method is available.

Comment 2 Eric Rich 2019-07-26 01:16:02 UTC

The reason for this needing to be re-entrant is not to make or adjust configurations. 

Oftentimes it's to complete or continue a failed install, for example one caused by a misconfigured load balancer.

Pods can get out of sync, and having a way to restart or continue the install process, rather than having to redeploy a system, can save man-hours. Given that the installer only gives you ~24 hours to complete the install, the time pressure here can be too much for some customers.

Comment 3 Scott Dodson 2019-09-30 17:20:49 UTC

bootkube.sh retries over and over, but this appears to require a change to the inputs. Without a clear use case, closing this.

Comment 4 MJS 2020-05-07 17:40:23 UTC

Hello, I am using openshift-installer v4.3.12 and encountering this issue. I also tried v4.3.18 of the openshift-installer, but the error persists: my "etcd-signer" container already exists by name, which causes the bootkube.sh service to crash (and loop).

I am unsure how to get around this problem other than destroying my bootstrap machine and all of my masters/workers and rebuilding everything, which would be incredibly time-consuming.

I am installing OS v4.3.8 on bare-metal UPI. I tried several times to delete my installation directory and make fresh attempts with both the v4.3.12 and v4.3.18 installers.

Comment 5 Scott Dodson 2020-05-14 15:11:33 UTC

Please open a new bug with full details of what you're seeing, including the log bundle from `openshift-install gather bootstrap`.