Bug 1732583 - bootkube.sh is not re-entrant
Summary: bootkube.sh is not re-entrant
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.1.z
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: high
Target Milestone: ---
Target Release: 4.3.0
Assignee: Abhinav Dahiya
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-07-23 19:27 UTC by Eric Rich
Modified: 2020-05-14 15:11 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-09-30 17:20:49 UTC
Target Upstream Version:
Embargoed:



Description Eric Rich 2019-07-23 19:27:29 UTC
Description of problem: bootkube.sh is not re-entrant (and will not run properly) if it was stopped midway through a run.

> Jul 23 18:02:39 bootstrap systemd[1]: Stopped Bootstrap a Kubernetes cluster.
> Jul 23 18:02:46 bootstrap systemd[1]: Started Bootstrap a Kubernetes cluster.
> Jul 23 18:03:01 bootstrap bootkube.sh[8114]: Starting etcd certificate signer...
> Jul 23 18:03:02 bootstrap bootkube.sh[8114]: error creating container storage: the container name "etcd-signer" is already in use by "01b3ece5e73cee6a197fdc641cd362a05e40c26611a9b2a230700ab26e614af1". You have to remove that container to be able to reuse that name.: that name is already in use
> Jul 23 18:03:02 bootstrap bootkube.sh[8114]: 01b3ece5e73cee6a197fdc641cd362a05e40c26611a9b2a230700ab26e614af1
> Jul 23 18:03:02 bootstrap systemd[1]: bootkube.service: Main process exited, code=exited, status=125/n/a
> Jul 23 18:03:02 bootstrap systemd[1]: bootkube.service: Failed with result 'exit-code'.

Version-Release number of selected component (if applicable): 4.1.x 


How reproducible: 100% 


Steps to Reproduce:
1. Start UPI install 
2. SSH to bootstrap system
3. sudo systemctl stop bootkube.service 
4. ### Debug install issue
5. sudo systemctl start bootkube.service

Actual results: See error above. 

Expected results:

When started, the bootkube.sh service should either clean up any previous invocation and restart, or bypass the things it has already done (which it does in some cases).
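The "bypass things it has already done" behavior could look like the following sketch (a hypothetical helper, not the actual bootkube.sh code; the `step` function name and marker-file location are assumptions):

```shell
#!/usr/bin/env bash
# Hedged sketch: guard each bootstrap step with a marker file so that a
# restarted run skips steps a previous invocation already completed.

MARKER_DIR="${MARKER_DIR:-/opt/openshift}"  # assumed location, overridable

step() {
    local marker="${MARKER_DIR}/${1}.done"
    shift
    # A previous invocation already finished this step: skip it.
    if [ -f "$marker" ]; then
        return 0
    fi
    # Run the step and record success so the next invocation is a no-op.
    "$@" && touch "$marker"
}
```

For example, `step etcd-signer start_etcd_signer` would run the (hypothetical) `start_etcd_signer` command at most once across service restarts.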

Alternatively, a bootkube_cleanup.sh script should be provided to run between the stop and the start (and its existence should be communicated to the user should bootkube.sh fail).
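Such a cleanup script might look like this sketch (hypothetical, not an official installer script; the list of container names is an assumption, seeded from the "etcd-signer" name in the log above):

```shell
#!/usr/bin/env bash
# Hedged sketch of a bootkube_cleanup.sh: remove leftover fixed-name
# containers so a restarted bootkube.service does not fail with the
# "name is already in use" error shown in the report.

# Containers bootkube.sh creates by fixed name; extend as needed.
containers=(etcd-signer)

for name in "${containers[@]}"; do
    # `podman rm -f` removes the container whether or not it is running;
    # ignore the error when no such container exists, so the script is
    # also safe to run after a clean stop.
    podman rm -f "$name" 2>/dev/null || true
done
```

It would be run between `sudo systemctl stop bootkube.service` and `sudo systemctl start bootkube.service`.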

Additional info:

Comment 1 Abhinav Dahiya 2019-07-25 23:05:49 UTC
If the user wants to configure the bootstrap host, the recommended mechanism is through Ignition. But I think we will try to make bootkube re-entrant. Updating priority because a better method is available.

Comment 2 Eric Rich 2019-07-26 01:16:02 UTC
The reason for this needing to be re-entrant is not to make or adjust configurations. 

Often it is to complete or continue a failed install, for example one caused by a misconfigured load balancer.

Pods will/can get out of sync, and having a way to restart or continue the install process, rather than having to redeploy the system, can save many hours. Given that the installer only gives you ~24 hours to complete an install, the time pressure can be too much for some customers.

Comment 3 Scott Dodson 2019-09-30 17:20:49 UTC
bootkube.sh retries over and over, but this appears to require a change to the inputs. Without a clear use case, closing this.

Comment 4 MJS 2020-05-07 17:40:23 UTC
Hello, I am using openshift-installer v4.3.12 and encountering this issue. I also tried v4.3.18 of the openshift-installer, but the error persists: my "etcd-signer" container already exists by name, which causes the bootkube.sh service to crash (and loop).

I am unsure how to get around this problem, other than destroying my bootstrap machine and all of my masters/workers and totally rebuilding, which would be incredibly time-consuming.

I am installing OS v4.3.8 on bare-metal UPI. I tried several times to delete my installation directory and make fresh attempts with both the v4.3.12 and v4.3.18 installers.

Comment 5 Scott Dodson 2020-05-14 15:11:33 UTC
Please open a new bug with full details of what you're seeing including the log bundle from `openshift-install gather bootstrap`

