Bug 1732583

Summary: bootkube.sh is not re-entrant
Product: OpenShift Container Platform Reporter: Eric Rich <erich>
Component: Installer Assignee: Abhinav Dahiya <adahiya>
Installer sub component: openshift-installer QA Contact: Johnny Liu <jialiu>
Status: CLOSED NOTABUG Docs Contact:
Severity: high    
Priority: low CC: mat.sylvia, sdodson
Version: 4.1.z   
Target Milestone: ---   
Target Release: 4.3.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2019-09-30 17:20:49 UTC Type: Bug

Description Eric Rich 2019-07-23 19:27:29 UTC
Description of problem: bootkube.sh is not re-entrant (and will not run properly) if it was stopped midway through running. 

> Jul 23 18:02:39 bootstrap systemd[1]: Stopped Bootstrap a Kubernetes cluster.
> Jul 23 18:02:46 bootstrap systemd[1]: Started Bootstrap a Kubernetes cluster.
> Jul 23 18:03:01 bootstrap bootkube.sh[8114]: Starting etcd certificate signer...
> Jul 23 18:03:02 bootstrap bootkube.sh[8114]: error creating container storage: the container name "etcd-signer" is already in use by "01b3ece5e73cee6a197fdc641cd362a05e40c26611a9b2a230700ab26e614af1". You have to remove that container to be able to reuse that name.: that name is already in use
> Jul 23 18:03:02 bootstrap bootkube.sh[8114]: 01b3ece5e73cee6a197fdc641cd362a05e40c26611a9b2a230700ab26e614af1
> Jul 23 18:03:02 bootstrap systemd[1]: bootkube.service: Main process exited, code=exited, status=125/n/a
> Jul 23 18:03:02 bootstrap systemd[1]: bootkube.service: Failed with result 'exit-code'.

Version-Release number of selected component (if applicable): 4.1.x 


How reproducible: 100% 


Steps to Reproduce:
1. Start UPI install 
2. SSH to bootstrap system
3. sudo systemctl stop bootkube.service 
4. ### Debug install issue
5. sudo systemctl start bootkube.service

Actual results: See error above. 

Expected results:

When started, the bootkube.sh service should either clean up any previous invocation and restart, or skip the steps it has already completed (which it does in some cases).

Alternatively, a bootkube_cleanup.sh script should be provided to run between stop and start, and the user should be pointed to it if bootkube.sh fails.
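A minimal sketch of what such a cleanup step could look like, assuming the bootstrap host runs its containers with podman and that the only leftover state is the named container from the error above. The function name `remove_stale_container` and the `PODMAN` override variable are hypothetical and not part of the installer; removing this one container may not be enough to make bootkube.sh fully re-entrant.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with the installer: remove a leftover
# named container so that a restarted script can reuse the name.
# PODMAN may be overridden (for example with a stub) when testing.
PODMAN="${PODMAN:-podman}"

remove_stale_container() {
    local name="$1"
    # `podman container exists` exits 0 when a container with this name exists.
    if "$PODMAN" container exists "$name"; then
        echo "Removing stale container: $name"
        "$PODMAN" rm --force "$name"
    else
        echo "No stale container named $name"
    fi
}
```

Under these assumptions, one would call `remove_stale_container etcd-signer` as root between `systemctl stop bootkube.service` and `systemctl start bootkube.service`.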

Additional info:

Comment 1 Abhinav Dahiya 2019-07-25 23:05:49 UTC
If the user wants to configure the bootstrap host, the recommended mechanism is through Ignition. But I think we will try to make bootkube re-entrant. Updating priority because a better method is available.

Comment 2 Eric Rich 2019-07-26 01:16:02 UTC
The reason for this needing to be re-entrant is not to make or adjust configurations. 

Often it is to complete or continue a failed install, for example after fixing a misconfigured load balancer.

Pods can get out of sync, and being able to restart or continue the install process, rather than having to re-deploy a system, can save man-hours. Given that the installer only gives you ~24 hrs to complete the install, the time pressure can be too much for some customers.

Comment 3 Scott Dodson 2019-09-30 17:20:49 UTC
bootkube.sh retries over and over, but this appears to require a change to the inputs. Without a clear use case, closing this.

Comment 4 MJS 2020-05-07 17:40:23 UTC
Hello, I am using openshift-installer v4.3.12 and encountering this issue. I also tried v4.3.18 of openshift-installer, but the error persists: my "etcd-signer" container already exists by name, which causes the bootkube.sh service to crash (and loop).

I am unsure how to get around this problem other than destroying my bootstrap machine and all of my masters/workers and rebuilding everything, which would be incredibly time-consuming.

I am installing OS v4.3.8 on bare-metal UPI. I tried several times to delete my installation directory and make fresh attempts with both the v4.3.12 and v4.3.18 installers.

Comment 5 Scott Dodson 2020-05-14 15:11:33 UTC
Please open a new bug with full details of what you're seeing, including the log bundle from `openshift-install gather bootstrap`.