Bug 1731441

Summary: [DOCS] Openshift on AWS: Shutdown of running cluster and start after 1-2 days not working
Product: OpenShift Container Platform
Reporter: szustkowski
Component: Documentation
Assignee: Kathryn Alexander <kalexand>
Status: CLOSED CURRENTRELEASE
QA Contact: Gaoyun Pei <gpei>
Severity: low
Docs Contact: Vikram Goyal <vigoyal>
Priority: low
Version: 4.1.z
CC: aos-bugs, clasohm, jokerman, mfojtik, mjahangi, mmccomas, rhowe, sjenning, sttts, wsun
Target Milestone: ---
Keywords: Reopened
Target Release: 4.1.z
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-11-01 15:01:39 UTC
Type: Bug

Description szustkowski 2019-07-19 12:28:01 UTC
Description of problem:

A running OpenShift 4.1 cluster on AWS whose EC2 instances are stopped to save money during a test phase does not come back up when the instances are started again after 1-2 days.

Version-Release number of the following components:

./openshift-install v4.1.6-201907101224-dirty
built from commit e8e6d8998bed2087244a14be16185235b43d6407
release image quay.io/openshift-release-dev/ocp-release@sha256:aa955a9ec40e55e5d9c0203a995b398e8c1031473dae24ed405efe9a95b43186


How reproducible:
Reproduced at least twice

Steps to Reproduce:
1. Install OpenShift 4.1 on installer-provisioned AWS, without an installer-config file
2. Stop all EC2 instances
3. Start them again immediately
4. Try to access the cluster: It works
5. Stop them again
6. Wait for 2 days
7. Start them again
8. Give them enough time to boot up: Around 30 minutes
9. Try to access the cluster: Doesn't work anymore
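
Steps 2 through 7 can be scripted. The following is only a minimal sketch and is not part of the original report. It assumes a boto3 EC2 client is passed in as `ec2`, and it assumes the cluster's instances carry the tag key `kubernetes.io/cluster/<infra-id>`. The helper names and the infra ID value are hypothetical.

```python
# States in which an instance can still be stopped or started.
# Terminated instances are excluded because they cannot be restarted.
LIVE_STATES = ["pending", "running", "stopping", "stopped"]


def cluster_instance_ids(ec2, infra_id):
    """Return the IDs of the EC2 instances that belong to the cluster.

    Assumption: every cluster instance has the tag key
    kubernetes.io/cluster/<infra_id>. Pagination is ignored, which is
    only reasonable for a small test cluster.
    """
    response = ec2.describe_instances(
        Filters=[
            {"Name": "tag-key", "Values": [f"kubernetes.io/cluster/{infra_id}"]},
            {"Name": "instance-state-name", "Values": LIVE_STATES},
        ]
    )
    return [
        instance["InstanceId"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    ]


def stop_cluster(ec2, infra_id):
    """Stop all cluster instances and return their IDs."""
    ids = cluster_instance_ids(ec2, infra_id)
    if ids:
        ec2.stop_instances(InstanceIds=ids)
    return ids


def start_cluster(ec2, infra_id):
    """Start all cluster instances and return their IDs."""
    ids = cluster_instance_ids(ec2, infra_id)
    if ids:
        ec2.start_instances(InstanceIds=ids)
    return ids
```

With boto3 installed and AWS credentials configured, a call might look like `stop_cluster(boto3.client("ec2"), "mycluster-x7k2p")`, where the infra ID shown is a made-up placeholder.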

Actual results:
Opening the cluster web console in a browser fails: Chrome shows ERR_CONNECTION_CLOSED

Expected results: The cluster is fully operational and the web console is accessible.

Additional info:
Possibly related to https://github.com/openshift/installer/issues/818. It may be an AWS infrastructure issue, though. Even so, the openshift-installer should set up the cluster in such a way that it survives restarts, even if it is installed by an AWS noob.

Comment 3 Seth Jennings 2019-07-29 19:18:03 UTC
This is a documented requirement of the new product architecture: you must keep the cluster running for at least 24 hours after installation so that the components can rotate to their non-installation certificates. See:
https://docs.openshift.com/container-platform/4.1/installing/installing_bare_metal/installing-bare-metal.html#installation-generate-ignition-configs_installing-bare-metal
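
For anyone scripting cluster shutdowns, the 24-hour requirement can be expressed as a simple guard. This is only a sketch under stated assumptions: the function name, the `MIN_UPTIME` constant, and the idea of recording the installation completion time yourself are illustrative and not part of OpenShift or its installer. It also assumes the cluster has run continuously since installation.

```python
from datetime import datetime, timedelta, timezone

# Minimum uptime after installation, per the requirement described above.
MIN_UPTIME = timedelta(hours=24)


def safe_to_stop(install_completed_at, now=None):
    """Return True once at least MIN_UPTIME has passed since installation.

    Both timestamps must be timezone-aware. This checks elapsed time only
    and assumes the cluster ran without interruption; it does not verify
    that the certificates were actually rotated.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - install_completed_at >= MIN_UPTIME


# Example with a hypothetical installation time.
installed = datetime(2019, 7, 17, 12, 0, tzinfo=timezone.utc)
print(safe_to_stop(installed, now=installed + timedelta(hours=2)))   # False
print(safe_to_stop(installed, now=installed + timedelta(hours=24)))  # True
```

A shutdown script could call `safe_to_stop` first and refuse to stop the instances while it returns False.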

Comment 4 Ryan Howe 2019-07-30 20:24:39 UTC
We need the same note for the AWS IPI installation docs, stating that the cluster must stay up for 24 hours after installation.

Comment 5 Ryan Howe 2019-07-30 20:27:12 UTC
This warning needs to be added for every type of installation:
  
    https://docs.openshift.com/container-platform/4.1/installing/*

Comment 7 Kathryn Alexander 2019-10-17 16:02:50 UTC
The PR to add the note to all of the installation assemblies is here: https://github.com/openshift/openshift-docs/pull/17424

@Gaoyun Pei, will you PTAL?

Comment 10 Gaoyun Pei 2019-10-21 08:24:47 UTC
PR https://github.com/openshift/openshift-docs/pull/17424 lgtm.

Comment 12 Kathryn Alexander 2019-10-21 13:53:30 UTC
I've merged the change and am waiting for it to go live.