Bug 1279546 - [DOCS] Recovering after a failed install steps
Summary: [DOCS] Recovering after a failed install steps
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Documentation
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Ashley Hardin
QA Contact: Ma xiaoqiang
Docs Contact: Vikram Goyal
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-11-09 17:12 UTC by Ryan Howe
Modified: 2016-07-04 00:46 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-06-17 03:03:26 UTC
Target Upstream Version:
Embargoed:


Description Ryan Howe 2015-11-09 17:12:54 UTC
Document URL: https://docs.openshift.com/enterprise/3.0/install_config/install/advanced_install.html


Section Number and Name: Quick and advanced installer

Describe the issue: 
- When the install fails, nowhere in the docs does it say what should be done to remove the components that were configured and start over. The Known Issues section only vaguely covers some steps for removing the components.

Suggestions for improvement: 
- Move the Known Issues information into its own section on recovering from a failed install.
- Can we get steps that outline recovering from a failed HA install and a failed regular install?
- Can we get information on rerunning the Ansible installer: what is changed when this is done, and which steps are skipped for the most part?


Additional information: 

Maybe add where the install logs are stored, and how to make Ansible more verbose to troubleshoot where the installer is failing.

Comment 3 Ashley Hardin 2016-05-31 20:50:53 UTC
@Jason:
This BZ was filed a while ago, for 3.0, and some parts have since been updated. For example, the last line in this section points users to the Known Issues section for specific workarounds:

https://docs.openshift.com/enterprise/latest/install_config/install/advanced_install.html#running-the-advanced-installation

The Known Issues section has some specifics about multiple master setups.

Should I add a line stating that the installer is safe to re-run if it fails? What do you think? Are you able to offer guidance, or point me in the right direction? Thanks!

Comment 4 Jason DeTiberus 2016-06-06 16:43:42 UTC
So, multiple-master setups have been greatly simplified since 3.0. There are still issues if certificates have already been generated during the install. Tagging in Andrew to provide further info.

Comment 5 Andrew Butcher 2016-06-06 17:24:15 UTC
In most cases it's safe to re-run the installer but we can't guarantee that will work since certificates may be out of sync (depending on how the install failed). The best guidance is probably to run the uninstall playbook and retry.

@Jason, this lines up with the questions you had last week about how we're structuring the uninstall playbook. Thoughts?

Comment 6 Jason DeTiberus 2016-06-09 14:29:32 UTC
Ashley, Andrew: I think if we can document which variables impact certificate creation/validity, then we can tell users that it is safe to just re-run the installer/playbooks *if* those variables weren't changed to address failures in the installer. If they did change one of the listed variables, then we could tell them to run the uninstall playbook before re-running the install.

That said, maybe it is easier to just say run the uninstall playbook every time.

Of course, all of this becomes moot once we have the certificate regeneration in place.

Comment 7 Ashley Hardin 2016-06-13 14:39:56 UTC
Andrew- what are your thoughts? I think you had mentioned to me last week that you had concerns about advertising the uninstall playbook as a failed install recovery tool. Is that correct?

Comment 8 Andrew Butcher 2016-06-13 14:47:37 UTC
Correct. There are issues with the uninstall playbook that we would need to resolve before documenting it as a failed install recovery method.

Comment 9 Vikram Goyal 2016-06-15 03:18:17 UTC
(In reply to Andrew Butcher from comment #8)
> Correct. There are issues with the uninstall playbook that we would need to
> resolve before documenting it as failed install recovery method.

Andrew - in that case, I would suggest that we close this bug as CANTFIX or INSUFFICIENT_DATA? Please let me know.

Comment 10 Vikram Goyal 2016-06-17 03:03:26 UTC
Andrew Butcher confirmed over email that this can be CLOSED.


Note You need to log in before you can comment on or make changes to this bug.