Red Hat Bugzilla – Bug 1279546
[DOCS] Recovering after a failed install steps
Last modified: 2016-07-03 20:46:31 EDT
Document URL: https://docs.openshift.com/enterprise/3.0/install_config/install/advanced_install.html
Section Number and Name: quick and advanced installer
Describe the issue:
- When the install fails, nowhere in the docs does it say what should be done to remove the components that were configured and start over. The Known Issues section only vaguely covers some steps to remove the components.
Suggestions for improvement:
- Move the Known Issues information into its own section on recovering from a failed install.
- Can we get steps that outline recovering from a failed HA install and a failed regular install?
- Can we get information on re-running the Ansible installer: what is changed when this is done, and which steps are skipped?
Maybe add where the logs get stored for the install, and how to make ansible more verbose to troubleshoot where the installer is failing.
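As a rough illustration of that last request, the sketch below builds a more verbose `ansible-playbook` invocation and points Ansible's log at a file. The `-v` through `-vvvv` flags and the `ANSIBLE_LOG_PATH` environment variable are standard Ansible features; the inventory path, the playbook path, and the log location are placeholders assumed for this example, not values taken from the installer docs.

```python
import os
import shlex


def verbose_install_command(playbook, inventory="/etc/ansible/hosts", verbosity=3):
    """Return the argv for an ansible-playbook run with extra verbosity.

    The inventory default and the playbook argument are assumptions; pass
    whatever paths your installation actually uses.
    """
    if not 1 <= verbosity <= 4:
        raise ValueError("verbosity must be between 1 and 4")
    # "-v" repeated N times raises ansible-playbook's output detail.
    return ["ansible-playbook", "-i", inventory, "-" + "v" * verbosity, playbook]


def logging_env(log_path, base_env=None):
    """Return a copy of the environment with ANSIBLE_LOG_PATH set.

    Ansible appends its run output to the file named by ANSIBLE_LOG_PATH.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ANSIBLE_LOG_PATH"] = log_path
    return env


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    argv = verbose_install_command("path/to/install-playbook.yml")
    env = logging_env("/tmp/openshift-install.log")
    print("ANSIBLE_LOG_PATH=" + shlex.quote(env["ANSIBLE_LOG_PATH"]),
          " ".join(shlex.quote(a) for a in argv))
```

The script only prints the command; to actually run it, pass `argv` and `env` to `subprocess.run(argv, env=env)`.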
This BZ was filed a while ago, for 3.0, and some bits have since been updated. For example, the last line in this section points users to the Known Issues section for specific workarounds:
The Known Issues section has some specifics about multiple master setups.
Should I add a line stating that the installer is safe to re-run if it fails? What do you think? Are you able to offer guidance, or point me in the right direction? Thanks!
So, multiple-master setup has been greatly simplified since 3.0. There are still issues if certificates have already been generated during the install. Tagging in Andrew to provide further info.
In most cases it's safe to re-run the installer but we can't guarantee that will work since certificates may be out of sync (depending on how the install failed). The best guidance is probably to run the uninstall playbook and retry.
@Jason, this lines up with the questions you had last week about how we're structuring the uninstall playbook. Thoughts?
Ashley, Andrew: I think if we can document which variables impact certificate creation/validity, then we can tell users that it is safe to just re-run the installer/playbooks *if* those variables weren't changed to address failures in the installer. If they did change one of the listed variables, then we could tell them to run the uninstall playbook before re-running the install.
That said, maybe it is easier to just say run the uninstall playbook every time.
Of course, all of this becomes moot once we have the certificate regeneration in place.
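The rule proposed in this comment can be sketched as a small decision function. This is only an illustration of the logic: the thread never lists the variables that affect certificates, so the names in `CERT_AFFECTING_VARS` are hypothetical placeholders, and the step names stand in for whichever playbooks are actually used.

```python
# Hypothetical placeholder names; the real list of certificate-affecting
# variables is not given anywhere in this bug.
CERT_AFFECTING_VARS = frozenset({
    "example_master_hostname_var",
    "example_public_hostname_var",
})


def recovery_plan(changed_vars, cert_vars=CERT_AFFECTING_VARS):
    """Return the ordered steps to take after a failed install.

    If any variable changed since the failed run affects certificates,
    uninstall first; otherwise just re-run the installer.
    """
    if set(changed_vars) & set(cert_vars):
        return ["uninstall", "install"]
    return ["install"]


if __name__ == "__main__":
    print(recovery_plan(["some_unrelated_var"]))
    print(recovery_plan(["example_master_hostname_var"]))
```

The simpler alternative mentioned above, running the uninstall playbook every time, corresponds to always returning `["uninstall", "install"]`.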
Andrew, what are your thoughts? I think you mentioned to me last week that you had concerns about advertising the uninstall playbook as a failed-install recovery tool. Is that correct?
Correct. There are issues with the uninstall playbook that we would need to resolve before documenting it as a failed-install recovery method.
(In reply to Andrew Butcher from comment #8)
> Correct. There are issues with the uninstall playbook that we would need to
> resolve before documenting it as a failed-install recovery method.
Andrew, in that case, should we close this bug as CANTFIX or INSUFFICIENT_DATA? Please let me know.
Confirmed by Andrew Butcher over email that this can be CLOSED.