Bug 1373660 - [DOCS] [Supportability] No diagnostics documentation
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Documentation
Version: 3.2.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Ashley Hardin
QA Contact: Vikram Goyal
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-09-06 21:55 UTC by Eric Rich
Modified: 2016-10-18 15:14 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-18 15:14:55 UTC
Target Upstream Version:



Description Eric Rich 2016-09-06 21:55:42 UTC
Document URL: Need New Topic

Section Number and Name: Need New Topic

Describe the issue: 

The release notes for two different releases mention a diagnostics tool:

https://docs.openshift.com/enterprise/3.1/release_notes/ose_3_1_release_notes.html#ose-3-1-1-enhancements
https://docs.openshift.com/enterprise/3.2/release_notes/ose_3_2_release_notes.html#ose-32-configuration-and-administration

Suggestions for improvement: 

However, we do not document how to use it, what it tests, or what it helps a user validate.

Additional information:

Comment 5 Luke Meyer 2016-09-20 21:02:28 UTC
> -What it is used for

Provides diagnostics to verify that OpenShift hosts and systems are working as intended.

> -What it tests and/or validates

* Verify the default registry and router are running and correctly configured.
* Check ClusterRoleBindings and ClusterRoles for consistency with base policy.
* Check that all of the client configuration contexts are valid and can be connected to.
* Check that SkyDNS is working properly and pods have SDN connectivity.
* Validate master and node configuration on the host.
* Check that nodes are running and available.
* Analyze host logs for known errors.
* Check that systemd units are configured as expected for the host.

If you want to just summarize that feel free. Hopefully it will see expansion in the future.

> -How to use it

Run, preferably on a master host and as cluster-admin:

`oc adm diagnostics`

This runs all available diagnostics, skipping any that do not apply (for instance, NodeConfigCheck does not run unless a node config is available). After that, you can run specific diagnostics by name as you work to address issues:

`oc adm diagnostics NodeConfigCheck UnitStatus`
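
The two invocations above could be wrapped in a small helper, sketched here under some assumptions: `run_diagnostics` and the `DRY_RUN` variable are hypothetical names made up for illustration, not part of `oc`. The only real command used is `oc adm diagnostics`, optionally followed by diagnostic names, as described above.

```shell
# Hypothetical helper (not part of oc): runs "oc adm diagnostics",
# passing through any diagnostic names given as arguments.
# With DRY_RUN=1 (an assumed convention for this sketch) it only
# prints the command line instead of executing it.
run_diagnostics() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # ${*:+ $*} expands to " <args>" only when arguments were given.
        printf '%s\n' "oc adm diagnostics${*:+ $*}"
    else
        # No arguments: run every applicable diagnostic.
        # With arguments: run only the named diagnostics.
        oc adm diagnostics "$@"
    fi
}
```

Calling `run_diagnostics` with no arguments corresponds to the full run, and `run_diagnostics NodeConfigCheck UnitStatus` corresponds to the targeted re-run. Setting `DRY_RUN=1` first is a way to check the command line on a machine where `oc` is not installed.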

Comment 6 Ashley Hardin 2016-09-21 16:55:10 UTC
Work in progress: https://github.com/openshift/openshift-docs/pull/2885

Comment 7 Eric Rich 2016-09-21 17:22:59 UTC
(In reply to Luke Meyer from comment #5)

The details provided for this request are lacking, mainly in the following section:

> > -What it tests and/or validates
> 
> * Verify the default registry and router are running and correctly
> configured.
> * Check ClusterRoleBindings and ClusterRoles for consistency with base
> policy.
> * Check that all of the client configuration contexts are valid and can be
> connected to.
> * Check that SkyDNS is working properly and pods have SDN connectivity.
> * Validate master and node configuration on the host.
> * Check that nodes are running and available.
> * Analyze host logs for known errors.
> * Check that systemd units are configured as expected for the host.

What are some error conditions (failures) that this tool can catch or help spot, and what are some potential resolutions (is there a way to link to https://access.redhat.com/node/1599603 or other articles)? 

I wonder how we can tie these "docs" to https://bugzilla.redhat.com/show_bug.cgi?id=1259118 and the articles we have on troubleshooting.

Comment 8 Luke Meyer 2016-09-21 18:20:50 UTC
(In reply to Eric Rich from comment #7)

> What are some error conditions (failures) that this tool can catch or help
> spot, and what are some potential resolutions (is there a way to link to
> https://access.redhat.com/node/1599603 or other articles)?

The idea is to check for mistakes people have made in the past and give at least some idea what is going on, or even how to solve it. I would not say it does a great job of that yet, but it hasn't received much priority, so you guys could help by pressing for things your customers have done that you'd like a diagnostic to detect. They could certainly refer to articles or docs for resolution, as long as the URLs are stable and preferably public (so Origin users don't get sent to a solution they can't read - or maybe you want that).

How specific do you want the docs to be about what this checks?

Comment 9 Ashley Hardin 2016-10-06 18:48:43 UTC
Luke and Eric,
Is it safe to say that more development needs to be done on the tool before we can answer all of Eric's questions? Should I publish this PR as a first pass, so that we can beef up the content later?
https://github.com/openshift/openshift-docs/pull/2885
Let me know what you think. Thanks!

Comment 10 Luke Meyer 2016-10-12 19:27:09 UTC
I think it's fair as a first pass.

Comment 11 openshift-github-bot 2016-10-12 19:31:55 UTC
Commits pushed to master at https://github.com/openshift/openshift-docs

https://github.com/openshift/openshift-docs/commit/0208bd1ae08c55b0cc7d64311b06d81995f52def
Bug 1373660, added information about the diagnostics tool

https://github.com/openshift/openshift-docs/commit/13d28156ca6afdf4e45d385387c3d68ea5d4fc3d
Merge pull request #2885 from ahardin-rh/diagnostics-tool

Bug 1373660, added information about the diagnostics tool

Comment 12 Ashley Hardin 2016-10-12 19:33:00 UTC
Thanks!

