Document URL: Need New Topic
Section Number and Name: Need New Topic
Describe the issue: In two different releases we have release notes on a diagnostics tool:
https://docs.openshift.com/enterprise/3.1/release_notes/ose_3_1_release_notes.html#ose-3-1-1-enhancements
https://docs.openshift.com/enterprise/3.2/release_notes/ose_3_2_release_notes.html#ose-32-configuration-and-administration
Suggestions for improvement: However, we do not document how to use it, what it tests, or what it helps a user validate.
Additional information:
> -What it is used for

Provides diagnostics to verify that OpenShift hosts and systems are working as intended.

> -What it tests and/or validates

* Verify that the default registry and router are running and correctly configured.
* Check ClusterRoleBindings and ClusterRoles for consistency with base policy.
* Check that all of the client configuration contexts are valid and can be connected to.
* Check that SkyDNS is working properly and pods have SDN connectivity.
* Validate master and node configuration on the host.
* Check that nodes are running and available.
* Analyze host logs for known errors.
* Check that systemd units are configured as expected for the host.

If you want to just summarize that, feel free. Hopefully it will see expansion in the future.

> -How to use it

Run, preferably on a master host and as cluster-admin:

`oc adm diagnostics`

This will run all available diagnostics, skipping any that don't apply. For instance, the NodeConfigCheck does not run unless a node config is available. Having done that, you can run specific diagnostics by name as you work to address issues:

`oc adm diagnostics NodeConfigCheck UnitStatus`
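As a rough sketch of the usage described above, the two invocations can be wrapped in a small shell function. `run_diagnostics` is a hypothetical helper name and is not part of `oc`. It only checks that `oc` is on the PATH, then forwards any diagnostic names (for example `NodeConfigCheck UnitStatus`) to `oc adm diagnostics` unchanged and returns whatever exit status `oc` returns. It assumes it is run on a master host as cluster-admin, as recommended.

```shell
#!/bin/sh
# run_diagnostics: hypothetical convenience wrapper, not part of the oc CLI.
# With no arguments, runs all available diagnostics (oc skips any that do not
# apply on this host). With arguments, runs only the named diagnostics,
# e.g. run_diagnostics NodeConfigCheck UnitStatus
run_diagnostics() {
  # Fail early with the shell's conventional "command not found" status if oc is missing.
  if ! command -v oc >/dev/null 2>&1; then
    echo "run_diagnostics: oc not found in PATH" >&2
    return 127
  fi
  # Forward the diagnostic names unchanged; the function's exit status is oc's.
  oc adm diagnostics "$@"
}
```

A typical session calls `run_diagnostics` once with no arguments to get the full picture. It then re-runs `run_diagnostics NodeConfigCheck UnitStatus` (or whichever checks reported problems) after each fix.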
Work in progress: https://github.com/openshift/openshift-docs/pull/2885
(In reply to Luke Meyer from comment #5)
The details associated with this request are lacking, mainly around this part:

> > -What it tests and/or validates
>
> * Verify the default registry and router are running and correctly configured.
> * Check ClusterRoleBindings and ClusterRoles for consistency with base policy.
> * Check that all of the client configuration contexts are valid and can be connected to.
> * Check that SkyDNS is working properly and pods have SDN connectivity.
> * Validate master and node configuration on the host.
> * Check that nodes are running and available.
> * Analyze host logs for known errors.
> * Check that systemd units are configured as expected for the host.

What are some error conditions (failures) that this tool can catch or help spot, and what are some potential resolutions (is there a way to link to https://access.redhat.com/node/1599603 or other articles)?

I wonder how we can tie these "docs" to https://bugzilla.redhat.com/show_bug.cgi?id=1259118 and the articles we have on troubleshooting.
(In reply to Eric Rich from comment #7)
> What are some error conditions (failures) that this tool can catch or help
> spot, and what are some potential resolutions (is there a way to link to
> https://access.redhat.com/node/1599603 or other articles)?

The idea is to check for mistakes people have made in the past and give at least some idea of what is going on, or even how to solve it. I would not say it does a great job of that yet, but it hasn't received much priority, so you guys could help by pressing for things your customers have done that you'd like a diagnostic to detect. Diagnostics could certainly refer to articles or docs for resolution, as long as the URLs are stable and preferably public (so Origin users don't get sent to a solution they can't read, or maybe you want that).

How specific do you want the docs to be about what this checks?
Luke and Eric,

Is it safe to say that more development needs to be done on the tool before we can answer all of Eric's questions? Should I publish this PR as a first pass, and then we can beef up the content later?
https://github.com/openshift/openshift-docs/pull/2885

Let me know what you think. Thanks!
I think it's fair as a first pass.
Commits pushed to master at https://github.com/openshift/openshift-docs

https://github.com/openshift/openshift-docs/commit/0208bd1ae08c55b0cc7d64311b06d81995f52def
Bug 1373660, added information about the diagnostics tool

https://github.com/openshift/openshift-docs/commit/13d28156ca6afdf4e45d385387c3d68ea5d4fc3d
Merge pull request #2885 from ahardin-rh/diagnostics-tool

Bug 1373660, added information about the diagnostics tool
Thanks!
Content is now published: https://access.redhat.com/documentation/en/openshift-container-platform/3.3/single/cluster-administration/#admin-guide-diagnostics-tool