1524320 – Define troubleshooting procedures for critical RDO Infra services

RDO tickets are now tracked in Jira https://issues.redhat.com/projects/RDO/issues/

Keywords:
Status:	CLOSED EOL
Alias:	None
Product:	RDO
Classification:	Community
Component:	Infrastructure
Sub Component:
Version:	trunk
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	---
Target Release:	trunk
Assignee:	Alan Pevec
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-12-11 09:10 UTC by Javier Peña
Modified:	2024-02-15 14:43 UTC (History)
CC List:	0 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2024-02-14 11:46:44 UTC
Embargoed:
Dependent Products:

Description Javier Peña 2017-12-11 09:10:44 UTC

Several services provided by the RDO Infrastructure are considered critical (see https://www.rdoproject.org/infra/service-continuity/). A failure or degradation of any of them can affect multiple projects.

The RDO Infra maintainers have different levels of knowledge about each service, so we should document the most common troubleshooting procedures for the critical services. This will help in the following areas:

- Allow consistent troubleshooting when the most knowledgeable person is not around (e.g. weekends or holidays).
- Prevent the "someone gets hit by a bus" effect.

Comment 1 Javier Peña 2017-12-11 09:12:58 UTC

From the RDO Service Continuity page, we should document troubleshooting procedures for at least:

- review.rdoproject.org nodepool nodes (or nodepool in general)
- RDO Trunk repositories
- DLRN DB instance
- images.rdoproject.org
- trunk.registry.rdoproject.org
- www.rdoproject.org
- lists.rdoproject.org

We should rely on upstream published documentation as much as possible.
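
As a starting point for such procedures, a first triage step could be a plain reachability check against the hostnames in the list above. The following is only a minimal sketch: the choice of TCP port 443, the 5-second timeout, and the helper names (tcp_probe, triage, unreachable) are assumptions made for illustration, not part of any existing RDO tooling. Entries that are not plain hostnames (nodepool nodes, RDO Trunk repositories, the DLRN DB instance) would need their own service-specific checks.

```python
import socket
from typing import Callable, Dict, Iterable, List

# Hostnames taken from the list in this comment.
# Assumption: each one answers on TCP port 443 (HTTPS).
HOSTS = [
    "review.rdoproject.org",
    "images.rdoproject.org",
    "trunk.registry.rdoproject.org",
    "www.rdoproject.org",
]


def tcp_probe(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False


def triage(
    hosts: Iterable[str],
    probe: Callable[[str], bool] = tcp_probe,
) -> Dict[str, bool]:
    """Map each host to True (reachable) or False (not reachable)."""
    return {host: probe(host) for host in hosts}


def unreachable(results: Dict[str, bool]) -> List[str]:
    """Return the sorted list of hosts whose probe failed."""
    return sorted(host for host, ok in results.items() if not ok)
```

Calling unreachable(triage(HOSTS)) would return the hosts that did not accept a connection. The probe is passed in as a parameter so that a service-specific check (for example an HTTP status check) can replace the TCP connect without changing the rest of the sketch.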

Comment 2 David Moreau Simard 2017-12-11 14:40:35 UTC

It's okay to link from the service continuity page to documentation hosted elsewhere (in git or on readthedocs), but I'm not sure rdoproject.org is a good place for technical documentation like this.

Upstream uses system-config [1][2] for this purpose.
Our equivalent would be rdo-infra-playbooks, I guess?

Some projects already have their own built-in documentation (RDO registry, delorean, weirdo), so we could link to them as appropriate from the "main" documentation hub.

[1]: https://docs.openstack.org/infra/system-config/
[2]: https://github.com/openstack-infra/system-config

Comment 3 Javier Peña 2017-12-11 15:08:19 UTC

Maybe we could create a new rdo-docs repo for that, and publish it to readthedocs.org as mentioned on IRC?

Comment 4 Alan Pevec 2018-04-13 12:55:15 UTC

The RDO Registry documentation is published at http://rdo-container-registry.readthedocs.io/
