Bug 1697466
| Summary: | Provide guidance in order to get proper healthchecks for "cron" containers | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Cédric Jeanneret <cjeanner> |
| Component: | openstack-tripleo-common | Assignee: | Cédric Jeanneret <cjeanner> |
| Status: | CLOSED ERRATA | QA Contact: | Sasha Smolyak <ssmolyak> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 15.0 (Stein) | CC: | aschultz, emacchi, jcoufal, mburns, sbaker, slinaber, ssmolyak |
| Target Milestone: | beta | Keywords: | FutureFeature, Triaged |
| Target Release: | 15.0 (Stein) | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | openstack-tripleo-common-10.7.1-0.20190423125010.2199eeb.el8ost | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-09-21 11:21:11 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Cédric Jeanneret
2019-04-08 13:17:05 UTC
oh great, BZ doing its stuff (drop all content when we change the component, how nice)..

So. This is a "research paper" in order to find the best way to get healthchecks for "cron" containers. We have to take into account that:
- the job is probably not in root's crontab
- it's probably not a "crontab" at all; some jobs have a dedicated file in the /etc/cron.* directories

We have to push ideas in here and test/validate them. So, currently, the state is:
[root@undercloud ~]# podman exec logrotate_crond crontab -l
# HEADER: This file was autogenerated at 2019-04-10 07:47:33 +0000 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: logrotate-crond
PATH=/bin:/usr/bin:/usr/sbin
SHELL=/bin/sh
0 * * * * sleep `expr ${RANDOM} \% 90`; /usr/sbin/logrotate -s /var/lib/logrotate/logrotate-crond.status /etc/logrotate-crond.conf 2>&1|logger -t logrotate-crond
[root@undercloud ~]# podman exec cinder_api_cron crontab -l
no crontab for root
exit status 1
[root@undercloud ~]# podman exec nova_api_cron crontab -l
no crontab for root
exit status 1
[root@undercloud ~]# podman exec keystone_cron crontab -l
no crontab for root
exit status 1
This means:
- we can't rely on "crontab -l", especially since Puppet adds a lot of header noise, so simply counting lines isn't reliable
- the container user doesn't necessarily own the job
We might want to use something like:
[root@undercloud ~]# podman exec cinder_api_cron ls /var/spool/cron
cinder
[root@undercloud ~]# podman exec nova_api_cron ls /var/spool/cron
nova
[root@undercloud ~]# podman exec keystone_cron ls /var/spool/cron
keystone
For instance:
[root@undercloud ~]# podman exec keystone_cron cat /var/spool/cron/keystone
# HEADER: This file was autogenerated at 2019-04-10 07:46:44 +0000 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: keystone-manage token_flush
PATH=/bin:/usr/bin:/usr/sbin
SHELL=/bin/sh
1 * * * * keystone-manage token_flush >>/var/log/keystone/keystone-tokenflush.log 2>&1
So a 2-step validation might be possible:
step 1: list the crontab files in /var/spool/cron
step 2: ensure we have something either in "root" or in container_name.split('_')[0] (which yields keystone, cinder, nova)
For the second step, we can use something like "grep -cEv '^#' <file>" to get the number of uncommented lines, which should be >=2 so far.
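The two steps above could be sketched roughly as follows. This is only an illustration of the idea, not the eventual tripleo-common script: the `cron_healthcheck` function name is made up here, and it assumes the container name maps to a spool file named after its first underscore-separated token, with "root" as a fallback, exactly as described above.

```shell
#!/bin/sh
# Hypothetical sketch of the proposed 2-step cron healthcheck.
# cron_healthcheck CONTAINER_NAME [SPOOL_DIR] -> exit 0 if a populated
# crontab exists for the derived user or for root, 1 otherwise.
cron_healthcheck() {
    name="$1"
    spool="${2:-/var/spool/cron}"
    # Step 2's owner guess: container_name.split('_')[0],
    # e.g. "keystone_cron" -> "keystone".
    owner="$(printf '%s' "$name" | cut -d_ -f1)"
    for user in "$owner" root; do
        if [ -f "$spool/$user" ]; then
            # Count non-comment, non-empty lines so the Puppet HEADER
            # noise is ignored; >=2 (environment settings plus at least
            # one job entry) counts as healthy.
            if [ "$(grep -cEv '^(#|$)' "$spool/$user")" -ge 2 ]; then
                return 0
            fi
        fi
    done
    return 1
}
```

Checking non-comment lines rather than the raw line count sidesteps the Puppet header problem noted earlier, and falling back to "root" covers containers whose job does live in root's crontab.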
So instead of guidance, the healthchecks are actually already added to the relevant containers. There are two patches: one in tripleo-common, creating the script, and one in tripleo-heat-templates, activating the healthcheck. All the needed healthchecks, including those for cron jobs, are present.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811