188938 – satellite-sync watchdog script

Summary: satellite-sync watchdog script

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Satellite 5
Classification:	Red Hat
Component:	Server
Sub Component:
Version:	unspecified
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Mike Orazi
QA Contact:	Red Hat Satellite QA List
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	145467

Reported:	2006-04-13 19:58 UTC by Matt Domsch
Modified:	2008-10-29 14:21 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed:	2008-10-29 14:21:41 UTC
Target Upstream Version:
Embargoed:

Description Matt Domsch 2006-04-13 19:58:23 UTC

Description of problem:
For various reasons (code bugs? network issues?), satellite-sync can sometimes
get wedged: it never dies, but it never finishes either.  (I've got RHN Satellite
3.4 running.)  When this happens, the next scheduled cronjob won't start while
the previous one is still running, so unless you're watching the emails
closely, it can be days before you realize "hey, I'm not getting any updates
from satellite-sync now".

So, I wrote a script to run as the cronjob instead.  After a random delay of up
to 2.5 hours, it starts satellite-sync in the background, checks on it every
minute, and kills it if it hasn't completed within 24 hours.  I'd encourage you
to add this (or something better) to the product and/or docs in order to make
Satellite more resilient to satellite-sync hangs.

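#!/bin/bash
# Watchdog wrapper for satellite-sync, meant to be run from cron.
# Uses bash features (the function keyword, $(( )) arithmetic), so it
# needs bash rather than a plain /bin/sh.

# -m turns on job control so the CHLD trap fires in a non-interactive
# shell; -b reports background job status changes immediately.
set -bm

# Called from the CHLD trap: exit as soon as satellite-sync is gone.
function check_child()
{
    if ! ps -p "$PID" > /dev/null 2>&1 ; then
        exit 0
    fi
}

# Random delay of up to 9000 seconds (2.5 hours) before starting.
perl -le 'sleep rand 9000'

satellite-sync --email > /dev/null 2>&1 &
PID=$!
trap check_child CHLD

# Give satellite-sync up to 24 hours to complete and kill it after that.
# The CHLD trap runs each time a "sleep 60" finishes, so the check
# effectively happens once a minute.
i=0
while [ "$i" -lt $((60 * 24)) ]; do
    sleep 60
    i=$((i+1))
done
kill "$PID" > /dev/null 2>&1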
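On systems whose GNU coreutils provide the timeout command, the same 24-hour deadline can probably be expressed without the hand-written sleep loop and CHLD trap. The sketch below assumes timeout with its --kill-after option is installed, which is likely not the case on the RHN Satellite 3.4 era systems this report was filed against; the helper name run_with_deadline is hypothetical.

```shell
#!/bin/bash
# Sketch only: the 24-hour watchdog delegated to GNU coreutils `timeout`.
# Assumption: `timeout` with --kill-after is installed on this system.

# run_with_deadline DURATION CMD [ARGS...]
# Hypothetical helper. Runs CMD in the foreground; if it is still running
# after DURATION (e.g. "24h" or "90"), timeout sends it SIGTERM, then
# SIGKILL 60 seconds later if it still has not exited. Returns CMD's own
# exit status, or 124 if the deadline was hit and SIGTERM stopped it.
run_with_deadline() {
    local duration=$1
    shift
    timeout --kill-after=60 "$duration" "$@"
}
```

In the cron wrapper, the loop, trap, and final kill would then collapse to one line after the same random perl sleep, for example: run_with_deadline 24h satellite-sync --email > /dev/null 2>&1. A nonzero status (124 in the usual timeout case) would also give cron something to report, which the original script does not do.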

Comment 1 Bret McMillan 2006-09-05 17:39:16 UTC

Can you provide more data about the circumstances where satellite-sync wedges?

Dates and times, the exact command line used, etc. will give us the data we need
to attack the fundamental performance issues you're seeing.

Comment 2 Matt Domsch 2008-10-29 14:21:41 UTC

These have been open for years with no investigation or resolution.  Since then the code base has moved on significantly, such that many of these would no longer apply to the current Spacewalk code.  I'm closing these requests in the hope that they're no longer necessary; if they are, they'll get discovered anew.
