Red Hat Bugzilla – Bug 188938
satellite-sync watchdog script
Last modified: 2008-10-29 10:21:41 EDT
Description of problem:
For various reasons (bugs in the code?, network issues, ...), satellite-sync can
sometimes get wedged: it never dies, but it never finishes either. (I'm running
RHN Satellite 3.4.) When this happens, the regular cronjob doesn't start because
the previous cronjob hasn't completed, so unless you're watching the emails
closely, it can be days before you realize "hey, I'm not getting any updates
from satellite-sync now".
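One common way a cron job is made to skip while a previous run is still active is an exclusive lock on a file. This report does not say how satellite-sync's own check works, so the following is only a minimal sketch of the general technique using util-linux `flock(1)`. The function name `run_locked` and the lock path are hypothetical.

```shell
#!/bin/bash
# Hypothetical lock file; satellite-sync's real locking mechanism may differ.
LOCKFILE=${LOCKFILE:-/tmp/satellite-sync-cron.lock}

# run_locked (hypothetical helper): run "$@" only if no other run holds the lock.
run_locked() {
    (
        # Try to take an exclusive lock on fd 9 without waiting for it.
        if ! flock -n 9; then
            echo "previous run still active; skipping" >&2
            exit 1
        fi
        # The lock stays held until this subshell and the command it runs
        # (which inherits fd 9) have both exited.
        "$@"
    ) 9>"$LOCKFILE"
}

# In a crontab entry this might look like:
#   run_locked satellite-sync --email
```

The sketch also shows why a wedged run is so disruptive: a process that never exits never releases the lock, so every later run is skipped.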
So, I wrote a script to run as the cronjob that checks on satellite-sync every
minute and kills it if it hasn't completed within 24 hours. I'd encourage you
to add this (or something better) to the product and/or docs to make it more
resilient to satellite-sync hangs.
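#!/bin/bash
# to make the job scheduler report sigchld immediately
set -b
# start at a random time up to 9000 seconds (2.5 hours) from now
perl -le 'sleep rand 9000'
# exit as soon as satellite-sync is gone (runs whenever a child exits)
check_child() {
    if ! kill -0 "$PID" > /dev/null 2>&1; then exit 0; fi
}
trap check_child CHLD
satellite-sync --email > /dev/null 2>&1 &
PID=$!
# give the satellite-sync up to 24 hours to complete, then kill it
i=0
while [ "$i" -lt $((60 * 24)) ] && kill -0 "$PID" 2>/dev/null; do
    sleep 60; i=$((i + 1))
done
kill "$PID" > /dev/null 2>&1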
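On systems that have GNU coreutils `timeout(1)`, the same 24-hour cap can be sketched without a polling loop or a SIGCHLD trap. This is an alternative technique, not what the script above does. The helper `run_with_deadline` and the variables `MAX_RUNTIME` and `GRACE` are hypothetical names, not satellite-sync settings.

```shell
#!/bin/bash
# Hypothetical settings: maximum run time and extra time allowed after SIGTERM.
MAX_RUNTIME=${MAX_RUNTIME:-24h}
GRACE=${GRACE:-5m}

# run_with_deadline (hypothetical helper): run "$@", sending SIGTERM after
# MAX_RUNTIME and SIGKILL if the command is still alive GRACE later.
run_with_deadline() {
    timeout --kill-after="$GRACE" "$MAX_RUNTIME" "$@"
    local status=$?
    # timeout exits with 124 when SIGTERM ended the command, and with
    # 128+9 (137) when it had to fall back to SIGKILL.
    if [ "$status" -eq 124 ] || [ "$status" -eq 137 ]; then
        echo "timed out after $MAX_RUNTIME" >&2
    fi
    return "$status"
}

# In the cronjob this might be:
#   run_with_deadline satellite-sync --email > /dev/null 2>&1
```

Letting `timeout` own the deadline means the cron script only has to propagate an exit status, and the SIGKILL fallback covers a process that ignores SIGTERM.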
Can you provide more data about the circumstances in which satellite-sync wedges?
Dates and times, the exact command line used, etc. would give us data to attack
the fundamental performance issues you're seeing.
These have been open for years with no investigation or resolution. Since then, the code base has moved on significantly, such that many of these would no longer apply to the current spacewalk code. I'm closing these requests in the hope that they're no longer necessary, or, if they are, that they'll be discovered anew.