Bug 188938 - satellite-sync watchdog script
Status: CLOSED WONTFIX
Product: Red Hat Satellite 5
Classification: Red Hat
Component: Server
Version: unspecified
Hardware: All Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assigned To: Mike Orazi
QA Contact: Red Hat Satellite QA List
Keywords: FutureFeature
Depends On:
Blocks: 145467
Reported: 2006-04-13 15:58 EDT by Matt Domsch
Modified: 2008-10-29 10:21 EDT (History)

See Also:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2008-10-29 10:21:41 EDT


Attachments: None
Description Matt Domsch 2006-04-13 15:58:23 EDT
Description of problem:
For various reasons (code?, network issues, ...), satellite-sync can sometimes
get wedged, never dying but never finishing either.  (I'm running RHN Satellite
3.4.)  When this happens, the regular cronjob doesn't start, because the
previous cronjob hasn't completed yet.  So unless you're watching the emails
closely, it can be days before you realize "hey, I'm not getting any updates
from satellite-sync now".

So, I wrote a script to run as the cronjob.  It starts satellite-sync, checks on
it every minute, and kills it if it hasn't completed within 24 hours.  I'd
encourage you to add this (or something better) to the product and/or docs in
order to make it more resilient to satellite-sync hangs.

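#!/bin/bash

# enable job control (-m) and immediate job status notification (-b)
# to make the job scheduler report sigchld immediately
set -bm

# Runs on every SIGCHLD, including the one raised when each "sleep 60"
# below exits, so check the satellite-sync PID explicitly and exit
# only once that process is gone.
check_child()
{
    if ! ps -p "$PID" > /dev/null 2>&1; then
        exit 0
    fi
}

# sleep a random interval of up to 9000 seconds (2.5 hours) before starting
perl -le 'sleep rand 9000'
trap check_child CHLD
satellite-sync --email > /dev/null 2>&1 &
PID=$!

# give the satellite-sync up to 24 hours to complete
# and kill it after that
i=0
while [ "$i" -lt $((60 * 24)) ]; do
    sleep 60
    i=$((i+1))
done
kill "$PID" > /dev/null 2>&1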
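
For comparison, the same 24-hour limit can be sketched with the `timeout` utility from GNU coreutils, assuming it is installed on the Satellite host (it may not be on older releases). The helper name `run_with_deadline` is hypothetical and is not part of satellite-sync. Like the plain `kill` above, `timeout` sends SIGTERM by default, and it exits with status 124 when the limit was hit.

```shell
#!/bin/bash

# run_with_deadline LIMIT CMD [ARGS...]
# Hypothetical helper: runs CMD under GNU coreutils `timeout` and
# reports on stderr when the limit was reached.
run_with_deadline() {
    local limit=$1
    shift
    timeout "$limit" "$@"
    local status=$?
    # GNU timeout exits with 124 when it had to terminate the command
    if [ "$status" -eq 124 ]; then
        echo "killed after $limit: $*" >&2
    fi
    return "$status"
}

# Possible cron usage (left commented out so the sketch can be sourced safely):
# run_with_deadline 24h satellite-sync --email > /dev/null 2>&1
```

This trades the CHLD trap and the polling loop for a single external command, at the cost of depending on coreutils.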
Comment 1 Bret McMillan 2006-09-05 13:39:16 EDT
Can you provide more data about the circumstances where satellite-sync wedges?

Dates and times, the exact command line used, etc. will give us the data we need
to attack the fundamental performance issues you're seeing.
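
One way to collect that data is a small logging wrapper around the sync command. This is only a sketch: the function name `log_run` and the log path in the usage comment are hypothetical. It appends the start time, the exact command line, the end time, and the exit status to a log file.

```shell
#!/bin/bash

# log_run LOGFILE CMD [ARGS...]
# Hypothetical helper: records when CMD started, how it was invoked,
# when it ended, and its exit status.
log_run() {
    local logfile=$1
    shift
    local start status
    start=$(date '+%Y-%m-%d %H:%M:%S %Z')
    echo "start=$start cmd=$*" >> "$logfile"
    "$@"
    status=$?
    echo "end=$(date '+%Y-%m-%d %H:%M:%S %Z') status=$status cmd=$*" >> "$logfile"
    return "$status"
}

# Possible usage (the log path is an assumption):
# log_run /var/log/satellite-sync-watch.log satellite-sync --email
```

A run that wedges would show up as a `start=` line with no matching `end=` line.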
Comment 2 Matt Domsch 2008-10-29 10:21:41 EDT
These have been open for years with no investigation or resolution.  Since then the code base has moved on significantly, such that many of these would no longer apply to the current spacewalk code.  I'm closing these requests in the hope they're no longer necessary, or if they are, they'll get discovered anew.
