Description of problem:

For various reasons (code?, network issues, ...), satellite-sync can sometimes get wedged, never dying but never finishing either. (I'm running RHN Satellite 3.4.) When this happens, the regular cronjob doesn't start if the previous cronjob hasn't yet completed, so unless you're watching the emails closely, it can be days before you realize "hey, I'm not getting any updates from satellite-sync now".

So I wrote a script to run as the cronjob. It checks on satellite-sync every minute and kills it if it hasn't completed within 24 hours. I'd encourage you to add this (or something better) to the product and/or docs in order to make it more resilient to satellite-sync hangs.

#!/bin/sh
# to make the job scheduler report sigchld immediately
set -bm

# random delay of up to 9000 seconds before starting
perl -le 'sleep rand 9000'

trap check_child CHLD
satellite-sync --email > /dev/null 2>&1 &
PID=$!

# exit cleanly as soon as satellite-sync is no longer running
check_child() {
    if ! ps -p $PID > /dev/null 2>&1 ; then
        exit 0
    fi
}

# give the satellite-sync up to 24 hours to complete
# and kill it after that
i=0
while [ $i -lt $((60 * 24)) ]; do
    sleep 60
    i=$((i+1))
done
kill $PID > /dev/null 2>&1
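For anyone wanting to reuse the idea, here is a minimal sketch of the same watchdog as a generic helper. It is not the script above: it swaps the SIGCHLD trap for plain polling with `kill -0`, and the function name `run_with_limit`, its arguments, and the exit code 124 (borrowed from the convention used by coreutils `timeout`) are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of any RHN/Spacewalk tooling).
# Usage: run_with_limit LIMIT_SECONDS POLL_SECONDS command [args...]
# Returns the command's own exit status, or 124 if it had to be killed.
# Note: plain sh has no "local", so the variables below are global.
run_with_limit() {
    limit=$1
    poll=$2
    shift 2

    # Start the command in the background and remember its PID.
    "$@" &
    pid=$!

    elapsed=0
    # kill -0 sends no signal; it only tests whether the PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$limit" ]; then
            # Time limit reached: ask politely first, then force if needed.
            kill "$pid" 2>/dev/null
            sleep 1
            kill -0 "$pid" 2>/dev/null && kill -9 "$pid" 2>/dev/null
            wait "$pid" 2>/dev/null
            return 124
        fi
        sleep "$poll"
        elapsed=$((elapsed + poll))
    done

    # The command ended on its own; collect and pass on its exit status.
    wait "$pid"
    return $?
}
```

Under these assumptions, the cronjob from the report would become something like `run_with_limit 86400 60 satellite-sync --email > /dev/null 2>&1`, which gives satellite-sync 24 hours and checks on it once a minute. Because the elapsed time is counted in poll intervals, the limit is approximate rather than exact.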
Can you provide more data about the circumstances where satellite-sync wedges? Dates and times, the exact command line used, etc. will give us the data we need to attack the fundamental performance issues you're seeing.
These have been open for years with no investigation or resolution. Since then the code base has moved on significantly, such that many of these would no longer apply to the current spacewalk code. I'm closing these requests in the hope that they're no longer necessary, or that if they are, they'll get discovered anew.