Bug 826562

Summary: condor_q doesn't get queue info because of broken .schedd_address file
Product: Red Hat Enterprise MRG
Reporter: Martin Kudlej <mkudlej>
Component: condor
Assignee: grid-maint-list <grid-maint-list>
Status: CLOSED WONTFIX
QA Contact: MRG Quality Engineering <mrgqe-bugs>
Severity: medium
Priority: low
Version: 1.0
CC: matt, tstclair
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2016-05-26 20:01:14 UTC
Type: Bug

Description Martin Kudlej 2012-05-30 14:00:14 UTC
Description of problem:
The .schedd_address file is not cleaned up after a scheduler crash, so it can still contain an invalid address, and condor_q gives this file higher priority than the scheduler name obtained from the collector.
As a result, plain condor_q produces wrong output, while condor_q -name `condor_config_val SCHEDD_NAME` produces the correct output.
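A minimal sketch for spotting the stale-file condition described above. It assumes that the first line of the address file holds the schedd's sinful string (e.g. `<10.0.0.5:9618>`) and that the valid addresses are the `MyAddress` values the collector advertises in schedd ads. The function name `schedd_address_is_stale` is hypothetical, and the commented-out invocation (`condor_config_val SCHEDD_ADDRESS_FILE`, `condor_status -schedd -format`) is an assumption about the local pool, not something taken from this report.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit 0) when the address recorded in the
# given address file is NOT among the addresses passed as arguments.
# Assumption: the first line of the file is the schedd's sinful string.
schedd_address_is_stale() {
  local file=$1
  shift
  local recorded addr
  # A missing or unreadable file cannot override the collector: not stale
  [ -r "$file" ] || return 1
  # Read only the first line (the assumed sinful string)
  IFS= read -r recorded < "$file"
  for addr in "$@"; do
    # The recorded address is still advertised by the collector: not stale
    if [ "$addr" = "$recorded" ]; then
      return 1
    fi
  done
  # No advertised address matches the recorded one
  return 0
}

# Example invocation (assumes a reachable collector; not run here):
# file=$(condor_config_val SCHEDD_ADDRESS_FILE)
# mapfile -t live < <(condor_status -schedd -format "%s\n" MyAddress)
# schedd_address_is_stale "$file" "${live[@]}" && echo "stale: $file"
```

Passing the live addresses as arguments keeps the check independent of the Condor tools, so the same logic can be fed from whatever collector query fits the installed version.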

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Set up a pool with an HA Scheduler and RH HA.
2. On every machine that can host the HA schedd, run:
   i=1; while true; do echo -n "$i..."; ./y.sh; i=$((i+1)); done
3. Wait until condor_q starts returning wrong output.
  
Actual results:
condor_q does not work correctly when .schedd_address contains a wrong (stale) address.

Expected results:
condor_q should output information from all available schedulers, i.e. it should consider every scheduler known to the collector.
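The expected behaviour can be approximated by asking the collector for every schedd and querying each one by name, which is the same `-name` path that already gives correct output in the description. This is a minimal sketch, not a fix: the function name `query_all_schedds` and the `CONDOR_Q` override variable are hypothetical, and the `condor_status -schedd -format` pipeline in the usage comment assumes a reachable collector. `condor_q -global`, which queries all schedds in the pool, is worth comparing against.

```shell
#!/bin/bash
# Hypothetical helper: read schedd names (one per line) on stdin and run
# condor_q -name for each, so the collector rather than a local
# .schedd_address file determines which schedd is contacted.
# CONDOR_Q may be overridden (e.g. CONDOR_Q=echo for a dry run).
query_all_schedds() {
  local name
  while IFS= read -r name; do
    # Skip blank lines in the input
    [ -n "$name" ] || continue
    echo "== $name"
    # </dev/null keeps the query command from consuming the name list
    "${CONDOR_Q:-condor_q}" -name "$name" </dev/null
  done
}

# Example invocation (assumes a reachable collector; not run here):
# condor_status -schedd -format "%s\n" Name | query_all_schedds
```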

Additional info:

$ cat y.sh
#!/bin/bash
# One crash/restart cycle of condor_schedd; run repeatedly on each HA schedd host.

# Wait until condor_schedd is running on this host
while ! pgrep -x condor_schedd >/dev/null; do
  sleep 1
done

sleep 10

# Simulate a crash: SIGKILL gives the schedd no chance to clean up .schedd_address
killall -9 condor_schedd

# Wait until condor_schedd is running on this host again
while ! pgrep -x condor_schedd >/dev/null; do
  sleep 1
done

sleep 60

# Check that the running schedd's PID matches the one recorded in the PID file
PID=$(pidof condor_schedd)
if [ -n "$PID" ]; then
  SAVED_PID=$(cat /var/run/condor/condor_schedd.pid)

  if [ "0$PID" -eq "0$SAVED_PID" ]; then
    echo "OK...... $PID == $SAVED_PID"
  else
    echo "ERROR... $PID != $SAVED_PID"
  fi
fi