Bug 620511 - condor_configd hangs on rhel4 when receiving a remote config
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor-wallaby-client
Version: beta
Hardware: All Linux
Priority: high
Severity: high
Target Milestone: 1.3
Target Release: ---
Assigned To: Robert Rati
QA Contact: Tomas Rusnak
Depends On: 612869 623220
Reported: 2010-08-02 14:13 EDT by Pete MacKinnon
Modified: 2010-10-21 14:44 EDT
Doc Type: Bug Fix
Last Closed: 2010-10-21 14:44:50 EDT
Description Pete MacKinnon 2010-08-02 14:13:34 EDT
condor_configd invokes condor_config_val -dump, which appears to hang when receiving a new remote config from wallaby.

[root@mrg8 ~]# rpm -q condor-wallaby-client
condor-wallaby-client-3.2-1.el4
Comment 1 Robert Rati 2010-08-02 14:23:13 EDT
The issue is with large output from commands executed on RHEL4. The run_cmd function would deadlock: it waited for the command to exit before reading the stdout/stderr buffers, while the command was waiting for those buffers to be read so it could write more data and finish executing.

Fixed in:
condor-job-hooks-1.4-3
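
For illustration only, the deadlock described above can be avoided by draining the pipes while the child is still running. The sketch below is not the actual run_cmd from condor-job-hooks; the name run_cmd_safe and its return shape are assumptions, and it relies only on the standard subprocess module.

```python
import subprocess
import sys


def run_cmd_safe(argv):
    """Run argv and return (returncode, stdout, stderr) without deadlocking.

    Hypothetical helper for illustration; not the condor-job-hooks run_cmd.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() reads both pipes while the child runs and then waits for
    # it to exit. Waiting for exit first and reading afterwards can deadlock
    # once the child fills the OS pipe buffer and blocks on write.
    out, err = proc.communicate()
    return proc.returncode, out, err


if __name__ == "__main__":
    # A child that writes about 1 MiB to stdout, more than a typical pipe buffer.
    child = [sys.executable, "-c", "import sys; sys.stdout.write('x' * (1 << 20))"]
    code, out, err = run_cmd_safe(child)
    print(code, len(out), len(err))
```

With a wait-then-read pattern, a command with large output such as condor_config_val -dump would be expected to block in the same way the report describes.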
Comment 2 Tomas Rusnak 2010-08-11 05:30:52 EDT
Reproduced on:

$CondorVersion: 7.4.4 Aug  5 2010 BuildID: RH-7.4.4-0.8.el4 PRE-RELEASE $
$CondorPlatform: I386-LINUX_RHEL4 $

condor-job-hooks-1.4-1.el4

08/11 05:22:07 DEBUG: Retrieved node object from store
08/11 05:22:14 DEBUG: Checking version of condor configuration
08/11 05:22:15 DEBUG: The system is already running configuration version "1281517401758710"
08/11 05:22:15 DEBUG: Performing a checkin with the store
08/11 05:22:15 DEBUG: Checked in with the store
08/11 05:24:14 DEBUG: Received a NodeUpdatedNotice
08/11 05:24:14 DEBUG: The event is for this node
08/11 05:24:15 DEBUG: Checking version of condor configuration
08/11 05:24:15 INFO: Retrieving configuration version "1281518654174482" from the store
08/11 05:24:24 DEBUG: Retrieved configuration from the store
08/11 05:26:58 DEBUG: Received a NodeUpdatedNotice
08/11 05:26:58 DEBUG: The event is for this node
08/11 05:26:58 DEBUG: Checking version of condor configuration
08/11 05:28:19 DEBUG: Received a NodeUpdatedNotice
08/11 05:28:19 DEBUG: The event is for this node

Remote configuration hangs: no daemons restarted, no configuration retrieved from the configuration store.
Comment 3 Robert Rati 2010-08-11 08:44:29 EDT
You reported testing on condor-job-hooks-1.4-1, but the fix is stated as being in condor-job-hooks-1.4-3.  Please test with 1.4-3.
Comment 4 Tomas Rusnak 2010-08-18 04:48:36 EDT
"Depends On" set to BZ623220. This bug cannot be verified until the problems with the config store are resolved.

08/18 04:16:05 DEBUG: Received a NodeUpdatedNotice
08/18 04:16:05 DEBUG: The event is for this node
08/18 04:16:06 DEBUG: Checking version of condor configuration
08/18 04:16:06 INFO: Retrieving configuration version "1282119365333910" from the store
08/18 04:16:07 DEBUG: Retrieved configuration from the store
08/18 04:16:07 ERROR: Store error: 1, ERROR: near "-": syntax error
08/18 04:16:07 ERROR: Failed to retrive differences between versions "1282116947136550" and "1282119365333910".  No update performed
08/18 04:16:07 DEBUG: Performing a checkin with the store
08/18 04:16:07 DEBUG: Checked in with the store

/var/log/wallaby/agent.log

E, [2010-08-18T04:16:07.038373 #30701] ERROR -- : Error calling whatChanged: near "-": syntax error
E, [2010-08-18T04:16:07.038496 #30701] ERROR -- :     /usr/lib/ruby/site_ruby/1.8/sqlite3/errors.rb:62:in `check'
    /usr/lib/ruby/site_ruby/1.8/sqlite3/statement.rb:39:in `initialize'
    /usr/lib/ruby/site_ruby/1.8/sqlite3/database.rb:154:in `new'
    /usr/lib/ruby/site_ruby/1.8/sqlite3/database.rb:154:in `prepare'
    /usr/lib/ruby/site_ruby/1.8/sqlite3/database.rb:181:in `execute'
    /usr/lib/ruby/site_ruby/1.8/mrg/grid/config/Configuration.rb:94:in `build'
    /usr/lib/ruby/site_ruby/1.8/mrg/grid/config/Node.rb:236:in `whatChanged'
    /usr/lib/ruby/site_ruby/1.8/spqr/app.rb:130:in `send'
    /usr/lib/ruby/site_ruby/1.8/spqr/app.rb:130:in `method_call'
    /usr/lib/ruby/site_ruby/1.8/qmf.rb:1460:in `do_agent_events'
    /usr/lib/ruby/site_ruby/1.8/qmf.rb:1493:in `do_events'
    /usr/lib/ruby/site_ruby/1.8/qmf.rb:1519:in `sess_event_recv'
    /usr/lib/ruby/site_ruby/1.8/qmf.rb:308:in `run'
    /usr/lib/ruby/site_ruby/1.8/qmf.rb:242:in `initialize'
    /usr/lib/ruby/site_ruby/1.8/qmf.rb:241:in `new'
    /usr/lib/ruby/site_ruby/1.8/qmf.rb:241:in `initialize'
    /usr/lib/ruby/site_ruby/1.8/spqr/app.rb:211:in `new'
    /usr/lib/ruby/site_ruby/1.8/spqr/app.rb:211:in `main'
    /usr/bin/wallaby-agent:235
Comment 5 Tomas Rusnak 2010-08-25 08:06:43 EDT
Tested on:

$CondorVersion: 7.4.4 Aug 23 2010 BuildID: RH-7.4.4-0.10.el4 PRE-RELEASE $
$CondorPlatform: I386-LINUX_RHEL4 $

$CondorVersion: 7.4.4 Aug 23 2010 BuildID: RH-7.4.4-0.10.el4 PRE-RELEASE $
$CondorPlatform: X86_64-LINUX_RHEL4 $

condor-wallaby-client-3.4-1.el4

08/25 08:01:56 DEBUG: Connected to broker "****:5672"
08/25 08:01:57 DEBUG: Retrieved node object from store
08/25 08:01:57 DEBUG: Checking version of condor configuration
08/25 08:01:57 DEBUG: The system is already running configuration version "1282737701199239"
08/25 08:01:57 DEBUG: Performing a checkin with the store
08/25 08:01:57 DEBUG: Checked in with the store
08/25 08:02:39 DEBUG: Received a NodeUpdatedNotice
08/25 08:02:39 DEBUG: The event is for this node
08/25 08:02:39 DEBUG: Checking version of condor configuration
08/25 08:02:40 INFO: Retrieving configuration version "1282737757546367" from the store
08/25 08:02:43 DEBUG: Retrieved configuration from the store
08/25 08:02:43 DEBUG: Daemons to restart: []
08/25 08:02:43 DEBUG: Daemons to reconfig: [u'schedd', u'collector', u'startd']
08/25 08:02:44 DEBUG: Not sending "condor_reconfig" to subsystem "schedd" since it is not currently running
08/25 08:02:44 DEBUG: Not sending "condor_reconfig" to subsystem "collector" since it is not currently running
08/25 08:02:44 DEBUG: Sending command "condor_reconfig" to subsystem "startd"
08/25 08:02:44 DEBUG: Sent command "condor_reconfig" to subsystem "startd"
08/25 08:02:44 DEBUG: Performing a checkin with the store
08/25 08:02:44 DEBUG: Checked in with the store

>>> VERIFIED
