Bug 620511
Summary: | condor_configd hangs on rhel4 when receiving a remote config | ||
---|---|---|---|
Product: | Red Hat Enterprise MRG | Reporter: | Pete MacKinnon <pmackinn> |
Component: | condor-wallaby-client | Assignee: | Robert Rati <rrati> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Tomas Rusnak <trusnak> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | beta | CC: | rrati, trusnak |
Target Milestone: | 1.3 | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Last Closed: | 2010-10-21 18:44:50 UTC | Type: | --- |
Bug Depends On: | 612869, 623220 | ||
Bug Blocks: |
Description
Pete MacKinnon
2010-08-02 18:13:34 UTC
The issue is with large output from commands executed on RHEL4. The run_cmd function would deadlock: it waited for the command to exit before reading the stdout/stderr buffers, while the command was waiting for those buffers to be read so it could write more data and finish executing.

Fixed in: condor-job-hooks-1.4-3

Reproduced on:
$CondorVersion: 7.4.4 Aug 5 2010 BuildID: RH-7.4.4-0.8.el4 PRE-RELEASE $
$CondorPlatform: I386-LINUX_RHEL4 $
condor-job-hooks-1.4-1.el4
08/11 05:22:07 DEBUG: Retrieved node object from store
08/11 05:22:14 DEBUG: Checking version of condor configuration
08/11 05:22:15 DEBUG: The system is already running configuration version "1281517401758710"
08/11 05:22:15 DEBUG: Performing a checkin with the store
08/11 05:22:15 DEBUG: Checked in with the store
08/11 05:24:14 DEBUG: Received a NodeUpdatedNotice
08/11 05:24:14 DEBUG: The event is for this node
08/11 05:24:15 DEBUG: Checking version of condor configuration
08/11 05:24:15 INFO: Retrieving configuration version "1281518654174482" from the store
08/11 05:24:24 DEBUG: Retrieved configuration from the store
08/11 05:26:58 DEBUG: Received a NodeUpdatedNotice
08/11 05:26:58 DEBUG: The event is for this node
08/11 05:26:58 DEBUG: Checking version of condor configuration
08/11 05:28:19 DEBUG: Received a NodeUpdatedNotice
08/11 05:28:19 DEBUG: The event is for this node
Remote configuration hangs: no daemons restarted, no configuration retrieved from the configuration store.

You reported testing on condor-job-hooks-1.4-1, but the fix is stated as being in condor-job-hooks-1.4-3. Please test with 1.4-3.

"Depends on" set to BZ623220. Cannot be verified before the problems with the config store are resolved.
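
For readers unfamiliar with the deadlock described in the original report, here is a minimal Python sketch of the general pattern. It is not the actual condor-job-hooks code: the `run_cmd` signature and return value shown here are assumptions for illustration, and it targets a current Python interpreter rather than the one shipped on RHEL4. The deadlock-prone form waits for the child to exit before reading its pipes. The safer form uses `communicate()`, which drains stdout and stderr while the child is still running.

```python
import subprocess
import sys


def run_cmd(argv):
    """Hypothetical helper: run argv and return (returncode, stdout, stderr).

    Deadlock-prone variant (do NOT do this with large output):
        proc.wait()                 # parent blocks until the child exits
        out = proc.stdout.read()    # ...but the child is blocked writing
                                    # to a full pipe that nobody is reading
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # communicate() reads both pipes until EOF and then reaps the child,
    # so the child never stalls on a full pipe buffer.
    out, err = proc.communicate()
    return proc.returncode, out, err


if __name__ == "__main__":
    # A child that writes far more than a typical pipe buffer holds.
    child = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1000000)"]
    rc, out, err = run_cmd(child)
    print(rc, len(out))
```

The key design point is that reading and waiting must overlap. Any approach that drains the pipes concurrently with the child's execution avoids the hang.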
08/18 04:16:05 DEBUG: Received a NodeUpdatedNotice
08/18 04:16:05 DEBUG: The event is for this node
08/18 04:16:06 DEBUG: Checking version of condor configuration
08/18 04:16:06 INFO: Retrieving configuration version "1282119365333910" from the store
08/18 04:16:07 DEBUG: Retrieved configuration from the store
08/18 04:16:07 ERROR: Store error: 1, ERROR: near "-": syntax error
08/18 04:16:07 ERROR: Failed to retrive differences between versions "1282116947136550" and "1282119365333910". No update performed
08/18 04:16:07 DEBUG: Performing a checkin with the store
08/18 04:16:07 DEBUG: Checked in with the store

/var/log/wallaby/agent.log
E, [2010-08-18T04:16:07.038373 #30701] ERROR -- : Error calling whatChanged: near "-": syntax error
E, [2010-08-18T04:16:07.038496 #30701] ERROR -- : /usr/lib/ruby/site_ruby/1.8/sqlite3/errors.rb:62:in `check'
/usr/lib/ruby/site_ruby/1.8/sqlite3/statement.rb:39:in `initialize'
/usr/lib/ruby/site_ruby/1.8/sqlite3/database.rb:154:in `new'
/usr/lib/ruby/site_ruby/1.8/sqlite3/database.rb:154:in `prepare'
/usr/lib/ruby/site_ruby/1.8/sqlite3/database.rb:181:in `execute'
/usr/lib/ruby/site_ruby/1.8/mrg/grid/config/Configuration.rb:94:in `build'
/usr/lib/ruby/site_ruby/1.8/mrg/grid/config/Node.rb:236:in `whatChanged'
/usr/lib/ruby/site_ruby/1.8/spqr/app.rb:130:in `send'
/usr/lib/ruby/site_ruby/1.8/spqr/app.rb:130:in `method_call'
/usr/lib/ruby/site_ruby/1.8/qmf.rb:1460:in `do_agent_events'
/usr/lib/ruby/site_ruby/1.8/qmf.rb:1493:in `do_events'
/usr/lib/ruby/site_ruby/1.8/qmf.rb:1519:in `sess_event_recv'
/usr/lib/ruby/site_ruby/1.8/qmf.rb:308:in `run'
/usr/lib/ruby/site_ruby/1.8/qmf.rb:242:in `initialize'
/usr/lib/ruby/site_ruby/1.8/qmf.rb:241:in `new'
/usr/lib/ruby/site_ruby/1.8/qmf.rb:241:in `initialize'
/usr/lib/ruby/site_ruby/1.8/spqr/app.rb:211:in `new'
/usr/lib/ruby/site_ruby/1.8/spqr/app.rb:211:in `main'
/usr/bin/wallaby-agent:235

Tested on:
$CondorVersion: 7.4.4 Aug 23 2010 BuildID: RH-7.4.4-0.10.el4 PRE-RELEASE $
$CondorPlatform: I386-LINUX_RHEL4 $
$CondorVersion: 7.4.4 Aug 23 2010 BuildID: RH-7.4.4-0.10.el4 PRE-RELEASE $
$CondorPlatform: X86_64-LINUX_RHEL4 $
condor-wallaby-client-3.4-1.el4
08/25 08:01:56 DEBUG: Connected to broker "****:5672"
08/25 08:01:57 DEBUG: Retrieved node object from store
08/25 08:01:57 DEBUG: Checking version of condor configuration
08/25 08:01:57 DEBUG: The system is already running configuration version "1282737701199239"
08/25 08:01:57 DEBUG: Performing a checkin with the store
08/25 08:01:57 DEBUG: Checked in with the store
08/25 08:02:39 DEBUG: Received a NodeUpdatedNotice
08/25 08:02:39 DEBUG: The event is for this node
08/25 08:02:39 DEBUG: Checking version of condor configuration
08/25 08:02:40 INFO: Retrieving configuration version "1282737757546367" from the store
08/25 08:02:43 DEBUG: Retrieved configuration from the store
08/25 08:02:43 DEBUG: Daemons to restart: []
08/25 08:02:43 DEBUG: Daemons to reconfig: [u'schedd', u'collector', u'startd']
08/25 08:02:44 DEBUG: Not sending "condor_reconfig" to subsystem "schedd" since it is not currently running
08/25 08:02:44 DEBUG: Not sending "condor_reconfig" to subsystem "collector" since it is not currently running
08/25 08:02:44 DEBUG: Sending command "condor_reconfig" to subsystem "startd"
08/25 08:02:44 DEBUG: Sent command "condor_reconfig" to subsystem "startd"
08/25 08:02:44 DEBUG: Performing a checkin with the store
08/25 08:02:44 DEBUG: Checked in with the store
>>> VERIFIED