Bug 585212 - Recent updates to the collector caused a memory leak.
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: condor
Version: Development
Hardware: All Linux
Priority: high
Severity: high
Target Milestone: 1.3
Assignee: Timothy St. Clair
QA Contact: Luigi Toscano
URL:
Whiteboard:
Keywords:
Depends On:
Blocks:
 
Reported: 2010-04-23 12:50 UTC by Timothy St. Clair
Modified: 2018-10-27 16:06 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-10-20 11:30:07 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---



Description Timothy St. Clair 2010-04-23 12:50:13 UTC
Description of problem:
The collector leaks memory when handling invalidate ads.

Version-Release number of selected component (if applicable):
7.4.3-0.10

How reproducible:
100%

Steps to Reproduce:
1. Have a large pool with dynamic slots
2. Run a large number of jobs
3. Watch the memory usage of the collector
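One way to carry out step 3 is to sample the collector's resident set size periodically; a leak then shows up as a steadily growing number. This is a hypothetical helper, not part of the bug report; the daemon name `condor_collector` is the usual one but may differ on your install.

```shell
# Hypothetical monitoring helper: print the RSS (in KB) of a process.
# A leaking collector shows a steadily growing value between samples.
sample_rss() {
    # $1 = pid; ps reports resident set size in kilobytes on Linux
    ps -o rss= -p "$1" | tr -d ' '
}

# Demonstrated here against the current shell; against the collector you
# would use something like:
#   pid=$(pgrep -o condor_collector)
#   while sleep 60; do date +%T; sample_rss "$pid"; done
sample_rss $$
```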
  
Actual results:
Steady increase in memory usage.

Expected results:
Memory usage should stay constant.

Additional info:

Comment 1 Timothy St. Clair 2010-04-23 12:50:56 UTC
Classad deletion.

Fixed in 7.4.3-0.11

Comment 4 Luigi Toscano 2010-07-23 15:51:22 UTC
How to quickly reproduce:
- configure a cluster of at least two condor instances (1 Central Manager, >=1 Execute nodes)
- enable Dynamic Slots on both (one big slot for each machine)
- increase the number of "generated" slots with NUM_CPUS (at least 32)
- submit a huge number of simple jobs (for example, a job description file that queues 15000 instances of "uname -a", with each file submitted every 30 minutes)
- watch memory (RSS) used by collector on CM
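The submission step above might use a submit description file along these lines. This is a sketch only: the file name, log name, and queue count are illustrative (the count follows the 15000-job figure mentioned above), and dynamic slots are assumed to be enabled on the execute nodes (e.g. via a partitionable SLOT_TYPE_1 and an inflated NUM_CPUS, as described in the steps).

```
# leaktest.jdf - sketch of a submit description file queuing 15000 trivial jobs
universe   = vanilla
executable = /bin/uname
arguments  = -a
log        = leaktest.log
queue 15000
```

Resubmitting this file with condor_submit every 30 minutes keeps the pool busy and the collector receiving a continuous stream of ads.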

With a simple cluster of two machines, the RSS memory used by the collector in 7.4.3-0.10 increases quickly (within one or two hours), while it stays constant with condor-7.4.4-0.4 after one week of uninterrupted job processing.

Verified on RHEL5.5, i386/x86_64.

