| Summary: | Investigate why aviary_query_server memory footprint is double that of condor_job_server |
|---|---|
| Product: | Red Hat Enterprise MRG |
| Component: | condor-aviary |
| Version: | 2.1 |
| Status: | CLOSED NOTABUG |
| Severity: | medium |
| Priority: | unspecified |
| Target Milestone: | 2.1.1 |
| Hardware: | Unspecified |
| OS: | Linux |
| Reporter: | Pete MacKinnon <pmackinn> |
| Assignee: | Pete MacKinnon <pmackinn> |
| QA Contact: | MRG Quality Engineering <mrgqe-bugs> |
| CC: | matt |
| Doc Type: | Bug Fix |
| Last Closed: | 2011-11-30 20:39:24 UTC |
Description
Pete MacKinnon
2011-11-07 16:57:37 UTC
I see this after a restart:

    16880 condor    15   0  315m 108m 6188 S  0.0  0.2   0:21.59 condor_job_serv
    16669 condor    18   0  167m  91m 3372 S  0.0  0.2   0:05.10 aviary_query_se

We cache summaries in the aviary query server and currently don't GC them regularly (see the illustrative sketch below). However, in my testing the biggest RES growth is associated with calling getJobDetails on live jobs from the queue.

500 "sleep 20" jobs in the queue. After invoking getJobSummary on all jobs:

    22054 pmackinn  20   0 67460  15m 7936 S  0.0  0.8   0:01.52 condor_job_serv
    22052 pmackinn  20   0 23196  14m 5304 S  0.0  0.7   0:01.45 aviary_query_se

After invoking getJobDetails on all jobs:

    21183 pmackinn  20   0 67320  15m 7940 S  0.0  0.8   0:01.50 condor_job_serv
    21182 pmackinn  20   0 31708  23m 5372 S  0.0  1.2   0:06.76 aviary_query_se

These peak RES levels appear to stay constant with repeated get* invocations on the same job set.

valgrind from getJobDetails on 500 jobs (500 submissions):

    ==32243== HEAP SUMMARY:
    ==32243==     in use at exit: 742,648 bytes in 16,861 blocks
    ==32243==   total heap usage: 3,869,478 allocs, 3,852,617 frees, 5,806,216,557 bytes allocated
    ==32243==
    ==32243== LEAK SUMMARY:
    ==32243==    definitely lost: 236 bytes in 7 blocks
    ==32243==    indirectly lost: 72 bytes in 3 blocks
    ==32243==      possibly lost: 123,201 bytes in 3,611 blocks
    ==32243==    still reachable: 619,139 bytes in 13,240 blocks
    ==32243==         suppressed: 0 bytes in 0 blocks
    ==32243== Rerun with --leak-check=full to see details of leaked memory
    ==32243==
    ==32243== For counts of detected and suppressed errors, rerun with: -v
    ==32243== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 82 from 10)

valgrind from getJobSummary on 500 jobs (500 submissions):

    ==32320== HEAP SUMMARY:
    ==32320==     in use at exit: 742,379 bytes in 16,860 blocks
    ==32320==   total heap usage: 1,340,780 allocs, 1,323,920 frees, 92,293,704 bytes allocated
    ==32320==
    ==32320== LEAK SUMMARY:
    ==32320==    definitely lost: 236 bytes in 7 blocks
    ==32320==    indirectly lost: 72 bytes in 3 blocks
    ==32320==      possibly lost: 122,870 bytes in 3,609 blocks
    ==32320==    still reachable: 619,201 bytes in 13,241 blocks
    ==32320==         suppressed: 0 bytes in 0 blocks
    ==32320== Rerun with --leak-check=full to see details of leaked memory
    ==32320==
    ==32320== For counts of detected and suppressed errors, rerun with: -v
    ==32320== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 81 from 10)

valgrind from getJobStatus on 500 jobs (500 submissions):

    ==32373== HEAP SUMMARY:
    ==32373==     in use at exit: 742,379 bytes in 16,860 blocks
    ==32373==   total heap usage: 1,295,591 allocs, 1,278,731 frees, 81,635,751 bytes allocated
    ==32373==
    ==32373== LEAK SUMMARY:
    ==32373==    definitely lost: 236 bytes in 7 blocks
    ==32373==    indirectly lost: 72 bytes in 3 blocks
    ==32373==      possibly lost: 122,870 bytes in 3,609 blocks
    ==32373==    still reachable: 619,201 bytes in 13,241 blocks
    ==32373==         suppressed: 0 bytes in 0 blocks
    ==32373== Rerun with --leak-check=full to see details of leaked memory
    ==32373==
    ==32373== For counts of detected and suppressed errors, rerun with: -v
    ==32373== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 81 from 10)

valgrind from getJobDetails on 500 jobs (1 submission):

    ==1724== HEAP SUMMARY:
    ==1724==     in use at exit: 645,732 bytes in 17,690 blocks
    ==1724==   total heap usage: 4,625,485 allocs, 4,607,795 frees, 6,159,873,460 bytes allocated
    ==1724==
    ==1724== LEAK SUMMARY:
    ==1724==    definitely lost: 16,718 bytes in 2,243 blocks
    ==1724==    indirectly lost: 24,122 bytes in 1,591 blocks
    ==1724==      possibly lost: 87,806 bytes in 2,614 blocks
    ==1724==    still reachable: 517,086 bytes in 11,242 blocks
    ==1724==         suppressed: 0 bytes in 0 blocks
    ==1724== Rerun with --leak-check=full to see details of leaked memory
    ==1724==
    ==1724== For counts of detected and suppressed errors, rerun with: -v
    ==1724== ERROR SUMMARY: 8455 errors from 14 contexts (suppressed: 81 from 10)

Created attachment 538791 [details]
massif output file for getJobDetails from 500 jobs
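As an illustration of the cache behavior mentioned above, here is a minimal C++ sketch. The `JobSummary` and `SummaryCache` names are hypothetical stand-ins, not the actual condor-aviary types; the point is only that entries are inserted as jobs are observed and never evicted, so the cache's footprint tracks every job the server has seen rather than the live queue.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-ins for the aviary summary types -- for
// illustration only, not the actual condor-aviary source.
struct JobSummary {
    std::string id;          // "cluster.proc"
    std::string cmd;
    std::string job_status;
    long        last_update;
};

class SummaryCache {
public:
    // Insert or refresh a summary. Nothing is ever evicted, so the
    // cache (and the process RES) grows with every job ever seen.
    void update(const JobSummary& s) { cache_[s.id] = s; }

    const JobSummary* find(const std::string& id) const {
        std::map<std::string, JobSummary>::const_iterator it = cache_.find(id);
        return it == cache_.end() ? NULL : &it->second;
    }

    std::size_t size() const { return cache_.size(); }

private:
    std::map<std::string, JobSummary> cache_;   // no GC / eviction pass
};
```

A periodic eviction pass over such a cache would cap that particular growth, but as noted above the larger RES contribution in these tests came from getJobDetails responses, not from the summary cache itself.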
The Axis2/C engine retains a stream buffer for responses. It doesn't reallocate based on the size of each new response; it just checks whether the existing buffer will fit the new data to be written. If not, a new buffer is allocated and the old one freed. Thus, over time, the buffer remains at the high water mark of the largest response size seen so far. In the case of the aviary_query_server, and depending on the operations invoked, the RES of the process will grow because that heap is still in use. For example, a request for the status of 1 job after a request for details on 500 jobs will not grow the RES of the process (other non-RPC memory allocations notwithstanding). This is not a memory leak. Closing and adding a Release Note for 2.2.
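The buffer-reuse behavior described above can be illustrated with a short sketch. The names below are made up (this is not the Axis2/C source); it only demonstrates the grow-only, high-water-mark pattern: the response buffer is reallocated only when an outgoing response exceeds the current capacity, and the capacity never shrinks afterwards.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Illustrative grow-only response buffer. This shows the reuse
// pattern described above, not the Axis2/C implementation.
class ResponseBuffer {
public:
    ResponseBuffer() : buf_(0), capacity_(0), size_(0) {}
    ~ResponseBuffer() { std::free(buf_); }

    void write(const char* data, std::size_t len) {
        if (len > capacity_) {
            // Response is larger than anything seen before: allocate a
            // bigger buffer and free the old one...
            char* bigger = static_cast<char*>(std::malloc(len));
            if (!bigger) return;       // allocation failure: drop the write in this sketch
            std::free(buf_);
            buf_ = bigger;
            capacity_ = len;           // ...but capacity never shrinks again.
        }
        std::memcpy(buf_, data, len);  // smaller responses reuse the large buffer
        size_ = len;
    }

    std::size_t capacity() const { return capacity_; }  // high water mark

private:
    char*       buf_;
    std::size_t capacity_;
    std::size_t size_;
};
```

Under this pattern, a single large getJobDetails response pins the buffer at that response's size for the life of the process, which shows up as elevated RES even though valgrind reports the memory as still reachable rather than leaked.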