Bug 689573 - pulp-agent running with near 100% cpu
Summary: pulp-agent running with near 100% cpu
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Pulp
Classification: Retired
Component: user-experience
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: Sprint 22
Assignee: Jeff Ortel
QA Contact: Preethi Thomas
URL:
Whiteboard:
Depends On:
Blocks: verified-to-close
 
Reported: 2011-03-21 19:48 UTC by Preethi Thomas
Modified: 2013-09-09 16:30 UTC
CC List: 1 user

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2011-08-16 12:07:12 UTC
Embargoed:


Attachments
bkearney agent.log (266.80 KB, application/octet-stream)
2011-03-21 21:34 UTC, Jeff Ortel
no flags

Description Preethi Thomas 2011-03-21 19:48:43 UTC
Description of problem:

When pulp-agent is running, its python process consumes nearly 100% CPU.

[root@pulp-client ~]# rpm -qa |grep pulp
pulp-common-0.0.154-1.fc14.noarch
pulp-client-0.0.154-1.fc14.noarch


[root@pulp-client ~]# top

top - 15:46:32 up 6 days,  1:48,  2 users,  load average: 0.45, 0.76, 0.91
Tasks:  97 total,   1 running,  96 sleeping,   0 stopped,   0 zombie
Cpu(s): 16.4%us,  0.7%sy,  0.0%ni, 82.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1021240k total,   467172k used,   554068k free,   138892k buffers
Swap:  2031612k total,        0k used,  2031612k free,   215480k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 4263 root      20   0  685m  24m 6864 S 100.1  2.4   0:16.86 python     






[root@preethi ~]# rpm -qa |grep pulp
pulp-0.0.154-1.fc14.noarch
pulp-client-0.0.154-1.fc14.noarch
pulp-common-0.0.154-1.fc14.noarch



[root@preethi ~]# top

top - 15:56:41 up 35 days,  2:21,  3 users,  load average: 1.03, 1.13, 1.40
Tasks: 212 total,   1 running, 211 sleeping,   0 stopped,   0 zombie
Cpu(s): 25.3%us,  0.7%sy,  0.0%ni, 74.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   6118332k total,  5371072k used,   747260k free,   370288k buffers
Swap:  8224764k total,   109104k used,  8115660k free,  4320052k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 1105 root      20   0  687m  26m 7048 S 99.9  0.4  11:31.48 python

Comment 1 Preethi Thomas 2011-03-21 21:10:57 UTC
from the agent.log


2011-03-21 17:19:20,359 [ERROR][Actions] __call__() @ action.py:117 - Enqueue capacity threshold exceeded on queue "2f074ce1-199a-42b9-acd9-4940e969ad05:0.0". (JournalImpl.cpp:587)(501)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/gofer/agent/action.py", line 115, in __call__
    self.target()
  File "/usr/lib/gofer/plugins/pulp.py", line 75, in heartbeat
    p.send(topic, ttl=delay, agent=myid, next=delay)
  File "/usr/lib/python2.7/site-packages/gofer/messaging/producer.py", line 56, in send
    sender = self.session().sender(address)
  File "<string>", line 6, in sender
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 579, in sender
    sender._ewait(lambda: sender.linked)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 786, in _ewait
    result = self.session._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 553, in _ewait
    result = self.connection._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 196, in _ewait
    self.check_error()
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 189, in check_error
    raise self.error
ConnectionError: Enqueue capacity threshold exceeded on queue "2f074ce1-199a-42b9-acd9-4940e969ad05:0.0". (JournalImpl.cpp:587)(501)
2011-03-21 17:19:30,377 [ERROR][Actions] __call__() @ action.py:117 - Enqueue capacity threshold exceeded on queue "2f074ce1-199a-42b9-acd9-4940e969ad05:0.0". (JournalImpl.cpp:587)(501)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/gofer/agent/action.py", line 115, in __call__
    self.target()
  File "/usr/lib/gofer/plugins/pulp.py", line 75, in heartbeat
    p.send(topic, ttl=delay, agent=myid, next=delay)
  File "/usr/lib/python2.7/site-packages/gofer/messaging/producer.py", line 56, in send
    sender = self.session().sender(address)
  File "<string>", line 6, in sender
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 579, in sender
    sender._ewait(lambda: sender.linked)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 786, in _ewait
    result = self.session._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 553, in _ewait
    result = self.connection._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 196, in _ewait
    self.check_error()
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 189, in check_error
    raise self.error
ConnectionError: Enqueue capacity threshold exceeded on queue "2f074ce1-199a-42b9-acd9-4940e969ad05:0.0". (JournalImpl.cpp:587)(501)

Comment 2 Jeff Ortel 2011-03-21 21:34:38 UTC
Created attachment 486698 [details]
bkearney agent.log

bkearney's agent.log

Comment 3 Jeff Ortel 2011-03-24 22:06:55 UTC
After a bit of code profiling and collecting metrics (using timers), I found that the gofer Producer was leaking qpid senders. As a result, performance decreased and CPU usage increased with every message sent. Although I haven't reproduced the more commonly reported case of goferd using 100% of the CPU, the metrics collected before and after applying this patch suggest this was the likely culprit.

In this case, I believe a full filesystem caused errors when sending heartbeat messages, and the leaked senders greatly exacerbated the problem. That caused goferd to use 100% CPU.

Related to: 689573

Fixed in gofer 0.26.
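
For illustration only, a minimal sketch of the leak pattern described above, written against the plain qpid.messaging API rather than gofer's actual Producer code; the broker URL, address, and helper names are hypothetical. The leaky variant attaches a new sender to the session for every message and never detaches it, so senders accumulate with each heartbeat; the fixed variant closes the sender once the message is sent.

from qpid.messaging import Connection, Message

connection = Connection("localhost:5672")   # hypothetical broker URL
connection.open()
session = connection.session()

def send_leaky(address, content):
    # Anti-pattern: every call attaches another sender to the session
    # and never closes it, so senders pile up with each heartbeat.
    sender = session.sender(address)
    sender.send(Message(content=content))

def send_fixed(address, content):
    # Fix: detach the sender once the message is on the wire
    # (or create it once and reuse it for all subsequent sends).
    sender = session.sender(address)
    try:
        sender.send(Message(content=content))
    finally:
        sender.close()

# e.g. send_fixed("amq.topic/heartbeat", {"agent": "myid", "next": 10})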

Comment 4 Preethi Thomas 2011-03-25 15:10:51 UTC
Considering what I am seeing on my box with gofer 0.26, I am moving this back to ASSIGNED.

Comment 5 Jeff Ortel 2011-03-29 17:23:57 UTC
Seems to be related to qpid-cpp-server-store.  The 0.158 build removes this dependency.   Pulp users should:
 * uninstall qpid-cpp-server-store
 * restart qpidd & goferd

Comment 6 Jeff Ortel 2011-03-29 17:24:20 UTC
Build: 0.158

Comment 7 Preethi Thomas 2011-05-03 20:49:18 UTC
I haven't seen this issue since uninstalling qpid-cpp-server-store.

I am on 
[root@preethi unit]# rpm -q pulp
pulp-0.0.172-1.fc14.noarch

Moving to VERIFIED.

Comment 8 Preethi Thomas 2011-08-16 12:07:12 UTC
Closing with Community Release 15

pulp-0.0.223-4.

Comment 9 Preethi Thomas 2011-08-16 12:20:55 UTC
Closing with Community Release 15

pulp-0.0.223-4.

