Description of problem:
When pulp-agent is running, the python process runs at nearly 100% CPU.

[root@pulp-client ~]# rpm -qa | grep pulp
pulp-common-0.0.154-1.fc14.noarch
pulp-client-0.0.154-1.fc14.noarch

[root@pulp-client ~]# top
top - 15:46:32 up 6 days, 1:48, 2 users, load average: 0.45, 0.76, 0.91
Tasks:  97 total,   1 running,  96 sleeping,   0 stopped,   0 zombie
Cpu(s): 16.4%us,  0.7%sy,  0.0%ni, 82.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1021240k total,   467172k used,   554068k free,   138892k buffers
Swap:  2031612k total,        0k used,  2031612k free,   215480k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 4263 root      20   0  685m  24m 6864 S 100.1  2.4   0:16.86 python

[root@preethi ~]# rpm -qa | grep pulp
pulp-0.0.154-1.fc14.noarch
pulp-client-0.0.154-1.fc14.noarch
pulp-common-0.0.154-1.fc14.noarch

[root@preethi ~]# top
top - 15:56:41 up 35 days, 2:21, 3 users, load average: 1.03, 1.13, 1.40
Tasks: 212 total,   1 running, 211 sleeping,   0 stopped,   0 zombie
Cpu(s): 25.3%us,  0.7%sy,  0.0%ni, 74.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   6118332k total,  5371072k used,   747260k free,   370288k buffers
Swap:  8224764k total,   109104k used,  8115660k free,  4320052k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 1105 root      20   0  687m  26m 7048 S 99.9  0.4  11:31.48 python
From the agent.log:

2011-03-21 17:19:20,359 [ERROR][Actions] __call__() @ action.py:117 - Enqueue capacity threshold exceeded on queue "2f074ce1-199a-42b9-acd9-4940e969ad05:0.0". (JournalImpl.cpp:587)(501)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/gofer/agent/action.py", line 115, in __call__
    self.target()
  File "/usr/lib/gofer/plugins/pulp.py", line 75, in heartbeat
    p.send(topic, ttl=delay, agent=myid, next=delay)
  File "/usr/lib/python2.7/site-packages/gofer/messaging/producer.py", line 56, in send
    sender = self.session().sender(address)
  File "<string>", line 6, in sender
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 579, in sender
    sender._ewait(lambda: sender.linked)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 786, in _ewait
    result = self.session._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 553, in _ewait
    result = self.connection._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 196, in _ewait
    self.check_error()
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 189, in check_error
    raise self.error
ConnectionError: Enqueue capacity threshold exceeded on queue "2f074ce1-199a-42b9-acd9-4940e969ad05:0.0". (JournalImpl.cpp:587)(501)

2011-03-21 17:19:30,377 [ERROR][Actions] __call__() @ action.py:117 - Enqueue capacity threshold exceeded on queue "2f074ce1-199a-42b9-acd9-4940e969ad05:0.0". (JournalImpl.cpp:587)(501)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/gofer/agent/action.py", line 115, in __call__
    self.target()
  File "/usr/lib/gofer/plugins/pulp.py", line 75, in heartbeat
    p.send(topic, ttl=delay, agent=myid, next=delay)
  File "/usr/lib/python2.7/site-packages/gofer/messaging/producer.py", line 56, in send
    sender = self.session().sender(address)
  File "<string>", line 6, in sender
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 579, in sender
    sender._ewait(lambda: sender.linked)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 786, in _ewait
    result = self.session._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 553, in _ewait
    result = self.connection._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 196, in _ewait
    self.check_error()
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 189, in check_error
    raise self.error
ConnectionError: Enqueue capacity threshold exceeded on queue "2f074ce1-199a-42b9-acd9-4940e969ad05:0.0". (JournalImpl.cpp:587)(501)
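For context, the error repeats roughly every ten seconds because the heartbeat runs as a recurring gofer action: the agent re-invokes the target on each interval and logs the ConnectionError when the send fails. A minimal sketch of that loop, assuming the recurring action behaves like a simple timer (the names below are illustrative, not the actual gofer code):

    import logging
    import time

    log = logging.getLogger(__name__)

    def run_recurring_action(target, interval=10):
        # Hypothetical stand-in for gofer's recurring Action: call the target
        # on a fixed interval and log (but swallow) any exception, matching the
        # "[ERROR][Actions] __call__()" entries repeating every ~10 seconds above.
        while True:
            try:
                target()                    # e.g. the pulp plugin's heartbeat()
            except Exception:
                log.exception('action failed')
            time.sleep(interval)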
Created attachment 486698 [details]
bkearney agent.log

bkearney's agent.log
After a bit of code profiling and collecting metrics (using timers), I found that the gofer Producer was leaking qpid senders. As a result, performance decreased and CPU usage increased with every message sent. Although I haven't reproduced the more commonly reported case of goferd using 100% of the CPU, the metrics collected before and after applying this patch suggest the leak was a likely culprit. In this case, I believe a full filesystem caused errors sending heartbeat messages, which was greatly exacerbated by the leaked senders and caused goferd to use 100% CPU.

Related to: bug 689573

Fixed in gofer 0.26.
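The leaky pattern and one way to fix it can be sketched as follows. This is only an illustration of the problem described above, not the actual gofer 0.26 patch, and the class names are made up:

    from qpid.messaging import Connection, Message

    class LeakyProducer(object):
        """Every send() links a new qpid sender and never closes it, so
        senders pile up on the session and each send gets slower."""
        def __init__(self, url):
            self.connection = Connection(url)
            self.connection.open()
            self.session = self.connection.session()

        def send(self, address, **body):
            sender = self.session.sender(address)   # new sender per message
            sender.send(Message(content=body))      # sender never closed -> leak

    class FixedProducer(LeakyProducer):
        """One way to avoid the leak: close the sender after each send
        (caching a single sender per address would also work)."""
        def send(self, address, **body):
            sender = self.session.sender(address)
            try:
                sender.send(Message(content=body))
            finally:
                sender.close()                      # release the sender's resources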
Considering what I am seeing on my box with gofer 0.26, I am moving this back to ASSIGNED.
Seems to be related to qpid-cpp-server-store. The 0.158 build removes this dependency.

Pulp users should:
* uninstall qpid-cpp-server-store
* restart qpidd & goferd
Build: 0.158
I haven't seen this issue since uninstalling qpid-cpp-server-store. I am on:

[root@preethi unit]# rpm -q pulp
pulp-0.0.172-1.fc14.noarch

Moving to VERIFIED.
Closing with Community Release 15 pulp-0.0.223-4.