Bug 663443 - qpidd locks up due to "too many open socket"
Summary: qpidd locks up due to "too many open socket"
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Pulp
Classification: Retired
Component: z_other
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
: Sprint 21
Assignee: Jeff Ortel
QA Contact: Preethi Thomas
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-12-15 20:02 UTC by dgao
Modified: 2011-07-15 19:36 UTC
CC List: 2 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2011-03-10 15:41:31 UTC
Embargoed:


Attachments:

Description dgao 2010-12-15 20:02:23 UTC
qpidd occasionally goes into a runaway state and opens a large number of file descriptors on
/var/lib/qpidd/rhm/jrnl/000X/*.jdat

At this point, any command issued to the pulp CLI fails with:

201012151353:21.563 - FINE: Stderr: error: operation failed: "Traceback (most recent call last):\n  File \"/usr/lib/python2.6/site-packages/pulp/server/webservices/controllers/base.py\", line 53, in report_error\n    return method(self, *args, **kwargs)\n  File \"/usr/lib/python2.6/site-packages/pulp/server/webservices/role_check.py\", line 130, in check_roles\n    result = f(instance, *fargs, **kw)\n  File \"/usr/lib/python2.6/site-packages/pulp/server/webservices/controllers/repositories.py\", line 103, in PUT\n    gpgkeys=repo_data.get('gpgkeys', None),)\n  File \"/usr/lib/python2.6/site-packages/pulp/server/event/dispatcher.py\", line 84, in call\n    method(inst, *args, **kwargs)\n  File \"/usr/lib/python2.6/site-packages/pulp/server/event/handler/repo.py\", line 54, in create\n    self.producer.send('repo.created', event)\n  File \"/usr/lib/python2.6/site-packages/pulp/server/event/producer.py\", line 39, in send\n    Producer.send(self, destination, event=event)\n  File \"/usr/lib/python2.6/site-packages/gofer/messaging/producer.py\", line 56, in send\n    sender = self.session().sender(address)\n  File \"<string>\", line 6, in sender\n  File \"/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py\", line 576, in sender\n    sender._ewait(lambda: sender.linked)\n  File \"/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py\", line 783, in _ewait\n    result = self.session._ewait(lambda: self.error or predicate(), timeout)\n  File \"/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py\", line 550, in _ewait\n    result = self.connection._ewait(lambda: self.error or predicate(), timeout)\n  File \"/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py\", line 194, in _ewait\n    self.check_error()\n  File \"/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py\", line 187, in check_error\n    raise self.error\nConnectionError: Queue d68fd6e7-cfa1-498e-a787-85bce961fb05:0.0: create() failed: jexception 0x0400 fcntl::clean_file() threw JERR_FCNTL_OPENWR: Unable to open file for write. (open() failed: errno=24 (Too many open files)) (MessageStoreImpl.cpp:498)(501)\n"


[root@pulp-nightly jrnl]# lsof | grep "qpidd" | wc -l
1069
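
For reference, the failure path in the traceback runs through gofer's Producer.send(), which builds a new sender via self.session().sender(address) for each event before the broker fails with errno=24. Below is a minimal, purely hypothetical sketch of that qpid.messaging usage pattern (not pulp's or gofer's actual code; the broker URL, address, and create policy are placeholders): if senders and sessions opened this way are never closed, each one can keep broker-side resources and their descriptors alive.

# Hypothetical sketch of the qpid.messaging pattern seen in the traceback.
from qpid.messaging import Connection, Message

conn = Connection("localhost:5672")   # placeholder broker URL
conn.open()
try:
    ssn = conn.session()
    # Each call like this creates a new sender; with a create policy the
    # broker may also create a queue (durable queues are backed by journal
    # files under /var/lib/qpidd/rhm/jrnl/).
    snd = ssn.sender("repo.created; {create: always}")
    snd.send(Message(content={"event": "repo.created"}))
    # Closing the sender and session releases those broker-side resources;
    # skipping this on every send lets descriptors pile up on the broker.
    snd.close()
    ssn.close()
finally:
    conn.close()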

It has been observed on a functioning system that, as pulp continues to be used, the number of open file descriptors keeps growing and never decreases.

For example, on pulp-qe, which is currently working:

[root@pulp-qe qpidd]# lsof | grep "qpidd"|wc -l
630

After one successful run of the automated tests:

[root@pulp-qe qpidd]# lsof | grep "qpidd"|wc -l
639

Given this, it's only a matter of time before this error is encountered on this system as well.
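
A quick way to watch for this growth (roughly the per-process equivalent of the lsof checks above) is to count qpidd's entries under /proc/<pid>/fd. A minimal sketch, assuming a Linux host, a single qpidd process, and pgrep being installed:

# Rough sketch for watching qpidd descriptor growth over time.
import os
import subprocess
import time

def qpidd_pid():
    # pgrep -x matches the exact process name; take the first pid reported.
    out = subprocess.Popen(["pgrep", "-x", "qpidd"],
                           stdout=subprocess.PIPE).communicate()[0]
    return int(out.split()[0])

def open_fd_count(pid):
    # Every open file, socket, and journal descriptor appears under /proc/<pid>/fd.
    return len(os.listdir("/proc/%d/fd" % pid))

if __name__ == "__main__":
    pid = qpidd_pid()
    while True:
        print("qpidd (pid %d) open fds: %d" % (pid, open_fd_count(pid)))
        time.sleep(60)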

Comment 1 Jeff Ortel 2011-03-04 14:18:53 UTC
This looks like a bug in the qpid message store.  Can you verify that you're running qpid-cpp-server 0.7?  If so, I'll get the qpid team to take a look.

Comment 2 dgao 2011-03-04 15:24:16 UTC
The version of qpid-cpp-server that's installed is actually 0.6.895736.

But qpid's open socket count no longer increases when I run the automation tests. This bug might have been fixed by accident.

Comment 3 Jeff Ortel 2011-03-10 15:41:31 UTC
Looks like this was fixed in qpid 0.6 - closing.

