Bug 663443

Summary: qpidd locks up due to "too many open socket"
Product: [Retired] Pulp Reporter: dgao
Component: z_otherAssignee: Jeff Ortel <jortel>
Status: CLOSED WORKSFORME QA Contact: Preethi Thomas <pthomas>
Severity: medium Docs Contact:
Priority: low    
Version: unspecifiedCC: jortel, skarmark
Target Milestone: ---Keywords: Triaged
Target Release: Sprint 21   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-03-10 15:41:31 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description dgao 2010-12-15 20:02:23 UTC
qpidd occasionally goes into rage mode and opens up a lot of file socket to 
/var/lib/qpidd/rhm/jrnl/000X/*.jdat

At this point, any command issue to pulp cli will cause a 

201012151353:21.563 - FINE: Stderr: error: operation failed: "Traceback (most recent call last):\n  File \"/usr/lib/python2.6/site-packages/pulp/server/webservices/controllers/base.py\", line 53, in report_error\n    return method(self, *args, **kwargs)\n  File \"/usr/lib/python2.6/site-packages/pulp/server/webservices/role_check.py\", line 130, in check_roles\n    result = f(instance, *fargs, **kw)\n  File \"/usr/lib/python2.6/site-packages/pulp/server/webservices/controllers/repositories.py\", line 103, in PUT\n    gpgkeys=repo_data.get('gpgkeys', None),)\n  File \"/usr/lib/python2.6/site-packages/pulp/server/event/dispatcher.py\", line 84, in call\n    method(inst, *args, **kwargs)\n  File \"/usr/lib/python2.6/site-packages/pulp/server/event/handler/repo.py\", line 54, in create\n    self.producer.send('repo.created', event)\n  File \"/usr/lib/python2.6/site-packages/pulp/server/event/producer.py\", line 39, in send\n    Producer.send(self, destination, event=event)\n  File \"/usr/lib/python2.6/site-packages/gofer/messaging/producer.py\", line 56, in send\n    sender = self.session().sender(address)\n  File \"<string>\", line 6, in sender\n  File \"/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py\", line 576, in sender\n    sender._ewait(lambda: sender.linked)\n  File \"/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py\", line 783, in _ewait\n    result = self.session._ewait(lambda: self.error or predicate(), timeout)\n  File \"/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py\", line 550, in _ewait\n    result = self.connection._ewait(lambda: self.error or predicate(), timeout)\n  File \"/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py\", line 194, in _ewait\n    self.check_error()\n  File \"/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py\", line 187, in check_error\n    raise self.error\nConnectionError: Queue d68fd6e7-cfa1-498e-a787-85bce961fb05:0.0: create() failed: jexception 0x0400 fcntl::clean_file() threw JERR_FCNTL_OPENWR: Unable to open file for write. (open() failed: errno=24 (Too many open files)) (MessageStoreImpl.cpp:498)(501)\n"


[root@pulp-nightly jrnl]# lsof | grep "qpidd" | wc -l
1069

It has been observed on functioning system that if pulp is continued to be used, the number of open sockets will continue to grow without ever decreasing.  

For example, pulp-qe is working

[root@pulp-qe qpidd]# lsof | grep "qpidd"|wc -l
630

after one successful run of automated test,

[root@pulp-qe qpidd]# lsof | grep "qpidd"|wc -l
639

Given this, it's only a matter of time before this error will be encountered on this system as well.

Comment 1 Jeff Ortel 2011-03-04 14:18:53 UTC
This looks like a bug in the qpid message store.  Can you verify that you're running qpid-cpp-server 0.7?  If so, I'll get the qpid team to take a look.

Comment 2 dgao 2011-03-04 15:24:16 UTC
The version of qpid-cpp-server that's installed is actually 0.6.895736.

But qpid's open socket count no longer increases when I run automation test. This bug might of been fixed by accident.

Comment 3 Jeff Ortel 2011-03-10 15:41:31 UTC
Looks like this was fixed in qpid 0.6 - closing.