Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 916594

Summary: Can't connect to mongodb due to too many connections to db.
Product: OpenShift Container Platform
Reporter: xjia <xjia>
Component: Node
Assignee: Miciah Dashiel Butler Masters <mmasters>
Status: CLOSED ERRATA
QA Contact: libra bugs <libra-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 1.1.1
CC: baulakh, bleanhar, bmoss, juwu, libra-onpremise-devel, mmasters, nsun, xtian
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Creating an application in OpenShift Enterprise created 10 or more connections to MongoDB, and further attempts to connect to MongoDB failed because too many connections were open. To resolve the issue, update the rubygem-mongo, rubygem-bson, and rubygem-bson_ext packages to the latest version (1.8.3 at the time of this fix).
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-04-02 08:50:39 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Auto create app script (Flags: none)

Description xjia 2013-02-28 12:31:13 UTC
Description of problem:
MongoDB and the broker service run on the same node.
Cannot connect to MongoDB: /proc/<mongod-pid>/fd contains 830 FDs. Running "mongo" on the broker prints:
[root@broker ~]# mongo
MongoDB shell version: 2.0.2
connecting to: test
Thu Feb 28 06:24:26 DBClientCursor::init call() failed
Thu Feb 28 06:24:26 Error: Error during mongo startup. :: caused by :: DBClientBase::findN: transport error: 127.0.0.1 query: { whatsmyuri: 1 } shell/mongo.js:84
exception: connect failed

Running "netstat -nal|grep 27017" on the broker shows many sockets in the CLOSE_WAIT or ESTABLISHED state.
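To see how the sockets split between those two states, the netstat output can be tallied instead of read by eye. This is a minimal sketch, assuming Linux net-tools `netstat` output with the TCP state in the sixth column; `mongo_conn_states` is a hypothetical helper name, not part of any OpenShift tooling.

```shell
# Tally TCP sockets that involve MongoDB's default port (27017), grouped by state.
# Reads netstat-style lines on stdin; assumes Linux net-tools column order:
#   Proto Recv-Q Send-Q Local-Address Foreign-Address State
mongo_conn_states() {
    awk '$1 ~ /^tcp/ && ($4 ~ /:27017$/ || $5 ~ /:27017$/) { count[$6]++ }
         END { for (state in count) print state, count[state] }' | sort
}

# Example on the broker (-t restricts the listing to TCP sockets):
#   netstat -nat | mongo_conn_states
```

Because the broker and mongod share a host here, each connection shows up twice (once from each end), so the totals count both sides of every connection.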

Version-Release number of selected component (if applicable):
http://buildvm-devops.usersys.redhat.com/puddle/build/OpenShiftEnterprise/1.1.z/2013-02-27.1/

How reproducible:
Not sure

Steps to Reproduce:
1. Create an app and check the number of FDs under /proc/<mongod-pid>/fd.
2. Delete the app and check the number of FDs under /proc/<mongod-pid>/fd.
3. Repeat steps 1 and 2 many times.
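The FD check in these steps can be scripted so the count is recorded after every iteration. This is a minimal sketch, assuming a Linux /proc filesystem; `count_fds` and `track_fds` are hypothetical helper names, and the command handed to `track_fds` stands in for whatever create/delete step the tester uses (it is not an rhc subcommand).

```shell
# Count the file descriptors a process holds open. Needs /proc, and root
# to inspect another user's process such as mongod.
count_fds() {
    ls "/proc/$1/fd" | wc -l
}

# Run a command RUNS times and print "<run> <fd count>" after each run.
# Usage: track_fds PID RUNS COMMAND [ARGS...]
# COMMAND is a placeholder for the tester's own create/delete step.
track_fds() {
    pid=$1
    runs=$2
    shift 2
    i=1
    while [ "$i" -le "$runs" ]; do
        "$@" >/dev/null 2>&1
        echo "$i $(count_fds "$pid")"
        i=$((i + 1))
    done
}
```

For example, `track_fds "$(pgrep -o mongod)" 20 ./create_delete_app.sh`, where `./create_delete_app.sh` is a hypothetical wrapper around one create and one delete. A count that keeps climbing across runs instead of returning to its baseline matches the behavior reported here.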

Actual results:
3. Cannot connect to MongoDB.

Expected results:
MongoDB still accepts connections.

Additional info:

Comment 2 nsun 2013-03-05 07:25:45 UTC
Created attachment 705314
Auto create app script

Auto-create app script. The default password in the script is "redhat".

Comment 3 nsun 2013-03-05 07:31:11 UTC
How reproducible:
1. Run "rhc setup" to create a domain and configure your environment (the auto-create script uses the password "redhat").
2. Run the auto-create script (attachment 705314). In a new environment on the USA OpenStack instance, it fails after about 40 applications have been created.

Error Info:
The server did not respond correctly. This may be an issue with the server configuration or with your connection to the server (such as a Web proxy or firewall).Please verify that you can access the OpenShift server https://broker.test.com/broker/rest/domains/sun000/applications

mongodb.log then contains:
Mon Mar  4 23:48:52 [initandlisten] connection refused because too many open connections: 819

Comment 4 xjia 2013-03-19 12:49:15 UTC
Running "rhc apps" opens several connections to the database, as shown in /var/log/mongodb/mongodb.log:

Tue Mar 19 08:01:45 [conn2525]  authenticate: { authenticate: 1, user: "openshift", nonce: "1c8dfc2407b7542c", key: "0a3cf213b181462fd0a396a898cb99a0" }
Tue Mar 19 08:01:45 [conn2525]  authenticate: { authenticate: 1, user: "openshift", nonce: "baacd71539701256", key: "f7f8eca1c2a566c68bb0063a717b1bb4" }
Tue Mar 19 08:01:45 [conn2527]  authenticate: { authenticate: 1, user: "openshift", nonce: "3bf85fae2a2e5a9c", key: "4f48fece5481ce22a8551e013752ef74" }
Tue Mar 19 08:01:45 [conn2527]  authenticate: { authenticate: 1, user: "openshift", nonce: "f985573975147a03", key: "ceb57a62e33c255e1c1bb62bffd37f41" }
Tue Mar 19 08:01:45 [conn2529]  authenticate: { authenticate: 1, user: "openshift", nonce: "705236a4d3158e99", key: "6c71fa0d50a696f8700d261c41e94b78" }
Tue Mar 19 08:01:45 [conn2529]  authenticate: { authenticate: 1, user: "openshift", nonce: "6a363ec7244d7fa5", key: "ccbdfe81c0a42b2931ff588221de3218" }
Tue Mar 19 08:01:45 [conn2531]  authenticate: { authenticate: 1, user: "openshift", nonce: "7f03918f16d2c6b3", key: "e3506c58fd77eb9fb12f99ba562c9499" }
Tue Mar 19 08:01:45 [conn2531]  authenticate: { authenticate: 1, user: "openshift", nonce: "1a4e3399ee2ea846", key: "e3804c7656686be8dd5ec49e587d1bb9" }
Tue Mar 19 08:01:45 [conn2533]  authenticate: { authenticate: 1, user: "openshift", nonce: "5a3a0ac88cb817b8", key: "56bbc0fcf3b1113368f30601e3fe9deb" }
Tue Mar 19 08:01:45 [conn2533]  authenticate: { authenticate: 1, user: "openshift", nonce: "30939cdef2bcc59e", key: "d68d4067801b5a0d2ae9c48771eba7d1" }

A single command creates 10 connections to the database.
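To see whether connections are being closed as well as opened, the same log can be used to estimate how many are still open. This is a minimal sketch, assuming mongod logs new and closed connections with the wording "connection accepted from" and "end connection" (as 2.x releases typically do; check your own log and adjust the patterns if not). `open_conns_from_log` is a hypothetical helper name.

```shell
# Estimate how many client connections mongod still has open, from its log:
# (lines announcing a new connection) minus (lines announcing a closed one).
# The two patterns are an assumption about the mongod 2.x log wording.
open_conns_from_log() {
    awk '/connection accepted from/ { opened++ }
         /end connection/           { closed++ }
         END { print opened - closed }' "$1"
}

# Example:
#   open_conns_from_log /var/log/mongodb/mongodb.log
```

The estimate is only meaningful over a log span that starts at a mongod restart; if the span starts mid-run, closures of earlier connections make the result too low.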

Comment 5 Miciah Dashiel Butler Masters 2013-03-19 16:46:19 UTC
I can reproduce the problem of many FDs' being left open indefinitely by mongod, as shown by `ls /proc/$(pgrep mongod)/fd`, using the referenced puddle.  Using the following packages from the puddle, I do see the problem:

rubygem-mongo-1.8.1-2.el6op
rubygem-bson-1.8.1-2.el6op
rubygem-bson_ext-1.8.1-4.el6op.x86_64.rpm

I tried downgrading these packages, forcing the install with rpm's --nodeps option. With the following packages, I do not see the problem (mongod does not leave FDs hanging around for more than a few seconds):

rubygem-mongo-1.5.2-6.el6op
rubygem-bson-1.5.2-2.el6op
rubygem-bson_ext-1.5.2-2.el6op

Bundler didn't like it when I mixed and matched different versions of rubygem-mongo, rubygem-bson, and rubygem-bson_ext; otherwise I would have tried to isolate the problem further.

Upgrading back to the packages in the puddle, I do see the problem again.

I'm going to continue debugging the problem by looking through the changes in rubygem-mongo.

Comment 6 Miciah Dashiel Butler Masters 2013-03-19 21:18:03 UTC
Newer versions of the mongo, bson, and bson_ext gems also appear to resolve the problem.  I accidentally installed the latest bson 1.8.3, bson_ext 1.8.1, and mongo 1.8.3 gems using rubygems, and now I don't see FDs hanging around when I do an app create and destroy.  Shall I try to find the particular version and code that cause the problem, or shall we simply update these packages?

Comment 7 nsun 2013-03-20 03:15:29 UTC
I compared the Online environment (devenv_2968) with our Enterprise environment.

On Online, creating an app does not make mongod spawn any new child processes.
On OSE, creating an app makes mongod spawn many child processes, each of which holds one FD.

When I strace the main mongod process, I see that on Online no new connections to MongoDB are made while an app is created, whereas on OSE mongod accepts several connections and clones a new child for each of them during app creation.

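One connection and the resulting clone() on the OSE environment (the clone() flags include CLONE_THREAD, so each "child" is a thread within mongod):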
accept(7, {sa_family=AF_INET, sin_port=htons(47976), sin_addr=inet_addr("127.0.0.1")}, [16]) = 36
setsockopt(36, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(36, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
getsockopt(36, SOL_TCP, TCP_KEEPIDLE, [140733193395232], [4]) = 0
setsockopt(36, SOL_TCP, TCP_KEEPIDLE, [300], 4) = 0
getsockopt(36, SOL_TCP, TCP_KEEPINTVL, [140733193388107], [4]) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0
clone(child_stack=0x7f04f36b6f10, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f04f36b79d0, tls=0x7f04f36b7700, child_tidptr=0x7f04f36b79d0) = 18441


Packages on the Online devenv_2968 environment:
  rubygem-mongo-1.8.1-1.el6_3.noarch
  rubygem-bson-1.8.1-1.el6_3.noarch
  rubygem-bson_ext-1.8.1-4.el6_3.x86_64

I hope this information is helpful.

Comment 8 Brenton Leanhardt 2013-03-20 11:54:03 UTC
Online does indeed have those RPMs installed; however, the ORM that is actually used is:

ruby193-rubygem-mongoid-3.0.21-1.el6_3.noarch

It wouldn't surprise me if mongoid is much more efficient than rubygem-mongo. In light of Comment #6, I suggest we upgrade to versions of rubygem-mongo and bson that work. We'll pick up mongoid and the model refactor in the next rebase.

Comment 9 Brenton Leanhardt 2013-03-20 13:37:25 UTC
To be clear, I'm going to build:

rubygem-bson-1.8.3
rubygem-bson_ext-1.8.3
rubygem-mongo-1.8.3

Technically, comment #6 mentioned rubygem-bson_ext-1.8.1, but I think the bson and bson_ext builds have to stay in sync.

Comment 10 Brenton Leanhardt 2013-03-20 18:16:23 UTC
rubygem-bson-1.8.3, rubygem-bson_ext-1.8.3 and rubygem-mongo-1.8.3 are available @
http://buildvm-devops.usersys.redhat.com/puddle/build/OpenShiftEnterprise/1.1.z/2013-03-20.2/

Comment 11 xjia 2013-03-21 09:56:51 UTC
Verify:
[root@broker fd]# rpm -qa | grep rubygem-bson
rubygem-bson_ext-1.8.3-1.el6op.x86_64
rubygem-bson-1.8.3-1.el6op.noarch
[root@broker fd]# rpm -qa | grep rubygem-mongo
rubygem-mongo-1.8.3-1.el6op.noarch

After an acceptance test, we no longer see the problem.
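The rpm check above can be turned into a pass/fail report. This is a minimal sketch, assuming GNU coreutils `sort -V` for version ordering; `version_ge` and `check_mongo_gems` are hypothetical helper names, and the 1.8.3 threshold comes from the builds listed in Comment 10.

```shell
# Succeed if version $1 is greater than or equal to version $2 (GNU sort -V order).
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether each gem package is at least the 1.8.3 build from Comment 10.
check_mongo_gems() {
    for pkg in rubygem-mongo rubygem-bson rubygem-bson_ext; do
        if ! ver=$(rpm -q --qf '%{VERSION}' "$pkg"); then
            echo "$pkg: not installed"
        elif version_ge "$ver" 1.8.3; then
            echo "$pkg $ver: ok"
        else
            echo "$pkg $ver: older than 1.8.3"
        fi
    done
}
```

`rpm -q` exits non-zero for a package that is not installed, which is what the first branch relies on. Comparing with `sort -V` rather than plain string comparison keeps a hypothetical 1.8.10 from sorting below 1.8.3.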

Version:
http://buildvm-devops.usersys.redhat.com/puddle/build/OpenShiftEnterprise/1.1.z/2013-03-20.2/

Comment 13 errata-xmlrpc 2013-04-02 08:50:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0694.html