Bug 916594
| Summary: | Can't connect to mongodb due to too many connections to db. | ||||||
|---|---|---|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | xjia <xjia> | ||||
| Component: | Node | Assignee: | Miciah Dashiel Butler Masters <mmasters> | ||||
| Status: | CLOSED ERRATA | QA Contact: | libra bugs <libra-bugs> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 1.1.1 | CC: | baulakh, bleanhar, bmoss, juwu, libra-onpremise-devel, mmasters, nsun, xtian | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Creating an application in OpenShift Enterprise created 10 or more connections to MongoDB. Further attempts to connect to MongoDB failed due to too many open connections. This bug fix advises users to update rubygem-mongo and rubygem-bson to the latest version to resolve the issue. |
| Story Points: | --- | ||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2013-04-02 08:50:39 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: | ||||||
Created attachment 705314
Auto create app script.
The default password in the script is "redhat".
How reproducible:
1. Run "rhc setup" and create a domain to configure your environment (the password is "redhat" in the auto create script).
2. Run the auto create script (attachment 705314). In a new environment on the USA OpenStack, an error occurs once about 40 applications have been created.
Error info:
The server did not respond correctly. This may be an issue with the server configuration or with your connection to the server (such as a Web proxy or firewall). Please verify that you can access the OpenShift server https://broker.test.com/broker/rest/domains/sun000/applications
The following entry then appears in mongodb.log:
Mon Mar 4 23:48:52 [initandlisten] connection refused because too many open connections: 819
Executing "rhc apps" creates several connections to the db, as shown in the log "/var/log/mongodb/mongodb.log":
Tue Mar 19 08:01:45 [conn2525] authenticate: { authenticate: 1, user: "openshift", nonce: "1c8dfc2407b7542c", key: "0a3cf213b181462fd0a396a898cb99a0" }
Tue Mar 19 08:01:45 [conn2525] authenticate: { authenticate: 1, user: "openshift", nonce: "baacd71539701256", key: "f7f8eca1c2a566c68bb0063a717b1bb4" }
Tue Mar 19 08:01:45 [conn2527] authenticate: { authenticate: 1, user: "openshift", nonce: "3bf85fae2a2e5a9c", key: "4f48fece5481ce22a8551e013752ef74" }
Tue Mar 19 08:01:45 [conn2527] authenticate: { authenticate: 1, user: "openshift", nonce: "f985573975147a03", key: "ceb57a62e33c255e1c1bb62bffd37f41" }
Tue Mar 19 08:01:45 [conn2529] authenticate: { authenticate: 1, user: "openshift", nonce: "705236a4d3158e99", key: "6c71fa0d50a696f8700d261c41e94b78" }
Tue Mar 19 08:01:45 [conn2529] authenticate: { authenticate: 1, user: "openshift", nonce: "6a363ec7244d7fa5", key: "ccbdfe81c0a42b2931ff588221de3218" }
Tue Mar 19 08:01:45 [conn2531] authenticate: { authenticate: 1, user: "openshift", nonce: "7f03918f16d2c6b3", key: "e3506c58fd77eb9fb12f99ba562c9499" }
Tue Mar 19 08:01:45 [conn2531] authenticate: { authenticate: 1, user: "openshift", nonce: "1a4e3399ee2ea846", key: "e3804c7656686be8dd5ec49e587d1bb9" }
Tue Mar 19 08:01:45 [conn2533] authenticate: { authenticate: 1, user: "openshift", nonce: "5a3a0ac88cb817b8", key: "56bbc0fcf3b1113368f30601e3fe9deb" }
Tue Mar 19 08:01:45 [conn2533] authenticate: { authenticate: 1, user: "openshift", nonce: "30939cdef2bcc59e", key: "d68d4067801b5a0d2ae9c48771eba7d1" }
One command will create 10 connections to db.
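In the excerpt above, the ten authenticate lines carry five distinct connection ids (conn2525 through conn2533), two per id. A minimal sketch for tallying this from a log is shown below. It assumes the mongod 2.0-style line format seen in this report; the function name `count_authentications` is made up for illustration and is not part of any OpenShift or MongoDB tool.

```python
import re
from collections import Counter

# Matches log lines of the form seen in this report, e.g.
#   Tue Mar 19 08:01:45 [conn2525] authenticate: { ... }
AUTH_LINE = re.compile(r"\[conn(\d+)\] authenticate:")


def count_authentications(log_lines):
    """Return a Counter mapping connection id -> number of authenticate entries."""
    counts = Counter()
    for line in log_lines:
        match = AUTH_LINE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Four lines copied from the log excerpt in this report.
sample = [
    'Tue Mar 19 08:01:45 [conn2525] authenticate: { authenticate: 1, user: "openshift", nonce: "1c8dfc2407b7542c", key: "0a3cf213b181462fd0a396a898cb99a0" }',
    'Tue Mar 19 08:01:45 [conn2525] authenticate: { authenticate: 1, user: "openshift", nonce: "baacd71539701256", key: "f7f8eca1c2a566c68bb0063a717b1bb4" }',
    'Tue Mar 19 08:01:45 [conn2527] authenticate: { authenticate: 1, user: "openshift", nonce: "3bf85fae2a2e5a9c", key: "4f48fece5481ce22a8551e013752ef74" }',
    'Tue Mar 19 08:01:45 [conn2527] authenticate: { authenticate: 1, user: "openshift", nonce: "f985573975147a03", key: "ceb57a62e33c255e1c1bb62bffd37f41" }',
]

counts = count_authentications(sample)
print("distinct connections:", len(counts))
print("authenticate entries:", sum(counts.values()))
```

To run it against a real log, pass an open file object for /var/log/mongodb/mongodb.log instead of `sample`.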
I can reproduce the problem of many FDs being left open indefinitely by mongod, as shown by `ls /proc/$(pgrep mongod)/fd`, using the referenced puddle.
Using the following packages from the puddle, I do see the problem:
rubygem-mongo-1.8.1-2.el6op
rubygem-bson-1.8.1-2.el6op
rubygem-bson_ext-1.8.1-4.el6op.x86_64.rpm
I tried downgrading these packages, forcing them with rpm's --nodeps option. Using the following packages, I do not see the problem; mongod does not leave FDs hanging around for more than a few seconds:
rubygem-mongo-1.5.2-6.el6op
rubygem-bson-1.5.2-2.el6op
rubygem-bson_ext-1.5.2-2.el6op
Bundler didn't like it when I mixed and matched different versions of rubygem-mongo, rubygem-bson, and rubygem-bson_ext; otherwise I would have tried to isolate the problem further.
Upgrading back to the packages in the puddle, I do see the problem again. I'm going to continue debugging the problem by looking through the changes in rubygem-mongo.
Newer versions of the mongo, bson, and bson_ext gems also appear to resolve the problem. I accidentally installed the latest bson 1.8.3, bson_ext 1.8.1, and mongo 1.8.3 gems using rubygems, and now I don't see FDs hanging around when I do an app create and destroy. Shall I try to find the particular version and code that cause the problem, or shall we simply update these packages?
I compared the online environment (devenv_2968) with our Enterprise environment.
Online does not create any mongod child process after an app is created.
OSE creates many mongod child processes, and each process takes one FD.
When I strace the main mongod process, I see that the online environment does not create any connection to mongodb while creating an app, whereas the OSE environment connects several times and clones a new child process each time during app creation.
One connection and child creation in the OSE environment:
accept(7, {sa_family=AF_INET, sin_port=htons(47976), sin_addr=inet_addr("127.0.0.1")}, [16]) = 36
setsockopt(36, SOL_TCP, TCP_NODELAY, [1], 4) = 0
setsockopt(36, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
getsockopt(36, SOL_TCP, TCP_KEEPIDLE, [140733193395232], [4]) = 0
setsockopt(36, SOL_TCP, TCP_KEEPIDLE, [300], 4) = 0
getsockopt(36, SOL_TCP, TCP_KEEPINTVL, [140733193388107], [4]) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=10240*1024, rlim_max=RLIM_INFINITY}) = 0
clone(child_stack=0x7f04f36b6f10, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f04f36b79d0, tls=0x7f04f36b7700, child_tidptr=0x7f04f36b79d0) = 18441
Packages in the online devenv_2968 environment:
rubygem-mongo-1.8.1-1.el6_3.noarch
rubygem-bson-1.8.1-1.el6_3.noarch
rubygem-bson_ext-1.8.1-4.el6_3.x86_64
Hope this info is helpful for you.
Online does indeed have those RPMs installed; however, the ORM that is actually used is:
ruby193-rubygem-mongoid-3.0.21-1.el6_3.noarch
It wouldn't surprise me if mongoid is much more efficient than ruby-mongo.
In light of Comment #6, I suggest we upgrade to versions of rubygem-mongo and bson that work. We'll pick up mongoid and the model refactor in the next rebase.
To be clear, I'm going to build:
rubygem-bson-1.8.3
rubygem-bson_ext-1.8.3
rubygem-mongo-1.8.3
Technically, comment #6 mentioned rubygem-bson_ext-1.8.1, but I think the bson and bson_ext builds have to stay in sync.
rubygem-bson-1.8.3, rubygem-bson_ext-1.8.3 and rubygem-mongo-1.8.3 are available at http://buildvm-devops.usersys.redhat.com/puddle/build/OpenShiftEnterprise/1.1.z/2013-03-20.2/
Verify:
[root@broker fd]# rpm -qa | grep rubygem-bson
rubygem-bson_ext-1.8.3-1.el6op.x86_64
rubygem-bson-1.8.3-1.el6op.noarch
[root@broker fd]# rpm -qa | grep rubygem-mongo
rubygem-mongo-1.8.3-1.el6op.noarch
After an acceptance test, we no longer see the problem.
Version: http://buildvm-devops.usersys.redhat.com/puddle/build/OpenShiftEnterprise/1.1.z/2013-03-20.2/
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHBA-2013-0694.html
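The verification step in this thread checks `rpm -qa` output by eye. As a rough sketch, the same check could be scripted as below. The helper names `parse_rpm_name` and `meets_minimum` are hypothetical, and the comparison is a plain numeric tuple comparison on the version field, which is a simplification and not rpm's own version-comparison algorithm.

```python
def parse_rpm_name(package):
    """Split 'name-version-release.arch' into (name, version tuple).

    Assumes a purely numeric dotted version such as 1.8.3.
    """
    name, version, _release_arch = package.rsplit("-", 2)
    return name, tuple(int(part) for part in version.split("."))


def meets_minimum(packages, minimum=(1, 8, 3)):
    """Map each package name to True if its version is >= minimum."""
    result = {}
    for package in packages:
        name, version = parse_rpm_name(package.strip())
        result[name] = version >= minimum
    return result


# Package strings copied from the verification output in this report.
installed = [
    "rubygem-bson_ext-1.8.3-1.el6op.x86_64",
    "rubygem-bson-1.8.3-1.el6op.noarch",
    "rubygem-mongo-1.8.3-1.el6op.noarch",
]

for name, ok in sorted(meets_minimum(installed).items()):
    print(name, "OK" if ok else "too old")
```

The default minimum of 1.8.3 reflects the versions built for this fix; the earlier 1.8.1 packages from the puddle would be flagged as too old.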
Description of problem:
mongodb and the broker service are on the same node. Can't connect to mongodb. Under /proc/<mongod-pid>/fd there are 830 fds. Executing "mongo" on the broker prints:
[root@broker ~]# mongo
MongoDB shell version: 2.0.2
connecting to: test
Thu Feb 28 06:24:26 DBClientCursor::init call() failed
Thu Feb 28 06:24:26 Error: Error during mongo startup. :: caused by :: DBClientBase::findN: transport error: 127.0.0.1 query: { whatsmyuri: 1 } shell/mongo.js:84
exception: connect failed
Execute "netstat -nal|grep 27017" on the broker and you will find that many sockets are in the "CLOSE_WAIT" or "ESTABLISHED" state.
Version-Release number of selected component (if applicable):
http://buildvm-devops.usersys.redhat.com/puddle/build/OpenShiftEnterprise/1.1.z/2013-02-27.1/
How reproducible:
Not sure
Steps to Reproduce:
1. Create an app and check the number of fds under /proc/<mongod-pid>/fd.
2. Delete the app and check the number of fds under /proc/<mongod-pid>/fd.
3. Repeat steps 1 and 2 many times.
Actual results:
3. Can't connect to mongo.
Expected results:
Can still connect to the db.
Additional info:
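The two checks used in this description (counting entries under /proc/<mongod-pid>/fd and looking at socket states on port 27017) can be scripted. The following is a minimal sketch, assuming a Linux procfs layout and the usual six-column TCP lines of `netstat -nal`; the function names are hypothetical.

```python
import os
from collections import Counter


def count_open_fds(pid, proc_root="/proc"):
    """Count entries under <proc_root>/<pid>/fd (Linux procfs layout assumed)."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def tally_socket_states(netstat_output, port=27017):
    """Tally TCP states for sockets whose local or remote address ends in :port.

    Expects lines shaped like:
      tcp  0  0  127.0.0.1:27017  127.0.0.1:47976  ESTABLISHED
    """
    suffix = ":%d" % port
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skip headers, UDP and UNIX-socket lines
        local, remote, state = fields[3], fields[4], fields[5]
        if local.endswith(suffix) or remote.endswith(suffix):
            states[state] += 1
    return states
```

Calling `count_open_fds` with the mongod pid after each app create and delete would show whether the count keeps growing, and `tally_socket_states` applied to captured netstat output would show how many sockets sit in CLOSE_WAIT versus ESTABLISHED. Reading another user's /proc/<pid>/fd typically requires root.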