Bug 875270 - mongodb-2.2 cart restart sometimes spawns multiple processes
Summary: mongodb-2.2 cart restart sometimes spawns multiple processes
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OKD
Classification: Red Hat
Component: Containers
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Mrunal Patel
QA Contact: libra bugs
URL:
Whiteboard:
Duplicates: 906654
Depends On:
Blocks:
 
Reported: 2012-11-10 01:15 UTC by Dan Mace
Modified: 2015-05-14 23:02 UTC (History)
CC: 5 users

Fixed In Version: devenv_2776
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-02-13 23:40:55 UTC
Target Upstream Version:
Embargoed:


Attachments
Cucumber logs demonstrating the behavior (4.51 KB, text/x-log)
2012-11-10 01:17 UTC, Dan Mace

Description Dan Mace 2012-11-10 01:15:10 UTC
Description of problem:

Occasionally during CI test runs, the mongodb-2.2 embedded cart erroneously spawns multiple processes during a call to the restart hook. The mongodb_ctl.sh script uses timing-sensitive logic for process detection, which leaves the potential for a false-negative check and stacked mongod invocations.
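This kind of race usually comes from polling the process table on a timer. The sketch below is hypothetical (it does not reproduce the real mongodb_ctl.sh; the pidfile path and stand-in daemon are invented for the demo): a pidfile plus `kill -0` replaces timing-sensitive process-table greps, and the stop path waits for the old process to actually exit before a restart may start a new one, which is what prevents invocations from stacking.

```shell
#!/bin/sh
# Hypothetical sketch, not the real mongodb_ctl.sh. A pidfile plus
# kill -0 is used for process detection, and stop_and_wait blocks
# until the old process is gone before a restart can spawn a new one.
PIDFILE="${TMPDIR:-/tmp}/mongod-demo.pid"

isrunning() {
    # kill -0 sends no signal; it only checks that the pid exists.
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

stop_and_wait() {
    isrunning || return 0
    kill "$(cat "$PIDFILE")" 2>/dev/null
    # Wait (up to ~5s) for the old process to exit; returning early
    # here is exactly what lets invocations stack up.
    i=0
    while isrunning && [ "$i" -lt 5 ]; do
        sleep 1
        i=$((i + 1))
    done
    rm -f "$PIDFILE"
}

# Demo with a stand-in long-running process instead of mongod.
# The subshell detaches it so the pidfile check sees a real exit.
( sleep 60 >/dev/null 2>&1 & echo "$!" > "$PIDFILE" )
isrunning && echo "running"
stop_and_wait
isrunning || echo "stopped"
```

The key design point is that "running" is decided from a single authoritative pid rather than from pattern-matching the process table, so there is no window in which a live daemon can be misread as absent.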

The exact conditions which reproduce this are not yet known, but the behavior has been observed in recent fork AMI test executions. An analysis of the logs revealed the call to the restart hook and the subsequent bogus processes (3 mongod procs where 1 was expected post-restart).
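A quick way to flag the stacked-invocation symptom after a restart is to count matching processes. The check below is a hypothetical sketch, self-contained by using stand-in `sleep` processes; in the real case the pattern would match mongod instead.

```shell
#!/bin/sh
# Hypothetical post-restart sanity check: count daemon processes and
# warn when more than one is found. Stand-in "sleep" processes are
# used so the demo is self-contained.

# Simulate a stacked restart: three copies where one was expected.
sleep 987654 & P1=$!
sleep 987654 & P2=$!
sleep 987654 & P3=$!

# The [s] bracket trick keeps the grep command itself (whose own
# command line contains the pattern) out of the count.
n=$(ps -ef | grep -c '[s]leep 987654')
echo "found $n daemon processes"
[ "$n" -eq 1 ] || echo "WARNING: expected 1 process, found $n"

kill "$P1" "$P2" "$P3" 2>/dev/null
```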

Version-Release number of selected component (if applicable):


How reproducible:

Unknown at this time.


Steps to Reproduce:

N/A
  
Actual results:

More than one mongod process following an app restart.

Expected results:

A single mongod process.

Additional info:

Comment 1 Dan Mace 2012-11-10 01:17:43 UTC
Created attachment 641921 [details]
Cucumber logs demonstrating the behavior

I've attached a portion of the cucumber logs demonstrating the before and after state of the process table during a mongo restart exhibiting the problematic behavior.

Comment 2 Mrunal Patel 2012-11-15 02:07:20 UTC
Unable to reproduce the issue. One possible scenario is repair getting called.
Even if three processes are spawned, only one will bind to the port.

Comment 3 Mrunal Patel 2012-11-15 17:48:28 UTC
Marking this low since we can't reproduce it.

Comment 4 Mrunal Patel 2012-11-28 23:59:30 UTC
Investigated further -
[pid 20261] 1354144886.641274 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <0.060418>
[pid 20261] 1354144886.641316 futex(0x11bb240, FUTEX_WAIT_PRIVATE, 2, {0, 124225760} <unfinished ...>
[pid 20263] 1354144886.683156 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <0.101931>
[pid 20263] 1354144886.683198 futex(0x11bb240, FUTEX_WAIT_PRIVATE, 2, {0, 50865920}) = -1 ETIMEDOUT (Connection timed out) <0.051009>
[pid 20263] 1354144886.734285 futex(0x11bb240, FUTEX_WAIT_PRIVATE, 2, {0, 71312592} <unfinished ...>
[pid 20260] 1354144886.736554 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <0.237626>
[pid 20260] 1354144886.736595 futex(0x11bb240, FUTEX_WAIT_PRIVATE, 2, {0, 258809056} <unfinished ...>
[pid 20261] 1354144886.765659 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <0.124319>
[pid 20261] 1354144886.765713 futex(0x11bb240, FUTEX_WAIT_PRIVATE, 2, {0, 181012160}^C <unfinished ...>

(gdb) bt
#0  0x000000000098e650 in sys_futex (w=0x11bb240, value=2, loop=<value optimized out>)
    at src/third_party/gperftools-2.0/src/base/linux_syscall_support.h:1789
#1  base::internal::SpinLockDelay (w=0x11bb240, value=2, loop=<value optimized out>)
    at src/third_party/gperftools-2.0/src/base/spinlock_linux-inl.h:87
#2  0x000000000098e43c in SpinLock::SlowLock (this=0x11bb240)
    at src/third_party/gperftools-2.0/src/base/spinlock.cc:132
#3  0x00000000009a37f3 in Lock () at src/third_party/gperftools-2.0/src/base/spinlock.h:75
#4  SpinLockHolder () at src/third_party/gperftools-2.0/src/base/spinlock.h:141
#5  tcmalloc::ThreadCache::CreateCacheIfNecessary ()
    at src/third_party/gperftools-2.0/src/thread_cache.cc:341
#6  0x000000000099d9fa in GetCache (size=8) at src/third_party/gperftools-2.0/src/thread_cache.h:389
#7  (anonymous namespace)::do_malloc (size=8) at src/third_party/gperftools-2.0/src/tcmalloc.cc:1058
#8  0x0000000000a3de3c in do_malloc_or_cpp_alloc (size=8)
    at src/third_party/gperftools-2.0/src/tcmalloc.cc:1005
#9  tc_malloc (size=8) at src/third_party/gperftools-2.0/src/tcmalloc.cc:1492
#10 0x0000003c5a60ff2a in boost::detail::get_once_per_thread_epoch() ()
   from /usr/lib64/libboost_thread-mt.so.5

There is a bug filed against mongo upstream:
https://jira.mongodb.org/browse/SERVER-7434

Comment 5 Mrunal Patel 2012-12-06 18:10:40 UTC
Will try compiling with a flag to avoid the issue. However, we have no definitive
test to confirm that it is fixed.

Comment 6 Mrunal Patel 2012-12-12 01:16:51 UTC
The rpm compiled with the flag did not resolve the issue. We will have to wait
for upstream to fix this.

Comment 7 Mrunal Patel 2013-02-05 00:40:50 UTC
*** Bug 906654 has been marked as a duplicate of this bug. ***

Comment 8 Mrunal Patel 2013-02-05 00:42:32 UTC
The bug has been fixed in mongo 2.2.3
https://jira.mongodb.org/browse/SERVER/fixforversion/12088

Comment 9 Xiaoli Tian 2013-02-05 04:09:11 UTC
(In reply to comment #8)
> The bug has been fixed in mongo 2.2.3
> https://jira.mongodb.org/browse/SERVER/fixforversion/12088

A FutureFeature keyword was added to this bug on 2012/12/12. Will this still be a FutureFeature bug, given that we have already upgraded mongo to 2.2.2?

Thanks

-xiaoli

Comment 10 Zhe Wang 2013-02-05 05:09:39 UTC
The version of mongodb in devenv_2772 is 2.2.2, and hence this bug was sometimes reproducible. Specifically, most of the "snapshot and restore" cases failed with the following error message during restoration:

couldn't connect to [127.0.250.129:27017] couldn't connect to server 127.0.250.129:27017

============================================================================
WARNING: You may have possibly encountered the mongorestore bugs related to
         MongoDB JIRA issues 7181, 7262 and 7104. We tried working around
         some these issues. You will need to manually workaround the
         remaining problems prior to proceeding. For more details, see: 
             https://jira.mongodb.org/browse/SERVER-7181
             https://jira.mongodb.org/browse/SERVER-7262
             https://jira.mongodb.org/browse/SERVER-7104
============================================================================

MongoDB already running

!WARNING! Could not restore MongoDB databases - mongorestore failed!

Moreover, when connecting to mongodb after the (failed) restoration, the error reported was:

Error: couldn't connect to server 127.0.252.1:27017 src/mongo/shell/mongo.js:93

Comment 11 Xiaoli Tian 2013-02-05 05:20:23 UTC
I'm removing the FutureFeature keyword for now to request evaluation again since 2.2.2 is used in this sprint.

Comment 12 joycezhang 2013-02-05 06:00:41 UTC
Since bug 906654 has been marked as a duplicate of this bug, the latest comments for mongodb on devenv_2772 are added here:

If we force-stop the app when adding mongodb, it is added successfully, but connecting to the mongo shell then fails as below:

[app2-dev2772dmt1.dev.rhcloud.com ~]\> mongo
MongoDB shell version: 2.2.2
connecting to: 127.1.0.129:27017/admin
Tue Feb  5 00:56:05 Error: couldn't connect to server 127.1.0.129:27017 src/mongo/shell/mongo.js:93
exception: connect failed

Comment 13 Jianwei Hou 2013-02-06 02:20:38 UTC
Verified on devenv_2778

The multi-process issue is now gone; note that the mongo shell version is 2.2.3:
[php1-2778t.dev.rhcloud.com ~]\> mongo
MongoDB shell version: 2.2.3
connecting to: 127.0.252.129:27017/admin
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
	http://docs.mongodb.org/
Questions? Try the support group
	http://groups.google.com/group/mongodb-user
>


Create an app and add mongodb to it, then ssh into the gear and run 'ps -ef'; there is only one mongod process running.
Restart the app or restart mongodb and check again; the issue is no longer reproducible.

[php1-2778t.dev.rhcloud.com ~]\> ps -ef|grep mongo
505      19782     1  0 21:16 ?        00:00:00 /usr/bin/mongod --auth --nojournal --smallfiles --quiet -f /var/lib/openshift/5111ba9fdefdbcad6d000454//mongodb-2.2//etc/mongodb.conf run
505      20329 17959  0 21:16 pts/0    00:00:00 grep mongo


Moving to verified.

