Hide Forgot
Scenario: mongo died unexpectedly, causing itself to be locked. As a result, httpd failed to stop and start cleanly, leaving zombies httpd process in the system. ==mongodb.log== Thu May 12 03:57:45 MongoDB starting : pid=20827 port=27017 dbpath=/var/lib/mongodb 64-bit Thu May 12 03:57:45 db version v1.6.3, pdfile version 4.5 Thu May 12 03:57:45 git version: nogitversion Thu May 12 03:57:45 sys info: Linux x86-12.phx2.fedoraproject.org 2.6.32-72.el6.bz634452.x86_64 #1 SMP Fri Sep 17 06:52:25 EDT 2010 x86_64 BOOST_LIB_VERSION=1_41 ************** old lock file: /var/lib/mongodb/mongod.lock. probably means unclean shutdown recommend removing file and running --repair see: http://dochub.mongodb.org/core/repair for more information ************* Thu May 12 03:57:45 exception in initAndListen std::exception: old lock file, terminating Thu May 12 03:57:45 dbexit: Thu May 12 03:57:45 shutdown: going to close listening sockets... Thu May 12 03:57:45 shutdown: going to flush oplog... Thu May 12 03:57:45 shutdown: going to close sockets... Thu May 12 03:57:45 shutdown: waiting for fs preallocator... Thu May 12 03:57:45 shutdown: closing all files... Thu May 12 03:57:45 closeAllFiles() finished Thu May 12 03:57:45 dbexit: really exiting now == ssl_error_log == [Thu May 12 02:59:16 2011] [error] [client 10.16.79.186] mod_wsgi (pid=18931): Target WSGI script '/srv/pulp/webservices.wsgi' cannot be loaded as Python module. [Thu May 12 02:59:16 2011] [error] [client 10.16.79.186] mod_wsgi (pid=18931): Exception occurred processing WSGI script '/srv/pulp/webservices.wsgi'. [Thu May 12 02:59:16 2011] [error] [client 10.16.79.186] Traceback (most recent call last): [Thu May 12 02:59:16 2011] [error] [client 10.16.79.186] File "/srv/pulp/webservices.wsgi", line 17, in <module> [Thu May 12 02:59:16 2011] [error] [client 10.16.79.186] from pulp.server.webservices.application import wsgi_application [Thu May 12 02:59:16 2011] [error] [client 10.16.79.186] File "/usr/lib/python2.6/site-packages/pulp/server/webservices/application.py", line 31, in <module> [Thu May 12 02:59:16 2011] [error] [client 10.16.79.186] connection.initialize() [Thu May 12 02:59:16 2011] [error] [client 10.16.79.186] File "/usr/lib/python2.6/site-packages/pulp/server/db/connection.py", line 49, in initialize [Thu May 12 02:59:16 2011] [error] [client 10.16.79.186] _connection = pymongo.Connection(seeds) [Thu May 12 02:59:16 2011] [error] [client 10.16.79.186] File "/usr/lib64/python2.6/site-packages/pymongo/connection.py", line 305, in __init__ [Thu May 12 02:59:16 2011] [error] [client 10.16.79.186] self.__find_master() [Thu May 12 02:59:16 2011] [error] [client 10.16.79.186] File "/usr/lib64/python2.6/site-packages/pymongo/connection.py", line 509, in __find_master [Thu May 12 02:59:16 2011] [error] [client 10.16.79.186] raise AutoReconnect("could not find master/primary") == pulp.log == 2011-05-11 21:48:20,514 [WARNING][Thread-1] close_engine() @ driver.py:432 - recoverable error[attempt 0]: connection aborted 2011-05-11 21:48:20,514 [WARNING][Thread-1] connect() @ driver.py:492 - trying: pulp-nightly.usersys.redhat.com:5672 2011-05-11 21:48:20,515 [WARNING][Thread-1] close_engine() @ driver.py:432 - recoverable error[attempt 1]: [Errno 111] Connection refused 2011-05-11 21:48:20,515 [WARNING][Thread-1] close_engine() @ driver.py:434 - sleeping 1 seconds 2011-05-11 21:48:20,515 [WARNING][Thread-1] close_engine() @ driver.py:432 - recoverable error[attempt 0]: connection aborted 2011-05-11 21:48:20,516 [WARNING][Thread-1] connect() @ driver.py:492 - trying: localhost:5672 2011-05-11 21:48:20,516 [WARNING][Thread-1] close_engine() @ driver.py:432 - recoverable error[attempt 1]: [Errno 111] Connection refused 2011-05-11 21:48:20,516 [WARNING][Thread-1] close_engine() @ driver.py:434 - sleeping 1 seconds 2011-05-11 21:48:21,516 [WARNING][Thread-1] connect() @ driver.py:492 - trying: pulp-nightly.usersys.redhat.com:5672 2011-05-11 21:48:21,517 [WARNING][Thread-1] connect() @ driver.py:502 - reconnect succeeded: pulp-nightly.usersys.redhat.com:5672 2011-05-11 21:48:21,517 [WARNING][Thread-1] connect() @ driver.py:492 - trying: localhost:5672 2011-05-11 21:48:21,518 [WARNING][Thread-1] connect() @ driver.py:502 - reconnect succeeded: localhost:5672 2011-05-12 02:59:10,223 [WARNING][Thread-1] close_engine() @ driver.py:432 - recoverable error[attempt 0]: connection aborted 2011-05-12 02:59:10,318 [WARNING][Thread-1] connect() @ driver.py:492 - trying: pulp-nightly.usersys.redhat.com:5672 2011-05-12 02:59:10,378 [WARNING][Thread-1] close_engine() @ driver.py:432 - recoverable error[attempt 1]: [Errno 111] Connection refused 2011-05-12 02:59:10,378 [WARNING][Thread-1] close_engine() @ driver.py:434 - sleeping 1 seconds 2011-05-12 02:59:10,379 [WARNING][Thread-1] close_engine() @ driver.py:432 - recoverable error[attempt 0]: connection aborted 2011-05-12 02:59:10,379 [WARNING][Thread-1] connect() @ driver.py:492 - trying: localhost:5672 2011-05-12 02:59:10,386 [WARNING][Thread-1] close_engine() @ driver.py:432 - recoverable error[attempt 1]: [Errno 111] Connection refused 2011-05-12 02:59:10,387 [WARNING][Thread-1] close_engine() @ driver.py:434 - sleeping 1 seconds 2011-05-12 02:59:11,383 [WARNING][Thread-1] connect() @ driver.py:492 - trying: pulp-nightly.usersys.redhat.com:5672 2011-05-12 02:59:11,383 [WARNING][Thread-1] connect() @ driver.py:502 - reconnect succeeded: pulp-nightly.usersys.redhat.com:5672 2011-05-12 02:59:11,395 [WARNING][Thread-1] connect() @ driver.py:492 - trying: localhost:5672 2011-05-12 02:59:11,396 [WARNING][Thread-1] connect() @ driver.py:502 - reconnect succeeded: localhost:5672 2011-05-12 02:59:16,596 [INFO][Dummy-3] initialize() @ connection.py:48 - Attempting Database connection with seeds = pulp-nightly.usersys.redhat.com 2011-05-12 02:59:16,775 [CRITICAL][Dummy-3] initialize() @ connection.py:55 - Database initialization failed [root@pulp-nightly httpd]# service httpd status httpd is stopped [root@pulp-nightly httpd]# service mongod status mongod dead but subsys locked [root@pulp-nightly httpd]# ps -elf|grep "httpd"|wc -l 13 [root@pulp-nightly httpd]# No other relevant information is found in /var/log/messages or /var/log/httpd/error.log. Note: 2:55am is when hudson kick off the nightly automation job. Judging from the pulp.log, looks like there are some problems starting at 21:48:20.
One theory is that mongo is already in a semi-bad state prior to automation starting. When automation starts, it uninstall pulp packages without shutting down all the services. This could lead to something lingering, causing mongo to fully die and not clean up properly. After the new version pulp is installed, "/etc/init.d/pulp-server restart" will punt mongo into a locked state and create a cascading effect where httpd is unable to cleanup properly, and left some zombie process in the system.
No more mod_python