Bug 704232

Summary: httpd left 4 dead
Product: [Retired] Pulp Reporter: dgao
Component: z_otherAssignee: John Matthews <jmatthew>
Status: CLOSED NOTABUG QA Contact: dgao
Severity: unspecified Docs Contact:
Priority: medium    
Version: unspecifiedCC: skarmark
Target Milestone: ---Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-07-15 22:12:49 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description dgao 2011-05-12 13:58:47 UTC
Scenario: mongo died unexpectedly, causing itself to be locked. As a result, httpd failed to stop and start cleanly, leaving zombies httpd process in the system. 

==mongodb.log==

Thu May 12 03:57:45 MongoDB starting : pid=20827 port=27017 dbpath=/var/lib/mongodb 64-bit 
Thu May 12 03:57:45 db version v1.6.3, pdfile version 4.5
Thu May 12 03:57:45 git version: nogitversion
Thu May 12 03:57:45 sys info: Linux x86-12.phx2.fedoraproject.org 2.6.32-72.el6.bz634452.x86_64 #1 SMP Fri Sep 17 06:52:25 EDT 2010 x86_64 BOOST_LIB_VERSION=1_41
************** 
old lock file: /var/lib/mongodb/mongod.lock.  probably means unclean shutdown
recommend removing file and running --repair
see: http://dochub.mongodb.org/core/repair for more information
*************
Thu May 12 03:57:45 exception in initAndListen std::exception: old lock file, terminating
Thu May 12 03:57:45 dbexit: 

Thu May 12 03:57:45 shutdown: going to close listening sockets...
Thu May 12 03:57:45 shutdown: going to flush oplog...
Thu May 12 03:57:45 shutdown: going to close sockets...
Thu May 12 03:57:45 shutdown: waiting for fs preallocator...
Thu May 12 03:57:45 shutdown: closing all files...
Thu May 12 03:57:45     closeAllFiles() finished

Thu May 12 03:57:45 dbexit: really exiting now


== ssl_error_log ==

[Thu May 12 02:59:16 2011] [error] [client 10.16.79.186] mod_wsgi (pid=18931): Target WSGI script '/srv/pulp/webservices.wsgi' cannot be loaded as Python module.
[Thu May 12 02:59:16 2011] [error] [client 10.16.79.186] mod_wsgi (pid=18931): Exception occurred processing WSGI script '/srv/pulp/webservices.wsgi'.
[Thu May 12 02:59:16 2011] [error] [client 10.16.79.186] Traceback (most recent call last):
[Thu May 12 02:59:16 2011] [error] [client 10.16.79.186]   File "/srv/pulp/webservices.wsgi", line 17, in <module>
[Thu May 12 02:59:16 2011] [error] [client 10.16.79.186]     from pulp.server.webservices.application import wsgi_application
[Thu May 12 02:59:16 2011] [error] [client 10.16.79.186]   File "/usr/lib/python2.6/site-packages/pulp/server/webservices/application.py", line 31, in <module>
[Thu May 12 02:59:16 2011] [error] [client 10.16.79.186]     connection.initialize()
[Thu May 12 02:59:16 2011] [error] [client 10.16.79.186]   File "/usr/lib/python2.6/site-packages/pulp/server/db/connection.py", line 49, in initialize
[Thu May 12 02:59:16 2011] [error] [client 10.16.79.186]     _connection = pymongo.Connection(seeds)
[Thu May 12 02:59:16 2011] [error] [client 10.16.79.186]   File "/usr/lib64/python2.6/site-packages/pymongo/connection.py", line 305, in __init__
[Thu May 12 02:59:16 2011] [error] [client 10.16.79.186]     self.__find_master()
[Thu May 12 02:59:16 2011] [error] [client 10.16.79.186]   File "/usr/lib64/python2.6/site-packages/pymongo/connection.py", line 509, in __find_master
[Thu May 12 02:59:16 2011] [error] [client 10.16.79.186]     raise AutoReconnect("could not find master/primary")


== pulp.log == 

2011-05-11 21:48:20,514 [WARNING][Thread-1] close_engine() @ driver.py:432 - recoverable error[attempt 0]: connection aborted
2011-05-11 21:48:20,514 [WARNING][Thread-1] connect() @ driver.py:492 - trying: pulp-nightly.usersys.redhat.com:5672
2011-05-11 21:48:20,515 [WARNING][Thread-1] close_engine() @ driver.py:432 - recoverable error[attempt 1]: [Errno 111] Connection refused
2011-05-11 21:48:20,515 [WARNING][Thread-1] close_engine() @ driver.py:434 - sleeping 1 seconds
2011-05-11 21:48:20,515 [WARNING][Thread-1] close_engine() @ driver.py:432 - recoverable error[attempt 0]: connection aborted
2011-05-11 21:48:20,516 [WARNING][Thread-1] connect() @ driver.py:492 - trying: localhost:5672
2011-05-11 21:48:20,516 [WARNING][Thread-1] close_engine() @ driver.py:432 - recoverable error[attempt 1]: [Errno 111] Connection refused
2011-05-11 21:48:20,516 [WARNING][Thread-1] close_engine() @ driver.py:434 - sleeping 1 seconds
2011-05-11 21:48:21,516 [WARNING][Thread-1] connect() @ driver.py:492 - trying: pulp-nightly.usersys.redhat.com:5672
2011-05-11 21:48:21,517 [WARNING][Thread-1] connect() @ driver.py:502 - reconnect succeeded: pulp-nightly.usersys.redhat.com:5672
2011-05-11 21:48:21,517 [WARNING][Thread-1] connect() @ driver.py:492 - trying: localhost:5672
2011-05-11 21:48:21,518 [WARNING][Thread-1] connect() @ driver.py:502 - reconnect succeeded: localhost:5672
2011-05-12 02:59:10,223 [WARNING][Thread-1] close_engine() @ driver.py:432 - recoverable error[attempt 0]: connection aborted
2011-05-12 02:59:10,318 [WARNING][Thread-1] connect() @ driver.py:492 - trying: pulp-nightly.usersys.redhat.com:5672
2011-05-12 02:59:10,378 [WARNING][Thread-1] close_engine() @ driver.py:432 - recoverable error[attempt 1]: [Errno 111] Connection refused
2011-05-12 02:59:10,378 [WARNING][Thread-1] close_engine() @ driver.py:434 - sleeping 1 seconds
2011-05-12 02:59:10,379 [WARNING][Thread-1] close_engine() @ driver.py:432 - recoverable error[attempt 0]: connection aborted
2011-05-12 02:59:10,379 [WARNING][Thread-1] connect() @ driver.py:492 - trying: localhost:5672
2011-05-12 02:59:10,386 [WARNING][Thread-1] close_engine() @ driver.py:432 - recoverable error[attempt 1]: [Errno 111] Connection refused
2011-05-12 02:59:10,387 [WARNING][Thread-1] close_engine() @ driver.py:434 - sleeping 1 seconds
2011-05-12 02:59:11,383 [WARNING][Thread-1] connect() @ driver.py:492 - trying: pulp-nightly.usersys.redhat.com:5672
2011-05-12 02:59:11,383 [WARNING][Thread-1] connect() @ driver.py:502 - reconnect succeeded: pulp-nightly.usersys.redhat.com:5672
2011-05-12 02:59:11,395 [WARNING][Thread-1] connect() @ driver.py:492 - trying: localhost:5672
2011-05-12 02:59:11,396 [WARNING][Thread-1] connect() @ driver.py:502 - reconnect succeeded: localhost:5672
2011-05-12 02:59:16,596 [INFO][Dummy-3] initialize() @ connection.py:48 - Attempting Database connection with seeds = pulp-nightly.usersys.redhat.com
2011-05-12 02:59:16,775 [CRITICAL][Dummy-3] initialize() @ connection.py:55 - Database initialization failed


[root@pulp-nightly httpd]# service httpd status
httpd is stopped
[root@pulp-nightly httpd]# service mongod status
mongod dead but subsys locked
[root@pulp-nightly httpd]# ps -elf|grep "httpd"|wc -l
13
[root@pulp-nightly httpd]# 

No other relevant information is found in /var/log/messages or /var/log/httpd/error.log.

Note: 2:55am is when hudson kick off the nightly automation job. Judging from the pulp.log, looks like there are some problems starting at 21:48:20.

Comment 1 dgao 2011-05-12 14:08:18 UTC
One theory is that mongo is already in a semi-bad state prior to automation starting. When automation starts, it uninstall pulp packages without shutting down all the services. This could lead to something lingering, causing mongo to fully die and not clean up properly. After the new version pulp is installed, "/etc/init.d/pulp-server restart" will punt mongo into a locked state and create a cascading effect where httpd is unable to cleanup properly, and left some zombie process in the system.

Comment 2 Sayli Karmarkar 2011-07-15 22:12:49 UTC
No more mod_python