Created attachment 478934 [details] pulp server log file Description of problem: After a few days of running, the pulp wsgi daemon process becomes unresponsive Version-Release number of selected component (if applicable): commit 3487ce737bb937e7c010fdae16f1edb8f3748fe0 How reproducible: Seemingly very (more than 1 complaint) Steps to Reproduce: 1. install pulp in developer mode 2. let run for a few days w/ no operations 3. try any query to the pulp api Actual results: response times out and is logged by apache Expected results: valid queries to return results, invalid to return errors Additional info: even querying an invalid resource such as https://localhost/pulp/api/some-invalid-handler/ does not return the expected 404 the process seems to be alive: $ ps aux | grep pulp apache 1637 0.6 1.6 512952 16600 ? Sl 10:07 0:00 (wsgi:pulp) apache log entry: [Tue Feb 15 07:58:29 2011] [error] [client 127.0.0.1] Script timed out before returning headers: webservices.wsgi verified on fedora 13
Note: Restarting qpidd doesn't help when the server is in this stuck state
Created attachment 478937 [details] httpd access logs
Seeing this on Preethi's server that went unresponsive: [Sun Feb 20 03:13:02 2011] [warn] RSA server certificate is a CA certificate (BasicConstraints: CA == TRUE !?) [Sun Feb 20 03:16:29 2011] [error] [client 10.11.231.129] Script timed out before returning headers: webservices.wsgi [Sun Feb 20 07:16:30 2011] [error] [client 10.11.231.129] Script timed out before returning headers: webservices.wsgi [Sun Feb 20 11:16:30 2011] [error] [client 10.11.231.129] Script timed out before returning headers: webservices.wsgi [Sun Feb 20 15:16:31 2011] [error] [client 10.11.231.129] Script timed out before returning headers: webservices.wsgi [Sun Feb 20 19:16:31 2011] [error] [client 10.11.231.129] Script timed out before returning headers: webservices.wsgi [Sun Feb 20 23:16:31 2011] [error] [client 10.11.231.129] Script timed out before returning headers: webservices.wsgi [Mon Feb 21 03:16:32 2011] [error] [client 10.11.231.129] Script timed out before returning headers: webservices.wsgi [Mon Feb 21 07:16:32 2011] [error] [client 10.11.231.129] Script timed out before returning headers: webservices.wsgi [Mon Feb 21 09:37:20 2011] [error] [client 10.11.231.129] Script timed out before returning headers: webservices.wsgi
Saw this today on my server error_log 7f371f139000-7f371f1bb000 rw-p 00000000 00:00 0 7f371f1bb000-7f371f236000 rw-s 00000000 00:04 4314012 /dev/zero (deleted) 7f371f236000-7f371f23d000 rw-p 00000000 00:00 0 7f371f24b000-7f371f24e000 rw-p 00000000 00:00 0 7f371f24e000-7f371f24f000 rw-s 00000000 fd:00 1703966 /tmp/ffiHdgwve (deleted) 7f371f24f000-7f371f250000 r-xs 00000000 fd:00 1703966 /tmp/ffiHdgwve (deleted) 7f371f250000-7f371f251000 rw-p 00000000 00:00 0 7f371f251000-7f371f252000 r--p 0001e000 fd:00 268775 /lib64/ld-2.13.so 7f371f252000-7f371f253000 rw-p 0001f000 fd:00 268775 /lib64/ld-2.13.so 7f371f253000-7f371f254000 rw-p 00000000 00:00 0 7f371f254000-7f371f2a7000 r-xp 00000000 fd:00 2501636 /usr/sbin/httpd 7f371f4a6000-7f371f4a9000 r--p 00052000 fd:00 2501636 /usr/sbin/httpd 7f371f4a9000-7f371f4aa000 rw-p 00055000 fd:00 2501636 /usr/sbin/httpd 7f371f4aa000-7f371f4ae000 rw-p 00000000 00:00 0 7f371f896000-7f37201ea000 rw-p 00000000 00:00 0 [heap] 7fffc9ad8000-7fffc9b15000 rw-p 00000000 00:00 0 [stack] 7fffc9bff000-7fffc9c00000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] [Sun Mar 06 03:14:52 2011] [notice] child pid 15759 exit signal Aborted (6) *** glibc detected *** (wsgi:pulp) : malloc(): memory corruption: 0x00007f37201a2c10 ***
in ssl_error_log ==> /var/log/httpd/ssl_access_log <== 10.11.231.129 - - [06/Mar/2011:05:38:44 -0500] "POST /pulp/api//consumers/preethi/profile/ HTTP/1.1" 500 627 10.11.231.129 - - [06/Mar/2011:09:38:44 -0500] "POST /pulp/api//consumers/preethi/profile/ HTTP/1.1" 500 627 10.11.231.129 - - [06/Mar/2011:13:38:45 -0500] "POST /pulp/api//consumers/preethi/profile/ HTTP/1.1" 500 627 10.11.231.129 - - [06/Mar/2011:17:38:45 -0500] "POST /pulp/api//consumers/preethi/profile/ HTTP/1.1" 500 627 10.11.231.129 - - [06/Mar/2011:21:38:45 -0500] "POST /pulp/api//consumers/preethi/profile/ HTTP/1.1" 500 627 10.11.231.129 - - [07/Mar/2011:01:38:45 -0500] "POST /pulp/api//consumers/preethi/profile/ HTTP/1.1" 500 627 10.11.231.129 - - [07/Mar/2011:05:38:46 -0500] "POST /pulp/api//consumers/preethi/profile/ HTTP/1.1" 500 627 10.11.231.129 - - [07/Mar/2011:09:05:13 -0500] "PUT /pulp/api//roles/ HTTP/1.1" 500 627 10.11.231.129 - - [07/Mar/2011:09:08:13 -0500] "GET /pulp/api//repositories/ HTTP/1.1" 500 627
pulp 0.0.151 fedora 13 [Sun Apr 03 03:33:12 2011] [notice] Digest: generating secret for digest authentication ... [Sun Apr 03 03:33:12 2011] [notice] Digest: done [Sun Apr 03 03:33:13 2011] [error] python_init: Python version mismatch, expected '2.6', found '2.6.4'. [Sun Apr 03 03:33:13 2011] [error] python_init: Python executable found '/usr/bin/python'. [Sun Apr 03 03:33:13 2011] [error] python_init: Python path being used '/usr/lib64/python26.zip:/usr/lib64/python2.6/:/usr/lib64/python2.6/plat-linux2:/usr/lib64/python2.6/lib-tk:/usr/lib64/python2.6/lib-old:/usr/lib64/python2.6/lib-dynload'. [Sun Apr 03 03:33:13 2011] [notice] mod_python: Creating 4 session mutexes based on 256 max processes and 0 max threads. [Sun Apr 03 03:33:13 2011] [notice] mod_python: using mutex_directory /tmp [Sun Apr 03 03:33:13 2011] [notice] SSL FIPS mode disabled [Sun Apr 03 03:33:13 2011] [notice] Apache/2.2.17 (Unix) DAV/2 mod_auth_kerb/5.4 mod_auth_pgsql/2.0.3 PHP/5.3.5 mod_python/3.3.1 Python/2.6.4 mod_ssl/2.2.17 OpenSSL/1.0.0d-fips mod_wsgi/3.1 mod_perl/2.0.4 Perl/v5.10.1 configured -- resuming normal operations [Sun Apr 03 03:33:16 2011] [notice] child pid 1988 exit signal Segmentation fault (11) [Sun Apr 03 03:33:17 2011] [notice] child pid 2016 exit signal Segmentation fault (11) Fatal Python error: GC object already tracked [Sun Apr 03 03:33:18 2011] [error] mod_wsgi (pid=2023): Target WSGI script '/srv/pulp/bootstrap.wsgi' cannot be loaded as Python module. [Sun Apr 03 03:33:18 2011] [error] mod_wsgi (pid=2023): Exception occurred processing WSGI script '/srv/pulp/bootstrap.wsgi'. [Sun Apr 03 03:33:18 2011] [notice] child pid 2023 exit signal Aborted (6) [Sun Apr 03 03:33:19 2011] [notice] child pid 2031 exit signal Segmentation fault (11) *** glibc detected *** (wsgi:pulp) : malloc(): memory corruption: 0x00007f01236078d0 ***
ssl_error_log [Mon Apr 04 09:31:06 2011] [error] [client 127.0.0.1] Script timed out before returning headers: webservices.wsgi
abrtd log Apr 3 03:30:01 jrist python: abrt: detected unhandled Python exception in /usr/lib/python2.6/site-packages/pulp-0.0.6-py2.6.egg/pulp/auditing.pyc Apr 3 03:30:01 jrist abrtd: dumpsocket: New client connected Apr 3 03:30:01 jrist abrtd: dumpsocket: Saved Python crash dump of pid 1676 to /var/spool/abrt/pyhook-1301823001-1676 Apr 3 03:30:01 jrist abrtd: dumpsocket: Socket client disconnected Apr 3 03:30:01 jrist abrtd: Directory 'pyhook-1301823001-1676' creation detected Apr 3 03:30:01 jrist abrtd: Executable '/usr/lib/python2.6/site-packages/pulp-0.0.6-py2.6.egg/pulp/auditing.pyc' doesn't belong to any package Apr 3 03:30:01 jrist abrtd: Corrupted or bad crash /var/spool/abrt/pyhook-1301823001-1676 (res:4), deleting Apr 3 03:30:01 jrist python: abrt: detected unhandled Python exception in /workspace/pulp/src/pulp/server/auditing.pyc Apr 3 03:30:01 jrist abrtd: dumpsocket: New client connected Apr 3 03:30:01 jrist abrtd: dumpsocket: Saved Python crash dump of pid 1678 to /var/spool/abrt/pyhook-1301823001-1678 Apr 3 03:30:01 jrist abrtd: dumpsocket: Socket client disconnected Apr 3 03:30:01 jrist abrtd: Directory 'pyhook-1301823001-1678' creation detected Apr 3 03:30:01 jrist abrtd: Executable '/workspace/pulp/src/pulp/server/auditing.pyc' doesn't belong to any package Apr 3 03:30:01 jrist abrtd: Corrupted or bad crash /var/spool/abrt/pyhook-1301823001-1678 (res:4), deleting
Not sure if it's related or not, but throwing this in here since it looks similar. This was seen on Pulp builds around .190 (maybe a few back by the download service team) by two different people: Backtrace in /var/log/httpd/error_log. [Thu Jun 16 17:50:55 2011] [error] import md5 *** glibc detected *** (wsgi:pulp) : double free or corruption (!prev): 0x00007fa1e17a8d10 *** The full traceback is in the attachment comment_10_full_trace
Created attachment 505032 [details] Full traceback from corruption error in comment #10
Still persists with pulp - 194 and rhui 2.0.32 /var/log/httpd/ssl_error_log [Mon Jun 27 03:47:55 2011] [error] [client 10.122.70.115] Script timed out before returning headers: webservices.wsgi Connecting to RHUA [ip-10-122-70-115.ec2.internal]... Successfully connected to [ip-10-122-70-115.ec2.internal] Unexpected error caught at the shell level Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/rhui/tools/shell.py", line 75, in safe_listen self.listen(clear=first_run) File "/usr/lib/python2.6/site-packages/rhui/tools/shell.py", line 94, in listen Shell.listen(self) File "/usr/lib/python2.6/site-packages/rhui/common/shell.py", line 191, in listen item.func(*args, **item.kwargs) File "/usr/lib/python2.6/site-packages/rhui/tools/screens/sync.py", line 89, in cds_display self._sync_display('-= CDS Synchronization Status =-', self.pulp.cds_with_sync_status, True) File "/usr/lib/python2.6/site-packages/rhui/tools/screens/sync.py", line 268, in _sync_display data_sets = retrieve_func() File "/usr/lib/python2.6/site-packages/rhui/tools/pulp_api.py", line 518, in cds_with_sync_status cds_list = self.cds_list() File "/usr/lib/python2.6/site-packages/rhui/tools/pulp_api.py", line 509, in cds_list raise e ServerRequestError: (500, '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>500 Internal Server Error</title>\n</head><body>\n<h1>Internal Server Error</h1>\n<p>The server encountered an internal error or\nmisconfiguration and was unable to complete\nyour request.</p>\n<p>Please contact the server administrator,\n root@localhost and inform them of the time the error occurred,\nand anything you might have done that may have\ncaused the error.</p>\n<p>More information about this error may be available\nin the server error log.</p>\n<hr>\n<address>Apache/2.2.15 (Red Hat) Server at ip-10-122-70-115.ec2.internal Port 443</address>\n</body></html>\n', None)
James - this should be fixed with the mod_python removal you did. Can you comment on that and pitch this to MODIFIED?
The memory corruption and segmentation fault errors were tracked down to a logrotate cron process that was sending a SIGHUP signal to apache. Unfortunately, when both mod_python and mod_wsgi are loaded modules in apache, these errors ocurred. There is further information provided at http://code.google.com/p/modwsgi/wiki/FrequentlyAskedQuestions Now that we've moved off mod_python and rely solely on mod_wsgi, this issue is resolved.
Build 0.206, CR14.
I think following logs are related to this defect, so reporting this here. I'm using following build on fresh installed setup: pulp 0.208 rh-rhui-tools 2.0.40 The system was idle since last night and today when I started the rhui-manager, it was unresponsive for a long time and finally got unexpected error on UI. I got following in rhui.log: rhui.log: ============ Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/rhui/tools/shell.py", line 85, in safe_listen self.listen(clear=first_run) File "/usr/lib/python2.6/site-packages/rhui/tools/shell.py", line 111, in listen Shell.listen(self) File "/usr/lib/python2.6/site-packages/rhui/common/shell.py", line 191, in listen item.func(*args, **item.kwargs) File "/usr/lib/python2.6/site-packages/rhui/tools/screens/sync.py", line 79, in repo_display self._sync_display('-= Repository Synchronization Status =-', self.pulp.repos_with_sync_status, False) File "/usr/lib/python2.6/site-packages/rhui/tools/screens/sync.py", line 272, in _sync_display data_sets = retrieve_func() File "/usr/lib/python2.6/site-packages/rhui/tools/pulp_api.py", line 219, in repos_with_sync_status redhat_repos = self.redhat_repo_list() File "/usr/lib/python2.6/site-packages/rhui/tools/pulp_api.py", line 164, in redhat_repo_list raise e ServerRequestError: (500, '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>500 Internal Server Error</title>\n</head><body>\n<h1>Internal Server Error</h1>\n<p>The server encountered an internal error or\nmisconfiguration and was unable to complete\nyour request.</p>\n<p>Please contact the server administrator,\n root@localhost and inform them of the time the error occurred,\nand anything you might have done that may have\ncaused the error.</p>\n<p>More information about this error may be available\nin the server error log.</p>\n<hr>\n<address>Apache/2.2.15 (Red Hat) Server at dhcp201-175.englab.pnq.redhat.com Port 443</address>\n</body></html>\n', None) I checked the "pulp-server" status on rhua and found following: [root@dhcp201-175 ~]# service pulp-server status mongod dead but subsys locked [root@dhcp201-175 ~]# [root@dhcp201-175 ~]# service mongod status mongod dead but subsys locked [root@dhcp201-175 ~]# error_log says: ===================== [root@dhcp201-175 ~]# tail -n 100 /var/log/httpd/error_log IOError: [Errno 32] Broken pipe IOError: [Errno 32] Broken pipe Traceback (most recent call last): File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code exec code in run_globals File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 341, in <module> main() File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 338, in main process() File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 313, in process sys.stdout.flush() Traceback (most recent call last): File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main "__main__", fname, loader, pkg_name) IOError File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code exec code in run_globals File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 341, in <module> main() File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 338, in main process() File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 313, in process sys.stdout.flush() : [Errno 32] Broken pipe IOError: [Errno 32] Broken pipe ssl_error_log says: ====================== [Tue Jul 19 11:46:47 2011] [error] [client 10.65.201.133] Script timed out before returning headers: webservices.wsgi Pulp 0.208 build is installed and without mod_python package.: ============================================================== [root@dhcp201-175 ~]# rpm -qa | grep pulp pulp-common-0.0.208-1.el6.noarch pulp-0.0.208-1.el6.noarch pulp-client-0.0.208-1.el6.noarch [root@dhcp201-175 ~]# rpm -qa | grep mod_python [root@dhcp201-175 ~]# rpm -qa | grep mod_wsgi mod_wsgi-3.2-3.sslpatch.el6.x86_64 [root@dhcp201-175 ~]# [root@dhcp201-133 ~]# rpm -qa | grep rh-rhui-tools rh-rhui-tools-2.0.40-1.el6.noarch James, could you please take a look ? Not sure if its related to original defect.
The traceback from https://bugzilla.redhat.com/show_bug.cgi?id=677735#c16 is not related to this bug. A new bug was filed for that issue (https://bugzilla.redhat.com/show_bug.cgi?id=723534).
moving to verified as I have not seen this error root@preethi ~]# rpm -q pulp pulp-0.0.233-1.fc15.noarch
Pulp v1.0 is released Closed Current Release.
Pulp v1.0 is released.