Description of problem: The system was idle since last night and today when I started the rhui-manager, it was unresponsive for a long time and finally got unexpected error on UI. I got following in rhui.log: rhui.log: ============ Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/rhui/tools/shell.py", line 85, in safe_listen self.listen(clear=first_run) File "/usr/lib/python2.6/site-packages/rhui/tools/shell.py", line 111, in listen Shell.listen(self) File "/usr/lib/python2.6/site-packages/rhui/common/shell.py", line 191, in listen item.func(*args, **item.kwargs) File "/usr/lib/python2.6/site-packages/rhui/tools/screens/sync.py", line 79, in repo_display self._sync_display('-= Repository Synchronization Status =-', self.pulp.repos_with_sync_status, False) File "/usr/lib/python2.6/site-packages/rhui/tools/screens/sync.py", line 272, in _sync_display data_sets = retrieve_func() File "/usr/lib/python2.6/site-packages/rhui/tools/pulp_api.py", line 219, in repos_with_sync_status redhat_repos = self.redhat_repo_list() File "/usr/lib/python2.6/site-packages/rhui/tools/pulp_api.py", line 164, in redhat_repo_list raise e ServerRequestError: (500, '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>500 Internal Server Error</title>\n</head><body>\n<h1>Internal Server Error</h1>\n<p>The server encountered an internal error or\nmisconfiguration and was unable to complete\nyour request.</p>\n<p>Please contact the server administrator,\n root@localhost and inform them of the time the error occurred,\nand anything you might have done that may have\ncaused the error.</p>\n<p>More information about this error may be available\nin the server error log.</p>\n<hr>\n<address>Apache/2.2.15 (Red Hat) Server at dhcp201-175.englab.pnq.redhat.com Port 443</address>\n</body></html>\n', None) I checked the "pulp-server" status on rhua and found following: [root@dhcp201-175 ~]# service pulp-server status mongod dead but subsys locked [root@dhcp201-175 ~]# [root@dhcp201-175 ~]# service mongod status mongod dead but subsys locked [root@dhcp201-175 ~]# error_log says: ===================== [root@dhcp201-175 ~]# tail -n 100 /var/log/httpd/error_log IOError: [Errno 32] Broken pipe IOError: [Errno 32] Broken pipe Traceback (most recent call last): File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code exec code in run_globals File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 341, in <module> main() File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 338, in main process() File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 313, in process sys.stdout.flush() Traceback (most recent call last): File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main "__main__", fname, loader, pkg_name) IOError File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code exec code in run_globals File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 341, in <module> main() File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 338, in main process() File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 313, in process sys.stdout.flush() : [Errno 32] Broken pipe IOError: [Errno 32] Broken pipe ssl_error_log says: ====================== [Tue Jul 19 11:46:47 2011] [error] [client 10.65.201.133] Script timed out before returning headers: webservices.wsgi Pulp 0.208 build is installed and without mod_python package.: ============================================================== [root@dhcp201-175 ~]# rpm -qa | grep pulp pulp-common-0.0.208-1.el6.noarch pulp-0.0.208-1.el6.noarch pulp-client-0.0.208-1.el6.noarch [root@dhcp201-175 ~]# rpm -qa | grep mod_python [root@dhcp201-175 ~]# rpm -qa | grep mod_wsgi mod_wsgi-3.2-3.sslpatch.el6.x86_64 [root@dhcp201-175 ~]# [root@dhcp201-133 ~]# rpm -qa | grep rh-rhui-tools rh-rhui-tools-2.0.40-1.el6.noarch Version-Release number of selected component (if applicable): pulp 0.208 rh-rhui-tools 2.0.40 How reproducible: facing this from last 2 days. Steps to Reproduce: Leave the setup idle for a night. Actual results: - IOError: [Errno 32] Broken pipe in apache's log file - mongod crashed. Expected results: Additional info: I'm reporting this because I'm getting this very frequently.
Mongodb logs are fine.. [root@dhcp201-175 ~]# tail -f /var/log/pulp/db.log migration to data model version 15 started migration to data model version 15 complete migration to data model version 16 started migration to data model version 16 complete migration to data model version 17 started migration to data model version 17 complete migration to data model version 18 started migration to data model version 18 complete migration to data model version 19 started migration to data model version 19 complete
The broken pipe is the ActiveObject child process in grinder trying to talk back to the parent process and the pipe has been broken. The most likely cause is that the parent process (which is the wsgi pulp process) is dead. I believe this is a casualty of something killing one the mod_wsgi or httpd processes and not the cause.
Today I found the similar broken pipe error in the apache's error_log. But this time mongod was running. IOError: [Errno 32] Broken pipe Traceback (most recent call last): File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code exec code in run_globals File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 341, in <module> main() File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 338, in main process() File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 313, in process sys.stdout.flush() IOError: [Errno 32] Broken pipe Traceback (most recent call last): File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code exec code in run_globals File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 341, in <module> main() File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 338, in main process() File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 313, in process sys.stdout.flush() IOError: [Errno 32] Broken pipe [root@dhcp201-175 ~]# service pulp-server status mongod (pid 2153) is running... httpd (pid 2191) is running... qpidd (pid 2156) is running... [root@dhcp201-175 ~]#
Not sure, if the following info related or not, but still wanted to share here: CDN to RHUA sync failed with following exception: 2011-07-22 17:51:24,235 9440:140265603970816: grinder.RepoFetch:ERROR: RepoFetch:126 Caught exception when trying to fetch content from [https://cdn.redhat.com/content/dist/rhel/rhui/server/5Server/x86_64/rhui/1.1/os]: Cannot retrieve repository metadata (repomd.xml) for repository: . Please verify its path and try again 2011-07-22 17:52:15,962 9440:140266092168960: grinder.RepoFetch:INFO: RepoFetch:481 Processed <5163> items in [2280] seconds 2011-07-22 17:52:15,986 9440:140266092168960: grinder.RepoFetch:INFO: RepoFetch:484 Cleaning any orphaned packages.. 2011-07-22 17:52:15,963 9440:140265572501248: grinder.RepoFetch:INFO: RepoFetch:440 fetchYumRepo() basepath = /var/lib/pulp//repos/content/dist/rhel/rhui/server-6/updates/6Server/x86_64/optional/os 2011-07-22 17:52:30,356 9440:140265572501248: grinder.RepoFetch:INFO: RepoFetch:457 Fetching repo metadata... 2011-07-22 17:53:15,910 9440:140265572501248: grinder.RepoFetch:ERROR: RepoFetch:126 Caught exception when trying to fetch content from [https://cdn.redhat.com/content/dist/rhel/rhui/server-6/updates/6Server/x86_64/optional/os]: Cannot retrieve repository metadata (repomd.xml) for repository: . Please verify its path and try again Please note that these repos were already synched and this happened when next sync schedule occurred. When I restarted the pulp-server, I found following: [root@dhcp201-175 ~]# service pulp-server restart Stopping httpd: [FAILED] Stopping Qpid AMQP daemon: [ OK ] Stopping mongod: [ OK ] Starting mongod: [ OK ] Starting Qpid AMQP daemon: [ OK ] Starting httpd: [ OK ] /var/log/httpd/error_log says: =============================== IOError: [Errno 32] Broken pipe Traceback (most recent call last): File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code exec code in run_globals File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 341, in <module> main() File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 338, in main process() File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 313, in process sys.stdout.flush() IOError: [Errno 32] Broken pipe [Fri Jul 22 18:06:43 2011] [error] Exception KeyError: KeyError(140266853300192,) in <module 'threading' from '/usr/lib64/python2.6/threading.pyc'> ignored [Fri Jul 22 18:06:43 2011] [error] Exception KeyError: KeyError(140266853300192,) in <module 'threading' from '/usr/lib64/python2.6/threading.pyc'> ignored [Fri Jul 22 18:06:44 2011] [error] Exception KeyError: KeyError(140266853300192,) in <module 'threading' from '/usr/lib64/python2.6/threading.pyc'> ignored [Fri Jul 22 18:06:45 2011] [warn] child process 30475 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:45 2011] [warn] child process 2253 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:45 2011] [warn] child process 2254 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:45 2011] [warn] child process 2263 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:45 2011] [warn] child process 25720 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:47 2011] [error] Exception KeyError: KeyError(140266853300192,) in <module 'threading' from '/usr/lib64/python2.6/threading.pyc'> ignored [Fri Jul 22 18:06:47 2011] [warn] child process 30475 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:47 2011] [warn] child process 2253 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:47 2011] [warn] child process 2254 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:47 2011] [warn] child process 2263 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:47 2011] [warn] child process 25720 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:47 2011] [error] Exception KeyError: KeyError(140266853300192,) in <module 'threading' from '/usr/lib64/python2.6/threading.pyc'> ignored [Fri Jul 22 18:06:49 2011] [warn] child process 30475 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:49 2011] [warn] child process 2253 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:49 2011] [warn] child process 2254 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:49 2011] [warn] child process 2263 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:49 2011] [warn] child process 25720 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:51 2011] [error] child process 2253 still did not exit, sending a SIGKILL [Fri Jul 22 18:06:51 2011] [error] child process 2254 still did not exit, sending a SIGKILL [Fri Jul 22 18:06:51 2011] [error] child process 2263 still did not exit, sending a SIGKILL [Fri Jul 22 18:06:51 2011] [error] child process 25720 still did not exit, sending a SIGKILL
Created attachment 515116 [details] snips from messages on pulp-server
Looking into the old messages file /var/log/messages-20110724, I found a total of 77 call traces invoked by page_fault from being out of memory. These were invoked by a number of processes, including, but not limited to: python httpd qpidd hald mongod This vm reports 1gb mem and 2gb swap, however it always reports only 100mb mem free and 500 mb swap free at any time. I honestly believe that these problems are a result of running pulp on a significantly under-powered box and not the result of any programmatic errors in pulp or any other process that's being killed. I've attached a few snips of the call traces I saw in messages as dmesg.txt
Scratch part of my previous comment, after rebooting the mem freed up and now I've got to figure out what is leaking memory.
They lowered the grinder threads to 4 (under [yum] in pulp.conf). In build 0.214.
Tested this on Amazon setup with latest iso ( pulp 0.214). All nodes have 7.5G of ram. However during sync following was the mem status of different nodes: ON CDS nodes: ============== [root@ip-10-114-77-254 ~]# cat /proc/meminfo MemTotal: 7646944 kB MemFree: 2287268 kB [root@ip-10-76-229-139 ~]# cat /proc/meminfo MemTotal: 7646944 kB MemFree: 1553824 kB ON rhua node: ============= [root@ip-10-33-2-145 ~]# cat /proc/meminfo MemTotal: 7646944 kB MemFree: 46748 kB However, I didn't get that broken pipe error. But I'll test this again on local setup once and will update the results.
I verified with latest iso ( pulp 0.214) on local setup. This broken pipe error is not reproducible with new build. All nodes are of 1G mem. Here is the memory status while repo sync. On rhua: ========== [root@dhcp201-136 client_entitle]# cat /proc/meminfo MemTotal: 997156 kB MemFree: 60240 kB on CDS: ======= [root@dhcp201-149 ~]# cat /proc/meminfo MemTotal: 997156 kB MemFree: 71272 kB I started the repo sync for following repos: ------ rhui (repo) => l Custom Repositories custom_repo Red Hat Repositories Red Hat Enterprise Linux Server 6 Releases (RPMs) (6Server-x86_64) Red Hat Update Infrastructure 1.2 (RPMs) (5Server-x86_64) Red Hat Update Infrastructure 1.2 (RPMs) (5Server-i386) Red Hat Enterprise Linux Server 6 Updates (RPMs) (6Server-x86_64) ------
moving to release pending
closing out, product released