Bug 723534
Summary: | RHUA becomes unresponsive and got broken pipe error in apache's error_log | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Update Infrastructure for Cloud Providers | Reporter: | Sachin Ghai <sghai> | ||||
Component: | RHUA | Assignee: | Jay Dobies <jason.dobies> | ||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | wes hayutin <whayutin> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 2.0 | CC: | jconnor, jortel, jslagle, kbidarka, sghai, tsanders | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2012-05-31 12:55:41 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Sachin Ghai
2011-07-20 13:48:12 UTC
Mongodb logs are fine.. [root@dhcp201-175 ~]# tail -f /var/log/pulp/db.log migration to data model version 15 started migration to data model version 15 complete migration to data model version 16 started migration to data model version 16 complete migration to data model version 17 started migration to data model version 17 complete migration to data model version 18 started migration to data model version 18 complete migration to data model version 19 started migration to data model version 19 complete The broken pipe is the ActiveObject child process in grinder trying to talk back to the parent process and the pipe has been broken. The most likely cause is that the parent process (which is the wsgi pulp process) is dead. I believe this is a casualty of something killing one the mod_wsgi or httpd processes and not the cause. Today I found the similar broken pipe error in the apache's error_log. But this time mongod was running. IOError: [Errno 32] Broken pipe Traceback (most recent call last): File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code exec code in run_globals File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 341, in <module> main() File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 338, in main process() File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 313, in process sys.stdout.flush() IOError: [Errno 32] Broken pipe Traceback (most recent call last): File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code exec code in run_globals File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 341, in <module> main() File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 338, in main process() File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 313, in process sys.stdout.flush() IOError: [Errno 32] Broken pipe [root@dhcp201-175 ~]# service pulp-server status mongod (pid 2153) is running... httpd (pid 2191) is running... qpidd (pid 2156) is running... [root@dhcp201-175 ~]# Not sure, if the following info related or not, but still wanted to share here: CDN to RHUA sync failed with following exception: 2011-07-22 17:51:24,235 9440:140265603970816: grinder.RepoFetch:ERROR: RepoFetch:126 Caught exception when trying to fetch content from [https://cdn.redhat.com/content/dist/rhel/rhui/server/5Server/x86_64/rhui/1.1/os]: Cannot retrieve repository metadata (repomd.xml) for repository: . Please verify its path and try again 2011-07-22 17:52:15,962 9440:140266092168960: grinder.RepoFetch:INFO: RepoFetch:481 Processed <5163> items in [2280] seconds 2011-07-22 17:52:15,986 9440:140266092168960: grinder.RepoFetch:INFO: RepoFetch:484 Cleaning any orphaned packages.. 2011-07-22 17:52:15,963 9440:140265572501248: grinder.RepoFetch:INFO: RepoFetch:440 fetchYumRepo() basepath = /var/lib/pulp//repos/content/dist/rhel/rhui/server-6/updates/6Server/x86_64/optional/os 2011-07-22 17:52:30,356 9440:140265572501248: grinder.RepoFetch:INFO: RepoFetch:457 Fetching repo metadata... 2011-07-22 17:53:15,910 9440:140265572501248: grinder.RepoFetch:ERROR: RepoFetch:126 Caught exception when trying to fetch content from [https://cdn.redhat.com/content/dist/rhel/rhui/server-6/updates/6Server/x86_64/optional/os]: Cannot retrieve repository metadata (repomd.xml) for repository: . Please verify its path and try again Please note that these repos were already synched and this happened when next sync schedule occurred. When I restarted the pulp-server, I found following: [root@dhcp201-175 ~]# service pulp-server restart Stopping httpd: [FAILED] Stopping Qpid AMQP daemon: [ OK ] Stopping mongod: [ OK ] Starting mongod: [ OK ] Starting Qpid AMQP daemon: [ OK ] Starting httpd: [ OK ] /var/log/httpd/error_log says: =============================== IOError: [Errno 32] Broken pipe Traceback (most recent call last): File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code exec code in run_globals File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 341, in <module> main() File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 338, in main process() File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 313, in process sys.stdout.flush() IOError: [Errno 32] Broken pipe [Fri Jul 22 18:06:43 2011] [error] Exception KeyError: KeyError(140266853300192,) in <module 'threading' from '/usr/lib64/python2.6/threading.pyc'> ignored [Fri Jul 22 18:06:43 2011] [error] Exception KeyError: KeyError(140266853300192,) in <module 'threading' from '/usr/lib64/python2.6/threading.pyc'> ignored [Fri Jul 22 18:06:44 2011] [error] Exception KeyError: KeyError(140266853300192,) in <module 'threading' from '/usr/lib64/python2.6/threading.pyc'> ignored [Fri Jul 22 18:06:45 2011] [warn] child process 30475 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:45 2011] [warn] child process 2253 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:45 2011] [warn] child process 2254 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:45 2011] [warn] child process 2263 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:45 2011] [warn] child process 25720 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:47 2011] [error] Exception KeyError: KeyError(140266853300192,) in <module 'threading' from '/usr/lib64/python2.6/threading.pyc'> ignored [Fri Jul 22 18:06:47 2011] [warn] child process 30475 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:47 2011] [warn] child process 2253 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:47 2011] [warn] child process 2254 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:47 2011] [warn] child process 2263 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:47 2011] [warn] child process 25720 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:47 2011] [error] Exception KeyError: KeyError(140266853300192,) in <module 'threading' from '/usr/lib64/python2.6/threading.pyc'> ignored [Fri Jul 22 18:06:49 2011] [warn] child process 30475 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:49 2011] [warn] child process 2253 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:49 2011] [warn] child process 2254 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:49 2011] [warn] child process 2263 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:49 2011] [warn] child process 25720 still did not exit, sending a SIGTERM [Fri Jul 22 18:06:51 2011] [error] child process 2253 still did not exit, sending a SIGKILL [Fri Jul 22 18:06:51 2011] [error] child process 2254 still did not exit, sending a SIGKILL [Fri Jul 22 18:06:51 2011] [error] child process 2263 still did not exit, sending a SIGKILL [Fri Jul 22 18:06:51 2011] [error] child process 25720 still did not exit, sending a SIGKILL Created attachment 515116 [details]
snips from messages on pulp-server
Looking into the old messages file /var/log/messages-20110724, I found a total of 77 call traces invoked by page_fault from being out of memory. These were invoked by a number of processes, including, but not limited to: python httpd qpidd hald mongod This vm reports 1gb mem and 2gb swap, however it always reports only 100mb mem free and 500 mb swap free at any time. I honestly believe that these problems are a result of running pulp on a significantly under-powered box and not the result of any programmatic errors in pulp or any other process that's being killed. I've attached a few snips of the call traces I saw in messages as dmesg.txt Scratch part of my previous comment, after rebooting the mem freed up and now I've got to figure out what is leaking memory. They lowered the grinder threads to 4 (under [yum] in pulp.conf). In build 0.214. Tested this on Amazon setup with latest iso ( pulp 0.214). All nodes have 7.5G of ram. However during sync following was the mem status of different nodes: ON CDS nodes: ============== [root@ip-10-114-77-254 ~]# cat /proc/meminfo MemTotal: 7646944 kB MemFree: 2287268 kB [root@ip-10-76-229-139 ~]# cat /proc/meminfo MemTotal: 7646944 kB MemFree: 1553824 kB ON rhua node: ============= [root@ip-10-33-2-145 ~]# cat /proc/meminfo MemTotal: 7646944 kB MemFree: 46748 kB However, I didn't get that broken pipe error. But I'll test this again on local setup once and will update the results. I verified with latest iso ( pulp 0.214) on local setup. This broken pipe error is not reproducible with new build. All nodes are of 1G mem. Here is the memory status while repo sync. On rhua: ========== [root@dhcp201-136 client_entitle]# cat /proc/meminfo MemTotal: 997156 kB MemFree: 60240 kB on CDS: ======= [root@dhcp201-149 ~]# cat /proc/meminfo MemTotal: 997156 kB MemFree: 71272 kB I started the repo sync for following repos: ------ rhui (repo) => l Custom Repositories custom_repo Red Hat Repositories Red Hat Enterprise Linux Server 6 Releases (RPMs) (6Server-x86_64) Red Hat Update Infrastructure 1.2 (RPMs) (5Server-x86_64) Red Hat Update Infrastructure 1.2 (RPMs) (5Server-i386) Red Hat Enterprise Linux Server 6 Updates (RPMs) (6Server-x86_64) ------ moving to release pending closing out, product released |