Bug 723534 - RHUA becomes unresponsive and got broken pipe error in apache's error_log
Summary: RHUA becomes unresponsive and got broken pipe error in apache's error_log
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Update Infrastructure for Cloud Providers
Classification: Red Hat
Component: RHUA
Version: 2.0
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
: ---
Assignee: Jay Dobies
QA Contact: wes hayutin
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2011-07-20 13:48 UTC by Sachin Ghai
Modified: 2012-05-31 12:55 UTC (History)
6 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-05-31 12:55:41 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
snips from messages on pulp-server (14.64 KB, text/plain)
2011-07-25 17:44 UTC, Jason Connor
no flags Details

Description Sachin Ghai 2011-07-20 13:48:12 UTC
Description of problem:

The system was idle since last night and today when I started the rhui-manager,
it was unresponsive for a long time and finally got unexpected error on UI. I
got following in rhui.log:


rhui.log:
============
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/rhui/tools/shell.py", line 85, in
safe_listen
    self.listen(clear=first_run)
  File "/usr/lib/python2.6/site-packages/rhui/tools/shell.py", line 111, in
listen
    Shell.listen(self)
  File "/usr/lib/python2.6/site-packages/rhui/common/shell.py", line 191, in
listen
    item.func(*args, **item.kwargs)
  File "/usr/lib/python2.6/site-packages/rhui/tools/screens/sync.py", line 79,
in repo_display
    self._sync_display('-= Repository Synchronization Status =-',
self.pulp.repos_with_sync_status, False)
  File "/usr/lib/python2.6/site-packages/rhui/tools/screens/sync.py", line 272,
in _sync_display
    data_sets = retrieve_func()
  File "/usr/lib/python2.6/site-packages/rhui/tools/pulp_api.py", line 219, in
repos_with_sync_status
    redhat_repos = self.redhat_repo_list()
  File "/usr/lib/python2.6/site-packages/rhui/tools/pulp_api.py", line 164, in
redhat_repo_list
    raise e
ServerRequestError: (500, '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML
2.0//EN">\n<html><head>\n<title>500 Internal Server
Error</title>\n</head><body>\n<h1>Internal Server Error</h1>\n<p>The server
encountered an internal error or\nmisconfiguration and was unable to
complete\nyour request.</p>\n<p>Please contact the server administrator,\n
root@localhost and inform them of the time the error occurred,\nand anything
you might have done that may have\ncaused the error.</p>\n<p>More information
about this error may be available\nin the server error
log.</p>\n<hr>\n<address>Apache/2.2.15 (Red Hat) Server at
dhcp201-175.englab.pnq.redhat.com Port 443</address>\n</body></html>\n', None)


I checked the "pulp-server" status on rhua and found following:

[root@dhcp201-175 ~]# service pulp-server status
mongod dead but subsys locked
[root@dhcp201-175 ~]# 

[root@dhcp201-175 ~]# service mongod status
mongod dead but subsys locked
[root@dhcp201-175 ~]# 


error_log says:
=====================
[root@dhcp201-175 ~]# tail -n 100 /var/log/httpd/error_log
IOError: [Errno 32] Broken pipe
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 341, in
<module>
    main()
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 338, in
main
    process()
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 313, in
process
    sys.stdout.flush()
Traceback (most recent call last):
  File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
IOError  File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 341, in
<module>
    main()
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 338, in
main
    process()
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 313, in
process
    sys.stdout.flush()
: [Errno 32] Broken pipe
IOError: [Errno 32] Broken pipe


ssl_error_log says:
======================
[Tue Jul 19 11:46:47 2011] [error] [client 10.65.201.133] Script timed out
before returning headers: webservices.wsgi

Pulp 0.208 build is installed and without mod_python package.:
==============================================================

[root@dhcp201-175 ~]# rpm -qa | grep pulp
pulp-common-0.0.208-1.el6.noarch
pulp-0.0.208-1.el6.noarch
pulp-client-0.0.208-1.el6.noarch
[root@dhcp201-175 ~]# rpm -qa | grep mod_python
[root@dhcp201-175 ~]# rpm -qa | grep mod_wsgi
mod_wsgi-3.2-3.sslpatch.el6.x86_64
[root@dhcp201-175 ~]# 


[root@dhcp201-133 ~]# rpm -qa | grep rh-rhui-tools
rh-rhui-tools-2.0.40-1.el6.noarch


Version-Release number of selected component (if applicable):
pulp 0.208
rh-rhui-tools 2.0.40

How reproducible:
facing this from last 2 days.

Steps to Reproduce:
Leave the setup idle for a night.
  
Actual results:
- IOError: [Errno 32] Broken pipe in apache's log file
- mongod crashed.



Expected results:


Additional info:

I'm reporting this because I'm getting this very frequently.

Comment 1 Sachin Ghai 2011-07-20 13:49:57 UTC
Mongodb logs are fine..

[root@dhcp201-175 ~]# tail -f /var/log/pulp/db.log 
migration to data model version 15 started
migration to data model version 15 complete
migration to data model version 16 started
migration to data model version 16 complete
migration to data model version 17 started
migration to data model version 17 complete
migration to data model version 18 started
migration to data model version 18 complete
migration to data model version 19 started
migration to data model version 19 complete

Comment 2 Jeff Ortel 2011-07-21 15:05:14 UTC
The broken pipe is the ActiveObject child process in grinder trying to talk back to the parent process and the pipe has been broken.  The most likely cause is that the parent process (which is the wsgi  pulp process) is dead.  I believe this is a casualty of something killing one the mod_wsgi or httpd processes and not the cause.

Comment 3 Sachin Ghai 2011-07-22 05:53:40 UTC
Today I found the similar broken pipe error in the apache's error_log. But this time mongod was running.

IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 341, in <module>
    main()
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 338, in main
    process()
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 313, in process
    sys.stdout.flush()
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 341, in <module>
    main()
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 338, in main
    process()
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 313, in process
    sys.stdout.flush()
IOError: [Errno 32] Broken pipe

[root@dhcp201-175 ~]# service pulp-server status
mongod (pid 2153) is running...
httpd (pid  2191) is running...
qpidd (pid  2156) is running...
[root@dhcp201-175 ~]#

Comment 4 Sachin Ghai 2011-07-22 13:08:25 UTC
Not sure, if the following info related or not, but still wanted to share here:

CDN to RHUA sync failed with following exception:

2011-07-22 17:51:24,235 9440:140265603970816: grinder.RepoFetch:ERROR: RepoFetch:126 Caught exception when trying to fetch content from [https://cdn.redhat.com/content/dist/rhel/rhui/server/5Server/x86_64/rhui/1.1/os]: Cannot retrieve repository metadata (repomd.xml) for repository: . Please verify its path and try again
2011-07-22 17:52:15,962 9440:140266092168960: grinder.RepoFetch:INFO: RepoFetch:481 Processed <5163> items in [2280] seconds
2011-07-22 17:52:15,986 9440:140266092168960: grinder.RepoFetch:INFO: RepoFetch:484 Cleaning any orphaned packages..
2011-07-22 17:52:15,963 9440:140265572501248: grinder.RepoFetch:INFO: RepoFetch:440 fetchYumRepo() basepath = /var/lib/pulp//repos/content/dist/rhel/rhui/server-6/updates/6Server/x86_64/optional/os
2011-07-22 17:52:30,356 9440:140265572501248: grinder.RepoFetch:INFO: RepoFetch:457 Fetching repo metadata...
2011-07-22 17:53:15,910 9440:140265572501248: grinder.RepoFetch:ERROR: RepoFetch:126 Caught exception when trying to fetch content from [https://cdn.redhat.com/content/dist/rhel/rhui/server-6/updates/6Server/x86_64/optional/os]: Cannot retrieve repository metadata (repomd.xml) for repository: . Please verify its path and try again


Please note that these repos were already synched and this happened when next sync schedule occurred.


When I restarted the pulp-server, I found following: 
[root@dhcp201-175 ~]# service pulp-server restart
Stopping httpd:                                            [FAILED]
Stopping Qpid AMQP daemon:                                 [  OK  ]
Stopping mongod:                                           [  OK  ]
Starting mongod:                                           [  OK  ]
Starting Qpid AMQP daemon:                                 [  OK  ]
Starting httpd:                                            [  OK  ]



/var/log/httpd/error_log says:
===============================
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/usr/lib64/python2.6/runpy.py", line 122, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.6/runpy.py", line 34, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 341, in <module>
    main()
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 338, in main
    process()
  File "/usr/lib/python2.6/site-packages/grinder/activeobject.py", line 313, in process
    sys.stdout.flush()
IOError: [Errno 32] Broken pipe
[Fri Jul 22 18:06:43 2011] [error] Exception KeyError: KeyError(140266853300192,) in <module 'threading' from '/usr/lib64/python2.6/threading.pyc'> ignored
[Fri Jul 22 18:06:43 2011] [error] Exception KeyError: KeyError(140266853300192,) in <module 'threading' from '/usr/lib64/python2.6/threading.pyc'> ignored
[Fri Jul 22 18:06:44 2011] [error] Exception KeyError: KeyError(140266853300192,) in <module 'threading' from '/usr/lib64/python2.6/threading.pyc'> ignored
[Fri Jul 22 18:06:45 2011] [warn] child process 30475 still did not exit, sending a SIGTERM
[Fri Jul 22 18:06:45 2011] [warn] child process 2253 still did not exit, sending a SIGTERM
[Fri Jul 22 18:06:45 2011] [warn] child process 2254 still did not exit, sending a SIGTERM
[Fri Jul 22 18:06:45 2011] [warn] child process 2263 still did not exit, sending a SIGTERM
[Fri Jul 22 18:06:45 2011] [warn] child process 25720 still did not exit, sending a SIGTERM
[Fri Jul 22 18:06:47 2011] [error] Exception KeyError: KeyError(140266853300192,) in <module 'threading' from '/usr/lib64/python2.6/threading.pyc'> ignored
[Fri Jul 22 18:06:47 2011] [warn] child process 30475 still did not exit, sending a SIGTERM
[Fri Jul 22 18:06:47 2011] [warn] child process 2253 still did not exit, sending a SIGTERM
[Fri Jul 22 18:06:47 2011] [warn] child process 2254 still did not exit, sending a SIGTERM
[Fri Jul 22 18:06:47 2011] [warn] child process 2263 still did not exit, sending a SIGTERM
[Fri Jul 22 18:06:47 2011] [warn] child process 25720 still did not exit, sending a SIGTERM
[Fri Jul 22 18:06:47 2011] [error] Exception KeyError: KeyError(140266853300192,) in <module 'threading' from '/usr/lib64/python2.6/threading.pyc'> ignored
[Fri Jul 22 18:06:49 2011] [warn] child process 30475 still did not exit, sending a SIGTERM
[Fri Jul 22 18:06:49 2011] [warn] child process 2253 still did not exit, sending a SIGTERM
[Fri Jul 22 18:06:49 2011] [warn] child process 2254 still did not exit, sending a SIGTERM
[Fri Jul 22 18:06:49 2011] [warn] child process 2263 still did not exit, sending a SIGTERM
[Fri Jul 22 18:06:49 2011] [warn] child process 25720 still did not exit, sending a SIGTERM
[Fri Jul 22 18:06:51 2011] [error] child process 2253 still did not exit, sending a SIGKILL
[Fri Jul 22 18:06:51 2011] [error] child process 2254 still did not exit, sending a SIGKILL
[Fri Jul 22 18:06:51 2011] [error] child process 2263 still did not exit, sending a SIGKILL
[Fri Jul 22 18:06:51 2011] [error] child process 25720 still did not exit, sending a SIGKILL

Comment 5 Jason Connor 2011-07-25 17:44:39 UTC
Created attachment 515116 [details]
snips from messages on pulp-server

Comment 6 Jason Connor 2011-07-25 17:45:17 UTC
Looking into the old messages file /var/log/messages-20110724, I found a total of 77 call traces invoked by page_fault from being out of memory. These were invoked by a number of processes, including, but not limited to:
 python
 httpd
 qpidd
 hald
 mongod

This vm reports 1gb mem and 2gb swap, however it always reports only 100mb mem free and 500 mb swap free at any time.

I honestly believe that these problems are a result of running pulp on a significantly under-powered box and not the result of any programmatic errors in pulp or any other process that's being killed.

I've attached a few snips of the call traces I saw in messages as dmesg.txt

Comment 7 Jason Connor 2011-07-25 18:04:18 UTC
Scratch part of my previous comment, after rebooting the mem freed up and now I've got to figure out what is leaking memory.

Comment 8 Jay Dobies 2011-07-27 13:21:21 UTC
They lowered the grinder threads to 4 (under [yum] in pulp.conf). In build 0.214.

Comment 9 Sachin Ghai 2011-07-27 14:28:40 UTC
Tested this on Amazon setup with latest iso ( pulp 0.214). All nodes have 7.5G of ram. However during sync following was the mem status of different nodes:

ON CDS nodes:
==============
[root@ip-10-114-77-254 ~]# cat /proc/meminfo
MemTotal:        7646944 kB
MemFree:         2287268 kB

[root@ip-10-76-229-139 ~]# cat /proc/meminfo
MemTotal:        7646944 kB
MemFree:         1553824 kB 

ON rhua node:
=============
[root@ip-10-33-2-145 ~]# cat /proc/meminfo
 MemTotal:        7646944 kB
 MemFree:           46748 kB



However, I didn't get that broken pipe error. But I'll test this again on local setup once and will update the results.

Comment 10 Sachin Ghai 2011-07-28 12:53:50 UTC
I verified with latest iso ( pulp 0.214) on local setup. This broken pipe error is not reproducible with new build.

All nodes are of 1G mem. Here is the memory status while repo sync. 

On rhua:
==========

[root@dhcp201-136 client_entitle]# cat /proc/meminfo 
MemTotal:         997156 kB
MemFree:           60240 kB


on CDS:
=======
[root@dhcp201-149 ~]# cat /proc/meminfo 
MemTotal:         997156 kB
MemFree:           71272 kB


I started the repo sync for following repos:

------
rhui (repo) => l

Custom Repositories
  custom_repo

Red Hat Repositories
  Red Hat Enterprise Linux Server 6 Releases (RPMs) (6Server-x86_64)
  Red Hat Update Infrastructure 1.2 (RPMs) (5Server-x86_64)
  Red Hat Update Infrastructure 1.2 (RPMs) (5Server-i386)
  Red Hat Enterprise Linux Server 6 Updates (RPMs) (6Server-x86_64)

------

Comment 11 wes hayutin 2011-08-01 21:40:04 UTC
moving to release pending

Comment 12 wes hayutin 2012-05-31 12:55:41 UTC
closing out, product released


Note You need to log in before you can comment on or make changes to this bug.