Beaker hit a MemoryError while handling /csv/action_export, most likely due to the size of the result set. (We may need to file a separate bug about making that code more efficient; it should be able to run without exhausting our heap limit.)

bkr.server ERROR Exception on /csv/action_export [GET]
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/lib/python2.6/site-packages/flask/app.py", line 1479, in full_dispatch_request
    response = self.process_response(response)
  File "/usr/lib/python2.6/site-packages/flask/app.py", line 1691, in process_response
    response = handler(response)
  File "/usr/lib/python2.6/site-packages/bkr/server/wsgi.py", line 113, in commit_or_rollback_session
    session.rollback()
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/scoping.py", line 139, in do
    return getattr(self.registry(), name)(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py", line 583, in rollback
    self.transaction.rollback()
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py", line 411, in rollback
    transaction._rollback_impl()
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py", line 427, in _rollback_impl
    self._restore_snapshot()
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py", line 306, in _restore_snapshot
    for s in self.session.identity_map.all_states():
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/identity.py", line 197, in all_states
    return dict.values(self)
MemoryError: <bound method CSV.action_export of <bkr.server.CSV_import_export.CSV object at 0x7f38b3778c50>>

The more serious problem is that, after this exception, every HTTP request to that worker process failed because it never successfully closed the SQLAlchemy session:

bkr.server.wsgi WARNING Session active when tearing down app context, rolling back
bkr.server.wsgi ERROR Error closing session when tearing down app context
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/bkr/server/wsgi.py", line 121, in close_session
    session.rollback()
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/scoping.py", line 139, in do
    return getattr(self.registry(), name)(*args, **kwargs)
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py", line 583, in rollback
    self.transaction.rollback()
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py", line 411, in rollback
    transaction._rollback_impl()
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py", line 427, in _rollback_impl
    self._restore_snapshot()
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py", line 306, in _restore_snapshot
    for s in self.session.identity_map.all_states():
  File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/identity.py", line 197, in all_states
    return dict.values(self)
MemoryError: <bound method CSV.action_export of <bkr.server.CSV_import_export.CSV object at 0x7f38b3778c50>>

The Flask handler for closing the session needs to be more robust -- it needs to close the session and transaction under all circumstances, or else crash the entire worker process (at least then it would be restarted by mod_wsgi instead of continuing to fail subsequent requests). It may be enough to do finally: session.close(), or even to replace session.rollback() with session.close() entirely. But MemoryError is a tricky case: we need to recover or die *without* allocating anything new...
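The teardown handler might end up looking something like this (a minimal sketch against a generic Flask + SQLAlchemy scoped_session setup, not the actual patch; the engine URL and the abort-the-worker policy are illustrative assumptions):

    import logging
    import os

    from flask import Flask
    from sqlalchemy import create_engine
    from sqlalchemy.orm import scoped_session, sessionmaker

    log = logging.getLogger(__name__)

    app = Flask(__name__)
    engine = create_engine('mysql://...')  # placeholder connection URL
    session = scoped_session(sessionmaker(bind=engine))

    @app.teardown_appcontext
    def close_session(exception=None):
        try:
            # rollback() can itself blow up -- for example the
            # MemoryError raised from _restore_snapshot() in the
            # traceback above.
            session.rollback()
        except Exception:
            log.exception('Error rolling back session when tearing down app context')
        finally:
            try:
                # close() discards the connection and transaction no
                # matter what state the session is in, so the next
                # request starts from a clean slate.
                session.close()
            except BaseException:
                # If even close() fails, this worker's session is
                # permanently wedged and every subsequent request
                # would fail too. Dying here lets mod_wsgi replace
                # the process. os.abort() raises SIGABRT directly,
                # without building new Python objects first.
                os.abort()

Even reaching os.abort() executes a little more bytecode, but it avoids formatting strings or raising fresh exceptions, which is about the best we can do from inside the interpreter.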
If the MemoryError was due to a single large allocation failing, or if enough of the stack has been unwound by the time the handler runs, then new (small) allocations may actually work.
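A quick standalone demonstration of that (nothing Beaker-specific here; the 256 MiB cap is arbitrary):

    import resource

    # Cap the process address space at 256 MiB (arbitrary for the demo).
    limit = 256 * 1024 * 1024
    resource.setrlimit(resource.RLIMIT_AS, (limit, limit))

    try:
        big = bytearray(10 ** 9)  # one huge allocation; fails cleanly
    except MemoryError:
        # The failed allocation was never actually charged to the
        # process, so plenty of headroom remains for small objects.
        print('recovered, small allocations still work: %d' % len('x' * 1000))

When the failure is death by a thousand small allocations instead, there may be no headroom at all, which is the case the teardown handler has to survive.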
This is quite a tricky one to reproduce in the gunicorn dev server. I was trying to fetch the systems CSV while gradually reducing rlimit_as until it no longer succeeded. The problem was that below 660000000, rather than hitting a MemoryError in Python land, the worker would abort with this bizarre message:

libgcc_s.so.1 must be installed for pthread_cancel to work

That turned out to be because the MySQL client libraries do some pthread hackery which involves spawning a new thread that calls pthread_exit() on itself:

http://osxr.org/mysql/source/mysys/my_thr_init.c#0054

But glibc has some hackery of its own: to implement stack unwinding in pthread_exit() it dlopen()s libgcc_s.so so that it can use GCC's stack unwinding machinery:

https://sourceware.org/ml/libc-help/2009-10/msg00023.html

And that dlopen() was failing with ENOMEM because rlimit_as was already exceeded.

Anyway, it turns out I could reproduce the MemoryError by exporting the system key-values CSV with rlimit_as 700000000, I guess because the system key-values CSV is much larger and involves loading more stuff into the SQLAlchemy session. (This is using a production db dump.)
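For anyone who wants to retry this, the per-worker limit can be set from a gunicorn config file via the standard post_fork server hook (the hook itself is real gunicorn API; the byte value is just what worked for me):

    # gunicorn_conf.py
    import resource

    # Address-space cap per worker, in bytes. 700000000 reproduced the
    # MemoryError against my production db dump; tune as needed.
    RLIMIT_AS_BYTES = 700000000

    def post_fork(server, worker):
        # Runs in each worker just after fork(), so the gunicorn
        # master itself is not constrained by the limit.
        resource.setrlimit(resource.RLIMIT_AS,
                           (RLIMIT_AS_BYTES, RLIMIT_AS_BYTES))

Then start the dev server with gunicorn -c gunicorn_conf.py pointing at Beaker's WSGI application as usual.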
On Gerrit: http://gerrit.beaker-project.org/3216
This bug will stay at ON_QA until 0.17.3 passes smoke testing. We decided that independently verifying the fix was not feasible given how difficult it is to reproduce the exact failure scenario. While writing the patch I did verify on my development VM with a production DB dump that the worker process now aborts if session.close() fails due to MemoryError.
Beaker 0.17.3 has been released (https://beaker-project.org/docs/whats-new/release-0.17.html#beaker-0-17-3)