Bug 1422874

Summary: Error handling code in close_session, which tries to prevent serving requests when the application is in a bad state, itself crashes with MemoryError
Product: [Community] Beaker Reporter: Dan Callaghan <dcallagh>
Component: generalAssignee: Dan Callaghan <dcallagh>
Status: CLOSED CURRENTRELEASE QA Contact: Dan Callaghan <dcallagh>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 24CC: dcallagh, mjia, rjoost
Target Milestone: 24.1Keywords: Patch
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-02-28 16:50:35 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Description Dan Callaghan 2017-02-16 12:43:04 UTC
In bug 1120439 we added some handling in the post-request functions to avoid the case where a MemoryError during closing the SQLAlchemy session leaves the worker in an inconsistent state (with an open db transaction and partially closed session) which then causes the worker to fail every subsequent request.

However that error path itself tries to allocate, due to the log message it emits, which can then fail for the same MemoryError leaving the worker still broken:

Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/flask/app.py", line 1836, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/lib/python2.6/site-packages/flask/app.py", line 1825, in wsgi_app
    ctx.auto_pop(error)
  File "/usr/lib/python2.6/site-packages/flask/ctx.py", line 374, in auto_pop
    self.pop(exc)
  File "/usr/lib/python2.6/site-packages/flask/ctx.py", line 366, in pop
    app_ctx.pop(exc)
  File "/usr/lib/python2.6/site-packages/flask/ctx.py", line 175, in pop
    self.app.do_teardown_appcontext(exc)
  File "/usr/lib/python2.6/site-packages/flask/app.py", line 1727, in do_teardown_appcontext
    func(exc)
  File "/usr/lib/python2.6/site-packages/bkr/server/wsgi.py", line 170, in close_session
    log.exception('Error closing session when tearing down app context, aborting')
  File "/usr/lib64/python2.6/logging/__init__.py", line 1096, in exception
    self.error(*((msg,) + args), **{'exc_info': 1})
  File "/usr/lib64/python2.6/logging/__init__.py", line 1090, in error
    self._log(ERROR, msg, args, **kwargs)
  File "/usr/lib64/python2.6/logging/__init__.py", line 1181, in _log
    self.handle(record)
  File "/usr/lib64/python2.6/logging/__init__.py", line 1191, in handle
    self.callHandlers(record)
  File "/usr/lib64/python2.6/logging/__init__.py", line 1228, in callHandlers
    hdlr.handle(record)
  File "/usr/lib64/python2.6/logging/__init__.py", line 687, in handle
    self.emit(record)
  File "/usr/lib/python2.6/site-packages/bkr/log.py", line 47, in emit
    msg = self.format(record)
  File "/usr/lib64/python2.6/logging/__init__.py", line 662, in format
    return fmt.format(record)
  File "/usr/lib64/python2.6/logging/__init__.py", line 452, in format
    record.exc_text = self.formatException(record.exc_info)
  File "/usr/lib64/python2.6/logging/__init__.py", line 424, in formatException
    traceback.print_exception(ei[0], ei[1], ei[2], None, sio)
  File "/usr/lib64/python2.6/traceback.py", line 125, in print_exception
    print_tb(tb, limit, file)
  File "/usr/lib64/python2.6/traceback.py", line 69, in print_tb
    line = linecache.getline(filename, lineno, f.f_globals)
  File "/usr/lib64/python2.6/linecache.py", line 14, in getline
    lines = getlines(filename, module_globals)
  File "/usr/lib64/python2.6/linecache.py", line 40, in getlines
    return updatecache(filename, module_globals)
  File "/usr/lib64/python2.6/linecache.py", line 131, in updatecache
    lines = fp.readlines()
MemoryError: <bound method Root.RPC2 of <bkr.server.controllers.Root object at 0x7f9a24d32150>>

Comment 1 Dan Callaghan 2017-02-16 12:45:21 UTC
We probably need to immediately os.abort() in this case, without trying to log any more messages. Hopefully the original cause was logged or printed elsewhere beforehand anyway...

Comment 2 Dan Callaghan 2017-02-16 15:26:23 UTC
https://gerrit.beaker-project.org/5639

Comment 4 Dan Callaghan 2017-02-28 09:18:05 UTC
Verified in my devel environment as per the Gerrit comments.

Comment 5 Dan Callaghan 2017-02-28 16:50:35 UTC
Beaker 24.1 has been released.