Bug 1037813 - Importing a large CSV file may cause MemoryError
Summary: Importing a large CSV file may cause MemoryError
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Beaker
Classification: Retired
Component: general
Version: develop
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: unspecified
Target Milestone: ---
Assignee: beaker-dev-list
QA Contact: tools-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-12-03 20:49 UTC by andrew
Modified: 2020-10-21 14:16 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-21 14:16:07 UTC



Description andrew 2013-12-03 20:49:39 UTC
Description of problem:
When using Beaker's CSV import function to change the owner of a system, the server returns a 500 error.

Version-Release number of selected component (if applicable):


How reproducible: 50%


Steps to Reproduce:


head -n 1 system.csv > csv-test.csv
grep z10-02 system.csv >> csv-test.csv 
sed -i 's/alemay/admin/g' csv-test.csv

Actual results: 

500 Internal error

The server encountered an unexpected condition which prevented it from fulfilling the request.

Expected results: No error


Additional info: The import eventually succeeded after about 5 attempts, spaced about 2 minutes apart.

Comment 4 Dan Callaghan 2013-12-04 00:34:00 UTC
The server logs show a number of MemoryErrors related to CSV import. I assume these are the errors Andrew was seeing.

It's possible that the CSV import code was using so much memory that it was hitting the address space limit, but I think it's more likely that the system was under memory pressure at the time.

Dec  3 04:08:01 beaker-02 beaker-server[564]: cherrypy.msg INFO : Traceback (most recent call last):
Dec  3 04:08:01 beaker-02 beaker-server[564]:    File "/usr/lib/python2.6/site-packages/CherryPy-2.3.0-py2.6.egg/cherrypy/filters/__init__.py", line 145, in applyFilters
Dec  3 04:08:01 beaker-02 beaker-server[564]:      method()
Dec  3 04:08:01 beaker-02 beaker-server[564]:    File "/usr/lib/python2.6/site-packages/turbogears/database.py", line 556, in on_end_resource
Dec  3 04:08:01 beaker-02 beaker-server[564]:      session.expunge_all()
Dec  3 04:08:01 beaker-02 beaker-server[564]:    File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/scoping.py", line 139, in do
Dec  3 04:08:01 beaker-02 beaker-server[564]:      return getattr(self.registry(), name)(*args, **kwargs)
Dec  3 04:08:01 beaker-02 beaker-server[564]:    File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py", line 760, in expunge_all
Dec  3 04:08:01 beaker-02 beaker-server[564]:      for state in self.identity_map.all_states() + list(self._new):
Dec  3 04:08:01 beaker-02 beaker-server[564]:  MemoryError: <bound method CSV.action_import of <bkr.server.CSV_import_export.CSV object at 0x7fcdbaf77f90>>

Dec  3 04:08:01 beaker-02 beaker-server[564]: cherrypy.msg INFO HTTP: Traceback (most recent call last):
Dec  3 04:08:01 beaker-02 beaker-server[564]:    File "/usr/lib/python2.6/site-packages/CherryPy-2.3.0-py2.6.egg/cherrypy/_cphttptools.py", line 121, in _run
Dec  3 04:08:01 beaker-02 beaker-server[564]:      self.main()
Dec  3 04:08:01 beaker-02 beaker-server[564]:    File "/usr/lib/python2.6/site-packages/CherryPy-2.3.0-py2.6.egg/cherrypy/_cphttptools.py", line 264, in main
Dec  3 04:08:01 beaker-02 beaker-server[564]:      body = page_handler(*virtual_path, **self.params)
Dec  3 04:08:01 beaker-02 beaker-server[564]:    File "<string>", line 3, in action_import
Dec  3 04:08:01 beaker-02 beaker-server[564]:    File "/usr/lib/python2.6/site-packages/turbogears/controllers.py", line 361, in expose
Dec  3 04:08:01 beaker-02 beaker-server[564]:      *args, **kw)
Dec  3 04:08:01 beaker-02 beaker-server[564]:    File "<generated code>", line 0, in run_with_transaction
Dec  3 04:08:01 beaker-02 beaker-server[564]:    File "/usr/lib/python2.6/site-packages/peak/rules/core.py", line 153, in __call__
Dec  3 04:08:01 beaker-02 beaker-server[564]:      return self.body(*args, **kw)
Dec  3 04:08:01 beaker-02 beaker-server[564]:    File "/usr/lib/python2.6/site-packages/turbogears/database.py", line 483, in sa_rwt
Dec  3 04:08:01 beaker-02 beaker-server[564]:      session.close()
Dec  3 04:08:01 beaker-02 beaker-server[564]:    File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/scoping.py", line 139, in do
Dec  3 04:08:01 beaker-02 beaker-server[564]:      return getattr(self.registry(), name)(*args, **kwargs)
Dec  3 04:08:01 beaker-02 beaker-server[564]:    File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py", line 741, in close
Dec  3 04:08:01 beaker-02 beaker-server[564]:      self.expunge_all()
Dec  3 04:08:01 beaker-02 beaker-server[564]:    File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py", line 760, in expunge_all
Dec  3 04:08:01 beaker-02 beaker-server[564]:      for state in self.identity_map.all_states() + list(self._new):
Dec  3 04:08:01 beaker-02 beaker-server[564]:  MemoryError: <bound method CSV.action_import of <bkr.server.CSV_import_export.CSV object at 0x7fcdbaf77f90>>

Dec  3 04:08:02 beaker-02 beaker-server[564]: cherrypy.msg INFO : Traceback (most recent call last):
Dec  3 04:08:02 beaker-02 beaker-server[564]:    File "/usr/lib/python2.6/site-packages/CherryPy-2.3.0-py2.6.egg/cherrypy/filters/__init__.py", line 145, in applyFilters
Dec  3 04:08:02 beaker-02 beaker-server[564]:      method()
Dec  3 04:08:02 beaker-02 beaker-server[564]:    File "/usr/lib/python2.6/site-packages/turbogears/database.py", line 556, in on_end_resource
Dec  3 04:08:02 beaker-02 beaker-server[564]:      session.expunge_all()
Dec  3 04:08:02 beaker-02 beaker-server[564]:    File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/scoping.py", line 139, in do
Dec  3 04:08:02 beaker-02 beaker-server[564]:      return getattr(self.registry(), name)(*args, **kwargs)
Dec  3 04:08:02 beaker-02 beaker-server[564]:    File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py", line 760, in expunge_all
Dec  3 04:08:02 beaker-02 beaker-server[564]:      for state in self.identity_map.all_states() + list(self._new):
Dec  3 04:08:02 beaker-02 beaker-server[564]:    File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/identity.py", line 197, in all_states
Dec  3 04:08:02 beaker-02 beaker-server[564]:      return dict.values(self)
Dec  3 04:08:02 beaker-02 beaker-server[564]:  MemoryError: <bound method CSV.action_import of <bkr.server.CSV_import_export.CSV object at 0x7fcdbaf77f90>>

Comment 5 Dan Callaghan 2013-12-10 07:16:55 UTC
Further investigation suggests that this was caused by memory pressure on the Beaker server rather than a bug in Beaker itself.

Comment 12 Nick Coghlan 2014-01-22 00:57:53 UTC
When importing a CSV, the entire file is loaded into the DB as a single transaction, specifically so that you don't get a situation where some rows are imported while others aren't. It's possible that this approach consumes excessive memory given a sufficiently large CSV file.

If we can get a copy of a CSV file that exhibits the problem, we can investigate the memory usage in a more controlled environment.
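
To illustrate the all-or-nothing behaviour described above, here is a minimal sketch using the standard library's sqlite3 module. It is not Beaker's actual import code: the `system` table, its `fqdn` and `owner` columns, and the `import_csv_atomically` helper are all hypothetical names chosen for the example. Every parsed row is held in memory until the transaction ends, which is the property suspected of driving memory use.

```python
import csv
import io
import sqlite3


def import_csv_atomically(conn, csv_text):
    """Apply every row of csv_text in one transaction, or none of them.

    Hypothetical helper for illustration; not part of Beaker.
    """
    # The whole file is parsed and kept in memory before anything is written.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    try:
        # As a context manager, the connection commits on success and
        # rolls back if an exception escapes the block.
        with conn:
            for row in rows:
                if not row["owner"]:
                    raise ValueError("missing owner for %s" % row["fqdn"])
                conn.execute(
                    "UPDATE system SET owner = ? WHERE fqdn = ?",
                    (row["owner"], row["fqdn"]),
                )
    except ValueError:
        return False
    return True


# Hypothetical schema and data, assumed only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE system (fqdn TEXT PRIMARY KEY, owner TEXT)")
conn.executemany(
    "INSERT INTO system VALUES (?, ?)",
    [("a.example.com", "alemay"), ("b.example.com", "alemay")],
)
conn.commit()
```

If any row is rejected, the rows updated before it are rolled back as well, so the table is never left in a partially imported state.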

Comment 13 Nick Coghlan 2014-01-22 01:00:22 UTC
If this theory is correct, splitting a single large CSV update into several smaller uploads may serve as a near-term workaround.
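
The splitting workaround could be scripted along these lines. This is a sketch only: `split_csv` and `_render` are hypothetical helpers, the default chunk size of 25 rows is an arbitrary choice, and the code assumes the first line of the file is a header row that must be repeated in every chunk.

```python
import csv
import io


def _render(header, rows):
    """Serialise a header plus data rows back into CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


def split_csv(csv_text, rows_per_chunk=25):
    """Split CSV text into smaller CSV texts, each repeating the header.

    Hypothetical helper for illustration; not part of Beaker.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return []  # empty input: nothing to upload
    chunks = []
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) == rows_per_chunk:
            chunks.append(_render(header, batch))
            batch = []
    if batch:
        # Emit the final, possibly shorter, chunk.
        chunks.append(_render(header, batch))
    return chunks
```

Each returned string can be saved to its own file and uploaded separately. Note that this gives up the all-or-nothing guarantee across the whole change: if a later chunk fails, earlier chunks will already have been applied.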

Comment 15 Nick Coghlan 2014-01-23 00:31:12 UTC
Andrew's sample file only contains about 75 lines, and breaking it down into smaller 15-30 line blocks does indeed let the whole change be processed across multiple uploads.

We regularly load many more systems than that into memory in order to process the recipe queue, so I think this calls for an investigation into how our current CSV import process is managing to multiply the impact of a relatively small upload to the point where we chew through a GiB or more of RAM.

