| Summary: | Importing a large CSV file may cause MemoryError | | |
|---|---|---|---|
| Product: | [Retired] Beaker | Reporter: | andrew <alemay> |
| Component: | general | Assignee: | beaker-dev-list |
| Status: | CLOSED WONTFIX | QA Contact: | tools-bugs <tools-bugs> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | low | | |
| Version: | develop | CC: | alemay, jburke, qwan, tools-bugs |
| Target Milestone: | --- | Keywords: | Reopened, Triaged |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-10-21 14:16:07 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
|
Description
andrew 2013-12-03 20:49:39 UTC
The server logs show a number of MemoryErrors related to CSV import; I assume these are the errors which Andrew was seeing. It's possible that the CSV import code was using so much memory that it was hitting the address space limit, but I think it's more likely that the system was under memory pressure at the time.

```
Dec 3 04:08:01 beaker-02 beaker-server[564]: cherrypy.msg INFO : Traceback (most recent call last):
Dec 3 04:08:01 beaker-02 beaker-server[564]:   File "/usr/lib/python2.6/site-packages/CherryPy-2.3.0-py2.6.egg/cherrypy/filters/__init__.py", line 145, in applyFilters
Dec 3 04:08:01 beaker-02 beaker-server[564]:     method()
Dec 3 04:08:01 beaker-02 beaker-server[564]:   File "/usr/lib/python2.6/site-packages/turbogears/database.py", line 556, in on_end_resource
Dec 3 04:08:01 beaker-02 beaker-server[564]:     session.expunge_all()
Dec 3 04:08:01 beaker-02 beaker-server[564]:   File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/scoping.py", line 139, in do
Dec 3 04:08:01 beaker-02 beaker-server[564]:     return getattr(self.registry(), name)(*args, **kwargs)
Dec 3 04:08:01 beaker-02 beaker-server[564]:   File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py", line 760, in expunge_all
Dec 3 04:08:01 beaker-02 beaker-server[564]:     for state in self.identity_map.all_states() + list(self._new):
Dec 3 04:08:01 beaker-02 beaker-server[564]: MemoryError: <bound method CSV.action_import of <bkr.server.CSV_import_export.CSV object at 0x7fcdbaf77f90>>
Dec 3 04:08:01 beaker-02 beaker-server[564]: cherrypy.msg INFO HTTP: Traceback (most recent call last):
Dec 3 04:08:01 beaker-02 beaker-server[564]:   File "/usr/lib/python2.6/site-packages/CherryPy-2.3.0-py2.6.egg/cherrypy/_cphttptools.py", line 121, in _run
Dec 3 04:08:01 beaker-02 beaker-server[564]:     self.main()
Dec 3 04:08:01 beaker-02 beaker-server[564]:   File "/usr/lib/python2.6/site-packages/CherryPy-2.3.0-py2.6.egg/cherrypy/_cphttptools.py", line 264, in main
Dec 3 04:08:01 beaker-02 beaker-server[564]:     body = page_handler(*virtual_path, **self.params)
Dec 3 04:08:01 beaker-02 beaker-server[564]:   File "<string>", line 3, in action_import
Dec 3 04:08:01 beaker-02 beaker-server[564]:   File "/usr/lib/python2.6/site-packages/turbogears/controllers.py", line 361, in expose
Dec 3 04:08:01 beaker-02 beaker-server[564]:     *args, **kw)
Dec 3 04:08:01 beaker-02 beaker-server[564]:   File "<generated code>", line 0, in run_with_transaction
Dec 3 04:08:01 beaker-02 beaker-server[564]:   File "/usr/lib/python2.6/site-packages/peak/rules/core.py", line 153, in __call__
Dec 3 04:08:01 beaker-02 beaker-server[564]:     return self.body(*args, **kw)
Dec 3 04:08:01 beaker-02 beaker-server[564]:   File "/usr/lib/python2.6/site-packages/turbogears/database.py", line 483, in sa_rwt
Dec 3 04:08:01 beaker-02 beaker-server[564]:     session.close()
Dec 3 04:08:01 beaker-02 beaker-server[564]:   File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/scoping.py", line 139, in do
Dec 3 04:08:01 beaker-02 beaker-server[564]:     return getattr(self.registry(), name)(*args, **kwargs)
Dec 3 04:08:01 beaker-02 beaker-server[564]:   File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py", line 741, in close
Dec 3 04:08:01 beaker-02 beaker-server[564]:     self.expunge_all()
Dec 3 04:08:01 beaker-02 beaker-server[564]:   File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py", line 760, in expunge_all
Dec 3 04:08:01 beaker-02 beaker-server[564]:     for state in self.identity_map.all_states() + list(self._new):
Dec 3 04:08:01 beaker-02 beaker-server[564]: MemoryError: <bound method CSV.action_import of <bkr.server.CSV_import_export.CSV object at 0x7fcdbaf77f90>>
Dec 3 04:08:02 beaker-02 beaker-server[564]: cherrypy.msg INFO : Traceback (most recent call last):
Dec 3 04:08:02 beaker-02 beaker-server[564]:   File "/usr/lib/python2.6/site-packages/CherryPy-2.3.0-py2.6.egg/cherrypy/filters/__init__.py", line 145, in applyFilters
Dec 3 04:08:02 beaker-02 beaker-server[564]:     method()
Dec 3 04:08:02 beaker-02 beaker-server[564]:   File "/usr/lib/python2.6/site-packages/turbogears/database.py", line 556, in on_end_resource
Dec 3 04:08:02 beaker-02 beaker-server[564]:     session.expunge_all()
Dec 3 04:08:02 beaker-02 beaker-server[564]:   File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/scoping.py", line 139, in do
Dec 3 04:08:02 beaker-02 beaker-server[564]:     return getattr(self.registry(), name)(*args, **kwargs)
Dec 3 04:08:02 beaker-02 beaker-server[564]:   File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/session.py", line 760, in expunge_all
Dec 3 04:08:02 beaker-02 beaker-server[564]:     for state in self.identity_map.all_states() + list(self._new):
Dec 3 04:08:02 beaker-02 beaker-server[564]:   File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/identity.py", line 197, in all_states
Dec 3 04:08:02 beaker-02 beaker-server[564]:     return dict.values(self)
Dec 3 04:08:02 beaker-02 beaker-server[564]: MemoryError: <bound method CSV.action_import of <bkr.server.CSV_import_export.CSV object at 0x7fcdbaf77f90>>
```

Further investigation suggests that this was caused by memory pressure on the Beaker server rather than a bug in Beaker itself. When importing a CSV, the entire file is loaded into the DB as a single transaction, specifically so that you don't get a situation where some rows are imported while others aren't. It's possible that this approach consumes excessive memory given a sufficiently large CSV file. If we can get a copy of a CSV file that exhibits the problem, we can investigate the memory usage in a more controlled environment.

If this theory is correct, splitting a single large CSV update into several smaller uploads may serve as a near-term workaround. Andrew's sample file only contains about 75 lines, and breaking it down into smaller 15-30 line blocks does indeed let the whole change be processed across multiple uploads.
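The splitting workaround described above can be sketched as a small helper that breaks one large CSV upload into standalone chunks, each repeating the header row so every chunk is a valid upload on its own. This is a minimal illustration assuming a plain header-plus-rows layout, not Beaker's actual import code or schema:

```python
import csv
import io

def split_csv(text, rows_per_chunk=25):
    """Split one large CSV upload into several smaller ones.

    Each chunk repeats the header row so that it can be uploaded
    independently, matching the "several smaller uploads" workaround.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    chunks = []
    for start in range(0, len(data), rows_per_chunk):
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(header)
        writer.writerows(data[start:start + rows_per_chunk])
        chunks.append(buf.getvalue())
    return chunks
```

A 75-row file split with `rows_per_chunk=25` yields three uploads, in line with the 15-30 line blocks that worked for Andrew's sample file.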
We regularly load many more systems than that into memory in order to process the recipe queue, so this calls for an investigation into how the current CSV import process manages to multiply the impact of a relatively small upload to the point where it chews through a GiB or more of RAM.
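One way to start that investigation is to compare peak memory when all rows are materialised at once (roughly what holding everything in a single transaction implies, before any ORM overhead is added) against streaming them one at a time. The sketch below uses the standard library's `tracemalloc` and plain dicts as a stand-in; the real per-row cost of Beaker's SQLAlchemy objects and identity map would be higher, which is exactly the multiplier to measure:

```python
import csv
import io
import tracemalloc

def peak_memory(fn):
    """Run fn and return the peak traced allocation in bytes."""
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

def make_csv(n_rows):
    """Build a synthetic CSV upload (hypothetical columns)."""
    return "fqdn,location\n" + "".join(
        "host%d.example.com,lab-%d\n" % (i, i % 7) for i in range(n_rows))

def materialise_all(text):
    # Every row held in memory at once, as a single big transaction would.
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

def stream_rows(text):
    # Only one row alive at a time; memory stays bounded.
    count = 0
    for _ in csv.DictReader(io.StringIO(text)):
        count += 1
    return count
```

Comparing `peak_memory` for the two variants on a file of, say, 50,000 rows shows the materialised version allocating far more at its peak, and profiling the same way against the real import path would show how much the ORM layer adds on top.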