The units.json.gz file contains one json document per line, rather than one json document in the file. The .json extension is misleading in this case. It would perhaps be an improvement if the units.json file contained a single json object that had an attribute called 'units' with a list of units.

Currently, the json parser cannot open the file:

$ python
Python 2.6.6 (r266:84292, May 27 2013, 05:35:12)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> with open('units.json') as f:
...     a = json.load(f)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib64/python2.6/json/__init__.py", line 267, in load
    parse_constant=parse_constant, **kw)
  File "/usr/lib64/python2.6/json/__init__.py", line 307, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python2.6/json/decoder.py", line 322, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 2 column 1 - line 82 column 1 (char 858 - 2527719)
This is by design. Each content unit is written as its own json document on a separate line, so that a reader can load and process units one at a time instead of reading the entire file into memory. As for the extension: the file does contain json. Do you have a better suggestion?
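For anyone hitting the same ValueError, here is a minimal sketch of reading the file one line at a time, as the design intends. It assumes Python 3 (the `"rt"` mode with an encoding is not available on the 2.6 interpreter shown above), UTF-8 content, and that each non-empty line holds exactly one json document. The helper name `iter_units` is made up for illustration and is not part of any existing tool.

```python
import gzip
import json


def iter_units(path):
    """Yield one decoded unit per non-empty line of a gzip-compressed file.

    Hypothetical helper: assumes one complete json document per line.
    """
    # Open in text mode so each iteration returns one decoded line.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                yield json.loads(line)
```

Because this is a generator, only one unit is held in memory at a time. A caller would loop over `iter_units('units.json.gz')` and handle each unit as it arrives.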