Description of problem:
Using etree.iterparse() on valid XML throws a weird exception without a description.

Version-Release number of selected component (if applicable):
Version: 2.3.5
Release: 1.fc17

How reproducible:

XML:
====
<?xml version="1.0" encoding="UTF-8"?>
<metadata>
  <foo>
    <bar>a</bar>
  </foo>
  <foo>
    <bar>b</bar>
  </foo>
</metadata>

Reproducer:
===========
#!/usr/bin/python
from lxml import etree
for element in etree.iterparse(open("xml.xml")):
    print element[0], element[1].tag

Actual results:
end bar
end foo
end bar
end foo
end metadata
Traceback (most recent call last):
  File "./reproducer.py", line 3, in <module>
    for element in etree.iterparse(open("xml.xml")):
  File "iterparse.pxi", line 491, in lxml.etree.iterparse.__next__ (src/lxml/lxml.etree.c:103790)
  File "iterparse.pxi", line 543, in lxml.etree.iterparse._read_more_events (src/lxml/lxml.etree.c:104333)
  File "parser.pxi", line 601, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:79743)
lxml.etree.XMLSyntaxError: None

Expected results:
end bar
end foo
end bar
end foo
end metadata
Downgrading libxml2, libxml2-devel and libxml2-python from 0:2.7.8-9.fc17 to 0:2.7.8-7.fc17 solves the problem!
Thanks for the investigation, Tomas. I can confirm this on F18 too. I see the problem with libxml2-2.9.0-1.fc18 and libxml2-2.9.0-0rc1.fc18; downgrading to libxml2-2.8.0-2.fc18 makes it work again.

I also noticed that if I first read the XML file and pass the XML string to iterparse() instead of the file object, it works OK. I mean

    f = open("xml.xml")
    xml = f.read()
    for element in etree.iterparse(StringIO(xml)):

instead of

    for element in etree.iterparse(open("xml.xml")):

Daniel, any idea?
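The workaround described in the comment above can be sketched as follows. This is a minimal sketch of the workaround, not a fix for the underlying problem: it reads the whole file into memory and hands iterparse() an in-memory buffer instead of the file object. It uses io.BytesIO rather than StringIO so that the raw bytes, including the encoding declaration, reach the parser unchanged. The helper names iterparse_from_memory and demo are made up for illustration, and the fallback to the standard library's xml.etree.ElementTree (which offers a compatible iterparse()) is an assumption added only so the sketch runs where lxml is not installed.

```python
import io
import os
import tempfile

try:
    from lxml import etree  # the library discussed in this report
except ImportError:
    # Assumption: fall back to the stdlib module, which has a compatible iterparse().
    import xml.etree.ElementTree as etree

# The sample document from the original report.
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<metadata>
  <foo>
    <bar>a</bar>
  </foo>
  <foo>
    <bar>b</bar>
  </foo>
</metadata>
"""


def iterparse_from_memory(path):
    """Read the file fully, then collect (event, tag) pairs from an in-memory buffer."""
    with open(path, "rb") as f:
        data = f.read()
    # iterparse() yields (event, element) tuples; only "end" events by default.
    return [(event, element.tag)
            for event, element in etree.iterparse(io.BytesIO(data))]


def demo():
    """Write the sample XML to a temporary file and parse it via the workaround."""
    with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as tmp:
        tmp.write(SAMPLE_XML)
        path = tmp.name
    try:
        return iterparse_from_memory(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    for event, tag in demo():
        print(event, tag)
```

Reading the whole file first gives up the memory advantage of incremental parsing, so this only suits documents that fit comfortably in memory.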
Hum, no idea... We have had errors reported for parsing from a memory string, but that was for very large documents, and you're seeing the opposite, on a small document instead:
http://git.gnome.org/browse/libxml2/commit/?id=153cf15905cf4ec080612ada6703757d10caba1e

You don't seem to be doing actual validation here (just well-formedness checking), so that should not be the validation error fixed there:
http://git.gnome.org/browse/libxml2/commit/?id=6c91aa384f48ff6d406553a6dd47fd556c1ef2e6

I tried to put a breakpoint in the libxml2 main routine which concentrates all error reports:

(gdb) b __xmlRaiseError
Breakpoint 1 at 0x33d7835890: file error.c, line 459.
(gdb) c
Continuing.

>>> for element in etree.iterparse(open("tst.xml")):
...     print element[0], element[1].tag
...
end bar
end foo
end bar
end foo
end metadata
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "iterparse.pxi", line 491, in lxml.etree.iterparse.__next__ (src/lxml/lxml.etree.c:103790)
  File "iterparse.pxi", line 543, in lxml.etree.iterparse._read_more_events (src/lxml/lxml.etree.c:104333)
  File "parser.pxi", line 601, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:79743)
lxml.etree.XMLSyntaxError: None
>>>

So no, I don't know what is going on there. The last chunk of
http://git.gnome.org/browse/libxml2/diff/parser.c?id=6c91aa384f48ff6d406553a6dd47fd556c1ef2e6
may however fix a stray parse error with the reader such as you experienced, but the problem was present in older releases, so I doubt it's this.

Daniel
Hello, I am experiencing the same problem with lxml. A valid XML file fails to parse, while converting it into a string using the method described here fixes the issue. I would also like to mention that the problem is reproducible only in my working environment; other guys from my team don't experience it. Can you please tell me whether you have found any other solution? Is this bug planned to be fixed? Thank you in advance, J.
The error does not seem to be raised by libxml2; otherwise my breakpoint in __xmlRaiseError would have been hit. It seems to me that the libxml2 update triggered an error in lxml, so I am reassigning to python-lxml.

Daniel
I have tried debugging it with python-lxml-2.3.5-1.fc17, apparently hitting line 601 of
https://github.com/lxml/lxml/blob/master/src/lxml/parser.pxi

    elif ctxt.lastError.message is not NULL:
        ...
        raise XMLSyntaxError(message, code, line, column)

end bar
end foo
end bar
end foo
end metadata
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "iterparse.pxi", line 491, in lxml.etree.iterparse.__next__ (src/lxml/lxml.etree.c:103790)
  File "iterparse.pxi", line 543, in lxml.etree.iterparse._read_more_events (src/lxml/lxml.etree.c:104333)
  File "parser.pxi", line 601, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:79743)
lxml.etree.XMLSyntaxError: None
>>>
Program received signal SIGINT, Interrupt.
0x000000360b8ea9d3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:82
82      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) b _raiseParseError
Function "_raiseParseError" not defined.
Make breakpoint pending on future shared library load? (y or [n]) n
(gdb) l src/lxml/lxml.etree.c:79743
79738       __Pyx_GOTREF(__pyx_t_8);
79739       __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0;
79740       __Pyx_DECREF(((PyObject *)__pyx_t_6)); __pyx_t_6 = 0;
79741       __Pyx_Raise(__pyx_t_8, 0, 0, 0);
79742       __Pyx_DECREF(__pyx_t_8); __pyx_t_8 = 0;
79743       {__pyx_filename = __pyx_f[3]; __pyx_lineno = 601; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
79744     }
79745     __pyx_L3:;
79746
79747     __pyx_r = 0;
(gdb) src/lxml/lxml.etree.c:79700
Undefined command: "src".  Try "help".
(gdb) l src/lxml/lxml.etree.c:79700
79695       __Pyx_GIVEREF(__pyx_t_4);
79696       PyTuple_SET_ITEM(__pyx_t_8, 3, __pyx_t_7);
79697       __Pyx_GIVEREF(__pyx_t_7);
79698       __pyx_t_5 = 0;
79699       __pyx_t_4 = 0;
79700       __pyx_t_7 = 0;
79701       __pyx_t_7 = PyObject_Call(__pyx_t_6, ((PyObject *)__pyx_t_8), NULL); if (unlikely(!__pyx_t_7)) {__pyx_filename = __pyx_f[3]; __pyx_lineno = 599; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
79702       __Pyx_GOTREF(__pyx_t_7);
79703       __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
79704       __Pyx_DECREF(((PyObject *)__pyx_t_8)); __pyx_t_8 = 0;
(gdb)
79705       __Pyx_Raise(__pyx_t_7, 0, 0, 0);
79706       __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0;
79707       {__pyx_filename = __pyx_f[3]; __pyx_lineno = 599; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
79708       goto __pyx_L3;
79709     }
79710     /*else*/ {
79711
79712       /* "/builddir/build/BUILD/lxml-2.3.5/src/lxml/parser.pxi":601
79713        *         raise XMLSyntaxError(message, code, line, column)
79714        *     else:
(gdb)
79715        *         raise XMLSyntaxError(None, xmlerror.XML_ERR_INTERNAL_ERROR, 0, 0)             # <<<<<<<<<<<<<<
79716        *
79717        * cdef xmlDoc* _handleParseResult(_ParserContext context,
79718        */
79719       __pyx_t_7 = __Pyx_GetName(__pyx_m, __pyx_n_s__XMLSyntaxError); if (unlikely(!__pyx_t_7)) {__pyx_filename = __pyx_f[3]; __pyx_lineno = 601; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
79720       __Pyx_GOTREF(__pyx_t_7);
79721       __pyx_t_8 = PyInt_FromLong(XML_ERR_INTERNAL_ERROR); if (unlikely(!__pyx_t_8)) {__pyx_filename = __pyx_f[3]; __pyx_lineno = 601; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
79722       __Pyx_GOTREF(__pyx_t_8);
79723       __pyx_t_6 = PyTuple_New(4); if (unlikely(!__pyx_t_6)) {__pyx_filename = __pyx_f[3]; __pyx_lineno = 601; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
79724       __Pyx_GOTREF(__pyx_t_6);
(gdb)

I really can't make sense of that generated code, but it looks to me like lxml goes about detecting a parser error the wrong way: there isn't really any error, but its detection fails and it stops, reporting a nonexistent error.
Still seems to me to be on lxml side... Daniel
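To illustrate why the exception text is just "None": the generated-code comments in the listing above show that parser.pxi line 601 is the final else branch, raise XMLSyntaxError(None, xmlerror.XML_ERR_INTERNAL_ERROR, 0, 0), which is taken when no error message is available. The following is a simplified pure-Python sketch of that branch only, not lxml's actual implementation. SketchXMLSyntaxError, raise_parse_error and describe are hypothetical stand-ins, the branch is reduced to a single if/else, and the sample code, line and column values are made up.

```python
# Value of XML_ERR_INTERNAL_ERROR in libxml2's xmlParserErrors enum.
XML_ERR_INTERNAL_ERROR = 1


class SketchXMLSyntaxError(Exception):
    """Hypothetical stand-in for lxml.etree.XMLSyntaxError (not the real class)."""

    def __init__(self, message, code, line, column):
        super().__init__(message)
        self.msg = message
        self.code = code
        self.position = (line, column)

    def __str__(self):
        # A missing message is rendered as the literal string "None".
        return str(self.msg)


def raise_parse_error(last_error_message, code=0, line=0, column=0):
    """Mimic the two branches quoted from parser.pxi in the comment above."""
    if last_error_message is not None:
        # Branch quoted in the comment: the parser left a message behind.
        raise SketchXMLSyntaxError(last_error_message, code, line, column)
    else:
        # Branch at parser.pxi:601: no message recorded, so None is passed along.
        raise SketchXMLSyntaxError(None, XML_ERR_INTERNAL_ERROR, 0, 0)


def describe(last_error_message):
    """Return the text a traceback would show for the given error message."""
    try:
        # code/line/column are made-up sample values for the sketch.
        raise_parse_error(last_error_message, code=5, line=10, column=1)
    except SketchXMLSyntaxError as exc:
        return "%s: %s" % (type(exc).__name__, exc)


if __name__ == "__main__":
    print(describe("Extra content at the end of the document"))
    print(describe(None))
```

The second call mirrors the "XMLSyntaxError: None" line in the tracebacks: the error path is entered even though no parser message was recorded.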
I have encountered this bug (Ubuntu box, lxml 3.x version). Besides that, I encountered another bug which seems to be related: elem.getnext() returns None despite elem having a sibling beneath it. Unfortunately I encountered this while using a big and private XML file which I cannot share. The issue with getnext() seems related to this one because the workaround suggested in this ticket (using StringIO instead of a file object) solved both of my issues.
python-lxml-3.2.0-1.fc18 has been submitted as an update for Fedora 18. https://admin.fedoraproject.org/updates/python-lxml-3.2.0-1.fc18
python-lxml-3.2.0-1.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/python-lxml-3.2.0-1.fc17
python-lxml-3.2.0-1.fc19 has been submitted as an update for Fedora 19. https://admin.fedoraproject.org/updates/python-lxml-3.2.0-1.fc19
Still exists with python-lxml-3.2.0-1.fc19.x86_64
Has anyone reported this upstream? I don't have the time/experience to debug this myself but I'm certainly willing to pull in patches that are destined for upstream.
Package python-lxml-3.2.0-1.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing python-lxml-3.2.0-1.fc18'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-7875/python-lxml-3.2.0-1.fc18
then log in and leave karma (feedback).
python-lxml-3.2.1-1.fc17 has been submitted as an update for Fedora 17. https://admin.fedoraproject.org/updates/python-lxml-3.2.1-1.fc17
python-lxml-3.2.1-1.fc19 has been submitted as an update for Fedora 19. https://admin.fedoraproject.org/updates/python-lxml-3.2.1-1.fc19
python-lxml-3.2.1-1.fc18 has been submitted as an update for Fedora 18. https://admin.fedoraproject.org/updates/python-lxml-3.2.1-1.fc18
python-lxml-3.2.1-1.fc18 has been pushed to the Fedora 18 stable repository. If problems still persist, please make note of it in this bug report.
python-lxml-3.2.1-1.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.
python-lxml-3.2.1-1.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report.
Hi, I just tested python-lxml-3.2.1-1 and the problem still persists.

Combination:
libxml2.x86_64 0:2.7.8-7.fc17
libxml2-python.x86_64 0:2.7.8-7.fc17
python-lxml.x86_64 0:2.3.5-1.fc17
== WORKS FINE ==

Combination:
libxml2.x86_64 0:2.7.8-7.fc17
libxml2-python.x86_64 0:2.7.8-7.fc17
python-lxml.x86_64 0:3.2.1-1.fc17
== WORKS FINE ==

But when I update libxml2 and libxml2-python:
libxml2.x86_64 0:2.7.8-9.fc17
libxml2-python.x86_64 0:2.7.8-9.fc17
python-lxml.x86_64 0:3.2.1-1.fc17
== ERROR ==

How reproducible:

XML:
====
<?xml version="1.0" encoding="UTF-8"?>
<metadata>
  <foo>
    <bar>a</bar>
  </foo>
  <foo>
    <bar>b</bar>
  </foo>
</metadata>

Reproducer:
===========
#!/usr/bin/python
from lxml import etree
for element in etree.iterparse(open("xml.xml")):
    print element[0], element[1].tag

Actual results:
end bar
end foo
end bar
end foo
end metadata
Traceback (most recent call last):
  File "./reproducer.py", line 3, in <module>
    for element in etree.iterparse(open("xml.xml")):
  File "iterparse.pxi", line 484, in lxml.etree.iterparse.__next__ (src/lxml/lxml.etree.c:113793)
  File "iterparse.pxi", line 537, in lxml.etree.iterparse._read_more_events (src/lxml/lxml.etree.c:114367)
  File "parser.pxi", line 627, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:84362)
lxml.etree.XMLSyntaxError: None

(Note: line numbers have changed slightly since my first report)

Expected results:
end bar
end foo
end bar
end foo
end metadata
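Since the combinations above suggest the failure follows the libxml2 version rather than the lxml version, it may help to record both when reporting. Below is a minimal sketch, assuming lxml is importable. The helper name report_versions is made up for illustration, while LXML_VERSION, LIBXML_VERSION and LIBXML_COMPILED_VERSION are version tuples exposed by lxml.etree.

```python
def report_versions():
    """Return the lxml and libxml2 versions in use, or a marker if lxml is missing."""
    try:
        from lxml import etree
    except ImportError:
        # Assumption for the sketch: report a missing lxml instead of failing.
        return {"lxml": None}
    return {
        "lxml": etree.LXML_VERSION,
        # libxml2 version loaded at runtime (the one that changed in this report)
        "libxml2 (running)": etree.LIBXML_VERSION,
        # libxml2 version lxml was compiled against
        "libxml2 (compiled)": etree.LIBXML_COMPILED_VERSION,
    }


if __name__ == "__main__":
    for name, version in sorted(report_versions().items()):
        print(name, version)
```

A mismatch between the running and compiled libxml2 versions is the situation described above, where only the libxml2 packages were updated or downgraded.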
Thomas, please report this upstream to the lxml developers so that this can get fixed. I have neither the time nor the knowledge to fix bugs like this. I'll be AFK next week, so if you want to get an updated package into testing you'll need to work with the upstream developers quickly to get a patch. Details about the mailing list can be found here: http://lxml.de/mailinglist/
I've just reported the bug upstream: https://bugs.launchpad.net/lxml/+bug/1185701
This message is a reminder that Fedora 17 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 17. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 17 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Upstream fix https://github.com/lxml/lxml/commit/19f0a477c935b402c93395f8c0cb561646f4bdc3
The fix for this went out a while ago, not sure why the bug never got closed.