Bug 874546 - Bad exception in etree.iterparse()
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: python-lxml
Version: 19
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jeffrey C. Ollie
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-11-08 12:19 UTC by Tomas Mlcoch
Modified: 2014-02-28 14:58 UTC (History)

Fixed In Version: python-lxml-3.2.1-1.fc19
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-02-28 14:58:33 UTC



Description Tomas Mlcoch 2012-11-08 12:19:11 UTC
Description of problem:
Using etree.iterparse() on valid XML throws a weird exception with no description.

Version-Release number of selected component (if applicable):
Version: 2.3.5
Release: 1.fc17

How reproducible:

XML:
====
<?xml version="1.0" encoding="UTF-8"?>
<metadata>
<foo>
  <bar>a</bar>
</foo>
<foo>
  <bar>b</bar>
</foo>
</metadata>

Reproducer:
===========
#!/usr/bin/python
from lxml import etree
for element in etree.iterparse(open("xml.xml")):
    print element[0], element[1].tag
 
Actual results:
end bar
end foo
end bar
end foo
end metadata
Traceback (most recent call last):
  File "./reproducer.py", line 3, in <module>
    for element in etree.iterparse(open("xml.xml")):
  File "iterparse.pxi", line 491, in lxml.etree.iterparse.__next__ (src/lxml/lxml.etree.c:103790)
  File "iterparse.pxi", line 543, in lxml.etree.iterparse._read_more_events (src/lxml/lxml.etree.c:104333)
  File "parser.pxi", line 601, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:79743)
lxml.etree.XMLSyntaxError: None 

Expected results:
end bar
end foo
end bar
end foo
end metadata
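
For readers on Python 3, the reproducer above can be sketched roughly as follows. This is a port, not the reporter's script: the temporary file, the `collect_events` and `run_reproducer` helpers, and the fallback to the standard library's `xml.etree.ElementTree` (used only when lxml is not installed) are assumptions made for illustration. On an unaffected lxml/libxml2 combination it should list the five `end` events without raising.

```python
import os
import tempfile

try:
    from lxml import etree  # the library this report is about
except ImportError:  # assumption: fall back to the stdlib API that lxml mirrors
    import xml.etree.ElementTree as etree

# Same document as in the report.
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<metadata>
<foo>
  <bar>a</bar>
</foo>
<foo>
  <bar>b</bar>
</foo>
</metadata>
"""


def collect_events(path):
    """Parse the file at `path` through a file object, as the reproducer does."""
    events = []
    with open(path, "rb") as handle:
        try:
            for event, element in etree.iterparse(handle):
                events.append((event, element.tag))
        except etree.ParseError as exc:
            # On an affected setup this is where "XMLSyntaxError: None" would surface.
            events.append(("error", str(exc)))
    return events


def run_reproducer():
    """Write the sample document to a temporary file and parse it."""
    fd, path = tempfile.mkstemp(suffix=".xml")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(XML)
        return collect_events(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    for event, tag in run_reproducer():
        print(event, tag)
```

Catching `etree.ParseError` keeps the sketch portable: lxml's `XMLSyntaxError` derives from it, and the standard library raises it directly.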

Comment 1 Tomas Mlcoch 2012-11-09 07:53:54 UTC
Downgrading libxml2, libxml2-devel and libxml2-python from 0:2.7.8-9.fc17 to 0:2.7.8-7.fc17 solves the problem!
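
Since the behaviour depends on which libxml2 is installed, it may help to record the versions lxml sees when reporting results. A minimal sketch, assuming lxml's version constants (`LXML_VERSION`, `LIBXML_VERSION`, `LIBXML_COMPILED_VERSION`); the `library_versions` helper is a hypothetical name used only here:

```python
try:
    from lxml import etree
except ImportError:  # lxml not installed; nothing to report
    etree = None


def library_versions():
    """Return lxml and libxml2 versions as tuples, or None values without lxml."""
    if etree is None:
        return {"lxml": None, "libxml2_runtime": None, "libxml2_compiled": None}
    return {
        "lxml": etree.LXML_VERSION,
        # libxml2 actually loaded at run time (the one a downgrade changes)
        "libxml2_runtime": etree.LIBXML_VERSION,
        # libxml2 that lxml was built against
        "libxml2_compiled": etree.LIBXML_COMPILED_VERSION,
    }


if __name__ == "__main__":
    for name, version in library_versions().items():
        print(name, version)
```

The run-time value is the one to compare against the package versions quoted in this report.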

Comment 2 Jiri Popelka 2012-11-09 08:28:59 UTC
Thanks for the investigation Tomas.

I can confirm this on F18 too.
I see the problem with libxml2-2.9.0-1.fc18 and libxml2-2.9.0-0rc1.fc18,
downgrading to libxml2-2.8.0-2.fc18 makes it work again.

I also noticed that if I first read the xml file and pass its contents to iterparse() through StringIO instead of passing the file object, it works ok.
I mean
f = open("xml.xml")
xml = f.read()
for element in etree.iterparse(StringIO(xml)):
instead of
for element in etree.iterparse(open("xml.xml")):

Daniel, any idea?
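
The in-memory workaround described in this comment could look roughly like this on Python 3. It is a sketch under assumptions: `io.BytesIO` stands in for the Python 2 `StringIO`, the document is embedded as bytes instead of being read from xml.xml, `parse_from_memory` is a hypothetical helper name, and the standard library's `xml.etree.ElementTree` is used only if lxml is missing.

```python
import io

try:
    from lxml import etree
except ImportError:  # assumption: stdlib fallback with the same iterparse API
    import xml.etree.ElementTree as etree

# In the report this would be the result of open("xml.xml", "rb").read().
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<metadata>
<foo>
  <bar>a</bar>
</foo>
<foo>
  <bar>b</bar>
</foo>
</metadata>
"""


def parse_from_memory(data):
    """Run iterparse over an in-memory copy of the document."""
    buffer = io.BytesIO(data)  # whole document is already in memory
    return [(event, element.tag) for event, element in etree.iterparse(buffer)]


if __name__ == "__main__":
    for event, tag in parse_from_memory(XML):
        print(event, tag)
```

The trade-off is that the whole document is held in memory, which gives up part of the benefit of incremental parsing for large files.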

Comment 3 Daniel Veillard 2012-11-09 09:03:30 UTC
Hum, no idea ... we have had errors reported for parsing from
a memory string, but that was for very large documents, and you're
seeing the opposite on a small document instead:
http://git.gnome.org/browse/libxml2/commit/?id=153cf15905cf4ec080612ada6703757d10caba1e

You don't seem to be doing actual validation here (just
well-formedness checking), so that should not be the
validation error fixed there:
http://git.gnome.org/browse/libxml2/commit/?id=6c91aa384f48ff6d406553a6dd47fd556c1ef2e6

I tried putting a breakpoint on the main libxml2 routine through
which all error reports pass:

(gdb) b __xmlRaiseError
Breakpoint 1 at 0x33d7835890: file error.c, line 459.
(gdb) c
Continuing.

>>> for element in etree.iterparse(open("tst.xml")):
...     print element[0], element[1].tag
... 
end bar
end foo
end bar
end foo
end metadata
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "iterparse.pxi", line 491, in lxml.etree.iterparse.__next__ (src/lxml/lxml.etree.c:103790)
  File "iterparse.pxi", line 543, in lxml.etree.iterparse._read_more_events (src/lxml/lxml.etree.c:104333)
  File "parser.pxi", line 601, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:79743)
lxml.etree.XMLSyntaxError: None
>>> 

So no, I don't know what is going on there. The last chunk of
http://git.gnome.org/browse/libxml2/diff/parser.c?id=6c91aa384f48ff6d406553a6dd47fd556c1ef2e6

may however fix stray parse errors with the reader such as the one you
experienced, but the problem was present in older releases, so I doubt it's this.

Daniel

Comment 4 Janie Starks 2013-01-31 13:32:53 UTC
Hello,
I am experiencing the same problem with lxml.

A valid xml file fails to parse, while converting it to a string using the method described here fixes the issue.

I would also like to mention that the problem reproduces only in my working environment; other people on my team don't experience it.

Can you please tell me if you have found any other solution? Is this bug planned to be fixed?

Thank you in advance,
J.

Comment 5 Daniel Veillard 2013-04-10 07:46:35 UTC
The error does not seem to be raised by libxml2; otherwise my breakpoint in
__xmlRaiseError would have been hit. It seems to me that the libxml2
update triggered an error in lxml, so I am reassigning to python-lxml.

Daniel

Comment 6 Daniel Veillard 2013-04-19 07:21:13 UTC
I have tried debugging it with python-lxml-2.3.5-1.fc17

apparently hitting line 601 of
https://github.com/lxml/lxml/blob/master/src/lxml/parser.pxi

elif ctxt.lastError.message is not NULL:
...
raise XMLSyntaxError(message, code, line, column)

end bar
end foo
end bar
end foo
end metadata
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "iterparse.pxi", line 491, in lxml.etree.iterparse.__next__ (src/lxml/lxml.etree.c:103790)
  File "iterparse.pxi", line 543, in lxml.etree.iterparse._read_more_events (src/lxml/lxml.etree.c:104333)
  File "parser.pxi", line 601, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:79743)
lxml.etree.XMLSyntaxError: None
>>> 
Program received signal SIGINT, Interrupt.
0x000000360b8ea9d3 in __select_nocancel ()
    at ../sysdeps/unix/syscall-template.S:82
82	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) b _raiseParseError
Function "_raiseParseError" not defined.
Make breakpoint pending on future shared library load? (y or [n]) n
(gdb) l src/lxml/lxml.etree.c:79743
79738	    __Pyx_GOTREF(__pyx_t_8);
79739	    __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0;
79740	    __Pyx_DECREF(((PyObject *)__pyx_t_6)); __pyx_t_6 = 0;
79741	    __Pyx_Raise(__pyx_t_8, 0, 0, 0);
79742	    __Pyx_DECREF(__pyx_t_8); __pyx_t_8 = 0;
79743	    {__pyx_filename = __pyx_f[3]; __pyx_lineno = 601; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
79744	  }
79745	  __pyx_L3:;
79746	
79747	  __pyx_r = 0;
(gdb) src/lxml/lxml.etree.c:79700
Undefined command: "src".  Try "help".
(gdb) l src/lxml/lxml.etree.c:79700
79695	    __Pyx_GIVEREF(__pyx_t_4);
79696	    PyTuple_SET_ITEM(__pyx_t_8, 3, __pyx_t_7);
79697	    __Pyx_GIVEREF(__pyx_t_7);
79698	    __pyx_t_5 = 0;
79699	    __pyx_t_4 = 0;
79700	    __pyx_t_7 = 0;
79701	    __pyx_t_7 = PyObject_Call(__pyx_t_6, ((PyObject *)__pyx_t_8), NULL); if (unlikely(!__pyx_t_7)) {__pyx_filename = __pyx_f[3]; __pyx_lineno = 599; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
79702	    __Pyx_GOTREF(__pyx_t_7);
79703	    __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
79704	    __Pyx_DECREF(((PyObject *)__pyx_t_8)); __pyx_t_8 = 0;
(gdb) 
79705	    __Pyx_Raise(__pyx_t_7, 0, 0, 0);
79706	    __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0;
79707	    {__pyx_filename = __pyx_f[3]; __pyx_lineno = 599; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
79708	    goto __pyx_L3;
79709	  }
79710	  /*else*/ {
79711	
79712	    /* "/builddir/build/BUILD/lxml-2.3.5/src/lxml/parser.pxi":601
79713	 *         raise XMLSyntaxError(message, code, line, column)
79714	 *     else:
(gdb) 
79715	 *         raise XMLSyntaxError(None, xmlerror.XML_ERR_INTERNAL_ERROR, 0, 0)             # <<<<<<<<<<<<<<
79716	 * 
79717	 * cdef xmlDoc* _handleParseResult(_ParserContext context,
79718	 */
79719	    __pyx_t_7 = __Pyx_GetName(__pyx_m, __pyx_n_s__XMLSyntaxError); if (unlikely(!__pyx_t_7)) {__pyx_filename = __pyx_f[3]; __pyx_lineno = 601; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
79720	    __Pyx_GOTREF(__pyx_t_7);
79721	    __pyx_t_8 = PyInt_FromLong(XML_ERR_INTERNAL_ERROR); if (unlikely(!__pyx_t_8)) {__pyx_filename = __pyx_f[3]; __pyx_lineno = 601; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
79722	    __Pyx_GOTREF(__pyx_t_8);
79723	    __pyx_t_6 = PyTuple_New(4); if (unlikely(!__pyx_t_6)) {__pyx_filename = __pyx_f[3]; __pyx_lineno = 601; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
79724	    __Pyx_GOTREF(__pyx_t_6);
(gdb) 


  I really can't make sense of that generated code, but it looks to
me like lxml goes about detecting a parser error the wrong way: there
isn't really any error, but its detection fails and it ends up reporting
a nonexistent one.

  This still seems to me to be on the lxml side...

Daniel
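
To illustrate why the traceback ends in "XMLSyntaxError: None", here is a simplified pure-Python model of the two branches visible in the listing above. It is not lxml's code: the `XMLSyntaxError` class below is a stand-in, `raise_parse_error` and its parameters are hypothetical names, and only the two `raise` statements are taken from the quoted parser.pxi comments.

```python
XML_ERR_INTERNAL_ERROR = 1  # value of this code in libxml2's xmlParserErrors enum


class XMLSyntaxError(Exception):
    """Simplified stand-in for lxml.etree.XMLSyntaxError (illustration only)."""

    def __init__(self, message, code, line, column):
        super().__init__(message)
        self.code = code
        self.line = line
        self.column = column


def raise_parse_error(last_error_message, code=0, line=0, column=0):
    """Model of the branches quoted from parser.pxi."""
    if last_error_message is not None:
        # libxml2 recorded an error: report its message and position.
        raise XMLSyntaxError(last_error_message, code, line, column)
    else:
        # No recorded error, yet a failure was detected: the fallback branch.
        raise XMLSyntaxError(None, XML_ERR_INTERNAL_ERROR, 0, 0)


if __name__ == "__main__":
    try:
        raise_parse_error(None)
    except XMLSyntaxError as exc:
        print("XMLSyntaxError:", exc, "code:", exc.code)
```

In this model the fallback branch produces an exception whose message is the literal `None`, which matches the missing description in the report and is consistent with libxml2 never calling __xmlRaiseError.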

Comment 7 Mohsen 2013-05-10 11:05:18 UTC
I have encountered this bug (Ubuntu box, lxml 3.x version). Besides that,
I encountered another bug which seems to be related: elem.getnext()
returns None despite elem having a sibling beneath it. Unfortunately,
I encountered this while using a big, private xml file which I
cannot share.

The issue with getnext() seems related to this one because the workaround
suggested in this ticket (using StringIO instead of a file object) solved
both of my issues.

Comment 8 Fedora Update System 2013-05-10 15:49:18 UTC
python-lxml-3.2.0-1.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/python-lxml-3.2.0-1.fc18

Comment 9 Fedora Update System 2013-05-10 15:49:33 UTC
python-lxml-3.2.0-1.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/python-lxml-3.2.0-1.fc17

Comment 10 Fedora Update System 2013-05-10 15:49:46 UTC
python-lxml-3.2.0-1.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/python-lxml-3.2.0-1.fc19

Comment 11 Jiri Popelka 2013-05-10 16:14:47 UTC
Still exists with python-lxml-3.2.0-1.fc19.x86_64

Comment 12 Jeffrey C. Ollie 2013-05-10 16:23:15 UTC
Has anyone reported this upstream?  I don't have the time/experience to debug this myself but I'm certainly willing to pull in patches that are destined for upstream.

Comment 13 Fedora Update System 2013-05-11 00:27:25 UTC
Package python-lxml-3.2.0-1.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing python-lxml-3.2.0-1.fc18'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-7875/python-lxml-3.2.0-1.fc18
then log in and leave karma (feedback).

Comment 14 Fedora Update System 2013-05-12 03:05:04 UTC
python-lxml-3.2.1-1.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/python-lxml-3.2.1-1.fc17

Comment 15 Fedora Update System 2013-05-12 03:05:25 UTC
python-lxml-3.2.1-1.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/python-lxml-3.2.1-1.fc19

Comment 16 Fedora Update System 2013-05-12 03:05:41 UTC
python-lxml-3.2.1-1.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/python-lxml-3.2.1-1.fc18

Comment 17 Fedora Update System 2013-05-21 08:31:56 UTC
python-lxml-3.2.1-1.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 18 Fedora Update System 2013-05-21 08:32:51 UTC
python-lxml-3.2.1-1.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 19 Fedora Update System 2013-05-24 20:12:43 UTC
python-lxml-3.2.1-1.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 20 Tomas Mlcoch 2013-05-29 13:34:36 UTC
Hi, I just tested python-lxml-3.2.1-1 and the problem still persists.

Combination:
libxml2.x86_64 0:2.7.8-7.fc17
libxml2-python.x86_64 0:2.7.8-7.fc17
python-lxml.x86_64 0:2.3.5-1.fc17
== WORKS FINE ==

Combination:
libxml2.x86_64 0:2.7.8-7.fc17
libxml2-python.x86_64 0:2.7.8-7.fc17
python-lxml.x86_64 0:3.2.1-1.fc17
== WORKS FINE ==

But when I update the libxml2 and libxml2-python:
libxml2.x86_64 0:2.7.8-9.fc17
libxml2-python.x86_64 0:2.7.8-9.fc17
python-lxml.x86_64 0:3.2.1-1.fc17
== ERROR ==

How reproducible:

XML:
====
<?xml version="1.0" encoding="UTF-8"?>
<metadata>
<foo>
  <bar>a</bar>
</foo>
<foo>
  <bar>b</bar>
</foo>
</metadata>

Reproducer:
===========
#!/usr/bin/python
from lxml import etree
for element in etree.iterparse(open("xml.xml")):
    print element[0], element[1].tag
 
Actual results:
end bar
end foo
end bar
end foo
end metadata
Traceback (most recent call last):
  File "./reproducer.py", line 3, in <module>
    for element in etree.iterparse(open("xml.xml")):
  File "iterparse.pxi", line 484, in lxml.etree.iterparse.__next__ (src/lxml/lxml.etree.c:113793)
  File "iterparse.pxi", line 537, in lxml.etree.iterparse._read_more_events (src/lxml/lxml.etree.c:114367)
  File "parser.pxi", line 627, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:84362)
lxml.etree.XMLSyntaxError: None

(Note: line numbers slightly changed since my first report)

Expected results:
end bar
end foo
end bar
end foo
end metadata

Comment 21 Jeffrey C. Ollie 2013-05-29 15:52:10 UTC
Tomas, please report this upstream to the lxml developers so that this can get fixed.  I have neither the time nor the knowledge to fix bugs like this.  I'll be AFK next week, so if you want to get an updated package into testing you'll need to work with the upstream developers quickly to get a patch.

Details about the mailing list can be found here:  http://lxml.de/mailinglist/

Comment 22 Tomas Mlcoch 2013-05-30 07:32:37 UTC
I've just reported the bug upstream: https://bugs.launchpad.net/lxml/+bug/1185701

Comment 23 Fedora End Of Life 2013-07-04 05:53:59 UTC
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 17 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora, you are encouraged to change the
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 25 Jeffrey C. Ollie 2014-02-28 14:58:33 UTC
The fix for this went out a while ago, not sure why the bug never got closed.

