Peter Valchev reported a parser crash with specially formatted UTF-8 sequences[1]:

I have discovered a way to crash libexpat's XML parser with certain specially formatted UTF-8 sequences. All applications that link w/ expat and use it to render user-provided XML files are affected. As far as I can see, the issue is not exploitable, just denial of service.

This is the patch that I have come up with, also attached to this email:

+++ lib/xmltok_impl.c 2007-12-21 11:11:42.054417000 -0800
@@ -1745,6 +1745,9 @@
     switch (BYTE_TYPE(enc, ptr)) {
 #define LEAD_CASE(n) \
     case BT_LEAD ## n: \
+      if (end - ptr < n) { \
+        return; \
+      } \
       ptr += n; \
       break;
     LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4)

The parser's updatePosition function, which keeps track of the current position pointer, increments ptr by 2, 3, or 4 to skip past multibyte character combinations. When fewer than that many bytes remain, ptr in the "while (ptr != end)" loop jumps past 'end', so the terminating condition is never met and the loop continues reading past 'end' into out-of-bounds memory until the process crashes.

In general this parser does not appear to be the most robust and could be the source of some security issues.

A fault file is attached. To reproduce, compile examples/outline.c and run it against the fault file.

This patch may not be 100% complete...

Contact: Peter Valchev <pvalchev at google.com>

[1] http://mail.python.org/pipermail/expat-bugs/2009-January/002781.html
    http://sourceforge.net/tracker/?func=detail&atid=110127&aid=1990430&group_id=10127
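For illustration only, the overshoot described in the report can be modeled in a few lines of C. This is a simplified sketch, not expat's code: lead_len, walk_unchecked, and walk_checked are hypothetical names, offsets stand in for the ptr/end pointers, and the unchecked walk stops as soon as its offset passes the buffer length so that the sketch itself never reads out of bounds.

```c
#include <stddef.h>

/* Hypothetical helper: sequence length implied by a UTF-8 lead byte. */
static size_t lead_len(unsigned char b)
{
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 1; /* ASCII, continuation, or invalid byte: advance by one */
}

/* Models the unpatched behavior: trust the lead byte and advance.
 * Returns the final offset, which may exceed len. In the reported loop
 * the test is "ptr != end", so an overshoot is never detected; here the
 * "pos < len" test stops the model before any out-of-bounds read. */
static size_t walk_unchecked(const unsigned char *buf, size_t len)
{
    size_t pos = 0;
    while (pos < len)
        pos += lead_len(buf[pos]);
    return pos;
}

/* Models the patched behavior: stop when the sequence is truncated,
 * mirroring the "if (end - ptr < n) return;" guard in the patch. */
static size_t walk_checked(const unsigned char *buf, size_t len)
{
    size_t pos = 0;
    while (pos < len) {
        size_t n = lead_len(buf[pos]);
        if (len - pos < n)
            break; /* fewer than n bytes remain */
        pos += n;
    }
    return pos;
}
```

With a buffer that ends in the middle of a three-byte sequence, walk_unchecked returns an offset beyond the buffer length, while walk_checked stops at the start of the truncated sequence.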
*** This bug has been marked as a duplicate of bug 531697 ***