Bug 1301928 (CVE-2016-2073) - CVE-2016-2073 libxml2: out-of-bounds read in htmlParseNameComplex()
Summary: CVE-2016-2073 libxml2: out-of-bounds read in htmlParseNameComplex()
Status: CLOSED DUPLICATE of bug 1338703
Alias: CVE-2016-2073
Product: Security Response
Classification: Other
Component: vulnerability
Version: unspecified
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Red Hat Product Security
QA Contact:
Depends On: 1301929 1301930 1301931
Blocks: 1301932
Reported: 2016-01-26 12:02 UTC by Martin Prpič
Modified: 2021-02-17 04:26 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2016-05-25 10:49:58 UTC


Description Martin Prpič 2016-01-26 12:02:43 UTC
An out-of-bounds read flaw was reported in libxml2's htmlParseNameComplex() function:


A remote attacker could provide a specially crafted XML file that, when processed by an application linked against libxml2, could cause the application to crash.

Comment 1 Martin Prpič 2016-01-26 12:03:38 UTC
Created libxml2 tracking bugs for this issue:

Affects: fedora-all [bug 1301929]

Comment 2 Martin Prpič 2016-01-26 12:03:45 UTC
Created mingw-libxml2 tracking bugs for this issue:

Affects: fedora-all [bug 1301930]
Affects: epel-7 [bug 1301931]

Comment 3 Cedric Buissart 2016-04-25 09:34:58 UTC
Below is my current understanding of this issue (which, I believe, is identical to 1304636) :

The issue occurs when a word starts with plain ASCII characters and then continues with multibyte (non-ASCII) characters.

The issue is in htmlParseNameComplex(), more precisely in its while loop. The following variables are involved:

len: the size of the word, in bytes. This is used to get back to the beginning of the word (i.e.: 'ctxt->input->cur - len')
c: the current character (can be multibyte)
l: the size in bytes of character c

The while loop finds the end of the word. The expectation is the following: 'ctxt->input->cur' points to the end of the word and len contains the word's length in bytes, so 'ctxt->input->cur - len' points to the beginning of the word.
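
The invariant described above can be illustrated with a small standalone sketch. This is not libxml2 code: utf8_len() and scan_word() are hypothetical helpers, and the delimiter set (space, tab, '>') is an assumption made only for the example. It shows how a byte count accumulated per character lets the caller recover the start of the word as 'cur - len', as long as nothing moves the buffer in between.

```c
#include <stddef.h>

/* Hypothetical helper: byte length of a UTF-8 sequence given its lead byte,
 * or 0 if the byte cannot start a sequence. */
static int utf8_len(unsigned char b) {
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Hypothetical helper: advance *cur over one word and return its length in
 * bytes. On return, *cur - len is the start of the word, which is the same
 * invariant the parser loop relies on. */
static size_t scan_word(const unsigned char **cur, const unsigned char *end) {
    size_t len = 0;
    while (*cur < end) {
        unsigned char b = **cur;
        int l = utf8_len(b);                 /* plays the role of 'l' */
        if (l == 0 || b == ' ' || b == '\t' || b == '>')
            break;                           /* toy delimiter set (assumed) */
        if ((size_t)(end - *cur) < (size_t)l)
            break;                           /* truncated sequence */
        *cur += l;
        len += (size_t)l;                    /* plays the role of 'len' */
    }
    return len;
}
```

Calling scan_word() on a buffer holding an ASCII letter followed by a two-byte character and more ASCII letters advances the cursor to the delimiter and returns the total byte count, not the character count.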

htmlCurrentChar() is called during the process (via the CUR_CHAR macro), and returns the next character (possibly multibyte) along with its size (l is updated).
While in htmlCurrentChar(), if the character is multibyte/non-ASCII, this leads to a change of encoding via the function xmlSwitchToEncodingInt().
During this switch, xmlBufShrink() is called. The purpose of this function is to remove the beginning of an XML buffer, via memmove. However, the shrink removes everything up to the current character, not up to the beginning of the word. As a result, ctxt->input->cur ends up pointing to the beginning of the string, and 'ctxt->input->cur - len' points before the beginning of the string.

The number of bytes to be removed is based on the following calculation: [parserInternals.c:1202] processed = input->cur - input->base;
In order to keep the current word, htmlParseNameComplex's 'len' value should be subtracted here too, so that the shrinking stops at the beginning of the word instead of at the current character.

i.e.: my understanding is that xmlBufShrink() should not shrink beyond the beginning of the current word (but I have no idea about the potential side effects)
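
To make the effect of the shrink concrete, here is a minimal toy model of the situation, assuming a flat buffer with an offset instead of libxml2's real structures. toy_input, toy_shrink() and the two shrink_* functions are hypothetical names introduced for illustration only. They do not reproduce xmlBufShrink() or the upstream fix, just the arithmetic described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the parser input; NOT libxml2's xmlParserInput. */
typedef struct {
    char data[64];   /* backing storage, NUL-terminated */
    size_t used;     /* bytes held in data, excluding the NUL */
    size_t cur;      /* offset of the current character */
} toy_input;

static toy_input toy_from(const char *s, size_t cur) {
    toy_input in;
    size_t n = strlen(s);
    assert(n < sizeof in.data && cur <= n);
    memcpy(in.data, s, n + 1);
    in.used = n;
    in.cur = cur;
    return in;
}

/* Drop the first n bytes with memmove and rebase the cursor. */
static void toy_shrink(toy_input *in, size_t n) {
    assert(n <= in->cur && in->cur <= in->used);
    memmove(in->data, in->data + n, in->used - n + 1);
    in->used -= n;
    in->cur -= n;
}

/* Returns 1 if 'cur - len' still lies inside the buffer, 0 otherwise. */
static int name_start_is_valid(const toy_input *in, size_t len) {
    return len <= in->cur;
}

/* Shrink up to the current character: the problematic case. */
static int shrink_at_cur(toy_input *in, size_t len) {
    toy_shrink(in, in->cur);
    return name_start_is_valid(in, len);
}

/* Shrink only up to the start of the word: the suggested alternative. */
static int shrink_at_word_start(toy_input *in, size_t len) {
    assert(len <= in->cur);
    toy_shrink(in, in->cur - len);
    return name_start_is_valid(in, len);
}
```

With the input from the trace below ("<!DOCTYPE html" followed by a non-ASCII byte, cursor at offset 14, len of 4 for "html"), shrink_at_cur() leaves the cursor at offset 0, so 'cur - len' would fall before the buffer. shrink_at_word_start() keeps "html" at the front of the buffer and the word start stays addressable.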


Breakpoint 2, xmlBufShrink__internal_alias (buf=0x602550, len=len@entry=14) at buf.c:386
386     xmlBufShrink(xmlBufPtr buf, size_t len) {
(gdb) bt
#0  xmlBufShrink__internal_alias (buf=0x602550, len=len@entry=14) at buf.c:386
#1  0x00007ffff7aa8764 in xmlSwitchInputEncodingInt (ctxt=0x6045b0, input=0x605b30, handler=0x6023e0, len=45) at parserInternals.c:1194
#2  0x00007ffff7aa9b59 in xmlSwitchToEncodingInt (len=<optimized out>, handler=<optimized out>, ctxt=0x6045b0) at parserInternals.c:1272
#3  xmlSwitchEncoding__internal_alias (ctxt=ctxt@entry=0x6045b0, enc=enc@entry=XML_CHAR_ENCODING_8859_1) at parserInternals.c:1100
#4  0x00007ffff7ae7425 in htmlCurrentChar (ctxt=0x6045b0, len=0x7fffffffdc54) at HTMLparser.c:518
#5  0x00007ffff7ae77d5 in htmlParseNameComplex (ctxt=0x6045b0) at HTMLparser.c:2515
#6  htmlParseName (ctxt=ctxt@entry=0x6045b0) at HTMLparser.c:2483
#7  0x00007ffff7aa0a73 in htmlParseDocTypeDecl (ctxt=ctxt@entry=0x6045b0) at HTMLparser.c:3398
#8  0x00007ffff7aed52d in htmlParseTryOrFinish (terminate=<optimized out>, ctxt=<optimized out>) at HTMLparser.c:5440
#9  htmlParseChunk__internal_alias (ctxt=0x6045b0, chunk=<optimized out>, size=<optimized out>, terminate=0) at HTMLparser.c:6070
#10 0x00000000004007f4 in main (argc=1, arg=0x7fffffffdec8) at foo.c:25

Before the shrink :
(gdb) p *ctxt->input
$9 = {buf = 0x602500, [...], base = 0x6025a0 "<!DOCTYPE html\342\t</</body></html>", cur = 0x6025ae "\342\t</</body></html>", end = 0x6025c1 "", length = 0, line = 1, col = 15, consumed = 0, [...]}

After the shrink:
(gdb) p *ctxt->input
$15 = {buf = 0x602500, [...], base = 0x6025a0 "\342\t</</body></html>", cur = 0x6025ae "tml>", end = 0x6025c1 "", length = 0, line = 1, col = 15, consumed = 0, [...]}

The final input, after readjustment:
(gdb) p *ctxt->input
$23 = {buf = 0x602500, [...], base = 0x605d50 "â\t</</body></html>", cur = 0x605d50 "â\t</</body></html>", end = 0x605d64 "", length = 0, line = 1, col = 15, consumed = 0, [...]}

Comment 4 Cedric Buissart 2016-05-25 10:49:58 UTC
This is resolved by the upstream commit for CVE-2016-1839 (a820d). Marking as duplicate.

I verified that the reproducer for this issue no longer triggers with that commit applied.

Marking this as duplicate of CVE-2016-1839 to follow upstream.

*** This bug has been marked as a duplicate of bug 1338703 ***

Comment 5 Doran Moppert 2020-02-10 04:31:00 UTC

This flaw was found to be a duplicate of CVE-2016-1839. Please see https://access.redhat.com/security/cve/CVE-2016-1839 for information about affected products and security errata.
