Bug 1301928 (CVE-2016-2073)

Summary: CVE-2016-2073 libxml2: out-of-bounds read in htmlParseNameComplex()
Product: [Other] Security Response
Reporter: Martin Prpič <mprpic>
Component: vulnerability
Assignee: Red Hat Product Security <security-response-team>
Severity: medium
Priority: medium
Version: unspecified
CC: athmanem, carnil, cbuissar, c.david86, erik-fedora, fedora-mingw, ktietz, lfarkas, ohudlick, petr.sumbera, rjones, sardella, slawomir, veillard
Keywords: Security
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2016-05-25 10:49:58 UTC
Bug Depends On: 1301929, 1301930, 1301931
Bug Blocks: 1301932

Description Martin Prpič 2016-01-26 12:02:43 UTC
An out-of-bounds read flaw was reported in libxml2's htmlParseNameComplex() function:


A remote attacker could provide a specially crafted XML file that, when processed by an application linked against libxml2, could cause the application to crash.

Comment 1 Martin Prpič 2016-01-26 12:03:38 UTC
Created libxml2 tracking bugs for this issue:

Affects: fedora-all [bug 1301929]

Comment 2 Martin Prpič 2016-01-26 12:03:45 UTC
Created mingw-libxml2 tracking bugs for this issue:

Affects: fedora-all [bug 1301930]
Affects: epel-7 [bug 1301931]

Comment 3 Cedric Buissart 2016-04-25 09:34:58 UTC
Below is my current understanding of this issue (which, I believe, is identical to 1304636):

The issue occurs when a word starts with plain ASCII characters and then continues with multibyte UTF-8 characters.

The issue is in htmlParseNameComplex, more precisely in its while loop, which uses the following variables:

len: the size of the word so far, in bytes. It is used to get back to the beginning of the word (i.e. 'ctxt->input->cur - len')
c: the current character (can be multibyte)
l: the size in bytes of character c

The while loop finds the end of the word. The expectation is that 'ctxt->input->cur' points to the end of the word and len contains the word's length in bytes, so that 'ctxt->input->cur - len' points to the beginning of the word.
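
As an illustration only, the expectation above can be sketched with a toy scanner. This is not libxml2 code: struct toy_input, toy_char_size(), toy_is_name_byte() and toy_scan_name() are hypothetical names, and the name-character test and UTF-8 sizing are deliberately simplified.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the parser input; not libxml2's xmlParserInput. */
struct toy_input {
    const unsigned char *base; /* start of the buffer */
    const unsigned char *cur;  /* current read position */
    const unsigned char *end;  /* one past the last byte */
};

/* Size in bytes of the UTF-8 sequence whose lead byte is b
 * (simplified: continuation bytes are not validated). */
static size_t toy_char_size(unsigned char b) {
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 1; /* stray continuation byte or invalid lead byte */
}

/* Simplified name test: ASCII letters and digits, plus any non-ASCII byte. */
static int toy_is_name_byte(unsigned char b) {
    return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z') ||
           (b >= '0' && b <= '9') || b >= 0x80;
}

/* Advance in->cur to the end of the word and return the word length in bytes. */
static size_t toy_scan_name(struct toy_input *in) {
    size_t len = 0;
    while (in->cur < in->end && toy_is_name_byte(*in->cur)) {
        size_t l = toy_char_size(*in->cur);        /* like 'l' above */
        if ((size_t)(in->end - in->cur) < l) break; /* truncated sequence */
        in->cur += l;
        len += l;
    }
    /* The invariant the caller relies on: the word starts at cur - len,
     * which must still lie inside the buffer. */
    assert((size_t)(in->cur - in->base) >= len);
    return len;
}
```

For the bytes 'c' 'a' 'f' 0xC3 0xA9 followed by a space, toy_scan_name() returns 5 and cur - len is equal to base, as long as nothing moves the buffer while the loop runs.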

htmlCurrentChar() is called during the process (via the CUR_CHAR macro) and returns the next character (possibly multibyte) along with its size (l is updated).
While in htmlCurrentChar(), if the character is multibyte/non-ASCII, this leads to a change of encoding via the function xmlSwitchToEncodingInt().
During this switch, xmlBufShrink() is called. The purpose of this function is to remove the beginning of an XML buffer, via memmove. But this is done from the current character, not from the beginning of the word. As a result, ctxt->input->cur ends up pointing to the beginning of the buffer, and 'ctxt->input->cur - len' points before the beginning of the buffer.
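
A minimal sketch of that effect, assuming a toy buffer rather than libxml2's xmlBuf. The names struct toy_buf, toy_shrink_to_cur() and toy_word_start() are hypothetical, and offsets are used instead of pointers so that the out-of-range position can be computed without undefined behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy buffer; not libxml2's xmlBuf. */
struct toy_buf {
    unsigned char data[64];
    size_t used; /* number of valid bytes in data */
    size_t cur;  /* offset of the current character */
};

/* Drop everything before the current character with memmove,
 * then reset cur to the start of the buffer. */
static void toy_shrink_to_cur(struct toy_buf *b) {
    assert(b->cur <= b->used);
    memmove(b->data, b->data + b->cur, b->used - b->cur);
    b->used -= b->cur;
    b->cur = 0;
}

/* Offset of the word start as the caller computes it (cur - len).
 * A negative result means the position lies before the buffer. */
static ptrdiff_t toy_word_start(const struct toy_buf *b, size_t len) {
    return (ptrdiff_t)b->cur - (ptrdiff_t)len;
}
```

With the input "xx abc" followed by the bytes 0xC3 0xA9, cur at offset 6 and len equal to 3, toy_word_start() gives 3 before the shrink and -3 after it: the caller's len was not adjusted, so the computed word start falls 3 bytes before the buffer.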

The number of bytes to be removed is based on the following calculation: [parserInternals.c:1202] processed = input->cur - input->base;
In order to keep the current word, htmlParseNameComplex's 'len' value should be subtracted here too, so that the shrinking stops at the beginning of the word instead of at the current character.

i.e. my understanding is that xmlBufShrink() should not shrink beyond the beginning of the current word (but I have no idea about the potential side effects).
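
The idea in the previous paragraphs can be sketched as follows. This only illustrates the suggestion made in this comment on a toy buffer; it is not the upstream fix, and struct toy_buf and toy_shrink_keep() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy buffer; not libxml2's xmlBuf. */
struct toy_buf {
    unsigned char data[64];
    size_t used; /* number of valid bytes in data */
    size_t cur;  /* offset of the current character */
};

/* Shrink the buffer but keep the `keep` bytes that precede cur
 * (the word parsed so far). Returns the number of bytes removed. */
static size_t toy_shrink_keep(struct toy_buf *b, size_t keep) {
    assert(b->cur <= b->used);
    assert(keep <= b->cur);
    /* Analogue of "processed = cur - base", minus the current word length. */
    size_t processed = b->cur - keep;
    memmove(b->data, b->data + processed, b->used - processed);
    b->used -= processed;
    b->cur = keep; /* cur - keep is now offset 0, still inside the buffer */
    return processed;
}
```

With the input "xx abc" followed by the bytes 0xC3 0xA9, cur at offset 6 and keep equal to 3, three bytes are removed, the buffer starts with "abc", and cur - keep is offset 0.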


Breakpoint 2, xmlBufShrink__internal_alias (buf=0x602550, len=len@entry=14) at buf.c:386
386     xmlBufShrink(xmlBufPtr buf, size_t len) {
(gdb) bt
#0  xmlBufShrink__internal_alias (buf=0x602550, len=len@entry=14) at buf.c:386
#1  0x00007ffff7aa8764 in xmlSwitchInputEncodingInt (ctxt=0x6045b0, input=0x605b30, handler=0x6023e0, len=45) at parserInternals.c:1194
#2  0x00007ffff7aa9b59 in xmlSwitchToEncodingInt (len=<optimized out>, handler=<optimized out>, ctxt=0x6045b0) at parserInternals.c:1272
#3  xmlSwitchEncoding__internal_alias (ctxt=ctxt@entry=0x6045b0, enc=enc@entry=XML_CHAR_ENCODING_8859_1) at parserInternals.c:1100
#4  0x00007ffff7ae7425 in htmlCurrentChar (ctxt=0x6045b0, len=0x7fffffffdc54) at HTMLparser.c:518
#5  0x00007ffff7ae77d5 in htmlParseNameComplex (ctxt=0x6045b0) at HTMLparser.c:2515
#6  htmlParseName (ctxt=ctxt@entry=0x6045b0) at HTMLparser.c:2483
#7  0x00007ffff7aa0a73 in htmlParseDocTypeDecl (ctxt=ctxt@entry=0x6045b0) at HTMLparser.c:3398
#8  0x00007ffff7aed52d in htmlParseTryOrFinish (terminate=<optimized out>, ctxt=<optimized out>) at HTMLparser.c:5440
#9  htmlParseChunk__internal_alias (ctxt=0x6045b0, chunk=<optimized out>, size=<optimized out>, terminate=0) at HTMLparser.c:6070
#10 0x00000000004007f4 in main (argc=1, arg=0x7fffffffdec8) at foo.c:25

Before the shrink :
(gdb) p *ctxt->input
$9 = {buf = 0x602500, [...], base = 0x6025a0 "<!DOCTYPE html\342\t</</body></html>", cur = 0x6025ae "\342\t</</body></html>", end = 0x6025c1 "", length = 0, line = 1, col = 15, consumed = 0, [...]}

After the shrink:
(gdb) p *ctxt->input
$15 = {buf = 0x602500, [...], base = 0x6025a0 "\342\t</</body></html>", cur = 0x6025ae "tml>", end = 0x6025c1 "", length = 0, line = 1, col = 15, consumed = 0, [...]}

The final input, after readjustment:
(gdb) p *ctxt->input
$23 = {buf = 0x602500, [...], base = 0x605d50 "â\t</</body></html>", cur = 0x605d50 "â\t</</body></html>", end = 0x605d64 "", length = 0, line = 1, col = 15, consumed = 0, [...]}
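
The addresses in these dumps can be checked with a little arithmetic. The sketch below uses hypothetical helper names, and it assumes that len is 4 at this point (the bytes of "html" scanned before the \342 byte); that value is inferred from the input shown in the dumps, not read from the debugger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes consumed so far: cur - base. */
static uintptr_t toy_consumed(uintptr_t base, uintptr_t cur) {
    assert(cur >= base);
    return cur - base;
}

/* Signed distance of (cur - len) from base.
 * A negative result means the position lies before the buffer. */
static intptr_t toy_word_start_offset(uintptr_t base, uintptr_t cur, size_t len) {
    assert(cur >= base);
    return (intptr_t)(cur - base) - (intptr_t)len;
}
```

Before the shrink, toy_consumed(0x6025a0, 0x6025ae) is 14, which matches the len=14 argument of xmlBufShrink in the backtrace and the length of "<!DOCTYPE html". With the assumed len of 4, the word start is at offset 10 before the shrink. In the final input, where base and cur are both 0x605d50, the same computation gives -4, i.e. 4 bytes before the buffer.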

Comment 4 Cedric Buissart 2016-05-25 10:49:58 UTC
This is resolved by the upstream commit for CVE-2016-1839 (a820d).

I verified that the reproducer for this issue is fixed by that commit.

Marking this as a duplicate of CVE-2016-1839 to follow upstream.

*** This bug has been marked as a duplicate of bug 1338703 ***

Comment 5 Doran Moppert 2020-02-10 04:31:00 UTC

This flaw was found to be a duplicate of CVE-2016-1839. Please see https://access.redhat.com/security/cve/CVE-2016-1839 for information about affected products and security errata.