Bug 1649470

Summary: httpd response contains garbage in Content-Type header
Product: Red Hat Enterprise Linux 7 Reporter: Àngel Ollé Blázquez <aollebla>
Component: httpd Assignee: Luboš Uhliarik <luhliari>
Status: CLOSED ERRATA QA Contact: Maryna Nalbandian <mnalband>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 7.4 CC: aollebla, bgollahe, bnater, jorton, kwalker, luhliari
Target Milestone: rc Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1724549 1828812 Environment:
Last Closed: 2020-03-31 20:03:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1716962, 1724549    
Attachments:
Description: fixed magic Flags: none

Description Àngel Ollé Blázquez 2018-11-13 16:28:05 UTC
Description of problem:

When I fetch an audio file that has no file suffix, the httpd response contains garbage in the Content-Type header:

# curl -sv -o /dev/null http://127.0.0.1/path/to/audio
* About to connect() to 127.0.0.1 port 80 (#0)
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 80 (#0)
> GET /path/to/audio HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 127.0.0.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 01 Nov 2018 20:08:53 GMT
< Server: Apache
< Last-Modified: Thu, 01 Nov 2018 19:22:59 GMT
< Accept-Ranges: bytes
< Content-Length: 24378
< X-Content-Type-Options: nosniff
< X-XSS-protection: 1; mode=block
< Content-Type: audio/unknown@",▒
< Content-Encoding: v/x-wav
<
{ [data not shown]
* Connection #0 to host 127.0.0.1 left intact

analysis:

Request with the extension:

~~~
# curl -sv -o /dev/null localhost:8080/audio2.wav
* About to connect() to localhost port 8080 (#0)
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /audio2.wav HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:8080
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 08 Nov 2018 13:04:07 GMT
< Server: Apache
< Last-Modified: Wed, 07 Nov 2018 19:54:28 GMT
< ETag: "923032-57a187c631027"
< Accept-Ranges: bytes
< Content-Length: 9580594
< Connection: close
< Content-Type: audio/x-wav
<
{ [data not shown]
* Closing connection 0
~~~

The Content-Type is OK. Here the mod_mime_magic module is not used, because mod_mime identifies the file type via TypesConfig /etc/mime.types (which has the entry: audio/x-wav  wav).

mod_mime_magic is used only when nothing matches after processing mime.types. This is where the issue appears:

~~~
curl -sv -o /dev/null localhost:8080/audio2
* About to connect() to localhost port 8080 (#0)
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /audio2 HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:8080
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 08 Nov 2018 13:58:12 GMT
< Server: Apache
< Last-Modified: Wed, 07 Nov 2018 14:15:01 GMT
< ETag: "923032-57a13be6edc57"
< Accept-Ranges: bytes
< Content-Length: 9580594
< Connection: close
< Content-Type: audio/unknown(-audio/x-wav
<
{ [data not shown]
* Closing connection 0
~~~

Because the garbage varies between requests, we sometimes get a spurious Content-Encoding header as well:

~~~
curl -sv -o /dev/null localhost:8080/audio2
* About to connect() to localhost port 8080 (#0)
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /audio2 HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:8080
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 08 Nov 2018 13:58:50 GMT
< Server: Apache
< Last-Modified: Wed, 07 Nov 2018 14:15:01 GMT
< ETag: "923032-57a13be6edc57"
< Accept-Ranges: bytes
< Content-Length: 9580594
< Connection: close
< Content-Type: audio/unknownh

< Content-Encoding: audio/x-wav
<
{ [data not shown]
* Closing connection 0
~~~

mod_mime_magic debug traces showing the matches (with junk):

~~~
[Thu Nov 08 15:15:57.475116 2018] [mime_magic:debug] [pid 7521:tid 139889756055296] mod_mime_magic.c(755): [client 127.0.0.1:34632] AH01508: mod_mime_magic: rsl_strdup() 14 chars: audio/unknownH\x0c
[Thu Nov 08 15:15:57.476340 2018] [mime_magic:debug] [pid 7521:tid 139889756055296] mod_mime_magic.c(755): [client 127.0.0.1:34632] AH01508: mod_mime_magic: rsl_strdup() 10 chars: audio/x-wav

~~~

I think the junk comes from the function magic_rsl_to_request in mod_mime_magic.c, which processes the RSL and sets the MIME info:

~~~
    magic_req_rec *req_dat = (magic_req_rec *)
                    ap_get_module_config(r->request_config, &mime_magic_module);
~~~

I am not very familiar with the httpd API structs, and the request_config attributes are a bit opaque, but it seems that request_config returns a magic_rsl_s whose chained next magic_rsl_s points to some junk:

Here,
~~~
    for (frag = req_dat->head, cur_frag = 0; frag && frag->next; frag = frag->next, cur_frag++) {
~~~

The loop walks 3 fragments instead of 2 (2 for the content type instead of 1):

~~~
(own debug traces)

cur_frag: 0
frag:
audio/unknown
cur_frag: 1
frag:
H

cur_frag: 2
frag:
audio/x-wav
~~~

The junk is in fragment number 1.

Debugging a bit more, the Content-Type and Content-Encoding are collected in:

~~~
tmp = rsl_strdup(r, type_frag, type_pos, type_len);
...
ap_set_content_type(r, tmp);
...
    if (state == rsl_encoding) {
        tmp = rsl_strdup(r, encoding_frag,
                                         encoding_pos, encoding_len);
~~~

The request_rec argument, r, passed to rsl_strdup already carries the junk, so the junk is also copied into tmp.
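The responses above, with one type in Content-Type and the other sometimes in Content-Encoding, are what one would expect if the collected result string is cut at whitespace: first token is the type, second token is the encoding. The following is a minimal sketch of that effect only. `split_result` is a hypothetical helper written for illustration; it is not the httpd code and ignores the fragment list entirely.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified illustration (NOT the httpd implementation): split a
 * result string into a content type and an optional content encoding.
 * The first whitespace-delimited token becomes the type, the second
 * becomes the encoding. Each output buffer must hold at least cap bytes.
 */
void split_result(const char *rsl, char *type, char *enc, size_t cap)
{
    assert(rsl != NULL && type != NULL && enc != NULL && cap > 0);
    char *out[2] = { type, enc };
    type[0] = enc[0] = '\0';

    for (int i = 0; i < 2; i++) {
        size_t n = 0;
        /* skip separator whitespace before the token */
        while (*rsl != '\0' && isspace((unsigned char)*rsl))
            rsl++;
        /* copy the token, truncating if it does not fit */
        while (*rsl != '\0' && !isspace((unsigned char)*rsl)) {
            if (n + 1 < cap)
                out[i][n++] = *rsl;
            rsl++;
        }
        out[i][n] = '\0';
    }
}
```

Under this assumption, the input "audio/unknown audio/x-wav" gives type audio/unknown and encoding audio/x-wav, while "audio/unknown(-audio/x-wav" (junk without whitespace) stays a single type, which resembles the two broken responses shown earlier.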

I think the issue is in the magic file that ships with httpd:

~~~
# Microsoft WAVE format (*.wav)
# [GRR 950115:  probably all of the shorts and longs should be leshort/lelong]
#                    Microsoft RIFF
0    string        RIFF        audio/unknown
#                    - WAVE format
>8    string        WAVE        audio/x-wav
~~~

The file format is:

~~~
# The format is 4-5 columns:
#    Column #1: byte number to begin checking from, ">" indicates continuation
#    Column #2: type of data to match
#    Column #3: contents of data to match
#    Column #4: MIME type of result
#    Column #5: MIME encoding of result (optional)
~~~
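To make the column layout concrete, here is a minimal sketch of a parser for one magic line. The struct and function names are hypothetical, and the parsing is deliberately simplified: it assumes plain whitespace-separated columns, does not handle escaped whitespace in column 3, and expects the caller to skip comment lines.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One parsed line of a magic file (hypothetical struct for illustration). */
struct magic_line {
    int  level;        /* number of leading '>' continuation markers */
    long offset;       /* column 1: byte offset to check */
    char type[32];     /* column 2: data type, e.g. "string" */
    char content[64];  /* column 3: data to match */
    char mime[64];     /* column 4: MIME type, "" if absent */
    char encoding[64]; /* column 5: MIME encoding, "" if absent */
};

/*
 * Parse a whitespace-separated magic line. Returns 0 on success, -1 if
 * fewer than three columns are present. Comment lines (starting with
 * '#') are expected to be filtered out by the caller.
 */
int parse_magic_line(const char *line, struct magic_line *out)
{
    char off[32];
    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof *out);

    int n = sscanf(line, "%31s %31s %63s %63s %63s",
                   off, out->type, out->content, out->mime, out->encoding);
    if (n < 3)
        return -1;

    const char *p = off;
    while (*p == '>') {   /* count continuation markers */
        out->level++;
        p++;
    }
    out->offset = strtol(p, NULL, 0);
    return 0;
}
```

Parsed this way, the line `0 string RIFF audio/unknown` is a level-0 test that carries its own MIME type, and `>8 string WAVE audio/x-wav` is a level-1 continuation at offset 8 that carries a second one.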

In fact, WAVE files start with the 4-byte RIFF magic number, and at byte offset 8 there are another 4 bytes holding the format magic number (WAVE, 0x57415645):

~~~
xxd /var/www/html/audio2.wav | head
0000000: 5249 4646 2a30 9200 5741 5645 666d 7420  RIFF*0..WAVEfmt
~~~
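The byte values in the dump can be checked with a few lines of C. This is a sketch for illustration; the helper names are hypothetical and unrelated to httpd. It reads the two magic numbers as big-endian 32-bit values and the RIFF size field (bytes 4 to 7) as little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit big-endian value (first byte is most significant). */
uint32_t read_be32(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read a 32-bit little-endian value (first byte is least significant). */
uint32_t read_le32(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* Returns 1 if buf starts with a RIFF header whose form type is WAVE. */
int is_riff_wave(const unsigned char *buf, size_t len)
{
    return len >= 12 &&
           read_be32(buf)     == 0x52494646u &&  /* "RIFF" at offset 0 */
           read_be32(buf + 8) == 0x57415645u;    /* "WAVE" at offset 8 */
}
```

For the 16 bytes shown in the dump, `is_riff_wave` returns 1. The size field `2a30 9200` decodes to 9580586, and adding the 8 bytes of the RIFF header gives 9580594, which matches the Content-Length reported for this file.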

However, it seems that the 'match' function that processes multi-level continuations cannot handle entries where the first level also defines a MIME type, as the WAVE entry does:

~~~
0    string        RIFF        audio/unknown
#                    - WAVE format
>8    string        WAVE        audio/x-wav
~~~

~~~
$ file -r -0 -m ../conf/magic /var/www/html/audio2
/var/www/html/audio2: audio/unknown audio/x-wav <= returning two types
~~~

Maybe this is because such an entry does not make sense. I think that a multi-level entry exists precisely because we do not want to set the content type without checking the subsequent levels, so the correct form should be:

~~~
0    string        RIFF       
#                    - WAVE format
>8    string        WAVE        audio/x-wav
~~~

With this, the type is correctly identified:

~~~
curl -sv -o /dev/null localhost:8080/audio2
* About to connect() to localhost port 8080 (#0)
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /audio2 HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:8080
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 08 Nov 2018 15:10:24 GMT
< Server: Apache
< Last-Modified: Wed, 07 Nov 2018 14:15:01 GMT
< ETag: "923032-57a13be6edc57"
< Accept-Ranges: bytes
< Content-Length: 9580594
< Connection: close
< Content-Type: audio/x-wav
<
{ [data not shown]
* Closing connection 0
~~~

~~~
$ file -r -0 -m ../conf/magic /var/www/html/audio2
/var/www/html/audio2: audio/x-wav
~~~


RIFF is a very generic container: the real format is defined at the format offset of the RIFF descriptor. It is probably wrong to type a RIFF file as audio/unknown, because many formats use RIFF, AVI for example.

Little test:

~~~
0       string          RIFF           
>8      string          WAVE            audio/x-wav
>8      string          AVI             video/x-msvideo
~~~

result:

~~~
curl -sv -o /dev/null localhost:8080/drop
* About to connect() to localhost port 8080 (#0)
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /drop HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:8080
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 08 Nov 2018 15:20:16 GMT
< Server: Apache
< Last-Modified: Wed, 07 Nov 2018 22:12:34 GMT
< ETag: "a5000-57a1a6a4ef64c"
< Accept-Ranges: bytes
< Content-Length: 675840
< Connection: close
< Content-Type: video/x-msvideo
<
{ [data not shown]
* Closing connection 0
~~~

~~~
$ file -r -0 -m ../conf/magic /var/www/html/drop
/var/www/html/drop: video/x-msvideo
~~~
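The two-level layout tested above, where the RIFF line only gates the continuation lines and assigns no type itself, can be sketched as a small table-driven lookup. This is an illustration under that assumption, with hypothetical names; it is not how mod_mime_magic evaluates the magic file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Continuation rule: bytes expected at offset 8 and the resulting type. */
struct riff_rule {
    const char *form;  /* form type to match at offset 8 */
    const char *mime;  /* MIME type reported on a match */
};

static const struct riff_rule riff_rules[] = {
    { "WAVE", "audio/x-wav" },
    { "AVI",  "video/x-msvideo" },
};

/*
 * Returns the MIME type for a RIFF container, or NULL if the buffer is
 * not RIFF or no continuation rule matches. The top-level RIFF test
 * only gates the continuation tests; it assigns no type of its own.
 */
const char *riff_mime_type(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0)
        return NULL;

    for (size_t i = 0; i < sizeof riff_rules / sizeof riff_rules[0]; i++) {
        size_t n = strlen(riff_rules[i].form);
        if (memcmp(buf + 8, riff_rules[i].form, n) == 0)
            return riff_rules[i].mime;
    }
    return NULL;
}
```

With this shape a RIFF file yields exactly one type or none at all, so there is never a leftover generic type to be combined with the specific one.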


Version-Release number of selected component (if applicable):

httpd-2.4.6-67.el7_4.6.x86_64 and also Apache/2.4.29.


How reproducible:


Steps to Reproduce:
1. Make a request to fetch a WAVE audio file that has no file suffix:

curl -sv -o /dev/null http://127.0.0.1/path/to/audio
* About to connect() to 127.0.0.1 port 80 (#0)
*   Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 80 (#0)
> GET /path/to/audio HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 127.0.0.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Thu, 01 Nov 2018 20:08:53 GMT
< Server: Apache
< Last-Modified: Thu, 01 Nov 2018 19:22:59 GMT
< Accept-Ranges: bytes
< Content-Length: 24378
< X-Content-Type-Options: nosniff
< X-XSS-protection: 1; mode=block
< Content-Type: audio/unknown@",▒
< Content-Encoding: v/x-wav
<
{ [data not shown]
* Connection #0 to host 127.0.0.1 left intact


Actual results:

The Content-Type header contains garbage and the type is not correctly identified by magic:
< Content-Type: audio/unknown@",▒

Expected results:

The content-type should be audio/x-wav without garbage.

Additional info:

workaround:

If you use the default magic file, replace the following line in /etc/httpd/conf/magic:

~~~
0	string		RIFF		audio/unknown
~~~

with

~~~
0       string          RIFF
~~~

Final form:

~~~
0    string        RIFF
>8    string        WAVE        audio/x-wav
~~~

Comment 2 Àngel Ollé Blázquez 2019-01-30 21:36:52 UTC
Created attachment 1525150
fixed magic

Comment 4 Joe Orton 2019-06-27 09:11:58 UTC
Àngel - thanks, very nice analysis!

I agree with your conclusion that mod_mime_magic can't handle both a MIME type defined for the top-level match and the continuation line, and have pushed your suggested change to the magic file upstream:

https://svn.apache.org/viewvc?view=revision&revision=1862200

I am not sure where the memory corruption is coming from and can't reproduce it against upstream, but possibly it was addressed by this fix:

https://svn.apache.org/viewvc?view=revision&revision=1491700

Comment 5 Joe Orton 2019-07-05 11:28:52 UTC
Merged upstream for 2.4.40 - http://svn.apache.org/r1862604

Comment 11 errata-xmlrpc 2020-03-31 20:03:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1121