Bug 1472829 - mod_auth_mellon SIGSEGV in am_urldecode
mod_auth_mellon SIGSEGV in am_urldecode
Status: ASSIGNED
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: mod_auth_mellon (Show other bugs)
6.9
Unspecified Unspecified
urgent Severity urgent
: rc
: ---
Assigned To: John Dennis
ipa-qe
:
Depends On:
Blocks: 1461138 1484119
  Show dependency treegraph
 
Reported: 2017-07-19 09:34 EDT by Robert Bost
Modified: 2017-09-20 12:19 EDT (History)
4 users (show)

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1484119 (view as bug list)
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)


External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3119651 None None None 2017-07-19 09:54 EDT

  None (edit)
Description Robert Bost 2017-07-19 09:34:22 EDT
Description of problem:

Customer reports the following:
When handling a SAML2 session or cookie expiry, a long RelayState can cause some or all of the existing httpd child processes to segfault together meaning the user is stuck with an error page (and potentially loses data stuck in a post-preservation state).


Program terminated with signal 11, Segmentation fault.
#0  am_urldecode (data=<value optimized out>) at auth_mellon_util.c:664
664	    *op = '\0';
(gdb) p op
$2 = 0x7f26250e23f1 ""
(gdb) bt
#0  am_urldecode (data=<value optimized out>) at auth_mellon_util.c:664
#1  0x00007f26250de756 in am_post_mkform_urlencoded (r=0x7f2630bc1538, post_data=<value optimized out>) at auth_mellon_handler.c:2325
#2  0x00007f26250de274 in am_handle_repost (r=0x7f2630bc1538) at auth_mellon_handler.c:2454
#3  am_handler (r=0x7f2630bc1538) at auth_mellon_handler.c:3116
#4  0x00007f262fa10fc0 in ap_run_handler (r=0x7f2630bc1538) at /usr/src/debug/httpd-2.2.15/server/config.c:158
#5  0x00007f262fa1487e in ap_invoke_handler (r=0x7f2630bc1538) at /usr/src/debug/httpd-2.2.15/server/config.c:376
#6  0x00007f262fa1ffd0 in ap_process_request (r=0x7f2630bc1538) at /usr/src/debug/httpd-2.2.15/modules/http/http_request.c:282
#7  0x00007f262fa1ce18 in ap_process_http_connection (c=0x7f26309f8348) at /usr/src/debug/httpd-2.2.15/modules/http/http_core.c:190
#8  0x00007f262fa18ae8 in ap_run_process_connection (c=0x7f26309f8348) at /usr/src/debug/httpd-2.2.15/server/connection.c:43
#9  0x00007f262fa24d77 in child_main (child_num_arg=<value optimized out>) at /usr/src/debug/httpd-2.2.15/server/mpm/prefork/prefork.c:670
#10 0x00007f262fa25099 in make_child (s=0x7f26305c5870, slot=2) at /usr/src/debug/httpd-2.2.15/server/mpm/prefork/prefork.c:773
#11 0x00007f262fa25d1c in perform_idle_server_maintenance (_pconf=<value optimized out>, plog=<value optimized out>, s=<value optimized out>)
    at /usr/src/debug/httpd-2.2.15/server/mpm/prefork/prefork.c:908
#12 ap_mpm_run (_pconf=<value optimized out>, plog=<value optimized out>, s=<value optimized out>) at /usr/src/debug/httpd-2.2.15/server/mpm/prefork/prefork.c:1112
#13 0x00007f262f9fcaa0 in main (argc=1, argv=0x7ffea1867e58) at /usr/src/debug/httpd-2.2.15/server/main.c:763

Version-Release number of selected component (if applicable):
mod_auth_mellon-0.8.0-4.el6.x86_64

Additional info:
Upstream report looks similar: https://github.com/UNINETT/mod_auth_mellon/pull/115
Comment 2 John Dennis 2017-07-20 12:58:35 EDT
We can certainly fix the segfault but that may be fixing the symptom rather that the root cause.

The problem is am_post_mkform_urlencoded() does not detect the case where a query parameter does not have a value. More to the point I do not believe it's even an RFC requirement that the query part of a URI be formatted as a sequence of name=value pairs. So certainly am_post_mkform_urlencoded() needs some tightening up.

But what really worries me is statement the problem manifests itself when the RelayState parameter is too long. No where in this bug report does it state what the RelayState parameter was, that is essential information. If the RelayState parameter is being dropped for some reason there are going to be a host of other problems even after fixing the segfault because the RealyState is an essential piece of data for the SAML flow. If it's missing you won't be able to get back to the original resource.

Also there are two things which violate the SAML spec going on here (if indeed the RelayState is the root cause).

1) SAML requires each party that receives a RelayState value to pass it along unmodified.

2) SAML mandates the RelayState not exceed 80 characters.

Mellon (as well as a number of other SAML implementations) have never enforced the 80 character limit on the RelayState because you can usually get away with embedding the resource URL in the RelayState. But there are limits on how long URL's can be as well as their query parameters. Mellon could generate a unique id for the resource URL and store that in a table. The unique id is then encoded in the RelayState as opposed to the resource URL, this has other privacy advantages as well.

I need 2 pieces of information:

I would like to see the post_data passed to am_post_mkform_urlencoded(), this can be captured in gdb, but please make sure it's the entire string and gdb didn't truncate it.

I would also like to know what the RelayState was.

It would also be helpful to know why the customer believes a long RelayState is the root cause (any query parameter with a null value will trigger the segfault, it does not have to be the RelayState parameter, but if the RelayState is being stripped that is essential information, providing the post_data will answer this.
Comment 4 John Dennis 2017-08-03 16:04:25 EDT
1) The RelayState is derived from the return_url, it's also url-encoded which usually increases it sized (depends on how many characters need escaping)

2) You can get the post data from your browser by using the browser's developer tools. With Firefox it's firebug or with Chrome it's the built-in developer tool panel. What you need to do start an authentication flow in the browser and then examine the HTML response from the IdP, it should be mostly form data.

Hint, if you use firebug the data is in the "net" panel but you must enable the "persist" button in the "net" panel, that keeps a running log of all requests, otherwise the "net" panel only show the most recent request. The data should be in the "html" tab. Chrome always seems to keep a persistent log until cleared.
Comment 6 David Mulford 2017-08-07 09:52:10 EDT
I've attached the requested post data containing the RelayState, which appears to be over 80 characters as described.

In addition, the customer offered the following clarification on reproducing the issue: "... this is a problem when a local mellon session times out (or you clear the local cookie so the session is invalid) and the next request is a POST with a large RelayState."

With this additional explanation, I'll test this out to see if we have a reproducer.
Comment 7 David Mulford 2017-08-08 13:09:06 EDT
I think the issue must be elsewhere, i.e. not caused by RelayState with length greater than 80. I was able to test this and see the following message in httpd's error_log:

Lasso-WARNING **: 2017-08-08 12:43:22	Encoded a RelayState of more than 80 bytes, see #3.4.3 of saml-bindings-2.0-os

The request still completed successfully for me.
Comment 8 John Dennis 2017-08-22 10:26:10 EDT
I believe this is the same issue identified in this upstream issue report and pull request:

https://github.com/UNINETT/mod_auth_mellon/pull/115

The upstream commit is:

967f820a087bf2567db4b5f0dbeeff845c1e7f8a

Note You need to log in before you can comment on or make changes to this bug.