Red Hat Bugzilla – Bug 1472829
mod_auth_mellon SIGSEGV in am_urldecode
Last modified: 2017-11-21 10:22:20 EST
Description of problem:
Customer reports the following:
When handling a SAML2 session or cookie expiry, a long RelayState can cause some or all of the existing httpd child processes to segfault together meaning the user is stuck with an error page (and potentially loses data stuck in a post-preservation state).
Program terminated with signal 11, Segmentation fault.
#0 am_urldecode (data=<value optimized out>) at auth_mellon_util.c:664
664 *op = '\0';
(gdb) p op
$2 = 0x7f26250e23f1 ""
#0 am_urldecode (data=<value optimized out>) at auth_mellon_util.c:664
#1 0x00007f26250de756 in am_post_mkform_urlencoded (r=0x7f2630bc1538, post_data=<value optimized out>) at auth_mellon_handler.c:2325
#2 0x00007f26250de274 in am_handle_repost (r=0x7f2630bc1538) at auth_mellon_handler.c:2454
#3 am_handler (r=0x7f2630bc1538) at auth_mellon_handler.c:3116
#4 0x00007f262fa10fc0 in ap_run_handler (r=0x7f2630bc1538) at /usr/src/debug/httpd-2.2.15/server/config.c:158
#5 0x00007f262fa1487e in ap_invoke_handler (r=0x7f2630bc1538) at /usr/src/debug/httpd-2.2.15/server/config.c:376
#6 0x00007f262fa1ffd0 in ap_process_request (r=0x7f2630bc1538) at /usr/src/debug/httpd-2.2.15/modules/http/http_request.c:282
#7 0x00007f262fa1ce18 in ap_process_http_connection (c=0x7f26309f8348) at /usr/src/debug/httpd-2.2.15/modules/http/http_core.c:190
#8 0x00007f262fa18ae8 in ap_run_process_connection (c=0x7f26309f8348) at /usr/src/debug/httpd-2.2.15/server/connection.c:43
#9 0x00007f262fa24d77 in child_main (child_num_arg=<value optimized out>) at /usr/src/debug/httpd-2.2.15/server/mpm/prefork/prefork.c:670
#10 0x00007f262fa25099 in make_child (s=0x7f26305c5870, slot=2) at /usr/src/debug/httpd-2.2.15/server/mpm/prefork/prefork.c:773
#11 0x00007f262fa25d1c in perform_idle_server_maintenance (_pconf=<value optimized out>, plog=<value optimized out>, s=<value optimized out>)
#12 ap_mpm_run (_pconf=<value optimized out>, plog=<value optimized out>, s=<value optimized out>) at /usr/src/debug/httpd-2.2.15/server/mpm/prefork/prefork.c:1112
#13 0x00007f262f9fcaa0 in main (argc=1, argv=0x7ffea1867e58) at /usr/src/debug/httpd-2.2.15/server/main.c:763
Version-Release number of selected component (if applicable):
Upstream report looks similar: https://github.com/UNINETT/mod_auth_mellon/pull/115
We can certainly fix the segfault but that may be fixing the symptom rather that the root cause.
The problem is am_post_mkform_urlencoded() does not detect the case where a query parameter does not have a value. More to the point I do not believe it's even an RFC requirement that the query part of a URI be formatted as a sequence of name=value pairs. So certainly am_post_mkform_urlencoded() needs some tightening up.
But what really worries me is statement the problem manifests itself when the RelayState parameter is too long. No where in this bug report does it state what the RelayState parameter was, that is essential information. If the RelayState parameter is being dropped for some reason there are going to be a host of other problems even after fixing the segfault because the RealyState is an essential piece of data for the SAML flow. If it's missing you won't be able to get back to the original resource.
Also there are two things which violate the SAML spec going on here (if indeed the RelayState is the root cause).
1) SAML requires each party that receives a RelayState value to pass it along unmodified.
2) SAML mandates the RelayState not exceed 80 characters.
Mellon (as well as a number of other SAML implementations) have never enforced the 80 character limit on the RelayState because you can usually get away with embedding the resource URL in the RelayState. But there are limits on how long URL's can be as well as their query parameters. Mellon could generate a unique id for the resource URL and store that in a table. The unique id is then encoded in the RelayState as opposed to the resource URL, this has other privacy advantages as well.
I need 2 pieces of information:
I would like to see the post_data passed to am_post_mkform_urlencoded(), this can be captured in gdb, but please make sure it's the entire string and gdb didn't truncate it.
I would also like to know what the RelayState was.
It would also be helpful to know why the customer believes a long RelayState is the root cause (any query parameter with a null value will trigger the segfault, it does not have to be the RelayState parameter, but if the RelayState is being stripped that is essential information, providing the post_data will answer this.
1) The RelayState is derived from the return_url, it's also url-encoded which usually increases it sized (depends on how many characters need escaping)
2) You can get the post data from your browser by using the browser's developer tools. With Firefox it's firebug or with Chrome it's the built-in developer tool panel. What you need to do start an authentication flow in the browser and then examine the HTML response from the IdP, it should be mostly form data.
Hint, if you use firebug the data is in the "net" panel but you must enable the "persist" button in the "net" panel, that keeps a running log of all requests, otherwise the "net" panel only show the most recent request. The data should be in the "html" tab. Chrome always seems to keep a persistent log until cleared.
I've attached the requested post data containing the RelayState, which appears to be over 80 characters as described.
In addition, the customer offered the following clarification on reproducing the issue: "... this is a problem when a local mellon session times out (or you clear the local cookie so the session is invalid) and the next request is a POST with a large RelayState."
With this additional explanation, I'll test this out to see if we have a reproducer.
I think the issue must be elsewhere, i.e. not caused by RelayState with length greater than 80. I was able to test this and see the following message in httpd's error_log:
Lasso-WARNING **: 2017-08-08 12:43:22 Encoded a RelayState of more than 80 bytes, see #3.4.3 of saml-bindings-2.0-os
The request still completed successfully for me.
I believe this is the same issue identified in this upstream issue report and pull request:
The upstream commit is:
The issue occurs when the protected resource is accessed via a POST as opposed to a GET. I do not know whether or not that is sufficient to trigger the segfault. I personally have not encountered this particular failure, mostly have been relying on upstream to diagnose and fix this issue.
Specifically the issue occurs when the enctype query parameter is application/x-www-form-urlencoded and one of the query parameters has an empty value, e.g. foo=
when the requust is posted to the "repost" mellon endpoint.
The sequence of operations is as follows:
1) HTTP client sends a POST request to a resource protected by Mellon.
2) Mellon saves the post information and begins the authentication flow to the IdP.
3) After successful authentication from the IdP Mellon will redirect to the RelayState parameter, which because this was a saved POST will look like this:
"%srepost?id=%s&ReturnTo=%s&enctype=%s%s" where the %s substitutions are:
After this redirect you'll land back in the am_handle_repost handler where because the enctype is application/x-www-form-urlencoded it will call am_post_mkform_urlencoded which will try to parse the list of query parameters, if one of those query parameters has an empty value it will segfault.
You can either try to reproduce the entire sequence or short circuit the entire flow and just send a request with the above query parameters to $MellonEndpointPath/repost.