Bug 1760106 - FAIL_RETURN_IN_FIPS_MODE() patch breaks mod_wsgi: ValueError: unsupported hash type blake2b [NEEDINFO]
Summary: FAIL_RETURN_IN_FIPS_MODE() patch breaks mod_wsgi: ValueError: unsupported has...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: python3
Version: 8.1
Hardware: Unspecified
OS: Unspecified
unspecified
urgent
Target Milestone: rc
: 8.1
Assignee: Python Maintainers
QA Contact: Lukáš Zachar
URL:
Whiteboard:
Depends On:
Blocks: 1758119 1760850 1761599
TreeView+ depends on / blocked
 
Reported: 2019-10-09 21:15 UTC by Christian Heimes
Modified: 2019-11-05 22:04 UTC (History)
8 users (show)

Fixed In Version: python3-3.6.8-15.1.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1761599 (view as bug list)
Environment:
Last Closed: 2019-11-05 22:03:44 UTC
Type: Bug
Target Upstream Version:
cheimes: needinfo? (mmcgrath)


Attachments (Terms of Use)
reproducer script (390 bytes, text/x-python)
2019-10-09 21:15 UTC, Christian Heimes
no flags Details
Skip error checking in _Py_hashlib_fips_error (1.29 KB, patch)
2019-10-10 11:10 UTC, Petr Viktorin
no flags Details | Diff


Links
System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1745499 'unspecified' 'CLOSED' 'mod_wsgi: hashlib.get_fips_mode() fails with ValueError: [PEM routines: get_name] no start line' 2019-12-05 15:35:44 UTC
Red Hat Bugzilla 1758119 'unspecified' 'NEW' '/var/log/httpd/error_log contains errors about blake2b' 2019-12-05 15:35:44 UTC
Red Hat Product Errata RHSA-2019:3520 None None None 2019-11-05 22:04:04 UTC

Description Christian Heimes 2019-10-09 21:15:18 UTC
Created attachment 1624052 [details]
reproducer script

Description of problem:
IPA, KdcProxy and other mod_wsgi applications fail to start in RHEL 8.1. A bug in the macro FAIL_RETURN_IN_FIPS_MODE() and function call _Py_hashlib_fips_error() breaks import of mod_wsgi with error message:  "ValueError: unsupported hash type blake2b".

The root cause of the bug is the same as #1745499. mod_ssl leaves an error state on the OpenSSL error stack. _Py_hashlib_fips_error() does not clear the error stack and turns the leftover error into an exception.

Version-Release number of selected component (if applicable):
python3-libs-3.6.8-15.el8.x86_64
mod_ssl-2.4.37-16.module+el8.1.0+4134+e6bad0ed.x86_64
ipa-server-4.8.0-11.module+el8.1.0+4247+9f3fd721.x86_64

How reproducible:
always

Steps to Reproduce:
- execute attached reproducer script
- or Install IPA with ipa-server-install and check /var/log/http/error_log

Actual results:
The initial exception is:

ERROR:root:code for hash blake2b was not found.
Traceback (most recent call last):
  File "/usr/lib64/python3.6/hashlib.py", line 219, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/usr/lib64/python3.6/hashlib.py", line 130, in __get_openssl_constructor
    return __get_builtin_constructor(name)
  File "/usr/lib64/python3.6/hashlib.py", line 120, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type blake2b


After I replaced "pass" on hashlib.py:114 with "raise" I got the proper error message:

mod_wsgi (pid=16936): Exception occurred processing WSGI script '/usr/share/ipa/kdcproxy.wsgi'.
Traceback (most recent call last):
  File "/usr/share/ipa/kdcproxy.wsgi", line 4, in <module>
    from kdcproxy import application
  File "/usr/lib/python3.6/site-packages/kdcproxy/__init__.py", line 31, in <module>
    from kdcproxy.config import MetaResolver
  File "/usr/lib/python3.6/site-packages/kdcproxy/config/__init__.py", line 33, in <module>
    import dns.resolver
  File "/usr/lib/python3.6/site-packages/dns/resolver.py", line 24, in <module>
    import random
  File "/usr/lib64/python3.6/random.py", line 46, in <module>
    from hashlib import sha512 as _sha512
  File "/usr/lib64/python3.6/hashlib.py", line 220, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/usr/lib64/python3.6/hashlib.py", line 131, in __get_openssl_constructor
    return __get_builtin_constructor(name)
  File "/usr/lib64/python3.6/hashlib.py", line 101, in __get_builtin_constructor
    import _blake2
ImportError: [PEM routines: get_name] no start line

Expected results:
No exception

Additional info:

Comment 8 Petr Viktorin 2019-10-10 08:41:18 UTC
Well, again, from the Python side, I don't really see an error in the reproducer. You set an error, and you get an exception with that exact error.

Again, as someone unfamiliar with the OpenSL error stack, I see two possibilities:

- either applications are expected to leave unhandled errors lying around, and we're supposed to clear and ignore unknown errors. This sounds very wrong, at least to an outsider. If it is the case, could you point me to some documentation on how to actually deal with the OpenSSL error stack? Or could someone familiar with OpenSSL to write the patch?

- or this is asking to work around a bug in mod_wsgi, which is fine, but could you file a mod_wsgi bug we can link to?

Comment 9 Alexander Bokovoy 2019-10-10 09:13:16 UTC
mod_wsgi doesn't call any OpenSSL code itself, so there is no error cleanup to do there.

Perhaps, _hashlib_get_fips_mode_impl() needs to peek the last error before calling into FIPS_mode() and then pick again in case of 0 result. Then compare these two and see if they aren't the same and raise the last error.

Comment 10 Christian Heimes 2019-10-10 09:13:31 UTC
The exception is the bug in the downstream FIPS patch. "import hashlib" must not fail in Apache mod_wsgi environment on a HTTPS enabled Apache HTTPd server. The reproducer is a simplified examples how Apache HTTPd, mod_ssl, mod_wsgi, and embedded Python interact. The reproducer is not failing with upstream Python or Python on Fedora 29. Only the downstream patch turns the leftover error on the error stack into an exception and causes a regression for mod_wsgi applications.

Python 3.7 on Fedora:

$ python3 hashlib_fips_bug.py 
$ echo $?
0

Python 3.6 on RHEL 8 with FIPS patch:

$ /usr/libexec/platform-python hashlib_fips_bug.py 
ERROR:root:code for hash blake2b was not found.
Traceback (most recent call last):
  File "/usr/lib64/python3.6/hashlib.py", line 219, in <module>
    globals()[__func_name] = __get_hash(__func_name)
  File "/usr/lib64/python3.6/hashlib.py", line 130, in __get_openssl_constructor
    return __get_builtin_constructor(name)
  File "/usr/lib64/python3.6/hashlib.py", line 120, in __get_builtin_constructor
    raise ValueError('unsupported hash type ' + name)
ValueError: unsupported hash type blake2b

If you change "except ImportError: pass" into "except ImportError: raise", then you get the actual error:

  ...
  File "/usr/lib64/python3.6/hashlib.py", line 101, in __get_builtin_constructor
    import _blake2
ImportError: [PEM routines: get_name] no start line


To be honest I'm not familiar with any good documentation on OpenSSL's error stack. AFAIK code must handle the error stack whenever an OpenSSL function call indicates a potential error situation. That also means that _hashlib.get_fips_mode() is theoretically buggy because it does not clear the error stack after FIPS_mode() call. In practice it can never error because RHEL always comes with FIPS capable OpenSSL.

In my humble opinion this is an old bug in mod_ssl. mod_wsgi is innocent as it does not call any OpenSSL APIs. I also don't want to tell you how to fix the bug. I leave it up to you if you want to take the problem up to mod_ssl developers or address the problem in the FIPS downstream patch. I'm only interested to get the bug fixed ASAP.

Comment 12 Petr Viktorin 2019-10-10 11:10:25 UTC
Created attachment 1624267 [details]
Skip error checking in _Py_hashlib_fips_error

Comment 13 Petr Viktorin 2019-10-10 11:14:45 UTC
python-maint will build and do an errata once this bug has the acks. (IdM, please argue for the blocker+.)

What do you need to start testing a fix with FreeIPA?

Comment 14 Christian Heimes 2019-10-10 11:16:13 UTC
Can you provide a scratch build with the fix?

Comment 17 Petr Viktorin 2019-10-10 12:40:29 UTC
The scratch build is here: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=23948020

Comment 21 Petr Viktorin 2019-10-10 13:14:57 UTC
Tomáš, I was referred to you as an OpenSSL expert. Could you please advise on how to deal with the OpenSSL error stack? The issue here is that another library (mod_ssl, presumably) is putting errors on the stack. My expectation is that it should clean up after itself, but apparently it's acceptable to leave errors there.

Christian says:
> To be honest I'm not familiar with any good documentation on OpenSSL's error stack. AFAIK code must handle the error stack whenever an OpenSSL function call indicates a potential error situation. That also means that _hashlib.get_fips_mode() is theoretically buggy because it does not clear the error stack after FIPS_mode() call. In practice it can never error because RHEL always comes with FIPS capable OpenSSL.

That's my expectation as well. We are calling the function FIPS_mode; about which the wiki ( https://wiki.openssl.org/index.php/FIPS_mode() ) says: 

> Calling the function from an application linked to OpenSSL versions 1.1.0 or 1.1.1 will always return 0, indicating non-FIPS mode, with an error code of CRYPTO_R_FIPS_MODE_NOT_SUPPORTED (0x0f06d065). 

How are we supposed to check this error? I did ERR_peek_last_error(), and if an exception was set, ERR_clear_error() and . However, that doesn't work if a different library is allowed to leave errors on the error stack?
Or should I check for CRYPTO_R_FIPS_MODE_NOT_SUPPORTED and only report that? Is it guaranteed that FIPS_mode() will not cause different errors (e.g. in future versions)?

Comment 22 Petr Viktorin 2019-10-10 13:17:27 UTC
As I some time ago, we don't have security in python-maint, and we need help. Christian said he can spend time on these things in Python, but that didn't really happen. So, I wrote the FIPS patch myself. Yes, it is written by an amateur. I don't think it is possible for me to learn OpenSSL fast enough -- I think we all agree the documentation is lacking, especially in in this area.

I don't need reminding that the patch is buggy. I did what I could, and I'm continuing to ask for help.


Christian:
> In my humble opinion this is an old bug in mod_ssl. mod_wsgi is innocent as it does not call any OpenSSL APIs. I also don't want to tell you how to fix the bug. I leave it up to you if you want to take the problem up to mod_ssl developers or address the problem in the FIPS downstream patch. I'm only interested to get the bug fixed ASAP.

Since you agreed some time ago to help with issues like this, that comment surprises me.

Comment 23 Petr Viktorin 2019-10-10 13:20:43 UTC
As for 0day or not, that's a PM/QE decision. python-maint can provide the builds and erratas when the acks are there.

Comment 25 Christian Heimes 2019-10-10 13:41:43 UTC
(In reply to Petr Viktorin from comment #22)
> Christian:
> > In my humble opinion this is an old bug in mod_ssl. mod_wsgi is innocent as it does not call any OpenSSL APIs. I also don't want to tell you how to fix the bug. I leave it up to you if you want to take the problem up to mod_ssl developers or address the problem in the FIPS downstream patch. I'm only interested to get the bug fixed ASAP.
> 
> Since you agreed some time ago to help with issues like this, that comment
> surprises me.

I'm still willing to assist with the general issue. That hasn't changed. However a proper and technical sound fix either requires a thorough and lengthy discussion with mod_ssl upstream development. Or it requires to make a decision how to handle errors on the error stack in FIPS downstream patches. Since time is running out and the regression is breaking IdM, we need a fix ASAP. I'm not confident that mod_ssl is going to address the issue in time.

Your patch is solving the problem for us. This means the ASAP part was is handled quickly and efficiently. Thank you very much for that.

For 8.2 it makes sense to investigate the issue properly and to look for a better approach. I have an interest to get the problem solved as both upstream owner of CPython's hashlib and downstream owner of FIPS support in IdM. Since the FIPS patch set is owned by the Python maintenance team, it is ultimately your call how you want to drive the effort. I can give you my expertise and assist with testing, but I don't see myself as owner of the undertaking.

Comment 27 Tomas Mraz 2019-10-10 13:54:53 UTC
In general you should not depend on error stack to be empty before calling an OpenSSL function. But on the other hand you should clear the errors on the stack if your  call generates some.

So you should call ERR_clear_error() before the FIPS_mode() call and call ERR_get_error() repeatedly to clean up all the errors from the stack after the error return from the FIPS_mode() call.

On the other hand there is no big point in reporting exception for unimplemented FIPS mode error from OpenSSL - it can be treated just as if the FIPS mode was not set. So you might just call ERR_clear_error() when FIPS_mode() returns 0.

Comment 29 Tomas Mraz 2019-10-10 13:57:59 UTC
To be absolutely on the safe side, your patch makes sense for the blocker resolution because it is the least invasive and lowest risk solution. But for upstream what I said in comment 27 applies.

Comment 30 Petr Viktorin 2019-10-10 13:58:53 UTC
I reacted emotionally. I am sorry for that, Christian!
Thank you for clarifying the "ASAP" comment; that had me confused. Now, even the emotional side of my brain believes we're all nice human beings! :)

I plan to leave the blocker meeting to you, since python-maint doesn't have much to say about prioritization. We can push the build anywhere it's approved. But, good luck with pushing it through!


Also, I'm sorry for pushing python3's FIPS patch so late. Changing expectations, and OpenSSL bugs, and my inexperience with OpenSSL kept us blocked too long. I couldn't have done it much sooner, but still, the lack of testing time is now causing issues :(

Comment 31 Petr Viktorin 2019-10-10 14:03:01 UTC
Tomáš, thank you!

Yup, I was mostly asking about upstream and general practices. The wiki implies that on RHEL8, FIPS_mode cannot fail with error.

Still, calling ERR_clear_error() to discard an error indication makes me very uncomfortable. I was trained that "errors should never pass silently"; that ignoring unknown issues is a recipe for later disaster.

Comment 32 Tomas Mraz 2019-10-10 14:08:24 UTC
Then, if you call ERR_clear_error() before the FIPS_mode() call, you could just ignore the CRYPTO_R_FIPS_MODE_NOT_SUPPORTED error and raise an exception on any other unexpected error.

This would be probably the most correct thing. But as I said I'd leave such patch for upstream work and use the current patch for the blocker resolution.

Comment 47 errata-xmlrpc 2019-11-05 22:03:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:3520


Note You need to log in before you can comment on or make changes to this bug.