Bug 2122841 - Startup of bind as part of FreeIPA server deployment fails on crypto error with bind-9.18.6
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: bind
Version: rawhide
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Petr Menšík
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: openqa
Depends On: 2117859
Blocks: el8_bind_doh_idm
Reported: 2022-08-31 00:37 UTC by Adam Williamson
Modified: 2022-09-18 00:17 UTC
CC List: 11 users

Fixed In Version: bind-9.18.6-3.fc38 bind-9.18.6-3.fc37
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-13 12:25:20 UTC
Type: Bug
Embargoed:



Links
- GitHub openssl/openssl issue 19102 (open): Engine, provider and EVP_PKEY_fromdata_init; last updated 2022-08-31 10:37:24 UTC
- Internet Systems Consortium (ISC) isc-projects/bind9 merge request 5385 (merged): Refactoring for OpenSSL 3.0.0 support; last updated 2022-09-02 12:03:39 UTC
- Internet Systems Consortium (ISC) isc-projects/bind9 merge request 6711 (opened): Detect disabled algorithm also on Fedora 38+; last updated 2022-08-31 11:42:45 UTC

Description Adam Williamson 2022-08-31 00:37:23 UTC
With today's update to bind-9.18.6 in Rawhide, FreeIPA server deployment fails. The system logs show bind failing in early startup on a crypto error:

    Aug 30 13:55:33 ipa002.test.openqa.fedoraproject.org named[8008]: starting BIND 9.18.6 (Stable Release) <id:>
    ....
    Aug 30 13:55:33 ipa002.test.openqa.fedoraproject.org named[8008]: adjusted limit on open files from 524288 to 1048576
    Aug 30 13:55:33 ipa002.test.openqa.fedoraproject.org named[8008]: found 2 CPUs, using 2 worker threads
    Aug 30 13:55:33 ipa002.test.openqa.fedoraproject.org named[8008]: using 2 UDP listeners per interface
    Aug 30 13:55:33 ipa002.test.openqa.fedoraproject.org named[8008]: EVP_PKEY_fromdata_init failed (crypto failure)
    Aug 30 13:55:33 ipa002.test.openqa.fedoraproject.org named[8008]: error:03000096:digital envelope routines::operation not supported for this keytype:crypto/evp/pmeth_gn.c:354:
    Aug 30 13:55:33 ipa002.test.openqa.fedoraproject.org named[8008]: initializing DST: crypto failure
    Aug 30 13:55:33 ipa002.test.openqa.fedoraproject.org named[8008]: exiting (due to fatal error)

This does not happen with the previous bind package, bind-9.18.5-2.fc38. I've asked releng to untag this update; otherwise all Rawhide update tests (and daily compose tests) will fail on this issue.

Comment 1 Alexander Bokovoy 2022-08-31 05:34:21 UTC
This looks like one of DSA/RSA/DH being disabled by OpenSSL 3.x and the default crypto policies, thus denying access to the specific method.

Comment 2 Alexander Bokovoy 2022-08-31 05:47:15 UTC
The failure happens in one of the dst__openssl*_init() calls below:

isc_result_t
dst_lib_init(isc_mem_t *mctx, const char *engine) {
        isc_result_t result;

        REQUIRE(mctx != NULL);
        REQUIRE(!dst_initialized);

        UNUSED(engine);

        memset(dst_t_func, 0, sizeof(dst_t_func));
        RETERR(dst__hmacmd5_init(&dst_t_func[DST_ALG_HMACMD5]));
        RETERR(dst__hmacsha1_init(&dst_t_func[DST_ALG_HMACSHA1]));
        RETERR(dst__hmacsha224_init(&dst_t_func[DST_ALG_HMACSHA224]));
        RETERR(dst__hmacsha256_init(&dst_t_func[DST_ALG_HMACSHA256]));
        RETERR(dst__hmacsha384_init(&dst_t_func[DST_ALG_HMACSHA384]));
        RETERR(dst__hmacsha512_init(&dst_t_func[DST_ALG_HMACSHA512]));
        RETERR(dst__openssl_init(engine));
        RETERR(dst__openssldh_init(&dst_t_func[DST_ALG_DH]));
        RETERR(dst__opensslrsa_init(&dst_t_func[DST_ALG_RSASHA1],
                                    DST_ALG_RSASHA1));
        RETERR(dst__opensslrsa_init(&dst_t_func[DST_ALG_NSEC3RSASHA1],
                                    DST_ALG_NSEC3RSASHA1));
        RETERR(dst__opensslrsa_init(&dst_t_func[DST_ALG_RSASHA256],
                                    DST_ALG_RSASHA256));
        RETERR(dst__opensslrsa_init(&dst_t_func[DST_ALG_RSASHA512],
                                    DST_ALG_RSASHA512));
        RETERR(dst__opensslecdsa_init(&dst_t_func[DST_ALG_ECDSA256]));
        RETERR(dst__opensslecdsa_init(&dst_t_func[DST_ALG_ECDSA384]));
#ifdef HAVE_OPENSSL_ED25519
        RETERR(dst__openssleddsa_init(&dst_t_func[DST_ALG_ED25519]));
#endif /* ifdef HAVE_OPENSSL_ED25519 */
#ifdef HAVE_OPENSSL_ED448
        RETERR(dst__openssleddsa_init(&dst_t_func[DST_ALG_ED448]));
#endif /* ifdef HAVE_OPENSSL_ED448 */

#if HAVE_GSSAPI
        RETERR(dst__gssapi_init(&dst_t_func[DST_ALG_GSSAPI]));
#endif /* HAVE_GSSAPI */

        dst_initialized = true;
        return (ISC_R_SUCCESS);

out:
        /* avoid immediate crash! */
        dst_initialized = true;
        dst_lib_destroy();
        return (result);
}
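To make the failure mode concrete: every RETERR() above aborts dst_lib_init() on the first result that is not ISC_R_SUCCESS, so a single algorithm's init failing is enough for named to exit with "initializing DST: crypto failure". The sketch below is a minimal, self-contained model of that control flow, not BIND code. The result codes and model_dst_lib_init() are hypothetical stand-ins; only the RETERR() shape is assumed to match BIND's macro.

```c
#include <stddef.h>

/*
 * Hypothetical stand-ins for BIND's result codes; the real values live in
 * BIND's isc/dst headers and differ from these.
 */
typedef enum {
	ISC_R_SUCCESS = 0,
	ISC_R_NOTIMPLEMENTED,
	DST_R_OPENSSLFAILURE
} isc_result_t;

/* Assumed to have the same shape as BIND's RETERR(): keep the result, jump to cleanup on failure. */
#define RETERR(x)                              \
	do {                                   \
		result = (x);                  \
		if (result != ISC_R_SUCCESS) { \
			goto out;              \
		}                              \
	} while (0)

/*
 * Models dst_lib_init(): walks the per-algorithm init results in call order
 * and stops at the first one that is not ISC_R_SUCCESS. *ran counts how many
 * inits were attempted before returning.
 */
static isc_result_t
model_dst_lib_init(const isc_result_t *init_results, size_t n, size_t *ran) {
	isc_result_t result = ISC_R_SUCCESS;

	*ran = 0;
	for (size_t i = 0; i < n; i++) {
		(*ran)++;
		RETERR(init_results[i]);
	}
	return ISC_R_SUCCESS;

out:
	/* The real function calls dst_lib_destroy() here; named then exits. */
	return result;
}
```

With a failure in the second slot, the third init is never attempted and the whole library initialization reports that failure.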

Comment 3 Alexander Bokovoy 2022-08-31 05:51:04 UTC
Dmitriy, could you please help us here? Please see the bug itself.

Comment 4 Alexander Bokovoy 2022-08-31 06:04:31 UTC
I think it might be this part of the bind code, which attempts to recover from RSASHA1 and NSEC3RSASHA1 being disabled:

commit f3a0dac0573d21887ee0fa262b2c3a75466a538b
Author: Mark Andrews <marka>
Date:   Tue Mar 22 16:16:57 2022 +1100

    Check that we can verify a signature at initialisation time
    
    Fedora 33 doesn't support RSASHA1 in future mode.  There is no easy
    check for this other than by attempting to perform a verification
    using known good signatures.  We don't attempt to sign with RSASHA1
    as that would not work in FIPS mode.  RSASHA1 is verify only.
    
    The test vectors were generated using OpenSSL 3.0 and
    util/gen-rsa-sha-vectors.c.  Rerunning will generate a new set of
    test vectors as the private key is not preserved.
    
    e.g.
            cc util/gen-rsa-sha-vectors.c -I /opt/local/include \
                    -L /opt/local/lib -lcrypto
    
    (cherry picked from commit cd3f00874f63a50954cebb78edac8f580a27c0de)

....
....

 isc_result_t
 dst__opensslrsa_init(dst_func_t **funcp, unsigned char algorithm) {
+       isc_result_t result;
+
        REQUIRE(funcp != NULL);
 
-       UNUSED(algorithm);
+       result = check_algorithm(algorithm);
 
-       if (*funcp == NULL) {
-               *funcp = &opensslrsa_functions;
+       if (result == ISC_R_SUCCESS) {
+               if (*funcp == NULL) {
+                       *funcp = &opensslrsa_functions;
+               }
+       } else if (result == ISC_R_NOTIMPLEMENTED) {
+               result = ISC_R_SUCCESS;
        }
-       return (ISC_R_SUCCESS);
+
+       return (result);
 }

If check_algorithm() returns anything other than ISC_R_NOTIMPLEMENTED or ISC_R_SUCCESS, the whole initialization fails. In this case, DST_R_OPENSSLFAILURE is returned:

(in check_algorithm())
...

        status = EVP_PKEY_fromdata_init(ctx);
        if (status != 1) {
                DST_RET(dst__openssl_toresult2("EVP_PKEY_fromdata_init",
                                               DST_R_OPENSSLFAILURE));
        }
...

Comment 5 Alexander Bokovoy 2022-08-31 06:09:14 UTC
The code expects EVP_PKEY_fromdata_init() to succeed, and the check to fail only when an actual EVP operation is performed:

[.. EVP initialization code above ..]

        /*
         * Check that we can verify the signature.
         */
        if (EVP_DigestInit_ex(evp_md_ctx, type, NULL) != 1 ||
            EVP_DigestUpdate(evp_md_ctx, "test", 4) != 1 ||
            EVP_VerifyFinal(evp_md_ctx, sig, len, pkey) != 1)
        {
                DST_RET(ISC_R_NOTIMPLEMENTED);
        }

So bind's check_algorithm() expects EVP_PKEY_fromdata_init() to still work for an algorithm that OpenSSL blocks later. But OpenSSL no longer allows even initializing PKEY data for it.

I guess a simple fix would be to treat DST_R_OPENSSLFAILURE like ISC_R_NOTIMPLEMENTED in all the dst__openssl*_init() functions (not only in dst__opensslrsa_init()).
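A minimal model of the post-commit dst__opensslrsa_init() logic quoted in comment 4, with the check_algorithm() result passed in as a parameter. The tolerate_openssl_failure flag adds the proposal from this comment. The types, model_rsa_init() and the flag are hypothetical stand-ins, not BIND API, and assert() stands in for BIND's REQUIRE().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for BIND's result codes and dst_func_t. */
typedef enum {
	ISC_R_SUCCESS = 0,
	ISC_R_NOTIMPLEMENTED,
	DST_R_OPENSSLFAILURE
} isc_result_t;

typedef struct {
	const char *name;
} dst_func_t;

static dst_func_t opensslrsa_functions = { "opensslrsa" };

/*
 * Mirrors the quoted result handling: success registers the function table,
 * ISC_R_NOTIMPLEMENTED leaves the slot empty but is not fatal. With
 * tolerate_openssl_failure set, DST_R_OPENSSLFAILURE is treated the same way.
 */
static isc_result_t
model_rsa_init(dst_func_t **funcp, isc_result_t check,
	       bool tolerate_openssl_failure) {
	isc_result_t result = check;

	assert(funcp != NULL);

	if (result == ISC_R_SUCCESS) {
		if (*funcp == NULL) {
			*funcp = &opensslrsa_functions;
		}
	} else if (result == ISC_R_NOTIMPLEMENTED ||
		   (tolerate_openssl_failure &&
		    result == DST_R_OPENSSLFAILURE)) {
		result = ISC_R_SUCCESS; /* algorithm unavailable, keep going */
	}
	return result;
}
```

With the flag set, startup survives the EVP_PKEY_fromdata_init failure, but the RSA slot stays NULL, so the algorithm becomes silently unavailable instead of being reported as broken.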

Comment 6 Dmitry Belyavskiy 2022-08-31 09:02:30 UTC
It looks like the error in the original log is not caused by the SHA-1 changes.

error:03000096:digital envelope routines::operation not supported for this keytype:crypto/evp/pmeth_gn.c:354:

is probably caused by the lack of the necessary key management (keymgmt), according to this code:

https://github.com/openssl/openssl/blob/56233ba8574c01b3912cf662335fedaabc7faec2/crypto/evp/pmeth_gn.c#L339-L356
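The error code itself is consistent with this reading. Assuming the OpenSSL 3.0 packing from include/openssl/err.h (library number in bits 23-30, reason in the low 23 bits, system errors flagged by bit 31 and ignored here), 0x03000096 decodes to library 6 (assumed to be ERR_LIB_EVP, printed as "digital envelope routines") and reason 150 (assumed to be EVP_R_OPERATION_NOT_SUPPORTED_FOR_THIS_KEYTYPE). The sketch re-derives this without linking libcrypto; real code should call ERR_GET_LIB() and ERR_GET_REASON() rather than these hypothetical model_* helpers.

```c
/*
 * Assumed OpenSSL 3.0 error-code layout (include/openssl/err.h), copied so
 * the example builds without libcrypto. System errors (bit 31 set) use a
 * different layout and are not handled here.
 */
#define MODEL_ERR_LIB_OFFSET 23
#define MODEL_ERR_LIB_MASK 0xFFUL
#define MODEL_ERR_REASON_MASK 0x7FFFFFUL

/* Assumed values of ERR_LIB_EVP and EVP_R_OPERATION_NOT_SUPPORTED_FOR_THIS_KEYTYPE. */
#define MODEL_ERR_LIB_EVP 6UL
#define MODEL_EVP_R_NOT_SUPPORTED_FOR_KEYTYPE 150UL

/* Library number: the part printed as "digital envelope routines". */
static unsigned long model_err_lib(unsigned long code) {
	return (code >> MODEL_ERR_LIB_OFFSET) & MODEL_ERR_LIB_MASK;
}

/* Reason number: the part printed as "operation not supported for this keytype". */
static unsigned long model_err_reason(unsigned long code) {
	return code & MODEL_ERR_REASON_MASK;
}
```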

Could you please provide more details about the algorithm causing failure?

Comment 7 Petr Menšík 2022-08-31 10:09:09 UTC
Oh, I expected this change to improve things, not break them. I tried it on RHEL 9 before merging and it seemed to work fine.

Is SHA-1 already disabled in Rawhide?

Comment 8 Dmitry Belyavskiy 2022-08-31 10:14:50 UTC
Yes, it is. But I suspect the problem is different.

I'd test whether commenting the pkcs11 engine out of the config (as a temporary workaround) resolves the situation. If I understand correctly, the key you create is a legacy one, so it doesn't have a keymgmt and EVP_PKEY_fromdata_init fails. If I'm wrong, we need to investigate further. If I'm correct, we will have to deal with the PKCS11 part somehow.
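This hypothesis can be pictured with a simplified model of the check in the pmeth_gn.c lines linked in comment 6: initialization for fromdata is refused when the context has no provider key management (keymgmt), which is what a key created through a legacy engine would lack. struct model_pkey_ctx and model_fromdata_init() are hypothetical; the real EVP_PKEY_CTX is opaque, and the real function also raises the EVP "operation not supported for this keytype" error.

```c
#include <stddef.h>

/* Hypothetical reduced context: only the two fields the check inspects. */
struct model_pkey_ctx {
	const char *keytype; /* e.g. "RSA" */
	const void *keymgmt; /* provider key management; NULL for a legacy engine key */
};

/*
 * Returns 1 when fromdata initialization may proceed and a negative value
 * otherwise. bind's check_algorithm() treats any value other than 1 as a
 * failure, reported as DST_R_OPENSSLFAILURE in this case.
 */
static int model_fromdata_init(const struct model_pkey_ctx *ctx) {
	if (ctx == NULL || ctx->keytype == NULL || ctx->keymgmt == NULL) {
		/* OpenSSL raises "operation not supported for this keytype" here. */
		return -2;
	}
	return 1;
}
```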

Comment 9 Alexander Bokovoy 2022-08-31 10:17:43 UTC
Well, we cannot disable the PKCS11 engine, because otherwise bind would not see the keys stored there by the IPA DNSSEC helpers.

If you want to try, comment them out in /usr/share/ipa/bind.openssl.cryptopolicy.cnf.template before running ipa-server-install.

Comment 10 Dmitry Belyavskiy 2022-08-31 10:37:25 UTC
https://github.com/openssl/openssl/issues/19102 is the upstream issue. I kindly ask Alexander and Petr to watch it.

Comment 11 Petr Menšík 2022-08-31 11:21:27 UTC
Thanks, Dmitry, for the pointers.

I think it just needs a mapping from the OpenSSL error to a BIND internal error code meaning "intentionally disabled in the crypto library".

Something similar to the existing entries in the to_result() function [1]. Unless this behaviour needs to change in OpenSSL, I think it can be fixed in the bind component alone. bind just needs to recognize that this was not a runtime error in OpenSSL, but OpenSSL policy signalling that the operation is not (and is not going to be) supported. That is the purpose of the check_algorithm() function, where it fails anyway.

1. https://github.com/isc-projects/bind9/blob/main/lib/dns/openssl_link.c#L135

Comment 12 Petr Menšík 2022-08-31 11:42:45 UTC
Just a draft that maps the current error code to ISC_R_DISABLED, which is in turn recognized by the check_algorithm() function.
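A minimal sketch of such a mapping, assuming the library and reason numbers behind error:03000096 are 6 (EVP) and 150 ("operation not supported for this keytype"). It is not the actual draft: the isc_result_t values and model_map_openssl_error() are hypothetical stand-ins, and lib/reason are what ERR_GET_LIB() and ERR_GET_REASON() would return for the queued error.

```c
/* Hypothetical stand-ins for BIND's result codes. */
typedef enum {
	ISC_R_SUCCESS = 0,
	ISC_R_DISABLED,
	DST_R_OPENSSLFAILURE
} isc_result_t;

/* Assumed OpenSSL 3.0 numbers behind "error:03000096". */
enum { MODEL_ERR_LIB_EVP = 6, MODEL_EVP_R_NOT_SUPPORTED_FOR_KEYTYPE = 150 };

/*
 * The "not supported for this keytype" case becomes ISC_R_DISABLED, which
 * check_algorithm() could treat as "policy says no"; every other error keeps
 * the caller's fallback (DST_R_OPENSSLFAILURE in check_algorithm()).
 */
static isc_result_t
model_map_openssl_error(unsigned long lib, unsigned long reason,
			isc_result_t fallback) {
	if (lib == MODEL_ERR_LIB_EVP &&
	    reason == MODEL_EVP_R_NOT_SUPPORTED_FOR_KEYTYPE) {
		return ISC_R_DISABLED;
	}
	return fallback;
}
```

The catch is that the same EVP error is raised whether a crypto policy disabled the algorithm or an engine-backed key simply has no keymgmt, so a mapping keyed only on this reason can also hide RSA algorithms that should keep working.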

Comment 13 Dmitry Belyavskiy 2022-08-31 11:46:31 UTC
I'm afraid it's the wrong decision: RSA keys are supposed to work, we just can't check them this way...

Comment 14 Petr Menšík 2022-08-31 12:19:00 UTC
I have made a scratch build with the change from comment #12. It seems to help in my case.

Could you explain in more detail why it is a bad decision? Does the way OpenSSL reports a disabled algorithm need to change?

https://koji.fedoraproject.org/koji/taskinfo?taskID=91464191

Comment 15 Dmitry Belyavskiy 2022-08-31 12:23:22 UTC
Upstream has introduced a change that works normally with providers but works badly when an engine and a provider are used simultaneously. And if I understand correctly, you refuse to use RSA keys based on this check. Am I wrong?

Comment 16 Petr Menšík 2022-08-31 14:17:55 UTC
No, of course not; that would not be acceptable. It tests each DNSSEC algorithm number separately, so it does not check RSA as a whole, but RSA algorithms 5, 7, 8 and 10. It should detect algorithms 5 and 7 as disabled, but algorithms 8 and 10 have to work in any case. If all of them came out disabled, then OpenSSL would indeed need a change.

I verified it and you are correct. This way it disables all RSA algorithms, so it cannot validate anything, because the root key is, of course, RSA.

(gdb) p dst_t_func
$1 = {0x0, 0x0, 0x7f1ed0e181c0 <openssldh_functions>, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x7f1ed0e18100 <opensslecdsa_functions>, 
  0x7f1ed0e18100 <opensslecdsa_functions>, 0x7f1ed0e18040 <openssleddsa_functions>, 0x7f1ed0e18040 <openssleddsa_functions>, 0x0 <repeats 140 times>, 
  0x7f1ed0e17880 <hmacmd5_functions>, 0x0, 0x0, 0x7f1ed0e1a540 <gssapi_functions>, 0x7f1ed0e177c0 <hmacsha1_functions>, 
  0x7f1ed0e17700 <hmacsha224_functions>, 0x7f1ed0e17640 <hmacsha256_functions>, 0x7f1ed0e17580 <hmacsha384_functions>, 
  0x7f1ed0e174c0 <hmacsha512_functions>, 0x0 <repeats 90 times>}

(gdb) p dst_t_func[8]
$2 = (dst_func_t *) 0x0
(gdb) p dst_t_func[10]
$3 = (dst_func_t *) 0x0
(gdb) p dst_t_func[7]
$4 = (dst_func_t *) 0x0
(gdb) p dst_t_func[5]
$5 = (dst_func_t *) 0x0
(gdb) p dst_t_func[13]
$6 = (dst_func_t *) 0x7f1ed0e18100 <opensslecdsa_functions>
(gdb) p dst_t_func[12]
$7 = (dst_func_t *) 0x0
(gdb) p dst_t_func[11]
$8 = (dst_func_t *) 0x0
(gdb) p dst_t_func[15]
$9 = (dst_func_t *) 0x7f1ed0e18040 <openssleddsa_functions>
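The expectation stated above can be written as a check over the dst_t_func slots: entries 5 (RSASHA1) and 7 (NSEC3RSASHA1) may be empty under the SHA-1 policy, but 8 (RSASHA256) and 10 (RSASHA512) must be populated. In this sketch a bool array stands in for the non-NULL pointers gdb printed; the enum uses the DNSSEC algorithm numbers that the slot indexes above correspond to, and rsa_regression() is a hypothetical helper, not BIND code.

```c
#include <stdbool.h>

/* DNSSEC algorithm numbers, used here as dst_t_func slot indexes. */
enum {
	ALG_DH = 2,
	ALG_RSASHA1 = 5,
	ALG_NSEC3RSASHA1 = 7,
	ALG_RSASHA256 = 8,
	ALG_RSASHA512 = 10,
	ALG_ECDSAP256SHA256 = 13,
	ALG_ECDSAP384SHA384 = 14,
	ALG_ED25519 = 15,
	ALG_ED448 = 16,
	N_DNSSEC_SLOTS = 17
};

/*
 * True when a SHA-2 RSA algorithm is missing. Slots 5 and 7 (SHA-1 based)
 * are allowed to be empty, so they are deliberately not checked.
 */
static bool rsa_regression(const bool registered[N_DNSSEC_SLOTS]) {
	return !registered[ALG_RSASHA256] || !registered[ALG_RSASHA512];
}
```

With the snapshot above (only DH, ECDSA and EdDSA registered among the DNSSEC slots), the check fires, matching the observation that nothing RSA-signed can be validated.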

Comment 17 Adam Williamson 2022-08-31 15:29:47 UTC
Note that this failure really seems to happen only on Rawhide; the bind update for F37 passed tests:
https://openqa.fedoraproject.org/tests/overview?distri=fedora&version=37&groupid=2&build=Update-FEDORA-2022-710b831bc0
so some difference between Rawhide and F37 must be involved here. I'm not sure what, though. Could this actually be related to https://bugzilla.redhat.com/show_bug.cgi?id=2117859 - the bug in openssl-pkcs11-0.4.12-2 (which we have in Rawhide, but not F37)?

Comment 18 Petr Menšík 2022-08-31 16:30:35 UTC
(In reply to Adam Williamson from comment #17)
> Note, it seems this failure really happens only on Rawhide - the bind update
> for F37 passed tests:
> https://openqa.fedoraproject.org/tests/overview?distri=fedora&version=37&groupid=2&build=Update-FEDORA-2022-710b831bc0
> so some difference between Rawhide and F37 must be involved here. I'm not
> sure what, though. Could this actually be related to
> https://bugzilla.redhat.com/show_bug.cgi?id=2117859 - the bug in
> openssl-pkcs11-0.4.12-2 (which we have in Rawhide, but not F37)?

I thought it would be a difference in openssl, but found there is no openssl build for f38 yet. So they share the same binary, which means there is no openssl difference between f37 and Rawhide. So yes, this remains a very good candidate for the difference.

My f37 instance has openssl-pkcs11-0.4.12-1.fc37.x86_64
rawhide instance has openssl-pkcs11-0.4.12-2.fc37.x86_64

Comment 19 Adam Williamson 2022-08-31 16:46:18 UTC
yeah, that's basically intentional: we're holding the update out of f37 because of https://bugzilla.redhat.com/show_bug.cgi?id=2117859 .

What I can do later today is run a special test of the bind update for F37 with openssl-pkcs11-0.4.12-2.fc37 included, and see if that makes it fail. If it does, then that would definitely mean openssl-pkcs11-0.4.12-2.fc37 triggers this problem as well as 2117859.

Comment 20 Dmitry Belyavskiy 2022-08-31 17:05:06 UTC
I'm sorry, but let me repeat: this issue may be caused by a specific change in openssl-pkcs11, but mixing providers and engines in OpenSSL 3.0 is bad practice, especially when they implement the same algorithm.

Comment 21 Petr Menšík 2022-08-31 17:48:27 UTC
I have cloned bug #2123076 from RHEL9 to track OpenSSL provider support. But I doubt we would have it ready for Fedora 38, let alone for anything before it.

But I don't think bind's code attempts to mix providers and engines. It uses an engine, sure. When it does, all private keys are stored in the engine and none should be stored any other way. But we need to keep FreeIPA working, and a PKCS11 provider is not available (yet) to satisfy those needs. If we can improve engine usage, we will. But general advice like "use a provider" won't help much.

Comment 22 Petr Menšík 2022-08-31 17:50:58 UTC
I have tested downgrading to:

openssl-pkcs11-0.4.12-1.fc37.x86_64
bind-9.18.6-1.fc38.x86_64

The named service at least starts fine with it. At first glance, validation works, even on SHA-1 based signatures. Marking this bug as depending on bug #2117859.

Comment 23 Petr Menšík 2022-09-01 15:39:56 UTC
@jjelen just noticed that support for the PKCS11 engine has been removed entirely. The release notes link is a bit misleading, but the removal was mentioned [1] and I overlooked it.
It was explicitly removed in commit 60535fc5 [2].

[1] https://downloads.isc.org/isc/bind9/9.18.6/doc/arm/html/notes.html#removed-feature
[2] https://gitlab.isc.org/isc-projects/bind9/commit/60535fc5f7ccee58c641a96fe52d9b15c192698b

Comment 24 Jakub Jelen 2022-09-02 10:25:29 UTC
That's the same link I mentioned to you in bug #2117342 and on IRC yesterday, so I am not sure what the next steps here would be or what the question for me is right now.

Shall we revert that commit in bind to keep bind working with pkcs11 engines? Shall we mark it unsupported? I think it is quite late for this drastic change.

Comment 25 Alexander Bokovoy 2022-09-02 10:39:21 UTC
I think we should revert it. Given that F36 with the same openssl version works fine with the previous bind version, the engine works for us.
When the openssl-pkcs11 provider is ready, we can migrate to it.

Comment 26 Petr Menšík 2022-09-02 12:03:39 UTC
Adding a link to MR !5385, which contains the responsible commit.

Comment 27 Petr Menšík 2022-09-07 19:50:02 UTC
I have started experimenting on the engine_pkcs11-revert branch [1] against the development release. It seems that just reverting the engine removal is not enough: OpenSSL cannot cope with EVP_PKEY_fromdata initialization after an engine has been set and used. I found multiple issues, but again ended up with a NULL ctx->keymgmt, which makes the check_algorithm() check fail.

We found it should work fine with the RSA_* calls as used in the v9_16 branch, but no simple change made the OpenSSL >= 3.0 code path work with engines.

1. https://gitlab.isc.org/pemensik/bind9/-/commits/feature/main/engine_pkcs11-revert

Comment 28 Petr Menšík 2022-09-09 13:10:53 UTC
I have made a test build [1], and it seems to work better. I have also built it as a Copr build in the pemensik/bind repository [2]. After updating to that version, I ran these commands:
- rndc managed-keys destroy
- systemctl restart named

After that, resolution started working. It should also allow the upstream system tests keyfromlabel and engine_pkcs11 to pass when the proper configuration environment is provided.

1. https://koji.fedoraproject.org/koji/taskinfo?taskID=91804993
2. https://copr.fedorainfracloud.org/coprs/pemensik/bind/

Comment 29 Adam Williamson 2022-09-10 20:54:36 UTC
Testing of that in openQA looks promising: with that build plus bind-dyndb-ldap-11.10-4.fc38 and the rest of Rawhide as it currently is, openQA tests passed even with dnssec enabled...

Comment 30 Petr Menšík 2022-09-13 10:23:10 UTC
I guess I will make builds of this for Rawhide and Fedora 37. Even if upstream makes further changes, this basic approach seems to be the only working solution available, apart from implementing a real, working PKCS11 provider for OpenSSL and switching to it.

My change makes it use the same API as OpenSSL 1.1 builds, while still linking to OpenSSL 3.0. It is a kind of hack, but I do not think we have any other ready-to-use solution.

Comment 31 Fedora Update System 2022-09-13 12:12:07 UTC
FEDORA-2022-0fea8abd6e has been submitted as an update to Fedora 38. https://bodhi.fedoraproject.org/updates/FEDORA-2022-0fea8abd6e

Comment 32 Fedora Update System 2022-09-13 12:25:20 UTC
FEDORA-2022-0fea8abd6e has been pushed to the Fedora 38 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 33 Fedora Update System 2022-09-13 13:49:13 UTC
FEDORA-2022-cbcb55d5c7 has been submitted as an update to Fedora 37. https://bodhi.fedoraproject.org/updates/FEDORA-2022-cbcb55d5c7

Comment 34 Fedora Update System 2022-09-14 01:52:28 UTC
FEDORA-2022-cbcb55d5c7 has been pushed to the Fedora 37 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2022-cbcb55d5c7`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2022-cbcb55d5c7

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 35 Fedora Update System 2022-09-18 00:17:31 UTC
FEDORA-2022-cbcb55d5c7 has been pushed to the Fedora 37 stable repository.
If problem still persists, please make note of it in this bug report.

