Bug 2277997 - Using pkcs11-provider leads to crash (regression)
Summary: Using pkcs11-provider leads to crash (regression)
Keywords:
Status: ASSIGNED
Alias: None
Product: Fedora
Classification: Fedora
Component: bind9-next
Version: 42
Hardware: All
OS: Linux
unspecified
medium
Target Milestone: ---
Assignee: Petr Menšík
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2024-04-30 14:37 UTC by Ondrej Moriš
Modified: 2025-02-26 13:01 UTC (History)
5 users (show)

Fixed In Version:
Clone Of:
Environment:
Last Closed:
Type: ---
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Fedora Package Sources bind9-next pull-request 7 0 None None None 2024-06-24 15:07:24 UTC
Internet Systems Consortium (ISC) isc-projects bind9 issues 4404 0 None closed Unexpected partial use of jemalloc by BIND (tested on 9.18.19-S1) 2024-06-12 14:28:14 UTC
Internet Systems Consortium (ISC) isc-projects bind9 issues 4764 0 None opened Regression of pkcs11-provider usage in dnsssec-keyfromlabel 2024-06-12 14:28:14 UTC

Description Ondrej Moriš 2024-04-30 14:37:34 UTC
We have an integration test between pkcs11-provider and bind. It worked fine with bind 9.19.21 but started failing once 9.12.22 was released to Fedora.

In the test we are using dnssec-keyfromlabel to gather the keys from the software token. This command start crashing as follows:

# dnssec-keyfromlabel -a RSASHA256 -l 'pkcs11:model=SoftHSM%20v2;manufacturer=SoftHSM%20project;serial=a54dd083bab6e1fa;token=softhsm;object=localhost-zsk;pin-value=123456' localhost
free(): invalid pointer
2098 Aborted                 (core dumped)

# coredumpctl info
           PID: 2104 (dnssec-keyfroml)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 6 (ABRT)
     Timestamp: Tue 2024-04-30 10:22:53 EDT (3min 51s ago)
  Command Line: dnssec-keyfromlabel -a RSASHA256 -l $'pkcs11:model=SoftHSM%20v2;manufacturer=SoftHSM%20project;serial=a54dd083bab6e1fa;token=softhsm;object=lo>
    Executable: /usr/bin/dnssec-keyfromlabel
 Control Group: /user.slice/user-0.slice/session-4.scope
          Unit: session-4.scope
         Slice: user-0.slice
       Session: 4
     Owner UID: 0 (root)
       Boot ID: d01369d0db3b49ee9d0d8a8917f42f0a
    Machine ID: 65931729233e46f3b29865737b77c427
      Hostname: vm-10-0-186-123.hosted.upshift.rdu2.redhat.com
       Storage: /var/lib/systemd/coredump/core.dnssec-keyfroml.0.d01369d0db3b49ee9d0d8a8917f42f0a.2104.1714486973000000.zst (present)
  Size on Disk: 290.4K
       Package: bind9-next/9.19.22-1.fc41
      build-id: 1261b2883e741e59e129b4ff6eeb6668cd7490ff
       Message: Process 2104 (dnssec-keyfroml) of user 0 dumped core.

                Module libffi.so.8 from rpm libffi-3.4.6-1.fc41.x86_64
                Module p11-kit-client.so from rpm p11-kit-0.25.3-4.fc40.x86_64
                Module pkcs11.so from rpm pkcs11-provider-0.3-2.fc40.x86_64
                Module libpcre2-8.so.0 from rpm pcre2-10.43-1.fc41.x86_64
                Module libselinux.so.1 from rpm libselinux-3.6-4.fc40.x86_64
                Module libkeyutils.so.1 from rpm keyutils-1.6.3-3.fc40.x86_64
                Module libkrb5support.so.0 from rpm krb5-1.21.2-5.fc40.x86_64
                Module liblzma.so.5 from rpm xz-5.4.6-3.fc41.x86_64
                Module liblmdb.so.0.0.0 from rpm lmdb-0.9.32-1.fc40.x86_64
                Module libprotobuf-c.so.1 from rpm protobuf-c-1.5.0-3.fc40.x86_64
                Module libfstrm.so.0 from rpm fstrm-0.6.1-10.fc40.x86_64
                Module libmaxminddb.so.0 from rpm libmaxminddb-1.9.1-2.fc40.x86_64
                Module libcom_err.so.2 from rpm e2fsprogs-1.47.0-5.fc40.x86_64
                Module libk5crypto.so.3 from rpm krb5-1.21.2-5.fc40.x86_64
                Module libkrb5.so.3 from rpm krb5-1.21.2-5.fc40.x86_64
                Module libgssapi_krb5.so.2 from rpm krb5-1.21.2-5.fc40.x86_64
                Module liburcu-cds.so.8 from rpm userspace-rcu-0.14.0-4.fc40.x86_64
                Module liburcu-common.so.8 from rpm userspace-rcu-0.14.0-4.fc40.x86_64
                Module liburcu.so.8 from rpm userspace-rcu-0.14.0-4.fc40.x86_64
                Module libxml2.so.2 from rpm libxml2-2.12.6-1.fc41.x86_64
                Module libnghttp2.so.14 from rpm nghttp2-1.61.0-1.fc41.x86_64
                Module libjson-c.so.5 from rpm json-c-0.17-3.fc40.x86_64
                Module libz.so.1 from rpm zlib-ng-2.1.6-2.fc40.x86_64
                Module libcrypto.so.3 from rpm openssl-3.2.1-6.fc41.x86_64
                Module libssl.so.3 from rpm openssl-3.2.1-6.fc41.x86_64
                Module libuv.so.1 from rpm libuv-1.48.0-1.fc40.x86_64
                Module libjemalloc.so.2 from rpm jemalloc-5.3.0-6.fc40.x86_64
                Module libdns-9.19.22.so from rpm bind9-next-9.19.22-1.fc41.x86_64
                Module libisc-9.19.22.so from rpm bind9-next-9.19.22-1.fc41.x86_64
                Module dnssec-keyfromlabel from rpm bind9-next-9.19.22-1.fc41.x86_64
                Stack trace of thread 2104:
                #0  0x00007f1ab04aa184 __pthread_kill_implementation (libc.so.6 + 0x98184)
                #1  0x00007f1ab045269e raise (libc.so.6 + 0x4069e)
                #2  0x00007f1ab043a942 abort (libc.so.6 + 0x28942)
                #3  0x00007f1ab043b7a7 __libc_message_impl.cold (libc.so.6 + 0x297a7)
                #4  0x00007f1ab04b41b5 malloc_printerr (libc.so.6 + 0xa21b5)
                #5  0x00007f1ab04b654c _int_free (libc.so.6 + 0xa454c)
                #6  0x00007f1ab04b8e0e free (libc.so.6 + 0xa6e0e)
                #7  0x00007f1aaf29f5fa get_interface_inlock (p11-kit-client.so + 0x75fa)
                #8  0x00007f1aaf2aaf14 C_GetInterface (p11-kit-client.so + 0x12f14)
                #9  0x00007f1aaf3cbe79 p11prov_module_init (pkcs11.so + 0x13e79)
                #10 0x00007f1aaf3d6b71 p11prov_ctx_status (pkcs11.so + 0x1eb71)
                #11 0x00007f1aaf3dd590 OSSL_provider_init (pkcs11.so + 0x25590)
                #12 0x00007f1aaff81a62 provider_activate (libcrypto.so.3 + 0x181a62)
                #13 0x00007f1aaff81f8f ossl_provider_activate (libcrypto.so.3 + 0x181f8f)
                #14 0x00007f1aaff832cf provider_conf_activate (libcrypto.so.3 + 0x1832cf)
                #15 0x00007f1aaff83651 provider_conf_load (libcrypto.so.3 + 0x183651)
                #16 0x00007f1aaff8386b provider_conf_init.lto_priv.0 (libcrypto.so.3 + 0x18386b)
                #17 0x00007f1aafebbe65 CONF_modules_load (libcrypto.so.3 + 0xbbe65)
                #18 0x00007f1aafebc408 CONF_modules_load_file_ex (libcrypto.so.3 + 0xbc408)
                #19 0x00007f1aaff72398 ossl_init_config_ossl_ (libcrypto.so.3 + 0x172398)
                #20 0x00007f1ab04ad55b __pthread_once_slow.isra.0 (libc.so.6 + 0x9b55b)
                #21 0x00007f1ab04ad5c9 pthread_once.5 (libc.so.6 + 0x9b5c9)
                #22 0x00007f1aaff7edad CRYPTO_THREAD_run_once (libcrypto.so.3 + 0x17edad)
                #23 0x00007f1aaff72e6b OPENSSL_init_crypto (libcrypto.so.3 + 0x172e6b)

Reproducible: Always

Steps to Reproduce:
1. Clone pkcs11-provider github, it is only needed to get the test.

   # git clone https://github.com/latchset/pkcs11-provider.git

2. Change directory to pkcs11-provider/tests/integration

   # cd pkcs11-provider/tests/integration/

3. Execute bind integration test:

   # bash -x bind.sh

   Test does the following:

   * install all the dependencies
   * setup softhsm token
   * setup p11kit server as a proxy to the sofhsm token
   * get the latest pkcs11-provider, build it and install it
   * create testing openssl configuration
   * setup bind
   * attempt to get the keys from the token and use them to sign the zone

To create the test we followed https://bind9.readthedocs.io/en/latest/chapter5.html.
Actual Results:  
Unable to get the keys from the token (crash).

Expected Results:  
(this is how it works with bind9-next-9.19.21-1.fc41)

# # Test 1: Extract KSK and ZSK keys from PKCS11 URIs

# OPENSSL_CONF=openssl.cnf dnssec-keyfromlabel -a RSASHA256 -l 'pkcs11:model=SoftHSM%20v2;manufacturer=SoftHSM%20project;serial=782800c9320b4da1;token=softhsm;object=localhost-zsk;pin-value=123456' localhost
Klocalhost.+008+59760

# OPENSSL_CONF=openssl.cnf dnssec-keyfromlabel -a RSASHA256 -l 'pkcs11:model=SoftHSM%20v2;manufacturer=SoftHSM%20project;serial=782800c9320b4da1;token=softhsm;object=localhost-ksk;pin-value=123456' -f KSK localhost

# # Test 2: Sign zone

# OPENSSL_CONF=openssl.cnf dnssec-signzone -o localhost localhost
Verifying the zone using the following algorithms:
- RSASHA256
Zone fully signed:
Algorithm: RSASHA256: KSKs: 1 active, 0 stand-by, 0 revoked
                      ZSKs: 1 active, 0 stand-by, 0 revoked
localhost.signed

Comment 1 Petr Menšík 2024-06-07 16:34:55 UTC
It seems still present with 9.19.24

(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0)
    at pthread_kill.c:44
#1  0x00007f931a8aa1f3 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78
#2  0x00007f931a85269e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007f931a83a942 in __GI_abort () at abort.c:79
#4  0x00007f931a83b7a7 in __libc_message_impl (fmt=fmt@entry=0x7f931a9c22e8 "%s\n")
    at ../sysdeps/posix/libc_fatal.c:132
#5  0x00007f931a8b41b5 in malloc_printerr (str=str@entry=0x7f931a9c0062 "free(): invalid pointer") at malloc.c:5772
#6  0x00007f931a8b654c in _int_free (av=<optimized out>, p=p@entry=0x7f9319a0d810, have_lock=have_lock@entry=0)
    at malloc.c:4507
#7  0x00007f931a8b8e0e in __GI___libc_free (mem=0x7f9319a0d820) at malloc.c:3398
#8  0x00007f93194e75fa in get_interface_inlock (interface=interface@entry=0x7fffd9421998, 
    version=version@entry=0x7fffd94219a6, flags=flags@entry=0) at ../p11-kit/client.c:183
#9  0x00007f93194f2f14 in C_GetInterface (pInterfaceName=<optimized out>, pVersion=<optimized out>, 
    ppInterface=0x7fffd9421998, flags=0) at ../p11-kit/client.c:274
#10 0x00007f9319f37644 in p11prov_interface_init (mctx=0x7f9319a09240) at ../src/interface.c:178
#11 0x00007f9319f37b00 in p11prov_module_init (mctx=0x7f9319a09240) at ../src/interface.c:302
#12 0x00007f9319f442fa in p11prov_ctx_status (ctx=0x7f9319a66540) at ../src/provider.c:474
#13 0x00007f9319f4a58f in OSSL_provider_init (handle=0x7f9319a66380, in=0x7f931a69f300 <core_dispatch_>, 
    out=0x7fffd9421da8, provctx=0x7fffd9421db0) at ../src/provider.c:1613
#14 0x00007f931a381aa2 in provider_init (prov=0x7f9319a66380) at crypto/provider_core.c:964
#15 provider_activate (prov=prov@entry=0x7f9319a66380, lock=0, lock@entry=1, upcalls=upcalls@entry=1)
    at crypto/provider_core.c:1173
#16 0x00007f931a381fcf in ossl_provider_activate (prov=prov@entry=0x7f9319a66380, upcalls=upcalls@entry=1, 
    aschild=aschild@entry=0) at crypto/provider_core.c:1310
#17 0x00007f931a3832bf in provider_conf_activate (libctx=libctx@entry=0x7f931a6a8c60 <default_context_int.lto_priv>, 
    name=name@entry=0x7f9319a20038 "pkcs11", value=value@entry=0x7f9319a52ba0 "pkcs11_sect", 
    path=path@entry=0x7f9319a1d1b0 "/usr/lib64/ossl-modules/pkcs11.so", soft=soft@entry=0, 
    cnf=cnf@entry=0x7f9319a1d180) at crypto/provider_conf.c:245
#18 0x00007f931a3836a1 in provider_conf_load (libctx=0x7f931a6a8c60 <default_context_int.lto_priv>, 
    name=<optimized out>, name@entry=0x7f9319a20038 "pkcs11", value=value@entry=0x7f9319a52ba0 "pkcs11_sect", 
    cnf=cnf@entry=0x7f9319a1d180) at crypto/provider_conf.c:325
#19 0x00007f931a383d6b in provider_conf_init (md=<optimized out>, cnf=0x7f9319a1d180) at crypto/provider_conf.c:381
#20 0x00007f931a2bbe45 in module_init (pmod=0x7f9319a1d4e0, name=0x7f9319a08110 "providers", 
    value=0x7f9319a52940 "provider_sect", cnf=<optimized out>) at crypto/conf/conf_mod.c:425
#21 module_run (cnf=<optimized out>, name=0x7f9319a08110 "providers", value=<optimized out>, flags=<optimized out>)
    at crypto/conf/conf_mod.c:272
#22 CONF_modules_load (cnf=cnf@entry=0x7f9319a1d180, appname=appname@entry=0x0, flags=32, flags@entry=50)
    at crypto/conf/conf_mod.c:165
#23 0x00007f931a2bc3e8 in CONF_modules_load_file_ex (libctx=<optimized out>, filename=0x0, appname=0x0, flags=50)
    at crypto/conf/conf_mod.c:214
#24 0x00007f931a3723d8 in ossl_config_int (settings=0x0) at crypto/conf/conf_sap.c:68
#25 ossl_config_int (settings=0x0) at crypto/conf/conf_sap.c:44
#26 ossl_init_config () at crypto/init.c:256
#27 ossl_init_config_ossl_ () at crypto/init.c:254
#28 0x00007f931a8ad55b in __pthread_once_slow (once_control=0x7f931a6a950c <config>, 
    init_routine=0x7f931a3723b0 <ossl_init_config_ossl_>) at pthread_once.c:116
#29 0x00007f931a8ad5c9 in ___pthread_once (once_control=<optimized out>, init_routine=<optimized out>)
    at pthread_once.c:143
#30 0x00007f931a37eded in CRYPTO_THREAD_run_once (once=<optimized out>, init=<optimized out>)
    at crypto/threads_pthread.c:152
#31 0x00007f931a372eab in OPENSSL_init_crypto (opts=opts@entry=554572, settings=settings@entry=0x0)
    at crypto/init.c:599
#32 0x00007f931b0e790d in OPENSSL_init_ssl (opts=554572, opts@entry=554560, settings=settings@entry=0x0)
    at ssl/ssl_init.c:115
#33 0x00007f931b275294 in isc__tls_initialize () at ../../../lib/isc/tls.c:177
#34 0x00007f931b22f79c in isc__initialize () at ../../../lib/isc/lib.c:50
#35 0x00007f931b2b6427 in call_init (l=<optimized out>, argc=4, argv=0x7fffd9422238, env=0x7fffd9422260)
    at dl-init.c:74
#36 call_init (l=<optimized out>, argc=4, argv=0x7fffd9422238, env=0x7fffd9422260) at dl-init.c:26
#37 0x00007f931b2b651d in _dl_init (main_map=0x7f931b2e82e0, argc=4, argv=0x7fffd9422238, env=0x7fffd9422260)
    at dl-init.c:121
#38 0x00007f931b2ce420 in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#39 0x0000000000000004 in ?? ()
#40 0x00007fffd942338c in ?? ()
#41 0x00007fffd942339c in ?? ()
#42 0x00007fffd942339f in ?? ()
#43 0x00007fffd94233a9 in ?? ()
#44 0x0000000000000000 in ?? ()

Comment 2 Simo Sorce 2024-06-07 16:55:56 UTC
Looks like p11-kit is crashing here.
Can you try with a pkcs11-provider coniguration that explicitly load softhsm directly and see if that mitigates the issue?

You can do that by setting this env variable before running bind:
PKCS11_PROVIDER_MODULE="/usr/lib64/pkcs11/libsofthsm2.so"

Comment 3 Petr Menšík 2024-06-07 16:59:54 UTC
This is super strange. It does not happen anywhere in bind code, but even long before it hits main function. That is why ?? is on bottom of stack.

Local exec file:
	`/usr/bin/dnssec-keyfromlabel', file type elf64-x86-64.

(gdb) show args
Argument list to give program being debugged when it is started is "-h".
(gdb) run
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0)
    at pthread_kill.c:44
#1  0x00007ffff76aa1f3 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78
#2  0x00007ffff765269e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff763a942 in __GI_abort () at abort.c:79
#4  0x00007ffff763b7a7 in __libc_message_impl (fmt=fmt@entry=0x7ffff77c22e8 "%s\n")
    at ../sysdeps/posix/libc_fatal.c:132
#5  0x00007ffff76b41b5 in malloc_printerr (str=str@entry=0x7ffff77c0062 "free(): invalid pointer") at malloc.c:5772
#6  0x00007ffff76b654c in _int_free (av=<optimized out>, p=p@entry=0x7ffff680d810, have_lock=have_lock@entry=0)
    at malloc.c:4507
#7  0x00007ffff76b8e0e in __GI___libc_free (mem=0x7ffff680d820) at malloc.c:3398
#8  0x00007ffff62e75fa in get_interface_inlock (interface=interface@entry=0x7fffffffd7f8, 
    version=version@entry=0x7fffffffd806, flags=flags@entry=0) at ../p11-kit/client.c:183
#9  0x00007ffff62f2f14 in C_GetInterface (pInterfaceName=<optimized out>, pVersion=<optimized out>, 
    ppInterface=0x7fffffffd7f8, flags=0) at ../p11-kit/client.c:274
#10 0x00007ffff6c48644 in p11prov_interface_init (mctx=0x7ffff6809240) at ../src/interface.c:178
#11 0x00007ffff6c48b00 in p11prov_module_init (mctx=0x7ffff6809240) at ../src/interface.c:302
#12 0x00007ffff6c552fa in p11prov_ctx_status (ctx=0x7ffff6866540) at ../src/provider.c:474
#13 0x00007ffff6c5b58f in OSSL_provider_init (handle=0x7ffff6866380, in=0x7ffff749f300 <core_dispatch_>, 
    out=0x7fffffffdc08, provctx=0x7fffffffdc10) at ../src/provider.c:1613
#14 0x00007ffff7181aa2 in provider_init (prov=0x7ffff6866380) at crypto/provider_core.c:964
#15 provider_activate (prov=prov@entry=0x7ffff6866380, lock=0, lock@entry=1, upcalls=upcalls@entry=1)
    at crypto/provider_core.c:1173
#16 0x00007ffff7181fcf in ossl_provider_activate (prov=prov@entry=0x7ffff6866380, upcalls=upcalls@entry=1, 
    aschild=aschild@entry=0) at crypto/provider_core.c:1310
#17 0x00007ffff71832bf in provider_conf_activate (libctx=libctx@entry=0x7ffff74a8c60 <default_context_int.lto_priv>, 
    name=name@entry=0x7ffff6820038 "pkcs11", value=value@entry=0x7ffff6852ba0 "pkcs11_sect", 
    path=path@entry=0x7ffff681d1b0 "/usr/lib64/ossl-modules/pkcs11.so", soft=soft@entry=0, 
    cnf=cnf@entry=0x7ffff681d180) at crypto/provider_conf.c:245
#18 0x00007ffff71836a1 in provider_conf_load (libctx=0x7ffff74a8c60 <default_context_int.lto_priv>, 
    name=<optimized out>, name@entry=0x7ffff6820038 "pkcs11", value=value@entry=0x7ffff6852ba0 "pkcs11_sect", 
    cnf=cnf@entry=0x7ffff681d180) at crypto/provider_conf.c:325
#19 0x00007ffff7183d6b in provider_conf_init (md=<optimized out>, cnf=0x7ffff681d180) at crypto/provider_conf.c:381
#20 0x00007ffff70bbe45 in module_init (pmod=0x7ffff681d4e0, name=0x7ffff6808110 "providers", 
    value=0x7ffff6852940 "provider_sect", cnf=<optimized out>) at crypto/conf/conf_mod.c:425
#21 module_run (cnf=<optimized out>, name=0x7ffff6808110 "providers", value=<optimized out>, flags=<optimized out>)
    at crypto/conf/conf_mod.c:272
#22 CONF_modules_load (cnf=cnf@entry=0x7ffff681d180, appname=appname@entry=0x0, flags=32, flags@entry=50)
    at crypto/conf/conf_mod.c:165
#23 0x00007ffff70bc3e8 in CONF_modules_load_file_ex (libctx=<optimized out>, filename=0x0, appname=0x0, flags=50)
    at crypto/conf/conf_mod.c:214
#24 0x00007ffff71723d8 in ossl_config_int (settings=0x0) at crypto/conf/conf_sap.c:68
#25 ossl_config_int (settings=0x0) at crypto/conf/conf_sap.c:44
#26 ossl_init_config () at crypto/init.c:256
#27 ossl_init_config_ossl_ () at crypto/init.c:254
#28 0x00007ffff76ad55b in __pthread_once_slow (once_control=0x7ffff74a950c <config>, 
    init_routine=0x7ffff71723b0 <ossl_init_config_ossl_>) at pthread_once.c:116
#29 0x00007ffff76ad5c9 in ___pthread_once (once_control=<optimized out>, init_routine=<optimized out>)
    at pthread_once.c:143
#30 0x00007ffff717eded in CRYPTO_THREAD_run_once (once=<optimized out>, init=<optimized out>)
    at crypto/threads_pthread.c:152
#31 0x00007ffff7172eab in OPENSSL_init_crypto (opts=opts@entry=554572, settings=settings@entry=0x0)
    at crypto/init.c:599
#32 0x00007ffff7b5190d in OPENSSL_init_ssl (opts=554572, opts@entry=554560, settings=settings@entry=0x0)
    at ssl/ssl_init.c:115
#33 0x00007ffff7f8b294 in isc__tls_initialize () at ../../../lib/isc/tls.c:177
#34 0x00007ffff7f4579c in isc__initialize () at ../../../lib/isc/lib.c:50
#35 0x00007ffff7fcc427 in call_init (l=<optimized out>, argc=2, argv=0x7fffffffe098, env=0x7fffffffe0b0)
    at dl-init.c:74
#36 call_init (l=<optimized out>, argc=2, argv=0x7fffffffe098, env=0x7fffffffe0b0) at dl-init.c:26
#37 0x00007ffff7fcc51d in _dl_init (main_map=0x7ffff7ffe2e0, argc=2, argv=0x7fffffffe098, env=0x7fffffffe0b0)
    at dl-init.c:121
#38 0x00007ffff7fe4420 in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#39 0x0000000000000002 in ?? ()
#40 0x00007fffffffe383 in ?? ()
#41 0x00007fffffffe3a0 in ?? ()
#42 0x0000000000000000 in ?? ()

Comment 4 Petr Menšík 2024-06-07 17:03:04 UTC
It crashes even with softhsm directly.
PKCS11_PROVIDER_MODULE=/usr/lib64/pkcs11/libsofthsm2.so gdb --args dnssec-keyfromlabel -h 

(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0)
    at pthread_kill.c:44
#1  0x00007ffff76aa1f3 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78
#2  0x00007ffff765269e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff763a942 in __GI_abort () at abort.c:79
#4  0x00007ffff763b7a7 in __libc_message_impl (fmt=fmt@entry=0x7ffff77c22e8 "%s\n")
    at ../sysdeps/posix/libc_fatal.c:132
#5  0x00007ffff76b41b5 in malloc_printerr (str=str@entry=0x7ffff77c0062 "free(): invalid pointer") at malloc.c:5772
#6  0x00007ffff76b654c in _int_free (av=<optimized out>, p=p@entry=0x7ffff686f710, have_lock=have_lock@entry=0)
    at malloc.c:4507
#7  0x00007ffff76b8e0e in __GI___libc_free (mem=0x7ffff686f720) at malloc.c:3398
#8  0x00007ffff638ca22 in SimpleConfigLoader::loadConfiguration (this=0x555555572610) at SimpleConfigLoader.cpp:88
#9  0x00007ffff638b5a3 in Configuration::reload (this=this@entry=0x555555572560) at Configuration.cpp:161
#10 0x00007ffff638b600 in Configuration::reload (this=this@entry=0x555555572560, inConfigLoader=<optimized out>)
    at Configuration.cpp:176
#11 0x00007ffff637452c in SoftHSM::C_Initialize (this=0x555555572330, pInitArgs=pInitArgs@entry=0x7fffffffd850)
    at SoftHSM.cpp:530
#12 0x00007ffff633e9ac in C_Initialize (pInitArgs=0x7fffffffd850) at main.cpp:137
#13 0x00007ffff6c41fe3 in p11prov_Initialize (ctx=0x7ffff6866540, pInitArgs=0x7fffffffd850)
    at ../src/interface.gen.c:18
#14 0x00007ffff6c48b39 in p11prov_module_init (mctx=0x7ffff6809240) at ../src/interface.c:309
#15 0x00007ffff6c552fa in p11prov_ctx_status (ctx=0x7ffff6866540) at ../src/provider.c:474
#16 0x00007ffff6c5b58f in OSSL_provider_init (handle=0x7ffff6866380, in=0x7ffff749f300 <core_dispatch_>, 
    out=0x7fffffffdc08, provctx=0x7fffffffdc10) at ../src/provider.c:1613
#17 0x00007ffff7181aa2 in provider_init (prov=0x7ffff6866380) at crypto/provider_core.c:964
#18 provider_activate (prov=prov@entry=0x7ffff6866380, lock=0, lock@entry=1, upcalls=upcalls@entry=1)
    at crypto/provider_core.c:1173
#19 0x00007ffff7181fcf in ossl_provider_activate (prov=prov@entry=0x7ffff6866380, upcalls=upcalls@entry=1, 
    aschild=aschild@entry=0) at crypto/provider_core.c:1310
#20 0x00007ffff71832bf in provider_conf_activate (libctx=libctx@entry=0x7ffff74a8c60 <default_context_int.lto_priv>, 
    name=name@entry=0x7ffff6820038 "pkcs11", value=value@entry=0x7ffff6852ba0 "pkcs11_sect", 
    path=path@entry=0x7ffff681d1b0 "/usr/lib64/ossl-modules/pkcs11.so", soft=soft@entry=0, 
    cnf=cnf@entry=0x7ffff681d180) at crypto/provider_conf.c:245
#21 0x00007ffff71836a1 in provider_conf_load (libctx=0x7ffff74a8c60 <default_context_int.lto_priv>, 
    name=<optimized out>, name@entry=0x7ffff6820038 "pkcs11", value=value@entry=0x7ffff6852ba0 "pkcs11_sect", 
    cnf=cnf@entry=0x7ffff681d180) at crypto/provider_conf.c:325
#22 0x00007ffff7183d6b in provider_conf_init (md=<optimized out>, cnf=0x7ffff681d180) at crypto/provider_conf.c:381
#23 0x00007ffff70bbe45 in module_init (pmod=0x7ffff681d4e0, name=0x7ffff6808110 "providers", 
    value=0x7ffff6852940 "provider_sect", cnf=<optimized out>) at crypto/conf/conf_mod.c:425
#24 module_run (cnf=<optimized out>, name=0x7ffff6808110 "providers", value=<optimized out>, flags=<optimized out>)
    at crypto/conf/conf_mod.c:272
#25 CONF_modules_load (cnf=cnf@entry=0x7ffff681d180, appname=appname@entry=0x0, flags=32, flags@entry=50)
    at crypto/conf/conf_mod.c:165
#26 0x00007ffff70bc3e8 in CONF_modules_load_file_ex (libctx=<optimized out>, filename=0x0, appname=0x0, flags=50)
    at crypto/conf/conf_mod.c:214
#27 0x00007ffff71723d8 in ossl_config_int (settings=0x0) at crypto/conf/conf_sap.c:68
#28 ossl_config_int (settings=0x0) at crypto/conf/conf_sap.c:44
#29 ossl_init_config () at crypto/init.c:256
#30 ossl_init_config_ossl_ () at crypto/init.c:254
#31 0x00007ffff76ad55b in __pthread_once_slow (once_control=0x7ffff74a950c <config>, 
    init_routine=0x7ffff71723b0 <ossl_init_config_ossl_>) at pthread_once.c:116
#32 0x00007ffff76ad5c9 in ___pthread_once (once_control=<optimized out>, init_routine=<optimized out>)
    at pthread_once.c:143
#33 0x00007ffff717eded in CRYPTO_THREAD_run_once (once=<optimized out>, init=<optimized out>)
    at crypto/threads_pthread.c:152
#34 0x00007ffff7172eab in OPENSSL_init_crypto (opts=opts@entry=554572, settings=settings@entry=0x0)
    at crypto/init.c:599
#35 0x00007ffff7b5190d in OPENSSL_init_ssl (opts=554572, opts@entry=554560, settings=settings@entry=0x0)
    at ssl/ssl_init.c:115
#36 0x00007ffff7f8b294 in isc__tls_initialize () at ../../../lib/isc/tls.c:177
#37 0x00007ffff7f4579c in isc__initialize () at ../../../lib/isc/lib.c:50
#38 0x00007ffff7fcc427 in call_init (l=<optimized out>, argc=2, argv=0x7fffffffe098, env=0x7fffffffe0b0)
    at dl-init.c:74
#39 call_init (l=<optimized out>, argc=2, argv=0x7fffffffe098, env=0x7fffffffe0b0) at dl-init.c:26
#40 0x00007ffff7fcc51d in _dl_init (main_map=0x7ffff7ffe2e0, argc=2, argv=0x7fffffffe098, env=0x7fffffffe0b0)
    at dl-init.c:121
#41 0x00007ffff7fe4420 in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#42 0x0000000000000002 in ?? ()
#43 0x00007fffffffe386 in ?? ()
#44 0x00007fffffffe3a3 in ?? ()
#45 0x0000000000000000 in ?? ()

Comment 5 Petr Menšík 2024-06-07 17:13:34 UTC
Could it be caused by jemalloc linked in?

ISC code is somehow referenced, but not from normal main function.

#36 0x00007ffff7f8b294 in isc__tls_initialize () at ../../../lib/isc/tls.c:177
#37 0x00007ffff7f4579c in isc__initialize () at ../../../lib/isc/lib.c:50

PKCS11_PROVIDER_MODULE=/dev/null

Not sure what hacks are responsible for initialize called before main is even hit. I can just assume this might be linked to -flto flag.

Breakpoint 1, isc__initialize () at ../../../lib/isc/lib.c:46
46	isc__initialize(void) {
(gdb) bt
#0  isc__initialize () at ../../../lib/isc/lib.c:46
#1  0x00007ffff7fcc427 in call_init (l=<optimized out>, argc=2, argv=0x7fffffffe0a8, env=0x7fffffffe0c0)
    at dl-init.c:74
#2  call_init (l=<optimized out>, argc=2, argv=0x7fffffffe0a8, env=0x7fffffffe0c0) at dl-init.c:26
#3  0x00007ffff7fcc51d in _dl_init (main_map=0x7ffff7ffe2e0, argc=2, argv=0x7fffffffe0a8, env=0x7fffffffe0c0)
    at dl-init.c:121
#4  0x00007ffff7fe4420 in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#5  0x0000000000000002 in ?? ()
#6  0x00007fffffffe39d in ?? ()
#7  0x00007fffffffe3ba in ?? ()
#8  0x0000000000000000 in ?? ()
(gdb) cont
Continuing.

Breakpoint 2, main (argc=2, argv=0x7fffffffe0a8) at ../../../bin/dnssec/dnssec-keyfromlabel.c:109
109	main(int argc, char **argv) {
(gdb) bt
#0  main (argc=2, argv=0x7fffffffe0a8) at ../../../bin/dnssec/dnssec-keyfromlabel.c:109
(gdb) cont

Comment 6 Simo Sorce 2024-06-07 18:13:43 UTC
Is any of the ISC code using ELF constructors/destructors like __attribute__((constructor)) or .init/.fini ?
Those would run before main() is invoked (by design).
However it is not immediately clear to me why that would cause a fault unless, once again, softhsm being based on .cpp has issues because its code depends on cpp constructors to be run before it is initialized and such constructors have not run yet because they have been pre-empted by an ISC library constructor.

If ISC code uses constructors I would definitely see if that feature can be disabled, or if we can ensure it does not try to initialize the whole tls stack. That stack should be explicitly initialized in the main function, or implicitly initialized later at first time of use.

Comment 7 Petr Menšík 2024-06-07 18:43:21 UTC
lib/isc/include/isc/util.h:

#if HAVE_FUNC_ATTRIBUTE_CONSTRUCTOR && HAVE_FUNC_ATTRIBUTE_DESTRUCTOR
#define ISC_CONSTRUCTOR __attribute__((constructor))
#define ISC_DESTRUCTOR	__attribute__((destructor))
#else
#define ISC_CONSTRUCTOR
#define ISC_DESTRUCTOR
#endif


lib/isc/lib.c:

void
isc__initialize(void) ISC_CONSTRUCTOR;
void
isc__shutdown(void) ISC_DESTRUCTOR;

This attribute is probably responsible for early calls. Somehow I doubt it were meant for such complex initialization, including loading of multiple plugins in it. I just assume some libraries rely on similar initialization, but this is not ordered explicitly. It looks fragile and no wonder series of plugins fail to cope with it properly.

isc__initialize(void); is never called explicitly, but relies on constructor of library loader. But that is there quite long. But so far I have failed to identify any change, which would look related. It does not seem supported to handle it different way later on demand.

https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/8576 has made changes to tls parameters and crashes happen in tls initialization. But at least from looking at them, they should not be related.

Comment 8 Simo Sorce 2024-06-07 18:57:39 UTC
What library needs this?
Why not just call isc__initialize(void) from the daemon's main() process?

Comment 9 Simo Sorce 2024-06-07 18:59:30 UTC
or have a global lazy variable with a lock and initialize on demand on first use, library constructors are very tricky and generally a bad idea for this kind of initialization, good libraries have a library_init() (and a library_fini()) function applications can call explicitly.

Comment 10 Petr Menšík 2024-06-07 19:34:43 UTC
something like that used to be present still in 9.16. There were always strange issues with various openssl modules changing this and that initialization or openssl destruction.

Very similar crash happens also with bind-dnssec-utils-9.18.27-1.fc39.x86_64 on normal bind package.

Comment 11 Petr Menšík 2024-06-07 21:05:00 UTC
I have tried compilation of main branch, v9.19.21 and v9.19.19 tag. They all fail with configured environment by test the same way. Are you sure the change causing crashes were working in 9.19.21 version? Could have something else changed in addition? But I admit the same test on f41 rawhide passes with bind9-next-dnssec-utils-9.19.21-1.fc41.x86_64 and bind9-next-dnssec-utils-9.19.19-3.fc40.x86_64.

On my fedora39, with the same p11-kit, I have prepared for bisecting bind code. But even version which works from package still fails on my machine, compiled from source. With basically identical failure. Could it be change in compiler or linker perhaps, triggering this?

I admit debugger backtrace is not much helping, as the freed address is optimized out since it were passed to state->rpc = p11_rpc_transport_new (&state->virt, address, "client"); at ./p11-kit/client.c:141. It returns again to the same line again, but address is optimized out at that time, until it should be freed. Which makes it bit more probable there is something weird with memory functions in binary.

It I am not sure. It seems memory allocators are clashing in unexpected ways.

Comment 12 Petr Menšík 2024-06-07 21:31:44 UTC
Oh crap. At least my machine issues happen, because malloc is used from jemalloc library, but free is used from normal libc. And then is no wonder it does not recognize the pointer.

Breakpoint 2.1, __GI___strdup (s=0x7fffffffcc60 "/home/pemensik/.config/softhsm2/softhsm2.conf") at strdup.c:40
40	{
(gdb) info symbol __GI___strdup 
strdup in section .text of /lib64/libc.so.6


(gdb) bt 
#0  malloc (size=size@entry=46) at src/jemalloc.c:2745
#1  0x00007ffff6ec360f in __GI___strdup (s=0x7fffffffcc60 "/home/pemensik/.config/softhsm2/softhsm2.conf")
    at strdup.c:42
#2  0x00007ffff6c7b60d in get_user_path () at SimpleConfigLoader.cpp:244
#3  0x00007ffff6c7c0a5 in SimpleConfigLoader::getConfigPath (this=<optimized out>) at SimpleConfigLoader.cpp:304
#4  SimpleConfigLoader::loadConfiguration (this=0x41d610) at SimpleConfigLoader.cpp:78
#5  0x00007ffff6c7a5a3 in Configuration::reload (this=this@entry=0x41d560) at Configuration.cpp:161
#6  0x00007ffff6c7a600 in Configuration::reload (this=this@entry=0x41d560, inConfigLoader=<optimized out>)
    at Configuration.cpp:176
#7  0x00007ffff6c6352c in SoftHSM::C_Initialize (this=0x41d330, pInitArgs=pInitArgs@entry=0x7fffffffd610)
    at SoftHSM.cpp:530
#8  0x00007ffff6c2d9ac in C_Initialize (pInitArgs=0x7fffffffd610) at main.cpp:137
#9  0x00007ffff6d09fe5 in p11prov_Initialize (ctx=0x7ffff685b540, pInitArgs=0x7fffffffd610)
    at ../src/interface.gen.c:18
(gdb) info symbol malloc 
malloc in section .text of /lib64/libjemalloc.so.2



Breakpoint 3.3, __GI___libc_free (mem=0x7ffff684c7e0) at malloc.c:3352
3352	{
(gdb) bt
#0  __GI___libc_free (mem=0x7ffff684c7e0) at malloc.c:3352
#1  0x00007ffff6c7ba22 in SimpleConfigLoader::loadConfiguration (this=0x41d610) at SimpleConfigLoader.cpp:88
#2  0x00007ffff6c7a5a3 in Configuration::reload (this=this@entry=0x41d560) at Configuration.cpp:161
#3  0x00007ffff6c7a600 in Configuration::reload (this=this@entry=0x41d560, inConfigLoader=<optimized out>)
    at Configuration.cpp:176
#4  0x00007ffff6c6352c in SoftHSM::C_Initialize (this=0x41d330, pInitArgs=pInitArgs@entry=0x7fffffffd610)
    at SoftHSM.cpp:530
#5  0x00007ffff6c2d9ac in C_Initialize (pInitArgs=0x7fffffffd610) at main.cpp:137
#6  0x00007ffff6d09fe5 in p11prov_Initialize (ctx=0x7ffff685b540, pInitArgs=0x7fffffffd610)
    at ../src/interface.gen.c:18


When debugging the working version in package, it were calling libc malloc from strdup and on free as well.

Breakpoint 3.2, __GI___libc_malloc (bytes=bytes@entry=19) at malloc.c:3294
3294	{
(gdb) info symbol __GI___libc_malloc 
malloc in section .text of /lib64/libc.so.6

Breakpoint 2.4, __GI___libc_free (mem=0x555555565140) at malloc.c:3359
(gdb) info symbol __GI___libc_free 
free in section .text of /lib64/libc.so.6

So there is clear mess and I am not sure how it could be solved. Seems like problem somewhere in dynamic loader or linker.

Comment 13 Petr Menšík 2024-06-07 22:06:45 UTC
versions installed on my machine:

jemalloc-devel-5.3.0-4.fc39.x86_64
p11-kit-0.25.3-1.fc39.x86_64
openssl-devel-3.1.1-4.fc39.x86_64
gcc-13.3.1-1.fc39.x86_64
binutils-2.40-14.fc39.x86_64

working version bind9-next-9.19.21-1.fc41 were build with:

DEBUG util.py:463:   binutils                      x86_64 2.41-32.fc40                  build 6.2 M
DEBUG util.py:463:   gcc                                    x86_64  14.0.1-0.6.fc40                 build   37 M
DEBUG util.py:463:   gnupg2                                 x86_64  2.4.4-1.fc40                    build  2.7 M
DEBUG util.py:463:   jemalloc-devel                         x86_64  5.3.0-6.fc40                    build   81 k
DEBUG util.py:463:   openssl-devel                          x86_64  1:3.2.1-2.fc40                  build  2.7 M
DEBUG util.py:463:   p11-kit                       x86_64 0.25.3-4.fc40                 build 476 k
DEBUG util.py:463:   p11-kit-trust                 x86_64 0.25.3-4.fc40                 build 132 k
DEBUG util.py:463:    libstdc++-14.0.1-0.6.fc40.x86_64                                              

Broken version bind9-next-9.19.22-1.fc41 build with:
DEBUG util.py:463:   binutils                         x86_64 2.42.50-6.fc41             build       27.2 MiB
DEBUG util.py:463:   binutils-gold                    x86_64 2.42.50-6.fc41             build        2.0 MiB
DEBUG util.py:463:   openssl-devel                        x86_64 1:3.2.1-6.fc41                  build        4.3 MiB
DEBUG util.py:463:   libgcc                           x86_64 14.0.1-0.15.fc41           build      270.6 KiB
DEBUG util.py:463:   openssl-libs                     x86_64 1:3.2.1-6.fc41             build        7.8 MiB
DEBUG util.py:463:   p11-kit                          x86_64 0.25.3-4.fc40              build        2.2 MiB
DEBUG util.py:463:   p11-kit-trust                    x86_64 0.25.3-4.fc40              build      391.4 KiB
DEBUG util.py:463:   jemalloc-devel                       x86_64 5.3.0-6.fc40                    build      221.1 KiB

The only version difference seems binutils.

https://koji.fedoraproject.org/koji/taskinfo?taskID=118717255
My new scratch build at .24 still at:
binutils-2.42.50-14.fc41.x86_64
libgcc-14.1.1-4.fc41.x86_64
elfutils-0.191-7.fc41.x86_64
openssl-libs-3.2.2-1.fc41.x86_64
p11-kit-0.25.3-4.fc40.x86_64
p11-kit-trust-0.25.3-4.fc40.x86_64

Comment 14 Petr Menšík 2024-06-07 22:34:38 UTC
at least objdump suggests it should be different:

(debug-test) objdump -T /usr/lib64/pkcs11/libsofthsm2.so  | grep -wE 'malloc|realloc|free'
0000000000000000      DF *UND*	0000000000000000 (GLIBC_2.2.5) free
0000000000000000      DF *UND*	0000000000000000 (GLIBC_2.2.5) malloc
(debug-test) objdump -T /usr/lib64/libisc-9.19.21.so  | grep -wE 'malloc|realloc|free'
0000000000000000      DF *UND*	0000000000000000  Base        free
0000000000000000      DF *UND*	0000000000000000  Base        malloc

(debug-test) objdump -T /usr/lib64/libisc-9.19.24.so  | grep -wE 'malloc|realloc|free'
0000000000000000      DF *UND*	0000000000000000  Base        free
0000000000000000      DF *UND*	0000000000000000  Base        malloc

I guess it would depend on dlopen flags also.

Breakpoint 1, ___dlopen (file=0x7ffff681d600 "/usr/lib64/ossl-modules/pkcs11.so", mode=2) at dlopen.c:77

Breakpoint 1, ___dlopen (file=0x7ffff681d7e0 "/usr/lib64/pkcs11/libsofthsm2.so", mode=10) at dlopen.c:77

Comment 15 Simo Sorce 2024-06-10 13:47:37 UTC
Petr,
can you just disable jemalloc use?
Seem like the best way to go.

Comment 16 Petr Menšík 2024-06-12 14:28:15 UTC
There seems to be documented in upstream issue: https://gitlab.isc.org/isc-projects/bind9/-/issues/4404

I tried to scan code, but this changed mostly only link flags of resulting issues with linker. But the overall result might be what we see in debugger. Especially because pkcs11 provider does not use RTLD_DEEPBIND flag, which should ensure symbols linked directly to pkcs11.so are preferred, including memory management functions.

But both seem to use RTLD_NOW, which should ensure whatever memory allocation function are used, they should come from the same memory allocator. Having allocator from jemalloc and free from glibc is strange hybrid, which should not happen.

There seems to be also issue with bug #2278016 on samba component, which reports issues with recent bind issues.

Comment 17 Petr Menšík 2024-06-12 14:48:16 UTC
configure.ac contains:
    echo "Features disabled or unavailable on this platform:"
    if test "no" = "$with_jemalloc"; then
      echo "    Memory allocator: system"
      echo
      echo "    +------------------------------------------+"
      echo "    |             ==== WARNING ====            |"
      echo "    |                                          |"
      echo "    | This is NOT a recommended configuration. |"
      echo "    | Using the system memory allocator causes |"
      echo "    | reduced performance and increased memory |"
      echo "    | fragmentation. Installing the jemalloc   |"
      echo "    | memory allocator (version >= 4.0.0) is   |"
      echo "    | strongly recommended.                    |"
      echo "    +------------------------------------------+"
      echo
    fi

I have not done much actual testing of alternative allocator, but I trust they did not choose it by random changes. ISC is doing great work, so I would like to use recommended solution if possible. But I admit, we are hitting strange issues. Without clear evidence this change is actually broken by bind code indeed.

But commits https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/8609/commits happened from good to broken version. It just changes flags on tests however.

But we see again partial use of jemalloc, but in different cases. Fixes referenced in https://gitlab.isc.org/isc-projects/bind9/-/issues/4404 may not be complete.

Comment 18 Petr Menšík 2024-06-24 15:07:25 UTC
Okay, I have tried to disable jemalloc in bind builds by: https://src.fedoraproject.org/rpms/bind9-next/pull-request/7

But there is some reproducible issue visible on all Fedora releases, where it just fails to create documentation using cfg_test tool.

- Passed on test build on epel9: https://koji.fedoraproject.org/koji/taskinfo?taskID=119510454
- Failed on rawhide scratch: https://koji.fedoraproject.org/koji/taskinfo?taskID=119510454
- Failed on f40 scratch: https://koji.fedoraproject.org/koji/taskinfo?taskID=119513219
- Failed on f39 scratch: https://koji.fedoraproject.org/koji/taskinfo?taskID=119513743

I have reproduced it multiple times in mockbuild.

Even have backtrace for it:
*** buffer overflow detected ***: terminated

Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44	      return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007ffff7d89613 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78
#2  0x00007ffff7d30d0e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff7d18942 in __GI_abort () at abort.c:79
#4  0x00007ffff7d197a7 in __libc_message_impl (fmt=fmt@entry=0x7ffff7ea4163 "*** %s ***: terminated\n") at ../sysdeps/posix/libc_fatal.c:132
#5  0x00007ffff7e178f9 in __GI___fortify_fail (msg=msg@entry=0x7ffff7ea414a "buffer overflow detected") at fortify_fail.c:24
#6  0x00007ffff7e17294 in __GI___chk_fail () at chk_fail.c:28
#7  0x00007ffff7f6e018 in memset (__dest=<optimized out>, __ch=<optimized out>, __len=<optimized out>, __dest=<optimized out>, __ch=<optimized out>, 
    __len=<optimized out>) at /usr/include/bits/string_fortified.h:59
#8  mallocx (size=93824992546016, flags=<optimized out>) at ../../../lib/isc/jemalloc_shim.h:71
#9  mem_get (ctx=ctx@entry=0x55555559ea10, size=size@entry=48, flags=flags@entry=64) at ../../../lib/isc/mem.c:307
#10 0x00007ffff7f6fb33 in isc__mem_get (ctx=0x55555559ea10, size=48, flags=flags@entry=64) at ../../../lib/isc/mem.c:692
#11 0x00007ffff7f70377 in isc__mem_reget (ctx=<optimized out>, old_ptr=<optimized out>, old_size=<optimized out>, new_size=<optimized out>, flags=flags@entry=64)
    at ../../../lib/isc/mem.c:813
#12 0x00007ffff7f706f3 in sync_channellist (lcfg=0x5555555a0b50) at ../../../lib/isc/log.c:970
#13 0x00007ffff7f7201b in isc_log_create (mctx=<optimized out>, lctxp=lctxp@entry=0x7fffffffde00, lcfgp=lcfgp@entry=0x7fffffffde08) at ../../../lib/isc/log.c:273
#14 0x00005555555553f9 in main (argc=1, argv=0x7fffffffdfa8) at ../../../doc/misc/cfg_test.c:76

There seems to be happening something quite weird in mallocx stub replacement. When it enters the mallocx function after hitting isc__mem_reget, it has still good size.

#0  checked_request2size (req=<optimized out>) at malloc.c:3306
#1  __GI___libc_malloc (bytes=bytes@entry=48) at malloc.c:3305
#2  0x00007ffff7f6dfc9 in mallocx (size=48, flags=64) at ../../../lib/isc/jemalloc_shim.h:67
#3  mem_get (ctx=ctx@entry=0x55555559ea10, size=size@entry=48, flags=flags@entry=64) at ../../../lib/isc/mem.c:307
#4  0x00007ffff7f6fb33 in isc__mem_get (ctx=0x55555559ea10, size=48, flags=flags@entry=64) at ../../../lib/isc/mem.c:692
#5  0x00007ffff7f70377 in isc__mem_reget (ctx=<optimized out>, old_ptr=<optimized out>, old_size=<optimized out>, new_size=<optimized out>, flags=flags@entry=64)
    at ../../../lib/isc/mem.c:813
#6  0x00007ffff7f706f3 in sync_channellist (lcfg=0x5555555a0b50) at ../../../lib/isc/log.c:970
#7  0x00007ffff7f7201b in isc_log_create (mctx=<optimized out>, lctxp=lctxp@entry=0x7fffffffde00, lcfgp=lcfgp@entry=0x7fffffffde08) at ../../../lib/isc/log.c:273
#8  0x00005555555553f9 in main (argc=1, argv=0x7fffffffdfa8) at ../../../doc/misc/cfg_test.c:76

But then something strange happens in malloc and size changes in backtrace.
#0  __malloc_usable_size (m=m@entry=0x5555555a0ce0) at malloc.c:5249
#1  0x00007ffff7f6e008 in sallocx (ptr=0x5555555a0ce0, flags=<optimized out>) at ../../../lib/isc/jemalloc_shim.h:60
#2  mallocx (size=93824992546016, flags=<optimized out>) at ../../../lib/isc/jemalloc_shim.h:71
#3  mem_get (ctx=ctx@entry=0x55555559ea10, size=size@entry=48, flags=flags@entry=64) at ../../../lib/isc/mem.c:307
#4  0x00007ffff7f6fb33 in isc__mem_get (ctx=0x55555559ea10, size=48, flags=flags@entry=64) at ../../../lib/isc/mem.c:692
#5  0x00007ffff7f70377 in isc__mem_reget (ctx=<optimized out>, old_ptr=<optimized out>, old_size=<optimized out>, new_size=<optimized out>, flags=flags@entry=64)
    at ../../../lib/isc/mem.c:813
#6  0x00007ffff7f706f3 in sync_channellist (lcfg=0x5555555a0b50) at ../../../lib/isc/log.c:970
#7  0x00007ffff7f7201b in isc_log_create (mctx=<optimized out>, lctxp=lctxp@entry=0x7fffffffde00, lcfgp=lcfgp@entry=0x7fffffffde08) at ../../../lib/isc/log.c:273
#8  0x00005555555553f9 in main (argc=1, argv=0x7fffffffdfa8) at ../../../doc/misc/cfg_test.c:76

It seems like memory corruption or linker issues of some sort. But I haven't seen the same issues on my dev machines, when used just ``./configure --enable-developer --without-jemalloc``. Might be again something linked to ``-flto=auto`` cflag used for package builds?

Comment 19 Petr Menšík 2024-06-24 15:09:23 UTC
Looks like this:

make[3]: Leaving directory '/home/pemensik/fedora/bind9-next/bind-9.19.24/build/doc/misc'
make[3]: Entering directory '/home/pemensik/fedora/bind9-next/bind-9.19.24/build/doc/misc'
/bin/sh ../../libtool  --tag=CC   --mode=link gcc -Wall -Wextra -Wwrite-strings -Wpointer-arith -Wno-missing-field-initializers -Wformat -Wshadow -Werror=implicit-function-declaration -Werror=missing-prototypes -Werror=format-security -Werror=parentheses -Werror=implicit -Werror=strict-prototypes -Werror=vla -fno-strict-aliasing -fno-delete-null-pointer-checks -fdiagnostics-show-option -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64   -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer  -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64   -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer  -pthread -Wl,--export-dynamic  -Wl,-z,relro -Wl,--as-needed  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -Wl,--build-id=sha1 -specs=/usr/lib/rpm/redhat/redhat-package-notes  -o cfg_test cfg_test-cfg_test.o ../../lib/isc/libisc.la ../../lib/dns/libdns.la ../../lib/isccfg/libisccfg.la -lpthread -lm  -lurcu-cds -lurcu-common -lurcu
libtool: link: gcc -Wall -Wextra -Wwrite-strings -Wpointer-arith -Wno-missing-field-initializers -Wformat -Wshadow -Werror=implicit-function-declaration -Werror=missing-prototypes -Werror=format-security -Werror=parentheses -Werror=implicit -Werror=strict-prototypes -Werror=vla -fno-strict-aliasing -fno-delete-null-pointer-checks -fdiagnostics-show-option -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -Wl,--export-dynamic -Wl,-z -Wl,relro -Wl,--as-needed -Wl,-z -Wl,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -Wl,--build-id=sha1 -specs=/usr/lib/rpm/redhat/redhat-package-notes -o .libs/cfg_test cfg_test-cfg_test.o  ../../lib/isc/.libs/libisc.so ../../lib/dns/.libs/libdns.so ../../lib/isccfg/.libs/libisccfg.so /home/pemensik/fedora/bind9-next/bind-9.19.24/build/lib/ns/.libs/libns.so /home/pemensik/fedora/bind9-next/bind-9.19.24/build/lib/dns/.libs/libdns.so -lgssapi_krb5 -lkrb5 -lk5crypto -lcom_err -lmaxminddb -lprotobuf-c -lfstrm -llmdb /home/pemensik/fedora/bind9-next/bind-9.19.24/build/lib/isc/.libs/libisc.so -lz -ljson-c -lnghttp2 -lxml2 -luv -ldl -lrt -lcrypto -lssl -lpthread -lm -lurcu-cds -lurcu-common -lurcu -pthread
make[3]: Leaving directory '/home/pemensik/fedora/bind9-next/bind-9.19.24/build/doc/misc'
make[3]: Entering directory '/home/pemensik/fedora/bind9-next/bind-9.19.24/build/doc/misc'
./cfg_test --zonegrammar forward > forward.zoneopt
*** buffer overflow detected ***: terminated
make[3]: *** [Makefile:903: forward.zoneopt] Error 134
make[3]: Leaving directory '/home/pemensik/fedora/bind9-next/bind-9.19.24/build/doc/misc'
make[3]: *** Waiting for unfinished jobs....
make[3]: Entering directory '/home/pemensik/fedora/bind9-next/bind-9.19.24/build/doc/misc'
./cfg_test --zonegrammar primary > primary.zoneopt
*** buffer overflow detected ***: terminated
make[3]: *** [Makefile:894: primary.zoneopt] Error 134
make[3]: Leaving directory '/home/pemensik/fedora/bind9-next/bind-9.19.24/build/doc/misc'
make[3]: Entering directory '/home/pemensik/fedora/bind9-next/bind-9.19.24/build/doc/misc'
./cfg_test --zonegrammar secondary > secondary.zoneopt
*** buffer overflow detected ***: terminated
make[3]: *** [Makefile:897: secondary.zoneopt] Error 134
make[3]: Leaving directory '/home/pemensik/fedora/bind9-next/bind-9.19.24/build/doc/misc'
make[3]: Entering directory '/home/pemensik/fedora/bind9-next/bind-9.19.24/build/doc/misc'
./cfg_test --zonegrammar mirror > mirror.zoneopt
*** buffer overflow detected ***: terminated
make[3]: *** [Makefile:900: mirror.zoneopt] Error 134
make[3]: Leaving directory '/home/pemensik/fedora/bind9-next/bind-9.19.24/build/doc/misc'
make[3]: Entering directory '/home/pemensik/fedora/bind9-next/bind-9.19.24/build/doc/misc'
./cfg_test --rndc --grammar | /usr/bin/perl ../../../doc/misc/sort-options.pl > rndc.grammar
*** buffer overflow detected ***: terminated
make[3]: Leaving directory '/home/pemensik/fedora/bind9-next/bind-9.19.24/build/doc/misc'
make[3]: Entering directory '/home/pemensik/fedora/bind9-next/bind-9.19.24/build/doc/misc'
./cfg_test --zonegrammar hint > hint.zoneopt
*** buffer overflow detected ***: terminated
make[3]: *** [Makefile:906: hint.zoneopt] Error 134
make[3]: Leaving directory '/home/pemensik/fedora/bind9-next/bind-9.19.24/build/doc/misc'
make[3]: Entering directory '/home/pemensik/fedora/bind9-next/bind-9.19.24/build/doc/misc'
./cfg_test --named --grammar | /usr/bin/perl ../../../doc/misc/sort-options.pl > options
*** buffer overflow detected ***: terminated
make[3]: Leaving directory '/home/pemensik/fedora/bind9-next/bind-9.19.24/build/doc/misc'
make[3]: Entering directory '/home/pemensik/fedora/bind9-next/bind-9.19.24/build/doc/misc'
./cfg_test --zonegrammar stub > stub.zoneopt
*** buffer overflow detected ***: terminated
make[3]: *** [Makefile:909: stub.zoneopt] Error 134
make[3]: Leaving directory '/home/pemensik/fedora/bind9-next/bind-9.19.24/build/doc/misc'
make[2]: *** [Makefile:461: all-recursive] Error 1
make[1]: *** [Makefile:582: all-recursive] Error 1
make: *** [Makefile:509: all] Error 2

Comment 20 Ondrej Moriš 2024-08-07 15:26:22 UTC
For the record, this issue is very likely caused by softhsm token. With a different token [*] there is no crash:

# dnssec-keyfromlabel -a RSASHA256 -l 'pkcs11:id=%00%00' localhost
Enter PIN for PKCS#11 Token (Slot 0 - Kryoptic Slot):
Klocalhost.+008+38544

# dnssec-keyfromlabel -a RSASHA256 -l 'pkcs11:id=%00%01' -f KSK  localhost
Enter PIN for PKCS#11 Token (Slot 0 - Kryoptic Slot):
Klocalhost.+008+01334

# dnssec-signzone -o localhost localhost
Enter PIN for PKCS#11 Token (Slot 0 - Kryoptic Slot):
Verifying the zone using the following algorithms:
- RSASHA256
Zone fully signed:
Algorithm: RSASHA256: KSKs: 1 active, 0 stand-by, 0 revoked
                      ZSKs: 1 active, 0 stand-by, 0 revoked
localhost.signed
# rpm -q bind9-next
bind9-next-9.19.24-4.fc41.x86_64

Petr, please feel free to drop it if you feel the issue is no longer needed.

[*] https://github.com/latchset/kryoptic

Comment 21 Florian Weimer 2024-09-03 09:32:01 UTC
Could we get LD_DEBUG=all from the failing process?

The local search scope for the plugin should have jemalloc for libc, but that doesn't seem to happen for some reason.

Comment 22 Simo Sorce 2024-09-05 13:07:22 UTC
@Florian FWIW the crashes seem to go away if we use kryoptic as the token instead of softhsm2 ... so it may be yet another C++ cause issue,

Comment 23 Aoife Moloney 2025-02-26 13:01:55 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 42 development cycle.
Changing version to 42.


Note You need to log in before you can comment on or make changes to this bug.