Bug 1974132
| Summary: | jitterentropy-3.0.2-1.fc34 and rng-tools-6.13-2.fc34: rngd uses excessive cpu resources | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | bf2006a |
| Component: | rng-tools | Assignee: | Vladis Dronov <vdronov> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 34 | CC: | jaromir.capik, jgarzik, lewk, redhat-bugzilla, vdronov |
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Last Closed: | 2021-07-21 10:53:50 UTC | Type: | Bug |
|
Description
bf2006a
2021-06-20 20:14:38 UTC
hello, Morris,

unfortunately, i am still not able to reproduce this, even on older AMD CPU machines in our lab (but i still have not found an AMD Athlon 64 one). so, i would need your help, otherwise i cannot go further. first, let me share my result: the CPU contention happens entirely in userspace, so this is rngd code or the updated jitterentropy lib (v3.0.2).

so, could you please do the following on your machine with the issue (all should be done as root):

1) let's double-check it is the jitter rng source (or maybe it is not). please stop the rngd service if running and run:

# /sbin/rngd -f -d -x hwrng -x rdrand -x tpm -x nist -n jitter -x pkcs11 -x rtlsdr -O jitter:use_aes:0

please check if it starts to consume CPU again or not. then try the same with AES:

# /sbin/rngd -f -d -x hwrng -x rdrand -x tpm -x nist -n jitter -x pkcs11 -x rtlsdr -O jitter:use_aes:1

if the CPU is not eaten, could you please check each rng source by enabling them one by one and disabling the others. for example, for hwrng it should be:

# /sbin/rngd -f -d -n hwrng -x rdrand -x tpm -x nist -x jitter -x pkcs11 -x rtlsdr

but this should fail on your system as it does not have a hw rng. we need to identify which rng source brings up the issue.

2) then we can try to grab a core for analysis. first we need debuginfo for glibc:

# dnf debuginfo-install glibc

it is possible you have an older glibc installed, and it was removed from fedora repos already. in this case it is needed to also update glibc itself and reboot:

# dnf upgrade glibc
# dnf debuginfo-install glibc
# reboot

then, install debug files for rngd and jitter-lib, gdb and gcore:

# dnf debuginfo-install rng-tools-6.13-2.fc34 jitterentropy-3.0.2-1.fc34
# dnf install gdb

then run the rngd service or just "/sbin/rngd -f -d", wait for the issue to appear, let it run for, say, 10 seconds, eating the CPU, and then take a core:

# gcore -a -o ./rngd.f <rngd PID, for example 629>
[New LWP 630]
[New LWP 631]
[New LWP 632]
[New LWP 633]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007fa2998e85bf in poll () from /lib64/libc.so.6
warning: target file /proc/629/cmdline contained unexpected null characters
Saved corefile rngd.f.629
[Inferior 1 (process 629) detached]

the output should be similar to the above. please provide the resulting ./rngd.f.629 for analysis (gzip it please, it should be ~300M).

3) if the first step shows it is the jitter rng source, could you please remove rng-tools-6.13-2.fc34 and jitterentropy-3.0.2-1.fc34 completely, install an older jitterentropy-2.2.0-4.fc34 from stable, and install rng-tools from:

https://koji.fedoraproject.org/koji/taskinfo?taskID=70532506

this is rng-tools-6.13-2 linked to an older jitter lib. after installing, you can double-check this with:

# ldd /usr/sbin/rngd | grep jitter
libjitterentropy.so.2 => /lib64/libjitterentropy.so.2 (0x00007fd8be14b000)
^^^ ^^^

and check if the issue reproduces with the older library linked.

thank you. i do understand these are a lot of steps and work, but i do not have another possibility until i can reproduce this in-house.

also, could you provide the output of "uname -a", please? just to ensure we talk about the same kernel.

Created attachment 1792687 [details]
requested core dump for faulty jitterentropy-3.0.2-1.fc34 and rng-tools-6.13-2.fc34
Created attachment 1792688 [details]
requested core dump of revised rng-tools-6.13-2.fc34 linked against older jitterentropy-2.2.0
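The per-source check from step 1 of comment #1 (enable one rng source with -n, exclude all others with -x) could be generated with a small helper like the one below. This is only a sketch: the function name rngd_only_cmd is made up for illustration, the source list is the one used in the commands above, and the helper only prints the command lines, it does not run rngd.

```shell
#!/bin/sh
# Source names exactly as they appear in the rngd commands in comment #1.
SOURCES="hwrng rdrand tpm nist jitter pkcs11 rtlsdr"

# Print the rngd command line that enables only source $1 (-n) and
# excludes every other source (-x). Nothing is executed here.
rngd_only_cmd() {
    only=$1
    cmd="/sbin/rngd -f -d"
    for src in $SOURCES; do
        if [ "$src" = "$only" ]; then
            cmd="$cmd -n $src"
        else
            cmd="$cmd -x $src"
        fi
    done
    printf '%s\n' "$cmd"
}

# One command per source; run each by hand as root and watch the CPU usage.
for s in $SOURCES; do
    rngd_only_cmd "$s"
done
```

Each printed line can then be started manually, one at a time, to see which source brings up the CPU load.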
(In reply to Vladis Dronov from comment #1)
> hello, Morris,
> unfortunately, i still not able to reproduce this even on older AMD CPU
> machines in our lab (but i still did not find AMD Athlon 64 one).
> so, i would need your help, otherwise i cannot go further. so first, let me
> share my result - the CPU contention happens entirely in
> userspace, so this is rngd code or updated jitterentropy lib (v3.0.2).
>
> so, could you please do the following on your machine with the issue (all
> should be done as root):
>
> 1) let's double-check it is jitter rng source (or maybe it is not). please
> stop rngd service if running and run:
>
> # /sbin/rngd -f -d -x hwrng -x rdrand -x tpm -x nist -n jitter -x pkcs11 -x
> rtlsdr -O jitter:use_aes:0
>
> please, check if it starts to consume CPU again or not. then try the same
> with AES:
>
> # /sbin/rngd -f -d -x hwrng -x rdrand -x tpm -x nist -n jitter -x pkcs11 -x
> rtlsdr -O jitter:use_aes:1

The jitter source causes excessive CPU use, with and without AES.

>
> if the CPU is not eaten, could you please check each rng source by enabling
> them by one and disabling the others.
> for example for hwrng it should be:
>
> # /sbin/rngd -f -d -n hwrng -x rdrand -x tpm -x nist -x jitter -x pkcs11 -x
> rtlsdr
>
> but this should fail on your system as it does not have hw rng. we need to
> identify which rng source brings up
> the issue.

All of the other sources fail.

>
> 2) then we can try to grab a core for a analysis. first we need debuginfo
> for glibc:
>
> # dnf debuginfo-install glibc
>
> it is possible you have older glibc installed, and it was removed from
> fedora repos already. in this
> case it is needed to also update glibc itself and reboot:
>
> # dnf upgrade glibc
> # dnf debuginfo-install glibc
> # reboot
>
> then, install debug files for rngd and jitter-lib, gdb and gcore:
>
> # dnf debuginfo-install rng-tools-6.13-2.fc34 jitterentropy-3.0.2-1.fc34
> # dnf intall gdb
>
> then run rngd service or just "/sbin/rngd -f -d", wait for the issue to
> appear, let it run for, say,
> 10 seconds, eating the CPU, and then take a core:
>
> # gcore -a -o ./rngd.f <rngd PID, for example 629>
> [New LWP 630]
> [New LWP 631]
> [New LWP 632]
> [New LWP 633]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> 0x00007fa2998e85bf in poll () from /lib64/libc.so.6
> warning: target file /proc/629/cmdline contained unexpected null characters
> Saved corefile rngd.f.629
> [Inferior 1 (process 629) detached]
>
> the output should be similar to this above. please, provide the resulting
> ./rngd.f.629 for analysis (gzip it please, it should be ~300M).

Submitted.

>
> 3) if the first step shows it is jitter rng source, could you please remove
> rng-tools-6.13-2.fc34 and jitterentropy-3.0.2-1.fc34 completely,
> install an older jitterentropy-2.2.0-4.fc34 from stable, install rng-tools
> from:
>
> https://koji.fedoraproject.org/koji/taskinfo?taskID=70532506
>
> this is rng-tools-6.13-2 linked to an older jitter lib. after installing,
> you can double-check this with:
>
> # ldd usr/sbin/rngd | grep jitter
> libjitterentropy.so.2 => /lib64/libjitterentropy.so.2 (0x00007fd8be14b000)
> ^^^ ^^^
> and check if the issue reproduces with the older library linked.
>
> thank you.

There is no excessive CPU use from your modified version of rng-tools that is linked to the older jitterentropy. Core dump submitted.

This machine's packages are synchronized with Fedora 34 testing, including the running kernel, which is:

Linux ... 5.12.12-300.fc34.x86_64 #1 SMP Fri Jun 18 14:30:51 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

>
> i do understand these are a lot of steps and work, but i do not have another
> possibility until i can reproduce this in-house.

It's okay. Thank you for your efforts: it's never convenient to debug by proxy. Please let me know if you need anything else to help resolve this problem.

hello, Morris,
thanks a ton for your results, and i'm sorry i did not respond earlier.
> There is no excessive CPU use from your modified version of rng-tools
> that is linked to the older jitterentropy.
yes, this is exactly the issue i was thinking of. so indeed, there is
an issue with jitterentropy-v3. i guess, i'll raise the issue with the lib's
upstream and meanwhile i'll continue looking at the core dumps.
thanks again, your input is most valuable.
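The "which jitterentropy is rngd linked against" check from step 3 of comment #1 can be scripted roughly as follows. This is a sketch, not part of rng-tools: jitter_soname is a hypothetical helper name, and it only parses ldd output given on stdin.

```shell
#!/bin/sh
# Read `ldd` output on stdin and print the libjitterentropy soname the
# binary is linked against, e.g. "libjitterentropy.so.2".
# Prints nothing if no such library is listed.
jitter_soname() {
    awk '$1 ~ /^libjitterentropy\.so\./ { print $1; exit }'
}

# Example using the ldd line quoted in comment #1 (no rngd needed here):
printf '\tlibjitterentropy.so.2 => /lib64/libjitterentropy.so.2 (0x00007fd8be14b000)\n' | jitter_soname
```

On a real system the input would come from "ldd /usr/sbin/rngd | jitter_soname"; a result ending in .so.2 would indicate the older library, .so.3 the newer one.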
hello, Morris,

it looks like your issue is exactly the https://github.com/smuellerDD/jitterentropy-library/issues/37 one. i've revoked the f34 and f35 updates and will postpone a release until the issue in the jitterentropy-3.0.2 lib is fixed. you can check the comments since https://github.com/smuellerDD/jitterentropy-library/issues/37#issuecomment-861185576 for the current status.

hello, Morris,

i was talking to Neil and Stephan, and a concern has arisen. we are not sure how rng-tools + jitterlib-v2 were working on your old 1-CPU AMD Athlon 64 system before. please see 1) at:

https://github.com/smuellerDD/jitterentropy-library/issues/37#issuecomment-869637562

if you are still willing to help, could you please grab a jitterlib-v2 from:

https://koji.fedoraproject.org/koji/taskinfo?taskID=71263992

and install it and rng-tools from f34-stable (deleting previous versions of jitterlib and rng-tools) on your AMD system and just run:

# /sbin/rngd -f -n jitter

and then can you post the line with "jent_entropy_init:"? i have the following in my f34 vm:

# dnf erase jitterentropy rng-tools
# wget https://kojipkgs.fedoraproject.org//work/tasks/4051/71264051/jitterentropy-2.2.0-5.fc34.x86_64.rpm
# dnf install jitterentropy-2.2.0-5.fc34.x86_64.rpm rng-tools
# /sbin/rngd -f -n jitter
Enabling 5: JITTER Entropy generator (jitter)
Initializing available sources
[hwrng ]: Initialized
[rdrand]: Enabling RDRAND rng support
[rdrand]: Initialized
jent_entropy_init: ret = OK <<< THIS
[jitter]: Initializing AES buffer
[jitter]: Enabling JITTER rng support
[jitter]: Initialized

i have "jent_entropy_init: ret = OK" here, this is expected. we are very curious to learn what it would say on your system. so if you could help, this would be great. thank you.

btw, meanwhile Stephan has rolled out a fixed version of jitterlib-v3. would you be able to help us to test it on your unique system too?
the "test" should be just installing newer rng-tools and jitterlib-v3, running /sbin/rngd and checking if it still eats CPU. thank you.

# /sbin/rngd -f -n jitter, surely. It appears to work, although I've never tested the output:

$ sudo dnf list --installed jitterentropy rng-tools
Installed Packages
jitterentropy.x86_64 2.2.0-5.fc34 @@commandline
rng-tools.x86_64 6.12-3.fc34 @@commandline

$ sudo /sbin/rngd -f -n jitter
Enabling 5: JITTER Entropy generator (jitter)
Initializing available sources
[hwrng ]: Initialization Failed
[rdrand]: Initialization Failed
jent_entropy_init: ret = OK
[jitter]: Initializing AES buffer
[jitter]: Enabling JITTER rng support
[jitter]: Initialized
[pkcs11]: No pkcs11 slots available
[pkcs11]: Initialization Failed
[rtlsdr]: Initialization Failed

In slightly more detail:

$ sudo /sbin/rngd -dtf -n jitter
Enabling 5: JITTER Entropy generator (jitter)
Initializing available sources
[hwrng ]: read error
[hwrng ]: No available rng device
[hwrng ]: Initialization Failed
[rdrand]: Initialization Failed
[jitter]: Limiting thread count to 1 active cpus
[jitter]: JITTER starts 1 threads
[jitter]: CPU Thread 0 is ready
[jitter]: Initializing AES buffer
[jitter]: xread_jitter requests 16 bytes from pipe
[jitter]: JITTER thread on cpu 0 wakes up for refill
[jitter]: jent_read_entropy time on cpu 0 is 4.128277359000e+00 sec
[jitter]: Writing to pipe
[jitter]: xread_jitter gets 16 bytes
[jitter]: xread_jitter requests 128 bytes from pipe
[jitter]: xread_jitter gets 128 bytes
[jitter]: xread_jitter requests 16535 bytes from pipe
[jitter]: xread_jitter falls back to AES
[jitter]: Enabling JITTER rng support
[jitter]: xread_jitter requests 4 bytes from pipe
[jitter]: xread_jitter falls back to AES
[jitter]: xread_jitter requests 4 bytes from pipe
[jitter]: xread_jitter falls back to AES
[jitter]: Initialized
[jitter]: DONE Writing to pipe with return 16535
[jitter]: JITTER thread on cpu 0 wakes up for refill
[pkcs11]: No pkcs11 slots available
[pkcs11]: Initialization Failed
[rtlsdr]: No rtlsdr radio devices found
[rtlsdr]: Initialization Failed
Entering test mode...no entropy will be delivered to the kernel
Reading entropy from JITTER Entropy generator
[jitter]: xread_jitter requests 2500 bytes from pipe
[jitter]: xread_jitter falls back to AES
Reading entropy from JITTER Entropy generator
[jitter]: xread_jitter requests 2500 bytes from pipe
[jitter]: xread_jitter falls back to AES
Reading entropy from JITTER Entropy generator
[jitter]: xread_jitter requests 2500 bytes from pipe
[jitter]: xread_jitter falls back to AES
Reading entropy from JITTER Entropy generator
[jitter]: xread_jitter requests 2500 bytes from pipe
[jitter]: xread_jitter falls back to AES
Reading entropy from JITTER Entropy generator
[jitter]: xread_jitter requests 2500 bytes
...

Yes, I can try your modified versions of the two packages. I assume that they are related to the recent: https://bodhi.fedoraproject.org/updates/FEDORA-2021-4b1b4c2e34 . Do you have a build for Fedora 34?

These types of systems are not the majority of the install base any more, but unique may be overstating the case -- I know of several similar -- my neighbor has a back-up desktop running Fedora with a slightly newer Athlon.
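When comparing several rngd runs like the ones above, the outcome for the jitter source can be pulled out of the console output with a short filter. A minimal sketch, assuming the "[jitter]: Initialized" and "[jitter]: Initialization Failed" lines look exactly as in the logs in this report; jitter_init_status is a hypothetical helper name.

```shell
#!/bin/sh
# Read rngd console output on stdin and report how the jitter source fared.
# Prints "initialized", "failed", or "unknown" (no matching line seen).
jitter_init_status() {
    awk '
        /^\[jitter\]: Initialized/           { s = "initialized" }
        /^\[jitter\]: Initialization Failed/ { s = "failed" }
        END { print (s == "" ? "unknown" : s) }
    '
}

# Example with lines taken from the jitterentropy-2.2.0-5 run above:
printf '%s\n' \
    'jent_entropy_init: ret = OK' \
    '[jitter]: Initializing AES buffer' \
    '[jitter]: Initialized' | jitter_init_status
```

With a captured log this would be used as "jitter_init_status < rngd.log", where rngd.log is whatever file the output was saved to.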
Now, using https://bodhi.fedoraproject.org/updates/FEDORA-2021-b30e92acb8 , I see:

Installed Packages
jitterentropy.x86_64 3.0.2.git.d18d5863-1.fc34 @@commandline
rng-tools.x86_64 6.13.git.d207e0b6-1.fc34 @@commandline

$ sudo /sbin/rngd -dtf -n jitter
Enabling 6: JITTER Entropy generator (jitter)
Initializing available sources
[hwrng ]: read error
[hwrng ]: No available rng device
[hwrng ]: Initialization Failed
[rdrand]: Initialization Failed
[jitter]: JITTER rng fails with code -38
[jitter]: Initialization Failed
[pkcs11]: No pkcs11 slots available
[pkcs11]: Initialization Failed
[rtlsdr]: No rtlsdr radio devices found
[rtlsdr]: Initialization Failed
Can't open any entropy source
Maybe RNG device modules are not loaded

which prevents the excessive use of the CPU, but obviously can't collect entropy.

I didn't follow the discussion in great detail, but nhorman seemed to think that it should be possible to use the hardware timer on this system after all, since it seemed to be working before. Can we do so, or find another way to fix the software timer? Thanks.

hello, Morris,

thanks for your update. now i probably have an idea of what's happening. would you be able to make one more simple test on your Athlon 64 machine? if you are still willing to help, could you please grab a jitterlib-v3 with debug from:

https://koji.fedoraproject.org/koji/taskinfo?taskID=71519025
https://kojipkgs.fedoraproject.org//work/tasks/9132/71519132/jitterentropy-3.0.2.git.d18d5863-1.debug.fc34.x86_64.rpm

delete the previous versions and install it and rng-tools from f34-testing (the one you already have) to your AMD system and just run:

# /sbin/rngd -f -n jitter

again?
the versions should be:

rng-tools-6.13.git.d207e0b6-1.fc34.x86_64.rpm (from f34-testing)
jitterentropy-3.0.2.git.d18d5863-1.debug.fc34.x86_64.rpm (from koji)

on my vm it is:

# dnf -y erase jitter\* rng-tools\*
# dnf -y install koji/jitterentropy-3.0.2.git.d18d5863-1.debug.fc34.x86_64.rpm rng-tools-6.13.git.d207e0b6-1.fc34.x86_64.rpm
# /usr/sbin/rngd -f -n jitter
Enabling 6: JITTER Entropy generator (jitter)
Initializing available sources
[hwrng ]: Initialized
[rdrand]: Enabling RDRAND rng support
[rdrand]: Initialized
jent_time_entropy_init: enable_notime = 0
jent_get_nstime: cpuid(1,0).EDX[4] = 0x0
jent_get_nstime: defined __x86_64__
jent_get_nstime: defined __x86_64__: rdtsc = 20d38730df83
jent_entropy_init: jent_time_entropy_init(0) = 0
jent_entropy_init: jent_force_internal_timer = 0 ret = 0
jent_notime_enable: JENT_CONF_EIT TRUE, jent_force_internal_timer = 0, flags& = 0
jent_notime_enable: ret 0
[jitter]: Initializing AES buffer
[jitter]: Enabling JITTER rng support
[jitter]: Initialized
[pkcs11]: PKCS11 Engine /usr/lib64/opensc-pkcs11.so Error: No such file or directory

i'm especially interested in the "jent_" lines, these should confirm or refute my idea. if it is confirmed, i'll compose a detailed wrap-up. thank you.
The requested results:

$ sudo dnf list --installed jitterentropy\* rng-tools\*
Installed Packages
jitterentropy.x86_64 3.0.2.git.d18d5863-1.debug.fc34 @@commandline
rng-tools.x86_64 6.13.git.d207e0b6-1.fc34 @@commandline

$ sudo /usr/sbin/rngd -f -n jitter
Enabling 6: JITTER Entropy generator (jitter)
Initializing available sources
[hwrng ]: Initialization Failed
[rdrand]: Initialization Failed
jent_time_entropy_init: enable_notime = 0
jent_get_nstime: cpuid(1,0).EDX[4] = 0x10
jent_get_nstime: defined __x86_64__
jent_get_nstime: defined __x86_64__: rdtsc = c53772376b
jent_entropy_init: jent_time_entropy_init(0) = 10
jent_time_entropy_init: enable_notime = 1
jent_get_nstime: cpuid(1,0).EDX[4] = 0x10
jent_get_nstime: defined __x86_64__
jent_get_nstime: defined __x86_64__: rdtsc = c537c8381e
jent_notime_enable_thread: JENT_CONF_EIT TRUE, notime_thread = 0x55d0afe10a80x
jent_entropy_init: jent_time_entropy_init(1) = -38
jent_entropy_init: jent_force_internal_timer = 0 ret = -38
[jitter]: JITTER rng fails with code -38
[jitter]: Initialization Failed
[pkcs11]: No pkcs11 slots available
[pkcs11]: Initialization Failed
[rtlsdr]: Initialization Failed
Can't open any entropy source
Maybe RNG device modules are not loaded

Please let me know if you need any more information. Thanks for your efforts.

hello, Morris,

thanks a ton for your testing, most helpful. so let me provide a summary for this issue. your older system is a great reproducer for this corner case, i just cannot express my thanks properly!

jent_get_nstime: cpuid(1,0).EDX[4] = 0x10 (1)
jent_get_nstime: defined __x86_64__: rdtsc = c53772376b (2)
jent_entropy_init: jent_time_entropy_init(0) = 10 (3)

so, as the debug log shows, the lib tries to use the hardware RDTSC cpu timer. this instruction is present (1) on your system, it gives out a reasonable result (2), but it fails a certain NIST SP 800-90B test which an RNG should conform to (3). the return code 10 is:

#define ERCT 10 /* RCT failed during initialization */

i do not exactly understand what the RCT test is, the only thing i've read is:

[ https://lightshipsec.com/nist-800-90b-concepts/ ]
Repetition Count Test (RCT) - the goal of the Repetition Count Test is to quickly detect catastrophic failures that cause the noise source to become “stuck” on a single output value for a long period of time.

jent_notime_enable_thread: JENT_CONF_EIT TRUE, notime_thread = 0x55d0afe10a80x (1)
jent_entropy_init: jent_time_entropy_init(1) = -38 (2)
jent_entropy_init: jent_force_internal_timer = 0 ret = -38
[jitter]: Initialization Failed (3)

so with that error code, the jitter lib switches to the notime timer emulated by a busy-loop (1), which requires at least 2 cpu cores, one for the busy-loop and another for the jitter processing. otherwise the system constantly hits 100% cpu utilization, exactly as the initial issue reported. as there is only 1 cpu core, the notime timer is not usable (2) and the whole jitter lib fails (3), as it has no hi-res timer.

the question was: why was this not the case for jitterlib-v2? my research shows that jitterlib-v2 was not using RDTSC on x86_64, but the OS' clock_gettime(), which _may_ use RDTSC, but also _may_not_. and so, clock_gettime() can be using an interrupt-driven timer, which could be passing all the checks.

so the proper fix, as i see it, is to return the clock_gettime() time source to jitter-lib-v3, so it tries to use it if RDTSC fails for whatever reason.

with that, i would like to ask you to make another simple test, i do hope this would be the last one:

1) could you please share the OS' clock sources on your Athlon 64 machine? my machine shows:

# grep . /sys/devices/system/clocksource/clocksource*/{available,current}_clocksource
/sys/devices/system/clocksource/clocksource0/available_clocksource:kvm-clock tsc acpi_pm
/sys/devices/system/clocksource/clocksource0/current_clocksource:kvm-clock

so if you see "current_clocksource:tsc" - this is not going to work out. you may want to change the clocksource to acpi_pm/jiffies/hpet/etc with either:

[ https://www.kernel.org/doc/Documentation/admin-guide/kernel-parameters.txt ]
clocksource= Override the default clocksource
[all] jiffies (this is the base, fallback clocksource)
[ACPI] acpi_pm
[X86-64] hpet,tsc

or:

# echo [whatever not tsc] > /sys/devices/system/clocksource/clocksource0/current_clocksource
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource # double-check

2) then please install:

https://koji.fedoraproject.org/koji/taskinfo?taskID=71715840
https://koji.fedoraproject.org/koji/taskinfo?taskID=71710096

run:

# /usr/sbin/rngd -f -n jitter

and please share the output with the "jent_" debug lines. i do not expect smth to go wrong, but surely, there may be bugs in my code (though i've tested it). if it is as i think, the clock_gettime() time source should work fine, just as with jitter-v2. if it is so, i'll open an issue/pr with the jitterlib upstream. it so happened that Stephan has released the lib v3.1.0 with significant changes, so probably it'll take some time for me to rebase my clock_gettime() code and open a pr.

thank you again!

On this system, the only available clocksources are:

$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
hpet acpi_pm

and the default is:

$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet

because the kernel has marked the tsc as unstable:

tsc: Fast TSC calibration using PIT
tsc: Detected 2210.149 MHz processor
tsc: Marking TSC unstable due to TSCs unsynchronized

and does not make the other timers (PIT, etc.) available.
This hpet is:

clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 76450417870 ns
hpet: 3 channels of 0 reserved for per-cpu timers
hpet0: at MMIO 0xfed00000, IRQs 2, 8, 31
hpet0: 3 comparators, 32-bit 25.000000 MHz counter

I tried to install your test packages:

https://kojipkgs.fedoraproject.org//work/tasks/5862/71715862/jitterentropy-3.0.2.git.d18d5863-2.debug.fc34.x86_64.rpm
https://kojipkgs.fedoraproject.org//work/tasks/162/71710162/rng-tools-6.13.git.d207e0b6-2.fc34.x86_64.rpm

but it seems that your rng-tools package is linked to an older version of jitterentropy, causing a conflict:

$ sudo dnf install rng-tools-6.13.git.d207e0b6-2.fc34.x86_64.rpm jitterentropy-3.0.2.git.d18d5863-2.debug.fc34.x86_64.rpm
Error:
 Problem: cannot install both jitterentropy-3.0.2.git.d18d5863-2.debug.fc34.x86_64 and jitterentropy-2.2.0-4.fc34.x86_64
  - package rng-tools-6.13.git.d207e0b6-2.fc34.x86_64 requires libjitterentropy.so.2()(64bit), but none of the providers can be installed
  - conflicting requests
(try to add '--allowerasing' to command line to replace conflicting packages or '--skip-broken' to skip uninstallable packages)

Do I have the right packages? Was it your intention that I should test them separately?

It surprises me that upstream would switch to a rdtsc* from the more portable clock_gettime(), since fewer systems have tscs, and some of those are not invariant. Are we sure that we can revert to the latter -- was the change simply for performance reasons, or correctness? Thanks.

(In reply to bf2006a from comment #18)

thanks a ton - again! your update is the most useful one indeed.

> because the kernel has marked the tsc as unstable:
> tsc: Fast TSC calibration using PIT
> tsc: Detected 2210.149 MHz processor
> tsc: Marking TSC unstable due to TSCs unsynchronized

this could explain why jitterlib-v3 also considers tsc as not usable. hpet or acpi_pm as the clocksource0 should both work just fine for our tests.
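The clocksource rule of thumb used in this exchange (a current clocksource of "tsc" is the problematic case, any other available one should do) can be written down as a small check. This is a sketch only: suggest_clocksource is a hypothetical helper, it works on strings passed as arguments, and it never writes to sysfs.

```shell
#!/bin/sh
# sysfs directory used in this report for the first clocksource device.
CS_DIR=/sys/devices/system/clocksource/clocksource0

# $1 = current clocksource, $2 = space-separated available clocksources.
# Prints a non-tsc clocksource to try when the current one is tsc;
# prints nothing when the current one is already not tsc.
# Returns 1 only if tsc is current and no alternative is available.
suggest_clocksource() {
    current=$1
    available=$2
    [ "$current" = "tsc" ] || return 0
    for cs in $available; do
        if [ "$cs" != "tsc" ]; then
            printf '%s\n' "$cs"
            return 0
        fi
    done
    return 1
}

# Example with the values reported for the Athlon 64 (hpet is current,
# so nothing is printed):
suggest_clocksource hpet "hpet acpi_pm"

# On a live system the arguments would come from sysfs:
#   suggest_clocksource "$(cat $CS_DIR/current_clocksource)" \
#                       "$(cat $CS_DIR/available_clocksource)"
```

Any suggestion printed here would still have to be applied by hand, either with the clocksource= kernel parameter or by writing to current_clocksource as root.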
> but it seems that your rng-tools package is linked to an older version of
> jitterentropy, causing a conflict:

yes, indeed, my fault. i got a little bit lost with all those builds, git repos and IDE windows with code. *sigh*. i apologize.

> Do I have the right packages? Was it your intention that I should test them
> separately?

these ones: rng-tools from the f34-testing repo and another debug build of jitterentropy, v3.1.0-latest this time. this time i've tested that they work together.

https://kojipkgs.fedoraproject.org/packages/rng-tools/6.13.git.d207e0b6/1.fc34/x86_64/rng-tools-6.13.git.d207e0b6-1.fc34.x86_64.rpm
https://kojipkgs.fedoraproject.org/work/tasks/7538/71837538/jitterentropy-3.1.0.git.c29e592e-1.debug.fc34.x86_64.rpm

please use the same:

# /usr/sbin/rngd -f -n jitter

if it still fails (it should not), please try to change the clocksource0.

> It surprises me that upstream would switch to a rdtsc* from the more
> portable clock_gettime(), since fewer systems have tscs, and some of those
> are not invariant. Are we sure that we can revert to the latter -- was the
> change simply for performance reasons, or correctness?

honestly, i'm not sure what Stephan had in mind, i guess, performance reasons indeed. also most of the modern and even somewhat older CPUs have a proper RDTSC implemented, so your machine is indeed a corner case. i guess we may discuss the reason for dropping clock_gettime() in a PR.
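For readers unfamiliar with the Repetition Count Test behind return code 10 (ERCT) in the summary above, the idea from the quoted definition (fail when the noise source is "stuck" on one value) can be illustrated as follows. This is a toy sketch and not the jitterentropy implementation: rct_check is a hypothetical helper, the samples are plain values read one per line, and the cutoff is an arbitrary parameter rather than the value the library derives.

```shell
#!/bin/sh
# Repetition Count Test sketch: read one sample per line on stdin and
# return 1 as soon as the same value is seen $1 times in a row;
# return 0 if the input ends without such a run.
rct_check() {
    awk -v cutoff="$1" '
        { if (NR > 1 && $0 == prev) run++; else run = 1; prev = $0 }
        run >= cutoff { stuck = 1; exit }
        END { exit stuck + 0 }
    '
}

# A source stuck on one value fails with a cutoff of 3 ...
printf '%s\n' 7 7 7 7 | rct_check 3 || echo "stuck"
# ... while a varying one passes.
printf '%s\n' 1 2 2 3 | rct_check 3 && echo "ok"
```

In the jitter RNG the tested values would be derived from timer readings, so a timer that does not advance as expected would plausibly trip a test of this kind.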
> https://kojipkgs.fedoraproject.org/packages/rng-tools/6.13.git.d207e0b6/1.
> fc34/x86_64/rng-tools-6.13.git.d207e0b6-1.fc34.x86_64.rpm
> https://kojipkgs.fedoraproject.org/work/tasks/7538/71837538/jitterentropy-3.
> 1.0.git.c29e592e-1.debug.fc34.x86_64.rpm
>
> please use the same:
>
> # /usr/sbin/rngd -f -n jitter
>
With the above packages, I see this:
$ sudo /sbin/rngd -f -n jitter
Enabling 6: JITTER Entropy generator (jitter)
Initializing available sources
[hwrng ]: Initialization Failed
[rdrand]: Initialization Failed
jent_entropy_init: jent_has_hwtime() = 1
jent_time_entropy_init: TIMER_HARDWARE = 0000022c6eff2087
jent_collector_alloc: ALL OK
jent_entropy_init: jent_time_entropy_init(HARDWARE) = 0
jent_collector_alloc: ALL OK
[jitter]: Initializing AES buffer
[jitter]: Enabling JITTER rng support
[jitter]: Initialized
[pkcs11]: No pkcs11 slots available
[pkcs11]: Initialization Failed
[rtlsdr]: Initialization Failed
...
or, in slightly more detail:
$ sudo /sbin/rngd -dtf -n jitter
Enabling 6: JITTER Entropy generator (jitter)
Initializing available sources
[hwrng ]: read error
[hwrng ]: No available rng device
[hwrng ]: Initialization Failed
[rdrand]: Initialization Failed
[jitter]: Limiting thread count to 1 active cpus
[jitter]: JITTER attempting to start 1 threads
[jitter]: CPU Thread 0 is ready
[jitter]: Initializing AES buffer
[jitter]: xread_jitter requests 16 bytes from pipe
[jitter]: JITTER thread on cpu 0 wakes up for refill
[jitter]: jent_read_entropy time on cpu 0 is 9.731980870000e+00 sec
[jitter]: Writing to pipe
[jitter]: xread_jitter gets 16 bytes
[jitter]: xread_jitter requests 128 bytes from pipe
[jitter]: xread_jitter gets 128 bytes
[jitter]: xread_jitter requests 16535 bytes from pipe
[jitter]: xread_jitter falls back to AES
[jitter]: Enabling JITTER rng support
[jitter]: xread_jitter requests 4 bytes from pipe
[jitter]: xread_jitter falls back to AES
[jitter]: xread_jitter requests 4 bytes from pipe
[jitter]: xread_jitter falls back to AES
[jitter]: Initialized
[jitter]: DONE Writing to pipe with return 16535
[jitter]: JITTER thread on cpu 0 wakes up for refill
[pkcs11]: No pkcs11 slots available
[pkcs11]: Initialization Failed
[rtlsdr]: No rtlsdr radio devices found
[rtlsdr]: Initialization Failed
Kernel entropy pool size 4096, pool watermark 3072
Entering test mode...no entropy will be delivered to the kernel
Reading entropy from JITTER Entropy generator
[jitter]: xread_jitter requests 2500 bytes from pipe
[jitter]: xread_jitter falls back to AES
Reading entropy from JITTER Entropy generator
[jitter]: xread_jitter requests 2500 bytes from pipe
[jitter]: xread_jitter falls back to AES
Reading entropy from JITTER Entropy generator
[jitter]: xread_jitter requests 2500 bytes from pipe
[jitter]: xread_jitter falls back to AES
Reading entropy from JITTER Entropy generator
[jitter]: xread_jitter requests 2500 bytes from pipe
[jitter]: xread_jitter falls back to AES
Reading entropy from JITTER Entropy generator
[jitter]: xread_jitter requests 2500 bytes from pipe
[jitter]: xread_jitter falls back to AES
Reading entropy from JITTER Entropy generator
[jitter]: xread_jitter requests 2500 bytes from pipe
[jitter]: xread_jitter falls back to AES
...
Thanks again for your work. Please let me know if you need any more information.
hello, Morris,

so it looks like the latest v3.1.0 changes made by Stephan have eliminated the issue:

jent_entropy_init: jent_has_hwtime() = 1
jent_time_entropy_init: TIMER_HARDWARE = 0000022c6eff2087
jent_collector_alloc: ALL OK
jent_entropy_init: jent_time_entropy_init(HARDWARE) = 0

i'm still suggesting my changes upstream, as there may be other corner cases:

https://github.com/smuellerDD/jitterentropy-library/pull/57

i'm going to build f35/34/33 packages now. thank you for all your help!

FEDORA-2021-df20b5de72 has been pushed to the Fedora 35 stable repository. If problem still persists, please make note of it in this bug report.

hello, Morris,

so, the latest rng-tools and jitterentropy-lib are in updates-testing now. i hope these are the final versions as of now:

https://bodhi.fedoraproject.org/updates/?packages=rng-tools

unfortunately, a formal downgrade ("dnf distro-sync") is needed. in reality the latest packages are the higher versions, according to upstream commits, despite dnf stating otherwise. alternatively, one could wait for these versions to appear in stable.

also, let me note again: with the introduction of jitter-rng in the kernel as of v5.4-rc1 by 50ee7529ec45 ("random: try to actively add entropy rather than passively wait for it") we generally have enough entropy in all cases and do not need rngd to run in userspace anymore. thus Fedora and RHEL have removed rng-tools from the installed-by-default standard and minimal package sets. i believe your system would run just fine without rngd/jitter-lib (unless it requires massive amounts of entropy for some calculations).

with that, i'm closing this bz as CURRENTRELEASE. please feel free to reopen if there are any outstanding concerns.

I'm a bit confused as to which version of libjitterentropy on fc33 is supposed to contain a fix for this.
I am observing something very similar on an aarch64 machine using jitterentropy-3.0.2-2.git.409828cf.fc33.aarch64 where rngd is taking all the CPU for a while on startup, but eventually calms down. perf shows all the time being spent in libjitterentropy:

22.80% rngd libjitterentropy.so.3.1.0 [.] keccakp_chi
19.30% rngd libjitterentropy.so.3.1.0 [.] keccakp_theta
18.31% rngd libjitterentropy.so.3.1.0 [.] rol64
14.52% rngd libjitterentropy.so.3.1.0 [.] jent_memaccess
9.81% rngd libjitterentropy.so.3.1.0 [.] keccakp_rho
6.33% rngd libjitterentropy.so.3.1.0 [.] keccakp_pi
1.92% rngd libjitterentropy.so.3.1.0 [.] ptr_to_le32
1.05% rngd libjitterentropy.so.3.1.0 [.] sha3_init
0.71% rngd libjitterentropy.so.3.1.0 [.] sha3_fill_state
0.70% rngd libjitterentropy.so.3.1.0 [.] keccakp_iota
0.57% rngd libjitterentropy.so.3.1.0 [.] keccakp_1600
0.45% rngd libjitterentropy.so.3.1.0 [.] le32_to_ptr

if this version of jitterentropy is supposed to contain the fix I can open another issue.

(In reply to Ralf Ertzinger from comment #24)
> jitterentropy-3.0.2-2.git.409828cf.fc33.aarch64 where rngd is taking all the
> CPU for a while on startup, but eventually calms down. perf shows all the
> time being spent in libjitterentropy

hello, Ralf,

i believe what you describe is a different issue, not the one discussed in this bz. for your issue, having libjitterentropy code consuming 100% CPU (actually, 100% of up to 4 CPU cores) for some short time at startup is normal and expected - this is exactly how the initial jitter entropy is gathered - and so this is not a bug.

in case you have other sources of entropy on your system (/dev/hwrng, RDRAND @ x86_64, RNDR @ ARM v8.5A, etc) you can safely disable the jitter entropy source altogether by adding "-x jitter" to the "rngd" command line, and so avoid this CPU usage spike. this can be done by editing /etc/sysconfig/rngd since rng-tools-6.14-1.git.56626083 or /usr/lib/systemd/system/rngd.service for earlier releases of rng-tools. |
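The "-x jitter" suggestion in the last comment amounts to extending whatever argument string rngd is started with. A minimal sketch of that string handling, assuming nothing about the layout of /etc/sysconfig/rngd or rngd.service (the report does not show either file); exclude_jitter is a hypothetical helper name.

```shell
#!/bin/sh
# Print the rngd argument string $1 with "-x jitter" appended,
# unless that option pair is already present.
exclude_jitter() {
    case " $1 " in
        *" -x jitter "*) printf '%s\n' "$1" ;;
        *) printf '%s\n' "${1:+$1 }-x jitter" ;;
    esac
}

# Example: arguments without the exclusion get it appended once.
exclude_jitter "-f"
exclude_jitter "-f -x jitter"
```

The resulting string would then be placed by hand wherever the rngd arguments are configured on the system in question.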