Created attachment 1792553 [details] output of sudo strace -fvxt -o rngd.log rngd -f -d Description of problem: rngd uses excessive cpu resources. Version-Release number of selected component (if applicable): jitterentropy-3.0.2-1.fc34 and rng-tools-6.13-2.fc34 https://bodhi.fedoraproject.org/updates/FEDORA-2021-ebc6f0c42c How reproducible: Steps to Reproduce: 1. start rngd service. 2. 3. Actual results: rngd fails, and uses all available cpu time. Expected results: rngd efficiently replenishes entropy. Additional info: lscpu: Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian Address sizes: 40 bits physical, 48 bits virtual CPU(s): 1 On-line CPU(s) list: 0 Thread(s) per core: 1 Core(s) per socket: 1 Socket(s): 1 NUMA node(s): 1 Vendor ID: AuthenticAMD CPU family: 15 Model: 4 Model name: AMD Athlon(tm) 64 Processor 3400+ Stepping: 8 CPU MHz: 800.000 CPU max MHz: 2200.0000 CPU min MHz: 800.0000 BogoMIPS: 4420.40 L1d cache: 64 KiB L1i cache: 64 KiB L2 cache: 1 MiB NUMA node0 CPU(s): 0 Vulnerability Itlb multihit: Not affected Vulnerability L1tf: Not affected Vulnerability Mds: Not affected Vulnerability Meltdown: Not affected Vulnerability Spec store bypass: Not affected Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization Vulnerability Spectre v2: Mitigation; Full AMD retpoline, STIBP disabled, RSB filling Vulnerability Srbds: Not affected Vulnerability Tsx async abort: Not affected Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm 3dnowext 3dnow rep_good nopl cpuid extd_apicid 3dnowprefetch vmmcall dmidecode: # dmidecode 3.2 Getting SMBIOS data from sysfs. SMBIOS 2.4 present. 22 structures occupying 1144 bytes. Table at 0x000FC7E0. Handle 0x0000, DMI type 0, 24 bytes BIOS Information Vendor: American Megatrends Inc. 
Version: P1.70 Release Date: 08/15/2007 Address: 0xF0000 Runtime Size: 64 kB ROM Size: 512 kB Characteristics: PCI is supported BIOS is upgradeable BIOS shadowing is allowed Boot from CD is supported Selectable boot is supported BIOS ROM is socketed EDD is supported 5.25"/1.2 MB floppy services are supported (int 13h) 3.5"/720 kB floppy services are supported (int 13h) 3.5"/2.88 MB floppy services are supported (int 13h) Print screen service is supported (int 5h) 8042 keyboard services are supported (int 9h) Serial services are supported (int 14h) Printer services are supported (int 17h) CGA/mono video services are supported (int 10h) ACPI is supported USB legacy is supported LS-120 boot is supported ATAPI Zip drive boot is supported BIOS boot specification is supported Function key-initiated network boot is supported Targeted content distribution is supported BIOS Revision: 8.14 Handle 0x0001, DMI type 1, 27 bytes System Information Manufacturer: To Be Filled By O.E.M. Product Name: To Be Filled By O.E.M. Version: To Be Filled By O.E.M. Serial Number: To Be Filled By O.E.M. UUID: Wake-up Type: Power Switch SKU Number: To Be Filled By O.E.M. Family: To Be Filled By O.E.M. Handle 0x0002, DMI type 2, 15 bytes Base Board Information Manufacturer: Product Name: K8NF6G-VSTA Version: Serial Number: Asset Tag: Features: Board is a hosting board Board is replaceable Location In Chassis: Chassis Handle: 0x0003 Type: Motherboard Contained Object Handles: 0 Handle 0x0003, DMI type 3, 21 bytes Chassis Information Manufacturer: To Be Filled By O.E.M. Type: Desktop Lock: Not Present Version: To Be Filled By O.E.M. Serial Number: To Be Filled By O.E.M. Asset Tag: To Be Filled By O.E.M. 
Boot-up State: Safe Power Supply State: Safe Thermal State: Safe Security Status: None OEM Information: 0x00000000 Height: Unspecified Number Of Power Cords: 1 Contained Elements: 0 Handle 0x0004, DMI type 4, 35 bytes Processor Information Socket Designation: CPUSocket Type: Central Processor Family: Athlon 64 Manufacturer: AMD ID: 48 0F 00 00 FF FB 8B 07 Signature: Family 15, Model 4, Stepping 8 Flags: FPU (Floating-point unit on-chip) VME (Virtual mode extension) DE (Debugging extension) PSE (Page size extension) TSC (Time stamp counter) MSR (Model specific registers) PAE (Physical address extension) MCE (Machine check exception) CX8 (CMPXCHG8 instruction supported) APIC (On-chip APIC hardware supported) SEP (Fast system call) MTRR (Memory type range registers) PGE (Page global enable) MCA (Machine check architecture) CMOV (Conditional move instruction supported) PAT (Page attribute table) PSE-36 (36-bit page size extension) CLFSH (CLFLUSH instruction supported) MMX (MMX technology supported) FXSR (FXSAVE and FXSTOR instructions supported) SSE (Streaming SIMD extensions) SSE2 (Streaming SIMD extensions 2) Version: AMD Athlon(tm) 64 Processor 3400+ Voltage: 3.3 V 2.9 V External Clock: 200 MHz Max Speed: 2200 MHz Current Speed: 2221 MHz Status: Populated, Enabled Upgrade: Socket 754 L1 Cache Handle: 0x0005 L2 Cache Handle: 0x0006 L3 Cache Handle: Not Provided Serial Number: To Be Filled By O.E.M. Asset Tag: To Be Filled By O.E.M. Part Number: To Be Filled By O.E.M. 
Handle 0x0005, DMI type 7, 19 bytes Cache Information Socket Designation: L1-Cache Configuration: Enabled, Not Socketed, Level 1 Operational Mode: Varies With Memory Address Location: Internal Installed Size: 128 kB Maximum Size: 128 kB Supported SRAM Types: Pipeline Burst Installed SRAM Type: Pipeline Burst Speed: Unknown Error Correction Type: Single-bit ECC System Type: Data Associativity: 4-way Set-associative Handle 0x0006, DMI type 7, 19 bytes Cache Information Socket Designation: L2-Cache Configuration: Enabled, Not Socketed, Level 2 Operational Mode: Varies With Memory Address Location: Internal Installed Size: 1 MB Maximum Size: 1 MB Supported SRAM Types: Pipeline Burst Installed SRAM Type: Pipeline Burst Speed: Unknown Error Correction Type: Single-bit ECC System Type: Unified Associativity: 4-way Set-associative Handle 0x0007, DMI type 5, 20 bytes Memory Controller Information Error Detecting Method: 64-bit ECC Error Correcting Capabilities: None Supported Interleave: One-way Interleave Current Interleave: One-way Interleave Maximum Memory Module Size: 2048 MB Maximum Total Memory Size: 4096 MB Supported Speeds: 70 ns 60 ns Supported Memory Types: DIMM SDRAM Memory Module Voltage: 3.3 V Associated Memory Slots: 2 0x0008 0x0009 Enabled Error Correcting Capabilities: None Handle 0x0008, DMI type 6, 12 bytes Memory Module Information Socket Designation: DIMM0 Bank Connections: 0 2 Current Speed: Unknown Type: ECC DIMM Installed Size: 1024 MB (Double-bank Connection) Enabled Size: 1024 MB (Double-bank Connection) Error Status: OK Handle 0x0009, DMI type 6, 12 bytes Memory Module Information Socket Designation: DIMM1 Bank Connections: 0 2 Current Speed: Unknown Type: ECC DIMM Installed Size: 1024 MB (Double-bank Connection) Enabled Size: 1024 MB (Double-bank Connection) Error Status: OK Handle 0x000A, DMI type 9, 13 bytes System Slot Information Designation: PCI1 Type: 32-bit PCI Current Usage: In Use Length: Short ID: 1 Characteristics: 3.3 V is provided 
Opening is shared PME signal is supported Handle 0x000B, DMI type 9, 13 bytes System Slot Information Designation: PCI2 Type: 32-bit PCI Current Usage: In Use Length: Short ID: 2 Characteristics: 3.3 V is provided Opening is shared PME signal is supported Handle 0x000C, DMI type 9, 13 bytes System Slot Information Designation: PCIE1 Type: x16 PCI Express Current Usage: Available Length: Short ID: 17 Characteristics: 3.3 V is provided Opening is shared PME signal is supported Handle 0x000D, DMI type 9, 13 bytes System Slot Information Designation: PCIE2 Type: x1 PCI Express Current Usage: Available Length: Short ID: 18 Characteristics: 3.3 V is provided Opening is shared PME signal is supported Handle 0x000E, DMI type 16, 15 bytes Physical Memory Array Location: System Board Or Motherboard Use: System Memory Error Correction Type: None Maximum Capacity: 8 GB Error Information Handle: Not Provided Number Of Devices: 2 Handle 0x000F, DMI type 19, 15 bytes Memory Array Mapped Address Starting Address: 0x00000000000 Ending Address: 0x0007FFFFFFF Range Size: 2 GB Physical Array Handle: 0x000E Partition Width: 1 Handle 0x0010, DMI type 17, 27 bytes Memory Device Array Handle: 0x000E Error Information Handle: Not Provided Total Width: 64 bits Data Width: 72 bits Size: 1 GB Form Factor: DIMM Set: None Locator: DIMM0 Bank Locator: BANK0 Type: DDR Type Detail: Synchronous Speed: 266 MT/s Manufacturer: Manufacturer0 Serial Number: SerNum0 Asset Tag: AssetTagNum0 Part Number: PartNum0 Handle 0x0011, DMI type 20, 19 bytes Memory Device Mapped Address Starting Address: 0x00000000000 Ending Address: 0x0003FFFFFFF Range Size: 1 GB Physical Device Handle: 0x0010 Memory Array Mapped Address Handle: 0x000F Partition Row Position: 1 Handle 0x0012, DMI type 17, 27 bytes Memory Device Array Handle: 0x000E Error Information Handle: Not Provided Total Width: 64 bits Data Width: 72 bits Size: 1 GB Form Factor: DIMM Set: None Locator: DIMM1 Bank Locator: BANK1 Type: DDR Type Detail: 
Synchronous Speed: 266 MT/s Manufacturer: Manufacturer1 Serial Number: SerNum1 Asset Tag: AssetTagNum1 Part Number: PartNum1 Handle 0x0013, DMI type 20, 19 bytes Memory Device Mapped Address Starting Address: 0x00040000000 Ending Address: 0x0007FFFFFFF Range Size: 1 GB Physical Device Handle: 0x0012 Memory Array Mapped Address Handle: 0x000F Partition Row Position: 1 Handle 0x0014, DMI type 32, 20 bytes System Boot Information Status: No errors detected Handle 0x0015, DMI type 127, 4 bytes End Of Table
hello, Morris,

unfortunately, i am still not able to reproduce this, even on older AMD CPU machines in our lab (though i have not yet found an AMD Athlon 64 one). so i need your help, otherwise i cannot go further. first, let me share my result: the CPU contention happens entirely in userspace, so this is either rngd code or the updated jitterentropy lib (v3.0.2).

could you please do the following on the machine with the issue (everything should be done as root):

1) let's double-check whether it is the jitter rng source. please stop the rngd service if it is running and run:

# /sbin/rngd -f -d -x hwrng -x rdrand -x tpm -x nist -n jitter -x pkcs11 -x rtlsdr -O jitter:use_aes:0

please check whether it starts to consume CPU again. then try the same with AES:

# /sbin/rngd -f -d -x hwrng -x rdrand -x tpm -x nist -n jitter -x pkcs11 -x rtlsdr -O jitter:use_aes:1

if the CPU is not eaten, please check each rng source by enabling them one at a time and disabling the others. for example, for hwrng it would be:

# /sbin/rngd -f -d -n hwrng -x rdrand -x tpm -x nist -x jitter -x pkcs11 -x rtlsdr

but this should fail on your system, as it does not have a hw rng. we need to identify which rng source triggers the issue.

2) then we can try to grab a core for analysis. first we need debuginfo for glibc:

# dnf debuginfo-install glibc

it is possible that you have an older glibc installed which has already been removed from the fedora repos. in that case you also need to update glibc itself and reboot:

# dnf upgrade glibc
# dnf debuginfo-install glibc
# reboot

then install the debug files for rngd and the jitter lib, gdb and gcore:

# dnf debuginfo-install rng-tools-6.13-2.fc34 jitterentropy-3.0.2-1.fc34
# dnf install gdb

then run the rngd service or just "/sbin/rngd -f -d", wait for the issue to appear, let it run for, say, 10 seconds while it eats the CPU, and then take a core:

# gcore -a -o ./rngd.f <rngd PID, for example 629>
[New LWP 630]
[New LWP 631]
[New LWP 632]
[New LWP 633]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007fa2998e85bf in poll () from /lib64/libc.so.6
warning: target file /proc/629/cmdline contained unexpected null characters
Saved corefile rngd.f.629
[Inferior 1 (process 629) detached]

the output should be similar to the above. please provide the resulting ./rngd.f.629 for analysis (gzip it please, it should be ~300M).

3) if the first step shows that it is the jitter rng source, could you please remove rng-tools-6.13-2.fc34 and jitterentropy-3.0.2-1.fc34 completely, install the older jitterentropy-2.2.0-4.fc34 from stable, and install rng-tools from:

https://koji.fedoraproject.org/koji/taskinfo?taskID=70532506

this is rng-tools-6.13-2 linked to the older jitter lib. after installing, you can double-check this with:

# ldd /usr/sbin/rngd | grep jitter
libjitterentropy.so.2 => /lib64/libjitterentropy.so.2 (0x00007fd8be14b000)
                  ^^^                             ^^^

and check whether the issue reproduces with the older library linked.

thank you. i do understand that this is a lot of steps and work, but i have no other option until i can reproduce this in-house.
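The one-source-at-a-time check from step 1 can be scripted. What follows is a minimal sketch, assuming the seven source names that appear in the commands above (other rngd builds may list different ones). `RNGD_SOURCES` and `rngd_isolate_cmd` are names made up for this illustration, and the function only prints the command line instead of running it:

```shell
# Source names as they appear in the commands above (assumption: your rngd
# build knows exactly these; check its help output if in doubt).
RNGD_SOURCES="hwrng rdrand tpm nist jitter pkcs11 rtlsdr"

# Print an rngd command line that enables only the source named in $1
# (-n) and disables every other listed source (-x).
rngd_isolate_cmd() {
    want=$1
    cmd="/sbin/rngd -f -d"
    for src in $RNGD_SOURCES; do
        if [ "$src" = "$want" ]; then
            cmd="$cmd -n $src"
        else
            cmd="$cmd -x $src"
        fi
    done
    printf '%s\n' "$cmd"
}
```

Printing rather than executing keeps the helper safe to try: review the line from `rngd_isolate_cmd jitter`, then run it as root and watch the CPU usage.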
also, could you please provide the output of "uname -a"? just to make sure we are talking about the same kernel.
Created attachment 1792687 [details] requested core dump for faulty jitterentropy-3.0.2-1.fc34 and rng-tools-6.13-2.fc34
Created attachment 1792688 [details] requested core dump of revised rng-tools-6.13-2.fc34 linked against older jitterentropy-2.2.0
(In reply to Vladis Dronov from comment #1) > hello, Morris, > unfortunately, i still not able to reproduce this even on older AMD CPU > machines in our lab (but i still did not find AMD Athlon 64 one). > so, i would need your help, otherwise i cannot go further. so first, let me > share my result - the CPU contention happens entirely in > userspace, so this is rngd code or updated jitterentropy lib (v3.0.2). > > so, could you please do the following on your machine with the issue (all > should be done as root): > > 1) let's double-check it is jitter rng source (or maybe it is not). please > stop rngd service if running and run: > > # /sbin/rngd -f -d -x hwrng -x rdrand -x tpm -x nist -n jitter -x pkcs11 -x > rtlsdr -O jitter:use_aes:0 > > please, check if it starts to consume CPU again or not. then try the same > with AES: > > # /sbin/rngd -f -d -x hwrng -x rdrand -x tpm -x nist -n jitter -x pkcs11 -x > rtlsdr -O jitter:use_aes:1 The jitter source causes excessive CPU use, with and without AES. > > if the CPU is not eaten, could you please check each rng source by enabling > them by one and disabling the others. > for example for hwrng it should be: > > # /sbin/rngd -f -d -n hwrng -x rdrand -x tpm -x nist -x jitter -x pkcs11 -x > rtlsdr > > but this should fail on your system as it does not have hw rng. we need to > identify which rng source brings up > the issue. All of the other sources fail. > > 2) then we can try to grab a core for a analysis. first we need debuginfo > for glibc: > > # dnf debuginfo-install glibc > > it is possible you have older glibc installed, and it was removed from > fedora repos already. 
in this > case it is needed to also update glibc itself and reboot: > > # dnf upgrade glibc > # dnf debuginfo-install glibc > # reboot > > then, install debug files for rngd and jitter-lib, gdb and gcore: > > # dnf debuginfo-install rng-tools-6.13-2.fc34 jitterentropy-3.0.2-1.fc34 > # dnf intall gdb > > then run rngd service or just "/sbin/rngd -f -d", wait for the issue to > appear, let it run for, say, > 10 seconds, eating the CPU, and then take a core: > > # gcore -a -o ./rngd.f <rngd PID, for example 629> > [New LWP 630] > [New LWP 631] > [New LWP 632] > [New LWP 633] > [Thread debugging using libthread_db enabled] > Using host libthread_db library "/lib64/libthread_db.so.1". > 0x00007fa2998e85bf in poll () from /lib64/libc.so.6 > warning: target file /proc/629/cmdline contained unexpected null characters > Saved corefile rngd.f.629 > [Inferior 1 (process 629) detached] > > the output should be similar to this above. please, provide the resulting > ./rngd.f.629 for analysis (gzip it please, it should be ~300M). Submitted. > > 3) if the first step shows it is jitter rng source, could you please remove > rng-tools-6.13-2.fc34 and jitterentropy-3.0.2-1.fc34 completely, > install an older jitterentropy-2.2.0-4.fc34 from stable, install rng-tools > from: > > https://koji.fedoraproject.org/koji/taskinfo?taskID=70532506 > > this is rng-tools-6.13-2 linked to an older jitter lib. after installing, > you can double-check this with: > > # ldd usr/sbin/rngd | grep jitter > libjitterentropy.so.2 => /lib64/libjitterentropy.so.2 (0x00007fd8be14b000) > ^^^ ^^^ > and check if the issue reproduces with the older library linked. > > thank you. There is no excessive CPU use from your modified version of rng-tools that is linked to the older jitterentropy. Core dump submitted. This machine's packages are synchronized with Fedora 34 testing, including the running kernel, which is: Linux ... 
5.12.12-300.fc34.x86_64 #1 SMP Fri Jun 18 14:30:51 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux > > i do understand these are a lot of steps and work, but i do not have another > possibility until i can reproduce this in-house. It's okay. Thank you for your efforts: it's never convenient to debug by proxy. Please let me know if you need anything else to help resolve this problem.
hello, Morris,

thanks a ton for your results, and i'm sorry i did not respond earlier.

> There is no excessive CPU use from your modified version of rng-tools
> that is linked to the older jitterentropy.

yes, this is exactly the issue i was thinking of. so indeed, there is an issue with jitterentropy-v3. i guess i'll raise the issue with the lib's upstream, and meanwhile i'll continue looking at the core dumps.

thanks again, your input is most valuable.
hello, Morris, it looks like your issue is exactly the https://github.com/smuellerDD/jitterentropy-library/issues/37 one. i've revoked f34 and f35 updates and will postpone a release until the issue in the jitterentropy-3.0.2 lib is fixed. you can check comments since https://github.com/smuellerDD/jitterentropy-library/issues/37#issuecomment-861185576 for the current status.
hello, Morris,

i was talking to Neil and Stephan, and a concern came up: we are not sure how rng-tools + jitterlib-v2 were working on your old 1-CPU AMD Athlon 64 system before. please see 1) at:

https://github.com/smuellerDD/jitterentropy-library/issues/37#issuecomment-869637562

if you are still willing to help, could you please grab jitterlib-v2 from:

https://koji.fedoraproject.org/koji/taskinfo?taskID=71263992

install it together with rng-tools from f34-stable on your AMD system (deleting the previous versions of jitterlib and rng-tools), and just run:

# /sbin/rngd -f -n jitter

and then post the line with "jent_entropy_init:"? i have the following in my f34 vm:

# dnf erase jitterentropy rng-tools
# wget https://kojipkgs.fedoraproject.org//work/tasks/4051/71264051/jitterentropy-2.2.0-5.fc34.x86_64.rpm
# dnf install jitterentropy-2.2.0-5.fc34.x86_64.rpm rng-tools
# -f -n jitter
Enabling 5: JITTER Entropy generator (jitter)
Initializing available sources
[hwrng ]: Initialized
[rdrand]: Enabling RDRAND rng support
[rdrand]: Initialized
jent_entropy_init: ret = OK    <<< THIS
[jitter]: Initializing AES buffer
[jitter]: Enabling JITTER rng support
[jitter]: Initialized

i have "jent_entropy_init: ret = OK" here, which is expected. we are very curious to learn what it says on your system, so if you could help, that would be great. thank you.

btw, meanwhile Stephan has rolled out a fixed version of jitterlib-v3. would you be able to help us test it on your unique system too? the "test" would be just installing the newer rng-tools and jitterlib-v3, running /sbin/rngd and checking whether it still eats CPU. thank you.
# /sbin/rngd -f -n jitter , surely.
It appears to work, although I've never tested the output: $ sudo dnf list --installed jitterentropy rng-tools Installed Packages jitterentropy.x86_64 2.2.0-5.fc34 @@commandline rng-tools.x86_64 6.12-3.fc34 @@commandline $ sudo /sbin/rngd -f -n jitter Enabling 5: JITTER Entropy generator (jitter) Initializing available sources [hwrng ]: Initialization Failed [rdrand]: Initialization Failed jent_entropy_init: ret = OK [jitter]: Initializing AES buffer [jitter]: Enabling JITTER rng support [jitter]: Initialized [pkcs11]: No pkcs11 slots available [pkcs11]: Initialization Failed [rtlsdr]: Initialization Failed In slightly more detail: $ sudo /sbin/rngd -dtf -n jitter Enabling 5: JITTER Entropy generator (jitter) Initializing available sources [hwrng ]: read error [hwrng ]: No available rng device [hwrng ]: Initialization Failed [rdrand]: Initialization Failed [jitter]: Limiting thread count to 1 active cpus [jitter]: JITTER starts 1 threads [jitter]: CPU Thread 0 is ready [jitter]: Initializing AES buffer [jitter]: xread_jitter requests 16 bytes from pipe [jitter]: JITTER thread on cpu 0 wakes up for refill [jitter]: jent_read_entropy time on cpu 0 is 4.128277359000e+00 sec [jitter]: Writing to pipe [jitter]: xread_jitter gets 16 bytes [jitter]: xread_jitter requests 128 bytes from pipe [jitter]: xread_jitter gets 128 bytes [jitter]: xread_jitter requests 16535 bytes from pipe [jitter]: xread_jitter falls back to AES [jitter]: Enabling JITTER rng support [jitter]: xread_jitter requests 4 bytes from pipe [jitter]: xread_jitter falls back to AES [jitter]: xread_jitter requests 4 bytes from pipe [jitter]: xread_jitter falls back to AES [jitter]: Initialized [jitter]: DONE Writing to pipe with return 16535 [jitter]: JITTER thread on cpu 0 wakes up for refill [pkcs11]: No pkcs11 slots available [pkcs11]: Initialization Failed [rtlsdr]: No rtlsdr radio devices found [rtlsdr]: Initialization Failed Entering test mode...no entropy will be delivered to the kernel Reading 
entropy from JITTER Entropy generator [jitter]: xread_jitter requests 2500 bytes from pipe [jitter]: xread_jitter falls back to AES Reading entropy from JITTER Entropy generator [jitter]: xread_jitter requests 2500 bytes from pipe [jitter]: xread_jitter falls back to AES Reading entropy from JITTER Entropy generator [jitter]: xread_jitter requests 2500 bytes from pipe [jitter]: xread_jitter falls back to AES Reading entropy from JITTER Entropy generator [jitter]: xread_jitter requests 2500 bytes from pipe [jitter]: xread_jitter falls back to AES Reading entropy from JITTER Entropy generator [jitter]: xread_jitter requests 2500 bytes ... Yes, I can try your modified versions of the two packages. I assume that they are related to the recent: https://bodhi.fedoraproject.org/updates/FEDORA-2021-4b1b4c2e34 . Do you have a build for Fedora 34? These types of systems are not the majority of the install base any more, but unique may be overstating the case -- I know of several similar -- my neighbor has a back-up desktop running Fedora with a slightly newer Athlon.
Now, using https://bodhi.fedoraproject.org/updates/FEDORA-2021-b30e92acb8 , I see:

Installed Packages
jitterentropy.x86_64    3.0.2.git.d18d5863-1.fc34    @@commandline
rng-tools.x86_64        6.13.git.d207e0b6-1.fc34     @@commandline

$ sudo /sbin/rngd -dtf -n jitter
Enabling 6: JITTER Entropy generator (jitter)
Initializing available sources
[hwrng ]: read error
[hwrng ]: No available rng device
[hwrng ]: Initialization Failed
[rdrand]: Initialization Failed
[jitter]: JITTER rng fails with code -38
[jitter]: Initialization Failed
[pkcs11]: No pkcs11 slots available
[pkcs11]: Initialization Failed
[rtlsdr]: No rtlsdr radio devices found
[rtlsdr]: Initialization Failed
Can't open any entropy source
Maybe RNG device modules are not loaded

which prevents the excessive use of the CPU, but obviously can't collect entropy.

I didn't follow the discussion in great detail, but nhorman seemed to think that it should be possible to use the hardware timer on this system after all, since it seemed to be working before. Can we do so, or find another way to fix the software timer?

Thanks.
hello, Morris,

thanks for your update. now i probably have an idea of what's happening. would you be able to run one more simple test on your Athlon 64 machine? if you are still willing to help, could you please grab a jitterlib-v3 build with debug output from:

https://koji.fedoraproject.org/koji/taskinfo?taskID=71519025
https://kojipkgs.fedoraproject.org//work/tasks/9132/71519132/jitterentropy-3.0.2.git.d18d5863-1.debug.fc34.x86_64.rpm

delete the previous versions, install it together with rng-tools from f34-testing (the one you already have) on your AMD system, and just run:

# /sbin/rngd -f -n jitter

again? the versions should be:

rng-tools-6.13.git.d207e0b6-1.fc34.x86_64.rpm (from f34-testing)
jitterentropy-3.0.2.git.d18d5863-1.debug.fc34.x86_64.rpm (from koji)

on my vm it is:

# dnf -y erase jitter\* rng-tools\*
# dnf -y install koji/jitterentropy-3.0.2.git.d18d5863-1.debug.fc34.x86_64.rpm rng-tools-6.13.git.d207e0b6-1.fc34.x86_64.rpm
# /usr/sbin/rngd -f -n jitter
Enabling 6: JITTER Entropy generator (jitter)
Initializing available sources
[hwrng ]: Initialized
[rdrand]: Enabling RDRAND rng support
[rdrand]: Initialized
jent_time_entropy_init: enable_notime = 0
jent_get_nstime: cpuid(1,0).EDX[4] = 0x0
jent_get_nstime: defined __x86_64__
jent_get_nstime: defined __x86_64__: rdtsc = 20d38730df83
jent_entropy_init: jent_time_entropy_init(0) = 0
jent_entropy_init: jent_force_internal_timer = 0 ret = 0
jent_notime_enable: JENT_CONF_EIT TRUE, jent_force_internal_timer = 0, flags& = 0
jent_notime_enable: ret 0
[jitter]: Initializing AES buffer
[jitter]: Enabling JITTER rng support
[jitter]: Initialized
[pkcs11]: PKCS11 Engine /usr/lib64/opensc-pkcs11.so Error: No such file or directory

i'm especially interested in the "jent_" lines, as these should confirm or refute my idea. if it is confirmed, i'll write up a detailed summary. thank you.
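To pull only the "jent_" lines out of the rngd output, a small filter is enough. This is a sketch: `jent_lines` is a hypothetical helper name, and it assumes the debug build prints each jent_ message at the start of its own line, as in the log above:

```shell
# Read rngd output on stdin and keep only the jitterentropy debug lines,
# i.e. the lines that start with "jent_".
jent_lines() {
    grep '^jent_'
}
```

Usage, as root: `/sbin/rngd -f -n jitter 2>&1 | jent_lines`. The `2>&1` is there because it is not certain which stream the debug lines go to; depending on how rngd buffers its output, the lines may show up only after rngd is stopped.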
The requested results: $ sudo dnf list --installed jitterentropy\* rng-tools\* Installed Packages jitterentropy.x86_64 3.0.2.git.d18d5863-1.debug.fc34 @@commandline rng-tools.x86_64 6.13.git.d207e0b6-1.fc34 @@commandline $ sudo /usr/sbin/rngd -f -n jitter Enabling 6: JITTER Entropy generator (jitter) Initializing available sources [hwrng ]: Initialization Failed [rdrand]: Initialization Failed jent_time_entropy_init: enable_notime = 0 jent_get_nstime: cpuid(1,0).EDX[4] = 0x10 jent_get_nstime: defined __x86_64__ jent_get_nstime: defined __x86_64__: rdtsc = c53772376b jent_entropy_init: jent_time_entropy_init(0) = 10 jent_time_entropy_init: enable_notime = 1 jent_get_nstime: cpuid(1,0).EDX[4] = 0x10 jent_get_nstime: defined __x86_64__ jent_get_nstime: defined __x86_64__: rdtsc = c537c8381e jent_notime_enable_thread: JENT_CONF_EIT TRUE, notime_thread = 0x55d0afe10a80x jent_entropy_init: jent_time_entropy_init(1) = -38 jent_entropy_init: jent_force_internal_timer = 0 ret = -38 [jitter]: JITTER rng fails with code -38 [jitter]: Initialization Failed [pkcs11]: No pkcs11 slots available [pkcs11]: Initialization Failed [rtlsdr]: Initialization Failed Can't open any entropy source Maybe RNG device modules are not loaded Please let me know if you need any more information. Thanks for your efforts.
hello, Morris,

thanks a ton for your testing, most helpful. let me provide a summary of this issue. your older system is a great reproducer for this corner case, i just cannot express my thanks properly!

jent_get_nstime: cpuid(1,0).EDX[4] = 0x10                  (1)
jent_get_nstime: defined __x86_64__: rdtsc = c53772376b    (2)
jent_entropy_init: jent_time_entropy_init(0) = 10          (3)

as the debug log shows, the lib tries to use the hardware RDTSC cpu timer. this instruction is present on your system (1) and it returns a reasonable result (2), but it fails a certain NIST SP 800-90B test which an RNG should conform to (3). the return code 10 is:

#define ERCT 10 /* RCT failed during initialization */

i do not fully understand what the RCT test is, the only description i've read is:

[ https://lightshipsec.com/nist-800-90b-concepts/ ]
Repetition Count Test (RCT) - the goal of the Repetition Count Test is to quickly detect catastrophic failures that cause the noise source to become “stuck” on a single output value for a long period of time.

jent_notime_enable_thread: JENT_CONF_EIT TRUE, notime_thread = 0x55d0afe10a80x    (1)
jent_entropy_init: jent_time_entropy_init(1) = -38                                (2)
jent_entropy_init: jent_force_internal_timer = 0 ret = -38
[jitter]: Initialization Failed                                                   (3)

with that error code, the jitter lib switches to the notime timer, which is emulated by a busy-loop (1) and requires at least 2 cpu cores: one for the busy-loop and another for the jitter processing. otherwise the system constantly hits 100% cpu utilization, exactly as in the initial report. as there is only 1 cpu core, the notime timer is not usable (2) and the whole jitter lib fails (3), as it has no hi-res timer.

the question was why this was not the case for jitterlib-v2. my research shows that jitterlib-v2 was not using RDTSC on x86_64, but the OS' clock_gettime(), which _may_ use RDTSC, but also _may_not_. so clock_gettime() can be using an interrupt-driven timer, which could be passing all the checks.
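The fallback order described above can be summarized as a small decision function. This is only a sketch of the behavior as explained in this comment, not code from the jitterentropy library; `jent_timer_choice` and its two arguments are hypothetical:

```shell
# Hypothetical summary of the timer selection described above.
# $1: result of the hardware timer self-test (0 = passed, e.g. 10 = ERCT)
# $2: number of online CPUs
# Prints "hardware", "notime" or "none".
jent_timer_choice() {
    hw_ret=$1
    ncpus=$2
    if [ "$hw_ret" -eq 0 ]; then
        # the hardware timer passed its checks, so use it
        echo "hardware"
    elif [ "$ncpus" -ge 2 ]; then
        # busy-loop timer: one core for the loop, one for the jitter code
        echo "notime"
    else
        # a single core cannot host the busy-loop timer
        echo "none"
    fi
}
```

For the Athlon 64 system, `jent_timer_choice 10 1` prints `none`, which matches the "Initialization Failed" result in the log.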
so the proper fix, as i see it, is to bring the clock_gettime() time source back to jitter-lib-v3, so that the lib tries it if RDTSC fails for whatever reason. with that, i would like to ask you to run another simple test, i do hope this will be the last one:

1) could you please share the OS' clock sources on your Athlon 64 machine? my machine shows:

# grep . /sys/devices/system/clocksource/clocksource*/{available,current}_clocksource
/sys/devices/system/clocksource/clocksource0/available_clocksource:kvm-clock tsc acpi_pm
/sys/devices/system/clocksource/clocksource0/current_clocksource:kvm-clock

if you see "current_clocksource:tsc", this is not going to work out. you may want to change the clocksource to acpi_pm/jiffies/hpet/etc with either the kernel parameter:

[ https://www.kernel.org/doc/Documentation/admin-guide/kernel-parameters.txt ]
clocksource=    Override the default clocksource
                [all] jiffies (this is the base, fallback clocksource)
                [ACPI] acpi_pm
                [X86-64] hpet,tsc

or:

# echo [whatever not tsc] > /sys/devices/system/clocksource/clocksource0/current_clocksource
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource   # double-check

2) then please install:

https://koji.fedoraproject.org/koji/taskinfo?taskID=71715840
https://koji.fedoraproject.org/koji/taskinfo?taskID=71710096

run:

# /usr/sbin/rngd -f -n jitter

and please share the output with the "jent_" debug lines. i do not expect anything to go wrong, but surely there may be bugs in my code (though i've tested it). if it is as i think, the clock_gettime() time source should work fine, just as with jitter-v2. if that is so, i'll open an issue/pr with the jitterlib upstream. it so happened that Stephan has released lib v3.1.0 with significant changes, so it will probably take some time for me to rebase my clock_gettime() code and open a pr.

thank you again!
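The clocksource check from step 1 can be wrapped in a small helper. A minimal sketch: `check_clocksource` is a hypothetical name, and the optional directory argument exists only so the function can be tried against a copy of the sysfs files:

```shell
# Print the current clocksource and report whether it is tsc.
# $1 (optional): directory that holds current_clocksource; defaults to the
# sysfs path used in the commands above.
# Returns 0 if the clocksource is not tsc, 1 if it is, 2 if it cannot be read.
check_clocksource() {
    dir=${1:-/sys/devices/system/clocksource/clocksource0}
    current=$(cat "$dir/current_clocksource") || return 2
    echo "current clocksource: $current"
    if [ "$current" = "tsc" ]; then
        return 1
    fi
    return 0
}
```

A non-zero return from `check_clocksource` is the cue to switch the clocksource with one of the two methods quoted above before running rngd.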
On this system, the only available clocksources are: $ cat /sys/devices/system/clocksource/clocksource0/available_clocksource hpet acpi_pm and the default is: $ cat /sys/devices/system/clocksource/clocksource0/current_clocksource hpet because the kernel has marked the tsc as unstable: tsc: Fast TSC calibration using PIT tsc: Detected 2210.149 MHz processor tsc: Marking TSC unstable due to TSCs unsynchronized and does not make the other timers (PIT, etc.) available. This hpet is: clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 76450417870 ns hpet: 3 channels of 0 reserved for per-cpu timers hpet0: at MMIO 0xfed00000, IRQs 2, 8, 31 hpet0: 3 comparators, 32-bit 25.000000 MHz counter I tried to install your test packages: https://kojipkgs.fedoraproject.org//work/tasks/5862/71715862/jitterentropy-3.0.2.git.d18d5863-2.debug.fc34.x86_64.rpm https://kojipkgs.fedoraproject.org//work/tasks/162/71710162/rng-tools-6.13.git.d207e0b6-2.fc34.x86_64.rpm but it seems that your rng-tools package is linked to an older version of jitterentropy, causing a conflict: $ sudo dnf install rng-tools-6.13.git.d207e0b6-2.fc34.x86_64.rpm jitterentropy-3.0.2.git.d18d5863-2.debug.fc34.x86_64.rpm Error: Problem: cannot install both jitterentropy-3.0.2.git.d18d5863-2.debug.fc34.x86_64 and jitterentropy-2.2.0-4.fc34.x86_64 - package rng-tools-6.13.git.d207e0b6-2.fc34.x86_64 requires libjitterentropy.so.2()(64bit), but none of the providers can be installed - conflicting requests (try to add '--allowerasing' to command line to replace conflicting packages or '--skip-broken' to skip uninstallable packages) Do I have the right packages? Was it your intention that I should test them separately? It surprises me that upstream would switch to a rdtsc* from the more portable clock_gettime(), since fewer systems have tscs, and some of those are not invariant. Are we sure that we can revert to the latter -- was the change simply for performance reasons, or correctness? Thanks.
(In reply to bf2006a from comment #18)

thanks a ton - again! your update is the most useful one indeed.

> because the kernel has marked the tsc as unstable:
> tsc: Fast TSC calibration using PIT
> tsc: Detected 2210.149 MHz processor
> tsc: Marking TSC unstable due to TSCs unsynchronized

this could explain why jitterlib-v3 also considers tsc as not usable. hpet or acpi_pm as the clocksource0 should both work just fine for our tests.

> but it seems that your rng-tools package is linked to an older version of
> jitterentropy, causing a conflict:

yes, indeed, my fault. i got a little lost among all those builds, git repos and IDE windows with code. *sigh*. i apologize.

> Do I have the right packages? Was it your intention that I should test them
> separately?

these ones: rng-tools from the f34-testing repo and another debug build of jitterentropy, v3.1.0-latest this time. this time i've tested that they work together.

https://kojipkgs.fedoraproject.org/packages/rng-tools/6.13.git.d207e0b6/1.fc34/x86_64/rng-tools-6.13.git.d207e0b6-1.fc34.x86_64.rpm
https://kojipkgs.fedoraproject.org/work/tasks/7538/71837538/jitterentropy-3.1.0.git.c29e592e-1.debug.fc34.x86_64.rpm

please use the same:

# /usr/sbin/rngd -f -n jitter

if it still fails (it should not), please try changing the clocksource0.

> It surprises me that upstream would switch to a rdtsc* from the more
> portable clock_gettime(), since fewer systems have tscs, and some of those
> are not invariant. Are we sure that we can revert to the latter -- was the
> change simply for performance reasons, or correctness?

honestly, i'm not sure what Stephan had in mind; i guess, performance reasons indeed. also, most modern and even somewhat older CPUs implement RDTSC properly, so your machine is indeed a corner case. i guess we may discuss the reason for dropping clock_gettime() in a PR.
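To get a rough feel for whether clock_gettime() is fine-grained enough to serve as a jitter time source on a given machine, one can look at the differences between back-to-back readings. The Python sketch below is only a crude stand-in and is not the startup test that jitterentropy actually runs; the function names and the sample count are arbitrary choices for illustration.

```python
import time


def collect_deltas(samples=1000, clock=time.CLOCK_MONOTONIC):
    """Return differences (in ns) between consecutive clock_gettime_ns() readings."""
    deltas = []
    previous = time.clock_gettime_ns(clock)
    for _ in range(samples):
        now = time.clock_gettime_ns(clock)
        deltas.append(now - previous)
        previous = now
    return deltas


def summarize(deltas):
    """Count zero deltas (timer too coarse to advance) and distinct delta values."""
    return {
        "samples": len(deltas),
        "zero": sum(1 for d in deltas if d == 0),
        "distinct": len(set(deltas)),
    }


if __name__ == "__main__":
    # Many zero deltas or very few distinct values hint at a coarse timer.
    print(summarize(collect_deltas()))
```

The numbers also include Python interpreter overhead, so they are only useful for comparing clocksources on the same machine, for example hpet against acpi_pm.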
> https://kojipkgs.fedoraproject.org/packages/rng-tools/6.13.git.d207e0b6/1.fc34/x86_64/rng-tools-6.13.git.d207e0b6-1.fc34.x86_64.rpm
> https://kojipkgs.fedoraproject.org/work/tasks/7538/71837538/jitterentropy-3.1.0.git.c29e592e-1.debug.fc34.x86_64.rpm
>
> please use the same:
>
> # /usr/sbin/rngd -f -n jitter
>

With the above packages, I see this:

$ sudo /sbin/rngd -f -n jitter
Enabling 6: JITTER Entropy generator (jitter)
Initializing available sources
[hwrng ]: Initialization Failed
[rdrand]: Initialization Failed
jent_entropy_init: jent_has_hwtime() = 1
jent_time_entropy_init: TIMER_HARDWARE = 0000022c6eff2087
jent_collector_alloc: ALL OK
jent_entropy_init: jent_time_entropy_init(HARDWARE) = 0
jent_collector_alloc: ALL OK
[jitter]: Initializing AES buffer
[jitter]: Enabling JITTER rng support
[jitter]: Initialized
[pkcs11]: No pkcs11 slots available
[pkcs11]: Initialization Failed
[rtlsdr]: Initialization Failed
...

or, in slightly more detail:

$ sudo /sbin/rngd -dtf -n jitter
Enabling 6: JITTER Entropy generator (jitter)
Initializing available sources
[hwrng ]: read error
[hwrng ]: No available rng device
[hwrng ]: Initialization Failed
[rdrand]: Initialization Failed
[jitter]: Limiting thread count to 1 active cpus
[jitter]: JITTER attempting to start 1 threads
[jitter]: CPU Thread 0 is ready
[jitter]: Initializing AES buffer
[jitter]: xread_jitter requests 16 bytes from pipe
[jitter]: JITTER thread on cpu 0 wakes up for refill
[jitter]: jent_read_entropy time on cpu 0 is 9.731980870000e+00 sec
[jitter]: Writing to pipe
[jitter]: xread_jitter gets 16 bytes
[jitter]: xread_jitter requests 128 bytes from pipe
[jitter]: xread_jitter gets 128 bytes
[jitter]: xread_jitter requests 16535 bytes from pipe
[jitter]: xread_jitter falls back to AES
[jitter]: Enabling JITTER rng support
[jitter]: xread_jitter requests 4 bytes from pipe
[jitter]: xread_jitter falls back to AES
[jitter]: xread_jitter requests 4 bytes from pipe
[jitter]: xread_jitter falls back to AES
[jitter]: Initialized
[jitter]: DONE Writing to pipe with return 16535
[jitter]: JITTER thread on cpu 0 wakes up for refill
[pkcs11]: No pkcs11 slots available
[pkcs11]: Initialization Failed
[rtlsdr]: No rtlsdr radio devices found
[rtlsdr]: Initialization Failed
Kernel entropy pool size 4096, pool watermark 3072
Entering test mode...no entropy will be delivered to the kernel
Reading entropy from JITTER Entropy generator
[jitter]: xread_jitter requests 2500 bytes from pipe
[jitter]: xread_jitter falls back to AES
Reading entropy from JITTER Entropy generator
[jitter]: xread_jitter requests 2500 bytes from pipe
[jitter]: xread_jitter falls back to AES
Reading entropy from JITTER Entropy generator
[jitter]: xread_jitter requests 2500 bytes from pipe
[jitter]: xread_jitter falls back to AES
Reading entropy from JITTER Entropy generator
[jitter]: xread_jitter requests 2500 bytes from pipe
[jitter]: xread_jitter falls back to AES
Reading entropy from JITTER Entropy generator
[jitter]: xread_jitter requests 2500 bytes from pipe
[jitter]: xread_jitter falls back to AES
Reading entropy from JITTER Entropy generator
[jitter]: xread_jitter requests 2500 bytes from pipe
[jitter]: xread_jitter falls back to AES
...

Thanks again for your work. Please let me know if you need any more information.
hello, Morris,

so it looks like the latest v3.1.0 changes made by Stephan have eliminated the issue:

jent_entropy_init: jent_has_hwtime() = 1
jent_time_entropy_init: TIMER_HARDWARE = 0000022c6eff2087
jent_collector_alloc: ALL OK
jent_entropy_init: jent_time_entropy_init(HARDWARE) = 0

i'm still suggesting my changes upstream, as there may be other corner cases:

https://github.com/smuellerDD/jitterentropy-library/pull/57

i'm going to build f35/34/33 packages now.

thank you for all your help!
FEDORA-2021-df20b5de72 has been pushed to the Fedora 35 stable repository. If problem still persists, please make note of it in this bug report.
hello, Morris,

so, the latest rng-tools and jitterentropy-lib are in updates-testing now. i hope these are the final versions as of now:

https://bodhi.fedoraproject.org/updates/?packages=rng-tools

unfortunately, a formal downgrade ("dnf distro-sync") is needed. in reality the latest packages are the newer ones according to the upstream commits, even though dnf treats their version strings as lower. alternatively, one could wait for these versions to appear in stable.

also, let me note again: with the introduction of jitter-rng in the kernel as of v5.4-rc1 by 50ee7529ec45 ("random: try to actively add entropy rather than passively wait for it"), we generally have enough entropy in all cases and do not need rngd to run in userspace anymore. thus Fedora and RHEL have removed rng-tools from the installed-by-default standard and minimal package sets.

i believe your system would run just fine without rngd/jitter-lib (unless it requires massive amounts of entropy for some calculations).

with that, i'm closing this bz as CURRENTRELEASE. please feel free to reopen it if there are any outstanding concerns.
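Whether a system really has enough entropy without rngd can be checked by reading the kernel's own counters. The Python sketch below reads /proc/sys/kernel/random/entropy_avail and poolsize and compares them. The 75% threshold is an assumption taken from the "pool size 4096, pool watermark 3072" line in the rngd output earlier in this bug, and the helper names are hypothetical.

```python
from pathlib import Path

RANDOM_DIR = Path("/proc/sys/kernel/random")


def read_int(name):
    """Read an integer from /proc/sys/kernel/random/<name>, or None on failure."""
    try:
        return int((RANDOM_DIR / name).read_text().strip())
    except (OSError, ValueError):
        return None


def below_watermark(entropy_avail, poolsize, fraction=0.75):
    """True if the available entropy is under the given fraction of the pool.

    fraction=0.75 is an assumption derived from the 4096/3072 log line.
    """
    return entropy_avail < poolsize * fraction


if __name__ == "__main__":
    avail, pool = read_int("entropy_avail"), read_int("poolsize")
    if avail is None or pool is None:
        print("could not read kernel entropy counters")
    else:
        state = "below" if below_watermark(avail, pool) else "at or above"
        print(f"entropy_avail={avail} poolsize={pool}: {state} the assumed watermark")
```

A single reading says little; sampling it a few times shortly after boot, with rngd stopped, gives a better picture.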
I'm a bit confused as to which version of libjitterentropy on fc33 is supposed to contain a fix for this. I am observing something very similar on an aarch64 machine using jitterentropy-3.0.2-2.git.409828cf.fc33.aarch64 where rngd is taking all the CPU for a while on startup, but eventually calms down. perf shows all the time being spent in libjitterentropy:

  22.80%  rngd  libjitterentropy.so.3.1.0  [.] keccakp_chi
  19.30%  rngd  libjitterentropy.so.3.1.0  [.] keccakp_theta
  18.31%  rngd  libjitterentropy.so.3.1.0  [.] rol64
  14.52%  rngd  libjitterentropy.so.3.1.0  [.] jent_memaccess
   9.81%  rngd  libjitterentropy.so.3.1.0  [.] keccakp_rho
   6.33%  rngd  libjitterentropy.so.3.1.0  [.] keccakp_pi
   1.92%  rngd  libjitterentropy.so.3.1.0  [.] ptr_to_le32
   1.05%  rngd  libjitterentropy.so.3.1.0  [.] sha3_init
   0.71%  rngd  libjitterentropy.so.3.1.0  [.] sha3_fill_state
   0.70%  rngd  libjitterentropy.so.3.1.0  [.] keccakp_iota
   0.57%  rngd  libjitterentropy.so.3.1.0  [.] keccakp_1600
   0.45%  rngd  libjitterentropy.so.3.1.0  [.] le32_to_ptr

if this version of jitterentropy is supposed to contain the fix I can open another issue.
(In reply to Ralf Ertzinger from comment #24)

> jitterentropy-3.0.2-2.git.409828cf.fc33.aarch64 where rngd is taking all the
> CPU for a while on startup, but eventually calms down. perf shows all the
> time being spent in libjitterentropy

hello, Ralf,

i believe what you describe is a different issue, not the one discussed in this bz.

for your issue, having libjitterentropy code consume 100% CPU (actually, 100% of up to 4 CPU cores) for a short time at startup is normal and expected - this is exactly how the initial jitter entropy is gathered - and so this is not a bug.

in case you have other sources of entropy on your system (/dev/hwrng, RDRAND @ x86_64, RNDR @ ARM v8.5A, etc) you can safely disable the jitter entropy source entirely by adding "-x jitter" to the "rngd" command line, and so avoid this CPU usage spike. this can be done by editing /etc/sysconfig/rngd since rng-tools-6.14-1.git.56626083, or /usr/lib/systemd/system/rngd.service for earlier releases of rng-tools.
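Before adding "-x jitter", it is worth confirming that another entropy source is really present. The Python sketch below looks for CPU RNG flags in /proc/cpuinfo and for a bound hardware RNG in /sys/class/misc/hw_random/rng_current. The flag names ("rdrand" and "rdseed" on x86, "rng" in the aarch64 Features line) and the helper names are assumptions made for illustration, and the script does not prove that rngd can actually use the source.

```python
from pathlib import Path

# Assumed flag names: rdrand/rdseed on x86, rng on aarch64.
WANTED_FLAGS = {"rdrand", "rdseed", "rng"}


def cpu_rng_flags(cpuinfo_text):
    """Return the RNG-related flags found in /proc/cpuinfo-style text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() in ("flags", "features"):
            found |= WANTED_FLAGS & set(value.split())
    return found


def hwrng_device(rng_current_text):
    """Return the bound hw_random device name, or None if there is none."""
    name = rng_current_text.strip()
    return None if name in ("", "none") else name


def read_text(path):
    """Read a file, returning "" if it is missing or unreadable."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""


if __name__ == "__main__":
    flags = cpu_rng_flags(read_text("/proc/cpuinfo"))
    device = hwrng_device(read_text("/sys/class/misc/hw_random/rng_current"))
    print("CPU RNG flags:", ", ".join(sorted(flags)) or "none found")
    print("hw_random device:", device or "none")
    if not flags and device is None:
        print("no other source detected; disabling jitter may leave rngd without input")
```

On the Athlon 64 from the original report this would find nothing, which matches the "[hwrng ]: No available rng device" and "[rdrand]: Initialization Failed" lines in the rngd output.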