Bug 1956248

Summary: rngd uses 100% CPU while in a yield() loop
Product: Red Hat Enterprise Linux 8
Reporter: Renaud Métrich <rmetrich>
Component: rng-tools
Assignee: Vladis Dronov <vdronov>
Status: VERIFIED ---
QA Contact: Vilém Maršík <vmarsik>
Severity: medium
Docs Contact:
Priority: medium
Version: 8.3
CC: core-kernel-mgr, Roel.Teuwen, rvr, skozina, vdronov, vmarsik
Target Milestone: rc
Keywords: Triaged, ZStream
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: rng-tools-6.8-5.el8
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1966437
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1966437

Description Renaud Métrich 2021-05-03 10:01:29 UTC
This bug was initially created as a copy of Bug #1781346

I am copying this bug because: 

Some customers hit this on RHEL8 as well.
The backtrace from one of the coredumps we obtained shows rngd spinning on sched_yield():
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
(gdb) list
447		cpus = NULL;
448	
449		/* Make sure all our threads are doing their jobs */
450		for (i=0; i < num_threads; i++) {
451			while (tdata[i].active == 0)
452				sched_yield();               <<<<<< HERE
453			message(LOG_DAEMON|LOG_DEBUG, "CPU Thread %d is ready\n", i);
454		}
455	
456		flags = fcntl(pipefds[0], F_GETFL, 0);

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Please backport upstream commit 62aa34f8f8a567b99b03079aebda8bac71c334bd.


Description of problem:

On my laptop (but not my desktop; related to TPM?), rngd spends 100% of its CPU time and keeps the fan running.

Interim solution: kill it.

Version-Release number of selected component (if applicable):

rng-tools-6.7-2.fc30.x86_64

How reproducible:


Steps to Reproduce:
1. Boot up-to-date Fedora 30
2. Wait
3. Profit! If you're the power company, or a fan manufacturer, that is.

Actual results:

A load average of 1.0 when the laptop is otherwise idle, and a noisy spinning fan


Expected results:


Maybe rngd can _occasionally_ spin in a yield loop for jitter entropy or whatever, but doing it enough that the fan stays on for hours at a time isn't great.

Additional info:

As mentioned, this seems to be somewhat hardware-specific for unknown reasons. It doesn't happen on my desktop, even though rng-tools is installed there too and the configuration is pretty much the same (i.e. up-to-date F30 with my -git kernel, of course).

Comment 2 Vladis Dronov 2021-05-04 14:13:02 UTC
Hello, Renaud,
thank you for bringing this up. Indeed, 62aa34f8f8a5 ("Move jitter to use proper state locking/thread signaling")
is in v6.9 of rng-tools, while RHEL8 ships version 6.8-3.el8. We will look into backporting the fix or rebasing
rng-tools in RHEL 8.5 and probably in earlier minor versions.

Comment 4 Vladis Dronov 2021-05-04 14:15:05 UTC
mis-click:

Comment 6 Vladis Dronov 2021-05-26 09:25:41 UTC
Hello, Renaud,
I've built a test package with the fix backported. Could you please test it on your laptop, where the bug reproduces
reliably? This package is currently in testing, but I believe it is good enough to become a released version.
http://people.redhat.com/~vdronov/bz1956248/

Comment 8 Renaud Métrich 2021-05-26 18:46:55 UTC
Hi Vladis,

Unfortunately, neither I nor the customer who reported this can reproduce it.
I asked him to forward the package to his own customers who may be experiencing this.

Renaud.

Comment 9 Vladis Dronov 2021-05-27 13:50:13 UTC
Thanks, Renaud.
Meanwhile, I'll work on getting the fixed package into RHEL-8.5 and probably on a backport to RHEL-8.4.

Comment 16 Vilém Maršík 2021-06-01 16:16:09 UTC
Hello,
I cannot reproduce the problem on x86_64 (intel-denlow-r-02) with rng-tools-6.8-4.el8.x86_64, the latest non-fixed version (the bug description mentions rng-tools-6.7-2.fc30.x86_64, but that version is not available in RHEL). Do we know how to reproduce this bug, or are we okay with SanityOnly?

Comment 17 Vladis Dronov 2021-06-03 12:55:10 UTC
Hello, Vilem,
rng-tools-6.8-4.el8.x86_64 actually already has this bug fixed; the -5.el8 version only adds a small "fix for a fix".
The affected version is 6.8-3.el8. Unfortunately, as the reporter mentions, the issue reproduces only on
his laptop, i.e. on a certain hardware configuration. The guess is that it may be related to the presence of
a TPM module (see #c0). Given that, SanityOnly would be fine, as the fix itself is simple and
straightforward and has been tested in Fedora since v6.9 (i.e. for a couple of years). Thank you.

Comment 18 Vilém Maršík 2021-06-03 14:54:16 UTC
Okay, let's consider this verified + sanityonly.

Comment 19 Roel Teuwen 2021-06-10 11:20:16 UTC
We're actually hitting this on many RHV VMs with RHEL 8.3 and 8.4.

The latest rng-tools available in the release channels is rng-tools-6.8-3.el8.x86_64.

Comment 20 Vladis Dronov 2021-06-10 15:56:25 UTC
Thank you for the update, Roel.
The rng-tools-6.8-5.el8 update with the fix is currently being pushed to the RHEL-8.4.z repos.