Bug 2137725

Summary: [Azure][rng-tools] rngd daemon has jitter initialization failure
Product: Red Hat Enterprise Linux 9
Reporter: Li Tian <litian>
Component: rng-tools
Assignee: Vladis Dronov <vdronov>
Status: CLOSED NOTABUG
QA Contact: Li Tian <litian>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 9.2
CC: akborder, core-kernel-mgr, gveitmic, litian, xuli, yacao, yuxisun
Target Milestone: rc
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-12-18 20:22:02 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Li Tian 2022-10-26 02:16:47 UTC
Description of problem:
On the latest RHEL 9.2, which uses rng-tools-6.15-2.el9.x86_64, rngd fails to initialize the jitter entropy source, as shown below:

Oct 25 21:12:29 localhost.localdomain rngd[757]: [rdrand]: Enabling RDSEED rng support
Oct 25 21:12:29 localhost.localdomain rngd[757]: [rdrand]: Initialized
Oct 25 21:12:29 localhost.localdomain rngd[757]: [jitter]: JITTER timeout set to 5 sec
Oct 25 21:12:29 localhost.localdomain rngd[757]: [jitter]: Initializing AES buffer
Oct 25 21:12:34 localhost.localdomain rngd[757]: [jitter]: Unable to obtain AES key, disabling JITTER source
Oct 25 21:12:34 localhost.localdomain rngd[757]: [jitter]: Initialization Failed
Oct 25 21:12:34 localhost.localdomain rngd[757]: Process privileges have been dropped to 2:2

Version-Release number of selected component (if applicable):
rng-tools-6.15-2.el9.x86_64
jitterentropy-3.4.1-1.el9.x86_64
RHEL 9.2 (kernel 5.14.0-176.el9.x86_64)

How reproducible:
100% on Azure

Steps to Reproduce:
1. # grep rngd /var/log/messages
Or
2. # systemctl status rngd

Actual results:
Initialization failure.
Expected results:
No such failure.

Additional info:
1. The issue persists with rng-tools-6.15-2.el9.x86_64 and jitterentropy-3.4.0-1.el9.x86_64 on RHEL 9.2.
2. The issue does not occur with rng-tools-6.15-1.el9.x86_64 and jitterentropy-3.4.0-1.el9.x86_64 on RHEL 9.1 (kernel 5.14.0-162.6.1.el9_1.x86_64).
3. The issue does not occur with rng-tools-6.15-1.el9.x86_64 and jitterentropy-3.4.0-1.el9.x86_64 on RHEL 9.2.
4. The issue does not occur on Hyper-V.

Comment 1 Vladis Dronov 2022-10-31 13:24:16 UTC
hi, Mr. Li,
thank you for reporting this. while i'm not yet sure about the root cause or a fix, can you please run the following test on an affected system?

1) stop rngd service
2) add the "-O jitter:timeout:60" option to the rngd command line in /etc/sysconfig/rngd
3) start rngd service and check a log file as before
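
The three steps above could be scripted roughly as follows. This is only a sketch: it assumes the rngd options are kept in a single line of the form RNGD_ARGS="..." in /etc/sysconfig/rngd (the variable name is an assumption, please check the file on the affected system), it relies on GNU sed's -i option, and append_rngd_arg is a hypothetical helper name, not part of rng-tools.

```shell
#!/bin/sh
# Hypothetical helper: append an option string to the RNGD_ARGS="..." line
# of a sysconfig-style file.
# Assumption: the file holds the rngd options in one RNGD_ARGS="..." line.
append_rngd_arg() {
    file=$1
    extra=$2
    # Insert " $extra" just before the closing quote of the RNGD_ARGS line.
    sed -i "s|^\(RNGD_ARGS=\"[^\"]*\)\"|\1 ${extra}\"|" "$file"
}

# Intended use on an affected system, as root (left commented out on purpose):
#   systemctl stop rngd
#   append_rngd_arg /etc/sysconfig/rngd "-O jitter:timeout:60"
#   systemctl start rngd
#   grep rngd /var/log/messages
```

Keeping a copy of the original file before editing makes it easy to go back to the default timeout afterwards.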

thank you! unfortunately, i cannot reproduce this on my local qemu VM.

Comment 2 Vladis Dronov 2022-10-31 13:33:21 UTC
background information: 

rng-tools-6.15-2.el9 (bad) - git:6dcc9ec2
rng-tools-6.15-1.el9 (good) - git:172bf0e3

so this is probably caused by commit c29424f10a0d ("Adjust jitterentropy library to timeout/fail on long delay") and related commits

Comment 3 Vladis Dronov 2022-10-31 16:28:23 UTC
hi, Mr. Li,
also, can you add "-v" to the runs with and without the "-O jitter:timeout:60" option?
this may give us more detailed debug output in the log. thanks!

Comment 4 Li Tian 2022-11-01 02:01:56 UTC
(In reply to Vladis Dronov from comment #3)
> also, can you add "-v" to runs with and without  "-O jitter:timeout:60"
> option?

Here are the results you asked for. It seems that extending the timeout gets rid of the issue.

-O jitter:timeout:60:
Oct 31 21:26:18 LISAv2-litian-TO23-1101011902-role-0 rngd[6480]: [rdrand]: Enabling RDSEED rng support
Oct 31 21:26:18 LISAv2-litian-TO23-1101011902-role-0 rngd[6480]: [rdrand]: Initialized
Oct 31 21:26:18 LISAv2-litian-TO23-1101011902-role-0 rngd[6480]: [jitter]: JITTER timeout set to 60 sec
Oct 31 21:26:18 LISAv2-litian-TO23-1101011902-role-0 rngd[6480]: [jitter]: Initializing AES buffer
Oct 31 21:26:23 LISAv2-litian-TO23-1101011902-role-0 rngd[6480]: [jitter]: Enabling JITTER rng support
Oct 31 21:26:23 LISAv2-litian-TO23-1101011902-role-0 rngd[6480]: [jitter]: Initialized
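
For reference, the time the jitter source spent initializing can be read off the syslog timestamps. In both logs in this bug the gap between "Initializing AES buffer" and the next jitter message is 5 seconds, at one-second timestamp resolution. A small sketch that computes this gap from a saved log file; it assumes the syslog line layout shown above (time of day in the third field), and jitter_init_seconds is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the number of seconds between the jitter
# "Initializing AES buffer" message and the first following message that
# reports either success or failure to obtain the AES key.
# Assumptions: syslog lines like "Oct 31 21:26:18 host rngd[PID]: ...",
# and both messages logged on the same day (no midnight wrap handling).
jitter_init_seconds() {
    awk '
        # Convert HH:MM:SS to seconds since midnight.
        function secs(t,    p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
        index($0, "[jitter]: Initializing AES buffer") { start = secs($3); seen = 1; next }
        seen && (index($0, "[jitter]: Enabling JITTER rng support") ||
                 index($0, "[jitter]: Unable to obtain AES key")) {
            print secs($3) - start
            exit
        }
    ' "$1"
}
```

Saving the output of "grep rngd /var/log/messages" to a file and passing that file to the function is enough to try it.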

Are you sure -v means verbose in this config file? rngd does not start when I add it, and --verbose doesn't work either. This happens on both 6.15-1 and 6.15-2.
Let me know if you need anything else.

BR,
Li Tian

Comment 5 Vladis Dronov 2022-11-01 15:49:16 UTC
1) yep, my bad, i'm sorry. the real switch for debug output is "-d", not "-v".

2) as for the issue, the fact that a longer timeout helps means that this VM is just too slow to produce the needed entropy from the jitter source within the 5 second timeout.
jitter here means jitter in the actions of the process scheduler, so not enough events are happening on the system to produce entropy.
this is not a bug and there is nothing to fix - your system just does not have enough jitter-producing events to generate entropy.

solutions i can think of:

1) use a bigger timeout, just as i've suggested. we may still run into the issue when a system is so idle
that there is still not enough jitter to produce entropy even with an increased timeout. this is fine, since other entropy sources
are present in your system: "[rdrand]: Enabling RDSEED rng support".

2) just disable jitter entropy source by adding "-x jitter" to /etc/sysconfig/rngd. this is fine, since other entropy sources
are present in your system: "[rdrand]: Enabling RDSEED rng support".
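
to check which situation a given boot is in, it is enough to look for the jitter and rdrand messages in the log. a rough sketch, matching only the message strings quoted in this bug; jitter_status is a hypothetical helper name and the wording of its output is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: classify a saved rngd log by the jitter/rdrand
# messages quoted in this bug. grep -F matches the strings literally.
jitter_status() {
    log=$1
    if grep -F -q '[jitter]: Initialization Failed' "$log"; then
        if grep -F -q '[rdrand]: Initialized' "$log"; then
            # jitter is off, but another entropy source is available
            echo "jitter failed, rdrand initialized"
        else
            echo "jitter failed, rdrand not seen"
        fi
    elif grep -F -q '[jitter]: Initialized' "$log"; then
        echo "jitter initialized"
    else
        echo "no jitter status in log"
    fi
}
```

the first result corresponds to the case described above, where either a bigger timeout or "-x jitter" is fine.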

Comment 6 Li Tian 2022-11-08 07:36:11 UTC
In this case I think we can close this as WONTFIX. But from QE's perspective there ought to be an official solution for customers who see this.

Germano, do you think we should have a KCS on this?

Comment 7 Germano Veit Michel 2022-11-08 23:21:28 UTC
Thanks Li Tian, KCS is done.