Bug 1776710
| Summary: | [RHEL8]Bootup is not yet finished for waiting rngd.service start up | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Frank Liang <xiliang> |
| Component: | rng-tools | Assignee: | Neil Horman <nhorman> |
| Status: | CLOSED ERRATA | QA Contact: | Frank Liang <xiliang> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 8.2 | CC: | core-kernel-mgr, leiwang, linl, ribarry, vkuznets, wshi, ymao |
| Target Milestone: | rc | Keywords: | Regression |
| Target Release: | 8.2 | ||
| Hardware: | aarch64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | rng-tools-6.8-2.el8 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2020-04-28 16:06:42 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Frank Liang
2019-11-26 08:28:44 UTC
This works just fine on a physical system in beaker installed with RHEL8, and on my local RHEL8 system (both x86_64). Can I get access to the system that this is failing on to see whats going on. Just got hold of an aarch64 system, and it seems to work fine there too, so I'm not sure whats going on in your environment. Note that there can be a pipe write failure during shutdown of rngd as we are closing the pipe, but the message is harmless, as the daemon is shutting down. It may be, that on aarch64 systems, rngd is blocked on shutdown as the daemon is executing inside the jitterentropy library, which takes a long time to compute entropy. I do have an upstream fix for that. Does this problem happen consistently, or only occasionally? https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=24969019 If you could, test this package out please. It incorporates the faster exit patch from upstream. I think what you're seeing is a problem in systemd-analyze, but this will likely help avoid it. (In reply to Neil Horman from comment #3) > https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=24969019 > > If you could, test this package out please. It incorporates the faster exit > patch from upstream. I think what you're seeing is a problem in > systemd-analyze, but this will likely help avoid it. I can always reproduce this issue in a1.xlarge instance(including your scratch build), and I sent the access information to you via mail, you can debug in online system. Thanks Ok, I've taken a look, and it I think it appears that what were seeing is a combination of a few issues: 1) The fact that the initramfs copy of rngd is getting shut down (which is normal), and issuing a warning about pipe write failures (which is not). This is due to us checking errno on the failure of a write call but in a location where it may have changed 2) The time it takes for jitter to generate entropy on a system without any other source of entropy can be long, leading to systemd-analyze to wait a long time for boot to finish I can fix these updstream if this bz gets approved, but note, to test it, you will not only need to install the new rng-tools package, but you will need to regenerate all the initramfs's so that the new rngd gets picked up properly *** Bug 1776086 has been marked as a duplicate of this bug. *** https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=25085017 This appears to fix the problem, please confirm. Note: To properly test this, you will have to not only install the new rpm, but regenerate the initramfs for the running kernel (In reply to Neil Horman from comment #7) > https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=25085017 > > This appears to fix the problem, please confirm. > > Note: To properly test this, you will have to not only install the new rpm, > but regenerate the initramfs for the running kernel Thanks Neil, It is working now and cannot reproduce this issue again with your new build. But there is a new error found: Dec 03 05:58:26 ip-10-116-2-232.us-west-2.compute.internal systemd[1]: /usr/lib/systemd/system/rngd.service:11: Failed to parse TimeoutStartSec= parameter, ignoring: "60s" In /usr/lib/systemd/system/rngd.service, seems no need '"' to wrap the value. The error disappeared after I changed it. [root@ip-10-116-2-119 ec2-user]# diff -u rngd.service.orig /usr/lib/systemd/system/rngd.service --- rngd.service.orig 2019-12-03 06:09:11.130429909 +0000 +++ /usr/lib/systemd/system/rngd.service 2019-12-03 06:01:41.713624471 +0000 @@ -8,7 +8,7 @@ CapabilityBoundingSet=CAP_SYS_ADMIN AmbientCapabilities=CAP_SYS_ADMIN ExecStart=/sbin/rngd -f --fill-watermark=0 -TimeoutStartSec="60s" +TimeoutStartSec=60s copy that, thanks! Requesting exception for 8.2. On arm systems using the jitterentropy source can cause additional boot delays, which can be seen as a regression (as noted here). The fix is self contained and straightforward. In response to comment 10: 1) The impact is that systems may not receive as much entropy on bootup leading to boot time delays, and potentially shutdown delays, when systemd waits for the entire service stop period before killing the rngd process 2) The risk is low, this fix is self contained to rngd, and will not impact other components. 3) No other details to provide, but will be on the blocker meeting calls and am happy to answer specific questions regarding this fix. (In reply to Neil Horman from comment #11) > In response to comment 10: > 1) The impact is that systems may not receive as much entropy on bootup > leading to boot time delays, and potentially shutdown delays, when systemd > waits for the entire service stop period before killing the rngd process > > > 2) The risk is low, this fix is self contained to rngd, and will not impact > other components. > > 3) No other details to provide, but will be on the blocker meeting calls and > am happy to answer specific questions regarding this fix. Hi Neil, From qe perspective, customer usually comparing different distributions' boot time by 'systemd-analyze'(1446698 and 1446688). So we added such cases in our automation after that. And there are some bugs got fixed in the past, eg. bug 1625874 in RHEL8.0 and 1591542 in RHEL7. So I prefer to add it to RHEL8.2 to avoid deliver a bug again product. Thanks Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:1762 |