Summary: Entropy pool not updated (/dev/random blocks)
Product: Red Hat Enterprise Linux 3
Component: kernel
Version: 3.0
Status: CLOSED ERRATA
Reporter: Aaron Straus <aaron>
Assignee: Ernie Petrides <petrides>
CC: guillaume.berche, hudson, jrichard, leonard-rh-bugzilla, petrides, redhat, riel, sopwith, walter
Doc Type: Bug Fix
Last Closed: 2005-07-22 20:42:47 UTC
Description Aaron Straus 2004-03-01 18:44:24 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20040130 Firebird/0.7

Description of problem:
We have a number of Dell PowerEdge 1750s. All are configured identically. On one machine we've gotten it to a state where the entropy pool is never updated so /dev/random always blocks on reads. No process that I can see is reading /dev/random. /dev/random has worked on this machine in the past. We have an identically configured machine where /dev/random will return random bytes. It does not help to do disk accesses or use the keyboard & mouse. I have not tried rebooting yet, my guess is it will fix it?

Version-Release number of selected component (if applicable):
kernel-2.4.21-9.0.1.EL

How reproducible:
Sometimes

Steps to Reproduce:
1. cat /dev/random | od
2. wiggle mouse, type on keyboard, do disk accesses

Actual Results:
no output from cat

Expected Results:
on an identically configured machine random bytes are printed

Additional info:
All the machines have SCSI drives.
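One way to tell "the pool is merely empty" apart from "the pool is never refilled" is to watch the kernel's entropy estimate while generating disk or keyboard activity. The following is only a rough sketch, assuming the kernel exposes /proc/sys/kernel/random/entropy_avail; the helper names and the 8-bit threshold are made up for illustration and are not part of any Red Hat tool.

```python
import time
from pathlib import Path

# Usual location of the kernel's entropy estimate (in bits) on Linux.
# Assumption: the affected kernel exposes this file.
ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def read_entropy_avail(path=ENTROPY_AVAIL):
    """Return the kernel's current entropy estimate in bits."""
    return int(Path(path).read_text().strip())


def sample_pool(count=5, interval=1.0, path=ENTROPY_AVAIL):
    """Take `count` readings of the estimate, `interval` seconds apart."""
    readings = []
    for i in range(count):
        readings.append(read_entropy_avail(path))
        if i + 1 < count:
            time.sleep(interval)
    return readings


def pool_looks_stuck(samples, threshold=8):
    """True if every reading is below `threshold` bits and none ever rises.

    The threshold is an arbitrary choice for this sketch.
    """
    never_rises = all(b <= a for a, b in zip(samples, samples[1:]))
    return max(samples) < threshold and never_rises
```

Calling `pool_looks_stuck(sample_pool())` while typing or causing disk I/O gives a quick hint: a healthy pool should show the estimate climbing, whereas on an affected machine it would be expected to stay at or near zero.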
Comment 2 Aaron Straus 2004-03-06 18:04:03 UTC
We had to reboot the machine today. /dev/random is now fine.
Comment 3 Stefan Hudson 2004-03-11 16:55:09 UTC
Just had the same problem on a Dell PowerEdge 2550 with an LSI RAID card. Kernel: kernel-smp-2.4.21-9.EL. Rebooting corrected it as well. Additional information: "service random stop; service random start" did not help. /dev/urandom does not block.
Comment 4 Stefan Hudson 2004-03-16 23:45:27 UTC
It happened again. Reboot fixed it again. Kernel 9.0.1-smp this time. This is causing downtime on a production server. What should I look for if (when) it happens again to provide additional information?
Comment 5 Mark DeWandel 2004-03-17 14:47:50 UTC
*** Bug 101266 has been marked as a duplicate of this bug. ***
Comment 6 yuval yeret 2004-03-22 09:48:21 UTC
Reproduced on 2.4.21-9 (9.0.1-smp) on two different machines (Supermicro P4 dual-Xeon with HT and qlogic2300 HBAs; Supermicro P4 dual-Xeon with HT and 3ware IDE RAID).
Comment 7 Stefan Neufeind 2004-03-22 20:36:18 UTC
Also have a look here. This seems to be a similar problem, on a Fedora Core 1 system: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=118921
Comment 8 Leonard den Ottolander 2004-03-25 19:53:25 UTC
AFAICT /dev/random running out of entropy is expected behaviour. It needs input for new entropy. This is why /dev/urandom is there. I would say NOTABUG, but won't close as I am not 100% sure.
Comment 9 Leonard den Ottolander 2004-03-25 20:04:03 UTC
man 4 random states: "When the entropy pool is empty, reads from /dev/random will block until additional environmental noise is gathered." I.e. known behaviour. Closing NOTABUG.
Comment 10 Aaron Straus 2004-03-25 20:29:24 UTC
I disagree. It is true that /dev/random __should__ block until there is sufficient entropy. However, the problem is that the entropy pool is __never__ refilled. Moving the mouse, typing on the keyboard and disk activity should all refill the pool and you should get bytes out of /dev/random at that point. On my system this never happened. Also nothing was reading /dev/random, so nothing was draining the entropy pool. I believe this is a bug?
Comment 11 Elliot Lee 2004-03-25 20:42:24 UTC
What Aaron said. The problem is that it blocks forever. It does not gather additional environmental noise and pass it to the app trying to do the read.
Comment 12 Mark DeWandel 2004-03-25 20:58:46 UTC
This is undoubtedly a bug and will be fixed in RHEL3 U3. I have back-ported changes from 2.6 which will be committed after U3 opens. The problem is that critical data structures are completely unguarded by locks and consequently end up in a state in which no entropy is generated on SMP systems. Has anyone reproduced this problem on a uniprocessor?
Comment 13 Leonard den Ottolander 2004-03-25 22:24:01 UTC
Sorry for this. I got confused by the fact that bug 118921, which references this bug, only states that /dev/random blocks when it runs out of entropy. That is expected behaviour. I should have looked more closely before closing this bug. Different reporters have different issues. Again, sorry. It hopefully won't happen again.
Comment 14 Stefan Neufeind 2004-03-25 23:02:07 UTC
The problem of the entropy pool not filling again occurred for me on a uniprocessor machine, so this is not (exclusively) SMP-related. Maybe there are two issues with the same effect?
Comment 15 Mark DeWandel 2004-03-30 18:41:49 UTC
Yes, I believe that missed wakeups are also a possibility. However, I have had no success reproducing this on any in-house machine other than production servers. Consequently, I am making a test kernel available for testing on my Red Hat "people" page. The URL is http://people.redhat.com/~mdewand/.dev_random/. Here you will find the following two choices for download:

kernel-2.4.21-12.EL.mdewand.rand.1.i686.rpm
kernel-smp-2.4.21-12.EL.mdewand.rand.1.i686.rpm

These kernels contain changes to the /dev/random driver back-ported from 2.6. I would appreciate any feedback that anyone can provide regarding their experiences with either of these kernels.
Comment 16 Ernie Petrides 2004-03-31 22:35:43 UTC
*** Bug 119526 has been marked as a duplicate of this bug. ***
Comment 17 Jim Richard 2004-04-05 01:02:12 UTC
Ernie, thanks for including my bug in this one; sorry I missed it when I searched. Just a reminder: I see this on RH 8 systems as well, though I'm aware that they are out of support. So this problem probably exists in other kernels as well.
Comment 18 Mark DeWandel 2004-04-06 12:26:47 UTC
Any feedback regarding the RPMs I posted a week ago?
Comment 19 Stefan Neufeind 2004-04-06 13:45:03 UTC
Sorry, I currently have no chance to reproduce this on a server, since they are in production and I don't have adequate test equipment here at the moment.
Comment 20 Stefan Hudson 2004-04-06 15:33:47 UTC
They were installed on the server that had the problem last week (Wednesday, I think). So far so good, but the problem was rare enough that it will be a few weeks before I can comfortably say it's fixed. If it does turn out to be a fix, can you provide patched kernels for any kernel updates until it's included in U3?
Comment 21 Stefan Neufeind 2004-04-06 15:44:10 UTC
Hi Stefan H., could you maybe try some stress testing? I'm thinking of reading from /dev/random to /dev/null until the pool is empty. It should IMHO be possible to read from /dev/random faster than the entropy pool can fill up again. Then we could see whether new entropy is still generated once the pool is exhausted. By the way: are you also running it on a server without keyboard/mouse? And does the disk see little or a lot of activity?
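The drain step of such a stress test could look roughly like the sketch below. It opens the device with O_NONBLOCK so that an exhausted pool shows up as an error instead of a hang. This is an illustration only: the function name and its parameters are invented here, and it assumes the running kernel's /dev/random returns EAGAIN on a non-blocking read when the pool is empty (recent kernels no longer block on /dev/random after boot, so there the loop simply stops at `limit`).

```python
import os


def drain_nonblocking(path="/dev/random", chunk=64, limit=1 << 20):
    """Read from `path` until a read would block, EOF is hit, or `limit`
    bytes have been consumed. Returns the number of bytes read."""
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    total = 0
    try:
        while total < limit:
            try:
                data = os.read(fd, min(chunk, limit - total))
            except BlockingIOError:
                break  # pool exhausted: a blocking read would sleep here
            if not data:
                break  # EOF; only happens for regular files (e.g. in tests)
            total += len(data)
    finally:
        os.close(fd)
    return total
```

After `drain_nonblocking()` returns, waiting a while and calling it again should yield more bytes on a healthy system; if it keeps returning 0 despite disk and input activity, the pool is not being refilled.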
Comment 23 Stefan Hudson 2004-05-10 22:21:40 UTC
Sorry for not following up on this sooner. We have tested /dev/random a number of times over the last few weeks as you describe, and the entropy pool always fills back up correctly now after being exhausted. The server has a keyboard and mouse attached through a KVM, but it is not selected most of the time - someone logs into it for a few minutes every couple days on average. Did this patch make it into 9.0.3? Or do we need to wait for RHEL3-U3 to get it in the mainline kernel?
Comment 24 Ernie Petrides 2004-05-11 04:01:21 UTC
The fixes for this problem that Mark DeWandel back-ported from 2.6 have just been committed to the RHEL3 U3 patch pool this evening (in kernel version 2.4.21-15.3.EL).
Comment 25 Ernie Petrides 2004-05-11 04:16:19 UTC
Stefan, just to clarify, the fix did *not* make it into -9.0.3.EL nor into -15.EL (the U2 kernel). Thus, the first officially supported RHEL3 kernel with the fix will be the U3 kernel.
Comment 26 Stefan Neufeind 2004-05-11 05:21:44 UTC
Will these fixes also soon be ported over to Fedora? Anything known about their next kernel-release that might include this? Thank you guys for taking this bug seriously!
Comment 27 Ernie Petrides 2004-05-11 18:38:10 UTC
Stefan, my understanding is that the fixes came from 2.6, which is what Fedora (as of FC2) is based on. So I'd guess the fixes are there already. If you need me to check out a specific FC kernel version to verify that the fixes are contained there, please let me know. I'll attach the RHEL3 U3 patch that I committed last night in the next comment for reference.
Comment 28 Ernie Petrides 2004-05-11 18:40:05 UTC
Created attachment 100157 [details] /dev/random driver fixes committed in RHEL3 U3
Comment 29 John Flanagan 2004-09-02 04:31:06 UTC
An errata has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-433.html
Comment 30 Stefan Neufeind 2004-09-02 05:40:20 UTC
Thank you very much. Does somebody know when ports of these fixes will occur in Fedora Core 2?
Comment 31 Leonard den Ottolander 2004-09-02 10:24:21 UTC
Stefan, your question in comment #30 is already answered in comment #27.
Comment 32 Jos Martin 2004-11-25 18:39:27 UTC
Still seeing this on 2.4.21-20.ELsmp. The errata suggests it's been fixed, and yet on a server with no keyboard and mouse the /dev/random device can produce no output for minutes at a time. This is causing all Java JINI services, and some Java SSL services, to hang on startup. The only safe workaround for JINI is

    rm /dev/random
    mknod -m 0444 /dev/random c 1 9

as suggested in http://linux.about.com/od/commands/l/blcmdl4_random.htm
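For context, character device 1,9 is the non-blocking urandom device on Linux, while the stock /dev/random is 1,8, so the workaround above makes /dev/random an alias for /dev/urandom. A small sketch for checking which device currently sits behind a path follows; the helper names are hypothetical and only standard `os` and `stat` calls are used.

```python
import os
import stat

# Linux character-device numbers for the two random devices (major 1).
KNOWN = {(1, 8): "random", (1, 9): "urandom"}


def classify(major, minor):
    """Map a (major, minor) pair to 'random', 'urandom', or 'other'."""
    return KNOWN.get((major, minor), "other")


def device_behind(path="/dev/random"):
    """Report which kernel device a device node actually refers to."""
    st = os.stat(path)
    if not stat.S_ISCHR(st.st_mode):
        return "not a character device"
    return classify(os.major(st.st_rdev), os.minor(st.st_rdev))
```

On a machine where the workaround has been applied, `device_behind("/dev/random")` would be expected to report `urandom`, which is worth knowing before relying on that node for anything that assumes blocking semantics.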
Comment 33 Leonard den Ottolander 2004-11-25 21:45:43 UTC
Comment 32: Jos, please see comments 8, 9 and 10.
Comment 34 Guillaume Berche 2005-07-22 11:58:30 UTC
(In reply to comment #33)
> Comment 32: Jos, please see comment 8, 9 and 10.

I did read comments #8 through #10, but could not find the answer to Jos's question: does the fix included in RHEL3 U3 and attached in comment #28 make it possible to use /dev/random on a headless server such as the ones in most data centers (i.e. without mouse and keyboard)? Was the fix able to include other environmental data such as interrupts, or hardware-specific data such as hard disk statistics? (BTW, more details about how the fix works would certainly help in understanding how the bug was fixed.) Is the workaround of using /dev/urandom instead still necessary on headless computers running RHEL3 U3?
Comment 35 Guillaume Berche 2005-07-22 13:54:56 UTC
Sorry, it seems that while adding myself to the CC list I have by mistake changed the status of this bug, which was not my intention. I therefore tried to put it back to the previous state left by "John Flanagan on 2004-09-02 00:31 EST", i.e. "CLOSED ERRATA", but was refused permission to do so. But I would still appreciate details about how this bug was fixed.
Comment 36 Ernie Petrides 2005-07-22 20:42:47 UTC
Hello, Guillaume. From reading the patch in comment #28, it looks like the fixes were oriented around sleep/wakeup synchronization (as opposed to incorporating new sources of randomness). Unfortunately, the person who did this work is no longer here. Sorry I'm not able to get better answers for you. Reclosing bug.
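To illustrate what "sleep/wakeup synchronization" means in general terms, here is a user-space analogy in Python. It is emphatically not the patch from comment #28 and not kernel code; `ToyPool` and its methods are invented for this sketch. The point it shows is that the reader's "is there enough entropy?" check and its decision to sleep happen under the same lock that the writer holds when it adds entropy and issues the wakeup, so a wakeup cannot be lost between the check and the sleep.

```python
import threading


class ToyPool:
    """Toy stand-in for an entropy pool; not the kernel's data structure."""

    def __init__(self):
        self._cond = threading.Condition()  # a lock plus a wait queue
        self._bits = 0

    def add_entropy(self, bits):
        # Update the counter and wake sleepers while holding the lock the
        # reader uses for its check, so the wakeup cannot slip in between
        # the reader's check and its sleep.
        with self._cond:
            self._bits += bits
            self._cond.notify_all()

    def take(self, bits, timeout=None):
        """Block until `bits` are available, then consume them.

        Returns True on success, False if `timeout` seconds elapse first.
        """
        with self._cond:
            # wait_for re-checks the predicate after every wakeup.
            if not self._cond.wait_for(lambda: self._bits >= bits, timeout):
                return False
            self._bits -= bits
            return True
```

If the counter were instead checked without the lock and the reader went to sleep afterwards, an `add_entropy` call landing in that gap would notify nobody, and the reader could sleep indefinitely even though entropy had arrived, which matches the symptom reported in this bug.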