Bug 1212082 - RFE: VMs created in virt-manager should include virt-rng pointing at /dev/random by default
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Virtualization Tools
Classification: Community
Component: virt-manager
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Cole Robinson
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-04-15 14:08 UTC by Stephen Gallagher
Modified: 2017-03-08 22:54 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-08 22:54:52 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1389351 0 low CLOSED Should use socket activation 2021-02-22 00:41:40 UTC

Description Stephen Gallagher 2015-04-15 14:08:25 UTC
Description of problem:
Entropy in VMs is a serious problem (particularly for crypto-sensitive activities like running a domain controller). When creating a virtual machine in virt-manager, it would make sense to always add the virt-rng device by default, so that client OSes like Fedora can use the host's entropy pool.

Version-Release number of selected component (if applicable):
virt-manager-1.1.0-7.git6dbe19bd8.fc22.noarch

Comment 1 Cole Robinson 2015-04-15 14:24:49 UTC
I thought the recommendation was that this is not safe in the general case, but I can't remember the details.

Amit, can you comment?

Comment 2 Amit Shah 2015-04-15 18:25:55 UTC
I do support adding the rng device by default; I do it for all my devices.

However, some people may not like this?

Can this be made a global config option in virt-manager, so that if checked once by the user, all new VMs will get the rng device by default?

(btw Cole, I think you misunderstood the question; or I'd be interested in knowing what's unsafe about doing this)

Comment 3 Cole Robinson 2015-04-15 18:57:10 UTC
Sorry for being vague. I thought the unsafety was about entropy starvation of the host, not anything to do with exploitability.

I can understand that in, say, the cloud case this would make sense, where the physical host's only job is to serve VMs, and we care about the VMs operating as efficiently as possible. If the host entropy pool is exhausted, we don't really care because its only purpose is to feed the VMs.

But in the desktop user case, which is definitely the majority of virt-manager users, doesn't giving /dev/random to VMs just risk potential interference with proper host operation, if host entropy is exhausted?

If that's true then this doesn't seem like something we can unconditionally enable.

A global option like Amit suggests to add virtio-rng for all new VMs is do-able, but even still I question how useful that is. If people know enough to go look for that option, they know enough to manually add the rng device at the end of VM install. I prefer to avoid adding these types of options unless there's a really compelling reason, since once they are in the UI they are there for good since people start depending on them. I'm not completely against the idea, just not sure if there's enough of an audience to justify the option instead of saying 'just add the device manually'

Comment 4 Daniel Berrangé 2015-04-16 08:26:38 UTC
FWIW, /dev/random is already made available world readable, so any unprivileged user can cause entropy starvation.

Comment 5 Stephen Gallagher 2015-04-16 12:33:00 UTC
(In reply to Cole Robinson from comment #3)
> Sorry for being vague. I thought the unsafety was about entropy starvation of
> the host, not anything to do with exploitability.
> 
> I can understand in say the cloud case this would make sense, where the
> physical host's only job is to serve VMs, and we care about the VMs
> operating as efficiently as possible. If the host entropy pool is exhausted,
> we don't really care because its only purpose is to feed the VMs
> 

Actually, I'd argue that this is the more problematic case. If the host entropy is completely depleted, your cloud infrastructure may start exhibiting problematic behavior (just like it does now, without the virt-rng device) such as crypto actions hanging or failing (or worse, not failing but continuing with insufficiently-strong random values).

But of course, in sane deployments, this would be mitigated by adding entropy-generating hardware on the hosts and serving that through virt-rng. So let's assume we can set that particular case aside.

So the current default case is that all VMs basically start pretty close to the worst-case scenario I described above. With the virt-rng pointing at the host OS's /dev/random, they at least gain the benefit of whatever entropy is available on the host. This entropy is partly generated from the operations performed by the various VMs being run, meaning that what we are effectively doing is sharing unused entropy around so that VMs that don't currently need theirs can give it to the VMs that do. The risk of entropy starvation on the host is significantly lower than it would be on any one of the virtual machines. So I'd say that this is a net win (not a perfect solution; that still needs a hardware entropy device).


> But in the desktop user case, which is definitely the majority of
> virt-manager users, doesn't giving /dev/random to VMs just risk potential
> interference with proper host operation, if host entropy is exhausted?
> 

The desktop case is actually the much less risky case, because the risk of depleting the host entropy pool is almost nonexistent. One of the best sources of entropy to refill the entropy pool is through human interface devices. Human beings are the closest you can get to truly random devices. For example, while typing this paragraph, the entropy pool on my laptop has gone up from 3088 to 3109 (courtesy of /proc/sys/kernel/random/entropy_avail).
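
As an illustration, the pool estimate mentioned above can be sampled with a short script. This is a minimal sketch, assuming a Linux host that exposes /proc/sys/kernel/random/entropy_avail; the function names, sample count and interval are hypothetical choices made for the example.

```python
import time
from pathlib import Path

# Path quoted in the comment above; only present on Linux hosts.
ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def read_entropy_avail(path=ENTROPY_AVAIL):
    """Return the kernel's current entropy estimate (in bits) as an int."""
    return int(Path(path).read_text().strip())


def sample_entropy(samples=5, interval=1.0, path=ENTROPY_AVAIL):
    """Take several readings so that growth or drain becomes visible."""
    readings = []
    for i in range(samples):
        readings.append(read_entropy_avail(path))
        if i < samples - 1:
            time.sleep(interval)  # pause between readings, not after the last
    return readings


if __name__ == "__main__":
    print(sample_entropy())
```

Running it once on an idle host and once while a guest with the rng device is busy gives a rough before/after comparison.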

Humans move the mouse randomly and type with different pauses. A desktop machine with a human being in front of it is virtually guaranteed to maintain a high entropy pool. It would take an incredible *intentional* effort to completely deplete it. At which point, the situation reverts back to being identical to not having the virt-rng device, which is the current state.


> If that's true then this doesn't seem like something we can unconditionally
> enable.
> 
> A global option like Amit suggests to add virtio-rng for all new VMs is
> do-able, but even still I question how useful that is. If people know enough
> to go look for that option, they know enough to manually add the rng device
> at the end of VM install. I prefer to avoid adding these types of options
> unless there's a really compelling reason, since once they are in the UI
> they are there for good since people start depending on them. I'm not
> completely against the idea, just not sure if there's enough of an audience
> to justify the option instead of saying 'just add the device manually'

I personally don't think this needs to be optional. As best as I can figure, having this enabled all the time (and obviously configurable if you want to point it at an entropy daemon, which it already can do) is at least no worse in any particular situation than not having it enabled. And in the majority of cases, it is far better.

Comment 6 Cole Robinson 2015-04-18 19:44:56 UTC
Thanks for educating me Stephen, that all sounds reasonable. I'll look into this for the next virt-manager release

Comment 7 Amit Shah 2015-04-20 05:35:41 UTC
Thanks - Stephen pointed out why the desktop case is the least of our problems.

For server workloads, we have to rely on an external hwrng.

In any case, we also have a way of rate-limiting each guest's use of host entropy, so if this is made unconditional, we could perhaps have a default rate-limit.

I would say something like 32 bytes per minute is good enough -- but if we want to do this, I can ask experts for guidance.
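
For illustration, a rate-limited device of the kind described here could be expressed in libvirt domain XML and generated as below. This is a minimal sketch, assuming libvirt's `<rng>` element with a `<rate>` child whose period is given in milliseconds; the function name and the default values (32 bytes per 60000 ms, taken from the off-the-cuff figure above) are illustrative, not a vetted recommendation.

```python
import xml.etree.ElementTree as ET


def build_rng_device(backend_path="/dev/random", rate_bytes=32, period_ms=60000):
    """Build a libvirt <rng> element, optionally with a rate limit.

    The defaults mirror the 32-bytes-per-minute suggestion and are
    placeholders only.
    """
    rng = ET.Element("rng", {"model": "virtio"})
    if rate_bytes is not None:
        # The limit is expressed as bytes allowed per period (milliseconds).
        ET.SubElement(
            rng, "rate", {"period": str(period_ms), "bytes": str(rate_bytes)}
        )
    backend = ET.SubElement(rng, "backend", {"model": "random"})
    backend.text = backend_path  # host device that feeds the guest
    return rng


if __name__ == "__main__":
    print(ET.tostring(build_rng_device(), encoding="unicode"))
```

Passing `rate_bytes=None` omits the `<rate>` element, which corresponds to an unthrottled device.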

Comment 8 Stephen Gallagher 2015-05-26 13:25:21 UTC
Just wanted to follow up here. Has there been any discussion with the crypto folks about a reasonable entropy rate-limit? As noted in the original bug report, I'd like to have someone measure the needs of freeipa-server-install (with CA) as part of the discussion (I'd like to make sure the rate-limit is equal to or above the high-water mark for that action).

Comment 9 Cole Robinson 2015-05-26 15:38:03 UTC
I haven't talked to any crypto folks at least. Nor have I moved on enabling this default yet either though...

Comment 10 Amit Shah 2015-05-28 04:49:08 UTC
Steve Grubb can help with specific questions related to the rate-limit.

Steve, please see question in comment 8.

Comment 11 Steve Grubb 2015-05-28 19:36:57 UTC
I've had this conversation with folks many times and I'll reiterate it again.

Why don't they trust urandom? Urandom and random both get seeded from the input pool, both have the same conditioner function, and both are stirred by the same function. The difference is that random stops outputting when the entropy count goes to zero while urandom continues. It doesn't mean that urandom outputs low quality numbers at all.

What happens is that the strength of function implied by presumably true
hardware entropy is replaced with the strength of the SHA1 conditioning
function. (Which is very strong.) All RNGs have a basic reseeding requirement
to keep an attacker from guessing the internal state of the RNG. If they could
tell what the state was, they could guess the next number coming out and the
game is over.

BSI has a document, AIS20, which describes how many rounds may be removed 
before reseeding is needed. Without going into the gory details, the number 
turns out to be 6,52e17 (2**64 bits or 2**61 bytes). If that sounds like a 
lot, it is.

Let's see how long that is. 

# dd if=/dev/urandom of=/dev/null count=100000
100000+0 records in
100000+0 records out
51200000 bytes (51 MB) copied, 3.99051 s, 12.8 MB/s

Let's round that up to 16 MB/s to make calculations easy. 16 MB/s is 2**24 bytes/s.
So, using the numbers above, it would take reading urandom at top speed for 
2**37 seconds before reseeding is needed. That turns out to be a bit more than 
4300 years.
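
The arithmetic above can be checked in a few lines. This is a minimal sketch that takes the 2**61-byte bound and the rounded 16 MB/s read rate as given; the constant names are made up for the example, and a 365.25-day year is assumed.

```python
# Figures quoted in the comment above.
BYTES_BEFORE_RESEED = 2**61  # stated reseed bound, in bytes
READ_RATE = 2**24            # 16 MiB/s, the rounded-up dd throughput

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length

# Time needed to read the whole bound at the rounded rate.
seconds = BYTES_BEFORE_RESEED // READ_RATE
years = seconds / SECONDS_PER_YEAR

# Throughput of the dd run itself: 51,200,000 bytes in 3.99051 s.
measured_rate = 51_200_000 / 3.99051

if __name__ == "__main__":
    print(f"seconds to exhaust bound: {seconds} (2**37 = {2**37})")
    print(f"years: {years:.0f}")
    print(f"measured dd rate: {measured_rate / 1e6:.1f} MB/s")
```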

The fact is that you know hardware is going to do something that generates 
entropy and urandom will be reseeded before that time is up. 

Configuration should be allowed to switch to random if people want. Urandom should be the default, though. Urandom is safe and doesn't need a particular rate limit. 

As for a rate limit for random, you want to make this configurable, too. But an off the cuff estimate is adding 32 bytes a minute is probably more than enough. The guest should be generating entropy on its own. This is to make sure that what entropy the guest is generating isn't overly biased with scheduler artefacts which an attacker may use to gain knowledge of the internal state and thus predict future numbers.

Comment 12 Stephen Gallagher 2015-05-28 20:23:04 UTC
(In reply to Steve Grubb from comment #11)
> I've had this conversation with folks many times and I'll reiterate it again.
> 
> Why don't they trust urandom?

From the urandom(4) manpage:

"A read from the /dev/urandom device will not block waiting for more entropy. As a result, if there is not sufficient entropy in the entropy pool, the returned values are theoretically vulnerable to a cryptographic attack on the algorithms used by the driver. Knowledge of how to do this is not available in the current unclassified literature, but it is theoretically possible that such an attack may exist. If this is a concern in your application, use /dev/random instead."


That is - at the very least - a scary-sounding sentence and likely the origin of all the wariness around the use of /dev/urandom.

That being said...


> Urandom and random both get seeded from the
> input pool and both have the same conditioner function and stirred by the
> same function. The difference is that random stops outputting when the
> entropy count goes to zero while urandom continues. It doesn't mean that
> urandom outputs low quality numbers at all.
> 
> What happens is that the strength of function implied by presumably true 
> hardware entropy is replaced with the strength of the SHA1 conditioning 
> function. (Which is very strong.) All RNGs have a basic reseeding
> requirement 
> to keep an attacker from guessing the internal state of the RNG. If they
> could 
> tell what the state was, they could guess the next number coming out and the 
> game is over.
> 
> BSI has a document, AIS20, which describes how many rounds may be removed 
> before reseeding is needed. Without going into the gory details, the number 
> turns out to be 6,52e17 (2**64 bits or 2**61 bytes). If that sounds like a 
> lot, it is.
> 
> Let's see how long that is. 
> 
> # dd if=/dev/urandom of=/dev/null count=100000
> 100000+0 records in
> 100000+0 records out
> 51200000 bytes (51 MB) copied, 3.99051 s, 12.8 MB/s
> 
> Let's round that up to 16 MB/s to make calculations easy. 16 MB/s is 2**24. 
> So, using the numbers above, it would take reading urandom at top speed for 
> 2**37 seconds before reseeding is needed. That turns out to be a bit more
> than 
> 4300 years.
> 
> The fact is that you know hardware is going to do something that generates 
> entropy and urandom will be reseeded before that time is up. 
> 
> Configuration should be allowed to switch to random if people want. Urandom
> should be the default, though. Urandom is safe and doesn't need a particular
> rate limit. 
> 
> As for a rate limit for random, you want to make this configurable, too. But
> an off the cuff estimate is adding 32 bytes a minute is probably more than
> enough. The guest should be generating entropy on its own. This is to make
> sure that what entropy the guest is generating isn't overly biased with
> scheduler artefacts which an attacker may use to gain knowledge of the
> internal state and thus predict future numbers.

So, if we have published and clear documentation that says urandom is safe, we should *really* update all of the literature and documentation that says otherwise or such interpretations will continue.

(Of course, no matter what we'll have to deal with customers' cargo-cult fear of anything using /dev/urandom).


I'll also point to this entry[1] from Red Hat Network (which mostly agrees with you, but points out some specific early boot issues, such as those we're trying to ameliorate here...):

"The problems with /dev/urandom only appear if there *all* of the data is known by the attacker --- so all of the keyboard interrupts, all of the network interrupts, all of the mouse interrupts, the initial random seed file --- everything. In practice the time when this has come up is very early in the initial install process, where there is no random seed file, and before any interrupt entropy has had a chance to be mixed into the pool, particularly if it is a headless (i.e., no keyboard, no mouse, no monitor) install process.

And here there is no magic bullet. If you are doing a headless install, and there is no entropy, and you don't have a way of accessing a real hardware random number generator, THIS IS NOT THE RIGHT TIME TO BE GENERATING SSH HOST KEYS."

Unfortunately, there is a large amount of software in Fedora and RHEL that generates keys during %post of the RPM installation. I'm working on a separate solution to fixing that (which involves deferring creation of keys to first boot-up instead of RPM installation), but that's not really there yet.

Anyway, if you're prepared to say unequivocally that /dev/urandom is safe for all purposes, we should just configure rngd to always read from /dev/urandom and call it a day.


[1] https://access.redhat.com/articles/221583

Comment 13 Steve Grubb 2015-05-28 20:50:24 UTC
We addressed the ssh keygen issue by collecting entropy during system install. Installing initscripts package triggers writing a seed file to disk so that first boot can generate keys securely. Not sure about the rngd comment since it writes the numbers right back to it.

Comment 15 Daniel Berrangé 2015-05-29 08:57:20 UTC
(In reply to Steve Grubb from comment #11)
> I've had this conversation with folks many times and I'll reiterate it again.
> 
> Why don't they trust urandom? Urandom and random both get seeded from the
> input pool and both have the same conditioner function and stirred by the
> same function. The difference is that random stops outputting when the
> entropy count goes to zero while urandom continues. It doesn't mean that
> urandom outputs low quality numbers at all.

When virtio-rng was integrated into QEMU, Peter Anvin was categorical in the upstream discussions that seeding virtio-rng from /dev/urandom on the host is cryptographically incorrect and a security hole:

  http://lists.gnu.org/archive/html/qemu-devel/2012-09/msg02387.html

I don't have the personal knowledge of RNGs to be able to debate what he says vs what you are saying, but since HPA is the upstream maintainer of the random number subsystem in the kernel, we chose to defer to his expert opinion and so refuse to allow use of /dev/urandom with QEMU.

Comment 16 Stephen Gallagher 2016-04-13 19:54:43 UTC
(In reply to Daniel Berrange from comment #15)

Resurrecting this conversation.

I've been rereading the comments here and it seems like the conversation got stuck on an aside: whether to allow the use of /dev/urandom.

However, even if we set that aside, it should be perfectly fine to default to adding the virt-rng device pointing to the host's /dev/random. As I discussed near the beginning of this bug, the worst-case scenario (managing to exhaust the host's entropy) is both unlikely (the presence of a running physical system, likely with other VMs on it, will feed the entropy pool) and still *no worse than the current situation*.

So I'll re-ask the initial question: can we change it so that new VMs default to having virt-rng enabled?

Comment 17 Cole Robinson 2016-04-13 20:16:20 UTC
(In reply to Stephen Gallagher from comment #16)
> However, even if we set that aside, it should be perfectly fine to default
> to adding the virt-rng device pointing to the host's /dev/random. As I
> discussed near the beginning of this bug, the worst-case scenario (managing
> to exhaust the host's entropy) is both unlikely (the presence of a running
> physical system, likely with other VMs on it, will feed the entropy pool)
> and still *no worse than the current situation*.

It's no worse for _VMs_ than the current situation, but it may be worse for _the host_ than the current situation; VMs could exhaust the host entropy pool, which could reduce performance on the host in certain situations. I'm not saying it's likely; I buy your argument in comment #5 that the desktop case needn't worry since there are plenty of entropy sources, but it's not _unquestionably_ no worse than the current situation IMO. And since the majority of virt-manager users are using it on systems that are doing more than just hosting VMs, it should at least be considered.

> 
> So I'll re-ask the initial question: can we change it so that new VMs
> default to having virt-rng enabled?

Regardless of my comments above, I ACKd the idea in comment #6, I just haven't implemented it yet

Comment 18 Amit Shah 2016-04-14 05:01:04 UTC
(In reply to Cole Robinson from comment #17)
> (In reply to Stephen Gallagher from comment #16)
> > However, even if we set that aside, it should be perfectly fine to default
> > to adding the virt-rng device pointing to the host's /dev/random. As I
> > discussed near the beginning of this bug, the worst-case scenario (managing
> > to exhaust the host's entropy) is both unlikely (the presence of a running
> > physical system, likely with other VMs on it, will feed the entropy pool)
> > and still *no worse than the current situation*.
> 
> It's no worse for _VMs_ than the current situation, but it may be worse for
> _the host_ than the current situation; VMs could exhaust the host entropy
> pool which could lead to less performance on the host in certain situations.
> I'm not saying it's likely; I buy your argument in comment #5 about desktop
> not needing to worry since there's plenty of entropy source, but it's not
> _unquestionably_ no worse than the current situation IMO. And since the
> majority of virt-manager users are using it on systems that are doing more
> than just hosting VMs, it should at least be considered.

I think we should just pick some value for the rate limiting and see how it goes. The default rate-limiting value can be tweaked based on user feedback/complaints.

Also, I think we should move to using getrandom() as the default, to avoid the /dev/random and /dev/urandom confusion. It's something I've been meaning to do for a while but haven't found the time yet.

> > So I'll re-ask the initial question: can we change it so that new VMs
> > default to having virt-rng enabled?
> 
> Regardless of my comments above, I ACKd the idea in comment #6, I just
> haven't implemented it yet

Yes, I fully support the idea of having the virtio-rng on by default.

Comment 19 Stephen Gallagher 2016-09-22 18:24:16 UTC
(In reply to Cole Robinson from comment #17)
> Regardless of my comments above, I ACKd the idea in comment #6, I just
> haven't implemented it yet

I hope I wouldn't be out of line if I asked where this is on the roadmap. The entropy issue continues to be a source of frustration for anyone deploying crypto applications (such as MIT Kerberos, Dogtag or FreeIPA, to name a few) in a VM.

Comment 20 Cole Robinson 2016-09-30 14:19:04 UTC
(In reply to Stephen Gallagher from comment #19)
> (In reply to Cole Robinson from comment #17)
> > Regardless of my comments above, I ACKd the idea in comment #6, I just
> > haven't implemented it yet
> 
> I hope I wouldn't be out of line if I asked where this is on the roadmap.
> The entropy issue continues to be a source of frustration for anyone
> deploying crypto  applications (such as MIT Kerberos, Dogtag or FreeIPA, to
> name a few) in a VM.

Not out of line, I'm sorry this has taken so long. It's on my short list for the next release, and I'm hoping to get back to virt-manager work within the next few weeks... but my time estimates suck :)

Comment 21 Oliver Henshaw 2016-10-24 15:09:41 UTC
The lack of entropy in VMs now causes problems in desktop guests too (with a F23 desktop host). I installed a F24 VM from the live image (the KDE spin though aiui the package set is the same on the desktop spin) and updated; boot takes around 80 seconds.

$ systemd-analyze blame
    1min 18.317s gssproxy.service
          1.716s firewalld.service
          1.283s dev-mapper-fedora\x2droot.device
          1.255s systemd-udev-settle.service

Adding "strace -tt -T" to the gssproxy.service file reveals that a getrandom() call in krb5 is the culprit:
...
16:22:36.049781 stat("/etc/krb5.conf", {st_mode=S_IFREG|0644, st_size=677, ...}) = 0 <0.000112>
16:22:36.049934 open("/etc/krb5.conf", O_RDONLY) = 3 <0.000011>
16:22:36.049971 fcntl(3, F_SETFD, FD_CLOEXEC) = 0 <0.000005>
16:22:36.050013 fstat(3, {st_mode=S_IFREG|0644, st_size=677, ...}) = 0 <0.000005>
16:22:36.050039 read(3, "# To opt out of the system crypt"..., 4096) = 677 <0.000213>
16:22:36.050298 open("/etc/krb5.conf.d/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4 <0.000022>
16:22:36.050349 fstat(4, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0 <0.000005>
16:22:36.050378 getdents(4, /* 3 entries */, 32768) = 88 <0.000680>
16:22:36.051114 open("/etc/krb5.conf.d//crypto-policies", O_RDONLY) = 5 <0.001484>
16:22:36.052662 fstat(5, {st_mode=S_IFREG|0644, st_size=191, ...}) = 0 <0.000006>
16:22:36.052711 read(5, "# This file is automatically gen"..., 4096) = 191 <0.000534>
16:22:36.053293 read(5, "", 4096)       = 0 <0.000006>
16:22:36.053336 close(5)                = 0 <0.000008>
16:22:36.053365 getdents(4, /* 0 entries */, 32768) = 0 <0.000007>
16:22:36.053390 close(4)                = 0 <0.000008>
16:22:36.053457 read(3, "", 4096)       = 0 <0.000005>
16:22:36.053483 close(3)                = 0 <0.000005>
16:22:36.053538 getrandom("YbP\250\1<\377\216I\212\6B9\212\223\227\3050\212i\240%\\\372\264\203woLD\307R", 32, 0) = 32 <70.908856>
16:23:46.962705 access("/etc/krb5.conf", R_OK) = 0 <0.000249>
16:23:46.964311 stat("/etc/krb5.conf", {st_mode=S_IFREG|0644, st_size=677, ...}) = 0 <0.000028>
16:23:46.964496 getrandom("r|\221\360 \\\236\214e\344x\300\362\330\273A\307.g\340~E\261\330=\206\216\247n[\f\371", 32, 0) = 32 <0.000038>
...

According to rharwood in #gssproxy irc this is krb5 working as intended, although the getrandom() usage is new in F24; F23 krb5 will use an unseeded prng in VMs. Adding a virt-rng device to the VM eliminates gssproxy from the boot time trace.
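
The stall in the trace comes from getrandom() being called with flags 0, which blocks until the kernel's pool is initialized. As an illustration of how a caller could detect that state instead of waiting, here is a minimal sketch using Python's os.getrandom() with GRND_NONBLOCK. It assumes Linux and Python 3.6 or later, the helper name is hypothetical, and it is not what krb5 or gssproxy actually do.

```python
import os


def try_getrandom(nbytes=32):
    """Return nbytes from getrandom(), or None if the call would block."""
    if not hasattr(os, "getrandom"):
        raise NotImplementedError("os.getrandom() needs Linux and Python 3.6+")
    try:
        # GRND_NONBLOCK makes the call fail instead of waiting for the
        # kernel pool to be initialized (the 70 s wait seen in the trace).
        return os.getrandom(nbytes, os.GRND_NONBLOCK)
    except BlockingIOError:
        return None


if __name__ == "__main__":
    data = try_getrandom()
    if data is None:
        print("entropy pool not initialized yet; getrandom() would block")
    else:
        print(f"got {len(data)} random bytes without blocking")
```

In a freshly booted guest without an rng device, the None branch is the case that corresponds to the delay shown in the trace.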

Comment 22 Colin Walters 2016-10-26 17:59:45 UTC
First, I think gssproxy should not call `getrandom()` when it simply starts.  Why is it doing that?

Alternatively, convert it to socket/dbus activation so it's only started when something uses it.

Second, it really is the responsibility of hypervisors to provide randomness to guests.  I don't understand the argument for not including virtio-rng by default.

Specifically for cloud images, it's possible to seed via cloud-init:
http://blog.dustinkirkland.com/2012/10/seed-devurandom-through-metadata-ec2.html
However, it looks like doing it in a way that would update /dev/random would require the ioctl interface.

Comment 23 Colin Walters 2016-10-26 18:02:32 UTC
Also, if OpenSSH isn't using `getrandom()` for SSH keys, why the hell are we being stricter for anything else?

Comment 24 Colin Walters 2016-10-27 13:32:43 UTC
Filed the gssproxy issue here: https://bugzilla.redhat.com/show_bug.cgi?id=1389351

Comment 25 Steve Milner 2016-10-27 14:07:45 UTC
I agree with Colin's statements above on virtio-rng and strictness.

Comment 26 Oliver Henshaw 2016-10-27 16:29:59 UTC
(In reply to Stephen Gallagher from comment #5)
> The desktop case is actually the much less risky case, because the risk of
> depleting the host entropy pool is almost nonexistent. One of the best
> sources of entropy to refill the entropy pool is through human interface
> devices. Human beings are the closest you can get to truly random devices.
> For example, while typing this paragraph, the entropy pool on my laptop has
> gone up from 3088 to 3109 (courtesy of
> /proc/sys/kernel/random/entropy_avail).

I do see severe entropy drain on a desktop host - see bug #1389469. But that may be because my baseline is more like 800-900 than 3000. Nevertheless virt-rng on /dev/random seems to have the result of prioritising guest entropy over host entropy.

Comment 27 Matthew Miller 2017-01-28 09:46:26 UTC
N.B. this was enabled by default for oVirt (bug #1337101), with /dev/urandom.

Comment 28 Cole Robinson 2017-03-08 22:54:52 UTC
This is upstream now, release coming shortly:

commit d62e97556867e9f929c1ed9a35fdd9bd1044472f
Author: Cole Robinson <crobinso>
Date:   Wed Mar 8 16:54:16 2017 -0500

    guest: Add default virtio-rng /dev/urandom (bz 1212082)

