Bug 1759566 - CVE-2019-20485 virt:8.1/libvirt: potential DoS by holding a monitor job while querying QEMU guest-agent [rhel-av-8]
Summary: CVE-2019-20485 virt:8.1/libvirt: potential DoS by holding a monitor job while querying QEMU guest-agent [rhel-av-8]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: 8.2
Assignee: Jonathon Jongsma
QA Contact: Lili Zhu
URL:
Whiteboard:
Duplicates: 1809745
Depends On: 1705426
Blocks: 1755482 CVE-2019-20485
 
Reported: 2019-10-08 14:06 UTC by Jonathon Jongsma
Modified: 2020-05-12 11:30 UTC
CC List: 13 users

Fixed In Version: libvirt-6.0.0-6.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1705426
Environment:
Last Closed: 2020-05-05 09:50:34 UTC
Type: Bug
Target Upstream Version:
Embargoed:




Links:
Red Hat Product Errata RHBA-2020:2017 (last updated 2020-05-05 09:51:42 UTC)

Description Jonathon Jongsma 2019-10-08 14:06:31 UTC
+++ This bug was initially created as a clone of Bug #1705426 +++

It is possible to call some QEMU-GA commands through the libvirt interface (e.g. virDomainInterfaceAddresses), but all of these commands block until the call to QEMU-GA finishes, and it is not possible to specify a timeout. This is problematic, for example, when the guest is under load and the call can take a long time to finish. For the duration of the call the libvirt domain is locked and any other interaction with libvirt is impossible.

The libvirt interface should be extended to allow setting a timeout for guest agent calls.
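
For illustration, a minimal client-side sketch of such a blocking call (the connection URI and the domain name "guest1" are placeholders); note that the public API takes no timeout argument:

    /* Sketch: query guest IP addresses via the guest agent.
     * Build with: gcc agent-demo.c -lvirt
     * The URI and domain name below are illustrative only. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <libvirt/libvirt.h>

    int main(void)
    {
        virConnectPtr conn = virConnectOpen("qemu:///system");
        virDomainPtr dom = NULL;
        virDomainInterfacePtr *ifaces = NULL;
        int n, i;

        if (!conn)
            return EXIT_FAILURE;
        if (!(dom = virDomainLookupByName(conn, "guest1"))) {
            virConnectClose(conn);
            return EXIT_FAILURE;
        }

        /* Routed to qemu-guest-agent inside the guest; blocks until
         * the agent answers, and no timeout can be passed here. */
        n = virDomainInterfaceAddresses(dom, &ifaces,
                                        VIR_DOMAIN_INTERFACE_ADDRESSES_SRC_AGENT, 0);

        for (i = 0; i < n; i++) {
            printf("%s\n", ifaces[i]->name);
            virDomainInterfaceFree(ifaces[i]);
        }
        free(ifaces);
        virDomainFree(dom);
        virConnectClose(conn);
        return n < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
    }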

--- Additional comment from Daniel Berrangé on 2019-05-02 09:24:38 UTC ---

(In reply to Tomáš Golembiovský from comment #0)
> It is possible to call some QEMU-GA commands through the libvirt interface
> (e.g. virDomainInterfaceAddresses), but all of these commands block until
> the call to QEMU-GA finishes, and it is not possible to specify a timeout.
> This is problematic, for example, when the guest is under load and the
> call can take a long time to finish. For the duration of the call the
> libvirt domain is locked and any other interaction with libvirt is
> impossible.

This is a bug in its own right. In most cases the only other APIs that should be blocked are those which also need to use the guest agent. The majority of domain APIs should be allowed, as talking to the guest agent doesn't change libvirt's view of the guest domain's state in most cases. 

> The libvirt interface should be extended to allow setting a timeout for
> guest agent calls.

--- Additional comment from Jonathon Jongsma on 2019-10-03 19:45:17 UTC ---

(In reply to Daniel Berrangé from comment #1)
> (In reply to Tomáš Golembiovský from comment #0)
> > It is possible to call some QEMU-GA commands through the libvirt
> > interface (e.g. virDomainInterfaceAddresses), but all of these commands
> > block until the call to QEMU-GA finishes, and it is not possible to
> > specify a timeout. This is problematic, for example, when the guest is
> > under load and the call can take a long time to finish. For the duration
> > of the call the libvirt domain is locked and any other interaction with
> > libvirt is impossible.
> 
> This is a bug in its own right. In most cases the only other APIs that
> should be blocked are those which also need to use the guest agent. The
> majority of domain APIs should be allowed, as talking to the guest agent
> doesn't change libvirt's view of the guest domain's state in most cases. 

As far as I can tell (from observation as well as testing), virDomainInterfaceAddresses() does not actually block other commands on the domain, because it only queries information from the agent (i.e. it calls qemuDomainObjBeginAgentJob() rather than qemuDomainObjBeginJobWithAgent()). For example, I can do the following: 
- use virsh from two different terminals
- instrument the guest so that agent commands don't respond immediately (e.g. set a breakpoint on qmp_guest_network_get_interfaces())
- in first terminal: call 'domifaddr --source agent $domain'
  - virsh waits indefinitely for a reply which is blocked by the breakpoint set above
- in second terminal: call 'setmem $domain 3G'
  - command completes immediately and memory is updated

If you have different results, please let me know.

On the other hand, qemuDomainGetFSInfo() *does* block other commands while it is running. For example, if you repeat the above procedure for 'domfsinfo $domain', the 'setmem' command will block indefinitely until the agent command completes. This is because 'domfsinfo' starts a job for both the agent and the domain. We do so because virDomainGetFSInfo() attempts to match the filesystems returned by the agent command to disks in the domain's xml definition. I believe this behavior is correct because it prevents the domain definition from changing while we query the filesystem information.
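
Schematically, the two locking patterns look like this (a simplified sketch of the qemuDomainObjBegin* helpers as they exist at this point; signatures are abbreviated and error handling is omitted):

    /* Pattern used by qemuDomainInterfaceAddresses(): agent-only job.
     * Only other agent users are serialized; ordinary domain APIs
     * (e.g. setmem) can still acquire their jobs and proceed. */
    if (qemuDomainObjBeginAgentJob(driver, vm, QEMU_AGENT_JOB_QUERY) < 0)
        goto cleanup;
    agent = qemuDomainObjEnterAgent(vm);
    ret = qemuAgentGetInterfaces(agent, &ifaces);
    qemuDomainObjExitAgent(vm, agent);
    qemuDomainObjEndAgentJob(vm);

    /* Pattern used by qemuDomainGetFSInfo(): combined job. Holding the
     * normal domain job as well means every other domain API blocks for
     * as long as the guest agent takes to reply. */
    if (qemuDomainObjBeginJobWithAgent(driver, vm, QEMU_JOB_QUERY,
                                       QEMU_AGENT_JOB_QUERY) < 0)
        goto cleanup;
    /* ... talk to the agent, then match results against vm->def ... */
    qemuDomainObjEndJobWithAgent(driver, vm);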

> > The libvirt interface should be extended to allow setting a timeout for
> > guest agent calls.

So I guess this is the important part of this bug, but it is also a bit tricky. I'll send a proposed patch upstream for discussion.

--- Additional comment from Daniel Berrangé on 2019-10-04 09:02:50 UTC ---

(In reply to Jonathon Jongsma from comment #2)
> On the other hand, qemuDomainGetFSInfo() *does* block other commands while
> it is running. For example, if you repeat the above procedure for 'domfsinfo
> $domain', the 'setmem' command will block indefinitely until the agent
> command completes. This is because 'domfsinfo' starts a job for both the
> agent and the domain. We do so because virDomainGetFSInfo() attempts to
> match the filesystems returned by the agent command to disks in the domain's
> xml definition. I believe this behavior is correct because it prevents the
> domain definition from changing while we query the filesystem information.

I don't think that level of locking is really necessary. Matching up guest disks to host XML is only ever a best-effort thing. There's no strong guarantee we'll successfully match.

Given that, I think it is not necessary to acquire a job on the domain while the agent command is running. We should just do the matching with whatever domain XML exists after we get the data back from the guest. If someone has changed storage in the meantime, so be it; we'll simply not match those changed entries.

It is not *ever* acceptable for a guest agent command to block execution of any other part of the libvirt mgmt API, except for other guest agent commands, as this is a denial of service on the mgmt app.
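
Restructured along those lines, the fsinfo path would look roughly like this (a sketch of the idea only, not the eventual patch; qemuAgentFSInfoToPublic is a hypothetical name for the matching step):

    /* Hold only the agent job while the guest can stall us. */
    if (qemuDomainObjBeginAgentJob(driver, vm, QEMU_AGENT_JOB_QUERY) < 0)
        goto cleanup;
    agent = qemuDomainObjEnterAgent(vm);
    nfs = qemuAgentGetFSInfo(agent, &agentfsinfo);  /* may block indefinitely */
    qemuDomainObjExitAgent(vm, agent);
    qemuDomainObjEndAgentJob(vm);

    if (nfs < 0)
        goto cleanup;

    /* No monitor job held here: do best-effort matching against
     * whatever definition exists now. Disks changed in the meantime
     * simply fail to match. (Hypothetical helper name.) */
    ret = qemuAgentFSInfoToPublic(agentfsinfo, nfs, vm->def, info);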

> > > The libvirt interface should be extended to allow setting a timeout
> > > for guest agent calls.
> 
> So I guess this is the important part of this bug, but it is also a bit
> tricky. I'll send a proposed patch upstream for discussion.

--- Additional comment from Jonathon Jongsma on 2019-10-04 14:34:05 UTC ---

(In reply to Daniel Berrangé from comment #3)

> I don't think that level of locking is really necessary. Matching up guest
> disks to host XML is only ever a best-effort thing. There's no strong
> guarantee we'll successfully match.

OK, I can submit a patch making that change, but...

> It is not *ever* acceptable for a guest agent command to block execution of
> any other part of the libvirt mgmt API, except for other guest agent
> commands, as this is a denial of service on the mgmt app.

If this is really a hard-and-fast rule, that basically means that every place we use qemuDomainObjBeginJobWithAgent() is a bug? At the moment, we have 7 such cases:
 - qemuDomainShutdownFlags()
 - qemuDomainReboot()
 - qemuDomainSetVcpusFlags()
 - qemuDomainPMSuspendForDuration()
 - qemuDomainSetTime()
 - qemuDomainGetFSInfo()
 - qemuDomainGetGuestInfo()

--- Additional comment from Daniel Berrangé on 2019-10-04 14:43:08 UTC ---

Yes, I'd consider them all bugs, because we have to assume that the agent can be malicious / hostile and thus protect against a DoS when using the agent.

Comment 1 Jonathon Jongsma 2020-01-17 20:56:19 UTC
The patch series has been pushed upstream. Relevant patches: 

1cb8bc52c1035573a0c1a87f724a6c7dfee82f12 qemu: don't take agent and monitor job for shutdown
0a9893121187c0c3f9807e9164366e1f6977619c qemu: don't hold a monitor and agent job for reboot
a663a860819287e041c3de672aad1d8543098ecc qemu: don't hold both jobs for suspend
e005c95f56fee9ed780be7f8db103d690bd34cbd qemu: don't hold monitor and agent job when setting time
ffa5066a49686e61991759983b0d7d1ba707fe50 qemu: remove use of qemuDomainObjBeginJobWithAgent()
e888c0f66752bb6516d63a612c20f565cbf9c0ca qemu: rename qemuAgentGetFSInfoInternalDisk()
bdb8a800b4920cf9184fd2fd117b17c67ba74dfb qemu: store complete agent filesystem information
306b4cb070b8f57a22a261d1f097283f4ef84e65 qemu: Don't store disk alias in qemuAgentDiskInfo
599ae372d8cf0923757c5a3792acb07dcf3e8802 qemu: don't access vmdef within qemu_agent.c
3c436c22a4f94c85c2b5e7b5fb84af48219d78e3 qemu: remove qemuDomainObjBegin/EndJobWithAgent()

Comment 8 Jonathon Jongsma 2020-03-11 20:00:29 UTC
*** Bug 1809745 has been marked as a duplicate of this bug. ***

Comment 9 Lili Zhu 2020-03-20 03:04:50 UTC
Reproduced this bug with:
libvirt-daemon-5.6.0-10.module+el8.1.1+5309+6d656f05.x86_64
qemu-guest-agent-4.1.0-13.module+el8.1.0+4313+ef76ec61.x86_64
qemu-kvm-4.1.0-23.module+el8.1.1+5938+f5e53076.2.x86_64

1. prepare a guest with guest agent connected
# virsh list --all
 Id   Name             State
--------------------------------
 3    avocado-vt-vm1   running

# virsh domtime avocado-vt-vm1 
Time: 1584621438

2. set breakpoint on qmp_guest_set_time in guest
[root@localhost ~]# gdb -p `pidof qemu-ga`
....
(gdb) br qmp_guest_set_time 
Breakpoint 1 at 0x55eddab50b70
(gdb) c
Continuing.

Breakpoint 1, 0x000055eddab50b70 in qmp_guest_set_time ()

3. try to set guest time 
# virsh domtime avocado-vt-vm1 --time 1584621357 
(will hit the breakpoint, virsh will hang there)

4. try to change guest memory allocation 
# date; virsh setmem avocado-vt-vm1 1G; date
Thu Mar 19 23:01:39 EDT 2020
error: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainSetTime)

Thu Mar 19 23:02:09 EDT 2020

virsh hangs for 30 seconds before failing with the lock timeout error above

Comment 10 Lili Zhu 2020-03-20 03:17:54 UTC
Verified this bug with:
libvirt-6.0.0-12.el8.x86_64
qemu-kvm-4.2.0-15.module+el8.2.0+6029+618ef2ec.x86_64
qemu-guest-agent-4.2.0-15.module+el8.2.0+6029+618ef2ec.x86_64

1. execute the first 3 steps in Comment #9

2. try to change guest memory allocation
# date; virsh setmem rhel8.2 1G; date
Thu Mar 19 23:16:18 EDT 2020

Thu Mar 19 23:16:18 EDT 2020
# virsh dommemstat rhel8.2
actual 1048576
swap_in 0
swap_out 0
major_fault 212
minor_fault 139225
unused 1183088
available 1354084
usable 1145764
last_update 1584672350
disk_caches 67852
hugetlb_pgalloc 0
hugetlb_pgfail 0
rss 1028196

command completes immediately and memory is updated

Comment 11 Lili Zhu 2020-03-20 16:23:04 UTC
Also tested the remaining APIs that used qemuDomainObjBeginJobWithAgent(). qemuDomainSetTime() was already covered in Comment #9, and qemuDomainPMSuspendForDuration() was skipped since S3/S4 is disabled in the guest; the other 5 were tested as follows:

1) set breakpoints on the corresponding qmp commands in qemu-guest-agent, listed in the 2nd column of the following matrix,

 - qemuDomainShutdownFlags()            ---> qmp_guest_shutdown      ---> virsh shutdown
 - qemuDomainReboot()                   ---> qmp_guest_shutdown      ---> virsh reboot
 - qemuDomainSetVcpusFlags()            ---> qmp_guest_set_vcpus     ---> virsh setvcpus --guest --count 
 - qemuDomainGetFSInfo()                ---> qmp_guest_get_fsinfo    ---> virsh domfsinfo
 - qemuDomainGetGuestInfo()             ---> qmp_guest_get_osinfo    ---> virsh guestinfo

2) execute the virsh command in the 3rd column

3) try to change guest memory allocation
# date; virsh setmem rhel8.2 $size; date
Fri Mar 20 12:22:03 EDT 2020

Fri Mar 20 12:22:03 EDT 2020

command completes immediately and memory is updated

Comment 12 Lili Zhu 2020-03-23 05:38:24 UTC
As the test results match the expected results, marking the bug as verified.

Comment 14 errata-xmlrpc 2020-05-05 09:50:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2017

