Bug 2001404
| Field | Value |
|---|---|
| Summary | CVE-2021-4145 qemu-kvm: QEMU: NULL pointer dereference in mirror_wait_on_conflicts() in block/mirror.c [rhel-9.0] |
| Product | Red Hat Enterprise Linux 9 |
| Reporter | Yanan Fu <yfu> |
| Component | qemu-kvm |
| Assignee | Stefano Garzarella <sgarzare> |
| qemu-kvm sub component | Storage |
| QA Contact | aihua liang <aliang> |
| Status | CLOSED ERRATA |
| Docs Contact | |
| Severity | low |
| Priority | high |
| CC | aliang, coli, hhan, kkiwi, mcascell, mrezanin, ngu, sgarzare, virt-maint, xfu, yfu |
| Version | 9.0 |
| Keywords | Security, SecurityTracking, Triaged |
| Target Milestone | rc |
| Target Release | --- |
| Hardware | x86_64 |
| OS | Unspecified |
| Whiteboard | |
| Fixed In Version | qemu-kvm-6.2.0-1.el9 |
| Doc Type | If docs needed, set a value |
| Doc Text | |
| Story Points | --- |
| Clone Of | |
| Clones | 2002607 (view as bug list) |
| Environment | |
| Last Closed | 2022-05-17 12:24:17 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 2002607, 2018367, 2034602 |
Description
Yanan Fu, 2021-09-06 02:20:26 UTC
(In reply to Yanan Fu from comment #4)
> Here is the backtrace:

```
# gdb qemu-kvm core-qemu-kvm-380249-1630831981
...
...
Core was generated by `/usr/libexec/qemu-kvm -S -name avocado-vt-vm1 -sandbox on -machine q35,memory-b'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  mirror_wait_on_conflicts (self=0x0, s=<optimized out>, offset=<optimized out>, bytes=<optimized out>)
    at ../block/mirror.c:172
172             self->waiting_for_op = op;
[Current thread is 1 (Thread 0x7f0908931ec0 (LWP 380249))]
(gdb) bt
#0  mirror_wait_on_conflicts (self=0x0, s=<optimized out>, offset=<optimized out>, bytes=<optimized out>)
    at ../block/mirror.c:172
#1  0x00005610c5d9d631 in mirror_run (job=0x5610c76a2c00, errp=<optimized out>) at ../block/mirror.c:491
#2  0x00005610c5d58726 in job_co_entry (opaque=0x5610c76a2c00) at ../job.c:917
#3  0x00005610c5f046c6 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>)
    at ../util/coroutine-ucontext.c:173
#4  0x00007f0909975820 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91
    from /usr/lib64/libc.so.6
#5  0x00007f090892e980 in ?? ()
#6  0x0000000000000000 in ?? ()
```

The issue seems related to the following commit, released with QEMU 6.1:

d44dae1a7c block/mirror: fix active mirror dead-lock in mirror_wait_on_conflicts
https://gitlab.com/qemu-project/qemu/-/commit/d44dae1a7cf782ec9235746ebb0e6c1a20dd7288

In mirror_iteration() we call mirror_wait_on_conflicts() with the `self` parameter set to NULL. Starting from commit d44dae1a7c, mirror_wait_on_conflicts() accesses `self` without checking whether it can be NULL. I'll fix it.

Patch posted upstream: https://lists.nongnu.org/archive/html/qemu-devel/2021-09/msg02750.html

Merged upstream: https://gitlab.com/qemu-project/qemu/-/commit/66fed30c9cd11854fc878a4eceb507e915d7c9cd
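To make the failure mode concrete, here is a minimal, self-contained C sketch, not QEMU's actual code: the struct and function names are simplified stand-ins for MirrorOp and mirror_wait_on_conflicts() in block/mirror.c, and the real fix is the upstream commit linked above. The point it illustrates is that mirror_iteration() may legitimately pass `self == NULL`, so every access to `self` must be guarded.

```c
/*
 * Simplified stand-in for the pattern in block/mirror.c; NOT the verbatim
 * upstream patch (see commit 66fed30c9cd1 for the real fix).
 */
#include <stdio.h>
#include <stddef.h>

typedef struct MirrorOp {
    struct MirrorOp *waiting_for_op;   /* operation this op is waiting on */
} MirrorOp;

/* Pre-fix shape: dereferences self unconditionally, so self == NULL crashes. */
static void wait_on_conflicts_buggy(MirrorOp *self, MirrorOp *conflicting)
{
    self->waiting_for_op = conflicting;   /* SIGSEGV when self == NULL */
    /* ... wait for the conflicting operation to complete ... */
    self->waiting_for_op = NULL;
}

/* Post-fix shape: only touch *self when there is a real waiting operation. */
static void wait_on_conflicts_fixed(MirrorOp *self, MirrorOp *conflicting)
{
    if (self) {
        self->waiting_for_op = conflicting;
    }
    /* ... wait for the conflicting operation to complete ... */
    if (self) {
        self->waiting_for_op = NULL;
    }
}

int main(void)
{
    MirrorOp conflicting = { .waiting_for_op = NULL };
    MirrorOp waiter = { .waiting_for_op = NULL };

    wait_on_conflicts_buggy(&waiter, &conflicting);  /* fine with a real waiter */
    wait_on_conflicts_fixed(NULL, &conflicting);     /* NULL self is tolerated  */
    /* wait_on_conflicts_buggy(NULL, &conflicting); would segfault, as in the
     * backtrace above, because mirror_iteration() passes self == NULL. */

    printf("NULL self handled without crashing\n");
    return 0;
}
```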
In qemu-kvm-6.1.0-2.el9 and qemu-kvm-6.1.0-1.module+el8.6.0+12535+4e2af250, the image leaks clusters after the qemu crash:

```
[root@dell-per440-09 images]# qemu-img check rhel900-64-virtio.qcow2
Leaked cluster 175239 refcount=1 reference=0
Leaked cluster 175240 refcount=1 reference=0
Leaked cluster 175241 refcount=1 reference=0
...
Leaked cluster 175393 refcount=1 reference=0
Leaked cluster 175394 refcount=1 reference=0

156 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
175203/327680 = 53.47% allocated, 5.20% fragmented, 0.00% compressed clusters
Image end offset: 11494752256
```

The latest versions, qemu-kvm-core-6.1.0-1.module+el8.6.0+12721+8d053ff2.x86_64 and qemu-kvm-6.1.0-3.el9, also hit this issue.

*** Bug 2030708 has been marked as a duplicate of this bug. ***

Hi, from another case of this bug (https://bugzilla.redhat.com/show_bug.cgi?id=2030708#c0), an unprivileged user can cause the qemu process to crash, so this is a possible DoS vulnerability. Adding the security keyword here.

Hi Han,

> Hi, from another case of this bug (https://bugzilla.redhat.com/show_bug.cgi?id=2030708#c0),
> an unprivileged user can cause the qemu process to crash,
> so this is a possible DoS vulnerability.
> Adding the security keyword here.

Could you please elaborate on this? Doesn't the user need proper privileges to create a snapshot of the guest and/or execute a block commit operation? I'm not sure we should call it a security issue if that's the case, because I don't see any trust boundary crossed.
(In reply to Mauro Matteo Cascella from comment #17)
> Hi Han,
>
> > Hi, from another case of this bug (https://bugzilla.redhat.com/show_bug.cgi?id=2030708#c0),
> > an unprivileged user can cause the qemu process to crash,
> > so this is a possible DoS vulnerability.
> > Adding the security keyword here.
>
> Could you please elaborate on this? Doesn't the user need proper privileges

OK. Let's look at a method to reproduce this bug: https://bugzilla.redhat.com/show_bug.cgi?id=2030708#c0

The qemu segmentation fault is caused by the step `ssh hhan@$IP dd if=/dev/urandom of=file bs=1G count=1`. That means that, under some conditions, the unprivileged user **hhan** inside the VM can cause a qemu segmentation fault.

> to create a snapshot of the guest and/or execute a block commit operation?
> I'm not sure we should call it a security issue if that's the case, because
> I don't see any trust boundary crossed.

The trust boundary is that an unprivileged user inside the VM should not be able to make qemu crash.

QE bot (pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Tested with qemu-kvm-6.2.0-1.el9; the core dump issue no longer occurs.

(In reply to Han Han from comment #18)
> The qemu segmentation fault is caused by the step `ssh hhan@$IP dd
> if=/dev/urandom of=file bs=1G count=1`. That means that, under some conditions,
> the unprivileged user **hhan** inside the VM can cause a qemu segmentation fault.
> The trust boundary is that an unprivileged user inside the VM should not be
> able to make qemu crash.

OK, so if I understand correctly, we can consider the various operations on the host (e.g., snapshot-create, blockcommit, domblkthreshold, etc.) as a precondition for this bug to happen. Under such circumstances the guest user writes a huge file and triggers the flaw. No matter how likely these preconditions are, I agree that this should not happen. I think we can opt for a low-severity CVE here.

Hi, Hanhan

Please help to check whether the security issue still exists in qemu-kvm-6.2.0-1.el9. If not, I will change the bug's status to "VERIFIED".

Thanks,
Aliang

(In reply to aihua liang from comment #24)
> Please help to check whether the security issue still exists in
> qemu-kvm-6.2.0-1.el9. If not, I will change the bug's status to "VERIFIED".

Well, my machine resources are occupied by other testing now. You can try to reproduce it with the scripts from https://bugzilla.redhat.com/show_bug.cgi?id=2030708. If it is not reproduced after thousands of loops, I think that means the bug has been fixed.

As per comment 25, checking the CVE issue needs more time and there is no idle resource at present, so the ITM is re-set.

Reproduced the security issue that Han Han reported in bz2030708. Even if we don't set the block threshold and use the root user, we can still trigger this issue.
Reproduction ratio: 1/44

Test Env:
kernel version: 5.14.0-30.el9.x86_64
qemu-kvm version: qemu-kvm-6.1.0-8.el9
libvirt version: libvirt-7.10.0-1.el9.x86_64

Steps to Reproduce:
1. Prepare a VM named avocado-vt-vm1
2. Monitor the libvirt events:
   # virsh event avocado-vt-vm1 --loop --all
3. Run the script below to write data when the block job reaches the ready status:

```
#!/bin/bash -
IP=192.168.122.156   # the IP of the guest
VM=avocado-vt-vm1
while true; do
    virsh start $VM
    sleep 30
    virsh snapshot-create $VM --no-metadata --disk-only
    virsh blockcommit $VM vda --active
    ssh root@$IP dd if=/dev/urandom of=file bs=1G count=1
    sleep $(shuf -i 1-10 -n1)
    virsh blockjob $VM vda --pivot
    virsh destroy $VM
    if [ $? -ne 0 ]; then
        break
    fi
done
```

Actual results:
Sometimes qemu gets a segmentation fault:

```
Domain 'avocado-vt-vm1' started
Domain snapshot 1641461222 created
Active Block Commit started
0+1 records in
0+1 records out
33554431 bytes (34 MB, 32 MiB) copied, 0.212443 s, 158 MB/s
error: Requested operation is not valid: domain is not running
error: Failed to destroy domain 'avocado-vt-vm1'
error: Requested operation is not valid: domain is not running
```

The event log:

```
event 'agent-lifecycle' for domain 'avocado-vt-vm1': state: 'disconnected' reason: 'domain started'
event 'lifecycle' for domain 'avocado-vt-vm1': Resumed Unpaused
event 'lifecycle' for domain 'avocado-vt-vm1': Started Booted
event 'agent-lifecycle' for domain 'avocado-vt-vm1': state: 'connected' reason: 'channel event'
event 'rtc-change' for domain 'avocado-vt-vm1': -1
event 'block-job' for domain 'avocado-vt-vm1': Active Block Commit for /var/lib/avocado/data/avocado-vt/images/rhel900-64-virtio-scsi.1641461222 ready
event 'block-job-2' for domain 'avocado-vt-vm1': Active Block Commit for vda ready
event 'lifecycle' for domain 'avocado-vt-vm1': Stopped Failed
```

Hi, Mauro

From my reproduction results, the root cause of this issue is data writing while the block job is running, no matter whether the user is root or a common user. So is it still a security-related issue?

BR,
Aliang

Tested the issue that Han Han reported in bz2030708 for 900 iterations; all passed. So the bug's status is set to "Verified".

Hi Aliang,

(In reply to aihua liang from comment #27)
> From my reproduction results, the root cause of this issue is data writing
> while the block job is running, no matter whether the user is root or a
> common user. So is it still a security-related issue?

Well, considering that an unprivileged user can trigger this bug (see comment 18), it doesn't come as a surprise that root is also able to trigger it. As long as you can make QEMU crash from within the guest, I think we should treat this as a (low) security issue.

Regards.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (new packages: qemu-kvm), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2307