Bug 2001404

Summary: CVE-2021-4145 qemu-kvm: QEMU: NULL pointer dereference in mirror_wait_on_conflicts() in block/mirror.c [rhel-9.0]
Product: Red Hat Enterprise Linux 9
Component: qemu-kvm
qemu-kvm sub component: Storage
Version: 9.0
Hardware: x86_64
OS: Unspecified
Status: CLOSED ERRATA
Severity: low
Priority: high
Keywords: Security, SecurityTracking, Triaged
Reporter: Yanan Fu <yfu>
Assignee: Stefano Garzarella <sgarzare>
QA Contact: aihua liang <aliang>
CC: aliang, coli, hhan, kkiwi, mcascell, mrezanin, ngu, sgarzare, virt-maint, xfu, yfu
Target Milestone: rc
Fixed In Version: qemu-kvm-6.2.0-1.el9
Bug Blocks: 2002607, 2018367, 2034602
Type: Bug
Last Closed: 2022-05-17 12:24:17 UTC

Description Yanan Fu 2021-09-06 02:20:26 UTC
Description of problem:
QEMU crashes when committing a snapshot to its base image while fio is running in the guest.

Version-Release number of selected component (if applicable):
host:
qemu-kvm-6.1.0-1.el9
kernel-5.14.0-1.el9.x86_64


guest:
kernel-5.14.0-1.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. boot guest with system disk
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-3,addr=0x0 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel900-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \



2. create snapshot
{'execute': 'blockdev-create', 'arguments': {'options': {'driver': 'file', 'filename': '/root/avocado/data/avocado-vt/sn1.qcow2', 'size': 21474836480}, 'job-id': 'file_sn1'}, 'id': 'GEFOF0Ok'}
{"timestamp": {"seconds": 1630893646, "microseconds": 640248}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "file_sn1"}}
{"timestamp": {"seconds": 1630893646, "microseconds": 640428}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "file_sn1"}}
{"return": {}, "id": "GEFOF0Ok"}
{"timestamp": {"seconds": 1630893646, "microseconds": 641472}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "file_sn1"}}
{"timestamp": {"seconds": 1630893646, "microseconds": 641542}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "file_sn1"}}
{"timestamp": {"seconds": 1630893646, "microseconds": 641601}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "file_sn1"}}
{"timestamp": {"seconds": 1630893646, "microseconds": 924017}, "event": "RTC_CHANGE", "data": {"offset": 0}}
{'execute': 'job-dismiss', 'arguments': {'id': 'file_sn1'}, 'id': 'y4Og0JTS'}
{"timestamp": {"seconds": 1630893652, "microseconds": 808242}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "file_sn1"}}
{"return": {}, "id": "y4Og0JTS"}
{"execute": "blockdev-add", "arguments": {"node-name": "file_sn1", "driver": "file", "filename": "/root/avocado/data/avocado-vt/sn1.qcow2", "aio": "threads", "auto-read-only": true, "discard": "unmap"}, "id": "7122LIdl"}
{"return": {}, "id": "7122LIdl"}
{"execute": "blockdev-create", "arguments": {"options": {"driver": "qcow2", "file": "file_sn1", "size": 21474836480}, "job-id": "drive_sn1"}, "id": "DuGg41Qk"}
{"timestamp": {"seconds": 1630893663, "microseconds": 559471}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "drive_sn1"}}
{"timestamp": {"seconds": 1630893663, "microseconds": 559602}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "drive_sn1"}}
{"return": {}, "id": "DuGg41Qk"}
{"timestamp": {"seconds": 1630893663, "microseconds": 564397}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "drive_sn1"}}
{"timestamp": {"seconds": 1630893663, "microseconds": 564473}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "drive_sn1"}}
{"timestamp": {"seconds": 1630893663, "microseconds": 564527}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "drive_sn1"}}
{"execute": "job-dismiss", "arguments": {"id": "drive_sn1"}, "id": "ltXHq8Bu"}
{"timestamp": {"seconds": 1630893667, "microseconds": 278363}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "drive_sn1"}}
{"return": {}, "id": "ltXHq8Bu"}
{"execute": "blockdev-add", "arguments": {"node-name": "drive_sn1", "driver": "qcow2", "file": "file_sn1", "read-only": false}, "id": "MHmlEvUW"}
{"return": {}, "id": "MHmlEvUW"}
{"execute": "blockdev-snapshot", "arguments": {"overlay": "drive_sn1", "node": "drive_image1"}, "id": "B4UIbVn9"}

3. run fio test in guest
/usr/bin/fio --name=stress --filename=/home/atest --ioengine=libaio --rw=write --direct=1 --bs=4K --size=2G --iodepth=256 --numjobs=256 --runtime=1800

4. commit the snapshot to the base while fio is running
{'execute': 'block-commit', 'arguments': {'device': 'drive_sn1', 'job-id': 'drive_sn1_vwgZ'}, 'id': 'nstFLfKJ'}
{"timestamp": {"seconds": 1630893702, "microseconds": 96539}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "drive_sn1_vwgZ"}}
{"timestamp": {"seconds": 1630893702, "microseconds": 96617}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "drive_sn1_vwgZ"}}
{"return": {}, "id": "nstFLfKJ"}

QEMU crashes at this step.



Actual results:
QEMU crashes.

Expected results:
Committing the snapshot to the base while fio is running succeeds.

Additional info:
qemu command line:
/usr/libexec/qemu-kvm \
    -S  \
    -name 'avocado-vt-vm1'  \
    -sandbox on  \
    -machine q35,memory-backend=mem-machine_mem \
    -device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
    -device pcie-pci-bridge,id=pcie-pci-bridge-0,addr=0x0,bus=pcie-root-port-0  \
    -nodefaults \
    -device VGA,bus=pcie.0,addr=0x2 \
    -device i6300esb,bus=pcie-pci-bridge-0,addr=0x1 \
    -m 30720 \
    -object memory-backend-ram,size=30720M,id=mem-machine_mem  \
    -smp 20,maxcpus=20,cores=10,threads=1,dies=1,sockets=2  \
    -cpu 'Cascadelake-Server-noTSX',+kvm_pv_unhalt \
    -chardev socket,server=on,path=/tmp/avocado_2x9u27d2/monitor-qmpmonitor1-20210905-044708-NK08vaC8,id=qmp_id_qmpmonitor1,wait=off  \
    -mon chardev=qmp_id_qmpmonitor1,mode=control \
    -device ich9-usb-ehci1,id=usb1,addr=0x1d.0x7,multifunction=on,bus=pcie.0 \
    -device ich9-usb-uhci1,id=usb1.0,multifunction=on,masterbus=usb1.0,addr=0x1d.0x0,firstport=0,bus=pcie.0 \
    -device ich9-usb-uhci2,id=usb1.1,multifunction=on,masterbus=usb1.0,addr=0x1d.0x2,firstport=2,bus=pcie.0 \
    -device ich9-usb-uhci3,id=usb1.2,multifunction=on,masterbus=usb1.0,addr=0x1d.0x4,firstport=4,bus=pcie.0 \
    -device pcie-root-port,id=pcie-root-port-2,port=0x2,addr=0x1.0x2,bus=pcie.0,chassis=3 \
    -device qemu-xhci,id=usb2,bus=pcie-root-port-2,addr=0x0 \
    -device usb-tablet,id=usb-tablet1,bus=usb2.0,port=1 \
    -device pcie-root-port,id=pcie-root-port-3,port=0x3,addr=0x1.0x3,bus=pcie.0,chassis=4 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pcie-root-port-3,addr=0x0 \
    -blockdev node-name=file_image1,driver=file,auto-read-only=on,discard=unmap,aio=threads,filename=/home/kvm_autotest_root/images/rhel900-64-virtio-scsi.qcow2,cache.direct=on,cache.no-flush=off \
    -blockdev node-name=drive_image1,driver=qcow2,read-only=off,cache.direct=on,cache.no-flush=off,file=file_image1 \
    -device scsi-hd,id=image1,drive=drive_image1,write-cache=on \
    -device pcie-root-port,id=pcie-root-port-4,port=0x4,addr=0x1.0x4,bus=pcie.0,chassis=5 \
    -device virtio-net-pci,mac=9a:59:66:96:dd:3d,id=idU6Widf,netdev=idHu8x4m,bus=pcie-root-port-4,addr=0x0  \
    -netdev tap,id=idHu8x4m,vhost=on \
    -vnc :0  \
    -monitor stdio \

Comment 5 Stefano Garzarella 2021-09-09 15:42:11 UTC
(In reply to Yanan Fu from comment #4)
> Here is the backtrace:
> # gdb qemu-kvm core-qemu-kvm-380249-1630831981
> ...
> ...
> Core was generated by `/usr/libexec/qemu-kvm -S -name avocado-vt-vm1
> -sandbox on -machine q35,memory-b'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  mirror_wait_on_conflicts (self=0x0, s=<optimized out>, offset=<optimized
> out>, bytes=<optimized out>)
>     at ../block/mirror.c:172
> 172	                self->waiting_for_op = op;
> [Current thread is 1 (Thread 0x7f0908931ec0 (LWP 380249))]
> (gdb) bt
> #0  mirror_wait_on_conflicts (self=0x0, s=<optimized out>, offset=<optimized
> out>, bytes=<optimized out>)
>     at ../block/mirror.c:172
> #1  0x00005610c5d9d631 in mirror_run (job=0x5610c76a2c00, errp=<optimized
> out>) at ../block/mirror.c:491
> #2  0x00005610c5d58726 in job_co_entry (opaque=0x5610c76a2c00) at
> ../job.c:917
> #3  0x00005610c5f046c6 in coroutine_trampoline (i0=<optimized out>,
> i1=<optimized out>)
>     at ../util/coroutine-ucontext.c:173
> #4  0x00007f0909975820 in ?? () at
> ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91
>    from /usr/lib64/libc.so.6
> #5  0x00007f090892e980 in ?? ()
> #6  0x0000000000000000 in ?? ()

The issue seems related to the following commit released with QEMU 6.1:
d44dae1a7c block/mirror: fix active mirror dead-lock in mirror_wait_on_conflicts
https://gitlab.com/qemu-project/qemu/-/commit/d44dae1a7cf782ec9235746ebb0e6c1a20dd7288

In mirror_iteration() we call mirror_wait_on_conflicts() with the `self` parameter set to NULL.
Starting from commit d44dae1a7c, mirror_wait_on_conflicts() dereferences `self` without checking whether it is NULL.

I'll fix it.
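
For illustration, here is a minimal C sketch of the kind of NULL guard this needs, following the function and field names visible in the backtrace above. This is not necessarily the patch that was posted upstream: mirror_op_conflicts() is a hypothetical stand-in for the real chunk-overlap test, and the outer retry loop of the real function is omitted for brevity.

/*
 * Simplified sketch, not the verbatim upstream patch.  mirror_iteration()
 * calls this with self == NULL, so every dereference of self must be
 * guarded; self->waiting_for_op below is the access that faulted at
 * block/mirror.c:172.  mirror_op_conflicts() is a hypothetical helper
 * standing in for the real chunk-overlap check.
 */
static void coroutine_fn mirror_wait_on_conflicts(MirrorOp *self,
                                                  MirrorBlockJob *s,
                                                  uint64_t offset,
                                                  uint64_t bytes)
{
    MirrorOp *op;

    QTAILQ_FOREACH(op, &s->ops_in_flight, next) {
        if (op == self || !mirror_op_conflicts(op, offset, bytes)) {
            continue;
        }

        if (self) {
            /*
             * Only a real in-flight request can deadlock against op: skip
             * op if it is already waiting on us, otherwise record the
             * dependency before yielding.
             */
            if (op->waiting_for_op) {
                continue;
            }
            self->waiting_for_op = op;
        }

        /* Yield until the conflicting operation completes. */
        qemu_co_queue_wait(&op->waiting_requests, NULL);

        if (self) {
            self->waiting_for_op = NULL;
        }
        break;
    }
}

The key point is simply that both writes to self->waiting_for_op are reachable with self == NULL unless they are guarded.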

Comment 8 Stefano Garzarella 2021-09-10 10:23:18 UTC
Patch posted upstream: https://lists.nongnu.org/archive/html/qemu-devel/2021-09/msg02750.html

Comment 14 aihua liang 2021-09-26 03:09:22 UTC
In qemu-kvm-6.1.0-2.el9 and qemu-kvm-6.1.0-1.module+el8.6.0+12535+4e2af250, image clusters are leaked after the QEMU crash.
[root@dell-per440-09 images]# qemu-img check rhel900-64-virtio.qcow2 
Leaked cluster 175239 refcount=1 reference=0
Leaked cluster 175240 refcount=1 reference=0
Leaked cluster 175241 refcount=1 reference=0
Leaked cluster 175242 refcount=1 reference=0
Leaked cluster 175243 refcount=1 reference=0
Leaked cluster 175244 refcount=1 reference=0
Leaked cluster 175245 refcount=1 reference=0
Leaked cluster 175246 refcount=1 reference=0
Leaked cluster 175247 refcount=1 reference=0
Leaked cluster 175248 refcount=1 reference=0
Leaked cluster 175249 refcount=1 reference=0
Leaked cluster 175250 refcount=1 reference=0
Leaked cluster 175251 refcount=1 reference=0
Leaked cluster 175252 refcount=1 reference=0
Leaked cluster 175253 refcount=1 reference=0
Leaked cluster 175254 refcount=1 reference=0
Leaked cluster 175255 refcount=1 reference=0
Leaked cluster 175256 refcount=1 reference=0
Leaked cluster 175257 refcount=1 reference=0
Leaked cluster 175258 refcount=1 reference=0
Leaked cluster 175259 refcount=1 reference=0
Leaked cluster 175260 refcount=1 reference=0
Leaked cluster 175261 refcount=1 reference=0
Leaked cluster 175262 refcount=1 reference=0
Leaked cluster 175263 refcount=1 reference=0
Leaked cluster 175264 refcount=1 reference=0
Leaked cluster 175265 refcount=1 reference=0
Leaked cluster 175266 refcount=1 reference=0
Leaked cluster 175267 refcount=1 reference=0
Leaked cluster 175268 refcount=1 reference=0
Leaked cluster 175269 refcount=1 reference=0
Leaked cluster 175270 refcount=1 reference=0
Leaked cluster 175271 refcount=1 reference=0
Leaked cluster 175272 refcount=1 reference=0
Leaked cluster 175273 refcount=1 reference=0
Leaked cluster 175274 refcount=1 reference=0
Leaked cluster 175275 refcount=1 reference=0
Leaked cluster 175276 refcount=1 reference=0
Leaked cluster 175277 refcount=1 reference=0
Leaked cluster 175278 refcount=1 reference=0
Leaked cluster 175279 refcount=1 reference=0
Leaked cluster 175280 refcount=1 reference=0
Leaked cluster 175281 refcount=1 reference=0
Leaked cluster 175282 refcount=1 reference=0
Leaked cluster 175283 refcount=1 reference=0
Leaked cluster 175284 refcount=1 reference=0
Leaked cluster 175285 refcount=1 reference=0
Leaked cluster 175286 refcount=1 reference=0
Leaked cluster 175287 refcount=1 reference=0
Leaked cluster 175288 refcount=1 reference=0
Leaked cluster 175289 refcount=1 reference=0
Leaked cluster 175290 refcount=1 reference=0
Leaked cluster 175291 refcount=1 reference=0
Leaked cluster 175292 refcount=1 reference=0
Leaked cluster 175293 refcount=1 reference=0
Leaked cluster 175294 refcount=1 reference=0
Leaked cluster 175295 refcount=1 reference=0
Leaked cluster 175296 refcount=1 reference=0
Leaked cluster 175297 refcount=1 reference=0
Leaked cluster 175298 refcount=1 reference=0
Leaked cluster 175299 refcount=1 reference=0
Leaked cluster 175300 refcount=1 reference=0
Leaked cluster 175301 refcount=1 reference=0
Leaked cluster 175302 refcount=1 reference=0
Leaked cluster 175303 refcount=1 reference=0
Leaked cluster 175304 refcount=1 reference=0
Leaked cluster 175305 refcount=1 reference=0
Leaked cluster 175306 refcount=1 reference=0
Leaked cluster 175307 refcount=1 reference=0
Leaked cluster 175308 refcount=1 reference=0
Leaked cluster 175309 refcount=1 reference=0
Leaked cluster 175310 refcount=1 reference=0
Leaked cluster 175311 refcount=1 reference=0
Leaked cluster 175312 refcount=1 reference=0
Leaked cluster 175313 refcount=1 reference=0
Leaked cluster 175314 refcount=1 reference=0
Leaked cluster 175315 refcount=1 reference=0
Leaked cluster 175316 refcount=1 reference=0
Leaked cluster 175317 refcount=1 reference=0
Leaked cluster 175318 refcount=1 reference=0
Leaked cluster 175319 refcount=1 reference=0
Leaked cluster 175320 refcount=1 reference=0
Leaked cluster 175321 refcount=1 reference=0
Leaked cluster 175322 refcount=1 reference=0
Leaked cluster 175323 refcount=1 reference=0
Leaked cluster 175324 refcount=1 reference=0
Leaked cluster 175325 refcount=1 reference=0
Leaked cluster 175326 refcount=1 reference=0
Leaked cluster 175327 refcount=1 reference=0
Leaked cluster 175328 refcount=1 reference=0
Leaked cluster 175329 refcount=1 reference=0
Leaked cluster 175330 refcount=1 reference=0
Leaked cluster 175331 refcount=1 reference=0
Leaked cluster 175332 refcount=1 reference=0
Leaked cluster 175333 refcount=1 reference=0
Leaked cluster 175334 refcount=1 reference=0
Leaked cluster 175335 refcount=1 reference=0
Leaked cluster 175336 refcount=1 reference=0
Leaked cluster 175337 refcount=1 reference=0
Leaked cluster 175338 refcount=1 reference=0
Leaked cluster 175339 refcount=1 reference=0
Leaked cluster 175340 refcount=1 reference=0
Leaked cluster 175341 refcount=1 reference=0
Leaked cluster 175342 refcount=1 reference=0
Leaked cluster 175343 refcount=1 reference=0
Leaked cluster 175344 refcount=1 reference=0
Leaked cluster 175345 refcount=1 reference=0
Leaked cluster 175346 refcount=1 reference=0
Leaked cluster 175347 refcount=1 reference=0
Leaked cluster 175348 refcount=1 reference=0
Leaked cluster 175349 refcount=1 reference=0
Leaked cluster 175350 refcount=1 reference=0
Leaked cluster 175351 refcount=1 reference=0
Leaked cluster 175352 refcount=1 reference=0
Leaked cluster 175353 refcount=1 reference=0
Leaked cluster 175354 refcount=1 reference=0
Leaked cluster 175355 refcount=1 reference=0
Leaked cluster 175356 refcount=1 reference=0
Leaked cluster 175357 refcount=1 reference=0
Leaked cluster 175358 refcount=1 reference=0
Leaked cluster 175359 refcount=1 reference=0
Leaked cluster 175360 refcount=1 reference=0
Leaked cluster 175361 refcount=1 reference=0
Leaked cluster 175362 refcount=1 reference=0
Leaked cluster 175363 refcount=1 reference=0
Leaked cluster 175364 refcount=1 reference=0
Leaked cluster 175365 refcount=1 reference=0
Leaked cluster 175366 refcount=1 reference=0
Leaked cluster 175367 refcount=1 reference=0
Leaked cluster 175368 refcount=1 reference=0
Leaked cluster 175369 refcount=1 reference=0
Leaked cluster 175370 refcount=1 reference=0
Leaked cluster 175371 refcount=1 reference=0
Leaked cluster 175372 refcount=1 reference=0
Leaked cluster 175373 refcount=1 reference=0
Leaked cluster 175374 refcount=1 reference=0
Leaked cluster 175375 refcount=1 reference=0
Leaked cluster 175376 refcount=1 reference=0
Leaked cluster 175377 refcount=1 reference=0
Leaked cluster 175378 refcount=1 reference=0
Leaked cluster 175379 refcount=1 reference=0
Leaked cluster 175380 refcount=1 reference=0
Leaked cluster 175381 refcount=1 reference=0
Leaked cluster 175382 refcount=1 reference=0
Leaked cluster 175383 refcount=1 reference=0
Leaked cluster 175384 refcount=1 reference=0
Leaked cluster 175385 refcount=1 reference=0
Leaked cluster 175386 refcount=1 reference=0
Leaked cluster 175387 refcount=1 reference=0
Leaked cluster 175388 refcount=1 reference=0
Leaked cluster 175389 refcount=1 reference=0
Leaked cluster 175390 refcount=1 reference=0
Leaked cluster 175391 refcount=1 reference=0
Leaked cluster 175392 refcount=1 reference=0
Leaked cluster 175393 refcount=1 reference=0
Leaked cluster 175394 refcount=1 reference=0

156 leaked clusters were found on the image.
This means waste of disk space, but no harm to data.
175203/327680 = 53.47% allocated, 5.20% fragmented, 0.00% compressed clusters
Image end offset: 11494752256

The latest versions, qemu-kvm-core-6.1.0-1.module+el8.6.0+12721+8d053ff2.x86_64 and qemu-kvm-6.1.0-3.el9, also hit this issue.

Comment 15 Han Han 2021-12-10 01:58:52 UTC
*** Bug 2030708 has been marked as a duplicate of this bug. ***

Comment 16 Han Han 2021-12-10 02:05:25 UTC
Hi, in another instance of this bug (https://bugzilla.redhat.com/show_bug.cgi?id=2030708#c0), an unprivileged user can cause the QEMU process to crash.
So it is a possible DoS vulnerability.
Adding the security keyword here.

Comment 17 Mauro Matteo Cascella 2021-12-15 16:00:26 UTC
Hi Han,

> bug (https://bugzilla.redhat.com/show_bug.cgi?id=2030708#c0), an unprivileged
> user can cause the QEMU process to crash.
> So it is a possible DoS vulnerability.
> Adding the security keyword here.

Could you please elaborate on this? Doesn't the user need proper privileges to create a snapshot of the guest and/or execute a block commit operation? I'm not sure we should call it a security issue if that's the case, because I don't see any trust boundary crossed.

Comment 18 Han Han 2021-12-17 09:54:36 UTC
(In reply to Mauro Matteo Cascella from comment #17)
> Hi Han,
> 
> > bug (https://bugzilla.redhat.com/show_bug.cgi?id=2030708#c0), an unprivileged
> > user can cause the QEMU process to crash.
> > So it is a possible DoS vulnerability.
> > Adding the security keyword here.
> 
> Could you please elaborate on this? Doesn't the user need proper privileges
OK. Let's see a method to reproduce this bug: https://bugzilla.redhat.com/show_bug.cgi?id=2030708#c0

The QEMU segmentation fault is caused by the step `ssh hhan@$IP dd if=/dev/urandom of=file bs=1G count=1`. This means that, under some conditions, an unprivileged user **hhan** inside the VM can cause QEMU to segfault.
> to create a snapshot of the guest and/or execute a block commit operation?
> I'm not sure we should call it a security issue if that's the case, because
> I don't see any trust boundary crossed.
The trust boundary is that an unprivileged user inside the VM should not be able to crash QEMU.

Comment 19 Yanan Fu 2021-12-20 12:45:07 UTC
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.

Comment 20 aihua liang 2021-12-21 08:18:16 UTC
Tested with qemu-kvm-6.2.0-1.el9; the core dump issue no longer occurs.

Comment 21 Mauro Matteo Cascella 2021-12-21 09:45:33 UTC
(In reply to Han Han from comment #18)
> The QEMU segmentation fault is caused by the step `ssh hhan@$IP dd
> if=/dev/urandom of=file bs=1G count=1`. This means that, under some
> conditions, an unprivileged user **hhan** inside the VM can cause QEMU to
> segfault.

> The trust boundary is that an unprivileged user inside the VM should not be
> able to crash QEMU.

OK, so if I understand correctly, we can consider the various operations on the host (e.g., snapshot-create, blockcommit, domblkthreshold, etc.) as preconditions for this bug to happen. Under such circumstances the guest user writes a huge file and triggers the flaw. No matter how likely these preconditions are, I agree that this should not happen. I think we can opt for a low-severity CVE here.

Comment 24 aihua liang 2021-12-23 06:05:29 UTC
Hi, Han Han,

 Please help check whether the security issue still exists in qemu-kvm-6.2.0-1.el9. If not, I will change the bug's status to "VERIFIED".

Thanks,
Aliang

Comment 25 Han Han 2021-12-23 07:06:47 UTC
(In reply to aihua liang from comment #24)
> Hi, Han Han,
> 
>  Please help check whether the security issue still exists in
> qemu-kvm-6.2.0-1.el9. If not, I will change the bug's status to "VERIFIED".
> 
> Thanks,
> Aliang

Well, my machine resources are occupied by other testing right now.
You can try to reproduce it with the scripts from https://bugzilla.redhat.com/show_bug.cgi?id=2030708
If it is not reproduced after thousands of loops, I think that means the bug has been fixed.

Comment 26 aihua liang 2021-12-23 07:32:33 UTC
As per comment 25, it needs more time to check the CVE issue, and there is no idle resource at present. So re-setting the ITM.

Comment 27 aihua liang 2022-01-06 11:02:22 UTC
Reproduced the security issue that Han Han reported in bz2030708.

Even if we don't set the block threshold and use the root user, we can still trigger this issue.

Reproduce ratio:
  1/44

Test Env:
  kernel version:5.14.0-30.el9.x86_64
  qemu-kvm version:qemu-kvm-6.1.0-8.el9
  libvirt version:libvirt-7.10.0-1.el9.x86_64

Steps to Reproduce:
1. Prepare a VM named avocado-vt-vm1
2. Monitor libvirt events
# virsh event avocado-vt-vm1 --loop --all

3. Run the script below to write data when the block job reaches the ready state
#!/bin/bash - 
IP=192.168.122.156 # the IP of the guest
VM=avocado-vt-vm1
while true;do
    virsh start $VM
    sleep 30
    virsh snapshot-create $VM --no-metadata --disk-only
    virsh blockcommit $VM vda --active
    ssh root@$IP dd if=/dev/urandom of=file bs=1G count=1
    sleep $(shuf -i 1-10 -n1)
    virsh blockjob $VM vda --pivot
    virsh destroy $VM
    if [ $? -ne 0 ];then
        break
    fi
done

Actual results:
Sometimes QEMU gets a segmentation fault:

Domain 'avocado-vt-vm1' started

Domain snapshot 1641461222 created
Active Block Commit started
0+1 records in
0+1 records out
33554431 bytes (34 MB, 32 MiB) copied, 0.212443 s, 158 MB/s
error: Requested operation is not valid: domain is not running

error: Failed to destroy domain 'avocado-vt-vm1'
error: Requested operation is not valid: domain is not running

The event log:
event 'agent-lifecycle' for domain 'avocado-vt-vm1': state: 'disconnected' reason: 'domain started'
event 'lifecycle' for domain 'avocado-vt-vm1': Resumed Unpaused
event 'lifecycle' for domain 'avocado-vt-vm1': Started Booted
event 'agent-lifecycle' for domain 'avocado-vt-vm1': state: 'connected' reason: 'channel event'
event 'rtc-change' for domain 'avocado-vt-vm1': -1
event 'block-job' for domain 'avocado-vt-vm1': Active Block Commit for /var/lib/avocado/data/avocado-vt/images/rhel900-64-virtio-scsi.1641461222 ready
event 'block-job-2' for domain 'avocado-vt-vm1': Active Block Commit for vda ready
event 'lifecycle' for domain 'avocado-vt-vm1': Stopped Failed



Hi, Mauro

 From my reproduction results:
  The root cause of this issue is data being written while the block job is running, no matter whether the user is root or a common user.
  So is it still a security issue?

BR,
Aliang

Comment 28 aihua liang 2022-01-07 11:46:24 UTC
Tested the issue that Han Han reported in bz2030708 900 times; all passed.

So setting the bug's status to "Verified".

Comment 29 Mauro Matteo Cascella 2022-01-10 15:03:52 UTC
Hi Aliang,

(In reply to aihua liang from comment #27)
>  From my reproduction results:
>   The root cause of this issue is data being written while the block job is
> running, no matter whether the user is root or a common user.
>   So is it still a security issue?

Well, considering that an unprivileged user can trigger this bug (see comment 18), it doesn't come as a surprise that root is also able to trigger it. As long as you can make QEMU crash from within the guest, I think we should treat this as a (low) security issue.

Regards.

Comment 31 errata-xmlrpc 2022-05-17 12:24:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: qemu-kvm), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2307