Description of problem: qemu-kvm crashed when starting guest with hugepage + cpu hotpluggable + kvm dirty-ring enabled Version-Release number of selected component (if applicable): libvirt-8.5.0-5.el9.x86_64 qemu-kvm-7.0.0-12.el9.x86_64 How reproducible: 100% Steps to Reproduce: 1.Edit guest xml with hugepage + cpu hotpluggable + kvm dirty-ring enabled setting: #virsh edit r9 <domain> ... <memoryBacking> <hugepages> <page size='2048' unit='KiB'/> </hugepages> </memoryBacking> <vcpus> <vcpu id='0' enabled='yes' hotpluggable='no' order='1'/> <vcpu id='1' enabled='yes' hotpluggable='no' order='2'/> <vcpu id='2' enabled='yes' hotpluggable='no' order='3'/> <vcpu id='3' enabled='yes' hotpluggable='no' order='4'/> <vcpu id='4' enabled='yes' hotpluggable='no' order='5'/> <vcpu id='5' enabled='yes' hotpluggable='no' order='6'/> <vcpu id='6' enabled='yes' hotpluggable='no' order='7'/> <vcpu id='7' enabled='yes' hotpluggable='no' order='8'/> <vcpu id='8' enabled='no' hotpluggable='yes'/> <vcpu id='9' enabled='no' hotpluggable='yes'/> <vcpu id='10' enabled='no' hotpluggable='yes'/> <vcpu id='11' enabled='no' hotpluggable='yes'/> <vcpu id='12' enabled='no' hotpluggable='yes'/> <vcpu id='13' enabled='no' hotpluggable='yes'/> <vcpu id='14' enabled='no' hotpluggable='yes'/> <vcpu id='15' enabled='no' hotpluggable='yes'/> </vcpus> <features> <acpi/> <apic/> <pae/> <kvm> <poll-control state='on'/> <pv-ipi state='off'/> <dirty-ring state='on' size='4096'/> </kvm> </features> ... </domain> 2.Start the guest: # virsh start r9 error: Failed to start domain 'r9' error: internal error: qemu unexpectedly closed the monitor: qemu-kvm: ../accel/kvm/kvm-all.c:737: uint32_t kvm_dirty_ring_reap_one(KVMState *, CPUState *): Assertion `dirty_gfns && ring_size' failed. 3.Check the coredump file: # coredumpctl list TIME PID UID GID SIG COREFILE EXE SIZE Tue 2022-09-06 23:01:06 EDT 38568 107 107 SIGABRT present /usr/libexec/qemu-kvm 747.2K 4.Check the backtrace: (gdb) t a a bt Thread 5 (Thread 0x7fac3bbff640 (LWP 44860)): #0 futex_wait (private=0, expected=2, futex_word=0x5623367c1688 <qemu_global_mutex>) at ../sysdeps/nptl/futex-internal.h:146 #1 __GI___lll_lock_wait (futex=futex@entry=0x5623367c1688 <qemu_global_mutex>, private=0) at lowlevellock.c:50 #2 0x00007facce410c22 in lll_mutex_lock_optimized (mutex=0x5623367c1688 <qemu_global_mutex>) at pthread_mutex_lock.c:49 #3 ___pthread_mutex_lock (mutex=0x5623367c1688 <qemu_global_mutex>) at pthread_mutex_lock.c:89 #4 0x000056233611ad7f in qemu_mutex_lock_impl (mutex=0x5623367c1688 <qemu_global_mutex>, file=0x80 <error: Cannot access memory at address 0x80>, line=2) at ../util/qemu-thread-posix.c:80 #5 0x0000562335eec972 in kvm_vcpu_thread_fn (arg=0x5623384d0300) at ../softmmu/cpus.c:503 #6 0x000056233611bbfa in qemu_thread_start (args=0x5623384e0250) at ../util/qemu-thread-posix.c:556 #7 0x00007facce40d802 in start_thread (arg=<optimized out>) at pthread_create.c:443 #8 0x00007facce3ad314 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100 Thread 4 (Thread 0x7faccaf8a640 (LWP 44859)): #0 0x00007facce4b071f in __GI___poll (fds=0x7facbc0035d0, nfds=3, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29 #1 0x00007facce72359c in g_main_context_poll (priority=<optimized out>, n_fds=3, fds=0x7facbc0035d0, timeout=<optimized out>, context=0x5623384bbb00) at ../glib/gmain.c:4434 #2 g_main_context_iterate.constprop.0 (context=0x5623384bbb00, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib/gmain.c:4126 #3 0x00007facce6ce463 in g_main_loop_run 
(loop=0x5623382d3220) at ../glib/gmain.c:4329 #4 0x0000562335f3ba9f in iothread_run (opaque=0x562338364640) at ../iothread.c:74 #5 0x000056233611bbfa in qemu_thread_start (args=0x5623382d12f0) at ../util/qemu-thread-posix.c:556 #6 0x00007facce40d802 in start_thread (arg=<optimized out>) at pthread_create.c:443 #7 0x00007facce3ad314 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100 Thread 3 (Thread 0x7faccc390640 (LWP 44851)): #0 0x00007facce481845 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7faccc38f5e0, rem=rem@entry=0x7faccc38f5d0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48 #1 0x00007facce4863f7 in __GI___nanosleep (req=req@entry=0x7faccc38f5e0, rem=rem@entry=0x7faccc38f5d0) at ../sysdeps/unix/sysv--Type <RET> for more, q to quit, c to continue without paging-- /linux/nanosleep.c:25 #2 0x00007facce6f2ef7 in g_usleep (microseconds=<optimized out>) at ../glib/gtimer.c:277 #3 0x000056233612781a in call_rcu_thread (opaque=<optimized out>) at ../util/rcu.c:253 #4 0x000056233611bbfa in qemu_thread_start (args=0x56233827f4e0) at ../util/qemu-thread-posix.c:556 #5 0x00007facce40d802 in start_thread (arg=<optimized out>) at pthread_create.c:443 #6 0x00007facce3ad450 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 Thread 2 (Thread 0x7faccd395f00 (LWP 44708)): #0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5623367c1640 <qemu_cpu_cond+40>) at futex-internal.c:57 #1 __futex_abstimed_wait_common (futex_word=futex_word@entry=0x5623367c1640 <qemu_cpu_cond+40>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at futex-internal.c:87 #2 0x00007facce40a3ff in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x5623367c1640 <qemu_cpu_cond+40>, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at futex-internal.c:139 #3 0x00007facce40cba0 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x5623367c1688 <qemu_global_mutex>, cond=0x5623367c1618 <qemu_cpu_cond>) at pthread_cond_wait.c:504 #4 ___pthread_cond_wait (cond=0x5623367c1618 <qemu_cpu_cond>, mutex=0x5623367c1688 <qemu_global_mutex>) at pthread_cond_wait.c:619 #5 0x000056233611b25f in qemu_cond_wait_impl (cond=0x5623367c1640 <qemu_cpu_cond+40>, mutex=0x189, file=0x0, line=-834624614) at ../util/qemu-thread-posix.c:195 #6 0x0000562335c89d37 in qemu_init_vcpu (cpu=0x5623384d0300) at ../softmmu/cpus.c:643 #7 0x0000562335d62b99 in x86_cpu_realizefn (dev=0x5623384d0300, errp=0x7faccd391ba0) at ../target/i386/cpu.c:6554 #8 0x0000562335efbdb1 in device_set_realized (obj=0x5623384d0300, value=true, errp=0x7faccd391bc8) at ../hw/core/qdev.c:531 #9 0x0000562335f06119 in property_set_bool (obj=0x5623384d0300, v=<optimized out>, name=<optimized out>, opaque=0x5623382f29a0, errp=0x7faccd391bc8) at ../qom/object.c:2273 #10 0x0000562335f017de in object_property_set (obj=0x5623384d0300, name=0x5623361c4aaf "realized", v=0x5623384dfaf0, errp=0x7faccd391bc8) at ../qom/object.c:1408 #11 0x0000562335f09aac in object_property_set_qobject (obj=0x5623384d0300, name=0x5623361c4aaf "realized", value=0x5623384bdce0--Type <RET> for more, q to quit, c to continue without paging-- , errp=0x5623367f1d48 <error_fatal>) at ../qom/qom-qobject.c:28 #12 0x0000562335f04647 in object_property_set_bool (obj=0x5623384d0300, name=0x5623361c4aaf "realized", value=true, errp=0x5623367f1d48 
<error_fatal>) at ../qom/object.c:1477 #13 0x0000562335d2f750 in x86_cpu_new (x86ms=<optimized out>, apic_id=0, errp=0x5623367f1d48 <error_fatal>) at ../hw/core/qdev.c:333 #14 0x0000562335d2f86a in x86_cpus_init (x86ms=0x5623384318c0, default_cpu_version=<optimized out>) at ../hw/i386/x86.c:128 #15 0x0000562335d35b86 in pc_q35_init (machine=0x5623384318c0) at ../hw/i386/pc_q35.c:182 #16 0x0000562335babe77 in machine_run_board_init (machine=0x5623384318c0) at ../hw/core/machine.c:1416 #17 0x0000562335c9313d in qmp_x_exit_preconfig (errp=<optimized out>) at ../softmmu/vl.c:2665 #18 0x0000562335c9b34e in qemu_init (argc=<optimized out>, argv=0x7ffd3c6dde88, envp=<optimized out>) at ../softmmu/vl.c:3785 #19 0x0000562335b1f7bd in main (argc=914101824, argv=0x189, envp=0x0) at ../softmmu/main.c:49 Thread 1 (Thread 0x7facc9944640 (LWP 44855)): #0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44 #1 0x00007facce40f5b3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78 #2 0x00007facce3c2ce6 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 #3 0x00007facce3967f3 in __GI_abort () at abort.c:79 #4 0x00007facce39671b in __assert_fail_base (fmt=<optimized out>, assertion=<optimized out>, file=<optimized out>, line=<optimized out>, function=<optimized out>) at assert.c:92 #5 0x00007facce3bbc66 in __GI___assert_fail (assertion=0x5623361c262f "dirty_gfns && ring_size", file=0x5623361c21af "../accel/kvm/kvm-all.c", line=737, function=0x5623361c2647 "uint32_t kvm_dirty_ring_reap_one(KVMState *, CPUState *)") at assert.c:101 #6 0x0000562335ee5ccb in kvm_dirty_ring_reap_locked (s=0x5623384947d0) at ../accel/kvm/kvm-all.c:737 #7 0x0000562335ee591a in kvm_dirty_ring_reap (s=0x5623384947d0) at ../accel/kvm/kvm-all.c:810 #8 kvm_dirty_ring_reaper_thread (data=0x5623384947d0) at ../accel/kvm/kvm-all.c:1473 #9 0x000056233611bbfa in qemu_thread_start (args=0x5623382d0570) at ../util/qemu-thread-posix.c:556 #10 0x00007facce40d802 in start_thread (arg=<optimized out>) at pthread_create.c:443 #11 0x00007facce3ad314 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100 Actual results: qemu-kvm crashed when starting guest with hugepage + cpu hotpluggable + kvm dirty-ring enabled Expected results: Guest should start successfully. Additional info:
Test on qemu-kvm-7.0.0-12.el9.x86_64 && libvirt-8.5.0-5.el9.x86_64, found: 1. reproduce this bug with rhel 8.7 (q35+seabios) guest, can't reproduce with rhel 9.1 (q35+ovmf) guest; 2. reproduce through libvirt with dirty-ring and cputune configured, not specific to hugepage and hotplugged vcpus; 3. can't reproduce on qemu side with similar qemu commands that libvirt parses ************************************************** <vcpu placement='static' current='8'>16</vcpu> <cputune> <shares>2048</shares> <period>1000000</period> <quota>3000</quota> <global_period>1000000</global_period> <global_quota>4000</global_quota> <emulator_period>1000000</emulator_period> <emulator_quota>5000</emulator_quota> <vcpusched vcpus='0' scheduler='batch'/> <vcpusched vcpus='1' scheduler='batch'/> <vcpusched vcpus='2' scheduler='batch'/> <vcpusched vcpus='3' scheduler='batch'/> <vcpusched vcpus='4' scheduler='batch'/> <vcpusched vcpus='5' scheduler='batch'/> <vcpusched vcpus='6' scheduler='batch'/> <vcpusched vcpus='7' scheduler='batch'/> <vcpusched vcpus='8' scheduler='batch'/> <vcpusched vcpus='9' scheduler='batch'/> <vcpusched vcpus='10' scheduler='batch'/> <vcpusched vcpus='11' scheduler='batch'/> <vcpusched vcpus='12' scheduler='batch'/> <vcpusched vcpus='13' scheduler='batch'/> <vcpusched vcpus='14' scheduler='batch'/> <vcpusched vcpus='15' scheduler='batch'/> </cputune> <features> <acpi/> <apic/> <kvm> <poll-control state='on'/> <pv-ipi state='off'/> <dirty-ring state='on' size='4096'/> </kvm> </features> ************************************************* I found that with or without cputune, the qemu command lines that libvirt parses are same. So I think we need to know the cputune now. Nana, do you know the cputune? And please help needinfo the relevant dev or libvirt qe if you don't know, thanks in advance.
The qemu command lines that libvirt parses: /usr/libexec/qemu-kvm \ -name guest=rhel870,debug-threads=on \ -S \ -object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-1-rhel870/master-key.aes"}' \ -machine pc-q35-rhel9.0.0,usb=off,dump-guest-core=off,memory-backend=pc.ram \ -accel kvm,dirty-ring-size=4096 \ -cpu Skylake-Server-IBRS,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,pku=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rsba=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,kvm-poll-control=on,kvm-pv-ipi=off \ -m 2048 \ -object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":2147483648}' \ -overcommit mem-lock=off \ -smp 8,maxcpus=16,sockets=16,cores=1,threads=1 \ -uuid e18770d8-fb31-4e95-8e22-603608acfe40 \ -no-user-config \ -nodefaults \ -chardev socket,id=charmonitor,fd=23,server=on,wait=off \ -mon chardev=charmonitor,id=monitor,mode=control \ -rtc base=utc,driftfix=slew \ -global kvm-pit.lost_tick_policy=delay \ -no-hpet \ -no-shutdown \ -global ICH9-LPC.disable_s3=1 \ -global ICH9-LPC.disable_s4=1 \ -boot strict=on \ -device '{"driver":"pcie-root-port","port":16,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x2"}' \ -device '{"driver":"pcie-root-port","port":17,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x2.0x1"}' \ -device '{"driver":"pcie-root-port","port":18,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x2.0x2"}' \ -device '{"driver":"pcie-root-port","port":19,"chassis":4,"id":"pci.4","bus":"pcie.0","addr":"0x2.0x3"}' \ -device '{"driver":"pcie-root-port","port":20,"chassis":5,"id":"pci.5","bus":"pcie.0","addr":"0x2.0x4"}' \ -device '{"driver":"pcie-root-port","port":21,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x2.0x5"}' \ -device '{"driver":"pcie-root-port","port":22,"chassis":7,"id":"pci.7","bus":"pcie.0","addr":"0x2.0x6"}' \ -device '{"driver":"pcie-root-port","port":23,"chassis":8,"id":"pci.8","bus":"pcie.0","addr":"0x2.0x7"}' \ -device '{"driver":"pcie-root-port","port":24,"chassis":9,"id":"pci.9","bus":"pcie.0","multifunction":true,"addr":"0x3"}' \ -device '{"driver":"pcie-root-port","port":25,"chassis":10,"id":"pci.10","bus":"pcie.0","addr":"0x3.0x1"}' \ -device '{"driver":"pcie-root-port","port":26,"chassis":11,"id":"pci.11","bus":"pcie.0","addr":"0x3.0x2"}' \ -device '{"driver":"pcie-root-port","port":27,"chassis":12,"id":"pci.12","bus":"pcie.0","addr":"0x3.0x3"}' \ -device '{"driver":"pcie-root-port","port":28,"chassis":13,"id":"pci.13","bus":"pcie.0","addr":"0x3.0x4"}' \ -device '{"driver":"pcie-root-port","port":29,"chassis":14,"id":"pci.14","bus":"pcie.0","addr":"0x3.0x5"}' \ -device '{"driver":"qemu-xhci","p2":15,"p3":15,"id":"usb","bus":"pci.2","addr":"0x0"}' \ -device '{"driver":"virtio-scsi-pci","id":"scsi0","bus":"pci.3","addr":"0x0"}' \ -device '{"driver":"virtio-serial-pci","id":"virtio-serial0","bus":"pci.4","addr":"0x0"}' \ -blockdev '{"driver":"file","filename":"/mnt/xiaohli/rhel870-64-virtio-scsi.qcow2","aio":"threads","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \ -blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \ -device '{"driver":"scsi-hd","bus":"scsi0.0","channel":0,"scsi-id":0,"lun":0,"device_id":"drive-scsi0-0-0-0","drive":"libvirt-1-format","id":"scsi0-0-0-0","bootindex":1,"write-cache":"on"}' \ 
-netdev tap,fd=24,vhost=on,vhostfd=26,id=hostnet0 \ -device '{"driver":"virtio-net-pci","netdev":"hostnet0","id":"net0","mac":"f4:8e:38:c3:83:12","bus":"pci.1","addr":"0x0"}' \ -chardev pty,id=charserial0 \ -device '{"driver":"isa-serial","chardev":"charserial0","id":"serial0","index":0}' \ -chardev socket,id=charchannel0,fd=22,server=on,wait=off \ -device '{"driver":"virtserialport","bus":"virtio-serial0.0","nr":1,"chardev":"charchannel0","id":"channel0","name":"org.qemu.guest_agent.0"}' \ -device '{"driver":"usb-tablet","id":"input0","bus":"usb.0","port":"1"}' \ -audiodev '{"id":"audio1","driver":"none"}' \ -vnc 0.0.0.0:0,audiodev=audio1 \ -device '{"driver":"VGA","id":"video0","vgamem_mb":16,"bus":"pcie.0","addr":"0x1"}' \ -device '{"driver":"virtio-balloon-pci","id":"balloon0","bus":"pci.5","addr":"0x0"}' \ -object '{"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"}' \ -device '{"driver":"virtio-rng-pci","rng":"objrng0","id":"rng0","bus":"pci.6","addr":"0x0"}' \ -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ -msg timestamp=on
(In reply to Li Xiaohui from comment #2) > Test on qemu-kvm-7.0.0-12.el9.x86_64 && libvirt-8.5.0-5.el9.x86_64, found: > 1. reproduce this bug with rhel 8.7 (q35+seabios) guest, can't reproduce > with rhel 9.1 (q35+ovmf) guest; > 2. reproduce through libvirt with dirty-ring and cputune configured, not > specific to hugepage and hotplugged vcpus; > 3. can't reproduce on qemu side with similar qemu commands that libvirt > parses > ************************************************** > <vcpu placement='static' current='8'>16</vcpu> > <cputune> > <shares>2048</shares> > <period>1000000</period> > <quota>3000</quota> > <global_period>1000000</global_period> > <global_quota>4000</global_quota> > <emulator_period>1000000</emulator_period> > <emulator_quota>5000</emulator_quota> > <vcpusched vcpus='0' scheduler='batch'/> > <vcpusched vcpus='1' scheduler='batch'/> > <vcpusched vcpus='2' scheduler='batch'/> > <vcpusched vcpus='3' scheduler='batch'/> > <vcpusched vcpus='4' scheduler='batch'/> > <vcpusched vcpus='5' scheduler='batch'/> > <vcpusched vcpus='6' scheduler='batch'/> > <vcpusched vcpus='7' scheduler='batch'/> > <vcpusched vcpus='8' scheduler='batch'/> > <vcpusched vcpus='9' scheduler='batch'/> > <vcpusched vcpus='10' scheduler='batch'/> > <vcpusched vcpus='11' scheduler='batch'/> > <vcpusched vcpus='12' scheduler='batch'/> > <vcpusched vcpus='13' scheduler='batch'/> > <vcpusched vcpus='14' scheduler='batch'/> > <vcpusched vcpus='15' scheduler='batch'/> > </cputune> > <features> > <acpi/> > <apic/> > <kvm> > <poll-control state='on'/> > <pv-ipi state='off'/> > <dirty-ring state='on' size='4096'/> > </kvm> > </features> > ************************************************* > > > I found that with or without cputune, the qemu command lines that libvirt > parses are same. So I think we need to know the cputune now. > > Nana, do you know the cputune? Hi, The optional cputune element provides details regarding the CPU tunable parameters for the domain. refer to: https://libvirt.org/formatdomain.html#cpu-tuning QEMU doesn't have such parameter in command line, we can use tool 'taskset -pc 0-7 $qemu-pid' and 'chrt -p -b 0 $qemu-pid' separately to do the work. We can try this firstly. There are also some other cgroup configurations that we can't set them in qemu commandline directly, if needed I can find out more to check if we can set it by tools. Add lhuang to cc list in case I miss something here. Please correct me if I'm wrong. Best regards Nana Liu And please help needinfo the relevant dev or > libvirt qe if you don't know, thanks in advance.
Hi Yiqian,

Please help provide the corresponding qemu command lines or shell commands for:
<cputune>
  <shares>2048</shares>
  <period>1000000</period>
  <quota>3000</quota>
  <global_period>1000000</global_period>
  <global_quota>4000</global_quota>
  <emulator_period>1000000</emulator_period>
  <emulator_quota>5000</emulator_quota>
  ...
</cputune>

Thanks.
The guest fails to start with a call trace if the cgroup is configured manually:
1. Boot the guest with the CPU options below; the other qemu options are the same as Comment 3:
-smp 16,sockets=16,cores=1,threads=1 \
2. Execute the following commands on the host:
# taskset -pc 0-7 $qemu-pid
# chrt -p -b 0 $qemu-pid
3. Configure cgroup v2:
# cd /sys/fs/cgroup
# mkdir blue
# echo 5000 > /sys/fs/cgroup/blue/cpu.max
# echo $qemu-pid > /sys/fs/cgroup/blue/cgroup.procs
4. Continue the guest through HMP:
(qemu) cont

Actual result:
After step 4, the guest takes a very long time (roughly an hour) trying to start, but eventually hits a core dump and fails to start. I couldn't capture the dump info.

Yiqian, can you help check whether the above configuration is right?
If it is, we should go on to confirm whether this issue is the same as this bug.
Sorry, a correction to one command in step 2 of Comment 7:
# taskset -pc 0-15 $qemu-pid
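For reference, a few quick checks to confirm the settings from Comment 7 took effect (using the corrected affinity above) before continuing:
# taskset -pc $qemu-pid
(should report the 0-15 affinity list)
# chrt -p $qemu-pid
(should report SCHED_BATCH)
# cat /sys/fs/cgroup/blue/cpu.max
(should show the 5000 quota written in step 3, alongside the default 100000 period)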
(In reply to Li Xiaohui from comment #7)
> The guest fails to start with a call trace if the cgroup is configured manually:
> 1. Boot the guest with the CPU options below; the other qemu options are the
> same as Comment 3:
> -smp 16,sockets=16,cores=1,threads=1 \
> 2. Execute the following commands on the host:
> # taskset -pc 0-7 $qemu-pid
> # chrt -p -b 0 $qemu-pid
> 3. Configure cgroup v2:
> # cd /sys/fs/cgroup
> # mkdir blue
> # echo 5000 > /sys/fs/cgroup/blue/cpu.max
> # echo $qemu-pid > /sys/fs/cgroup/blue/cgroup.procs
> 4. Continue the guest through HMP:
> (qemu) cont
>
> Actual result:
> After step 4, the guest takes a very long time (roughly an hour) trying to
> start, but eventually hits a core dump and fails to start. I couldn't
> capture the dump info.
>
> Yiqian, can you help check whether the above configuration is right?

The above configuration is right.

> If it is, we should go on to confirm whether this issue is the same as this
> bug.
Hi all,

If I change the quota values to be equal to the period values, the guest starts successfully:
<cputune>
  <shares>2048</shares>
  <period>1000000</period>
  <quota>1000000</quota>
  <global_period>1000000</global_period>
  <global_quota>1000000</global_quota>
  <emulator_period>1000000</emulator_period>
  <emulator_quota>1000000</emulator_quota>
  .....
</cputune>
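As a cross-check (the cgroup path below is illustrative and depends on the systemd/libvirt cgroup layout on the host), the values libvirt actually programs can be read back from the VM's cgroup and compared against the cpu.max semantics discussed in the next comment:
# cat /sys/fs/cgroup/machine.slice/machine-qemu*.scope/libvirt/vcpu0/cpu.max
1000000 1000000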
As https://docs.kernel.org/admin-guide/cgroup-v2.html says, cpu.max has the following meaning:

  cpu.max
    A read-write two value file which exists on non-root cgroups.
    The default is "max 100000".
    The maximum bandwidth limit. It's in the following format:
      $MAX $PERIOD
    which indicates that the group may consume up to $MAX in each $PERIOD
    duration. "max" for $MAX indicates no limit. If only one number is
    written, $MAX is updated.

In the previous test we used:
<cputune>
  <shares>2048</shares>
  <period>1000000</period>
  <quota>3000</quota>
  <global_period>1000000</global_period>
  <global_quota>4000</global_quota>
  <emulator_period>1000000</emulator_period>
  <emulator_quota>5000</emulator_quota>
  ...

For example, the global period/quota is 1000000/4000, which means the VM process can only use 4000/1000000 = 0.4% of a CPU's capacity. This may be why the VM process was killed. If the crash is due to this, it may not be an issue.
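Written out as a two-value cpu.max update (per the "$MAX $PERIOD" format quoted above), the global_quota/global_period pair from that test corresponds to the following, reusing the 'blue' cgroup from Comment 7:
# echo "4000 1000000" > /sys/fs/cgroup/blue/cpu.max
# cat /sys/fs/cgroup/blue/cpu.max
4000 1000000
i.e. the 0.4% cap described above.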
Hi Pavel, Peter, can you help check whether we can close this bug as not a bug, per Comment 10 and Comment 11?
As there has been no response so far, I recommend closing this as not a bug per Comment 11. Please contact me if you have any questions, or reopen the bug if anyone thinks it's worth fixing. Thanks.
Xiaohui,

Sorry, I missed the message and only noticed it again after the 3-day notice.

Regardless of the specific cgroup configuration, I think you're right that QEMU shouldn't crash. In this case, the cgroup limits probably make the threads run in an unusual order, which triggers a crash of qemu that we can otherwise hardly reproduce. In comment 0 we are reaping the dirty ring of a vcpu while that vcpu is still being created, and that's illegal. I'll post a patch for this shortly.

The bug can be re-opened, but with low priority, as long as customers cannot trigger it with any sane setup.
Hi Yan, do we have a libvirt test case that corresponds to this bug? If yes, please help add the Polarion link for it, thanks.
(In reply to Li Xiaohui from comment #21)
> Hi Yan, do we have a libvirt test case that corresponds to this bug? If yes,
> please help add the Polarion link for it, thanks.

Hi Xiaohui,

Sorry, there are no test cases in Polarion for this bug.
Hi Peter, will we fix this bug in RHEL 9.3.0? From your Comment 15, it seems upstream has fixed this issue.

If we're targeting 9.3.0, please help set the ITR (and DTM if you know the fix plan). Thanks in advance.
Hi, Xiaohui,

(In reply to Li Xiaohui from comment #23)
> Hi Peter, will we fix this bug in RHEL 9.3.0? From your Comment 15, it seems
> upstream has fixed this issue.

Unfortunately upstream hasn't yet merged the fix. Let me needinfo Paolo for that.

> If we're targeting 9.3.0, please help set the ITR (and DTM if you know the
> fix plan). Thanks in advance.

I'll update the entries once the upstream plan consolidates.

Thanks,
Peter
I replied at https://lore.kernel.org/qemu-devel/3c9e06ce-3166-f7c4-cb56-6df123c145a2@redhat.com
Thank you all for the update
Patch merged; it will be in 8.0-rc3:

56adee407f kvm: dirty-ring: Fix race with vcpu creation

We'll get the fix automatically in 2-3 weeks.

Xiaohui, I don't know how to mark this bug, but I think it should be TestOnly after our upcoming c9s/rhel9.3 rebase to upstream qemu 8.0.0. Could you help update the corresponding fields?

I know how to set ITR, so I did. Thanks.
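For reference, one way to check whether a given qemu source tree already contains the fix (assuming a local clone of upstream qemu with the v8.0.0-rc3 tag available):
# git log --oneline --grep='dirty-ring: Fix race with vcpu creation' v8.0.0-rc3
56adee407f kvm: dirty-ring: Fix race with vcpu creation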
Hi Peter,

(In reply to Peter Xu from comment #27)
> Patch merged; it will be in 8.0-rc3:
>
> 56adee407f kvm: dirty-ring: Fix race with vcpu creation
>
> We'll get the fix automatically in 2-3 weeks.
>
> Xiaohui, I don't know how to mark this bug, but I think it should be
> TestOnly after our upcoming c9s/rhel9.3 rebase to upstream qemu 8.0.0.
> Could you help update the corresponding fields?

I have added the qemu 8.0.0 rebase bug 2180898 to the Depends On field. Based on Bug 2180898, I would set DTM 10 and ITM 12. Feel free to correct them if they're wrong.

> I know how to set ITR, so I did. Thanks.

I have marked qa_ack+; can you help set devel_ack please?
Sorry, I forgot that this bug is hard to reproduce through qemu directly but easy to reproduce through libvirt. I found the libvirt rebase bug 2175785 for RHEL 9.3.0, but its DTM is 20, which is too late for QE to test, so I would update the ITM from 12 to 16. In the meantime I will do basic dirty-ring tests via qemu and try to find any regression issues. We can also mark this bug verified if we find that the fix really solves it when testing through qemu, or once the libvirt rebase build comes out.
Xiaohui,

(In reply to Li Xiaohui from comment #29)
> so I would update the ITM from 12 to 16. In the meantime I will do basic
> dirty-ring tests via qemu

Since there'll be no patch to backport, please feel free to set them with your best judgement.

(In reply to Li Xiaohui from comment #28)
> I have marked qa_ack+; can you help set devel_ack please?

Done. Thanks!
QE bot(pre verify): Set 'Verified:Tested,SanityOnly' as gating/tier1 test pass.
Hi Peter,

I can now boot the guest through libvirt with the same configuration as Comment 2, which used to reproduce this bug. The qemu and libvirt versions are libvirt-9.2.0-1.el9.x86_64 and qemu-kvm-8.0.0-1.el9.x86_64.

[root@hp-dl385g10-14 home]# virsh start rhel880
Domain 'rhel880' started

[root@hp-dl385g10-14 home]# virsh list --all
 Id   Name      State
-------------------------
 1    rhel880   running

However, the guest crashes during startup. I think that's an expected result, right?
(In reply to Li Xiaohui from comment #34)
> However, the guest crashes during startup. I think that's an expected result, right?

Not really... The fix should only stop qemu from crashing; it should have no impact on the guest OS. If QEMU used to crash and now it doesn't, then I assume the current issue (at least the one we thought of) is fixed, but something else might be wrong. Maybe there's some specific config that breaks the VM boot after our 8.0 rebase? I'd start with the simplest VM config where the guest can still boot (with the same VM image, just to make sure the image is fine), then grow the config until the guest crash hits.
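For the retests, one way to tell a QEMU crash apart from a guest OS crash (domain name as in Comment 34; commands are illustrative):
# virsh list --all
(the domain still listed as "running" means the QEMU process is alive)
# coredumpctl list /usr/libexec/qemu-kvm
(no new entry means QEMU itself did not dump core)
# virsh console rhel880
(check the guest console for a kernel panic)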
Per Comment 34 and Comment 36, we can mark this bug verified, as the product issue from the Description has been fixed.

I tried again on the latest qemu-kvm-8.0.0-5.el9.x86_64; the guest hangs in the boot stage if the VM is booted with the libvirt configuration below:
<cputune>
  <shares>2048</shares>
  <period>1000000</period>
  <quota>3000</quota>
  <global_period>1000000</global_period>
  <global_quota>4000</global_quota>
  <emulator_period>1000000</emulator_period>
  <emulator_quota>5000</emulator_quota>
  <vcpusched vcpus='0' scheduler='batch'/>
  <vcpusched vcpus='1' scheduler='batch'/>
  <vcpusched vcpus='2' scheduler='batch'/>
  <vcpusched vcpus='3' scheduler='batch'/>
  <vcpusched vcpus='4' scheduler='batch'/>
  <vcpusched vcpus='5' scheduler='batch'/>
  <vcpusched vcpus='6' scheduler='batch'/>
  <vcpusched vcpus='7' scheduler='batch'/>
</cputune>

Note: the qemu command line is the same with or without the cputune configuration above.

I will file a new bug if needed.
Hi Yan, Yi,

Per Comment 37, we still can't boot the VM successfully on RHEL 9.3.0 with the above cputune configured.

Can you help confirm whether it's a bug and file one if needed? I am not familiar with cputune, and the issue is not easy to reproduce through qemu directly.
(In reply to Li Xiaohui from comment #38)
> Hi Yan, Yi,
> Per Comment 37, we still can't boot the VM successfully on RHEL 9.3.0 with
> the above cputune configured.
>
> Can you help confirm whether it's a bug and file one if needed? I am not
> familiar with cputune, and the issue is not easy to reproduce through qemu
> directly.

Hi Xiaohui,

I cannot reproduce the issue. The guest boots successfully with:
host kernel: kernel-5.14.0-325.el9.x86_64
guest kernel: 5.14.0-331.el9.x86_64
libvirt-9.3.0-2.el9.x86_64
qemu-kvm-8.0.0-5.el9.x86_64
Confirmed with Luyao: the behavior in comment 37 is acceptable. The quota-related parameters specify the maximum allowed vcpu bandwidth; refer to https://libvirt.org/formatdomain.html#cpu-tuning. We have to be very careful when setting these parameters, since very low quota settings can make the guest extremely slow, which looks like a hang.
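If a guest does get starved like this, the limits can also be relaxed at runtime instead of editing the XML; a rough sketch with virsh schedinfo (parameter names per the cpu-tuning page linked above; setting quota equal to period as in Comment 12):
# virsh schedinfo rhel880 --live --set vcpu_quota=1000000
# virsh schedinfo rhel880 --live --set global_quota=1000000
# virsh schedinfo rhel880
(the last command just displays the effective scheduler parameters)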