Bug 1892669
| Field | Value |
|---|---|
| Summary | VM start hang - qemu stuck in query-balloon |
| Product | Red Hat Enterprise Linux 9 |
| Component | qemu-kvm |
| qemu-kvm sub component | Devices |
| Version | unspecified |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED DEFERRED |
| Severity | low |
| Priority | low |
| Keywords | Reopened, Triaged |
| Whiteboard | libvirt_OSP_INT |
| Reporter | chhu |
| Assignee | Marcelo Tosatti <mtosatti> |
| QA Contact | Pei Zhang <pezhang> |
| CC | broskos, chayang, jinzhao, junzhao, juzhang, kchamart, mhou, mkletzan, mprivozn, mtosatti, nilal, pezhang, virt-maint, yanghliu, ymankad, yuhuang |
| Flags | pm-rhel: mirror+ |
| Target Milestone | rc |
| Type | Bug |
| Last Closed | 2023-02-19 07:27:44 UTC |
| Bug Blocks | 1883636, 1922007 |
Created attachment 1725057 [details]
Guest xml
Created attachment 1725058 [details]
client_hang_libvirtd.log
Chenli and I were trying to reproduce another bug (bug 1821277) and when we hit this I dumped the stack trace:

```
#0  0x00007f2aecc7048c in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x000056513d9e1aed in qemu_cond_wait_impl (cond=<optimized out>, mutex=0x56513e254fa0 <qemu_global_mutex>, file=0x56513da850b0 "/builddir/build/BUILD/qemu-4.2.0/cpus.c", line=1275) at util/qemu-thread-posix.c:173
#2  0x000056513d6b30f7 in qemu_wait_io_event (cpu=0x56513eadf7b0) at /usr/src/debug/qemu-kvm-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64/cpus.c:1275
#3  0x000056513d6b4b58 in qemu_kvm_cpu_thread_fn (arg=0x56513eadf7b0) at /usr/src/debug/qemu-kvm-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64/cpus.c:1323
#4  0x000056513d9e1734 in qemu_thread_start (args=0x56513eb06c90) at util/qemu-thread-posix.c:519
#5  0x00007f2aecc6a2de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f2aec99be83 in clone () from target:/lib64/libc.so.6

Thread 7 (Thread 0x7f2adf6fc700 (LWP 201805)):
#0  0x00007f2aecc7048c in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x000056513d9e1aed in qemu_cond_wait_impl (cond=<optimized out>, mutex=0x56513e254fa0 <qemu_global_mutex>, file=0x56513da850b0 "/builddir/build/BUILD/qemu-4.2.0/cpus.c", line=1275) at util/qemu-thread-posix.c:173
#2  0x000056513d6b30f7 in qemu_wait_io_event (cpu=0x56513eab7880) at /usr/src/debug/qemu-kvm-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64/cpus.c:1275
#3  0x000056513d6b4b58 in qemu_kvm_cpu_thread_fn (arg=0x56513eab7880) at /usr/src/debug/qemu-kvm-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64/cpus.c:1323
#4  0x000056513d9e1734 in qemu_thread_start (args=0x56513eadef70) at util/qemu-thread-posix.c:519
#5  0x00007f2aecc6a2de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f2aec99be83 in clone () from target:/lib64/libc.so.6

Thread 6 (Thread 0x7f2adeefb700 (LWP 201804)):
#0  0x00007f2aecc7048c in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x000056513d9e1aed in qemu_cond_wait_impl (cond=<optimized out>, mutex=0x56513e254fa0 <qemu_global_mutex>, file=0x56513da850b0 "/builddir/build/BUILD/qemu-4.2.0/cpus.c", line=1275) at util/qemu-thread-posix.c:173
#2  0x000056513d6b30f7 in qemu_wait_io_event (cpu=0x56513ea8ed10) at /usr/src/debug/qemu-kvm-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64/cpus.c:1275
#3  0x000056513d6b4b58 in qemu_kvm_cpu_thread_fn (arg=0x56513ea8ed10) at /usr/src/debug/qemu-kvm-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64/cpus.c:1323
#4  0x000056513d9e1734 in qemu_thread_start (args=0x56513eab6240) at util/qemu-thread-posix.c:519
#5  0x00007f2aecc6a2de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f2aec99be83 in clone () from target:/lib64/libc.so.6

Thread 5 (Thread 0x7f2ade6fa700 (LWP 201803)):
#0  0x00007f2aecc7048c in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x000056513d9e1aed in qemu_cond_wait_impl (cond=<optimized out>, mutex=0x56513e254fa0 <qemu_global_mutex>, file=0x56513da850b0 "/builddir/build/BUILD/qemu-4.2.0/cpus.c", line=1275) at util/qemu-thread-posix.c:173
#2  0x000056513d6b30f7 in qemu_wait_io_event (cpu=0x56513ea666a0) at /usr/src/debug/qemu-kvm-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64/cpus.c:1275
#3  0x000056513d6b4b58 in qemu_kvm_cpu_thread_fn (arg=0x56513ea666a0) at /usr/src/debug/qemu-kvm-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64/cpus.c:1323
#4  0x000056513d9e1734 in qemu_thread_start (args=0x56513ea8e690) at util/qemu-thread-posix.c:519
#5  0x00007f2aecc6a2de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f2aec99be83 in clone () from target:/lib64/libc.so.6

Thread 4 (Thread 0x7f2addef9700 (LWP 201801)):
#0  0x00007f2aecc7048c in pthread_cond_wait@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
#1  0x000056513d9e1aed in qemu_cond_wait_impl (cond=<optimized out>, mutex=0x56513e254fa0 <qemu_global_mutex>, file=0x56513da850b0 "/builddir/build/BUILD/qemu-4.2.0/cpus.c", line=1275) at util/qemu-thread-posix.c:173
#2  0x000056513d6b30f7 in qemu_wait_io_event (cpu=0x56513ea13df0) at /usr/src/debug/qemu-kvm-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64/cpus.c:1275
#3  0x000056513d6b4b58 in qemu_kvm_cpu_thread_fn (arg=0x56513ea13df0) at /usr/src/debug/qemu-kvm-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64/cpus.c:1323
#4  0x000056513d9e1734 in qemu_thread_start (args=0x56513ea3c6c0) at util/qemu-thread-posix.c:519
#5  0x00007f2aecc6a2de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f2aec99be83 in clone () from target:/lib64/libc.so.6

Thread 3 (Thread 0x7f2add6f8700 (LWP 201800)):
#0  0x00007f2aec990f21 in poll () from target:/lib64/libc.so.6
#1  0x00007f2af115f9b6 in g_main_context_iterate.isra () from target:/lib64/libglib-2.0.so.0
#2  0x00007f2af115fd72 in g_main_loop_run () from target:/lib64/libglib-2.0.so.0
#3  0x000056513d7b8771 in iothread_run (opaque=0x56513e994c00) at iothread.c:82
#4  0x000056513d9e1734 in qemu_thread_start (args=0x56513ea11880) at util/qemu-thread-posix.c:519
#5  0x00007f2aecc6a2de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f2aec99be83 in clone () from target:/lib64/libc.so.6

Thread 2 (Thread 0x7f2ae5114700 (LWP 201785)):
#0  0x00007f2aec9966ed in syscall () from target:/lib64/libc.so.6
#1  0x000056513d9e1f9f in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at /usr/src/debug/qemu-kvm-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64/include/qemu/futex.h:29
#2  qemu_event_wait (ev=ev@entry=0x56513e2852e8 <rcu_call_ready_event>) at util/qemu-thread-posix.c:459
#3  0x000056513d9f4202 in call_rcu_thread (opaque=<optimized out>) at util/rcu.c:260
#4  0x000056513d9e1734 in qemu_thread_start (args=0x56513e8baac0) at util/qemu-thread-posix.c:519
#5  0x00007f2aecc6a2de in start_thread () from target:/lib64/libpthread.so.0
#6  0x00007f2aec99be83 in clone () from target:/lib64/libc.so.6

Thread 1 (Thread 0x7f2af1cf5f00 (LWP 201751)):
#0  0x00007f2aec991016 in ppoll () from target:/lib64/libc.so.6
#1  0x000056513d9dd625 in ppoll (__ss=0x0, __timeout=0x7ffc9a3006d0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=962000000) at util/qemu-timer.c:348
#3  0x000056513d9de4c5 in os_host_main_loop_wait (timeout=962000000) at util/main-loop.c:237
#4  main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518
#5  0x000056513d7bdda1 in main_loop () at vl.c:1828
#6  0x000056513d669852 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:450
```

Hi Chenli,
IIUC, when libvirt starts a VM, the VM is in the paused state at the very beginning (because of '-S' on the qemu command line); a 'cont' command is then sent to the qemu monitor to resume the VM. So if you run 'virsh list' just before the 'cont' command, the VM state showing as paused is expected.
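For reference, the resume step described above is a plain QMP exchange. A minimal sketch of the messages involved (the command names are real QMP commands; the framing here is only illustrative, and the real session also sends these over the monitor socket after capability negotiation):

```python
import json

# QMP commands relevant to this bug: libvirt resumes a domain started
# with `qemu ... -S` by sending "cont", and "query-balloon" is the
# command this qemu instance was reportedly stuck in.
negotiate = json.dumps({"execute": "qmp_capabilities"})
resume = json.dumps({"execute": "cont"})
balloon = json.dumps({"execute": "query-balloon"})

for msg in (negotiate, resume, balloon):
    print(msg)
```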
Below is my test; the 'paused' state appears only during the second in which the VM starts, after which the VM goes into the running state.
# for i in {1..200}; do date; virsh start rhel8; sleep 5; date; virsh destroy rhel8; sleep 2; done
...
Fri Oct 30 03:18:44 EDT 2020
Domain rhel8 destroyed
Fri Oct 30 03:18:46 EDT 2020
Domain rhel8 started
Fri Oct 30 03:18:52 EDT 2020
Domain rhel8 destroyed
...
# for i in {1..5000}; do date; virsh list --all; sleep 1; done
...
Fri Oct 30 03:18:43 EDT 2020
Id Name State
------------------------
465 rhel8 running
Fri Oct 30 03:18:44 EDT 2020
Id Name State
------------------------
- rhel8 shut off
Fri Oct 30 03:18:45 EDT 2020
Id Name State
------------------------
- rhel8 shut off
Fri Oct 30 03:18:46 EDT 2020
Id Name State
-----------------------
466 rhel8 paused
Fri Oct 30 03:18:47 EDT 2020
Id Name State
------------------------
466 rhel8 running
Fri Oct 30 03:18:48 EDT 2020
Id Name State
------------------------
466 rhel8 running
...
I ran the test on 8.3 av,
qemu-kvm-5.1.0-13.module+el8.3.0+8382+afc3bbea
libvirt-client-6.6.0-6.module+el8.3.0+8125+aefcf088.x86_64
kernel-4.18.0-240.4.el8.x86_64
Would you please try the above commands with your packages? Thanks.
Tried with a server from the KVM-RT testing pool; I failed to reproduce this issue with the versions below. After starting/destroying the VM 400 times, the VM doesn't pause.

Versions:
4.18.0-193.30.1.rt13.79.el8_2.x86_64
libvirt-libs-6.0.0-25.4.module+el8.2.1+8060+c0c58169.x86_64
qemu-kvm-4.2.0-29.module+el8.2.1+7990+27f1e480.4.x86_64

Server info:

```
# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              20
On-line CPU(s) list: 0-19
Thread(s) per core:  1
Core(s) per socket:  10
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               63
Model name:          Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
Stepping:            2
CPU MHz:             2297.448
BogoMIPS:            4594.74
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            25600K
NUMA node0 CPU(s):   0,2,4,6,8,10,12,14,16,18
NUMA node1 CPU(s):   1,3,5,7,9,11,13,15,17,19
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm arat pln pts md_clear flush_l1d
```

I was able to reproduce with the xml Chenli attached on her host.
However, if cputune is removed from the xml, the issue is gone: the guest status always changes from paused to running within the first 2 or 3 seconds.
```xml
<cputune>
  <vcpupin vcpu='0' cpuset='12'/>
  <vcpupin vcpu='1' cpuset='14'/>
  <vcpupin vcpu='2' cpuset='16'/>
  <vcpupin vcpu='3' cpuset='18'/>
  <vcpupin vcpu='4' cpuset='20'/>
  <emulatorpin cpuset='11,13,15,17'/>
  <emulatorsched scheduler='fifo' priority='1'/>
  <vcpusched vcpus='0' scheduler='fifo' priority='1'/>
  <vcpusched vcpus='1' scheduler='fifo' priority='1'/>
  <vcpusched vcpus='2' scheduler='fifo' priority='1'/>
  <vcpusched vcpus='3' scheduler='fifo' priority='1'/>
  <vcpusched vcpus='4' scheduler='fifo' priority='1'/>
</cputune>
```
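To verify whether the vcpu and emulator threads actually received the requested policies, the per-thread scheduler can be read back from the kernel. A minimal Linux-only sketch (the helper name is hypothetical; point it at the qemu PID instead of `os.getpid()` on a real host):

```python
import os

# Map scheduling-policy constants to the short names `ps -o policy` prints.
SCHED_NAMES = {os.SCHED_OTHER: "TS", os.SCHED_FIFO: "FF",
               os.SCHED_RR: "RR", os.SCHED_BATCH: "B"}

def thread_policies(pid):
    """Return {tid: policy-name} for every thread of `pid` (Linux /proc layout)."""
    return {int(tid): SCHED_NAMES.get(os.sched_getscheduler(int(tid)), "?")
            for tid in os.listdir(f"/proc/{pid}/task")}

# Inspect our own process as a stand-in for the qemu-kvm process.
print(thread_policies(os.getpid()))
```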
(In reply to Yumei Huang from comment #9)
> I was able to reproduce with the xml Chenli attached on her host.
> However, if remove cputune from the xml, the issue is gone.
> [cputune XML as above]

Also tried adding the above cputune to my own xml and ran 'virsh start' and 'virsh destroy' 2000 times on Chenli's rt host; sometimes the guest stays paused for about 50 seconds before running.

BTW, the above cputune xml can't run on a non-rt host; it hits an error:

```
# virsh start rhel8
error: Failed to start domain rhel8
error: Cannot set scheduler parameters for pid 504844: Operation not permitted
```

(In reply to Yumei Huang from comment #10)
> [comments 9 and 10 quoted above]

Testing with the cputune part but without emulatorsched and vcpusched on a non-rt host, starting and destroying the guest 5000 times: the guest stays paused for two seconds at most, then changes to running. In conclusion, the issue is only reproducible on an rt host with vcpusched.

Luiz - given the above analysis, I'll assign to your team for further analysis. Probably need to adjust the component/subcomponent, but I wasn't sure what to choose.

Hi Luiz, Chenli,

On looking at the shared XML I didn't find anything unusual (there were a few things that I skip in my configuration, but I don't think those configs should cause the reported issue). However, I am wondering if there is any specific reason for using the 8.2.z batch#1 rt-kernel here? If not, then as a first step we should reproduce the issue with batch#4 or preferably the latest batch#5 candidate. I tried reproducing the issue at my end with the reported VM config but failed to do so for over 300 start/destroy operations.

@Chenli, once you reproduce the issue with the latest 8.2.z kernel, please share the machine with me and I can investigate further. Thanks

Per Comment 14: Hi, I wanted to check if there is any update on the testing with the latest z-stream kernel? Thanks

Hello Minxi, I didn't hit this issue in any of the past 8.2.z testings. I would ask: did you hit this issue again in your recent testings? If yes, would you please share your server with Nitesh to debug once you hit it again? Thanks a lot. Best regards, Pei

Sorry for the late reply, we didn't run this test these days; let's wait for Minxi's reply to see if she hits it again on the .z build in their testing.

Created attachment 1743162 [details]
g3.xml
Created attachment 1743172 [details]
libvirtd.log
Here is the summary of the findings so far: on an rt-kernel, if we keep starting and destroying a VM with the emulator thread sched:priority set to fifo:1, it ends up in the paused state after some iterations and fails to recover. The minimal configuration with which I was able to reproduce this:

```xml
<cputune>
  <vcpupin vcpu='0' cpuset='12'/>
  <vcpupin vcpu='1' cpuset='14'/>
  <emulatorpin cpuset='11,13'/>
  <emulatorsched scheduler='fifo' priority='1'/>
  <vcpusched vcpus='0' scheduler='fifo' priority='1'/>
  <vcpusched vcpus='1' scheduler='fifo' priority='1'/>
</cputune>
```

Another scenario that I tried was with only 1 CPU designated for the emulator thread. In this case, the VM fails to start the first time: it just remains in the paused state.

The above two could be two different issues, or they could be the result of the same issue that is triggered when we have emulatorsched set to fifo:1. I am adding Martin, who worked on BZ1580229, to see if he has any suggestions on how to find some meaningful information in the libvirt logs. In the meanwhile, I have started capturing some kernel traces to see if I can find something in there.

The first log suggests that qemu stops responding slightly after the daemon sets affinity for an I/O thread. The I/O thread does not seem to be pinned anywhere, so my guess is that it might sometimes starve the emulator thread out of its cpu time. I can't find anything else, at least for now. Try setting the I/O thread pinning using <iothreadpin/>, restricting its scheduler using <iothreadsched/>, or not using io='threads', at least to test the theory. I wonder why the scheduler and pinning of I/O threads default to the same ones as the emulator. I guess we ought to fix that, but there probably was some reasoning behind it. Or maybe it defaulted to the common behaviour for vcpu pinning.

Hi Martin, thanks for taking a look.
So, one change that I made and forgot to mention in the guest XML was to set the sched:priority of vcpu0 to fifo:1 as well, as that's what we use and recommend for KVM-RT. I also tried removing io='threads', pinning the iothread to one of the housekeeping CPUs, and setting the iothreadsched to fifo, but unfortunately nothing helped. I think Minxi has started using the test box for some other testing, so I couldn't look into the logs.

Another interesting thing worth mentioning here is that we recently found that the issue (Bug 1580229) for which this emulatorsched was introduced in the KVM-RT testing is not reproducible anymore.

(In reply to Nitesh Narayan Lal from comment #30)

Thanks for getting back to me so quickly. The vcpu0 change should not be significant.

Did you also try changing the scheduler of the I/O thread to some other scheduler? It already inherited the same pinning and scheduler settings as the emulator thread. I'd try changing it to something else, e.g. move it to a different housekeeping cpu (not any of those for the emulator) and/or change the scheduler to something else, ideally both.

About removing the I/O thread: if this happens without io='threads', I would probably try forcing it to io='native' if you haven't tried that already.

Also, when this bug happens, could you check what threads and processes are running on the cpus that the emulator thread is assigned to? Another idea I had in the meantime was that maybe QEMU changed and it is now spawning some new threads/processes which might hog the CPU. Maybe?

About the previous bug that stopped happening: do you mean it stopped happening after upgrading to new versions (i.e. the ones with the fix)? That would be expected, no? If you mean it stopped happening even with the older libvirt, then it just confirms my doubt in the commit for the fix: "If the scheduler is set before vCPU0 cannot be moved into its cpu,cpuacct cgroup. While it is not yet known whether this is a bug or not..." -- https://www.redhat.com/archives/libvir-list/2019-May/msg00620.html

(In reply to Martin Kletzander from comment #31)

Thanks for taking a look.

> Did you also try changing the scheduler of the I/O thread to some other scheduler? It already inherited the same pinning and scheduler settings as the emulator thread.

Good point, I was not particularly sure about this. I didn't change the scheduler for the iothread, but I did try changing the emulator thread scheduler to batch, and the issue was resolved with that.

> I'd try changing it to something else, e.g. move it to a different housekeeping cpu (not any of those for the emulator) and/or change the scheduler to something else, ideally both.

So, I did move the iothread to a housekeeping CPU which is not used for any other purposes in the VM, but it didn't help. I have lost access to the test environment, so I am trying to reproduce this on one of my own boxes. I can try changing the iothread scheduler to something else along with pinning it to a HK CPU once I reproduce this.

> Also when this bug happens, could you check what threads and processes are running on the cpus that the emulator thread is assigned to?

I will try checking the processes and threads running on the CPUs to which the iothread and emulator threads are pinned.

> About the previous bug that stopped happening. Do you mean it stopped happening after upgrading to new versions (i.e. the ones with the fix)?

Yes, and I think there is a gap in my understanding of the issue. I was under the impression that the failed-reboot issue was happening because the emulator thread was starved by other vcpu threads running with SCHED_FIFO. Then we added the emulatorsched option in libvirt, and as we explicitly started setting it to SCHED_FIFO as well, the issue was resolved.

> That would be expected, no? If you mean it stopped happening even with the older libvirt, then it just confirms my doubt in the commit for the fix.

I haven't tried with the older libvirt version.

(In reply to Nitesh Narayan Lal from comment #32)

Interesting. So making the emulator *not* run RT actually fixed it? Now I wonder whether reverting the patch for Bug 1580229 would also fix it. Let me know if you want to try that and I can do a scratch build. Unfortunately it also means that now I have no idea how it works.

(In reply to Martin Kletzander from comment #33)

TBH, I am not particularly sure about the root cause either. I have been trying to reproduce the issue at my end with the emulator threads set to SCHED_FIFO, and even by explicitly pinning the emulator and iothread to the same CPU, but like last time I am again failing to do so.

@Minxi, can you please make the environment available again for some more debugging? We can surely try the scratch build on Minxi's setup.

Thanks, Minxi for confirming.
The issue here is the usage of the userspace-based IOAPIC with an emulator thread that is running with SCHED_FIFO. In this scenario, the userspace IOAPIC thread is sometimes starved because the higher-priority emulator thread holds a mutex lock and never releases it. This is why either removing the emulatorsched that is set to fifo:1, setting it to batch, or removing the userspace IOAPIC resolves the issue.

Since the userspace-based IOAPIC is not supported with the KVM-RT configuration, which requires the vcpu and emulator threads to run with SCHED_FIFO, I am closing this BZ as Not A Bug.

(In reply to Nitesh Narayan Lal from comment #38)
> [closing rationale quoted above]

Nitesh,

Attempting to change ioapic from "qemu" to "kvm" results in:

```
[root@dell-per430-11 ~]# virsh edit rhel8.2.0.z_rt_1vcpu
error: XML error: IOMMU interrupt remapping requires split I/O APIC (ioapic driver='qemu')
Failed. Try again? [y,n,i,f,?]:
```

with this IOMMU configuration in the guest XML:

```xml
<iommu model='intel'>
  <driver intremap='on' caching_mode='on' iotlb='on'/>
</iommu>
```

https://wiki.qemu.org/Features/VT-d

And guest vIOMMU, IIRC, is necessary for DPDK to properly enable the FPGA in the guest. Nitesh, do you have a guest with DPDK enabled in the virtlab testbox? Maybe Brent can confirm as well. (Reopening for now.)
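The starvation pattern described in the closing rationale (a FIFO-priority thread holding a mutex that the IOAPIC thread needs) can be mimicked with plain threads. This is only an analogy for the lock-wait dependency, not real RT scheduling: setting SCHED_FIFO from userspace needs privileges and is not attempted here, and the thread names are stand-ins, not QEMU's.

```python
import threading
import time

lock = threading.Lock()        # stands in for the contended QEMU mutex
acquired = threading.Event()   # signals that the "emulator" holds the lock
result = []

def emulator_thread():
    # Stand-in for the SCHED_FIFO emulator thread: grabs the mutex and
    # keeps "running" without releasing it.
    with lock:
        acquired.set()
        time.sleep(1.0)

def ioapic_thread():
    # Stand-in for the userspace IOAPIC thread: it makes no progress
    # while the higher-priority holder keeps the mutex.
    result.append(lock.acquire(timeout=0.2))

holder = threading.Thread(target=emulator_thread)
holder.start()
acquired.wait()                # ensure the holder takes the lock first
waiter = threading.Thread(target=ioapic_thread)
waiter.start()
waiter.join()
holder.join()
print(result)                  # [False]: the waiter timed out while the holder ran
```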
(In reply to Marcelo Tosatti from comment #39)
> Attempting to change ioapic from "qemu" to "kvm" results in [the error quoted above]

But why do you have to manually change the ioapic from "qemu" to "kvm", since "kvm" is already the default mode?

> Nitesh, do you have a guest with DPDK enabled in the virtlab testbox?

No. Thanks

(In reply to Nitesh Narayan Lal from comment #41)
> But why do you have to manually change the ioapic from "qemu" to "kvm", since "kvm" is already the default mode?

So that interrupt remapping works (see above).

(In reply to Marcelo Tosatti from comment #42)
> So that interrupt remapping works (see above).

Ah, I see. So we need the userspace IOAPIC for the IOMMU, which is required by DPDK to properly enable the FPGA in the guest, but with emulatorsched set to fifo:1 this might not work. In that case, we can explore two options:
- a way by which the userspace IOAPIC thread can inherit the sched priority of the emulator thread, or
- adding another tag to specify the sched priority of the userspace IOAPIC thread, so that it can be set to fifo:1 as well.

Martin may have more suggestions on this.

Marcelo, based on Brent's comment, is it right to conclude that we don't need vIOMMU for FPGA enablement? If so, we can close the bug; or is there any other use case where this might be required?

(In reply to Nitesh Narayan Lal from comment #46)

Nitesh, I think we should understand the problem (it's probably fixable), but it is perhaps not high priority for RHEL 8.4 (we should double-check that CISCO is not using the IOMMU in the guest... do we have a guest kernel command line from them?). You can reproduce it, correct? We would have to look into what each thread is doing when the lockup happens (I can help you with that if necessary).

So based on the previous comments, it doesn't look like anyone is using vIOMMU (hence the userspace IOAPIC) with KVM-RT at the moment. Hence this looks like a low-priority item for now; however, it would still be good to get this issue resolved. I am currently occupied with some other high-priority items at the moment, so I will get back to this once I have some free cycles. Thanks

(In reply to Nitesh Narayan Lal from comment #38)
> [closing rationale quoted above]
Some threads do not have SCHED_FIFO priority:

[root@hp-dl388g10-02 ~]# ps -L -A -o pid,tname,time,cmd,policy,rtprio,lwp | grep qemu
58229 ?        00:01:24 /usr/libexec/qemu-kvm -name  FF  1  58229
58229 ?        00:00:00 /usr/libexec/qemu-kvm -name  TS  -  58264
58229 ?        00:00:00 /usr/libexec/qemu-kvm -name  TS  -  58266
58229 ?        00:00:02 /usr/libexec/qemu-kvm -name  TS  -  58276
58229 ?        00:00:00 /usr/libexec/qemu-kvm -name  TS  -  58277
58229 ?        00:00:00 /usr/libexec/qemu-kvm -name  FF  1  58278
58229 ?        00:00:00 /usr/libexec/qemu-kvm -name  FF  1  58279
58229 ?        00:00:00 /usr/libexec/qemu-kvm -name  FF  1  58280
58229 ?        00:00:00 /usr/libexec/qemu-kvm -name  FF  1  58281
58302 pts/1    00:00:00 grep --color=auto qemu      TS  -  58302

Feb 18 14:39:12 hp-dl388g10-02 kernel: call_rcu        R  running task    0 58264      1 0x000803a4
Feb 18 14:39:12 hp-dl388g10-02 kernel: Call Trace:
Feb 18 14:39:12 hp-dl388g10-02 kernel:  ? __schedule+0x316/0x7c0
Feb 18 14:39:12 hp-dl388g10-02 kernel:  schedule+0x39/0xd0
Feb 18 14:39:12 hp-dl388g10-02 kernel:  futex_wait_queue_me+0xbb/0x110
Feb 18 14:39:12 hp-dl388g10-02 kernel:  futex_wait+0x133/0x230
Feb 18 14:39:12 hp-dl388g10-02 kernel:  ? finish_task_switch+0x108/0x300
Feb 18 14:39:12 hp-dl388g10-02 kernel:  ? __switch_to+0x419/0x470
Feb 18 14:39:12 hp-dl388g10-02 kernel:  do_futex+0x308/0x670
Feb 18 14:39:12 hp-dl388g10-02 kernel:  ? __seccomp_filter+0x3e/0x490
Feb 18 14:39:12 hp-dl388g10-02 kernel:  __x64_sys_futex+0x143/0x180
Feb 18 14:39:12 hp-dl388g10-02 kernel:  do_syscall_64+0x87/0x1a0
Feb 18 14:39:12 hp-dl388g10-02 kernel:  entry_SYSCALL_64_after_hwframe+0x65/0xca
Feb 18 14:39:12 hp-dl388g10-02 kernel: RIP: 0033:0x7f149beb96ed
Feb 18 14:39:12 hp-dl388g10-02 kernel: Code: Bad RIP value.

Feb 18 14:39:12 hp-dl388g10-02 kernel: worker          R  running task    0 58266      1 0x000803a0
Feb 18 14:39:12 hp-dl388g10-02 kernel: Call Trace:
Feb 18 14:39:12 hp-dl388g10-02 kernel:  ? __schedule+0x316/0x7c0
Feb 18 14:39:12 hp-dl388g10-02 kernel:  ? _raw_spin_unlock_irqrestore+0x20/0x60
Feb 18 14:39:12 hp-dl388g10-02 kernel:  ? hrtimer_start_range_ns+0x21f/0x390
Feb 18 14:39:12 hp-dl388g10-02 kernel:  schedule+0x39/0xd0
Feb 18 14:39:12 hp-dl388g10-02 kernel:  futex_wait_queue_me+0xbb/0x110
Feb 18 14:39:12 hp-dl388g10-02 kernel:  futex_wait+0x133/0x230
Feb 18 14:39:12 hp-dl388g10-02 kernel:  ? __hrtimer_init_sleeper+0x60/0x60
Feb 18 14:39:12 hp-dl388g10-02 kernel:  do_futex+0x308/0x670
Feb 18 14:39:12 hp-dl388g10-02 kernel:  ? __seccomp_filter+0x3e/0x490
Feb 18 14:39:12 hp-dl388g10-02 kernel:  __x64_sys_futex+0x143/0x180
Feb 18 14:39:12 hp-dl388g10-02 kernel:  do_syscall_64+0x87/0x1a0
Feb 18 14:39:12 hp-dl388g10-02 kernel:  entry_SYSCALL_64_after_hwframe+0x65/0xca
Feb 18 14:39:12 hp-dl388g10-02 kernel: RIP: 0033:0x7f149c196082
Feb 18 14:39:12 hp-dl388g10-02 kernel: Code: Bad RIP value.
Feb 18 14:39:12 hp-dl388g10-02 kernel: RSP: 002b:00007f149360f600 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
Feb 18 14:39:12 hp-dl388g10-02 kernel: RAX: ffffffffffffffda RBX: 00007f149360f6a0 RCX: 00007f149c196082
Feb 18 14:39:12 hp-dl388g10-02 kernel: RDX: 0000000000000000 RSI: 0000000000000189 RDI: 0000564372f2bbc8
Feb 18 14:39:12 hp-dl388g10-02 kernel: RBP: 0000564372f2bbc8 R08: 0000000000000000 R09: 00000000ffffffff
Feb 18 14:39:12 hp-dl388g10-02 kernel: R10: 00007f149360f6a0 R11: 0000000000000246 R12: 0000000000000000
Feb 18 14:39:12 hp-dl388g10-02 kernel: R13: 0000000000000000 R14: 00007f149360f6a0 R15: 00007f149360f800

Feb 18 14:39:12 hp-dl388g10-02 kernel: IO mon_iothread  R  running task    0 58276      1 0x000843a0
Feb 18 14:39:12 hp-dl388g10-02 kernel: Call Trace:
Feb 18 14:39:12 hp-dl388g10-02 kernel:  ? __schedule+0x316/0x7c0
Feb 18 14:39:12 hp-dl388g10-02 kernel:  ? ___preempt_schedule+0x16/0x18
Feb 18 14:39:12 hp-dl388g10-02 kernel:  preempt_schedule_common+0x23/0x80
Feb 18 14:39:12 hp-dl388g10-02 kernel:  ___preempt_schedule+0x16/0x18
Feb 18 14:39:12 hp-dl388g10-02 kernel:  rt_mutex_postunlock+0x5a/0x60
Feb 18 14:39:12 hp-dl388g10-02 kernel:  rt_mutex_futex_unlock+0xa1/0xb0
Feb 18 14:39:12 hp-dl388g10-02 kernel:  rt_spin_unlock+0x39/0x40
Feb 18 14:39:12 hp-dl388g10-02 kernel:  eventfd_write+0xbe/0x290
Feb 18 14:39:12 hp-dl388g10-02 kernel:  ? migrate_enable+0x3a0/0x3a0
Feb 18 14:39:12 hp-dl388g10-02 kernel:  vfs_write+0xa5/0x1a0
Feb 18 14:39:12 hp-dl388g10-02 kernel:  ksys_write+0x52/0xc0
Feb 18 14:39:12 hp-dl388g10-02 kernel:  do_syscall_64+0x87/0x1a0
Feb 18 14:39:12 hp-dl388g10-02 kernel:  entry_SYSCALL_64_after_hwframe+0x65/0xca
Feb 18 14:39:12 hp-dl388g10-02 kernel: RIP: 0033:0x7f149c196af7
Feb 18 14:39:12 hp-dl388g10-02 kernel: Code: Bad RIP value.
Feb 18 14:39:12 hp-dl388g10-02 kernel: RSP: 002b:00007f148bffd470 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
Feb 18 14:39:12 hp-dl388g10-02 kernel: RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f149c196af7
Feb 18 14:39:12 hp-dl388g10-02 kernel: RDX: 0000000000000008 RSI: 00005643711a2eb8 RDI: 0000000000000004
Feb 18 14:39:12 hp-dl388g10-02 kernel: RBP: 00005643711a2eb8 R08: 0000000000000000 R09: 0000000000000000
Feb 18 14:39:12 hp-dl388g10-02 kernel: R10: 0000000000000019 R11: 0000000000000293 R12: 0000000000000008
Feb 18 14:39:12 hp-dl388g10-02 kernel: R13: 0000564372e9b1a0 R14: 0000000000000049 R15: 0000000000000168

Feb 18 14:39:12 hp-dl388g10-02 kernel: CPU 0/KVM       S                  0 58277      1 0x000843a0
Feb 18 14:39:12 hp-dl388g10-02 kernel: Call Trace:
Feb 18 14:39:12 hp-dl388g10-02 kernel:  ? __schedule+0x316/0x7c0
Feb 18 14:39:12 hp-dl388g10-02 kernel:  schedule+0x39/0xd0
Feb 18 14:39:12 hp-dl388g10-02 kernel:  futex_wait_queue_me+0xbb/0x110
Feb 18 14:39:12 hp-dl388g10-02 kernel:  futex_wait+0x133/0x230
Feb 18 14:39:12 hp-dl388g10-02 kernel:  ? get_futex_key+0x3a1/0x420
Feb 18 14:39:12 hp-dl388g10-02 kernel:  do_futex+0x308/0x670
Feb 18 14:39:12 hp-dl388g10-02 kernel:  ? __seccomp_filter+0x3e/0x490
Feb 18 14:39:12 hp-dl388g10-02 kernel:  ? do_vfs_ioctl+0xa4/0x630
Feb 18 14:39:12 hp-dl388g10-02 kernel:  __x64_sys_futex+0x143/0x180
Feb 18 14:39:12 hp-dl388g10-02 kernel:  do_syscall_64+0x87/0x1a0
Feb 18 14:39:12 hp-dl388g10-02 kernel:  entry_SYSCALL_64_after_hwframe+0x65/0xca
Feb 18 14:39:12 hp-dl388g10-02 kernel: RIP: 0033:0x7f149c19348c
Feb 18 14:39:12 hp-dl388g10-02 kernel: Code: Bad RIP value.
Feb 18 14:39:12 hp-dl388g10-02 kernel: RSP: 002b:00007f14909485e0 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
Feb 18 14:39:12 hp-dl388g10-02 kernel: RAX: ffffffffffffffda RBX: 0000564372fabcf0 RCX: 00007f149c19348c
Feb 18 14:39:12 hp-dl388g10-02 kernel: RDX: 0000000000000000 RSI: 0000000000000080 RDI: 0000564372fabd1c
Feb 18 14:39:12 hp-dl388g10-02 kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 000056437188c6e0
Feb 18 14:39:12 hp-dl388g10-02 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 000056437186ffa0
Feb 18 14:39:12 hp-dl388g10-02 kernel: R13: 000000000000000b R14: 0000000000000000 R15: 0000564372fabd1c

Agree with the previous conclusion that this can be closed: it's not a customer-supported configuration, so closing again.

Bulk update: Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.
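The per-thread policy check shown in the `ps` output above (`FF` = SCHED_FIFO, `TS` = normal time-sharing) can also be done programmatically when triaging this kind of mixed-policy starvation. Below is a minimal sketch, not from the original report: it assumes Linux and Python 3, and inspects the current process only as a stand-in for the qemu-kvm PID.

```python
import os

# ps POLICY column names for the policies relevant to this bug.
POLICY_NAMES = {
    os.SCHED_OTHER: "TS",  # normal time-sharing
    os.SCHED_FIFO: "FF",   # real-time FIFO (the starvation risk)
    os.SCHED_RR: "RR",     # real-time round-robin
}

def thread_policies(pid):
    """Map each thread id of `pid` to its scheduler policy name."""
    policies = {}
    for tid in os.listdir(f"/proc/{pid}/task"):
        policy = os.sched_getscheduler(int(tid))
        policies[int(tid)] = POLICY_NAMES.get(policy, "other")
    return policies

# In practice `pid` would be the qemu-kvm PID (e.g. from `pgrep -o
# qemu-kvm`); the current process is used here so the sketch runs anywhere.
print(thread_policies(os.getpid()))
```

A mix of `FF` and `TS` entries for one qemu-kvm process, as in the listing above, is the condition under which a SCHED_FIFO thread can hold a mutex needed by a starved normal-priority thread.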
This bug has bounced into various closed states a few times and has had its stale date extended due to reopening; however, there's no indication in those recent changes of what active work is being done to resolve the problem.

As it stands now, the stale date will occur during the current release cycle, but it doesn't seem we're any closer to resolution.

Can you please provide an update on expectations for resolution? Is this actively being worked, or is it just a low-priority backlog task that could be tracked outside of Bugzilla?

tks -

(In reply to John Ferlan from comment #71)
> This bug has bounced into various closed states a few times and has had its
> stale date extended due to reopening; however, there's no indication in
> those recent changes of what active work is being done to resolve the
> problem.
>
> As it stands now, the stale date will occur during the current release
> cycle, but it doesn't seem we're any closer to resolution.
>
> Can you please provide an update on expectations for resolution? Is this
> actively being worked, or is it just a low-priority backlog task that
> could be tracked outside of Bugzilla?
>
> tks -

John,

This is an item that we'd like to fix, but it is on the lower-priority backlog. Is there a problem with continuing to track it in Bugzilla?

There is valuable information in the BZ; that's why it would be worthwhile to keep it open.

(In reply to Marcelo Tosatti from comment #72)
> John,
>
> This is an item that we'd like to fix, but it is on the lower-priority
> backlog. Is there a problem with continuing to track it in Bugzilla?
>
> There is valuable information in the BZ; that's why it would be worthwhile
> to keep it open.

In theory it's not a problem to keep bugs open; however, we deal with RHEL processes (and bz bots) that handle stale/aging bugs (e.g. bugs open longer than 18 months) by setting a deadline date for resolution. As you've seen above, when we reach that date the bug gets closed, and then someone has reopened it (more than once). It's perfectly fine to extend the date, but for how long can one reasonably expect to keep a bug open without resolution? The conundrum is whether Bugzilla should be used as a long-term todo list or as a bug/feature tracker for things that can be addressed within 18 months.

I try to be proactive, and in an effort to reduce the number of open backlog bugs (>200 open bugs within the Virt SST) I give a bump/push to the oldest bugs to make sure developers consider addressing them. The reality is that if something hasn't been fixed after 2+ years, what's the likelihood that the problem will be addressed in the next 6-12 months? Additionally, there's no customer case on this bug.

Still, as a former developer I realize some bugs can take time to resolve and other priorities exist, but we should at least update the status from time to time, especially when the stale date approaches or has to be extended.

After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

Changing the CLOSE reason to DEFERRED.
As indicated by Marcelo in #c72 - we intend to fix this BZ, but we consider it a lower priority right now. Will re-open the BZ in the future if the priority ever changes. |
Description of problem:
After starting and destroying a VM many times, `virsh start` hangs and qemu is stuck in query-balloon.

Tested on packages:
libvirt-daemon-kvm-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
qemu-kvm-4.2.0-29.module+el8.2.1+7297+a825794d.x86_64
RT kernel: 4.18.0-193.rt13.51.el8.x86_64

Test steps:
1. Start the VM successfully with xml: g2.xml, then destroy the VM.
2. Start and destroy the VM 20 times:
# for i in {1..20}; do virsh start g2; sleep 5; virsh destroy g2; done
3. Try to start and destroy the VM 200 times on terminal 1, and run `virsh list --all` on terminal 2. `virsh start` hangs on terminal 1 and the VM status is paused.
Terminal 1:
# for i in {1..200}; do virsh start g2; sleep 1; virsh destroy g2; done
Terminal 2:
# for i in {1..500}; do virsh list --all; sleep 1; done
 Id   Name   State
-----------------------
 1    g2     paused
...

Additional information:
- libvirtd.log
- g2.xml
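Because the failure mode is `virsh start` blocking indefinitely, a reproduction driver benefits from bounding each iteration with a timeout, so the loop fails fast and records which iteration hung instead of stalling like terminal 1 above. A minimal sketch, not part of the original report; `echo` and `sleep` stand in for the actual `virsh start`/`virsh destroy` commands:

```python
import subprocess

def run_with_timeout(cmd, timeout_s=30.0):
    """Run `cmd`; return (ok, stdout).

    ok is False on timeout or a non-zero exit status, mirroring a hung
    or failed `virsh start`.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout_s)
        return result.returncode == 0, result.stdout
    except subprocess.TimeoutExpired:
        return False, ""

# Stand-in for the real loop body, e.g.:
#   run_with_timeout(["virsh", "start", "g2"], timeout_s=30.0)
ok, out = run_with_timeout(["echo", "started"], timeout_s=5.0)
```

A loop built on this wrapper can stop at the first iteration where `ok` is False, which is the point at which to attach gdb and dump thread backtraces as was done in this report.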