Description of problem:
virt-manager is currently using 100% cpu, apparently in usb_ev_thread, after a pair of migrations (but is otherwise working)

Version-Release number of selected component (if applicable):
virt-manager-1.0.0-6.fc20.noarch
libvirt-client-1.1.3.4-3.fc20.x86_64
spice-gtk-0.23-2.fc20.x86_64

How reproducible:
Not sure, just happened

Steps to Reproduce:
1. Connect virt-manager to a remote host ('a') running a RHEL7 guest, open its display and see it's running
2. Use virsh on that host to migrate it to another host ('b'):
   migrate --live --compressed --verbose --desturi qemu+ssh://b/system --domain rhel7b1-v3
3. Open the VM on the new host - see its display (leave the window open for the original guest)
4. On 'b' use virsh to migrate the guest back to 'a'
(Note this is happening while step 4 is in progress - they're long, slow migrations.)

Actual results:
virt-manager using 100% cpu apparently while not doing much; top 'H' shows the time spent in usb_ev_thread. Looks like it's a spice/USB problem.

#0  0x00000037b660cb78 in __GI___pthread_getspecific (key=1) at pthread_getspecific.c:32
        data = <optimized out>
        result = <optimized out>
#1  0x00000037ba28a974 in g_private_get (key=key@entry=0x37ba52aae0 <private_thread_memory>) at gthread-posix.c:1019
        No locals.
#2  0x00000037ba2655dc in thread_memory_from_self () at gslice.c:513
        tmem = <optimized out>
#3  g_slice_free1 (mem_size=mem_size@entry=24, mem_block=mem_block@entry=0x453e840) at gslice.c:1100
        tmem = 0x7f15300029b0
        ix = <optimized out>
        chunk_size = 32
#4  0x00000037ba268a02 in g_string_free (string=string@entry=0x453e840, free_segment=free_segment@entry=0) at gstring.c:229
        segment = 0x7f15300017e0 "\n(virt-manager:4187): GSpice-WARNING **: Error handling USB events: Other error [-99]\n"
        __PRETTY_FUNCTION__ = "g_string_free"
#5  0x00000037ba24fe8f in g_log_default_handler (log_domain=log_domain@entry=0x38d7acc5a5 "GSpice", log_level=log_level@entry=G_LOG_LEVEL_WARNING, message=message@entry=0x7f15300028e0 "Error handling USB events: Other error [-99]", unused_data=unused_data@entry=0x0) at gmessages.c:1390
        level_prefix = "WARNING **\000\265\067\000\000\000\310\334 \272\067\000\000\000\060\335\177C\025\177\000\000\245Ŭ\327\070", '\000' <repeats 19 times>, "\300\351\177"
        string = <optimized out>
        gstring = 0x453e840
        fd = 2
        domains = <optimized out>
#6  0x00000037ba250429 in g_logv (log_domain=0x38d7acc5a5 "GSpice", log_level=G_LOG_LEVEL_WARNING, format=<optimized out>, args=args@entry=0x7f15437fde10) at gmessages.c:960
        domain = 0x0
        data = 0x0
        depth = 1
        log_func = 0x37ba24fc80 <g_log_default_handler>
        domain_fatal_mask = <optimized out>
        masquerade_fatal = 0
        test_level = G_LOG_LEVEL_WARNING
        was_fatal = <optimized out>
        was_recursion = <optimized out>
        msg = 0x7f15300028e0 "Error handling USB events: Other error [-99]"
        msg_alloc = 0x7f15300028e0 "Error handling USB events: Other error [-99]"
        i = 4
#7  0x00000037ba25063f in g_log (log_domain=log_domain@entry=0x38d7acc5a5 "GSpice", log_level=log_level@entry=G_LOG_LEVEL_WARNING, format=format@entry=0x38d7ad5cc8 "Error handling USB events: %s [%i]") at gmessages.c:1025
        args = {{gp_offset = 40, fp_offset = 48, overflow_arg_area = 0x7f15437fdef0, reg_save_area = 0x7f15437fde30}}
#8  0x00000038d7a3519b in 
spice_usb_device_manager_usb_ev_thread (user_data=<optimized out>) at usb-device-manager.c:1152
        desc = <optimized out>
        self = <optimized out>
        priv = 0x4987920
        rc = -99
#9  0x00000037ba26ea45 in g_thread_proxy (data=0x39e4d40) at gthread.c:798
        thread = 0x39e4d40
#10 0x00000037b6607f33 in start_thread (arg=0x7f15437fe700) at pthread_create.c:309
        __res = <optimized out>
        pd = 0x7f15437fe700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139729303496448, -2763313655369288088, 0, 0, 139729303497152, 139729303496448, 2849716576536114792, -2753440861348940184}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
#11 0x00000037b5ef4ded in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Expected results:
No particularly heavy CPU usage.

Additional info:
I'm not doing anything like USB redirect or anything like that.
<domain type='kvm'>
  <name>rhel7b1-v3</name>
  <uuid>7340a501-4171-4230-8000-48c9da15d4cd</uuid>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <vcpu placement='static'>6</vcpu>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.0.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/home/vms/rhel7b1-v3.img'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <interface type='direct'>
      <mac address='52:54:00:70:10:01'/>
      <source dev='eno1' mode='bridge'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'/>
    <input type='mouse' bus='ps2'/>
    <graphics type='spice' autoport='yes'/>
    <sound model='ich6'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </sound>
    <video>
      <model type='qxl' ram='65536' vram='65536' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </memballoon>
  </devices>
</domain>

There doesn't seem to be anything relevant in the log:

[Wed, 26 Mar 2014 19:25:33 virt-manager 4187] DEBUG (console:1469) Starting connect process for proto=spice trans=ssh connhost=vl402 connuser=root connport=None gaddr=127.0.0.1 gport=5900 gtlsport=None gsocket=None
[Wed, 26 Mar 2014 19:25:34 virt-manager 4187] DEBUG (console:271) Creating SSH tunnel: ssh -l root vl402 sh -c 'nc -q 2>&1 | grep "requires an argument" >/dev/null;if [ $? -eq 0 ] ; then CMD="nc -q 0 127.0.0.1 5900";else CMD="nc 127.0.0.1 5900";fi;eval "$CMD";'
[Wed, 26 Mar 2014 19:25:34 virt-manager 4187] DEBUG (console:291) Open tunnel PID=31903 OUTFD=19 ERRFD=23
[Wed, 26 Mar 2014 19:25:37 virt-manager 4187] DEBUG (console:660) Opening tunnel for channel: <RecordChannel object at 0x40dff50 (SpiceRecordChannel at 0x4e6dde0)>
[Wed, 26 Mar 2014 19:25:37 virt-manager 4187] DEBUG (console:660) Opening tunnel for channel: <PlaybackChannel object at 0x40dff50 (SpicePlaybackChannel at 0x4ecbb80)>
[Wed, 26 Mar 2014 19:25:37 virt-manager 4187] DEBUG (console:660) Opening tunnel for channel: <DisplayChannel object at 0x40dff50 (SpiceDisplayChannel at 0x4ef29c0)>
[Wed, 26 Mar 2014 19:25:37 virt-manager 4187] DEBUG (console:660) Opening tunnel for channel: <CursorChannel object at 0x40e5f00 (SpiceCursorChannel at 0x47e10e0)>
[Wed, 26 Mar 2014 19:25:37 virt-manager 4187] DEBUG (console:1376) Viewer connected
[Wed, 26 Mar 2014 19:25:37 virt-manager 4187] DEBUG (console:660) Opening tunnel for channel: <InputsChannel object at 0x40e9410 (SpiceInputsChannel at 0x4ebba10)>
[Wed, 26 Mar 2014 19:25:37 virt-manager 4187] DEBUG (console:271) Creating SSH tunnel: ssh -l root vl402 sh -c 'nc -q 2>&1 | grep "requires an argument" >/dev/null;if [ $? -eq 0 ] ; then CMD="nc -q 0 127.0.0.1 5900";else CMD="nc 127.0.0.1 5900";fi;eval "$CMD";'
[Wed, 26 Mar 2014 19:25:37 virt-manager 4187] DEBUG (console:291) Open tunnel PID=31905 OUTFD=37 ERRFD=50
[Wed, 26 Mar 2014 19:25:38 virt-manager 4187] DEBUG (console:271) Creating SSH tunnel: ssh -l root vl402 sh -c 'nc -q 2>&1 | grep "requires an argument" >/dev/null;if [ $? -eq 0 ] ; then CMD="nc -q 0 127.0.0.1 5900";else CMD="nc 127.0.0.1 5900";fi;eval "$CMD";'
[Wed, 26 Mar 2014 19:25:38 virt-manager 4187] DEBUG (console:291) Open tunnel PID=31906 OUTFD=52 ERRFD=55
[Wed, 26 Mar 2014 19:25:38 virt-manager 4187] DEBUG (console:271) Creating SSH tunnel: ssh -l root vl402 sh -c 'nc -q 2>&1 | grep "requires an argument" >/dev/null;if [ $? -eq 0 ] ; then CMD="nc -q 0 127.0.0.1 5900";else CMD="nc 127.0.0.1 5900";fi;eval "$CMD";'
[Wed, 26 Mar 2014 19:25:38 virt-manager 4187] DEBUG (console:291) Open tunnel PID=31907 OUTFD=59 ERRFD=63
[Wed, 26 Mar 2014 19:25:39 virt-manager 4187] DEBUG (console:271) Creating SSH tunnel: ssh -l root vl402 sh -c 'nc -q 2>&1 | grep "requires an argument" >/dev/null;if [ $? -eq 0 ] ; then CMD="nc -q 0 127.0.0.1 5900";else CMD="nc 127.0.0.1 5900";fi;eval "$CMD";'
[Wed, 26 Mar 2014 19:25:39 virt-manager 4187] DEBUG (console:291) Open tunnel PID=31908 OUTFD=67 ERRFD=70
[Wed, 26 Mar 2014 19:25:40 virt-manager 4187] DEBUG (console:271) Creating SSH tunnel: ssh -l root vl402 sh -c 'nc -q 2>&1 | grep "requires an argument" >/dev/null;if [ $? -eq 0 ] ; then CMD="nc -q 0 127.0.0.1 5900";else CMD="nc 127.0.0.1 5900";fi;eval "$CMD";'
[Wed, 26 Mar 2014 19:25:40 virt-manager 4187] DEBUG (console:291) Open tunnel PID=31909 OUTFD=72 ERRFD=74
Hmm, interesting. Moving to spice-gtk, but maybe this is tickled by something virt-manager is doing
Hans, this seems to be usbredir related.
Hi,

David, can you please start virt-manager on a terminal like this:

LIBUSB_DEBUG=4 virt-manager &> log

Reproduce the problem, gzip the log file and attach it here?

Regards,

Hans
You'll want to do virt-manager --debug or virt-manager --no-fork, since virt-manager forks by default and that command won't print much useful.
Created attachment 881486 [details]
output of log file with libusb debug enabled

As requested. Note that I think the thing that triggers the vast CPU usage might be having the window open for the VM on both hosts; i.e. it starts off on vl403 and I open its virt-manager window, then I kick off a migrate (using migrate --live --compressed --verbose --desturi qemu+ssh://^Crtlab402/system --domain rhel7b1-v3), then when it gets to vl402 I open the window for the VM on that host.
Hi,

Were you seeing the 100% cpu load problem when you generated that log file? And if not, can you please create another log file from a run where you actually see the problem?

If your initial analysis of usb_ev_thread being the culprit is correct, in combination with your backtrace showing libusb_handle_events failing with an error of -99, then I would expect to see a ton of log messages about this in the log file. But I only see a few regular libusb_handle_events calls, and not the ton of failures I would expect when libusb_handle_events fails unexpectedly and is called repeatedly.

Regards,

Hans
In that run it did take ~100% towards the end of the run (at the point where the 2nd migration happened, I think) - however, this time it wasn't showing the usb thread being hot in top, so where it was spending its time I don't know.

Dave
Hi,

(In reply to Dr. David Alan Gilbert from comment #7)
> In that run it did take ~100% towards the end of the run (at the point where
> the 2nd migration happened I think) - however, this time it wasn't showing
> the usb thread being hot in top - so where it was spending it's time I don't
> know.

Hmm, sounds like there is a deeper-lying issue then, which causes some thread to break out of poll with an error (or something like that), and which happens to hit the usb thread sometimes, and sometimes another thread.

Regards,

Hans
Hi,

So looking at the new log, what stands out is that the poll in the usb-thread is returning 2 because it is seeing activity on both the control pipe and the timer fd. The control pipe activity is normal; the timer fd activity, not so much. Then, when handling the timer fd activity, it gets an error while trying to disarm the timer.

So it seems something has gone wrong with the timer fd: possibly some other thread has closed it and then later opened something else, causing the fd number to now point to something which is not a timerfd. This would explain both why the timerfd is waking poll when it should not and why disarming the timerfd is failing.

Another possibility is that libusb itself has closed the timerfd because the context has been destroyed, and that we're calling libusb_handle_events on a destroyed context. I've checked the spice-gtk code, and it has g_return_if_fail code to check for this scenario, so if the context is getting destroyed underneath us, it is likely not being done from the spice-gtk code.

Regards,

Hans
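The closed-and-reused fd theory above can be illustrated with a minimal sketch. This is a toy in Python (virt-manager's language), using an ordinary pipe as a stand-in for libusb's timerfd; it is not spice-gtk code, it just shows how a stale descriptor number ends up pointing at an unrelated object:

```python
import os

# A thread holds on to an fd number (the "timerfd")...
r, w = os.pipe()
stale_fd = r

# ...another part of the program closes it behind the thread's back...
os.close(r)
os.close(w)

# ...and a later open() reuses the lowest free descriptor number.
f = open("/dev/null")
reused = f.fileno()

# The stale number now refers to /dev/null, not the pipe: poll() on it
# behaves differently than expected, and timerfd-specific operations
# (like disarming the timer) fail - matching the symptoms in the log.
print(stale_fd == reused)
```

On Linux, `open()` always returns the lowest unused descriptor, which is why the reuse in this sketch is deterministic, and why a stale fd number so easily aliases a freshly opened file.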
One possible reason why this could be happening is other libusb-using code in virt-manager calling libusb_init(NULL) and libusb_exit(NULL), and doing so in an unbalanced fashion.
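The unbalanced init/exit hazard can be modelled with a toy refcounted context. This is an illustrative Python sketch of the failure shape only; the class and method names are invented here, and libusb's real default (NULL) context is C:

```python
class DefaultContext:
    """Toy stand-in for a process-wide refcounted context,
    loosely modelled on libusb's default (NULL) context."""

    def __init__(self):
        self.refs = 0
        self.alive = False

    def init(self):
        # like libusb_init(NULL): first caller creates the context
        if self.refs == 0:
            self.alive = True
        self.refs += 1

    def exit(self):
        # like libusb_exit(NULL): last caller destroys the context,
        # closing its internal fds (including any timerfd)
        self.refs -= 1
        if self.refs == 0:
            self.alive = False

ctx = DefaultContext()
ctx.init()   # component A (e.g. the spice-gtk usb manager) takes a ref
ctx.init()   # component B takes one too
ctx.exit()   # B exits once...
ctx.exit()   # ...and erroneously exits a second time

# A still believes the context is valid, but it has been torn down:
# its event thread is now polling dead (and possibly reused) fds.
print(ctx.alive)
```

In the unbalanced case the second `exit()` drops the refcount to zero while component A still holds a reference, which is exactly the "context destroyed underneath us" scenario described above.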
(In reply to Hans de Goede from comment #11)
> One possible reason why this could be happening is other libusb using code
> in virt-manager calling libusb_init(NULL) and libusb_exit(NULL) and it
> doing so in an unbalanced fashion.

Without knowing anything about the innards of virt-manager, I would point out that when I trigger this, it normally involves a pair of migrations, so I've got the spice interface open on host A, then I've got it open on host B, and then got it open on host A again; so either there is some period where they fight as they hand off between the two hosts, or perhaps, when it comes back to the same host again, there is something stale - shrug.
FWIW virt-manager doesn't use libusb directly, nor libvirt.
(In reply to Cole Robinson from comment #13)
> FWIW virt-manager doesn't use libusb directly, nor libvirt.

Right, how do you get the list of usb-devices for the non-spice usb-redir parts of virt-manager?

Note I still think this whole USB thing is a red herring, as David is seeing the same hang sometimes occur without USB being involved at all. So chances are that we're just getting some memory corruption which sometimes hits USB. Or some init/exit race which sometimes hits USB...
(In reply to Hans de Goede from comment #14)
> (In reply to Cole Robinson from comment #13)
> > FWIW virt-manager doesn't use libusb directly, nor libvirt.
>
> Right, how do you get the list of usb-devices for the non-spice usb-redir
> parts of virt-manager ?

via libvirt APIs that talk to udev.
Okay, David gave me access to his setup and I can reliably reproduce. This seems to involve spice-gtk's seamless migration support, and an orphaned usb_device_manager or something along those lines. I added some printfs to spice-gtk to confirm that the usb_device_manager instance for each spice connection is correctly closed when we close the virt-manager console window, or the VM is destroyed. That all seems to work correctly. However, when the VM is migrated, despite us .destroy()ing the display widget and trying to clear all our state, the usb_device_manager isn't dispose()d at the spice-gtk level. I assume this is intended and part of the seamless migration magic... however we don't use seamless migration in virt-manager (would be nice to support, but would take a bit of work for us). So spice folks, any tips here? Is there something we can do to say 'dont use migration magic', or maybe this is just a spice-gtk or virt-manager bug? I'm happy to do more digging if people have suggestions
Created attachment 941230 [details] Debug output from virt-manager run
Although I did local migrations for testing (with a small patch in libvirt), I am unable to reproduce so far on f20. I try to open the Spice console as early as I can, and leave it open until the migration completes and closes it.

Nevertheless, it would probably make sense to exit the loop on usb_handle_events() io/mem errors.
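The suggestion above can be sketched as follows. This is an illustrative Python model of the idea, not the actual spice-gtk thread (which is C, in usb-device-manager.c); the error constants mirror libusb's values, but `run_event_thread` and `handle_events` are stand-ins:

```python
# libusb error codes (values match libusb.h)
LIBUSB_ERROR_IO = -1            # unrecoverable i/o failure
LIBUSB_ERROR_INTERRUPTED = -10  # benign: a signal woke the call
LIBUSB_ERROR_NO_MEM = -11       # unrecoverable allocation failure
LIBUSB_ERROR_OTHER = -99        # the "Other error [-99]" from the backtrace

def run_event_thread(handle_events):
    """Service events until a fatal error occurs, then stop.

    Retrying an io/mem/other failure fails again immediately, so a
    loop that ignores the return code spins at 100% cpu - exactly
    the symptom in this bug. Breaking out instead keeps the thread
    from burning cpu on an error that will never clear.
    """
    while True:
        rc = handle_events()
        if rc == 0 or rc == LIBUSB_ERROR_INTERRUPTED:
            continue  # normal iteration or harmless wakeup
        return rc     # fatal: leave the loop, let the caller clean up

# Simulated run: two clean iterations, then the fatal -99 from the trace.
results = iter([0, 0, LIBUSB_ERROR_OTHER])
rc = run_event_thread(lambda: next(results))
print(rc)  # -99
```

The key design point is simply distinguishing errors that are worth retrying (an interrupted system call) from those that are not (io, no-mem, "other"), rather than treating every return value as retryable.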
The main issue was virt-manager closing an fd it doesn't own [0]. For some reason, the same fd ends up being a libusb pipe. Then the USB thread spins, since no errors are reported [1], and spice-gtk ignores errors [2].

The second issue I found: the Spice session was sometimes leaked in race cases (and the USB pipe remained open too). This is related to the spice-gtk audio manager [3].

Then I figured out that when fds are provided to spice-gtk, there is no way migration can be made to work; the migration session ends up stuck, and the spice-gtk session is stuck and leaked too. I solved a couple of migration issues in [4].

Finally, there is a bad object model in spice-gtk, which keeps channels in the session as long as the channel is alive. This is fine if the client is aware of that and only uses weak references. But with bindings, this is problematic. Calling spice_session_disconnect() several times might end up removing references that aren't owned, and I got virt-manager to crash that way. I changed the spice-gtk model [5].

[0] https://www.redhat.com/archives/virt-tools-list/2014-November/msg00022.html
[1] http://sourceforge.net/p/libusb/mailman/message/33015335/
[2] http://lists.freedesktop.org/archives/spice-devel/2014-November/017865.html
[3] http://lists.freedesktop.org/archives/spice-devel/2014-November/017864.html
[4] http://lists.freedesktop.org/archives/spice-devel/2014-November/017870.html
[5] http://lists.freedesktop.org/archives/spice-devel/2014-November/017884.html
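The object-model point in the last paragraph can be sketched like this. A simplified Python toy (not the actual spice-gtk or GObject binding code): a session that tracks its channels through weak references never holds a reference it doesn't own, so repeated disconnects have nothing stale to drop:

```python
import weakref

class Channel:
    """Toy channel object (illustrative stand-in for a SpiceChannel)."""

class Session:
    """Tracks channels without owning them, via weak references."""

    def __init__(self):
        self._channels = []

    def add(self, channel):
        # Record the channel without taking a strong reference;
        # its real owner (the client / binding) controls its lifetime.
        self._channels.append(weakref.ref(channel))

    def live_channels(self):
        # Silently drop entries whose channel has already been freed.
        return [ref() for ref in self._channels if ref() is not None]

session = Session()
chan = Channel()
session.add(chan)
before = len(session.live_channels())  # channel alive: session sees it

del chan  # the owner drops its only strong reference
after = len(session.live_channels())   # session never pinned it

print(before, after)
```

With strong references instead, the session would keep every channel alive until it explicitly unrefs them, and unreffing them again on a second disconnect releases references the session no longer owns, which is the crash scenario described above.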
Excellent work; a lot of fixes from one report :-) I'll give the new virt-manager a try when it lands for [0] but I guess it might be best to wait for the spice fixes to bubble through as well. Dave
The virt-manager fix is in F21. spice-gtk fixes should be in F22+ now. Probably not big enough of a deal to backport the remaining stuff though, so just closing as CURRENTRELEASE for f22.