Bug 526952

Summary: virt-manager/libvirt locks up frequently using a slow remote connection
Product: [Fedora] Fedora Reporter: Matthew Booth <mbooth>
Component: libvirtAssignee: Daniel Veillard <veillard>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: medium    
Version: 13CC: berrange, chrisw, clalance, crobinso, hbrock, itamar, Jes.Sorensen, jforbes, markmc, petersen, veillard, virt-maint
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-06-10 13:54:10 EDT Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On:    
Bug Blocks: 498969    
Attachments:
Description Flags
virt-manager.log
none
libvirt.log
none
virt-manager.log
none
libvirt.log
none
virt-manager.log none

Description Matthew Booth 2009-10-02 13:48:41 EDT
Description of problem:
I am encountering frequent lock-ups using virt-manager connecting to a remote hypervisor over SSH. Average ping time to the remote host is about 500ms, so latency is pretty high. Remote system is RHEL 5.4, connecting to Xen.

I have not yet noticed any particular pattern to the lockups. However they are quite frequent (>5 today). When it locks up, the only recovery method is to kill the virt-manager process.

Nothing stands out in virt-manager.log.

Local machine is F-11 x86_64, running virt-manager from virt-preview.

Version-Release number of selected component (if applicable):
virt-manager-0.8.0-6.fc11.noarch

How reproducible:
Occasional

Steps to Reproduce:
1. Connect to remote Xen libvirt over SSH with avg ping ~ 500ms.
2. Start and stop guests frequently for some time
Comment 1 Chris Wright 2009-10-02 14:23:00 EDT
I'm able to wedge virt-manager as well w/ same version (virt-manager-0.8.0-6.fc12.noarch).

In my scenario the server has stopped responding (due to completely unrelated issue, not virt-manager's fault).  I see two threads left, one (presumably the main loop) stuck on waitpid().

#0  0x00000034b400ea2d in waitpid () from /lib64/libpthread.so.0
#1  0x00000034c970e0ca in ?? () from /usr/lib64/libpython2.6.so.1.0
#2  0x00000034c96dd33b in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#3  0x00000034c96de699 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#4  0x00000034c96def85 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.6.so.1.0
#5  0x00000034c966db7f in ?? () from /usr/lib64/libpython2.6.so.1.0
#6  0x00000034c9643ac3 in PyObject_Call () from /usr/lib64/libpython2.6.so.1.0
#7  0x00000034c965897f in ?? () from /usr/lib64/libpython2.6.so.1.0
#8  0x00000034c9643ac3 in PyObject_Call () from /usr/lib64/libpython2.6.so.1.0
#9  0x00000034c96d7603 in PyEval_CallObjectWithKeywords () from /usr/lib64/libpython2.6.so.1.0
#10 0x00007f7415804d50 in ?? () from /usr/lib64/python2.6/site-packages/gtk-2.0/gobject/_gobject.so
#11 0x0000003aa660b81e in g_closure_invoke () from /lib64/libgobject-2.0.so.0
#12 0x0000003aa6620b43 in ?? () from /lib64/libgobject-2.0.so.0
#13 0x0000003aa6621ed9 in g_signal_emit_valist () from /lib64/libgobject-2.0.so.0
#14 0x0000003aa6622423 in g_signal_emit () from /lib64/libgobject-2.0.so.0
#15 0x00007f7410617ee0 in ?? () from /usr/lib64/libgtk-vnc-1.0.so.0
#16 0x0000003aa623790e in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#17 0x0000003aa623b0e8 in ?? () from /lib64/libglib-2.0.so.0
#18 0x0000003aa623b535 in g_main_loop_run () from /lib64/libglib-2.0.so.0
#19 0x0000003aa87422b7 in gtk_main () from /usr/lib64/libgtk-x11-2.0.so.0
#20 0x00007f74155024da in ?? () from /usr/lib64/python2.6/site-packages/gtk-2.0/gtk/_gtk.so
#21 0x00000034c96dd171 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#22 0x00000034c96de699 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#23 0x00000034c96def85 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.6.so.1.0
#24 0x00000034c96df092 in PyEval_EvalCode () from /usr/lib64/libpython2.6.so.1.0
#25 0x00000034c96fa1ac in ?? () from /usr/lib64/libpython2.6.so.1.0
#26 0x00000034c96fa280 in PyRun_FileExFlags () from /usr/lib64/libpython2.6.so.1.0
#27 0x00000034c96fb67e in PyRun_SimpleFileExFlags () from /usr/lib64/libpython2.6.so.1.0
#28 0x00000034c9707dc9 in Py_Main () from /usr/lib64/libpython2.6.so.1.0
#29 0x00000034b341ea2d in __libc_start_main () from /lib64/libc.so.6
#30 0x0000000000400649 in _start ()





The other (presumably the hung connection to the remote server) stuck on poll().

#0  0x00000034b34d50d3 in poll () from /lib64/libc.so.
#1  0x00007f7417c38ef2 in ?? () from /usr/lib64/libvirt.so.0
#2  0x00007f7417c3ed21 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007f7417c197ff in virDomainGetInfo () from /usr/lib64/libvirt.so.0
#4  0x00007f7417eb3e7a in ?? () from /usr/lib64/python2.6/site-packages/libvirtmod.so
#5  0x00000034c96dd33b in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#6  0x00000034c96de699 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#7  0x00000034c96de699 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#8  0x00000034c96def85 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.6.so.1.0
#9  0x00000034c96dd718 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#10 0x00000034c96def85 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.6.so.1.0
#11 0x00000034c966dc7b in ?? () from /usr/lib64/libpython2.6.so.1.0
#12 0x00000034c9643ac3 in PyObject_Call () from /usr/lib64/libpython2.6.so.1.0
#13 0x00000034c96dbdb1 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#14 0x00000034c96de699 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#15 0x00000034c96de699 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
#16 0x00000034c96def85 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.6.so.1.0
#17 0x00000034c966db7f in ?? () from /usr/lib64/libpython2.6.so.1.0
#18 0x00000034c9643ac3 in PyObject_Call () from /usr/lib64/libpython2.6.so.1.0
#19 0x00000034c965897f in ?? () from /usr/lib64/libpython2.6.so.1.0
#20 0x00000034c9643ac3 in PyObject_Call () from /usr/lib64/libpython2.6.so.1.0
#21 0x00000034c96d7603 in PyEval_CallObjectWithKeywords () from /usr/lib64/libpython2.6.so.1.0
#22 0x00000034c970a10d in ?? () from /usr/lib64/libpython2.6.so.1.0
#23 0x00000034b400686a in start_thread () from /lib64/libpthread.so.0
#24 0x00000034b34de3bd in clone () from /lib64/libc.so.6
#25 0x0000000000000000 in ?? ()
Comment 2 Mark McLoughlin 2009-10-09 08:39:23 EDT
Any thoughts on this Cole?
Comment 3 Cole Robinson 2009-10-09 11:39:37 EDT
Sorry, thought I commented.

One way to find out exactly where the app is locking: From terminal, run virt-manager --no-fork. When app locks, hit ctrl-c in the terminal to get a backtrace. Post here.

cdub gave me more info offline, and in his case the app locks up when the connection gets screwy: some remote libvirt call doesn't return and the whole app locks. Not sure if it's just one call getting stuck or the whole connection is unresponsive.

Things we can do to alleviate the problem:

- virt-manager never runs a libvirt call in the main thread. Certainly a nice goal to shoot for, but non trivial. And the reality is, if the libvirt connection is unresponsive, virt-manager (and virsh) is useless anyways.

- Have libvirt client return in a timely manner if the connection craps out. No idea what this would entail, but certainly can't hurt. If a remote 'GetInfo' call can get stuck over a bad link, virsh would effectively be useless as well.
Comment 4 Bug Zapper 2010-03-15 08:54:52 EDT
This bug appears to have been reported against 'rawhide' during the Fedora 13 development cycle.
Changing version to '13'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 5 Jes Sorensen 2010-04-08 06:05:57 EDT
I am seeing this problem too when connecting to a remote system over a high-latency SSH tunnel. It is effectively preventing me from installing new guests on the remote system :(

I am able to connect to the remote box, enter the password. I then click create new guest, it pops up the window where I am supposed to enter the name of the guest. However I am unable to type anything in the name box, and if I try to refresh the window it just goes blank.

I connect to the system using the --no-fork trick described above. The remote is connected to via an SSH tunnel.

[jes@red-feather ~]$ virt-manager --no-fork
root@some-host-far-away.com's password: 
^CTraceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/create.py", line 833, in host_space_tick
    max_storage = self.host_disk_space()
  File "/usr/share/virt-manager/virtManager/create.py", line 822, in host_disk_space
    pool.refresh(0)
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 893, in refresh
    ret = libvirtmod.virStoragePoolRefresh(self._o, flags)
KeyboardInterrupt

I'll try to see if I can reproduce it from an F13 box connecting to the F13 box.

Cheers,
Jes
Comment 6 Jes Sorensen 2010-04-08 11:22:57 EDT
Ok, so I can reproduce this problem with virt-manager-0.8.4-1.fc14 as well.

This is what I see when I run virt-manager --no-fork and strace the stuck task.

Cheers,
Jes


jes@eye3:~> virt-manager --no-fork
root@far-far-away.com's password: 
^CTraceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/util.py", line 340, in _safe_wrapper
    return func(*args)
  File "/usr/share/virt-manager/virtManager/uihelpers.py", line 94, in host_space_tick
    max_storage = host_disk_space(conn, config)
  File "/usr/share/virt-manager/virtManager/uihelpers.py", line 81, in host_disk_space
    pool = virtinst.util.lookup_pool_by_path(conn.vmm, path)
  File "/usr/lib/python2.6/site-packages/virtinst/util.py", line 553, in lookup_pool_by_path
    inactive_list = conn.listDefinedStoragePools()
  File "/usr/lib64/python2.6/site-packages/libvirt.py", line 1644, in listDefinedStoragePools
    if ret is None: raise libvirtError ('virConnectListDefinedStoragePools() failed', conn=self)
libvirt.libvirtError: poll on socket failed: Interrupted system call



[root@eye3 ~]# strace -p 21551
Process 21551 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = 1
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
recvfrom(17, "\0\0\0@", 4, 0, NULL, NULL) = 4
recvfrom(17, " \0\200\206\0\0\0\1\0\0\0\4\0\0\0\1\0\0\1\372\0\0\0\0\0\0\0\0\0\0.\343"..., 60, 0, NULL, NULL) = 60
futex(0x7f58c410555c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f58c4105558, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f58c4085460, FUTEX_WAKE_PRIVATE, 1) = 1
write(20, "\1", 1)                      = 1
futex(0x7f58c4085460, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x1bf5dcc, FUTEX_WAIT_PRIVATE, 1, NULL) = 0
futex(0x7f58c4085460, FUTEX_WAKE_PRIVATE, 1) = 0
rt_sigprocmask(SIG_BLOCK, [PIPE CHLD WINCH], [], 8) = 0
poll([{fd=17, events=POLLIN|POLLOUT}, {fd=18, events=POLLIN}], 2, -1) = 2 ([{fd=17, revents=POLLOUT}, {fd=18, revents=POLLIN}])
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
read(18, "\1", 1)                       = 1
sendto(17, "\0\0\0\34 \0\200\206\0\0\0\1\0\0\0G\0\0\0\0\0\0\1\375\0\0\0\0", 28, 0, NULL, 0) = 28
rt_sigprocmask(SIG_BLOCK, [PIPE CHLD WINCH], [], 8) = 0
poll([{fd=17, events=POLLIN}, {fd=18, events=POLLIN}], 2, -1) = 1 ([{fd=17, revents=POLLIN}])
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
recvfrom(17, "\0\0\0<", 4, 0, NULL, NULL) = 4
recvfrom(17, " \0\200\206\0\0\0\1\0\0\0G\0\0\0\1\0\0\1\374\0\0\0\0\0\0\0\1\0\0\0\0"..., 56, 0, NULL, NULL) = 56
futex(0x7f58c410555c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f58c4105558, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f58c4085460, FUTEX_WAKE_PRIVATE, 1) = 1
write(20, "\1", 1)                      = 1
futex(0x7f58c4085460, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x1bf5dcc, FUTEX_WAIT_PRIVATE, 1, NULL) = 0
futex(0x7f58c4085460, FUTEX_WAIT_PRIVATE, 2, NULL) = 0
futex(0x7f58c4085460, FUTEX_WAKE_PRIVATE, 1) = 0
rt_sigprocmask(SIG_BLOCK, [PIPE CHLD WINCH], [], 8) = 0
poll([{fd=17, events=POLLIN|POLLOUT}, {fd=18, events=POLLIN}], 2, -1) = 2 ([{fd=17, revents=POLLIN|POLLOUT}, {fd=18, revents=POLLIN}])
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
read(18, "\1", 1)                       = 1
sendto(17, "\0\0\0  \0\200\206\0\0\0\1\0\0\0H\0\0\0\0\0\0\1\377\0\0\0\0\0\0\0\1", 32, 0, NULL, 0) = 32
recvfrom(17, "\0\0\0H", 4, 0, NULL, NULL) = 4
recvfrom(17, " \0\200\206\0\0\0\1\0\0\0H\0\0\0\1\0\0\1\376\0\0\0\0\0\0\0\1\0\0\0\7"..., 68, 0, NULL, NULL) = 68
futex(0x7f58c410555c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f58c4105558, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f58c4085460, FUTEX_WAKE_PRIVATE, 1) = 1
write(20, "\1", 1)                      = 1
futex(0x7f58c4085460, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x1bf5dcc, FUTEX_WAIT_PRIVATE, 1, NULL) = 0
futex(0x7f58c4085460, FUTEX_WAKE_PRIVATE, 1) = 0
rt_sigprocmask(SIG_BLOCK, [PIPE CHLD WINCH], [], 8) = 0
poll([{fd=17, events=POLLIN}, {fd=18, events=POLLIN}], 2, -1) = 1 ([{fd=18, revents=POLLIN}])
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
read(18, "\1", 1)                       = 1
rt_sigprocmask(SIG_BLOCK, [PIPE CHLD WINCH], [], 8) = 0
poll([{fd=17, events=POLLIN|POLLOUT}, {fd=18, events=POLLIN}], 2, -1) = 1 ([{fd=17, revents=POLLOUT}])
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
sendto(17, "\0\0\0\34 \0\200\206\0\0\0\1\0\0\0I\0\0\0\0\0\0\2\1\0\0\0\0", 28, 0, NULL, 0) = 28
rt_sigprocmask(SIG_BLOCK, [PIPE CHLD WINCH], [], 8) = 0
poll([{fd=17, events=POLLIN}, {fd=18, events=POLLIN}], 2, -1) = 1 ([{fd=17, revents=POLLIN}])
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
recvfrom(17, "\0\0\0<", 4, 0, NULL, NULL) = 4
recvfrom(17, " \0\200\206\0\0\0\1\0\0\0G\0\0\0\1\0\0\2\0\0\0\0\0\0\0\0\1\0\0\0\0"..., 56, 0, NULL, NULL) = 56
futex(0x7f58c410555c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f58c4105558, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f58c4085460, FUTEX_WAKE_PRIVATE, 1) = 1
write(20, "\1", 1)                      = 1
futex(0x7f58c4085460, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x1bf5dcc, FUTEX_WAIT_PRIVATE, 1, NULL) = 0
futex(0x7f58c4085460, FUTEX_WAKE_PRIVATE, 1) = 0
rt_sigprocmask(SIG_BLOCK, [PIPE CHLD WINCH], [], 8) = 0
poll([{fd=17, events=POLLIN|POLLOUT}, {fd=18, events=POLLIN}], 2, -1) = 2 ([{fd=17, revents=POLLOUT}, {fd=18, revents=POLLIN}])
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
read(18, "\1", 1)                       = 1
sendto(17, "\0\0\0( \0\200\206\0\0\0\1\0\0\0T\0\0\0\0\0\0\2\3\0\0\0\0\0\0\0\7"..., 40, 0, NULL, 0) = 40
rt_sigprocmask(SIG_BLOCK, [PIPE CHLD WINCH], [], 8) = 0
poll([{fd=17, events=POLLIN}, {fd=18, events=POLLIN}], 2, -1) = 1 ([{fd=17, revents=POLLIN}])
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
recvfrom(17, "\0\0\0H", 4, 0, NULL, NULL) = 4
recvfrom(17, " \0\200\206\0\0\0\1\0\0\0H\0\0\0\1\0\0\2\2\0\0\0\0\0\0\0\1\0\0\0\7"..., 68, 0, NULL, NULL) = 68
futex(0x7f58c410555c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f58c4105558, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f58c4085460, FUTEX_WAKE_PRIVATE, 1) = 1
write(20, "\1", 1)                      = 1
futex(0x7f58c4085460, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x1bf5dcc, FUTEX_WAIT_PRIVATE, 1, NULL) = 0
futex(0x7f58c4085460, FUTEX_WAKE_PRIVATE, 1) = 0
rt_sigprocmask(SIG_BLOCK, [PIPE CHLD WINCH], [], 8) = 0
poll([{fd=17, events=POLLIN|POLLOUT}, {fd=18, events=POLLIN}], 2, -1) = 2 ([{fd=17, revents=POLLOUT}, {fd=18, revents=POLLIN}])
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
read(18, "\1", 1)                       = 1
sendto(17, "\0\0\0$ \0\200\206\0\0\0\1\0\0\0o\0\0\0\0\0\0\2\5\0\0\0\0\0\0\0\0"..., 36, 0, NULL, 0) = 36
rt_sigprocmask(SIG_BLOCK, [PIPE CHLD WINCH], [], 8) = 0
poll([{fd=17, events=POLLIN}, {fd=18, events=POLLIN}], 2, -1) = 1 ([{fd=17, revents=POLLIN}])
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
recvfrom(17, "\0\0\0<", 4, 0, NULL, NULL) = 4
recvfrom(17, " \0\200\206\0\0\0\1\0\0\0I\0\0\0\1\0\0\2\4\0\0\0\0\0\0\0\0\0\0\0\0"..., 56, 0, NULL, NULL) = 56
futex(0x7f58c410555c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f58c4105558, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f58c4085460, FUTEX_WAKE_PRIVATE, 1) = 1
write(20, "\1", 1)                      = 1
futex(0x7f58c4085460, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x1bf5dcc, FUTEX_WAIT_PRIVATE, 1, NULL) = 0
futex(0x7f58c4085460, FUTEX_WAKE_PRIVATE, 1) = 0
rt_sigprocmask(SIG_BLOCK, [PIPE CHLD WINCH], [], 8) = 0
poll([{fd=17, events=POLLIN|POLLOUT}, {fd=18, events=POLLIN}], 2, -1) = 2 ([{fd=17, revents=POLLOUT}, {fd=18, revents=POLLIN}])
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
read(18, "\1", 1)                       = 1
sendto(17, "\0\0\0( \0\200\206\0\0\0\1\0\0\0p\0\0\0\0\0\0\2\7\0\0\0\0\0\0\0\0"..., 40, 0, NULL, 0) = 40
rt_sigprocmask(SIG_BLOCK, [PIPE CHLD WINCH], [], 8) = 0
poll([{fd=17, events=POLLIN}, {fd=18, events=POLLIN}], 2, -1) = 1 ([{fd=17, revents=POLLIN}])
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
recvfrom(17, "\0\0\0T", 4, 0, NULL, NULL) = 4
recvfrom(17, " \0\200\206\0\0\0\1\0\0\0T\0\0\0\1\0\0\2\6\0\0\0\0\0\0\0\7defa"..., 80, 0, NULL, NULL) = 80
futex(0x7f58c410555c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f58c4105558, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x7f58c4085460, FUTEX_WAKE_PRIVATE, 1) = 1
write(20, "\1", 1)                      = 1
futex(0x7f58c4085460, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x1bf5dcc, FUTEX_WAIT_PRIVATE, 1, NULL^C <unfinished ...>
Process 21551 detached
Comment 7 Jes Sorensen 2010-06-23 03:17:20 EDT
This problem is still present in virt-manager-0.8.3-3.fc13.noarch

It happens both when I use it over a very high latency VPN link and also when I use a reverse SSH tunnel back to where I am, which has much lower latency than over the VPN.

This issue is a major show stopper for using virt-manager for anything, please raise the priority to have this fixed.
Comment 8 Jes Sorensen 2010-06-23 03:17:20 EDT
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
virt-manager-0.8.3-3.fc13.noarch
Comment 9 Cole Robinson 2010-06-23 09:22:19 EDT
Re-assigning to libvirt, since it sounds like things are hanging talking to the network socket. Danpb mentioned in a private mail the likely issue, but I don't know how to deal with it:

- The libvirt remote driver code won't timeout an RPC request when a TCP
  connection hangs until the TCP connection itself times out. IIRC linux
  networking defaults to a 2 hour timeout or something crazy.
Comment 10 Cole Robinson 2010-06-23 09:22:19 EDT
Deleted Technical Notes Contents.

Old Contents:
virt-manager-0.8.3-3.fc13.noarch
Comment 11 Daniel Berrange 2010-06-23 09:30:32 EDT
In order to diagnose this we really need someone who can reproduce this to provide a combination of data captured at the same time.

 - Set two environment variables:

    LIBVIRT_LOG_FILTERS="1:libvirt 1:remote 1:util"
    LIBVIRT_LOG_OUTPUTS="1:file:$HOME/.virt-manager/libvirt.log"

 - Install python-debuginfo, gtk2-debuginfo & libvirt-debuginfo

Now launch virt-manager --no-fork  and wait for it to hang and

 -  Attach to virt-manager using GDB and capture 'thread apply all bt' and attach the *full* unedited results to this BZ

 - Attach the $HOME/.virt-manager/virt-manager.log captured at this time.

 - Attach the $HOME/virt-manager/libvirt.log capture at this time
Comment 12 Jes Sorensen 2010-06-28 04:44:41 EDT
Created attachment 427347 [details]
virt-manager.log
Comment 13 Jes Sorensen 2010-06-28 04:45:52 EDT
Created attachment 427348 [details]
libvirt.log
Comment 14 Jes Sorensen 2010-06-28 04:46:35 EDT
Created attachment 427349 [details]
virt-manager.log
Comment 15 Jes Sorensen 2010-06-28 04:48:45 EDT
Ok, attached the requested info. Unfortunately the GDB command produced zero output. The output from the virt-manager command line was when I hit C-c to stop it before submitting the log files. At this point it was hanging solid.

virt-manager-0.8.4-1.fc13.noarch
libvirt-0.7.7-4.fc13.x86_64

[jes@red-feather .virt-manager]$ ps -aux|grep virt-manager
Warning: bad syntax, perhaps a bogus '-'? See /usr/share/doc/procps-3.2.8/FAQ
jes       4404  1.6  1.0 599960 43008 pts/3    S+   10:41   0:01 python /usr/share/virt-manager/virt-manager.py --no-fork
jes       4470  0.0  0.0   6288   640 pts/7    S+   10:42   0:00 grep virt-manager
[jes@red-feather .virt-manager]$ gdb -p 4470
GNU gdb (GDB) Fedora (7.1-26.fc13)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 4470
ptrace: No such process.
(gdb) thread apply all bt
(gdb) thread apply all bt
(gdb) 

[jes@red-feather .virt-manager]$ virt-manager --no-fork
^CUnhandled exception in thread started by <bound method Thread.__bootstrap of <Thread(Tick thread, stopped daemon 140126319630096)>>
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/share/virt-manager/virt-manager.py", line 172, in exception_log
    s = traceback.format_exception(typ, val, tb)
AttributeError: 'NoneType' object has no attribute 'format_exception'

Original exception was:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/threading.py", line 497, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib64/python2.6/threading.py", line 538, in __bootstrap_inner
    (self.name, _format_exc()))
  File "/usr/lib64/python2.6/traceback.py", line 235, in format_exc
    etype, value, tb = sys.exc_info()
AttributeError: 'NoneType' object has no attribute 'exc_info'
[jes@red-feather .virt-manager]$
Comment 16 Jes Sorensen 2010-06-28 04:49:30 EDT
Ok ignore that last add - let me try again. I am obviously unable to read :(

Jes
Comment 17 Jes Sorensen 2010-06-28 04:51:59 EDT
Ok, this time with sane gdb output. will update the log files to match this trace in a second.

Jes

(gdb) thread apply all bt

Thread 1 (Thread 0x7f954a6e4700 (LWP 4622)):
#0  0x000000351c0d7de3 in poll () from /lib64/libc.so.6
#1  0x000000352c476842 in remoteIOEventLoop (conn=0x7f953c003da0, priv=
    0x7f953c050290, in_open=0, thiscall=0x2bccda0)
    at remote/remote_driver.c:8488
#2  0x000000352c47709b in remoteIO (conn=0x7f953c003da0, priv=0x7f953c050290, 
    flags=0, thiscall=0x2bccda0) at remote/remote_driver.c:8707
#3  0x000000352c477650 in call (conn=0x7f953c003da0, priv=0x7f953c050290, 
    flags=0, proc_nr=<value optimized out>, args_filter=
    0x352c488ff0 <xdr_remote_storage_pool_lookup_by_name_args>, 
    args=<value optimized out>, ret_filter=
    0x352c489e40 <xdr_remote_storage_pool_lookup_by_name_ret>, ret=
    0x7fff86a54b80 "") at remote/remote_driver.c:8797
#4  0x000000352c481f58 in remoteStoragePoolLookupByName (conn=0x7f953c003da0, 
    name=<value optimized out>) at remote/remote_driver.c:4891
#5  0x000000352c45d2af in virStoragePoolLookupByName (conn=0x7f953c003da0, 
    name=0x2a3bbc4 "default") at libvirt.c:7158
#6  0x00007f954a3634e2 in libvirt_virStoragePoolLookupByName (
    self=<value optimized out>, args=<value optimized out>) at libvirt.c:726
#7  0x00000035290ddae6 in call_function (f=<value optimized out>, 
    throwflag=<value optimized out>) at Python/ceval.c:3764
#8  PyEval_EvalFrameEx (f=<value optimized out>, 
    throwflag=<value optimized out>) at Python/ceval.c:2445
#9  0x00000035290de312 in fast_function (f=<value optimized out>, 
    throwflag=<value optimized out>) at Python/ceval.c:3850
#10 call_function (f=<value optimized out>, throwflag=<value optimized out>)
    at Python/ceval.c:3785
#11 PyEval_EvalFrameEx (f=<value optimized out>, 
    throwflag=<value optimized out>) at Python/ceval.c:2445
#12 0x00000035290de312 in fast_function (f=<value optimized out>, 
    throwflag=<value optimized out>) at Python/ceval.c:3850
#13 call_function (f=<value optimized out>, throwflag=<value optimized out>)
    at Python/ceval.c:3785
#14 PyEval_EvalFrameEx (f=<value optimized out>, 
    throwflag=<value optimized out>) at Python/ceval.c:2445
#15 0x00000035290df4e9 in PyEval_EvalCodeEx (co=0x2870d50, 
    globals=<value optimized out>, locals=<value optimized out>, 
    args=<value optimized out>, argcount=1, kws=0x2bb15c0, kwcount=0, defs=
    0x28fb868, defcount=1, closure=0x0) at Python/ceval.c:3026
#16 0x00000035290dd897 in fast_function (f=<value optimized out>, 
    throwflag=<value optimized out>) at Python/ceval.c:3860
#17 call_function (f=<value optimized out>, throwflag=<value optimized out>)
    at Python/ceval.c:3785
#18 PyEval_EvalFrameEx (f=<value optimized out>, 
    throwflag=<value optimized out>) at Python/ceval.c:2445
#19 0x00000035290df4e9 in PyEval_EvalCodeEx (co=0x2870558, 
    globals=<value optimized out>, locals=<value optimized out>, 
    args=<value optimized out>, argcount=1, kws=0x0, kwcount=0, defs=0x0, 
    defcount=0, closure=0x0) at Python/ceval.c:3026
#20 0x000000352906df70 in function_call (func=<function at remote 0x2901848>, 
    arg=
    (<vmmConnection(vmm=<virConnect(_o=<PyCObject at remote 0x7f954a6e0d78>) at remote 0x2a44878>, netdev_error='', activeUUIDs=[], _nodedev_capable=None, mediadev_initialized=False, netdev_initialized=False, nodedevs={}, _xml_flags={}, connectThread=None, connectError=None, vms={}, nets={'6986d488-bd5e-484a-ac06-7bdee46a3d5c': <vmmNetwork(uuid='6986d488-bd5e-484a-ac06-7bdee46a3d5c', _xml="<network>\n  <name>default</name>\n  <uuid>6986d488-bd5e-484a-ac06-7bdee46a3d5c</uuid>\---Type <return> to continue, or q <return> to quit---
n  <forward mode='nat'/>\n  <bridge name='virbr0' stp='on' delay='0' />\n  <ip address='192.168.122.1' netmask='255.255.255.0'>\n    <dhcp>\n      <range start='192.168.122.2' end='192.168.122.254' />\n    </dhcp>\n  </ip>\n</network>\n", connection=<...>, active=True, net=<virNetwork(_conn=<...>, _o=<PyCObject at remote 0x7f954a6e0918>) at remote 0x2a44830>, config=<vmmConfig(glade_dir='/usr/share/virt-manager', data_dir='/usr/share', conf_dir='/apps/virt-manager', appname='virt-manager', status_icons={0: <gtk.gdk.Pixbuf at remote 0x27460f0>...(truncated), kw=0x0) at Objects/funcobject.c:524
#21 0x0000003529043db3 in PyObject_Call (func=<function at remote 0x2901848>, 
    arg=<value optimized out>, kw=<value optimized out>)
    at Objects/abstract.c:2492
#22 0x00000035290dc309 in ext_do_call (f=<value optimized out>, 
    throwflag=<value optimized out>) at Python/ceval.c:4077
#23 PyEval_EvalFrameEx (f=<value optimized out>, 
    throwflag=<value optimized out>) at Python/ceval.c:2485
#24 0x00000035290df4e9 in PyEval_EvalCodeEx (co=0x2765c60, 
    globals=<value optimized out>, locals=<value optimized out>, 
    args=<value optimized out>, argcount=1, kws=0x0, kwcount=0, defs=0x0, 
    defcount=0, closure=0x0) at Python/ceval.c:3026
#25 0x000000352906df70 in function_call (func=<function at remote 0x279b8c0>, 
    arg=(<instancemethod at remote 0x2a48be0>,), kw=0x0)
    at Objects/funcobject.c:524
#26 0x0000003529043db3 in PyObject_Call (func=<function at remote 0x279b8c0>, 
    arg=<value optimized out>, kw=<value optimized out>)
    at Objects/abstract.c:2492
#27 0x00000035290d7ba3 in PyEval_CallObjectWithKeywords (func=
    <function at remote 0x279b8c0>, arg=
    (<instancemethod at remote 0x2a48be0>,), kw=<value optimized out>)
    at Python/ceval.c:3633
#28 0x00007f95483c31bb in _pyglib_handler_marshal ()
   from /usr/lib64/libpyglib-2.0-python.so.0
#29 0x000000351bc3bd02 in g_main_context_dispatch ()
   from /lib64/libglib-2.0.so.0
#30 0x000000351bc3fae8 in ?? () from /lib64/libglib-2.0.so.0
#31 0x000000351bc3fff5 in g_main_loop_run () from /lib64/libglib-2.0.so.0
#32 0x0000003524d493c7 in IA__gtk_main () at gtkmain.c:1219
#33 0x00007f9547e9ea16 in ?? ()
   from /usr/lib64/python2.6/site-packages/gtk-2.0/gtk/_gtk.so
#34 0x00000035290dd768 in call_function (f=<value optimized out>, 
    throwflag=<value optimized out>) at Python/ceval.c:3748
#35 PyEval_EvalFrameEx (f=<value optimized out>, 
    throwflag=<value optimized out>) at Python/ceval.c:2445
#36 0x00000035290de312 in fast_function (f=<value optimized out>, 
    throwflag=<value optimized out>) at Python/ceval.c:3850
#37 call_function (f=<value optimized out>, throwflag=<value optimized out>)
    at Python/ceval.c:3785
#38 PyEval_EvalFrameEx (f=<value optimized out>, 
    throwflag=<value optimized out>) at Python/ceval.c:2445
#39 0x00000035290df4e9 in PyEval_EvalCodeEx (co=0x7f954a614648, 
    globals=<value optimized out>, locals=<value optimized out>, 
    args=<value optimized out>, argcount=0, kws=0x0, kwcount=0, defs=0x0, 
    defcount=0, closure=0x0) at Python/ceval.c:3026
#40 0x00000035290df5b2 in PyEval_EvalCode (co=<value optimized out>, 
    globals=<value optimized out>, locals=<value optimized out>)
    at Python/ceval.c:526
#41 0x00000035290fa52c in run_mod (mod=<value optimized out>, 
    filename=<value optimized out>, globals=
    {'glade_dir': '/usr/share/virt-manager', 'launch_specific_window': <function---Type <return> to continue, or q <return> to quit---
 at remote 0x1c7e5f0>, 'locale': <module at remote 0x1b7e360>, 'setup_pypath': <function at remote 0x1c3bc08>, 'num': '3', 'GInitiallyUnowned': <GObjectMeta(__gtype__=<gobject.GType at remote 0x1db5798>, __doc__=<gobject.GObject.__doc__ at remote 0x7f954a6650d0>, __module__='gtk') at remote 0x1e9db10>, 'OptionValueError': <type at remote 0x1cdc9a0>, '_show_startup_error': <function at remote 0x1c389b0>, 'data_dir': '/usr/share', 'appname': 'virt-manager', 'setup_logging': <function at remote 0x1c7e230>, 'gettext_dir': '/usr/share/locale', 'setup_glade_i18n': <function at remote 0x1c3b848>, 'asset_dir': '/usr/share/virt-manager', 'gettext': <module at remote 0x1b9d7c0>, '__package__': None, 'drop_tty': <function at remote 0x1c42500>, 'icon_dir': '/usr/share/virt-manager/pixmaps', 'main': <function at remote 0x1c8d410>, '__doc__': None, 'gettext_app': 'virt-manager', 'virtinst_version': (0, 500, 3), 'show_engine': <function at remote 0x...(truncated), locals=
    {'glade_dir': '/usr/share/virt-manager', 'launch_specific_window': <function at remote 0x1c7e5f0>, 'locale': <module at remote 0x1b7e360>, 'setup_pypath': <function at remote 0x1c3bc08>, 'num': '3', 'GInitiallyUnowned': <GObjectMeta(__gtype__=<gobject.GType at remote 0x1db5798>, __doc__=<gobject.GObject.__doc__ at remote 0x7f954a6650d0>, __module__='gtk') at remote 0x1e9db10>, 'OptionValueError': <type at remote 0x1cdc9a0>, '_show_startup_error': <function at remote 0x1c389b0>, 'data_dir': '/usr/share', 'appname': 'virt-manager', 'setup_logging': <function at remote 0x1c7e230>, 'gettext_dir': '/usr/share/locale', 'setup_glade_i18n': <function at remote 0x1c3b848>, 'asset_dir': '/usr/share/virt-manager', 'gettext': <module at remote 0x1b9d7c0>, '__package__': None, 'drop_tty': <function at remote 0x1c42500>, 'icon_dir': '/usr/share/virt-manager/pixmaps', 'main': <function at remote 0x1c8d410>, '__doc__': None, 'gettext_app': 'virt-manager', 'virtinst_version': (0, 500, 3), 'show_engine': <function at remote 0x...(truncated), flags=<value optimized out>, arena=<value optimized out>)
    at Python/pythonrun.c:1335
#42 0x00000035290fa600 in PyRun_FileExFlags (fp=0x1b0d890, filename=
    0x7fff86a579c1 "/usr/share/virt-manager/virt-manager.py", 
    start=<value optimized out>, globals=
    {'glade_dir': '/usr/share/virt-manager', 'launch_specific_window': <function at remote 0x1c7e5f0>, 'locale': <module at remote 0x1b7e360>, 'setup_pypath': <function at remote 0x1c3bc08>, 'num': '3', 'GInitiallyUnowned': <GObjectMeta(__gtype__=<gobject.GType at remote 0x1db5798>, __doc__=<gobject.GObject.__doc__ at remote 0x7f954a6650d0>, __module__='gtk') at remote 0x1e9db10>, 'OptionValueError': <type at remote 0x1cdc9a0>, '_show_startup_error': <function at remote 0x1c389b0>, 'data_dir': '/usr/share', 'appname': 'virt-manager', 'setup_logging': <function at remote 0x1c7e230>, 'gettext_dir': '/usr/share/locale', 'setup_glade_i18n': <function at remote 0x1c3b848>, 'asset_dir': '/usr/share/virt-manager', 'gettext': <module at remote 0x1b9d7c0>, '__package__': None, 'drop_tty': <function at remote 0x1c42500>, 'icon_dir': '/usr/share/virt-manager/pixmaps', 'main': <function at remote 0x1c8d410>, '__doc__': None, 'gettext_app': 'virt-manager', 'virtinst_version': (0, 500, 3), 'show_engine': <function at remote 0x...(truncated), locals=
    {'glade_dir': '/usr/share/virt-manager', 'launch_specific_window': <function at remote 0x1c7e5f0>, 'locale': <module at remote 0x1b7e360>, 'setup_pypath': <function at remote 0x1c3bc08>, 'num': '3', 'GInitiallyUnowned': <GObjectMeta(__gtype__=<gobject.GType at remote 0x1db5798>, __doc__=<gobject.GObject.__doc__ at remote 0x7f954a6650d0>, __module__='gtk') at remote 0x1e9db10>, 'OptionValueError': <type at remote 0x1cdc9a0>, '_show_startup_error': <function at remote 0x1c389b0>, 'data_dir': '/usr/share', 'appname': 'virt-manager', 'setup_logging': <function at remote 0x1c7e230>, 'gettext_dir': '/usr/share/locale', 'setup_glade_i18n': <function at remote 0x1c3b848>, 'asset_dir': '/usr/share/virt-manager', 'gettext': <module at remote 0x1b9d7c0>, '__package__': None, 'drop_tty': <function at remote 0x1c42500>, 'icon_dir': '/usr/share/virt-manager/pixmaps', 'main': <function at remote 0x1c8d410>, '__doc__': None, 'gettext_app': 'virt-manager', 'virtinst_version': (0, 500, 3), 'show_engine': <function at remote 0x...(truncate---Type <return> to continue, or q <return> to quit---
d), closeit=1, flags=0x7fff86a55e30) at Python/pythonrun.c:1321
#43 0x00000035290fb9dc in PyRun_SimpleFileExFlags (fp=0x1b0d890, filename=
    0x7fff86a579c1 "/usr/share/virt-manager/virt-manager.py", closeit=1, flags=
    0x7fff86a55e30) at Python/pythonrun.c:931
#44 0x000000352910807d in Py_Main (argc=<value optimized out>, 
    argv=<value optimized out>) at Modules/main.c:599
#45 0x000000351c01ec5d in __libc_start_main () from /lib64/libc.so.6
#46 0x0000000000400649 in _start ()
(gdb)
Comment 18 Jes Sorensen 2010-06-28 04:55:51 EDT
Created attachment 427355 [details]
libvirt.log
Comment 19 Jes Sorensen 2010-06-28 04:56:19 EDT
Created attachment 427356 [details]
virt-manager.log
Comment 20 Jes Sorensen 2010-06-28 04:57:51 EDT
Hi,

Just one more observation. When running in --no-fork mode, after having
run the GDB command, exiting GDB. When I then hit C-c in the shell used
to start virt-manager, virt-manager finally comes back to life.

Jes
Comment 21 Daniel Berrange 2010-06-28 06:03:46 EDT
Hmm, this is odd - it doesn't look like a deadlock in virt-manager. Rather it si just waiting for a remote API call to complete. The API call in question is virStoragePoolLookupByName which is one of the simplest we have that should be pretty much instantaneous reply. 

From the same remote host can you tell me if this completes :


  virsh -c qemu+ssh://root@batcave.lab.eng.brq.redhat.com/system pool-list
Comment 22 Bug Zapper 2011-06-02 13:38:50 EDT
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 23 Cole Robinson 2011-06-10 13:54:10 EDT
Closing as INSUFFICIENT_DATA. If any of the reporters can still reproduce on f14 or f15, please reopen.