Bug 517778 - virt-manager hangs waiting for VNC ssh tunnel to exit on remote debian host
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: virt-manager
Version: 13
Hardware: All
OS: Linux
Priority: high
Severity: medium
Assigned To: Cole Robinson
QA Contact: Fedora Extras Quality Assurance
Duplicates: 522527
Depends On:
Blocks: F11VirtTarget
Reported: 2009-08-17 02:45 EDT by Alex Hudson (Fedora Address)
Modified: 2010-11-17 13:52 EST
Doc Type: Bug Fix
Last Closed: 2010-11-17 13:51:38 EST

Attachments
Output from virt-manager (10.96 KB, text/plain)
2009-08-19 06:18 EDT, Alex Hudson (Fedora Address)
strace from virt-manager while trying to close a console window (299.67 KB, text/plain)
2009-08-19 09:42 EDT, Alex Hudson (Fedora Address)
strace from ssh tunnel while trying to close the console (90.44 KB, text/plain)
2009-08-19 09:43 EDT, Alex Hudson (Fedora Address)
Test case demonstrating the problem here (575 bytes, text/plain)
2009-08-19 11:29 EDT, Alex Hudson (Fedora Address)
libvirt patch from Ubuntu (Debian) (1.05 KB, patch)
2010-02-05 10:16 EST, Ziegler Karel
virt-manager patch from Ubuntu (Debian) (734 bytes, patch)
2010-02-05 10:18 EST, Ziegler Karel

Description Alex Hudson (Fedora Address) 2009-08-17 02:45:38 EDT
Description of problem:

I have virt-manager set up to access two VM servers: localhost and a remote one over ssh. When I close a console to a VM on the remote server, the user interface simply hangs; I have to close the window, wait for the force-quit dialog to come up, etc.

Version-Release number of selected component (if applicable):

virt-manager-0.7.0-5.fc11.x86_64

How reproducible:

Every time, though it doesn't happen on localhost consoles.

Additional info:

I've attached my virt-manager log from a run showing the problem. The auth error is because I didn't enter my root password for the localhost VMs.
Comment 1 Mark McLoughlin 2009-08-19 05:56:53 EDT
Alex, I think you forgot to include the virt-manager log?
Comment 2 Alex Hudson (Fedora Address) 2009-08-19 06:18:06 EDT
Created attachment 357912 [details]
Output from virt-manager

Apologies for forgetting the log; I've recreated it.

I've simply opened virt-manager, connected to the remote system, opened a VM console, closed it again and the thing hangs.

The only suspicious thing I can see in the log is that PID and FD get swapped on lines 300 and 310, but that could just be an error in the logging code.
Comment 3 Mark McLoughlin 2009-08-19 07:24:11 EDT
It seems to work fine for me here

As you guessed, the PID/FD mixup is just in the debug code:

       logging.debug("Tunnel PID %d FD %d" % (fds[0].fileno(), pid))
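
For illustration, a corrected form of that debug line might look like the sketch below. It is written for Python 3 and wrapped in a hypothetical helper, `log_tunnel`, which is not part of virt-manager; `pid` and `fds` are assumed to have the same meaning as in the line quoted above.

```python
import logging


def log_tunnel(pid, fds):
    """Log the tunnel's PID and FD with the arguments in placeholder order."""
    # PID first, then the file descriptor, matching "PID %d FD %d".
    msg = "Tunnel PID %d FD %d" % (pid, fds[0].fileno())
    logging.debug(msg)
    return msg
```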

Could you try this:

  1) start up virt-manager, connect to the remote guest console

  2) attach 'strace -ttt' to the virt-manager.py process

  3) attach 'strace -ttt' to the 'ssh -p 22 -l root ... nc 127.0.0.1 5900'
     process

  4) close the console

and attach both of those strace logs, that should help us figure out what's
going on

I suspect that for some reason, closing the fd isn't causing ssh to exit
and we hang while waiting for it to exit
Comment 4 Alex Hudson (Fedora Address) 2009-08-19 09:42:28 EDT
Created attachment 357937 [details]
strace from virt-manager while trying to close a console window
Comment 5 Alex Hudson (Fedora Address) 2009-08-19 09:43:46 EDT
Created attachment 357938 [details]
strace from ssh tunnel while trying to close the console.

Find attached traces from the two processes.

Interestingly, when the virt-manager process eventually got killed off, the strace on the ssh process hung around - the ssh tunnel was still going.
Comment 6 Mark McLoughlin 2009-08-19 10:32:52 EDT
virt-manager:

1250689226.708182 close(26)             = -1 EBADF (Bad file descriptor)
1250689226.708228 wait4(7824,  <unfinished ...>

i.e. it's hung waiting for SSH to finish

The EBADF is because gtk-vnc has already closed the fd, AFAICS

ssh:

1250689226.656615 read(5, ""..., 16384) = 0
1250689226.656653 close(5)              = 0
1250689226.656698 select(8, [4], [4], NULL, NULL) = 1 (out [4])
1250689226.656740 write(4, "<Z\2464\310Z\353OSK\221\33q\21_\3437\nt\33+5\353l\303\307#\245i\330\250\365"..., 32) = 32
1250689226.656801 select(8, [4], [], NULL, NULL) = 1 (in [4])
1250689226.688092 read(4, "\tB$\341\331U\226\210\250\32\315\21\37M\355\332m\217\304\367M*Q+pU\265!B\201?\26\323"..., 8192) = 48
1250689226.688187 select(8, [4], [6], NULL, NULL) = 1 (out [6])
1250689226.688253 write(6, "\0\0\0\0"..., 4) = -1 EPIPE (Broken pipe)
1250689226.688325 --- SIGPIPE (Broken pipe) @ 0 (0) ---
1250689226.688428 close(6)              = 0

I'm pretty sure fds 5 and 6 are the read and write sides of the socketpair

What I don't understand is why the ssh process doesn't exit, because it does for me
Comment 7 Daniel Berrange 2009-08-19 11:06:04 EDT
virt-manager runs netcat on the remote end of the SSH connection. In theory, closing the FD should cause netcat to see EOF and exit, which in turn causes SSH to exit. I reckon netcat is not behaving nicely, though, and is thus holding the connection open. The Debian netcat certainly has such bugs, and the Fedora one has had many bugs like that patched. We should explicitly kill() the SSH pid.
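
The explicit-kill idea can be sketched roughly as follows. This is a minimal Python 3 illustration, not the virt-manager code: `open_tunnel` and `close_tunnel` are hypothetical helper names, the 5-second timeout is an arbitrary assumption, and in a quick experiment any child that ignores EOF (such as a sleeping process) can stand in for the real ssh command.

```python
import signal
import socket
import subprocess


def open_tunnel(argv):
    """Spawn argv with one end of a socketpair as its stdin and stdout."""
    local, remote = socket.socketpair()
    proc = subprocess.Popen(argv, stdin=remote.fileno(),
                            stdout=remote.fileno(), close_fds=True)
    remote.close()  # the child keeps its own copy of this end
    return proc, local


def close_tunnel(proc, sock, timeout=5.0):
    """Close our end, then signal the child instead of relying on EOF."""
    sock.close()
    if proc.poll() is None:
        # Do not assume the remote side reacts to EOF: ask the child to exit.
        proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # escalate to SIGKILL if SIGTERM had no effect
        return proc.wait()
```

Without the `send_signal` call, `proc.wait()` would block for as long as the child stays alive, which matches the `wait4` hang seen in the strace output.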
Comment 8 Alex Hudson (Fedora Address) 2009-08-19 11:29:03 EDT
Created attachment 357955 [details]
Test case demonstrating the problem here

I think Daniel is right on the money - I'm seeing this because I'm connecting to a Debian box.

I was actually about to make the same suggestion; I've attached my testcase.

With the kill commented out, it never returns here. With the kill in place, so that the child process is killed, it does work.

The alternative might be to make ssh port-forward the remote VNC socket to the local machine and access it directly, without netcat in the way, but whether or not everyone's ssh config allows that is another matter...
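
As a rough sketch of that port-forwarding alternative, the helper below builds an ssh command line using the standard OpenSSH `-L` and `-N` options. The function name `build_forward_argv`, the default user and port, and the example values are assumptions for illustration only; nothing here is taken from virt-manager.

```python
def build_forward_argv(host, vnc_port, local_port, user="root", ssh_port=22):
    """Build an ssh argv that forwards local_port to the remote VNC port."""
    forward = "%d:127.0.0.1:%d" % (local_port, vnc_port)
    return [
        "ssh",
        "-N",                 # run no remote command, forward only
        "-p", str(ssh_port),
        "-l", user,
        "-L", forward,        # local_port here -> 127.0.0.1:vnc_port on host
        host,
    ]
```

The VNC client would then connect to 127.0.0.1 on `local_port`. This only works if the server permits forwarding (for example via sshd's `AllowTcpForwarding` setting), and the ssh process still has to be terminated by the caller when the console closes.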
Comment 9 Eric Paris 2009-09-11 10:36:05 EDT
*** Bug 522527 has been marked as a duplicate of this bug. ***
Comment 10 b52+rhbugzilla 2010-02-02 08:28:00 EST
I am able to reproduce the same bug with virsh and a nonexistent URI, so I am not sure whether this bug lies in libvirt or virt-manager.

@managentserver using libvirt-0.7.5 and virt-manager-0.8.2
/usr/bin/virsh -c qemu+ssh://root@kvmhost/nonexistant
OR
virt-manager -c qemu+ssh://root@kvmhost/system

@kvmhost (remote) using libvirt-0.7.5 and netcat-openbsd-1.89
top
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
 3736 root      40   0  7248  492  412 R  100  0.0  27:11.17 2 nc
 3743 root      40   0  7248  504  412 R  100  0.0  25:54.38 0 nc
 3750 root      40   0  7248  492  412 R  100  0.0  24:25.71 1 nc

The stalled netcat commands stay there forever, each using 100% of a CPU. This behaviour is the same regardless of whether I use virsh with a nonexistent URI or virt-manager with an existing one. Since these netcat commands extremely slow down the managed host, could someone raise the severity to high, please? Thanks
Comment 11 Ziegler Karel 2010-02-05 10:09:26 EST
Maybe this error has something to do with https://bugzilla.redhat.com/show_bug.cgi?id=562176
Comment 12 Ziegler Karel 2010-02-05 10:16:02 EST
Created attachment 389092 [details]
libvirt patch from Ubuntu (Debian)

In Ubuntu (Debian), nc has an added -q option (quit after EOF on stdin, following a delay of the given number of seconds).
Comment 13 Ziegler Karel 2010-02-05 10:18:53 EST
Created attachment 389093 [details]
virt-manager patch from Ubuntu (Debian)
Comment 14 Daniel Berrange 2010-02-08 05:13:39 EST
These patches are really very bad. They make it impossible for an Ubuntu libvirt / virt-manager client to talk to any other OS running libvirt, because other OSes do not have the custom '-q' option that Ubuntu added. The correct solution here is for the libvirt/virt-manager client end to explicitly kill(TERM) the SSH client it spawned once it has received EOF, rather than waiting for SSH to shut down automatically after nc quits on EOF.
Comment 15 Cole Robinson 2010-02-26 21:09:54 EST
This is now fixed upstream: we pass a shell script as the SSH command, and try to detect these incompatibilities. It's definitely a hack, but we don't have a lot of options:

http://hg.fedorahosted.org/hg/virt-manager/rev/1f781890ea4a
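
A sketch of what such a shell-script probe could look like is given below, in Python 3. It is an illustration under stated assumptions, not the script from the upstream commit: the helper name `build_nc_command` is hypothetical, the check assumes that an nc supporting `-q` prints a message containing "requires an argument" when the option is given without a value, and `-q 0` is assumed to mean "quit immediately after EOF".

```python
import shlex


def build_nc_command(host, port):
    """Return a remote shell command that picks nc arguments at run time."""
    target = "%s %d" % (shlex.quote(host), port)
    script = (
        # Assumed probe: the grep pattern depends on the wording of nc's
        # error message and may not match every netcat variant.
        'if nc -q 2>&1 | grep -q "requires an argument"; then '
        'exec nc -q 0 %s; '
        'else exec nc %s; fi'
    ) % (target, target)
    # Quote the whole script so it reaches the remote sh as one argument.
    return "sh -c " + shlex.quote(script)
```

The resulting string would take the place of the plain `nc 127.0.0.1 5900` at the end of the ssh command line, so that hosts with and without the `-q` option are both handled.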
Comment 16 Bug Zapper 2010-04-28 05:47:25 EDT
This message is a reminder that Fedora 11 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 11.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 11 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 17 Cole Robinson 2010-05-27 17:47:01 EDT
To avoid risk here, we aren't going to be backporting this to F11. Reassigning to F12
Comment 18 Bug Zapper 2010-11-04 06:28:46 EDT
This message is a reminder that Fedora 12 is nearing its end of life.
Comment 19 Cole Robinson 2010-11-17 13:51:38 EST
Not backporting to F12 at this point, so closing against F13 where this is currently fixed.
