Bug 517778

Summary: virt-manager hangs waiting for VNC ssh tunnel to exit on remote debian host
Product: [Fedora] Fedora Reporter: Alex Hudson (Fedora Address) <fedora>
Component: virt-manager Assignee: Cole Robinson <crobinso>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: high    
Version: 13 CC: b52+rhbugzilla, berrange, crobinso, eparis, hbrock, markmc, virt-maint, ziegleka
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2010-11-17 18:51:38 UTC Type: ---
Bug Blocks: 480594    
Attachments:
Description Flags
Output from virt-manager none
strace from virt-manager while trying to close a console window none
strace from ssh tunnel while trying to close the console. none
Test case demonstrating the problem here none
libvirt patch from Ubuntu (Debian) none
virt-manager patch from Ubuntu (Debian) none

Description Alex Hudson (Fedora Address) 2009-08-17 06:45:38 UTC
Description of problem:

I have virt-manager set up to access two VM servers: localhost and a remote one over ssh. When I close a console to a VM on the remote server, the user interface simply hangs - I have to close the window, wait for the force quit dialog to come up, etc.

Version-Release number of selected component (if applicable):

virt-manager-0.7.0-5.fc11.x86_64

How reproducible:

Every time, doesn't happen on localhost consoles though.

Additional info:

I've attached my virt-manager log from a run showing the problem. The auth error is because I didn't enter my root password for the localhost VMs.

Comment 1 Mark McLoughlin 2009-08-19 09:56:53 UTC
Alex, I think you forgot to include the virt-manager log?

Comment 2 Alex Hudson (Fedora Address) 2009-08-19 10:18:06 UTC
Created attachment 357912 [details]
Output from virt-manager

Apologies for forgetting the log; I've recreated it.

I've simply opened virt-manager, connected to the remote system, opened a VM console, closed it again and the thing hangs.

Only suspicious thing I can see in the log is that PID and FD get swapped on lines 300 and 310, but that could just be a logging code error.

Comment 3 Mark McLoughlin 2009-08-19 11:24:11 UTC
It seems to work fine for me here

As you guessed, the PID/FD mixup is just in the debug code:

       logging.debug("Tunnel PID %d FD %d" % (fds[0].fileno(), pid))
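
For reference, a minimal sketch of the same debug line with the arguments in the order the format string expects. The names `fds` and `pid` come from the line above; the helper `format_tunnel_debug`, the local socketpair, and the use of our own PID are stand-ins for illustration only and are not taken from virt-manager.

```python
import logging
import os
import socket


def format_tunnel_debug(pid, fd):
    # PID first, then FD, matching the order of the placeholders.
    return "Tunnel PID %d FD %d" % (pid, fd)


# Stand-ins for illustration: a local socketpair and our own PID
# (hypothetical values, not how virt-manager obtains them).
fds = socket.socketpair()
pid = os.getpid()
message = format_tunnel_debug(pid, fds[0].fileno())
logging.debug(message)

for s in fds:
    s.close()
```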

Could you try this:

  1) start up virt-manager, connect to the remote guest console

  2) attach 'strace -ttt' to the virt-manager.py process

  3) attach 'strace -ttt' to the 'ssh -p 22 -l root ... nc 127.0.0.1 5900'
     process

  4) close the console

and attach both of those strace logs, that should help us figure out what's
going on

I suspect that for some reason, closing the fd isn't causing ssh to exit
and we hang while waiting for it to exit

Comment 4 Alex Hudson (Fedora Address) 2009-08-19 13:42:28 UTC
Created attachment 357937 [details]
strace from virt-manager while trying to close a console window

Comment 5 Alex Hudson (Fedora Address) 2009-08-19 13:43:46 UTC
Created attachment 357938 [details]
strace from ssh tunnel while trying to close the console.

Find attached traces from the two processes.

Interestingly, when the virt-manager process eventually got killed off, the strace on the ssh process hung around - the ssh tunnel was still going.

Comment 6 Mark McLoughlin 2009-08-19 14:32:52 UTC
virt-manager:

1250689226.708182 close(26)             = -1 EBADF (Bad file descriptor)
1250689226.708228 wait4(7824,  <unfinished ...>

i.e. it's hung waiting for SSH to finish

The EBADF is because gtk-vnc has already closed it, AFAICS

ssh:

1250689226.656615 read(5, ""..., 16384) = 0
1250689226.656653 close(5)              = 0
1250689226.656698 select(8, [4], [4], NULL, NULL) = 1 (out [4])
1250689226.656740 write(4, "<Z\2464\310Z\353OSK\221\33q\21_\3437\nt\33+5\353l\303\307#\245i\330\250\365"..., 32) = 32
1250689226.656801 select(8, [4], [], NULL, NULL) = 1 (in [4])
1250689226.688092 read(4, "\tB$\341\331U\226\210\250\32\315\21\37M\355\332m\217\304\367M*Q+pU\265!B\201?\26\323"..., 8192) = 48
1250689226.688187 select(8, [4], [6], NULL, NULL) = 1 (out [6])
1250689226.688253 write(6, "\0\0\0\0"..., 4) = -1 EPIPE (Broken pipe)
1250689226.688325 --- SIGPIPE (Broken pipe) @ 0 (0) ---
1250689226.688428 close(6)              = 0

I'm pretty sure fds 5 and 6 are the read and write sides of the socketpair

What I don't understand is why the ssh process doesn't exit, because it does for me

Comment 7 Daniel Berrangé 2009-08-19 15:06:04 UTC
virt-manager runs netcat on the remote end of the SSH connection. In theory closing the FD should cause netcat to see the EOF, and exit, causing SSH to exit. I reckon netcat is not behaving nicely though and thus holding open the connection. The Debian netcat has certainly got such bugs, and Fedora one has patched many bugs like that. We should explicitly kill() the SSH pid.
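
A rough sketch of the explicit-kill approach described above, assuming a POSIX system. The helper name `close_tunnel`, the timeout, and the stand-in child process (a Python loop that never reacts to EOF, playing the role of `ssh ... nc`) are all hypothetical; this is not the virt-manager code.

```python
import signal
import subprocess
import sys


def close_tunnel(proc, timeout=5.0):
    """Close our end of the tunnel, then make sure the child exits."""
    # Closing the pipes is what should make a well-behaved child see EOF.
    for pipe in (proc.stdin, proc.stdout):
        if pipe is not None:
            pipe.close()
    # Do not rely on EOF alone: send SIGTERM explicitly.
    proc.terminate()
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Last resort if the child ignores SIGTERM.
        proc.kill()
        return proc.wait()


# Stand-in for the tunnel: a child that ignores EOF on stdin and never exits.
child_code = "import time\nwhile True: time.sleep(1)"
proc = subprocess.Popen([sys.executable, "-c", child_code],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE)
status = close_tunnel(proc)
```

Waiting only after the signal has been sent is the point of the design: a bare wait on the child, as in the strace above, blocks forever if the remote end never acts on EOF.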

Comment 8 Alex Hudson (Fedora Address) 2009-08-19 15:29:03 UTC
Created attachment 357955 [details]
Test case demonstrating the problem here

I think Daniel is right on the money - I'm seeing this because I'm connecting to a Debian box.

I was actually about to make the same suggestion; I've attached my testcase.

With the kill commented out, it never returns here. If I kill the child process, it does work.

The alternative might be to make ssh port-forward the remote VNC socket to the local machine and access it directly, without netcat in the way, but whether or not everyone's ssh config allows that is another matter...
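
As an illustration of that alternative, here is a small sketch that builds an ssh command line using local port forwarding (`-L`) and no remote command (`-N`). The helper name `build_forward_argv`, the host name, and the local port 15900 are hypothetical; the remote VNC port 5900 and `-p 22 -l root` follow the command quoted earlier in this report.

```python
def build_forward_argv(host, user, local_port, remote_port=5900, ssh_port=22):
    """Build an ssh argv that forwards local_port to the remote VNC port."""
    # -L local_port:127.0.0.1:remote_port forwards a local TCP port to the
    # VNC server listening on the remote host's loopback interface.
    forward = "%d:127.0.0.1:%d" % (local_port, remote_port)
    # -N asks ssh not to run a remote command, so no netcat is involved.
    return ["ssh", "-N", "-p", str(ssh_port), "-l", user, "-L", forward, host]


# Hypothetical host and local port, for illustration only.
argv = build_forward_argv("kvmhost", "root", 15900)
```

The VNC client would then connect to 127.0.0.1 on the chosen local port. This only works if the server permits TCP forwarding, and the ssh process would still need to be terminated explicitly when the console closes.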

Comment 9 Eric Paris 2009-09-11 14:36:05 UTC
*** Bug 522527 has been marked as a duplicate of this bug. ***

Comment 10 b52+rhbugzilla 2010-02-02 13:28:00 UTC
I am able to reproduce the same bug with virsh and a nonexistent URI. Hence I am not sure whether this bug is related to libvirt or virt-manager.

@managentserver using libvirt-0.7.5 and virt-manager-0.8.2
/usr/bin/virsh -c qemu+ssh://root@kvmhost/nonexistant
OR
virt-manager -c qemu+ssh://root@kvmhost/system

@kvmhost (remote) using libvirt-0.7.5 and netcat-openbsd-1.89
top
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
 3736 root      40   0  7248  492  412 R  100  0.0  27:11.17 2 nc
 3743 root      40   0  7248  504  412 R  100  0.0  25:54.38 0 nc
 3750 root      40   0  7248  492  412 R  100  0.0  24:25.71 1 nc

The stalled netcat commands stay there forever and eat up 100% of CPU. This behaviour is the same regardless of using virsh with a nonexistent URI or virt-manager with an existing URI. Since these netcat commands extremely slow down the managed host, could someone raise the severity to high, please? Thanks

Comment 11 Ziegler Karel 2010-02-05 15:09:26 UTC
Maybe this error has something to do with https://bugzilla.redhat.com/show_bug.cgi?id=562176

Comment 12 Ziegler Karel 2010-02-05 15:16:02 UTC
Created attachment 389092 [details]
libvirt patch from Ubuntu (Debian)

In Ubuntu (Debian), nc has an added -q option (quit after EOF on stdin, following a delay of the given number of seconds).

Comment 13 Ziegler Karel 2010-02-05 15:18:53 UTC
Created attachment 389093 [details]
virt-manager patch from Ubuntu (Debian)

Comment 14 Daniel Berrangé 2010-02-08 10:13:39 UTC
These patches are really very bad. They make it impossible for an Ubuntu libvirt / virt-manager client to talk to any other OS running libvirt, because other OSes do not have this custom '-q' option that Ubuntu added. The correct solution here is for the libvirt/virt-manager client end to explicitly kill(TERM) the SSH client they spawned once they've received EOF, rather than waiting for SSH to auto-shutdown after nc quits on EOF.

Comment 15 Cole Robinson 2010-02-27 02:09:54 UTC
This is now fixed upstream: we pass a shell script as the SSH command, and try to detect these incompatibilities. It's definitely a hack, but we don't have a lot of options:

http://hg.fedorahosted.org/hg/virt-manager/rev/1f781890ea4a
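
One way such detection could look is sketched below. This is not confirmed to match the linked changeset: the helper name `build_remote_nc_command` is hypothetical, and the probe relies on the assumption that an nc which knows `-q` reports that the option "requires an argument" when it is given without a value.

```python
def build_remote_nc_command(host, port):
    """Return a shell one-liner that picks 'nc -q 0' when -q is supported."""
    # Assumption: an nc that understands -q complains that the option
    # "requires an argument" when -q is passed without a value.
    probe = 'nc -q 2>&1 | grep -q "requires an argument"'
    with_q = "nc -q 0 %s %d" % (host, port)
    without_q = "nc %s %d" % (host, port)
    script = "if %s; then %s; else %s; fi" % (probe, with_q, without_q)
    # The script contains no single quotes, so single-quoting it is safe
    # as long as host is a plain address such as 127.0.0.1.
    return "sh -c '%s'" % script


remote_command = build_remote_nc_command("127.0.0.1", 5900)
```

The resulting string would be passed as the remote command to ssh, so a client can talk both to hosts whose nc has `-q` and to hosts whose nc does not.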

Comment 16 Bug Zapper 2010-04-28 09:47:25 UTC
This message is a reminder that Fedora 11 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 11.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 11 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 17 Cole Robinson 2010-05-27 21:47:01 UTC
To avoid risk here, we aren't going to be backporting this to F11. Reassigning to F12.

Comment 18 Bug Zapper 2010-11-04 10:28:46 UTC
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 12 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 19 Cole Robinson 2010-11-17 18:51:38 UTC
Not backporting to F12 at this point, so closing against F13 where this is currently fixed.