Bug 517778
Summary: | virt-manager hangs waiting for VNC ssh tunnel to exit on remote debian host | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Alex Hudson (Fedora Address) <fedora> |
Component: | virt-manager | Assignee: | Cole Robinson <crobinso> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | medium | Docs Contact: | |
Priority: | high | ||
Version: | 13 | CC: | b52+rhbugzilla, berrange, crobinso, eparis, hbrock, markmc, virt-maint, ziegleka |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2010-11-17 18:51:38 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 480594 | ||
Attachments: |
Description
Alex Hudson (Fedora Address)
2009-08-17 06:45:38 UTC
Alex, I think you forgot to include the virt-manager log?
Created attachment 357912 [details]
Output from virt-manager
Apologies for forgetting the log; I've recreated it.
I've simply opened virt-manager, connected to the remote system, opened a VM console, closed it again and the thing hangs.
Only suspicious thing I can see in the log is that PID and FD get swapped on lines 300 and 310, but that could just be a logging code error.
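If the swap is confined to the logging call, as suspected, the fix is simply to pass the arguments in the same order as the format string. A minimal sketch, assuming a format string of the form "Tunnel PID %d FD %d"; the helper name `describe_tunnel` and the sample values are illustrative only and are not taken from the virt-manager source:

```python
import logging


def describe_tunnel(pid, fd):
    # Hypothetical helper: arguments are given in the same order as the
    # placeholders, PID first and FD second.
    return "Tunnel PID %d FD %d" % (pid, fd)


# Illustrative values only.
logging.debug(describe_tunnel(7824, 26))
```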
It seems to work fine for me here.
As you guessed, the PID/FD mixup is just in the debug code:
    logging.debug("Tunnel PID %d FD %d" % (fds[0].fileno(), pid))
Could you try this:
1) Start up virt-manager and connect to the remote guest console
2) Attach 'strace -ttt' to the virt-manager.py process
3) Attach 'strace -ttt' to the 'ssh -p 22 -l root ... nc 127.0.0.1 5900' process
4) Close the console and attach both of those strace logs
That should help us figure out what's going on. I suspect that for some reason, closing the fd isn't causing ssh to exit, and we hang while waiting for it to exit.
Created attachment 357937 [details]
strace from virt-manager while trying to close a console window
Created attachment 357938 [details]
strace from ssh tunnel while trying to close the console.
Find attached traces from the two processes.
Interestingly, when the virt-manager process eventually got killed off, the strace on the ssh process hung around - the ssh tunnel was still going.
virt-manager:
    1250689226.708182 close(26) = -1 EBADF (Bad file descriptor)
    1250689226.708228 wait4(7824, <unfinished ...>
i.e. it's hung waiting for SSH to finish. The EBADF is because gtk-vnc has already closed it, AFAICS.
ssh:
    1250689226.656615 read(5, ""..., 16384) = 0
    1250689226.656653 close(5) = 0
    1250689226.656698 select(8, [4], [4], NULL, NULL) = 1 (out [4])
    1250689226.656740 write(4, "<Z\2464\310Z\353OSK\221\33q\21_\3437\nt\33+5\353l\303\307#\245i\330\250\365"..., 32) = 32
    1250689226.656801 select(8, [4], [], NULL, NULL) = 1 (in [4])
    1250689226.688092 read(4, "\tB$\341\331U\226\210\250\32\315\21\37M\355\332m\217\304\367M*Q+pU\265!B\201?\26\323"..., 8192) = 48
    1250689226.688187 select(8, [4], [6], NULL, NULL) = 1 (out [6])
    1250689226.688253 write(6, "\0\0\0\0"..., 4) = -1 EPIPE (Broken pipe)
    1250689226.688325 --- SIGPIPE (Broken pipe) @ 0 (0) ---
    1250689226.688428 close(6) = 0
I'm pretty sure fds 5 and 6 are the read and write sides of the socketpair. What I don't understand is why the ssh process doesn't exit, because it does for me.
virt-manager runs netcat on the remote end of the SSH connection. In theory, closing the FD should cause netcat to see the EOF and exit, causing SSH to exit. I reckon netcat is not behaving nicely though, and is thus holding the connection open. The Debian netcat has certainly got such bugs, and the Fedora one has patched many bugs like that. We should explicitly kill() the SSH pid.
Created attachment 357955 [details]
Test case demonstrating the problem here
I think Daniel is right on the money - I'm seeing this because I'm connecting to a Debian box.
I was actually about to make the same suggestion; I've attached my testcase.
With the kill commented out, it never returns here. With the child process killed, it does work.
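The behaviour described here can be reproduced without a remote host. The following is a sketch only, not the attached test case: a local child that ignores EOF on its stdin stands in for the ssh tunnel, and the function names `spawn_stubborn_child` and `close_tunnel` are hypothetical. It closes the local end of the socketpair, gives the child a moment to exit on its own, then sends SIGTERM instead of blocking forever in wait.

```python
import signal
import socket
import subprocess
import sys


def spawn_stubborn_child():
    """Start a child wired to one end of a socketpair that ignores EOF.

    Stand-in (assumption) for an ssh tunnel whose remote netcat does not
    exit when its input is closed.
    """
    ours, theirs = socket.socketpair()
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        stdin=theirs, stdout=theirs)
    theirs.close()  # the child holds its own copy of this end
    return child, ours


def close_tunnel(proc, sock, grace=0.5, timeout=5.0):
    """Close our end, then make sure the child really exits."""
    sock.close()
    try:
        # A well-behaved child sees EOF and exits on its own.
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        pass
    proc.terminate()  # SIGTERM, rather than waiting indefinitely
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # last resort: SIGKILL
        return proc.wait()


if __name__ == "__main__":
    child, sock = spawn_stubborn_child()
    print("child exit status:", close_tunnel(child, sock))
```

Without the `terminate()` call, a plain `proc.wait()` would block until the stand-in child's 60 second sleep ended; with a genuinely stuck netcat it would never return.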
The alternative might be to make ssh port-forward the remote VNC socket to the local machine and access it directly, without netcat in the way, but whether or not everyone's ssh config allows that is another matter...
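For illustration, the port-forwarding alternative would look roughly like the sketch below. It only builds the argument list; the function name `forward_argv`, the local port 5901 and the host name are assumptions. `-L` requests a local port forward and `-N` tells ssh not to run a remote command, so no netcat is involved. As noted, the remote sshd has to permit TCP forwarding for this to work.

```python
def forward_argv(host, remote_port=5900, local_port=5901,
                 user="root", ssh_port=22):
    """Build an ssh command line that forwards a local port to the
    remote VNC port (hypothetical helper, not virt-manager code)."""
    # local_addr:local_port:remote_addr:remote_port
    spec = "127.0.0.1:%d:127.0.0.1:%d" % (local_port, remote_port)
    return ["ssh", "-N", "-L", spec,
            "-p", str(ssh_port), "-l", user, host]


# A VNC client would then connect to 127.0.0.1:5901 locally.
print(" ".join(forward_argv("kvmhost")))
```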
*** Bug 522527 has been marked as a duplicate of this bug. ***
I am able to reproduce the same bug with virsh and a nonexistent URI. Hence I am not sure whether this bug is related to libvirt or virt-manager.
@managentserver, using libvirt-0.7.5 and virt-manager-0.8.2:
    /usr/bin/virsh -c qemu+ssh://root@kvmhost/nonexistant
OR
    virt-manager -c qemu+ssh://root@kvmhost/system
@kvmhost (remote), using libvirt-0.7.5 and netcat-openbsd-1.89:
top
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ P COMMAND
    3736 root 40 0 7248 492 412 R 100 0.0 27:11.17 2 nc
    3743 root 40 0 7248 504 412 R 100 0.0 25:54.38 0 nc
    3750 root 40 0 7248 492 412 R 100 0.0 24:25.71 1 nc
The stalled netcat commands stay there forever and eat up 100% of the CPU. This behaviour is the same whether virsh is used with a nonexistent URI or virt-manager with an existing URI. Hence these netcat commands slow down the managed host enormously.
Could someone raise the severity to high, please? Thanks
Maybe this error has something to do with https://bugzilla.redhat.com/show_bug.cgi?id=562176
Created attachment 389092 [details]
libvirt patch from Ubuntu (Debian)
In Ubuntu (Debian), nc has an added -q option (quit after EOF on stdin and delay of secs).
Created attachment 389093 [details]
virt-manager patch from Ubuntu (Debian)
These patches are really very bad. They make it impossible for an Ubuntu libvirt / virt-manager client to talk to any other OS running libvirt, because other OS do not have this custom '-q' option that Ubuntu added. The correct solution here is for the libvirt/virt-manager client end to explicitly kill(TERM) the SSH client they spawned once they've received EOF, rather than waiting for SSH to auto-shutdown after nc quits on EOF.
This is now fixed upstream: we pass a shell script as the SSH command, and try to detect these incompatibilities. It's definitely a hack, but we don't have a lot of options:
http://hg.fedorahosted.org/hg/virt-manager/rev/1f781890ea4a
This message is a reminder that Fedora 11 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 11. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '11'.
Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 11's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 11 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
To avoid risk here, we aren't going to be backporting this to F11. Reassigning to F12
This message is a reminder that Fedora 12 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 12. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '12'.
Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 12's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 12 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Not backporting to F12 at this point, so closing against F13 where this is currently fixed.
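The upstream fix mentioned in this thread passes a shell script as the SSH command and tries to detect whether the remote nc supports the '-q' option. The sketch below shows one way such a probe could be written; it is not a copy of the upstream change. The function names are hypothetical, and the probe itself is an assumption: it runs 'nc -q' with no argument and treats a getopt-style "requires an argument" complaint as a sign that the option is recognised.

```python
import shlex


def build_nc_script(host, port):
    """Return a shell snippet that uses 'nc -q 0' only where the remote
    nc appears to know the -q option (assumed probe, see lead-in)."""
    target = "%s %d" % (host, port)
    return (
        'if nc -q 2>&1 | grep "requires an argument" >/dev/null; then '
        "exec nc -q 0 %(t)s; "
        "else "
        "exec nc %(t)s; "
        "fi" % {"t": target}
    )


def build_ssh_argv(remote, host="127.0.0.1", port=5900):
    """Wrap the snippet so the remote side runs it under sh -c."""
    script = build_nc_script(host, port)
    return ["ssh", remote, "sh -c " + shlex.quote(script)]


# "root@kvmhost" is an illustrative destination.
print(build_ssh_argv("root@kvmhost"))
```

Probing on the remote side keeps a single client working against hosts with either netcat variant, which is the interoperability concern raised about the '-q' patches.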