| Summary: | Installing a RHEL 5.6 Xen fully virtualized domain guest fails with the error "TCP/IP error: VNC connection to hypervisor host got refused or disconnected". | ||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 5 | Reporter: | wangyimiao <yimwang> | ||||||||||||
| Component: | xen | Assignee: | Xen Maintainance List <xen-maint> | ||||||||||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Virtualization Bugs <virt-bugs> | ||||||||||||
| Severity: | urgent | Docs Contact: | |||||||||||||
| Priority: | urgent | ||||||||||||||
| Version: | 5.7 | CC: | ccui, dallan, dyuan, hjiang, jzheng, leiwang, llim, minovotn, mrezanin, pcao, qwan, syeghiay, xen-maint, yoyzhang | ||||||||||||
| Target Milestone: | rc | Keywords: | Regression | ||||||||||||
| Target Release: | --- | ||||||||||||||
| Hardware: | x86_64 | ||||||||||||||
| OS: | Linux | ||||||||||||||
| Whiteboard: | |||||||||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||||||||
| Doc Text: | Story Points: | --- | |||||||||||||
| Clone Of: | Environment: | ||||||||||||||
| Last Closed: | 2011-06-13 06:47:07 UTC | Type: | --- | ||||||||||||
| Regression: | --- | Mount Type: | --- | ||||||||||||
| Documentation: | --- | CRM: | |||||||||||||
| Verified Versions: | Category: | --- | |||||||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||||||
| Attachments: | |

Description
wangyimiao
2011-04-29 07:10:29 UTC
Created attachment 495725
Virt-manager log
Note the messages log:
# tail -f /var/log/messages
....................................................
Apr 29 00:07:28 localhost kernel: device vif16.0 left promiscuous mode
Apr 29 00:07:28 localhost kernel: virbr0: port 3(vif16.0) entering disabled state
00:16:36:05:b9:e4
Apr 29 00:08:02 localhost dnsmasq[3344]: DHCPREQUEST(virbr0) 192.168.122.99 00:16:36:05:b9:e4
Apr 29 00:08:02 localhost dnsmasq[3344]: DHCPACK(virbr0) 192.168.122.99 00:16:36:05:b9:e4
Apr 29 00:09:48 localhost kernel: virbr0: port 1(tap0) entering disabled state
Apr 29 00:09:48 localhost kernel: virbr0: port 1(tap0) entering disabled state
Apr 29 00:09:48 localhost kernel: device tap0 left promiscuous mode
Apr 29 00:09:48 localhost kernel: virbr0: port 1(tap0) entering disabled state
For the virt-manager log details, please see the attachment.
...............................
[Thu, 28 Apr 2011 21:58:22 virt-manager 4529] DEBUG (create:736) Install completed
[Thu, 28 Apr 2011 21:58:22 virt-manager 4529] DEBUG (manager:485) About to append vm: cdrom
[Thu, 28 Apr 2011 21:58:22 virt-manager 4529] DEBUG (manager:469) VM cdrom started
[Thu, 28 Apr 2011 21:58:22 virt-manager 4529] DEBUG (details:1205) Trying console login
[Thu, 28 Apr 2011 21:58:22 virt-manager 4529] DEBUG (details:1229) Graphics console configured at vnc://127.0.0.1:5909
[Thu, 28 Apr 2011 21:58:22 virt-manager 4529] DEBUG (details:1242) Starting connect process for 127.0.0.1 5909
[Thu, 28 Apr 2011 21:58:22 virt-manager 4529] DEBUG (engine:323) window counter incremented to 3
[Thu, 28 Apr 2011 21:58:22 virt-manager 4529] DEBUG (details:1205) Trying console login
[Thu, 28 Apr 2011 21:58:22 virt-manager 4529] DEBUG (details:1229) Graphics console configured at vnc://127.0.0.1:5909
[Thu, 28 Apr 2011 21:58:22 virt-manager 4529] DEBUG (details:1242) Starting connect process for 127.0.0.1 5909
[Thu, 28 Apr 2011 21:58:22 virt-manager 4529] DEBUG (details:1125) VNC initialized
[Thu, 28 Apr 2011 22:00:47 virt-manager 4529] DEBUG (details:1111) VNC disconnected
..................................................
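The log above shows virt-manager connecting to the guest console at 127.0.0.1:5909 and then being disconnected. A minimal sketch for checking from the host whether that port still accepts TCP connections could look like the following. The function name `vnc_port_accepts` and the timeout are illustrative assumptions and are not part of virt-manager or Xen. The probe only tests the TCP handshake, not the VNC protocol itself.

```python
import socket


def vnc_port_accepts(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper for illustration: it only checks that the TCP
    handshake succeeds and does not speak the VNC (RFB) protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and unreachable hosts.
        return False


# Example (address taken from the virt-manager log excerpt):
#   vnc_port_accepts("127.0.0.1", 5909)
```

A result of `False` while the domain is supposedly running suggests that nothing is listening on the console port any more.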
Could not reproduce this bug on the following components:
libvirt-0.8.2-15.el5
virt-manager-0.6.1-13.el5
kernel-xen-2.6.18-238.el5

Well, I am a bit confused while trying to reproduce this bug. First you define and run the 'cdrom' domain via virsh. Right after that you try to define a domain with the same name. Virt-manager should really refuse here, and on my box it does. And if I change the name of the domain in virt-manager, I am still unable to reproduce the bug (running the same versions as you). However, this is marked as a libvirt bug, so could you please provide the libvirt logs as well? They might give a picture of what's going on. Thanks.

So finally I found the right way of reproducing this bug, although I believe this is a Xen bug. What is happening here: one starts the installation, during which xen dies. The qemu-dm process becomes a zombie and therefore every attempt to connect to VNC gets rejected (in the TCP handshake). However, I was unable to reproduce this with xen-3.0.3-132.el5 (libvirt and kernel stay the same). What is more interesting, the libvirt logs do not show any sign of this failure. Virsh does at least: the domain is in 'no state'.

Created attachment 502322
ps_axf
ps -axf
Created attachment 502323
qemu-dm.6033.log
Created attachment 502324
qemu-dm.6706.log
Created attachment 502325
xend.log
xend.log
While trying to reproduce again, xen randomly rebooted (previous PID 6033, new 6706):
[root@dhcp-27-62 ~]# netstat -tlnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:5901 0.0.0.0:* LISTEN 6033/qemu-dm
[root@dhcp-27-62 ~]# netstat -tlnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:5902 0.0.0.0:* LISTEN 6706/qemu-dm
And then I reproduced this bug successfully. As we can see from attachment 502322, qemu-dm is in zombie state, thus netstat doesn't show any process listening on localhost for incoming TCP/IP connections.
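The zombie check described here can be sketched as a small parser over `ps` output. This is only an illustration: the helper names `find_zombies` and `zombies_now` are hypothetical, and the parser assumes the usual procps column layout (PID, TTY, STAT, TIME, COMMAND), where a STAT value starting with `Z` marks a defunct process.

```python
import subprocess
from typing import List


def find_zombies(ps_output: str, name: str) -> List[int]:
    """Return PIDs whose STAT starts with 'Z' and whose command mentions name.

    Assumes the column layout PID TTY STAT TIME COMMAND, as printed by
    `ps axf` from procps; other layouts would need a different split.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 4)
        # Skip the header line and anything that is not a process row.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        pid, _tty, stat, _time, command = fields
        if stat.startswith("Z") and name in command:
            pids.append(int(pid))
    return pids


def zombies_now(name: str = "qemu-dm") -> List[int]:
    """Run `ps axf` on the local host and report zombie PIDs for name."""
    result = subprocess.run(["ps", "axf"], capture_output=True, text=True)
    return find_zombies(result.stdout, name)
```

If `zombies_now()` returns a PID while `netstat -tlnp` shows no qemu-dm listener, that matches the state described in this comment.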
This looks like a problem in the guest configuration: qemu gets invalid parameters, so it crashes. Can you please provide more info on how to reproduce the problem? What about /var/log/xen/qemu-dm.{PID}.log? Anything relevant there?
Michal
(In reply to comment #14)
> What about /var/log/xen/qemu-dm.{PID}.log ? Anything relevant there?
>
> Michal

Sorry, I overlooked that the qemu-dm log is already there. I remember I ran into the issue of "inp: bad size 0 0" when I was working on the SCSI patchset; however, this comes from do_inp(), which emulates the "in" instruction of the CPU as far as I know. Since the size arriving there is bogus (zero), it fails with this message and dies silently (as can be seen in ioemu/target-i386-dm/helper2.c in the source code). This means something is emulated the wrong way AFAIK.

Michal

Can you please retest with xen-3.0.3-128.el5? The cdrom patches should be reverted in this version, so testing should be successful.

Retested on build:
xen-3.0.3-132.el5
libvirt-0.8.2-20.el5
kernel-xen-2.6.18-266.el5
That issue has been successfully reverted.