Bug 740657

Summary: Fedora 16 beta Xen kbdfront hangs on rhel5 dom0
Product: [Fedora] Fedora
Reporter: Pasi Karkkainen <pasik>
Component: kernel
Assignee: Xen Maintainance List <xen-maint>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified    
Version: 16
CC: drjones, gansalmon, itamar, jonathan, kernel-maint, ketuzsezr, lersek, madhu.chinakonda, mrezanin
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-10-03 16:42:39 UTC
Type: ---
Bug Blocks: 741684    

Description Pasi Karkkainen 2011-09-22 20:38:11 UTC
Description of problem:

Installing Fedora 16 beta (TC2) as a Xen PV domU on a RHEL 5.6 or 5.7 dom0 with virt-install or virt-manager fails: the graphical console for the domU does not open at all.

Installing Fedora 14 or Fedora 15 domUs with graphical pvfb works OK on the same dom0.


Version-Release number of selected component (if applicable):
Fedora 16 beta tc2 x86_64.

How reproducible:
Always.

Steps to Reproduce:
1. On a RHEL 5 dom0, install the domU with virt-install (the exact command is given under "Additional info").
2. The VNC console (virt-viewer) does not open.

  
Actual results:
The virt-viewer/VNC console does not open because the Xen pvfb backend stays in a semi-initialized state (2). According to the xenstore entries, it never reaches the fully initialized state (4). Manually launching virt-viewer or vncviewer doesn't work either.

Expected results:
Xen pvfb graphical console works for the F16 domU and VNC window opens.

Additional info:

xenstore vfb entry for failing F16 beta tc2 Xen PV domU (from xenstore-ls):

     4 = ""
      0 = ""
       vncunused = "1"
       domain = "f16test64"
       frontend = "/local/domain/4/device/vfb/0"
       state = "2"
       keymap = "fi"
       online = "1"
       frontend-id = "4"
       type = "vnc"
       hotplug-status = "connected"

xenstore vfb entry for a working F15 Xen PV domU:

     5 = ""
      0 = ""
       vncunused = "1"
       domain = "f15test64"
       frontend = "/local/domain/5/device/vfb/0"
       state = "4"
       keymap = "fi"
       online = "1"
       frontend-id = "5"
       type = "vnc"
       hotplug-status = "connected"
       request-update = "1"

xenstore vfb entry for a working EL5 Xen PV domU:

     2 = ""
      0 = ""
       vncunused = "1"
       domain = "srv1"
       frontend = "/local/domain/2/device/vfb/0"
       xauthority = "/root/.Xauthority"
       state = "4"
       keymap = "fi"
       online = "1"
       frontend-id = "2"
       type = "vnc"
       display = "localhost:10.0"
       hotplug-status = "connected"
       request-update = "1"


The difference seems to be the "state", which is "2" for the non-working F16 domU, and "4" for the working domUs.
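
To make the comparison above easier to repeat, here is a minimal Python sketch that pulls the "state" value out of xenstore-ls style text and attaches the Xenbus state name to it. The state names are taken from Xen's public io/xenbus.h header; the helper names (parse_entries, describe_state) are made up for this illustration, and the parser is deliberately flat, so it ignores the nesting of the xenstore tree.

```python
import re

# Xenbus device states, as named in Xen's public io/xenbus.h header.
XENBUS_STATES = {
    0: "Unknown",
    1: "Initialising",
    2: "InitWait",
    3: "Initialised",
    4: "Connected",
    5: "Closing",
    6: "Closed",
}


def parse_entries(text):
    """Parse `key = "value"` lines into a flat dict (tree nesting is ignored)."""
    entries = {}
    for line in text.splitlines():
        match = re.match(r'\s*(\S+) = "(.*)"\s*$', line)
        if match:
            entries[match.group(1)] = match.group(2)
    return entries


def describe_state(entries):
    """Return the backend state as 'number (name)'."""
    state = int(entries["state"])
    return f"{state} ({XENBUS_STATES.get(state, 'unrecognised')})"


# Excerpts of the vfb entries quoted in this report.
F16_VFB = '''
       domain = "f16test64"
       frontend = "/local/domain/4/device/vfb/0"
       state = "2"
       hotplug-status = "connected"
'''

F15_VFB = '''
       domain = "f15test64"
       frontend = "/local/domain/5/device/vfb/0"
       state = "4"
       hotplug-status = "connected"
'''

if __name__ == "__main__":
    for name, text in (("F16", F16_VFB), ("F15", F15_VFB)):
        print(name, describe_state(parse_entries(text)))
```

In these terms, the failing F16 backend is stuck in InitWait, while the working domUs have reached Connected.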

Command I used to install F16 beta tc2:
virt-install -d -n f16test64 -r 1024 --vcpus=2 -f /dev/VolGroup00/f16test64 --vnc -p -l "http://webserver/fedora/mounted-f16-beta-tc2-x64-iso/"

If this bug should be re-assigned against rhel5, please do so.

Comment 1 Laszlo Ersek 2011-10-03 14:52:22 UTC
The backend side of fbfront is in xen-userspace (qemu-dm), file "tools/ioemu/hw/xenfb.c". I reproduced the hang (-133) and looked at the qemu-dm process with gdb.

#0  0x0000003b9180aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

#1  0x000000381d602e12 in xs_read_watch (h=0x1480aa0, num=0x7fff6376526c) at xs.c:586

#2  0x00000000004716ef in xenfb_wait_for_state (xsh=0x1480aa0, dir=0x14809e0 "/local/domain/2/device/vkbd/0", 
    awaited=<value optimized out>) at /usr/src/debug/xen-3.1.0-src/tools/ioemu/hw/xenfb.c:257

#3  0x0000000000471ddd in xenfb_wait_for_frontend_initialised (dev=<value optimized out>)
    at /usr/src/debug/xen-3.1.0-src/tools/ioemu/hw/xenfb.c:360

#4  0x0000000000472716 in xenfb_attach_dom (xenfb_pub=0x14808b0, domid=2)
    at /usr/src/debug/xen-3.1.0-src/tools/ioemu/hw/xenfb.c:659

#5  0x000000000047097e in xen_init_pv (ram_size=<value optimized out>, vga_ram_size=<value optimized out>, 
    boot_device=<value optimized out>, ds=0x6d2180, fd_filename=<value optimized out>, 
    snapshot=<value optimized out>, kernel_filename=0x0, kernel_cmdline=0x483d8e "", initrd_filename=0x0, 
    timeoffset=0) at /usr/src/debug/xen-3.1.0-src/tools/ioemu/hw/xen_machine_pv.c:254

#6  0x000000000040b14f in main (argc=12, argv=0x7fff637680c8) at /usr/src/debug/xen-3.1.0-src/tools/ioemu/vl.c:6794


Importantly, frame #4 @ tools/ioemu/hw/xenfb.c:659 describes a call to xenfb_wait_for_frontend_initialised() that waits for the *keyboard*.

(gdb) frame 4
#4  0x0000000000472716 in xenfb_attach_dom (xenfb_pub=0x14808b0, domid=2)
    at /usr/src/debug/xen-3.1.0-src/tools/ioemu/hw/xenfb.c:659
659             if (xenfb_wait_for_frontend_initialised(&xenfb->kbd) < 0)

In xenfb_wait_for_frontend_initialised(), there's a comment like "TODO fudging state to permit restarting; to be removed"; added in (huge) commit 0ba0e891 for bug 218050.
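
To illustrate why the backtrace above amounts to a hang, here is a small Python model of waiting for a frontend to reach an awaited state. This is not the qemu-dm code: the real backend blocks in xs_read_watch(), whereas this sketch polls a callback, and the function name wait_for_state, the timeout parameter, and the awaited state set {3, 4} are assumptions made for the example.

```python
import time


def wait_for_state(read_state, awaited, timeout=None, poll_interval=0.01):
    """Poll read_state() until it returns a value contained in `awaited`.

    Returns the matching state, or None once `timeout` seconds have passed.
    With timeout=None the loop never gives up, which models a backend that
    waits indefinitely on a frontend whose driver is never loaded.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        state = read_state()
        if state in awaited:
            return state
        if deadline is not None and time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)


if __name__ == "__main__":
    # Hypothetical frontend that never leaves state 1 (no driver loaded).
    stuck = wait_for_state(lambda: 1, awaited={3, 4}, timeout=0.1)
    print("stuck frontend:", stuck)

    # Hypothetical frontend that advances after a couple of reads.
    states = iter([1, 1, 3])
    ok = wait_for_state(lambda: next(states), awaited={3, 4}, timeout=1.0)
    print("working frontend:", ok)
```

Without a timeout, the first call would never return, which matches the qemu-dm process sitting in xenfb_wait_for_state() for the vkbd device.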

Comment 2 Pasi Karkkainen 2011-10-03 14:58:51 UTC
Hmm.. so I wonder if it'll just work when there's a fedora 16 build with this bugfix included:
https://bugzilla.redhat.com/show_bug.cgi?id=740378

Comment 3 Laszlo Ersek 2011-10-03 16:13:46 UTC
From bug 740378 comment 0:
> After the install was finished I poked around in the initrd and noticed that
> the xen-kbdfront was missing.

That's it.
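
A check like the one described in bug 740378 can be sketched in Python as follows. It takes a file listing of an initrd (for example, text produced by a tool such as lsinitrd) and reports which required modules are absent. The sample listing is invented for illustration, the function name missing_modules is hypothetical, and the list of required modules is an assumption based on the fbfront and kbdfront drivers discussed in this bug.

```python
import re


def module_names(listing):
    """Collect kernel module names (without .ko or compression suffix)."""
    names = set()
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The path is assumed to be the last whitespace-separated field.
        filename = fields[-1].rsplit("/", 1)[-1]
        match = re.match(r"^(.+?)\.ko(\.\w+)?$", filename)
        if match:
            # Treat '-' and '_' as equivalent, as module tools do.
            names.add(match.group(1).replace("-", "_"))
    return names


def missing_modules(listing, required):
    """Return the required modules that do not appear in the listing."""
    present = module_names(listing)
    return [m for m in required if m.replace("-", "_") not in present]


# Invented listing: the framebuffer frontend is present, the keyboard one is not.
SAMPLE_LISTING = """\
-rw-r--r-- 1 root root 12345 lib/modules/x/kernel/drivers/video/xen-fbfront.ko
-rw-r--r-- 1 root root 23456 lib/modules/x/kernel/drivers/block/xen-blkfront.ko
"""

if __name__ == "__main__":
    print(missing_modules(SAMPLE_LISTING, ["xen-fbfront", "xen-kbdfront"]))
```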

Comment 4 Laszlo Ersek 2011-10-03 16:42:39 UTC

*** This bug has been marked as a duplicate of bug 740378 ***