Bug 461714 - Windows PV drivers 0.97 / 64bit & 32bit sometimes cause boot hang
Summary: Windows PV drivers 0.97 / 64bit & 32bit sometimes cause boot hang
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xenpv-win
Version: 5.2
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Barry Donahue
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-09-10 05:25 UTC by Jan Mark Holzer
Modified: 2010-10-07 16:40 UTC
CC List: 4 users

Fixed In Version: xenpv-win-0.97.4-3.el5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-01-29 18:57:03 UTC
Target Upstream Version:
Embargoed:


Attachments
Windows boot screen during hang (after 30 minutes) (48.46 KB, image/png)
2008-09-10 05:25 UTC, Jan Mark Holzer

Description Jan Mark Holzer 2008-09-10 05:25:11 UTC
Created attachment 316271
Windows boot screen during hang (after 30 minutes)

Description of problem:

In approximately 3.7% of reboots with the PV drivers (0.97) installed in a
64bit Windows 2k3 guest, the boot process of the guest locks up.

Version-Release number of selected component (if applicable):

0.97-0

How reproducible:

Install the 0.97-0 drivers in a 64bit W2k3 guest and reboot.
In a subset of reboots the guest locks up.
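Because the hang shows up in only a few percent of boots, one clean reboot proves little when checking a fix. The following sketch estimates how many reboots are needed to see at least one hang with 95% probability. It assumes, and the report does not establish this, that every reboot hangs independently with the quoted rate of roughly 3.5 to 3.7%; `reboots_needed` is a hypothetical helper.

```python
import math


def reboots_needed(p_hang: float, confidence: float = 0.95) -> int:
    """Smallest n with P(at least one hang in n reboots) >= confidence.

    Assumes every reboot hangs independently with probability p_hang,
    so P(no hang in n reboots) = (1 - p_hang) ** n.
    """
    if not 0.0 < p_hang < 1.0:
        raise ValueError("p_hang must be strictly between 0 and 1")
    # Solve (1 - p_hang) ** n <= 1 - confidence for n and round up.
    n = math.log(1.0 - confidence) / math.log(1.0 - p_hang)
    return math.ceil(n)


if __name__ == "__main__":
    # Hang rates quoted in this report: ~3.7% (description), ~3.5% (customer mail).
    for p in (0.037, 0.035):
        print(f"p={p:.3f}: {reboots_needed(p)} reboots for 95% confidence")
```

Under these assumptions, a verification run of only a few dozen reboots could plausibly miss the bug entirely.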

Steps to Reproduce:
1. Install drivers (64bit)
2. Reboot the guest multiple times
3. The boot process gets stuck at the "marching ants" screen
  
Actual results:

Boot hang

Expected results:

Guest should boot to login screen

Additional info:


	Whilst running some tests at a customer site, we are finding that a small
	percentage (about 3.5%) of Windows instances get stuck in the boot
	process and never fully boot up. The hang is so early in the process
	that we can’t tell what is going on.

	I was wondering if you have seen this already, or if you have any
	ability to a) reproduce it and b) attach a debugger.
	Also, any hints/suggestions on how to get you better debug data would
	be much appreciated (e.g. how to trigger a Windows dump).
	A screenshot of an instance is attached to show you what the console
	looks like when this happens. Note that this screenshot was taken at
	least 30 minutes after the instance was launched.

	The customer can replicate this reliably with the 0.97-0 drivers 
	on 64bit Windows images.  
	They don’t seem to be able to reproduce it on 32 bit Windows versions.

Comment 1 Jan Mark Holzer 2008-09-10 05:28:33 UTC
It seems the hang only happens with multiple vCPUs (the test guests have 4 vCPUs configured).
It usually happens right after the drivers are installed, on a cold boot. Reboots have not shown the problem (yet).

From the Xen side, the guest runs at 100% CPU with no messages in the Xen log files:

xentop - 07:15:30   Xen 3.1.2-92.el5
2 domains: 2 running, 0 blocked, 0 paused, 0 crashed, 0 dying, 0 shutdown
Mem: 16775784k total, 16774464k used, 1320k free    CPUs: 4 @ 2600MHz
      NAME  STATE   CPU(sec) CPU(%)     MEM(k) MEM(%)  MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS   VBD_OO   VBD_RD   VBD_WR SSID
dom_4388734 -----r       4720  100.2   15736704   93.8   15745024      93.9     4    0        0        0    5        0        0        0    0
  Domain-0 -----r       1873    0.8     668856    4.0   no limit       n/a     4    4        0        0    0        0        0        0    0

Comment 2 Jan Mark Holzer 2008-09-10 05:29:33 UTC
One more data point.
It seems only vCPU 0 in the Windows guest is accumulating CPU time; none of the other vCPUs are:

xentop - 07:24:50   Xen 3.1.2-92.el5
2 domains: 1 running, 0 blocked, 0 paused, 0 crashed, 0 dying, 0 shutdown
Mem: 16775784k total, 16774464k used, 1320k free    CPUs: 4 @ 2600MHz
      NAME  STATE   CPU(sec) CPU(%)     MEM(k) MEM(%)  MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS   VBD_OO   VBD_RD   VBD_WR SSID
dom_4388734 ------       5280  100.2   15736704   93.8   15745024      93.9     4    0        0        0    5        0        0        0    0
VCPUs(sec):   0:       5198s  1:         27s  2:         27s  3:         26s
  Domain-0 -----r       1887    0.8     668856    4.0   no limit       n/a     4    4        0        0    0        0        0        0    0
VCPUs(sec):   0:        940s  1:        302s  2:        277s  3:        367s
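The per-vCPU counters suggest a single vCPU spinning rather than the whole guest being busy. A minimal sketch that parses a `VCPUs(sec):` line as printed above and reports each vCPU's share of the accumulated time. The line format is inferred from this output rather than from xentop documentation, and `vcpu_shares` is a hypothetical helper.

```python
import re

# Matches entries such as "0:       5198s" in a xentop "VCPUs(sec):" line.
VCPU_ENTRY = re.compile(r"(\d+):\s+(\d+)s")


def vcpu_shares(line: str) -> dict[int, float]:
    """Return each vCPU's fraction of the domain's accumulated CPU time."""
    _, _, entries = line.partition("VCPUs(sec):")
    seconds = {int(cpu): int(secs) for cpu, secs in VCPU_ENTRY.findall(entries)}
    total = sum(seconds.values())
    if total == 0:
        return {cpu: 0.0 for cpu in seconds}
    return {cpu: secs / total for cpu, secs in seconds.items()}


if __name__ == "__main__":
    hung_guest = "VCPUs(sec):   0:       5198s  1:         27s  2:         27s  3:         26s"
    for cpu, share in sorted(vcpu_shares(hung_guest).items()):
        print(f"vCPU {cpu}: {share:.1%}")
```

For the hung guest, vCPU 0 accounts for roughly 98% of the time, whereas dom0's time is spread across its four vCPUs.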

Comment 3 Jan Mark Holzer 2008-09-11 06:36:54 UTC
Some more data on the boot hang:

    - In the test environment the customer has 2 x 32bit W2k3 instances
      and 2 x 64bit instances. The hang affects both architectures.
      One of the 64bit instances hangs approximately 50% of the
      time, whereas the other three hang approximately 10% of the time
      during boot. The 32bit and 64bit images are copies of
      the corresponding "golden" master images (i.e. identical
      in software and configuration inside the guest).

    - Often, using 'xm destroy'/'virsh destroy' and booting the guest
      again gets it past the hang.

    - The problem happens not only after the initial install but
      also on subsequent boots.

    - We will try to reproduce the hang with the PV devices removed/reduced,
      but it will take some time.
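The "destroy and boot again" workaround can be put into numbers if one assumes each boot of a given instance hangs independently with a fixed probability. That is an assumption: the differing per-instance rates (about 50% versus about 10%) hint that the probability is not the same everywhere. A sketch with hypothetical helpers `p_still_hung` and `expected_boots`:

```python
def p_still_hung(p_hang: float, attempts: int) -> float:
    """Probability that all `attempts` independent boots hang."""
    return p_hang ** attempts


def expected_boots(p_hang: float) -> float:
    """Expected number of boots until one succeeds (geometric distribution)."""
    if not 0.0 <= p_hang < 1.0:
        raise ValueError("p_hang must be in [0, 1)")
    return 1.0 / (1.0 - p_hang)


if __name__ == "__main__":
    # Per-instance hang rates from this comment: one instance ~50%, the others ~10%.
    for p in (0.5, 0.1):
        print(
            f"p={p:.0%}: expected boots {expected_boots(p):.2f}, "
            f"still hung after 3 boots {p_still_hung(p, 3):.3%}"
        )
```

Even for the worst instance, a few destroy/boot cycles would almost always get past the hang under this model, which matches the observation that retrying usually works.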

Comment 4 Jan Mark Holzer 2008-09-11 06:38:31 UTC
Below is the output of 'xenstore-ls' and 'xenstore-ls -p /local/domain/3' (where 3 is the
domid of the hanging Windows domain).
It looks like the devices are all connected (every device node shows state = "4").

# xenstore-ls

tool = ""
 xenstored = ""
vm = ""
 00000000-0000-0000-0000-000000000000 = ""
  shadow_memory = "0"
  uuid = "00000000-0000-0000-0000-000000000000"
  on_reboot = "restart"
  on_poweroff = "destroy"
  name = "Domain-0"
  xend = ""
   restart_count = "0"
  vcpus = "4"
  vcpu_avail = "15"
  memory = "653"
  on_crash = "restart"
  maxmem = "1536"
 00000000-0000-0000-0000-00ec24388734 = ""
  image = "(hvm (kernel /usr/lib/xen/boot/hvmloader) (device_model /usr/lib6..."
   kernel = "/usr/lib/xen/boot/hvmloader"
   cmdline = ""
   ramdisk = ""
   dmargs = "-boot c -localtime -serial pty -vcpus 4 -acpi -domain-name dom_..."
   device-model = "/usr/lib64/xen/bin/qemu-dm"
  vncpasswd = ""
  shadow_memory = "124"
  uuid = "00000000-0000-0000-0000-00ec24388734"
  on_reboot = "destroy"
  start_time = "1221019068.54"
  on_poweroff = "destroy"
  name = "dom_4388734"
  xend = ""
   restart_count = "0"
  vcpus = "4"
  vcpu_avail = "15"
  memory = "15360"
  on_crash = "destroy"
  maxmem = "15360"
local = ""
 domain = ""
  0 = ""
   vm = "/vm/00000000-0000-0000-0000-000000000000"
   cpu = ""
    1 = ""
     availability = "online"
    3 = ""
     availability = "online"
    2 = ""
     availability = "online"
    0 = ""
     availability = "online"
   name = "Domain-0"
   console = ""
    limit = "1048576"
   domid = "0"
   memory = ""
    target = "668672"
   backend = ""
    vif = ""
     3 = ""
      0 = ""
       domain = "dom_4388734"
       handle = "0"
       script = "/etc/xen/scripts/ec2-vif-route-dhcpd"
       state = "4"
       frontend = "/local/domain/3/device/vif/0"
       mac = "12:31:3B:00:35:01"
       online = "1"
       frontend-id = "3"
       feature-sg = "1"
       feature-gso-tcpv4 = "1"
       feature-rx-copy = "1"
       hotplug-status = "connected"
    vbd = ""
     3 = ""
      51728 = ""
       domain = "dom_4388734"
       frontend = "/local/domain/3/device/vbd/51728"
       format = "raw"
       dev = "xvdb"
       state = "4"
       params = "/dev/mapper/cow-VolGroupDomU-instance_ephemeral_store_0_b"
       mode = "w"
       online = "1"
       frontend-id = "3"
       type = "phy"
       physical-device = "fd:8"
       hotplug-status = "connected"
       sectors = "880732160"
       info = "0"
       sector-size = "512"
      51744 = ""
       domain = "dom_4388734"
       frontend = "/local/domain/3/device/vbd/51744"
       format = "raw"
       dev = "xvdc"
       state = "4"
       params = "/dev/mapper/cow-VolGroupDomU-instance_ephemeral_store_0_c"
       mode = "w"
       online = "1"
       frontend-id = "3"
       type = "phy"
       physical-device = "fd:a"
       hotplug-status = "connected"
       sectors = "880732160"
       info = "0"
       sector-size = "512"
      51760 = ""
       domain = "dom_4388734"
       frontend = "/local/domain/3/device/vbd/51760"
       format = "raw"
       dev = "xvdd"
       state = "4"
       params = "/dev/mapper/cow-VolGroupDomU-instance_ephemeral_store_0_d"
       mode = "w"
       online = "1"
       frontend-id = "3"
       type = "phy"
       physical-device = "fd:c"
       hotplug-status = "connected"
       sectors = "880732160"
       info = "0"
       sector-size = "512"
      51776 = ""
       domain = "dom_4388734"
       frontend = "/local/domain/3/device/vbd/51776"
       format = "raw"
       dev = "xvde"
       state = "4"
       params = "/dev/mapper/cow-VolGroupDomU-instance_ephemeral_store_0_e"
       mode = "w"
       online = "1"
       frontend-id = "3"
       type = "phy"
       physical-device = "fd:e"
       hotplug-status = "connected"
       sectors = "880732160"
       info = "0"
       sector-size = "512"
      768 = ""
       domain = "dom_4388734"
       frontend = "/local/domain/3/device/vbd/768"
       format = "raw"
       dev = "hda"
       state = "4"
       params = "/mnt/instance_image_store_0/89736"
       mode = "w"
       online = "1"
       frontend-id = "3"
       type = "file"
       node = "/dev/loop1"
       physical-device = "7:1"
       hotplug-status = "connected"
       sectors = "20971520"
       info = "0"
       sector-size = "512"
  3 = ""
   vm = "/vm/00000000-0000-0000-0000-00ec24388734"
   device-misc = ""
    vif = ""
     nextDeviceID = "1"
   device = ""
    vif = ""
     0 = ""
      backend-id = "0"
      mac = "12:31:3B:00:35:01"
      handle = "0"
      state = "4"
      backend = "/local/domain/0/backend/vif/3/0"
      tx-ring-ref = "768"
      rx-ring-ref = "769"
      event-channel = "12"
      request-rx-copy = "1"
      feature-rx-notify = "1"
      feature-sg = "0"
      feature-gso-tcpv4 = "0"
    vbd = ""
     51728 = ""
      backend-id = "0"
      virtual-device = "51728"
      device-type = "disk"
      state = "4"
      backend = "/local/domain/0/backend/vbd/3/51728"
      ring-ref = "12"
      event-channel = "11"
      protocol = "x86_64-abi"
     51744 = ""
      backend-id = "0"
      virtual-device = "51744"
      device-type = "disk"
      state = "4"
      backend = "/local/domain/0/backend/vbd/3/51744"
      ring-ref = "11"
      event-channel = "10"
      protocol = "x86_64-abi"
     51760 = ""
      backend-id = "0"
      virtual-device = "51760"
      device-type = "disk"
      state = "4"
      backend = "/local/domain/0/backend/vbd/3/51760"
      ring-ref = "10"
      event-channel = "9"
      protocol = "x86_64-abi"
     51776 = ""
      backend-id = "0"
      virtual-device = "51776"
      device-type = "disk"
      state = "4"
      backend = "/local/domain/0/backend/vbd/3/51776"
      ring-ref = "9"
      event-channel = "8"
      protocol = "x86_64-abi"
     768 = ""
      backend-id = "0"
      virtual-device = "768"
      device-type = "disk"
      state = "4"
      backend = "/local/domain/0/backend/vbd/3/768"
      ring-ref = "8"
      event-channel = "7"
      protocol = "x86_64-abi"
    savedstate = "FFFFFADE751FB3D0"
   console = ""
    port = "6"
    limit = "1048576"
    vnc-port = "5900"
    tty = "/dev/pts/0"
   cpu = ""
    3 = ""
     availability = "online"
    2 = ""
     availability = "online"
    0 = ""
     availability = "online"
    1 = ""
     availability = "online"
   name = "dom_4388734"
   domid = "3"
   memory = ""
    target = "15728640"
   store = ""
    ring-ref = "983038"
    port = "5"
   serial = ""
    0 = ""
     tty = "/dev/pts/0"

# xenstore-ls -p /local/domain/3

vm = "/vm/00000000-0000-0000-0000-00ec24388734"  . . . . . .  (n3)
device-misc = "" . . . . . . . . . . . . . . . . . . . . . .  (n3)
 vif = ""  . . . . . . . . . . . . . . . . . . . . . . . . .  (n3)
  nextDeviceID = "1" . . . . . . . . . . . . . . . . . . . .  (n3)
device = ""  . . . . . . . . . . . . . . . . . . . . . . . .  (n3)
 vif = ""  . . . . . . . . . . . . . . . . . . . . . . . . .  (n3)
  0 = "" . . . . . . . . . . . . . . . . . . . . . . . . . .  (n3,r0)
   backend-id = "0"  . . . . . . . . . . . . . . . . . . . .  (n3,r0)
   mac = "12:31:3B:00:35:01" . . . . . . . . . . . . . . . .  (n3,r0)
   handle = "0"  . . . . . . . . . . . . . . . . . . . . . .  (n3,r0)
   state = "4" . . . . . . . . . . . . . . . . . . . . . . .  (n3,r0)
   backend = "/local/domain/0/backend/vif/3/0" . . . . . . .  (n3,r0)
   tx-ring-ref = "768" . . . . . . . . . . . . . . . . . . .  (n3,r0)
   rx-ring-ref = "769" . . . . . . . . . . . . . . . . . . .  (n3,r0)
   event-channel = "12"  . . . . . . . . . . . . . . . . . .  (n3,r0)
   request-rx-copy = "1" . . . . . . . . . . . . . . . . . .  (n3,r0)
   feature-rx-notify = "1" . . . . . . . . . . . . . . . . .  (n3,r0)
   feature-sg = "0"  . . . . . . . . . . . . . . . . . . . .  (n3,r0)
   feature-gso-tcpv4 = "0" . . . . . . . . . . . . . . . . .  (n3,r0)
 vbd = ""  . . . . . . . . . . . . . . . . . . . . . . . . .  (n3)
  51728 = "" . . . . . . . . . . . . . . . . . . . . . . . .  (n3,r0)
   backend-id = "0"  . . . . . . . . . . . . . . . . . . . .  (n3,r0)
   virtual-device = "51728"  . . . . . . . . . . . . . . . .  (n3,r0)
   device-type = "disk"  . . . . . . . . . . . . . . . . . .  (n3,r0)
   state = "4" . . . . . . . . . . . . . . . . . . . . . . .  (n3,r0)
   backend = "/local/domain/0/backend/vbd/3/51728" . . . . .  (n3,r0)
   ring-ref = "12" . . . . . . . . . . . . . . . . . . . . .  (n3,r0)
   event-channel = "11"  . . . . . . . . . . . . . . . . . .  (n3,r0)
   protocol = "x86_64-abi" . . . . . . . . . . . . . . . . .  (n3,r0)
  51744 = "" . . . . . . . . . . . . . . . . . . . . . . . .  (n3,r0)
   backend-id = "0"  . . . . . . . . . . . . . . . . . . . .  (n3,r0)
   virtual-device = "51744"  . . . . . . . . . . . . . . . .  (n3,r0)
   device-type = "disk"  . . . . . . . . . . . . . . . . . .  (n3,r0)
   state = "4" . . . . . . . . . . . . . . . . . . . . . . .  (n3,r0)
   backend = "/local/domain/0/backend/vbd/3/51744" . . . . .  (n3,r0)
   ring-ref = "11" . . . . . . . . . . . . . . . . . . . . .  (n3,r0)
   event-channel = "10"  . . . . . . . . . . . . . . . . . .  (n3,r0)
   protocol = "x86_64-abi" . . . . . . . . . . . . . . . . .  (n3,r0)
  51760 = "" . . . . . . . . . . . . . . . . . . . . . . . .  (n3,r0)
   backend-id = "0"  . . . . . . . . . . . . . . . . . . . .  (n3,r0)
   virtual-device = "51760"  . . . . . . . . . . . . . . . .  (n3,r0)
   device-type = "disk"  . . . . . . . . . . . . . . . . . .  (n3,r0)
   state = "4" . . . . . . . . . . . . . . . . . . . . . . .  (n3,r0)
   backend = "/local/domain/0/backend/vbd/3/51760" . . . . .  (n3,r0)
   ring-ref = "10" . . . . . . . . . . . . . . . . . . . . .  (n3,r0)
   event-channel = "9" . . . . . . . . . . . . . . . . . . .  (n3,r0)
   protocol = "x86_64-abi" . . . . . . . . . . . . . . . . .  (n3,r0)
  51776 = "" . . . . . . . . . . . . . . . . . . . . . . . .  (n3,r0)
   backend-id = "0"  . . . . . . . . . . . . . . . . . . . .  (n3,r0)
   virtual-device = "51776"  . . . . . . . . . . . . . . . .  (n3,r0)
   device-type = "disk"  . . . . . . . . . . . . . . . . . .  (n3,r0)
   state = "4" . . . . . . . . . . . . . . . . . . . . . . .  (n3,r0)
   backend = "/local/domain/0/backend/vbd/3/51776" . . . . .  (n3,r0)
   ring-ref = "9"  . . . . . . . . . . . . . . . . . . . . .  (n3,r0)
   event-channel = "8" . . . . . . . . . . . . . . . . . . .  (n3,r0)
   protocol = "x86_64-abi" . . . . . . . . . . . . . . . . .  (n3,r0)
  768 = "" . . . . . . . . . . . . . . . . . . . . . . . . .  (n3,r0)
   backend-id = "0"  . . . . . . . . . . . . . . . . . . . .  (n3,r0)
   virtual-device = "768"  . . . . . . . . . . . . . . . . .  (n3,r0)
   device-type = "disk"  . . . . . . . . . . . . . . . . . .  (n3,r0)
   state = "4" . . . . . . . . . . . . . . . . . . . . . . .  (n3,r0)
   backend = "/local/domain/0/backend/vbd/3/768" . . . . . .  (n3,r0)
   ring-ref = "8"  . . . . . . . . . . . . . . . . . . . . .  (n3,r0)
   event-channel = "7" . . . . . . . . . . . . . . . . . . .  (n3,r0)
   protocol = "x86_64-abi" . . . . . . . . . . . . . . . . .  (n3,r0)
 savedstate = "FFFFFADE751FB3D0" . . . . . . . . . . . . . .  (n3)
console = "" . . . . . . . . . . . . . . . . . . . . . . . .  (n3)
 port = "6"  . . . . . . . . . . . . . . . . . . . . . . . .  (n3)
 limit = "1048576" . . . . . . . . . . . . . . . . . . . . .  (n3)
 vnc-port = "5900" . . . . . . . . . . . . . . . . . . . . .  (n3)
 tty = "/dev/pts/0"  . . . . . . . . . . . . . . . . . . . .  (n3)
cpu = "" . . . . . . . . . . . . . . . . . . . . . . . . . .  (n3)
 3 = ""  . . . . . . . . . . . . . . . . . . . . . . . . . .  (n3)
  availability = "online"  . . . . . . . . . . . . . . . . .  (n3)
 2 = ""  . . . . . . . . . . . . . . . . . . . . . . . . . .  (n3)
  availability = "online"  . . . . . . . . . . . . . . . . .  (n3)
 0 = ""  . . . . . . . . . . . . . . . . . . . . . . . . . .  (n3)
  availability = "online"  . . . . . . . . . . . . . . . . .  (n3)
 1 = ""  . . . . . . . . . . . . . . . . . . . . . . . . . .  (n3)
  availability = "online"  . . . . . . . . . . . . . . . . .  (n3)
name = "dom_4388734" . . . . . . . . . . . . . . . . . . . .  (n3)
domid = "3"  . . . . . . . . . . . . . . . . . . . . . . . .  (n3)
memory = ""  . . . . . . . . . . . . . . . . . . . . . . . .  (n3)
 target = "15728640" . . . . . . . . . . . . . . . . . . . .  (n3)
store = "" . . . . . . . . . . . . . . . . . . . . . . . . .  (n3)
 ring-ref = "983038" . . . . . . . . . . . . . . . . . . . .  (n3)
 port = "5"  . . . . . . . . . . . . . . . . . . . . . . . .  (n3)
serial = ""  . . . . . . . . . . . . . . . . . . . . . . . .  (n3)
 0 = ""  . . . . . . . . . . . . . . . . . . . . . . . . . .  (n3)
  tty = "/dev/pts/0" . . . . . . . . . . . . . . . . . . . .  (n3)
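The "all connected" reading rests on the `state` keys: in Xen's split-driver protocol, frontends and backends publish a XenbusState value in xenstore, and 4 means Connected (the enum lives in Xen's public header `io/xenbus.h`). A minimal sketch that scans `xenstore-ls` text in the format shown above and lists any device whose state is not Connected. The indentation-based parsing and the helpers `device_states` and `not_connected` are hypothetical, written for this output and not part of the Xen tools.

```python
import re

# XenbusState values, as defined in Xen's public header io/xenbus.h.
XENBUS_STATES = {
    0: "Unknown",
    1: "Initialising",
    2: "InitWait",
    3: "Initialised",
    4: "Connected",
    5: "Closing",
    6: "Closed",
    7: "Reconfiguring",
    8: "Reconfigured",
}

# One xenstore-ls line: indentation (one space per level), key, quoted value.
# Trailing "-p" permission columns such as ". . . (n3,r0)" are ignored.
LINE = re.compile(r'^( *)(\S+) = "([^"]*)"')


def device_states(xenstore_ls_output: str) -> dict[str, str]:
    """Map every node that has a numeric `state` child to its XenbusState name."""
    stack: list[str] = []  # keys of the ancestors of the current line
    states: dict[str, str] = {}
    for raw in xenstore_ls_output.splitlines():
        match = LINE.match(raw)
        if not match:
            continue
        indent, key, value = match.groups()
        depth = len(indent)
        del stack[depth:]  # keep only this line's ancestors on the stack
        if key == "state" and value.isdigit():
            path = "/".join(stack)
            states[path] = XENBUS_STATES.get(int(value), f"invalid({value})")
        stack.append(key)
    return states


def not_connected(xenstore_ls_output: str) -> dict[str, str]:
    """Subset of device_states() whose state is anything but Connected."""
    return {
        path: name
        for path, name in device_states(xenstore_ls_output).items()
        if name != "Connected"
    }


if __name__ == "__main__":
    # Abbreviated excerpt of the /local/domain/3 listing above.
    excerpt = "\n".join([
        'device = ""',
        ' vif = ""',
        '  0 = ""',
        '   state = "4"',
        ' vbd = ""',
        '  768 = ""',
        '   state = "4"',
    ])
    print(device_states(excerpt))
    print(not_connected(excerpt) or "all devices Connected")
```

A connected ring only shows that the handshake completed, so this check cannot rule out a driver spinning after connection.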

Comment 5 Jan Mark Holzer 2008-09-11 06:39:22 UTC
Guest config via '# xm list --long'

# xm li -l
(domain
    (domid 0)
    (uuid 00000000-0000-0000-0000-000000000000)
    (vcpus 4)
    (cpu_weight 1.0)
    (memory 653)
    (shadow_memory 0)
    (maxmem 1536)
    (features )
    (name Domain-0)
    (on_poweroff destroy)
    (on_reboot restart)
    (on_crash restart)
    (state r-----)
    (shutdown_reason poweroff)
    (cpu_time 3042.59622221)
    (online_vcpus 4)
)
(domain
    (domid 3)
    (uuid 00000000-0000-0000-0000-00ec24388734)
    (vcpus 4)
    (cpu_weight 1.0)
    (memory 15360)
    (shadow_memory 124)
    (maxmem 15360)
    (features )
    (name dom_4388734)
    (on_poweroff destroy)
    (on_reboot destroy)
    (on_crash destroy)
    (image
        (hvm
            (kernel /usr/lib/xen/boot/hvmloader)
            (device_model /usr/lib64/xen/bin/qemu-dm)
            (boot c)
            (localtime 1)
            (serial pty)
            (apic 1)
            (acpi 1)
            (pae 1)
            (vnc 1)
            (vncpasswd xyz)
            (vncunused 0)
            (vnclisten 127.0.0.1)
            (vncdisplay 0)
            (vcpus 4)
        )
    )
    (device
        (vif (backend 0) (script XXX-vif-route-dhcpd) (mac 12:31:3B:00:35:01))
    )
    (device
        (vbd
            (backend 0)
            (dev xvdb:disk)
            (uname
                phy:/dev/mapper/cow-VolGroupDomU-instance_ephemeral_store_0_b
            )
            (mode w)
        )
    )
    (device
        (vbd
            (backend 0)
            (dev xvdc:disk)
            (uname
                phy:/dev/mapper/cow-VolGroupDomU-instance_ephemeral_store_0_c
            )
            (mode w)
        )
    )
    (device
        (vbd
            (backend 0)
            (dev xvdd:disk)
            (uname
                phy:/dev/mapper/cow-VolGroupDomU-instance_ephemeral_store_0_d
            )
            (mode w)
        )
    )
    (device
        (vbd
            (backend 0)
            (dev xvde:disk)
            (uname
                phy:/dev/mapper/cow-VolGroupDomU-instance_ephemeral_store_0_e
            )
            (mode w)
        )
    )
    (device
        (vbd
            (backend 0)
            (dev hda:disk)
            (uname file:/mnt/instance_image_store_0/89736)
            (mode w)
        )
    )
    (state r-----)
    (shutdown_reason poweroff)
    (cpu_time 49859.5228111)
    (online_vcpus 4)
    (up_time 49762.3442221)
    (start_time 1221019068.54)
    (store_mfn 983038)
)
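The numeric `virtual-device` keys in the xenstore dump (51728, 51744, 51760, 51776, 768) correspond to the `dev` names in this config (xvdb to xvde, hda). Under Xen's legacy vbd numbering, as described in Xen's vbd-interface documentation, xvd disks use major 202 with 16 minors per disk and the first IDE channel uses major 3 with 64 minors per disk. A minimal decoder, assuming only those two encodings; `decode_vbd` is a hypothetical helper.

```python
import string

XVD_MAJOR = 202  # Xen virtual disks (xvdX): 16 minors per disk
IDE0_MAJOR = 3   # first IDE channel (hda, hdb): 64 minors per disk


def decode_vbd(number: int) -> str:
    """Translate a legacy-encoded Xen vbd number into a device name.

    Handles only the two encodings that appear in this report:
      (202 << 8) | (disk << 4) | partition  -> xvd<letter>[partition]
      (3 << 8)   | (disk << 6) | partition  -> hd<letter>[partition]
    """
    major, minor = number >> 8, number & 0xFF
    if major == XVD_MAJOR:
        prefix, disk, partition = "xvd", minor >> 4, minor & 0x0F
    elif major == IDE0_MAJOR and minor >> 6 <= 1:
        prefix, disk, partition = "hd", minor >> 6, minor & 0x3F
    else:
        raise ValueError(f"encoding not handled by this sketch: {number}")
    name = prefix + string.ascii_lowercase[disk]
    return name if partition == 0 else f"{name}{partition}"


if __name__ == "__main__":
    # virtual-device keys from the xenstore dump in Comment 4
    for vbd in (51728, 51744, 51760, 51776, 768):
        print(vbd, "->", decode_vbd(vbd))
```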

Comment 6 Jan Mark Holzer 2008-09-15 02:12:24 UTC
The problem has been reproduced with 32bit and 64bit guests.

Some more info from the customer:

1. We are able to reproduce the boot hang problem with the latest
   PV drivers (fix for network performance regression and save/restore).
2. We are not able to reproduce it on the image without any PV drivers
   installed, which indicates that this actually is a PV driver issue.
3. By enabling boot logging in Windows, we have confirmed that Windows
   hangs only after all the drivers are loaded.
4. We tried to isolate further between the network and disk drivers, but
   uninstalling a specific driver does not seem to work properly.
   I uninstalled the network driver and rebooted the machine. The network
   did not come back, but the device list still showed the network driver
   attached, so I don't know whether the net PV driver loaded or not.
   But I was able to reproduce the hang with this image, so the problem
   is more likely to be in the disk driver.

Comment 7 Jan Mark Holzer 2008-09-16 01:40:02 UTC

Hi Scott,

The customer has reproduced the hang with just a boot disk device and a network device, i.e. without any secondary disk drives. They are still not able to
reproduce the hang without the network device, which might point towards the
network PV driver as the culprit.

The customer is also using the network-route configuration rather than a bridge.
We have captured the TCP traffic on the virtual network interface in dom0
corresponding to the domU/Windows guest, and there is some network traffic
right at the time when the hang happens.
I have attached the traffic for a run where the boot hangs and a run where it
does not hang. You can see that when the boot hangs, the domU has not even made
a DHCP request, but there is some network traffic before that, so it seems the
network driver is at least initialized.


As it might be difficult to reproduce the customer's network configuration,
it would be great if you could give us PV drivers that have detailed logging/
tracing enabled at every step.
That way the customer could reproduce the hang and send us the logs, which
might help track down the problem.


If there is anything else we can do to help you track this down, please let
us know.

Thx in advance,

    Jan



Capture from a boot that did not hang (the guest sends a DHCP request and NetBIOS registrations):
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vifI4458598.0, link-type EN10MB (Ethernet), capture size 96 bytes
23:40:57.517069 IP6 fe80::fcff:ffff:feff:ffff > ff02::2: ICMP6, router solicitation, length 16
23:40:57.525025 IP6 fe80::fcff:ffff:feff:ffff > ff02::16: HBH ICMP6, multicast listener report v2, 2 group record(s), length 48
23:40:57.617146 IP6 fe80::fcff:ffff:feff:ffff.mdns > ff02::fb.mdns:  0*- [0q] 6/0/0[|domain]
23:40:57.625136 IP6 	.mdns > ff02::fb.mdns:  0 [5q] [9n][|domain]
23:40:57.877185 IP6 fe80::fcff:ffff:feff:ffff.mdns > ff02::fb.mdns:  0 [5q] [9n][|domain]
23:40:58.129123 IP6 fe80::fcff:ffff:feff:ffff.mdns > ff02::fb.mdns:  0 [5q] [9n][|domain]
23:40:58.329341 IP6 fe80::fcff:ffff:feff:ffff.mdns > ff02::fb.mdns:  0*- [0q] 9/0/0[|domain]
23:40:58.673111 IP6 fe80::fcff:ffff:feff:ffff.mdns > ff02::fb.mdns:  0*- [0q] 6/0/0[|domain]
23:40:59.385255 IP6 fe80::fcff:ffff:feff:ffff.mdns > ff02::fb.mdns:  0*- [0q] 9/0/0[|domain]
23:41:00.729070 IP6 fe80::fcff:ffff:feff:ffff.mdns > ff02::fb.mdns:  0*- [0q] 13/0/0[|domain]
23:41:01.440951 IP6 fe80::fcff:ffff:feff:ffff.mdns > ff02::fb.mdns:  0*- [0q] 5/0/0[|domain]
23:41:01.441118 IP6 fe80::fcff:ffff:feff:ffff.mdns > ff02::fb.mdns:  0*- [0q] 4/0/0[|domain]
23:41:01.516754 IP6 fe80::fcff:ffff:feff:ffff > ff02::2: ICMP6, router solicitation, length 16
23:41:01.940723 IP6 fe80::fcff:ffff:feff:ffff > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
23:41:05.516509 IP6 fe80::fcff:ffff:feff:ffff > ff02::2: ICMP6, router solicitation, length 16
23:41:26.880919 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 12:31:3b:00:34:91 (oui Unknown), length: 320
23:41:26.882087 IP 169.254.1.0.bootps > 10.250.59.95.bootpc: BOOTP/DHCP, Reply, length: 300
23:41:26.883604 arp who-has 10.250.59.95 tell 10.250.59.95
23:41:27.631049 arp who-has 10.250.59.95 tell 10.250.59.95
23:41:28.630968 arp who-has 10.250.59.95 tell 10.250.59.95
23:41:29.709246 IP 10.250.59.95.netbios-ns > 10.250.59.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
23:41:30.459040 IP 10.250.59.95.netbios-ns > 10.250.59.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
23:41:31.208906 IP 10.250.59.95.netbios-ns > 10.250.59.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
23:41:31.958990 IP 10.250.59.95.netbios-ns > 10.250.59.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
23:41:32.709232 IP 10.250.59.95.netbios-ns > 10.250.59.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
23:41:33.458780 IP 10.250.59.95.netbios-ns > 10.250.59.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
23:41:34.208705 IP 10.250.59.95.netbios-ns > 10.250.59.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST
23:41:34.958658 IP 10.250.59.95.netbios-ns > 10.250.59.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST



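Capture from a boot that hung (no DHCP request; only IPv6 neighbor/router solicitation, multicast listener report and mDNS traffic):
23:37:17.492742 IP6 :: > ff02::1:ffff:ffff: ICMP6, neighbor solicitation, who has fe80::fcff:ffff:feff:ffff, length 24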
23:37:18.492689 IP6 fe80::fcff:ffff:feff:ffff > ff02::2: ICMP6, router solicitation, length 16
23:37:18.500629 IP6 fe80::fcff:ffff:feff:ffff > ff02::16: HBH ICMP6, multicast listener report v2, 2 group record(s), length 48
23:37:18.692909 IP6 fe80::fcff:ffff:feff:ffff.mdns > ff02::fb.mdns:  0 [5q] [9n][|domain]
23:37:18.720727 IP6 fe80::fcff:ffff:feff:ffff.mdns > ff02::fb.mdns:  0*- [0q] 6/0/0[|domain]
23:37:18.944778 IP6 fe80::fcff:ffff:feff:ffff.mdns > ff02::fb.mdns:  0 [5q] [9n][|domain]
23:37:19.196820 IP6 fe80::fcff:ffff:feff:ffff.mdns > ff02::fb.mdns:  0 [5q] [9n][|domain]
23:37:19.397035 IP6 fe80::fcff:ffff:feff:ffff.mdns > ff02::fb.mdns:  0*- [0q] 9/0/0[|domain]
23:37:19.868664 IP6 fe80::fcff:ffff:feff:ffff.mdns > ff02::fb.mdns:  0*- [0q] 6/0/0[|domain]
23:37:20.544956 IP6 fe80::fcff:ffff:feff:ffff.mdns > ff02::fb.mdns:  0*- [0q] 9/0/0[|domain]
23:37:21.516422 IP6 fe80::fcff:ffff:feff:ffff > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
23:37:22.016771 IP6 fe80::fcff:ffff:feff:ffff.mdns > ff02::fb.mdns:  0*- [0q] 13/0/0[|domain]
23:37:22.516370 IP6 fe80::fcff:ffff:feff:ffff > ff02::2: ICMP6, router solicitation, length 16
23:37:22.692773 IP6 fe80::fcff:ffff:feff:ffff.mdns > ff02::fb.mdns:  0*- [0q] 9/0/0[|domain]
23:37:26.516105 IP6 fe80::fcff:ffff:feff:ffff > ff02::2: ICMP6, router solicitation, length 16
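The difference between the two captures is whether the guest ever reaches DHCP. A small sketch that counts a few traffic types in tcpdump text output of the one-line form shown above and reports whether a DHCP request appears. Matching on substrings such as `BOOTP/DHCP, Request` is an assumption drawn from these captures, not a general tcpdump parser; `summarize` and `reached_dhcp` are hypothetical helpers.

```python
from collections import Counter

# Substrings that identify the traffic types seen in the two captures above.
# These are assumptions based on this report's tcpdump output.
MARKERS = {
    "dhcp_request": "BOOTP/DHCP, Request",
    "router_solicitation": "router solicitation",
    "mdns": ".mdns >",
    "netbios_registration": "NBT UDP PACKET(137): REGISTRATION",
}


def summarize(capture: str) -> Counter:
    """Count how many capture lines match each traffic marker."""
    counts: Counter = Counter()
    for line in capture.splitlines():
        for name, marker in MARKERS.items():
            if marker in line:
                counts[name] += 1
    return counts


def reached_dhcp(capture: str) -> bool:
    """True if the capture shows at least one DHCP request from the guest."""
    return summarize(capture)["dhcp_request"] > 0


if __name__ == "__main__":
    hung = "23:37:26.516105 IP6 fe80::fcff:ffff:feff:ffff > ff02::2: ICMP6, router solicitation, length 16"
    booted = ("23:41:26.880919 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: "
              "BOOTP/DHCP, Request from 12:31:3b:00:34:91 (oui Unknown), length: 320")
    for label, text in (("hung", hung), ("booted", booted)):
        print(label, reached_dhcp(text), dict(summarize(text)))
```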




Comment 8 RHEL Program Management 2008-09-26 22:03:53 UTC
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release. If you would like
this request to be reviewed for the next minor release, ask your
support representative to set the next rhel-x.y flag to "?".

Comment 9 Perry Myers 2008-11-02 16:52:26 UTC
This has been fixed in version 0.97.4 of the drivers and needs to be verified by QE.

Comment 10 Barry Donahue 2008-11-03 17:46:33 UTC
Verified in 0.97.4.

