Bug 1375086

Summary: [Q35] win7 BSOD when the system device connected to root port or downstream port
Product: Red Hat Enterprise Linux 7
Reporter: jingzhao <jinzhao>
Component: qemu-kvm-rhev
Assignee: Marcel Apfelbaum <marcel>
Status: CLOSED WONTFIX
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Docs Contact:
Priority: high
Version: 7.3
CC: ailan, chayang, jinzhao, juzhang, knoel, lijin, lprosek, virt-maint, vrozenfe, xutian
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Release Note
Doc Text:
Cause: On Q35, virtio devices connected to PCIe Root Ports or PCIe Downstream Ports are, by default, virtio 1.0 devices with no legacy support, unlike in previous versions.
Consequence: Existing Windows guests whose system storage device is a virtio device connected to a PCIe Root Port or PCIe Downstream Port will no longer boot after the upgrade.
Workaround: Add a second, non-boot virtio disk with "disable-legacy=on", install a driver for the controller, reboot, and then remove it.
Result: Windows will load the right driver for the virtio boot disk and the system will boot successfully.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-01-12 16:31:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Attachments:
  BSOD screen (flags: none)
  screenshot-bug1375086 (flags: none)

Description jingzhao 2016-09-12 07:02:05 UTC
Created attachment 1200074 [details]
BSOD screen

Description of problem:
Win7 hits a BSOD when the system (boot) disk is connected to a root port or downstream port.

Version-Release number of selected component (if applicable):
host kernel:3.10.0-503.el7.x86_64
qemu-kvm-rhev-2.6.0-23.el7.x86_64

How reproducible:
3/3

Steps to Reproduce:
1. Boot win7 system with following command
/usr/libexec/qemu-kvm \
-M q35 \
-cpu SandyBridge \
-nodefaults -rtc base=utc \
-m 4G \
-smp 2,sockets=2,cores=1,threads=1 \
-enable-kvm \
-name rhel7.3 \
-k en-us \
-serial unix:/tmp/console,server,nowait \
-boot menu=on \
-bios /usr/share/seabios/bios.bin \
-chardev file,path=/home/e1000e/seabios.log,id=seabios \
-device isa-debugcon,chardev=seabios,iobase=0x402 \
-qmp tcp::8887,server,nowait \
-vga qxl \
-spice port=5932,disable-ticketing \
-device ioh3420,id=root.0,slot=1 \
-device x3130-upstream,bus=root.0,id=upstream1 \
-device xio3130-downstream,bus=upstream1,id=downstream1,chassis=1 \
-device ioh3420,id=root.1,slot=2 \
-drive file=/home/win7bk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,bus=root.1 \
-drive file=/home/e1000e/test.qcow2,if=none,id=drive-virtio-disk1,format=qcow2,cache=none,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-disk1,id=virtio-disk1 \
-netdev tap,id=hostnet1 \
-device virtio-net-pci,netdev=hostnet1,id=net1,mac=54:52:00:B6:40:22 \
-netdev tap,id=hostnet2 \
-device virtio-net-pci,netdev=hostnet2,id=net2,mac=54:52:00:B6:40:23 \
-drive file=/home/virtio-win.iso,if=none,id=ide1,format=raw,media=cdrom \
-device ide-drive,bus=ide.0,unit=0,drive=ide1,id=ide1 \
-monitor stdio \


Actual results:
Win7 system BSOD

Expected results:
Win7 boots and runs successfully.

Additional info:
The same problem occurs when the disk is connected to a downstream port.
The problem does not occur when the disk is connected to the PCIe root bus or an IDE controller.
Also, a kernel dump was configured, but no dump file was produced in the guest, so only the BSOD screenshot is attached.

Comment 4 Marcel Apfelbaum 2016-09-28 11:30:02 UTC
Commit 9a4c0e220d (hw/virtio-pci: fix virtio behaviour) modified virtio devices'
behavior to be virtio 1.0 by default for PCIe Root Ports and Downstream Ports.

What happens is that the Windows guests do not have virtio 1.0 drivers
installed, so Windows can't access the disk. (Possibly virtio 0.9 drivers
are installed in the images used by QE.)


For QE: please check again with virtio 1.0 drivers.

Comment 5 jingzhao 2016-09-29 02:54:37 UTC
(In reply to Marcel Apfelbaum from comment #4)
> Commit 9a4c0e220d (hw/virtio-pci: fix virtio behaviour) modified virtio
> devices
> behavior to be 1.0 by default for PCIe Ports and Downstream Ports.
> 
> What happens is that the Windows guests do not have virtio 1.0 drivers
> installed, so Windows can't access the disk. (Possibly virtio 0.9 drivers
> are installed in the images used by the QE.
> 
> 
> For QE: please check again with virtio 1.0 drivers.

Hi Marcel

  First, I checked the driver when the virtio-blk device was attached to the pcie bus, and it is really the virtio 1.0 driver (see screenshot-bug1375086). Then I used the default driver, attached the device to the root.1 port, and hit the issue.

  Second, I used the disable-modern parameter and hit the bug again.


/usr/libexec/qemu-kvm \
-M q35 \
-cpu SandyBridge \
-nodefaults -rtc base=utc \
-m 4G \
-smp 2,sockets=2,cores=1,threads=1 \
-enable-kvm \
-name rhel7.3 \
-k en-us \
-serial unix:/tmp/console,server,nowait \
-boot menu=on \
-bios /usr/share/seabios/bios.bin \
-chardev file,path=/home/e1000e/seabios.log,id=seabios \
-device isa-debugcon,chardev=seabios,iobase=0x402 \
-qmp tcp::8887,server,nowait \
-vga qxl \
-spice port=5932,disable-ticketing \
-device ioh3420,id=root.0,slot=1 \
-device x3130-upstream,bus=root.0,id=upstream1 \
-device xio3130-downstream,bus=upstream1,id=downstream1,chassis=1 \
-device ioh3420,id=root.1,slot=2 \
-drive file=/home/win7bk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,bus=root.1,disable-legacy=on,disable-modern=off \
-drive file=/home/e1000e/test.qcow2,if=none,id=drive-virtio-disk1,format=qcow2,cache=none,werror=stop,rerror=stop \
-device virtio-blk-pci,drive=drive-virtio-disk1,id=virtio-disk1 \
-netdev tap,id=hostnet1 \
-device virtio-net-pci,netdev=hostnet1,id=net1,mac=54:52:00:B6:40:22 \
-netdev tap,id=hostnet2 \
-device virtio-net-pci,netdev=hostnet2,id=net2,mac=54:52:00:B6:40:23 \
-drive file=/usr/share/virtio-win/virtio-win-1.9.0.iso,if=none,id=ide1,format=raw,media=cdrom \
-device ide-drive,bus=ide.0,unit=0,drive=ide1,id=ide1 \
-monitor stdio \


Thanks
Jing Zhao

Comment 6 jingzhao 2016-09-29 02:55:31 UTC
Created attachment 1205766 [details]
screenshot-bug1375086

Comment 7 jingzhao 2016-09-29 02:56:35 UTC
virtio-win driver version
[root@jinzhao e1000e]# rpm -qa |grep virtio-win
virtio-win-1.9.0-3.el7.noarch

Comment 8 Marcel Apfelbaum 2016-09-29 09:30:39 UTC
(In reply to jingzhao from comment #5)
> (In reply to Marcel Apfelbaum from comment #4)
> > Commit 9a4c0e220d (hw/virtio-pci: fix virtio behaviour) modified virtio
> > devices
> > behavior to be 1.0 by default for PCIe Ports and Downstream Ports.
> > 
> > What happens is that the Windows guests do not have virtio 1.0 drivers
> > installed, so Windows can't access the disk. (Possibly virtio 0.9 drivers
> > are installed in the images used by the QE.
> > 
> > 
> > For QE: please check again with virtio 1.0 drivers.
> 
> Hi Marcel
> 

Hi,

>   First, I checked the driver when the virtio-blk device was attached to the
> pcie bus, and it is really the virtio 1.0 driver (see
> screenshot-bug1375086). Then I used the default driver, attached the device
> to the root.1 port, and hit the issue.
> 

Hi,
Please show the driver details with version number, not the resources page.


>   Second, I used the disable-modern parameter and hit the bug again.
> 

You need to use: "disable-legacy=off".

> 
[...]

> -device
> virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,bus=root.
> 1,disable-legacy=on,disable-modern=off \

You used "disable-legacy=on,disable-modern=off", which is done by default
for PCIe ports.


Please use disable-legacy=off; it should be enough to load Windows.
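
As a sketch of what that would mean with the reproducer command line from comment 5 (only the boot-disk device line changes, everything else stays as-is):

-device virtio-blk-pci,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,bus=root.1,disable-legacy=off \

With disable-legacy=off the device is transitional (it exposes both the legacy and the modern interface), so the viostor driver already installed in the guest should still be able to bind to it.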
  
Thanks,
Marcel


[...]

Comment 9 Marcel Apfelbaum 2016-09-29 10:57:51 UTC
OK, I tried it myself with the virtio drivers from the brew build:
    https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=513949
and it doesn't work.

The problem seems to be that the virtio-win drivers don't work with the option
"disable-legacy=on" (you don't need a root port or switch, maybe not even Q35).
Vadim, Ladi, can you please have a look?

Thanks,
Marcel

Comment 10 Ladi Prosek 2016-09-29 11:52:01 UTC
"disable-legacy=on" changes the device ID (from 1001 to 1042 for virtio-blk-pci) so it looks like a different device to Windows than the one it installed the driver for. I don't think that Windows does full driver discovery, ID matching, and all that jazz at boot time so in this scenario it just gives up.

My Win7 VM loads a different viostor driver depending on the disable-legacy switch (it's not a fresh install and has already seen both) so that would support the theory I think.

Can you try adding a second virtio-blk-pci non-boot disk with "disable-legacy=on", installing a driver for the controller, and then removing it and switching the boot disk to "disable-legacy=on"? That should do it.
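
As a rough sketch of that workaround (the image path and IDs below are made up for illustration), the temporary non-boot disk could be added to the command line with something like:

-drive file=/home/dummy.qcow2,if=none,id=drive-dummy,format=qcow2 \
-device virtio-blk-pci,drive=drive-dummy,id=virtio-dummy,disable-legacy=on,disable-modern=off \

Once Windows has installed the viostor driver for that modern-only device, the extra disk can be removed and the boot disk switched to disable-legacy=on.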

Comment 11 Marcel Apfelbaum 2016-09-29 12:23:43 UTC
(In reply to Ladi Prosek from comment #10)

Hi Ladi, nice trick, thanks!
Before that, I tried the same, but without a second disk...

Now the question is how we document the transition to virtio 1.0.
Q35, for example, uses virtio 1.0 (with no legacy) for devices
connected to root ports or switches. When upgrading to 7.3, users need
to follow those instructions.

Thanks,
Marcel

Comment 12 Ladi Prosek 2016-09-29 12:41:14 UTC
Hi Marcel,

Wait, so we support upgrading existing VMs from i440fx to Q35? Or moving devices between ports/switches as part of migrations? I would expect this issue to come up only when someone toggles the switch manually. If the VM was installed with this specific virtual HW config, everything should be fine. If the VM changes underneath an already installed guest OS (which I believe is the case of this BZ), all bets are off.

Would a blog post be appropriate or do you think that it calls for something more formal?

Thanks,
Ladi

Comment 13 Marcel Apfelbaum 2016-09-29 13:15:26 UTC
(In reply to Ladi Prosek from comment #12)
> Hi Marcel,
> 
> Wait, so we support upgrading existing VMs from i440fx to Q35?

No

> Or moving devices between ports/switches as part of migrations?

No

> I would expect this issue to come up only when someone toggles the switch
> manually.

What happens is that starting with 7.3 the virtio devices have no legacy support by default (if connected to PCIe ports).

So a Windows VM on a RHEL 7.2 host would see virtio 0.95 devices. After upgrade to 7.3, the VM will see virtio 1.0 devices if connected to PCIe Root Ports or Downstream Ports.

> If the VM was installed with this specific virtual HW config, everything
> should be fine. If the VM changes underneath an already installed guest OS
> (which I believe is the case of this BZ), all bets are off.

> Would a blog post be appropriate or do you think that it calls for something
> more formal?

I think we need something more formal. I'll ask for documentation; please
review my doc proposal.

> Thanks,
> Ladi

Comment 14 Ladi Prosek 2016-09-29 13:34:32 UTC
(In reply to Marcel Apfelbaum from comment #13)
> What happens is starting with 7.3 the virtio devices have no legacy support
> by default (if connected to PCIe ports).
> 
> So a Windows VM on a RHEL 7.2 host would see virtio 0.95 devices. After
> upgrade to 7.3 the VM will see virtio 1.0 devices if connected to PCIe Root
> Ports or Downstream Ports.

I see, so technically a breaking change.

> I think we need something more formal, I'll ask for documentation, please
> review my doc proposal.

Looks good!

I don't know if upgrading virtio-win drivers is usually communicated as a must when upgrading to a new RHEL version. If not, it should be in this case, or at least it should be emphasized that all drivers, not just blk/scsi, could break.