Description of problem:
When trying to load the vioscsi driver during Windows 10 installation, the guest gets stuck at the loading screen. It does not freeze, though: I can still move the cursor and the progress bar keeps going back and forth. When I then try to kill qemu in the terminal with ^C, the process turns into some kind of zombie (a "qemu-system-x86" which doesn't really exist).

The device is a first-gen Intel X25-M SSD connected to USB 3 through a StarTech UAS adapter:
http://ark.intel.com/products/56604/Intel-SSD-X25-M-Series-80GB-2_5in-SATA-3Gbs-50nm-MLC
http://www.startech.com/HDD/Adapters/USB-3-SATA-adapter-cable-with-UASP~USB3S2SAT3CB

I can load vioscsi successfully in the guest if I use the "u" quirk of "usb-storage" on the host so that the device does not bind to the uas driver. scsi-hd and viostor work on uas too, just not scsi-block. The device also never showed any problem when used physically under Windows or Linux.

The same issue occurred on a Windows Server 2012 R2 installation. I can complete the installation of Windows 10 with scsi-block on usb-storage, and of Arch Linux with scsi-block on uas.

Feel free to let me know if you need any other info.

Version-Release number of selected component (if applicable):
0.1.110

Additional info:
linux 4.2.5-1 and qemu-2.4.0.1-1 on Arch Linux

command:
qemu-system-x86_64 -enable-kvm -cpu host -m 4G -device virtio-scsi-pci -drive file=PATH_TO_DRIVE,if=none,format=raw,id=system -device scsi-block,drive=system -drive file=PATH_TO_WIN_ISO,media=cdrom -drive file=PATH_TO_VIRTIO_WIN_ISO,media=cdrom -full-screen
Created attachment 1088489
Screenshot of the guest when the issue occurred. It did not freeze, it was just stuck.
Created attachment 1088490
Screenshot of terminals on host
Can we try reproducing this problem?

Thanks,
Vadim.
(In reply to Vadim Rozenfeld from comment #3)
> can we try reproducing this problem?
>
> Thanks,
> Vadim.

QE has already reproduced a similar issue with the following builds:
qemu-kvm-rhev-2.3.0-31.el7.x86_64
virtio-win-1.8.0-4.el7.noarch.rpm or build 110
kernel-3.10.0-326.el7.x86_64

CLI:
/usr/libexec/qemu-kvm -name win10-32 -enable-kvm -m 4G -smp 4 -uuid 9e59c1d2-23ee-41cb-a992-6c45ae4d9cb5 -nodefconfig -nodefaults -rtc base=localtime,driftfix=slew -boot order=cd,menu=on -device virtio-scsi-pci,bus=pci.0,addr=0x5,id=scsi0 -drive file=/dev/vgtest/lvtest,if=none,id=drive-virtio-disk,format=raw,cache=none,aio=native,werror=stop,rerror=stop -device scsi-block,bus=scsi0.0,drive=drive-virtio-disk,id=scsi1 -drive file=en_windows_10_enterprise_x86_dvd_6851156.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=0 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=isa_serial0 -vnc 0.0.0.0:0 -vga cirrus -monitor stdio -drive file=virtio-win-prewhql-0.1-110.iso,if=none,media=cdrom,id=drive-ide0-1-1,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -drive file=/usr/share/virtio-win/virtio-win-1.8.0_x86.vfd,if=none,id=drive-fdc0-0-0,format=raw,cache=none -global isa-fdc.driveA=drive-fdc0-0-0 -netdev tap,script=/etc/qemu-ifup,downscript=no,id=hostnet1,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:52:14:21:12:13,bus=pci.0
Created attachment 1089427
screenshot
Any progress so far? I am still experiencing the issue with virtio-win-0.1.117.iso / qemu 2.5.0 / linux 4.5.1 / Windows 10 Build 10586.

Btw, this works fine (I mean the viostor driver loads without a problem):
qemu-system-x86_64 -enable-kvm -cpu host -smp cores=4 -m 4G -net none -full-screen -drive file=Downloads/10586.0.151029-1700.TH2_RELEASE_CLIENTENTERPRISEEVAL_OEMRET_X64FRE_EN-US.ISO,media=cdrom -drive file=Downloads/virtio-win-0.1.117.iso,media=cdrom -drive file=/dev/sdc,format=raw,cache=none,aio=native,if=none,id=system -device virtio-blk-pci,drive=system,scsi=on

With "scsi=on", virtio-blk-pci does SCSI passthrough just like virtio-scsi-pci does (this can be checked with sg3_utils, for example, and also with `hdparm -I` if qemu is run with root/sudo), so that's probably not the cause of the issue?

Should I also file a bug report on the qemu tracker, in case it's a bug in the virtio-scsi-pci code instead of the vioscsi Windows driver?
(In reply to Tom Yan from comment #6)
>
> With "scsi=on" virtio-blk-pci does SCSI passthrough just like
> virtio-scsi-pci does (can be checked with sg3_utils for example, and also
> `hdparm -I` if qemu is run with root/sudo), so that's probably not the cause
> of the issue?
>

Hmm, never mind. It seems that "scsi=on" only has an effect on Linux guests. viostor on Windows seems to respond to SCSI commands (issued with sg_vpd, sg_inq...) but exposes only a virtual SCSI layer anyway. On Linux guests, virtio-blk does not respond to SCSI commands unless "scsi=on" is specified, and when it is specified, it exposes the physical SCSI layer to the guest.
Created attachment 1153006
ehci + uas + vioscsi + scsi-block

Just found out that this issue only occurs with xhci + uas but not with ehci + uas (same computer with "Intel xHCI Mode" in the UEFI settings set to "Disabled" instead of "Enabled"). Maybe it has something to do with the "streams" thing (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/storage/uas.c?h=v4.5#n869)?
Created attachment 1153009
xhci + uas + vioscsi + scsi-block

I then disabled the disk drive in Device Manager of the Windows guest, shut down the guest, and rebooted the host with "Intel xHCI Mode" set to "Enabled". The guest starts fine with the drive (in xhci + uas mode) attached (but disabled in the guest). It only struggles when I try to enable the disk drive in Device Manager again.
Hi Tom,

Is this still an issue for you? Could you try with the latest virtio-win version?

Thanks.
Well, I will not be at liberty to try to reproduce it with a similar hardware configuration (namely, with the adapter connected to a SuperSpeed port) in the foreseeable future.

As I said in my previous message, this issue only occurs with XHCI. However, what I meant by "XHCI" was merely an XHCI SuperSpeed port. I never tested with an XHCI HiSpeed port, which is now the only hardware I have access to. The problem does not occur with that, just like it didn't occur with EHCI.

I don't see how it could suddenly get fixed anyway, since I have not seen any commit to the vioscsi win driver or the uas kernel driver since I last tested.

And this is still the best lead I can see myself:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/storage/uas.c?h=v4.12#n905
(mainly usb_alloc_streams() / use_streams, because IIRC, qdepth is set to the same value in either case with this particular device)
Hi,

Tried to reproduce this bug in our test environment:

1. Cannot reproduce it with the comment#4 steps and versions. vioscsi loads normally with scsi-block.

2. Cannot reproduce it with the comment#4 steps and our latest versions. vioscsi loads normally with scsi-block.
Versions used:
kernel-3.10.0-691.el7.x86_64
qemu-kvm-rhev-2.9.0-14.el7.x86_64
virtio-win-1.9.3-1.el7.noarch
seabios-1.10.2-3.el7.x86_64

3. Tried to reproduce this bug with a USB 3.0 HD, because we don't have the "Intel X25-M SSD connected to USB 3 through a StarTech UAS adapter" environment; could not reproduce it. The vioscsi driver loads normally with scsi-block. The vioscsi driver version used is build 110.

Best Regards~
Peixiu Hou
Hi Vadim,

QE can't reproduce this issue as we don't have such a device.

Do you have any idea about this bug?
(In reply to lijin from comment #13)
> Hi Vadim,
>
> QE can't reproduce this issue as we don't have such device.
>
> Do you have any idea about this bug?

Hi Li Jin,

Honestly, I don't see any other way to troubleshoot this case than reproducing the problem on our setup with a slightly customized driver that prints out the SRB execution flow. Otherwise, if Tom can dump the SCSI-related traffic with the "trace-event scsi_* on" command, issued in the qemu monitor, for both successful and unsuccessful runs, we can try analysing the two trace logs and try to guess the problem.

Best regards,
Vadim.
Okay, I ended up getting access to a machine with XHCI SuperSpeed ports (and HiSpeed ports). I can reproduce the issue with the virtio-win-0.1.110-2 ISO, and can also confirm that the issue is apparently fixed in both the virtio-win-0.1.126-2 and virtio-win-0.1.141-1 ISOs (current stable and latest).

I tested three cases with the "trace-event scsi_* on" suggested by Vadim anyway and did find something.

110-2 on SuperSpeed port:
...
1955:scsi_req_parsed target 0 lun 0 tag 925820064 command 40 dir 1 length 512
1955:scsi_req_parsed_lba target 0 lun 0 tag 925820064 command 40 lba 0
1955:scsi_req_alloc target 0 lun 0 tag 925820064
1955:scsi_req_continue target 0 lun 0 tag 925820064
1955:scsi_req_dequeue target 0 lun 0 tag 925820064
1955:scsi_req_parsed target 0 lun 0 tag 1904113856 command 0 dir 1 length 556
1955:scsi_req_parsed_lba target 0 lun 0 tag 1904113856 command 0 lba 0
1955:scsi_req_alloc target 0 lun 0 tag 1904113856
1955:scsi_test_unit_ready target 0 lun 0 tag 1904113856
1955:scsi_req_continue target 0 lun 0 tag 1904113856
^Cqemu-system-x86_64: terminating on signal 2
(Had to unplug the device to get qemu killed)
1955:scsi_req_dequeue target 0 lun 0 tag 1904113856
...

110-2 on HiSpeed port:
...
2033:scsi_req_parsed target 0 lun 0 tag -829434720 command 40 dir 1 length 512
2033:scsi_req_parsed_lba target 0 lun 0 tag -829434720 command 40 lba 0
2033:scsi_req_alloc target 0 lun 0 tag -829434720
2033:scsi_req_continue target 0 lun 0 tag -829434720
2033:scsi_req_dequeue target 0 lun 0 tag -829434720
2033:scsi_req_parsed target 0 lun 0 tag 1599908032 command 0 dir 1 length 556
2033:scsi_req_parsed_lba target 0 lun 0 tag 1599908032 command 0 lba 0
2033:scsi_req_alloc target 0 lun 0 tag 1599908032
2033:scsi_test_unit_ready target 0 lun 0 tag 1599908032
2033:scsi_req_continue target 0 lun 0 tag 1599908032
2033:scsi_req_data target 0 lun 0 tag 1599908032 len 556
2033:scsi_req_continue target 0 lun 0 tag 1599908032
2033:scsi_req_dequeue target 0 lun 0 tag 1599908032
2033:scsi_req_parsed target 0 lun 0 tag -813637728 command 26 dir 1 length 192
2033:scsi_req_parsed_lba target 0 lun 0 tag -813637728 command 26 lba 7168
2033:scsi_req_alloc target 0 lun 0 tag -813637728
2033:scsi_req_continue target 0 lun 0 tag -813637728
...

141-1 on SuperSpeed port:
...
2425:scsi_req_parsed target 0 lun 0 tag -417548272 command 40 dir 1 length 512
2425:scsi_req_parsed_lba target 0 lun 0 tag -417548272 command 40 lba 0
2425:scsi_req_alloc target 0 lun 0 tag -417548272
2425:scsi_req_continue target 0 lun 0 tag -417548272
2425:scsi_req_dequeue target 0 lun 0 tag -417548272
2425:scsi_req_parsed target 0 lun 0 tag -417545360 command 26 dir 1 length 192
2425:scsi_req_parsed_lba target 0 lun 0 tag -417545360 command 26 lba 7168
2425:scsi_req_alloc target 0 lun 0 tag -417545360
2425:scsi_req_continue target 0 lun 0 tag -417545360
...

It seems that the issue was triggered by the peculiar "command 0 dir 1 length 556". On the SuperSpeed port that request never gets a scsi_req_data and is only dequeued after the device is unplugged. It gets through if the port is a HiSpeed one (because streams are not used?) and no longer shows up at all if the latest ISO is used.

It would be great if its origin (and the exact commit that got rid of it) could be identified; otherwise we cannot call it fixed, because we don't really know its nature and can't be certain that it won't be triggered on some other occasion.

I will attach the full trace-event logs of the three cases. There is a single blank line in each of them, which I inserted at the point just before I pressed "Next" after picking the driver to install.

Btw, in case you are interested in reproducing it yourself, maybe it can be done with nested kvm (nec-usb-xhci + usb-uas + scsi-hd/scsi-block on the first level and virtio-scsi-pci + scsi-block on the second level).
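For anyone going through the full trace-event logs, one way to spot a stuck request is to pair every scsi_req_alloc with its scsi_req_dequeue and list the tags that are left over. The following is only a rough Python sketch, not part of qemu. It assumes the log lines look like the excerpts in this comment (PID, event name, then target/lun/tag fields); the function name outstanding_requests is made up for illustration.

```python
import re

# Matches lines such as:
#   1955:scsi_req_alloc target 0 lun 0 tag 1904113856
TRACE_LINE = re.compile(r"^\d+:(scsi_\w+) target \d+ lun \d+ tag (-?\d+)")


def outstanding_requests(lines):
    """Return {tag: [events since allocation]} for requests never dequeued."""
    pending = {}
    for line in lines:
        match = TRACE_LINE.match(line.strip())
        if not match:
            continue  # skip blank lines, "..." and qemu's own messages
        event, tag = match.group(1), int(match.group(2))
        if event == "scsi_req_alloc":
            pending[tag] = [event]
        elif event == "scsi_req_dequeue":
            pending.pop(tag, None)
        elif tag in pending:
            pending[tag].append(event)
    return pending


# Excerpt of the "110-2 on SuperSpeed port" trace, up to the point of the stall.
sample = """\
1955:scsi_req_parsed target 0 lun 0 tag 925820064 command 40 dir 1 length 512
1955:scsi_req_alloc target 0 lun 0 tag 925820064
1955:scsi_req_continue target 0 lun 0 tag 925820064
1955:scsi_req_dequeue target 0 lun 0 tag 925820064
1955:scsi_req_parsed target 0 lun 0 tag 1904113856 command 0 dir 1 length 556
1955:scsi_req_alloc target 0 lun 0 tag 1904113856
1955:scsi_test_unit_ready target 0 lun 0 tag 1904113856
1955:scsi_req_continue target 0 lun 0 tag 1904113856
""".splitlines()

stalled = outstanding_requests(sample)
for tag, events in stalled.items():
    print(tag, "last event:", events[-1])
```

Note that a log cut off with "..." will always show its last request as outstanding, so this is only meaningful on a complete log or on one captured up to the moment of the hang.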
Created attachment 1318354
"trace-event scsi_* on" (110-2 on SuperSpeed port)
Created attachment 1318355
"trace-event scsi_* on" (110-2 on HiSpeed port)
Created attachment 1318356
"trace-event scsi_* on" (141-1 on SuperSpeed port)
Just tested with nested kvm. Apparently the emulated UAS does not trigger the same issue. I even tried to change this to 5:
https://git.qemu.org/gitweb.cgi?p=qemu.git;a=blob;f=hw/usb/dev-uas.c;h=fffc4243969611d263247889b5175e938cd6ce57;hb=359c41abe32638adad503e386969fa428cecff52#l104
so that its max streams would be 32, just like the adapter. Still, it does not stall.
(In reply to Tom Yan from comment #15)
> Okay I end up with getting access to a machine with XHCI SuperSpeed ports
> (and HiSpeed ports). I can reproduce the issue with virtio-win-0.1.110-2
> ISO, and can also confirm that the issue is apparently fixed in both
> virtio-win-0.1.126-2 and virtio-win-0.1.141-1 ISOs (current stable and
> latest).
> .........................................................
> I tested three cases with the "trace-event scsi_* on" suggested by Vadim
> anyway and did found something.
> It would be great if its origin (and the exact commit that got rid of it)
> can be identified, otherwise we cannot call it fixed, because we don't
> really know its nature and can't be certain that it wouldn't be triggered in
> some other occasions.
>

Hi Tom.

First of all, let me thank you for your invaluable help with this issue. You are really doing a great job helping us out.

I don't like to put more pressure on you, but if you have some time, could you do me a favor and try some more drivers released between builds 110 and 126? The difference between these two versions is so huge that I wouldn't even try guessing which particular patch fixed the problem without first narrowing down the options.

Thank you in advance,
Vadim.
Apparently 126 is the first version that works. I tried 118, 117 and 113; none of them work.

There seems to be quite a gap between 118 and 126, though. However, there isn't any ISO in between them to be found here:
https://fedorapeople.org/groups/virt/virtio-win/direct-downloads/archive-virtio/

So I am not sure if I can help further.
Builds 119 to 125 were created for internal testing purposes only, which is why they are not available for direct download. If you have some time to try the drivers from those builds, I can pack them and send them to you.

Thanks,
Vadim.
No problem. I can have them tested.
Created attachment 1320357
vioscsi drivers for testing from builds 119, 121, 123 and 124
(In reply to Tom Yan from comment #23)
> No problem. I can have them tested.

That will be great.
There is a collection of vioscsi drivers from four different builds at https://bugzilla.redhat.com/attachment.cgi?id=1320357

Thanks a lot.
Vadim.
I confirmed that all four builds work. In other words, it has been working since 119. I also ran "trace-event scsi_* on" to confirm that the apparently relevant "command 0 dir 1 length 556" is gone as of 119.
Can you give 119 another try with virtio 1.0 disabled? You can turn it off by adding the parameters ",disable-legacy=off,disable-modern=on" to the "-device virtio-scsi-pci," string.

Thanks,
Vadim.
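To make the resulting option string explicit, here is a small Python sketch that assembles the "-device" argument both ways. The helper name virtio_scsi_device and its legacy_only flag are purely illustrative; only the device name and the two properties come from the suggestion above.

```python
def virtio_scsi_device(legacy_only=False):
    """Build the '-device' arguments for virtio-scsi-pci.

    With legacy_only=True, the properties suggested above are appended so
    that the device is offered in legacy (pre-1.0) mode only.
    """
    props = ["virtio-scsi-pci"]
    if legacy_only:
        props += ["disable-legacy=off", "disable-modern=on"]
    return ["-device", ",".join(props)]


default_args = virtio_scsi_device()
legacy_args = virtio_scsi_device(legacy_only=True)
print(" ".join(default_args))
print(" ".join(legacy_args))
```

The returned list can be spliced into the rest of the qemu command line in place of the plain "-device virtio-scsi-pci" pair.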
The options do not stop 119 from working.
Can anyone still build commits between 118 and 119? I would like to do or help with a bisect. Can't even open the project after installing three versions of Visual Studio and WDK.
Hi Tom,

There are around 240 commits between builds 118 and 119. Almost 20 commits address vioscsi issues, and there are a lot of virtio lib related changes. We can definitely try bisecting the problem by making custom builds, but if there is any other way to trace the problem, I would probably go with that option.

IIUC, you are still hitting the same problem ("vioscsi loading stuck with uas device on scsi-block") with the vioscsi driver from build 163, and the setup is similar to the one described in https://bugzilla.redhat.com/show_bug.cgi?id=1277060#c4, right?

If so, could you please create a VM memory dump with the dump-guest-memory command from the qemu monitor?

Thanks,
Vadim.
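For a sense of how many custom builds a bisect over roughly 240 commits would take, here is a minimal Python sketch of the search. It assumes the history is "broken up to some commit, fixed from then on"; the callback is_fixed is hypothetical and stands for "build the driver at that commit and run the install test", and the position of the fix (137) is an arbitrary number chosen for the demo.

```python
def first_fixed(commits, is_fixed):
    """Binary search for the first commit at which the test passes.

    Assumes the last commit in the list is known to pass and that, once
    fixed, the problem stays fixed in all later commits.
    """
    lo, hi = 0, len(commits) - 1
    builds = 0
    while lo < hi:
        mid = (lo + hi) // 2
        builds += 1  # one custom driver build + one test run per step
        if is_fixed(commits[mid]):
            hi = mid  # the fix is at mid or earlier
        else:
            lo = mid + 1  # the fix is after mid
    return commits[lo], builds


# Stand-in history: 240 commits numbered 0..239, fix lands at number 137.
commits = list(range(240))
FIX_AT = 137
found, builds = first_fixed(commits, lambda c: c >= FIX_AT)
print("first fixed commit:", found, "after", builds, "builds")
```

So even with about 240 commits, no more than eight test builds should be needed, provided the behaviour really does flip only once in that range.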
What do you mean by build 163? Isn't the latest build 0.1.160 (or 0.1.161 for pre-whql)?

I am not hitting exactly the same problem anymore (it has been fixed since 119, which is why I want to see what fixed it with a bisect), but a very similar one: https://bugzilla.redhat.com/show_bug.cgi?id=1662418, for which I might be able to do a bisect myself, even though I just failed to reproduce it (I wonder if it's because the drive I am testing on now is slightly smaller).

I am also hitting a similar problem with apparently all versions of the vioscsi driver and both drives when I use scsi-generic instead of scsi-block. For that I might be able to run the dump-guest-memory command.

Can you tell me the commit ids of 0.1.140 and 0.1.141 respectively, so that maybe I can do a bisect myself for them later?
(In reply to Tom Yan from comment #31)
> What do you mean by build 163? Isn't the latest build 0.1.160 (or 0.1.161
> for pre-whql)?
>
> I am not hitting exactly the same problem anymore (as it was fixed since
> 119, that's why I want to see what fixed it with a bisect), but a very
> similar one: https://bugzilla.redhat.com/show_bug.cgi?id=1662418, for which
> I might be able to do a bisect myself, even though I just failed to
> reproduce it (I wonder if it's because the drive I am testing on now is
> slightly smaller).
>
> I am also hitting a similar problem with apparently all versions of vioscsi
> driver and both drives, when I use scsi-generic instead of scsi-block. For
> that I might be able to do the dump-guest-memory command.
>
> Can you tell me the commit id of 0.1.140 and 0.1.141 respectively so that
> maybe I can do a bisect myself for them later?

You are right, 163 is a kind of internal build for now.

There were no changes related to vioscsi between 140 and 141:

[vrozenfe@panda vioscsi]$ git log --pretty=oneline mm160..mm161
3326a11cd0b3cd12c024093c2a5955cb3ebd4f80 (tag: mm161) [virtio-win] update status file mm161 <--> b141
4104e2903eacdc21f4ca8d646b85c092a27b6b13 netkvm: Send clone of original announcement NBL
193e675af1371f7138598cd95fd3bad0d89e8d44 Fix incorrect condition for send led status
0d49d1154aabaa381076a96d0fe63605839c7e31 viorng: Publish viorngci.pdb and viorngum.pdb to the Install directory
0a5538b8c63d51d56d4290d0a9792403504bdfc8 netkvm: Publish netkvmco.pdb to the Install directory
Yeah, never mind, I can't reproduce it anymore.

But as I said, I can still reproduce a similar problem with scsi-generic. I can create a memory dump as small as ~768M, which is still really big, so how can I pass it on to you?

Btw, when I try to create the memory dump after the guest gets stuck, I have to unplug the drive for `dump-guest-memory` to complete; otherwise it gets stuck too. (So should I just create the dump after I unplug the drive? Or should I run `dump-guest-memory` *and then* unplug the drive?)
(In reply to Tom Yan from comment #33)
> Yeah never mind, I can't reproduce it anymore.
>
> But as I said, I can still reproduce similar problem with scsi-generic. I
> can create a memory dump as small as ~768M, which is still really big, so
> how can I pass that on to you?
>
> Btw, when I try to create the memory dump after it got stuck, I have to
> unplug the drive for `dump-guest-memory` to complete, otherwise it gets
> stuck too. (So should I just create the dump after I unplug the drive? Or
> should I run `dump-guest-memory` *and then* unplug the drive?)

Unfortunately, I don't have any official place to receive big files from outside of RH. If you can use your own Dropbox or Google Drive or anything else, that will be just great. Btw, you can compress the dump file before uploading. That should reduce the file size dramatically, because most of the pages in the dump file are zeroed.

You should dump first and then unplug the drive, because I am interested in checking some of the internal variables stored in the device driver extension.

Best,
Vadim.
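As a quick illustration of why compressing helps so much here, the Python sketch below gzips a stand-in file that is mostly zero pages with a little random data. The file names and sizes are made up for the demo; a real dump would simply be passed to compress_dump() (or to the gzip/xz command-line tools) instead.

```python
import gzip
import os
import shutil
import tempfile


def compress_dump(src_path, dst_path):
    """Stream src_path into a gzip file at dst_path without loading it all in RAM."""
    with open(src_path, "rb") as fin, gzip.open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)


with tempfile.TemporaryDirectory() as tmp:
    raw = os.path.join(tmp, "guest.dump")  # hypothetical dump file name
    with open(raw, "wb") as f:
        f.write(b"\0" * (8 * 1024 * 1024))  # 8 MiB of zeroed "pages"
        f.write(os.urandom(64 * 1024))      # a little incompressible data
    packed = raw + ".gz"
    compress_dump(raw, packed)
    raw_size = os.path.getsize(raw)
    packed_size = os.path.getsize(packed)

print("raw bytes:", raw_size, "compressed bytes:", packed_size)
```

How much a real guest dump shrinks depends on how much of the guest's memory was actually in use at the time.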
Oops, sorry, I should have thought of those myself.

I got you two dumps. The first one was created when the installation got stuck/frozen with scsi-generic and the current latest driver:
https://drive.google.com/open?id=1PuDS5UPShwj3CIvQehopsg_r871BSz9t
(qemu-system-x86_64 -enable-kvm -M q35 -cpu host,kvm=off,hypervisor=off -smp cores=4 -m 768M -drive file=/dev/sg2,if=none,format=raw,id=sys -device virtio-scsi-pci -device scsi-generic,drive=sys --bios /usr/share/ovmf/x64/OVMF_CODE.fd -rtc base=localtime -nic none -drive file=/storage/17763.107.101029-1455.rs5_release_svc_refresh_CLIENTENTERPRISEEVAL_OEMRET_x64FRE_en-us.iso,media=cdrom -drive file=Downloads/virtio-win-0.1.160.iso,media=cdrom)

The second one was created when loading of the driver got stuck with scsi-block and 0.1.118-2:
https://drive.google.com/open?id=10lycIAVA5eMkVxeCWhNouv3o_XtGfIvL
(qemu-system-x86_64 -enable-kvm -M q35 -cpu host,kvm=off,hypervisor=off -smp cores=4 -m 768M -drive file=/dev/sdb,if=none,format=raw,id=sys -device virtio-scsi-pci -device scsi-block,drive=sys --bios /usr/share/ovmf/x64/OVMF_CODE.fd -rtc base=localtime -nic none -drive file=/storage/17763.107.101029-1455.rs5_release_svc_refresh_CLIENTENTERPRISEEVAL_OEMRET_x64FRE_en-us.iso,media=cdrom -drive file=Downloads/virtio-win-0.1.118.iso,media=cdrom)

Hope they help!