Bug 950611
Summary: | [NetKVM] Pass WHQL tests with RSC feature enabled | |
---|---|---|---
Product: | Red Hat Enterprise Linux 7 | Reporter: | Yvugenfi <yvugenfi>
Component: | virtio-win | Assignee: | Wei <wexu>
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | 7.0 | CC: | ailan, audgiri, cww, inetkach, jherrman, juzhang, knoel, lijin, michen, mkolaja, mohammed.gamal, qiangmin2016, rbalakri, rpacheco, sapandit, sherold, syang, usurse, virt-maint, vrozenfe, wexu, wyu, ybendito, yvugenfi
Target Milestone: | rc | |
Target Release: | 7.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Enhancement
Doc Text: | This update introduces the receive segment coalescing (RSC) feature for the virtio-net driver. This allows the driver to report coalesced transmission control protocol (TCP) segments to the OS. Note that the support for RSC is included in virtio-net for Windows Server 2012 and later versions. | |
Story Points: | --- | |
Clone Of: | | |
: | 990225 | Environment: |
Last Closed: | 2016-11-04 08:41:48 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 958737, 990225, 1002816, 1203710, 1304818, 1305606, 1313485 | |
Attachments: | | |
Description
Yvugenfi@redhat.com
2013-04-10 13:45:20 UTC
*** Bug 958743 has been marked as a duplicate of this bug. ***

Dynamic guest offloads configuration commits:
http://git.engineering.redhat.com/?p=users/yvugenfi/internal-kvm-guest-drivers-windows/.git;a=commit;h=03dbd5515cc0585c1490d36277ef658cefdc3e54
http://git.engineering.redhat.com/?p=users/yvugenfi/internal-kvm-guest-drivers-windows/.git;a=commit;h=17c03988a3f994bd0bf888468bf0313ab393db42

The following two jobs failed when running the netkvm WHQL job on win2012 and win2012R2:
1. Run RSC Tests
2. NDISTest 6.5 - [1 Machine] - StandardizedKeywords

package info:
kernel-3.10.0-126.el7.x86_64
qemu-kvm-rhev-1.5.3-60.el7ev_0.2.x86_64
seabios-1.7.2.2-12.el7.x86_64
virtio-win-prewhql-86

qemu-kvm command:
nic1:
/usr/libexec/qemu-kvm -name 086NIC201264CF2 -enable-kvm -m 6G -smp 8 -uuid 26a92f2f-6175-4618-9188-4956df472ba5 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/tmp/086NIC201264CF2,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime,driftfix=slew -boot order=cd,menu=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=086NIC201264CF2,if=none,id=drive-ide0-0-0,format=raw,serial=mike_cao,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive file=en_windows_server_2012_x64_dvd_915478.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=086NIC201264CF2.vfd,if=none,id=drive-fdc0-0-0,format=raw,cache=none -global isa-fdc.driveA=drive-fdc0-0-0 -netdev tap,script=/etc/qemu-ifup,downscript=no,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=00:52:4b:2b:95:87,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=isa_serial0 -device usb-tablet,id=input0 -vnc 0.0.0.0:0 -vga cirrus -netdev tap,script=/etc/qemu-ifup-private,downscript=no,id=hostnet1,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:52:42:0e:a5:7b,bus=pci.0,mq=on -monitor stdi

nic2:
/usr/libexec/qemu-kvm -name 086NIC201264SVM -enable-kvm -m 6G -smp 8 -uuid d54df0cf-de94-4016-a08b-19770d34b75c -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/tmp/086NIC201264SVM,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime,driftfix=slew -boot order=cd,menu=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=086NIC201264SVM,if=none,id=drive-ide0-0-0,format=raw,serial=mike_cao,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive file=en_windows_server_2012_x64_dvd_915478.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=086NIC201264SVM.vfd,if=none,id=drive-fdc0-0-0,format=raw,cache=none -global isa-fdc.driveA=drive-fdc0-0-0 -netdev tap,script=/etc/qemu-ifup,downscript=no,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=00:52:32:66:b0:24,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=isa_serial0 -device usb-tablet,id=input0 -vnc 0.0.0.0:1 -vga cirrus -netdev tap,script=/etc/qemu-ifup-private,downscript=no,id=hostnet1,vhost=on -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:52:24:1c:c9:2b,bus=pci.0,mq=on -monitor stdio

The job "Run RSC Tests" still failed with virtio-win-prewhql-87; the guest hit a BSOD.

1: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 0000000000000029, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
Arg4: fffff80000ca5ce6, address which referenced memory

Debugging Details:
------------------

READ_ADDRESS: fffff801c32b8340: Unable to get special pool info
fffff801c32b8340: Unable to get special pool info
 0000000000000029

CURRENT_IRQL: 2

FAULTING_IP:
ndis!ndisXlateReturnNetBufferListToPacket+36
fffff800`00ca5ce6 0fb67729 movzx esi,byte ptr [rdi+29h]

DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT

BUGCHECK_STR: AV

PROCESS_NAME: System

TAG_NOT_DEFINED_c000000f: FFFFD000201A2FB0

TRAP_FRAME: ffffd0002019b750 -- (.trap 0xffffd0002019b750)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=ffffcf80010d2f50 rbx=0000000000000000 rcx=ffffcf80010d2df0
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
rip=fffff80000ca5ce6 rsp=ffffd0002019b8e0 rbp=ffffd0002019b990
r8=ffffd0002019b920 r9=ffffd0002019b940 r10=ffffcf80010d2df0
r11=ffffd0002019b918 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl zr na po nc
ndis!ndisXlateReturnNetBufferListToPacket+0x36:
fffff800`00ca5ce6 0fb67729 movzx esi,byte ptr [rdi+29h] ds:00000000`00000029=??
Resetting default scope

LAST_CONTROL_TRANSFER: from fffff801c315f7e9 to fffff801c3153ca0

STACK_TEXT:
ffffd000`2019b608 fffff801`c315f7e9 : 00000000`0000000a 00000000`00000029 00000000`00000002 00000000`00000000 : nt!KeBugCheckEx
ffffd000`2019b610 fffff801`c315e03a : 00000000`00000000 ffffcf80`010d2df0 00000000`00000000 ffffd000`2019b750 : nt!KiBugCheckDispatch+0x69
ffffd000`2019b750 fffff800`00ca5ce6 : 00000000`00000001 00000000`00000001 ffffe000`021f3958 00000000`00000000 : nt!KiPageFault+0x23a
ffffd000`2019b8e0 fffff800`00cbf1b8 : 00000000`00000b01 ffffcf80`010d2df0 00000000`00000001 00000000`00000000 : ndis!ndisXlateReturnNetBufferListToPacket+0x36
ffffd000`2019b920 fffff800`01aa4a4a : 00000000`00000001 ffffd000`2019ba00 ffffe000`0219f800 00000000`0000041e : ndis!ndisMIndicatePacketsToNetBufferLists+0x1a0
ffffd000`2019b9d0 fffff800`01aa452e : ffffcf80`00b36ef0 ffffe000`021f31a0 00000000`00000001 00000000`00000001 : Rtnic64!RTFast_IndicatePacket+0x1ba
ffffd000`2019ba00 fffff800`01aa292e : ffffe000`021f31a0 fffff800`01aa2920 ffffd000`2019bc00 00000000`00369e99 : Rtnic64!RTFast_HandleInterrupt+0xce
ffffd000`2019ba50 fffff800`00cb8da0 : 00000000`00000000 00000000`00000000 ffffb158`00000000 ffffb158`7d6d6b29 : Rtnic64!MPHandleInterrupt+0xe
ffffd000`2019ba80 fffff800`00cb892c : ffffe000`021f31a0 ffffd000`2019bbe0 ffffd000`2019bc60 ffffe000`0226d7e0 : ndis!ndisMDpcX+0xa8
ffffd000`2019bab0 fffff801`c3085840 : ffffd000`20be9f00 ffffe000`0219fb20 ffffd000`20be7180 ffffe000`0219f8e0 : ndis!ndis5InterruptDpc+0x94
ffffd000`2019bae0 fffff801`c3085520 : ffffe000`0219f8e0 00000000`00000001 ffffd000`2019bd50 fffff801`c3080aec : nt!KiExecuteAllDpcs+0x1b0
ffffd000`2019bc30 fffff801`c31577ea : ffffd000`20be7180 ffffd000`20be7180 00000000`00000000 ffffd000`20bf31c0 : nt!KiRetireDpcList+0xd0
ffffd000`2019bda0 00000000`00000000 : ffffd000`2019c000 ffffd000`20196000 00000000`00000000 00000000`00000000 : nt!KiIdleLoop+0x5a

STACK_COMMAND: kb

FOLLOWUP_IP:
Rtnic64!RTFast_IndicatePacket+1ba
fffff800`01aa4a4a 488d4f78 lea rcx,[rdi+78h]

SYMBOL_STACK_INDEX: 5
SYMBOL_NAME: Rtnic64!RTFast_IndicatePacket+1ba
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: Rtnic64
IMAGE_NAME: Rtnic64.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 48401957
BUCKET_ID_FUNC_OFFSET: 1ba
FAILURE_BUCKET_ID: AV_VRF_Rtnic64!RTFast_IndicatePacket
BUCKET_ID: AV_VRF_Rtnic64!RTFast_IndicatePacket
Followup: MachineOwner
---------

Please also have a look at the win2012 guest because it has the same issue.
Run RSC Test

I completely disabled RSC in the driver, waiting for a build.

Patch posted upstream to qemu-devel:
[Qemu-devel] [ RFC Patch v4 0/3] Support Receive-Segment-Offload(RSC) for WHQL
http://comments.gmane.org/gmane.comp.emulators.qemu/404788

Changing component to qemu-kvm-rhev.

IPv4 works now, while IPv6 still fails the test case at present: the virtio NIC can't receive any test packets.

The brew build always failed signature checking after new fields were added to 'virtio_net_hdr' in the kernel header files. I haven't figured it out after trying for several days, so I disabled signature checking temporarily so that QE can test first.

Brew link:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11126172

To enable the RSC feature for a VM, put 'guest_rsc=on' in the virtio device option; this feature is turned off by default.

Before running the RSC test case, please make sure you have turned 'tso,gso' of the tap devices off on the host (the commands below also turn off 'gro'):
ethtool -K tap$n tso off
ethtool -K tap$n gso off
ethtool -K tap$n gro off

Created attachment 1173363 [details]
updated virtio spec
This is the updated virtio spec about RSC.
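The host-side preparation described in the comment before this attachment (turning tso, gso and gro off on every tap device) can be sketched as a small helper that only builds the command lines. This is a minimal sketch, not part of the bug's tooling: the function name and the device names `tap0`/`tap1` are hypothetical, and only the `ethtool -K <dev> <feature> off` form quoted in that comment is used.

```python
def tap_offload_commands(tap_devices, features=("tso", "gso", "gro")):
    """Build one `ethtool -K <dev> <feature> off` command per device and feature.

    The feature names come from the comment in this bug; the helper itself
    is illustrative and does not execute anything.
    """
    commands = []
    for dev in tap_devices:
        for feature in features:
            commands.append(["ethtool", "-K", dev, feature, "off"])
    return commands


if __name__ == "__main__":
    # "tap0" and "tap1" are placeholder device names (assumption).
    for cmd in tap_offload_commands(["tap0", "tap1"]):
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run` on the host (as root) if the commands should actually be executed.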
Passed the test with Yan's latest driver, please verify it.

brew build:
https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11456267

This feature only supports modern mode (virtio 1.0) for the virtio-net device; add 'disable-modern=off' to enable it.

-netdev tap,script=/etc/qemu-ifup-private,downscript=no,id=hostnet1 \
-device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:52:39:69:12:21,disable-modern=off

Created attachment 1185245 [details]
Latest windows guest virtio driver received from Yan
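For reference, the two device options mentioned in the comments above ('guest_rsc=on' and 'disable-modern=off') can be combined into a single `-device` value. The following is only a sketch under assumptions: the helper name is hypothetical, and the option names are taken verbatim from this bug's comments, so they apply to the test builds discussed here and may differ in other QEMU versions.

```python
def virtio_net_device_arg(netdev_id, dev_id, mac, guest_rsc=True, modern=True):
    """Compose the value for `-device` of a virtio-net-pci NIC.

    'guest_rsc=on' and 'disable-modern=off' are the options quoted in this
    bug; they are not verified here against any particular QEMU release.
    """
    opts = ["virtio-net-pci", f"netdev={netdev_id}", f"id={dev_id}", f"mac={mac}"]
    if guest_rsc:
        opts.append("guest_rsc=on")
    if modern:
        opts.append("disable-modern=off")
    return ",".join(opts)


if __name__ == "__main__":
    # IDs and MAC address reuse the example from the comment above.
    print("-device", virtio_net_device_arg("hostnet1", "net1", "00:52:39:69:12:21"))
```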
Hi Yan and Wei,
I have tried your qemu and driver on win8-64 and win8.1-64; they can pass the RSC WHQL job with filter (1406).
Thanks for your great support.
BR
Yu Wang

Hi Yan and Wei,
win8 and win8.1 32-bit still have the RSC WHQL job, and it cannot pass with the configuration above. It failed with "Cannot Find Pattern "\\HCK21\tests\x86\nttest\nettest\Transports\Sparta\autosrv.exe Error Code 2 (The system cannot find the file specified)" when copying the binary files to the client.
Could you help to check whether we will support RSC on 32-bit systems and pass the job on WHQL?
Thanks
Yu Wang

Hi Yu,
I have put my img files on the file server; they pass the test regularly without any filter. Please verify whether they work on your test bed. I'm using 64-bit 2012 server.
http://file.nay.redhat.com/wexu/rsc/

Official build with RSC support:
Task Info: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11462816
Build Info: https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=506954

Hi Yu,
Please try Yan's build. What I tested for this case was launching 2 VM images from 2 separate SSD disks on my development machine, and you are using a SATA disk AFAIK. If you still encounter the filter pass issue, maybe you could try running the VMs from a memory file system: just copy the qcow2 images to a memory partition and launch them. I have verified this and put a script there for your reference.
http://file.nay.redhat.com/wexu/rsc/start-vm.sh

*** Bug 1304818 has been marked as a duplicate of this bug. ***

a. Please check that the firewall is disabled on the clients.
b. Please check that the "Check connectivity" test passes, to see that basic connectivity between the test clients is working.

Hi,
New update: after changing to a new network bridge, it can get a 169.x.x.x IP automatically and pass "Run RSC Logo Test(ipv4)". "Run RSC Logo Test(ipv6)" still failed with "[IGN-]2016-08-02 15:20:14[-IGN]StartScript: AutoSrv service is not responding. Make sure service is running and allowed by firewall"
Thanks
Yu Wang

Hi Yan and Wei,
I tried on my site: it could pass this job w/o filter on ws2012, but failed on win8-64 (it can pass with filter).
BTW, as I said in comment#38, win8-32 bit cannot start to run the test; it failed at the beginning.
Thanks
Yu Wang

All RSC jobs passed with filter/errata. Changing status to verified.

Is the fix now included in any upstream version of virtio-win?

Not yet, it's still an RFC for upstream.

Let me clarify: it is included in the latest virtio-win NetKVM upstream. The part that is RFC upstream is the QEMU part, but this part is only required in order to pass WHQL tests with the driver.

Hi,
I tried testing network performance with virtio-win 0.1.126-1 and I am still getting the same poor performance figures as in #1304818. Is there anything I am missing here?

Hi Mohammed,
This feature doesn't help much for performance according to my test; maybe you should check your host configuration.

Hi Mohammed,
This feature only works with the userspace virtio-net backend, so it hardly helps in performance tests according to my test; maybe you should check other configuration.

The driver should also work with aggregated packets from other backends, but it is not certifiable by WHQL.

The question is: what are the settings of your back end? Can you post the ethtool -k output?

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHBA-2016-2609.html

(In reply to Yan Vugenfirer from comment #64)
> The driver should also work with aggregated packets from other backends, but
> it is not certifiable by WHQL.
>
> The question is: what are the settings of your back end? Can you post the
> ethtool -k output?
You mean ethtool -k on the host or in the guest? In the latter case, the problem was only occurring on Windows guests, which naturally don't have ethtool. If you're curious what ethtool -k returns on Linux guests (which don't have the problem), here it is:

Features for eth3:
rx-checksumming: off [fixed]
tx-checksumming: off
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: off
tx-scatter-gather: off [fixed]
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
tx-tcp-segmentation: off [fixed]
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp6-segmentation: off [fixed]
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: off [requested on]
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

By the way, using parallel connections with iperf (i.e. iperf3 -c [SERVER] -P 32 -i 6 -p [PORT]) on Windows guests seems to improve performance greatly. I am getting an improvement from 1 Gbps to 3 Gbps. But this is still less than what Linux guests achieve without parallel connections.
I don't know whether the original bug in #1304818 is in this case the result of some Windows networking quirk or a problem with the virtio-win drivers.

Hi,
* I wanted to see the results of ethtool -k on the host for the tap device that is the backend of the NIC on the Windows VM.
* What are the TX results for the Windows VM in your setting (when you run iperf -c on the host)?
* Also, we suggest using iperf 2. In our experience iperf3 was very unreliable and buggy in a Windows environment, including not really supporting parallel processing of networking traffic.
Best regards,
Yan.

Hi Yan,
Sorry for the late reply:

* Here are the results of ethtool -k:
Features for n020173d9ea21:
rx-checksumming: off [fixed]
tx-checksumming: off
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: off [requested on]
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: on
tcp-segmentation-offload: off
tx-tcp-segmentation: off [requested on]
tx-tcp-ecn-segmentation: off [requested on]
tx-tcp6-segmentation: off [requested on]
udp-fragmentation-offload: off [requested on]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: on
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]

* I used iperf2 instead of iperf3 and indeed the throughput is much better. TX speed is 2.0Gbps without parallel connections.
I could get up to 5 Gbps with parallel connections. So far this performance seems satisfactory for me.

When the _TAP_ device is set to "tx-checksumming: off, tcp-segmentation-offload: off", this actually disables RSC (coalescing) on the guest receive path, so the throughput is expected to be lower than with "tx-checksumming: on, tcp-segmentation-offload: on".

(In reply to Mohammed Gamal from comment #70)
> Hi Yan,
> Sorry for the late reply:
>
> * Here are the results of ethtool -k:
> Features for n020173d9ea21:
> rx-checksumming: off [fixed]
> tx-checksumming: off
> tx-checksum-ipv4: off [fixed]
> tx-checksum-ip-generic: off [requested on]
> tx-checksum-ipv6: off [fixed]
> tx-checksum-fcoe-crc: off [fixed]
> tx-checksum-sctp: off [fixed]
> scatter-gather: on
> tx-scatter-gather: on
> tx-scatter-gather-fraglist: on
> tcp-segmentation-offload: off
> tx-tcp-segmentation: off [requested on]
> tx-tcp-ecn-segmentation: off [requested on]
> tx-tcp6-segmentation: off [requested on]
> udp-fragmentation-offload: off [requested on]
> generic-segmentation-offload: on
> generic-receive-offload: on
> large-receive-offload: off [fixed]
> rx-vlan-offload: off [fixed]
> tx-vlan-offload: on
> ntuple-filters: off [fixed]
> receive-hashing: off [fixed]
> highdma: off [fixed]
> rx-vlan-filter: off [fixed]
> vlan-challenged: off [fixed]
> tx-lockless: off [fixed]
> netns-local: off [fixed]
> tx-gso-robust: off [fixed]
> tx-fcoe-segmentation: off [fixed]
> tx-gre-segmentation: off [fixed]
> tx-udp_tnl-segmentation: off [fixed]
> tx-mpls-segmentation: off [fixed]
> fcoe-mtu: off [fixed]
> tx-nocache-copy: off
> loopback: off [fixed]
> rx-fcs: off [fixed]
> rx-all: off [fixed]
> tx-vlan-stag-hw-insert: on
> rx-vlan-stag-hw-parse: off [fixed]
> rx-vlan-stag-filter: off [fixed]
>
>
> * I used iperf2 instead of iperf3 and indeed the throughput is much better.
> TX speed is 2.0Gbps without parallel connections. I could get up to 5 Gbps
> with parallel connections.
> So far this performance seems satisfactory for me.

When the _TAP_ device is set to "tx-checksumming: off, tcp-segmentation-offload: off", this actually disables RSC (coalescing) on the guest receive path, so the throughput is expected to be lower than with "tx-checksumming: on, tcp-segmentation-offload: on".
Please note that the optimal settings for performance are different from the optimal settings for certification. All the discussion in the current bug is related to certification and not to performance.

Any progress with this? I tried testing network performance with virtio-win 0.1.164-1 and I am still getting the same poor performance figures as in #1304818. How should I improve the network performance of my Windows 2k12R2 guest?
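As an illustration of the tap-device check discussed in the comments above, the sketch below parses `ethtool -k` output and reports whether both "tx-checksumming" and "tcp-segmentation-offload" are on, which is the condition named there for coalescing on the guest receive path. It is a minimal sketch, not an official tool: the function names are hypothetical, and it assumes the usual one-feature-per-line output that ethtool prints.

```python
def parse_ethtool_features(text):
    """Map each feature name in `ethtool -k` output to True (on) or False (off).

    Annotations such as "[fixed]" or "[requested on]" are ignored; only the
    first word after the colon decides the state.
    """
    features = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("Features for"):
            continue
        name, sep, value = line.partition(":")
        tokens = value.split()
        if sep and tokens:
            features[name.strip()] = tokens[0] == "on"
    return features


def rsc_offloads_enabled(features):
    """True only if both offloads named in the comment above are on."""
    return (features.get("tx-checksumming", False)
            and features.get("tcp-segmentation-offload", False))


if __name__ == "__main__":
    # Shortened sample in the same format as the listings in this bug.
    sample = """Features for n020173d9ea21:
tx-checksumming: off
tcp-segmentation-offload: off
generic-receive-offload: on
"""
    print(rsc_offloads_enabled(parse_ethtool_features(sample)))
```

On a real host the input would come from running `ethtool -k` against the tap device that backs the Windows guest's NIC.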