Description of problem:
-----------------
IxChariot sends and receives TCP packets to measure network throughput between endpoints. In our test, one endpoint was Windows 2008 R2 running in a virtual machine and the other endpoint was a Linux machine. The traffic is TCP.

It would be best if Red Hat could install IxChariot in your labs; the issue is very easy to reproduce. The endpoint libraries can be downloaded from here:
http://www.ixiacom.com/support/endpoint_library/
and Windows 2008 R2 can be downloaded here:
http://www.microsoft.com/windowsserver2008/en/us/default.aspx
You will need an IxChariot license for the GUI, which starts and stops tests and collects the results:
http://www.ixchariot.com/products/datasheets/ixchariot.html

Be sure to run the virtual machine in SMP mode (two cores). The crash doesn't happen with just one core, or if the emulated network device is e1000.

Version-Release number of selected component (if applicable):
pre-5.4 snapshot 105
virtio network driver from virtio-win-1.0.0-3.31351.el5.noarch.rpm

Actual results:
No error codes, no core file, nothing on stderr/stdout. The crash is almost instantaneous once we start the test.

Expected results:

Additional info:
Created attachment 362444 [details]
Adding IxChariot win2k8 R2 workaround
From the customer: "We use the high-performance script included as part of the ixChariot software #1: launch ixChariot #2: define endpoints (IP addresses). #3: click on select script and choose high-performance-throughput.scr that comes bundled with ixChariot. Also when running traffic tests with w2k8-r2 as an endpoint, you need to disable the windows firewall. One way is to open the cmd window and type 'netsh firewall set opmode disable' "
From the customer: "I may been able to reproduce this with iperf v 2.0.4 server is a virtual machine running windows server 2008 R2 with 2 CPU in SMP mode iperf -s -w 256k client a real machine running windows server 2003 iperf -c <ip> -w 256k -n 100000M -M 65535 "
(In reply to comment #6)
> From the customer: "I may have been able to reproduce this with iperf v2.0.4.
>
> Server is a virtual machine running Windows Server 2008 R2 with 2 CPUs in SMP mode:
> iperf -s -w 256k
>
> Client is a real machine running Windows Server 2003:
> iperf -c <ip> -w 256k -n 100000M -M 65535"

Hi Dor,

I tested with iperf 1.7.0 several times but could not reproduce the issue. My questions are:
1. Is iperf v2.0.4 a must?
2. Is there any configuration of the vNIC or bridge that I've missed in order to reproduce the issue? (I just used the default settings.)

Thanks,
Lijun Huang
Moving under virtio-win.

Adding info received by email through James:

"I was able to get the driver to run. The driver details in adapter properties show:
Driver Date: 10/25/2009
Driver Version: 6.0.209.427

But it does not solve the issue. I made other changes to the virtio.c file so it retries getting virtqueue_num_heads, and it is still showing an index difference < 256 on the second call to vring_avail_idx(vq) after an initial failure.

Guest moved used index from 62670 to 62736, max 256
Guest moved used index from 6107 to 6168, max 256
Guest moved used index from 35495 to 35593, max 256
Guest moved used index from 57570 to 57615, max 256
Guest moved used index from 56259 to 56325, max 256
Guest moved used index from 51156 to 51220, max 256
Guest moved used index from 15062 to 15126, max 256
Guest moved used index from 54207 to 54272, max 256

Looks like a race condition in updating these values from different threads.

James

FYI... my patch is as follows, and this workaround has successfully prevented the unwanted exit so far.

diff -urp a/qemu/hw/virtio.c b/qemu/hw/virtio.c
--- a/qemu/hw/virtio.c	2009-10-23 15:42:18.000000000 -0700
+++ b/qemu/hw/virtio.c	2009-10-23 14:08:01.000000000 -0700
@@ -328,12 +328,15 @@ void virtqueue_push(VirtQueue *vq, const
 static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
 {
     uint16_t num_heads = vring_avail_idx(vq) - idx;
+    int retry = 3;
 
     /* Check it isn't doing very strange things with descriptor numbers. */
-    if (num_heads > vq->vring.num) {
-        fprintf(stderr, "Guest moved used index from %u to %u",
-                idx, vring_avail_idx(vq));
-        exit(1);
+    while (num_heads > vq->vring.num) {
+        fprintf(stderr, "Guest moved used index from %u to %u, max %u\n",
+                idx, vring_avail_idx(vq), vq->vring.num);
+        if (retry-- == 0)
+            exit(1);
+        num_heads = vring_avail_idx(vq) - idx;
     }
 
     return num_heads;
"

Michael/Yan, any ideas?
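To make the suspected race concrete: vring_avail_idx() ultimately reads a 16-bit counter that the guest increments from its own vCPU threads. A minimal sketch of how a non-atomic, byte-wise read of that counter can produce the bogus values above (names and layout here are illustrative, not qemu's actual code):

#include <stdint.h>
#include <string.h>

/* The guest publishes buffers by incrementing a 16-bit index in shared
 * memory; qemu reads it from another thread. If that read is performed
 * byte by byte (as a generic memcpy implementation may do), it can be
 * interleaved with the guest's write across a carry boundary:
 *
 *   guest: idx = 0x00ff  ->  idx = 0x0100
 *   qemu:  reads low byte 0xff (old), then high byte 0x01 (new)
 *          => observes 0x01ff, a value the guest never stored
 *
 * virtqueue_num_heads() then computes 0x01ff - last_avail_idx, which
 * can exceed vring.num (256), tripping the "Guest moved used index"
 * check. It also explains why the retry workaround recovers: on the
 * next read the stored value is consistent again. */
static uint16_t read_avail_idx_torn(const volatile uint16_t *shared_idx)
{
    uint16_t v;
    /* byte-wise copy: NOT atomic with respect to the guest's writes */
    memcpy(&v, (const void *)shared_idx, sizeof(v));
    return v;
}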
Did you manage to reproduce it over RHEL?
Still working on trying to get a RHEL reproduction.

Network performance under RHEL is rather poor, however. The traffic curve is very jagged and we have not been able to get the traffic rate above 600Mbps.

One other issue with the test on RHEL is that the ixChariot endpoint on Windows is reporting an error in the high-precision timer, which causes the test to abort. This is the case when the endpoint is primarily the sender rather than the receiver. It happens soon after the traffic starts, within seconds in some cases.
(In reply to comment #11)
> Still working on trying to get a RHEL reproduction.
>
> Network performance under RHEL is rather poor, however. The traffic curve is
> very jagged and we have not been able to get the traffic rate above 600Mbps.

Added Mark Wagner to help.

How do you test traffic?
Is it bi-directional/rx/tx? UDP/TCP? What packet sizes? What's the load on the host?

In addition, there are registry settings on the VM that increase performance:
http://www.linux-kvm.org/page/WindowsGuestDrivers/kvmnet/registry

For large packet sizes we can go up to several Gb.

> One other issue with the test on RHEL is that the ixChariot endpoint on
> Windows is reporting an error in the high-precision timer, which causes the
> test to abort. This is the case when the endpoint is primarily the sender
> rather than the receiver. It happens soon after the traffic starts, within
> seconds in some cases.

Can you try using pmtimer in the guest? It is recommended anyway:
http://support.microsoft.com/kb/833721
(In reply to comment #12)
> How do you test traffic?
> Is it bi-directional/rx/tx? UDP/TCP? What packet sizes? What's the load on
> the host?

We are primarily using ixChariot to test, but also iperf. Traffic is mostly uni-directional, and the failure appears to be associated more with the guest receiving a high traffic rate. See comment #6 for more on the iperf settings.

The host load average shows 1.98 0.91 0.67 after a couple of minutes of running the test. The qemu process is consuming 99.9% of the CPUs assigned to it:

  PID USER  PR NI VIRT RES  SHR  S %CPU %MEM TIME+    COMMAND
 7679 admin 20  0 571m 524m 1304 S 99.9 13.7 23:35.00 qemu

If the test is not running, the host load average is 0.28, 0.92, 0.76, and qemu consumes only about 5% to 6% of the CPU.

> In addition, there are registry settings on the VM that increase performance:
> http://www.linux-kvm.org/page/WindowsGuestDrivers/kvmnet/registry
>
> For large packet sizes we can go up to several Gb.

> Can you try using pmtimer in the guest? It is recommended anyway:
> http://support.microsoft.com/kb/833721

From what I could find online, the problem pmtimer fixes is not present in Server 2008.
What's your qemu cmdline? Do you add -no-hpet (you should)?

Regarding the network performance: iperf and the Windows/Linux stack are sensitive to socket buffer sizes, so you can play with them a bit to see if you get better numbers. Also don't forget to configure the Windows registry settings. We expect much better performance with 64k packets.

Yan, any ideas?
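For example, an invocation with -no-hpet might look like this (the values here are illustrative; keep your other options as they are):

qemu-kvm -m 2048 -smp 2 -no-hpet -drive file=win2k8r2.img -net nic,model=virtio -net tap,script=/etc/qemu-ifup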
You should see a performance boost with packets of 4K and up.

There are several important things to configure in the registry.

TCP window scaling:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"Tcp1323Opts"=dword:00000003

And TCP window size:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"TcpWindowSize"=dword:00100000

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters]
"DefaultReceiveWindow"=dword:00100000
"DefaultSendWindow"=dword:00100000

With UDP it is important to configure the fast-copy threshold option (otherwise you will see a performance drop with messages bigger than 1K):
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters]
"FastSendDatagramThreshold"=dword:00004000

Check also the following wiki page:
http://www.linux-kvm.org/page/WindowsGuestDrivers/kvmnet/registry
According to http://www.microsoft.com/whdc/system/sysperf/Perf_tun_srv-R2.mspx:

If your hardware supports TOE, then you must enable that option in the operating system to benefit from the hardware’s capability. You can enable TOE by running the following command:

netsh int tcp set global chimney=enabled
Chimney won't be supported ATM since it needs multiple virtio receive queues.
I don't really need to boost the traffic rate any higher; I am able to reproduce the failure on our host with just 100 to 200 Mbps throughput and 32k transaction sizes. The ixChariot High Performance test script is configured with 32k send and receive buffer sizes (bytes of data in each SEND/RECEIVE).

I am still unable to reproduce it with RHEL as the host, however. Can you point me to the relevant kernel virtio-net files, or the tun/tap files, that the kvm module and qemu depend on?
For tun, look for uses of the macro TUN_VNET_HDR in drivers/net/tun.c.

Relevant routines:
tun_get_iff
tun_set_iff
tun_get_user
tun_put_user
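For orientation, a condensed sketch of what the vnet-header path adds on the receive-to-userspace side. This mirrors the upstream tun.c logic of that era; it is not the literal RHEL code, and the helper name build_vnet_hdr is illustrative:

#include <linux/skbuff.h>
#include <linux/string.h>
#include <linux/virtio_net.h>

/* When IFF_VNET_HDR is set, tun_put_user() prefixes each packet handed
 * to userspace (qemu) with a struct virtio_net_hdr describing its
 * checksum/GSO state, so large offloaded "super-packets" can be passed
 * through to the guest's virtio-net device instead of being segmented
 * on the host. */
static void build_vnet_hdr(const struct sk_buff *skb,
                           struct virtio_net_hdr *gso)
{
    memset(gso, 0, sizeof(*gso));

    if (skb_is_gso(skb)) {
        const struct skb_shared_info *sinfo = skb_shinfo(skb);

        gso->hdr_len  = skb_headlen(skb);
        gso->gso_size = sinfo->gso_size;
        if (sinfo->gso_type & SKB_GSO_TCPV4)
            gso->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
        else if (sinfo->gso_type & SKB_GSO_TCPV6)
            gso->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
    }

    if (skb->ip_summed == CHECKSUM_PARTIAL) {
        gso->flags       = VIRTIO_NET_HDR_F_NEEDS_CSUM;
        gso->csum_start  = skb->csum_start - skb_headroom(skb);
        gso->csum_offset = skb->csum_offset;
    }
    /* tun_put_user() then copies *gso to userspace ahead of the frame. */
}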
Created attachment 373182 [details]
drivers\net\tun.c from Cisco's kernel 2.6.23
Our version of the tun driver does not have the TUN_VNET_HDR macro. I've included our version of the file as an attachment.

It looks like I need this patch from your kernel sources:
linux-2.6-net-tun-add-iff_vnet_hdr-tungetfeatures-tungetiff.patch

There are a few conflicts, so it's not a simple patch application. Any other patches I should be looking at?
Have you tried disabling the HW acceleration on the host NIC?

ethtool -K ethX rx off tx off

This will most likely hurt throughput but will help narrow down where the issue is.
I've applied the patch above as best I could, with a minor change to the csum_start calculation since our sk_buff does not have the h and nh struct members. The patch did not solve the problem.

I also disabled the HW acceleration as instructed, but the failures persist. The HW acceleration setting did not appear to make any difference to throughput.
We can try running without vnet_hdr on rhel and check if we can reproduce the bug.
James, could you attach the actual patch that you applied?
Created attachment 373504 [details]
applied net-tun-add-iff_vnet_hdr-tungetfeatures-tungetiff.patch

This is the patch applied to Cisco's kernel 2.6.23 for testing a possible fix.
(In reply to comment #25)
> We can try running without vnet_hdr on rhel and check if we can reproduce the
> bug.

Please provide instructions on how to disable this for testing.
In recent qemu, tap has a vnet_hdr option; you just set it to false.
(In reply to comment #29)
> In recent qemu, tap has a vnet_hdr option; you just set it to false.

I added ,vnet_hdr=false to the -net tap,... option list, but I did not notice any performance difference, nor did it help in reproducing the bug.
Can you reproduce the bug on kernel v2.6.23.17 from kernel.org?
(In reply to comment #27)
> Created attachment 373504 [details]
> applied net-tun-add-iff_vnet_hdr-tungetfeatures-tungetiff.patch
>
> This is the patch applied to Cisco's kernel 2.6.23 for testing a possible fix.

BTW, do you see the "GSO" debug print when you use this?
Not sure I'll be able to test on that version from kernel.org. How would I go about doing that on our RHEL 5.4 installation?

I do not see the GSO debug print messages. Just to be sure, I changed it to pr_info in case DEBUG wasn't enabled in the kernel.
To test a kernel from kernel.org: git clone it, run make oldconfig, and make install. Sometimes you need to create the initrd yourself.

The fact that you do not see the GSO print explains why you get no performance change from disabling the vnet header.
Clarification: when I say make oldconfig, I really mean you should take the .config from your 2.6.23 kernel. Hope this makes sense. A possible command sequence is sketched below.
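For example, on a RHEL 5.4 box (version strings and paths here are illustrative):

wget http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.23.17.tar.bz2
tar xjf linux-2.6.23.17.tar.bz2 && cd linux-2.6.23.17
cp /boot/config-$(uname -r) .config    # start from the existing 2.6.23 config
make oldconfig                         # answer the prompts for any new options
make && make modules_install install
# if the install step did not create an initrd, build one by hand:
mkinitrd /boot/initrd-2.6.23.17.img 2.6.23.17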
(In reply to comment #29)
> In recent qemu, tap has a vnet_hdr option; you just set it to false.

Michael, can you try to reproduce it with vnet_hdr=false on RHEL 5.4?
Also, send a similar patch to Cisco so they can try it out too.
Would a driver with specific debugs covering the code segment of interest help with the testing? Enabling the currently available debugs impacts the performance and timing such that the problem is no longer reproducible.

I will try to run with the 2.6.23.17 kernel.
I compiled and booted our RHEL test box with kernel 2.6.23.17, but kvm-qemu won't start, giving the following errors:

TUNGETIFF ioctl() failed
TUNSETSNDBUF ioctl() failed
Hypervisor too old: KVM_CAP_USER_MEMORY extension not supported

It looks like I need to apply some patches for these items. I can find patches for the first two but not for the last one.
(In reply to comment #37)
> Would a driver with specific debugs covering the code segment of interest
> help with the testing? Enabling the currently available debugs impacts the
> performance and timing such that the problem is no longer reproducible.

I do think a debug version of the driver may help us narrow down the problem. From what I can tell, there is a race condition that allows the virt queue window to overflow.
(In reply to comment #38)
> I compiled and booted our RHEL test box with kernel 2.6.23.17, but kvm-qemu
> won't start, giving the following errors:
>
> TUNGETIFF ioctl() failed
> TUNSETSNDBUF ioctl() failed
> Hypervisor too old: KVM_CAP_USER_MEMORY extension not supported

How does it work on Cisco's kernel 2.6.23? Does it have backported kvm, or are you using kvm_kmod?
(In reply to comment #40)
> How does it work on Cisco's kernel 2.6.23?
> Does it have backported kvm, or are you using kvm_kmod?

We are using kvm_kmod in our system.

debugshell# modinfo /sw/kvm/lib/modules/kvm-intel.ko
filename:       /sw/kvm/lib/modules/kvm-intel.ko
version:
author:         Qumranet
license:        GPL
vermagic:       2.6.23waas64. SMP mod_unload
depends:        kvm,kvm
srcversion:     3A2787BAF8EFED8D06C2554
parm:           emulate_invalid_guest_state:bool
parm:           enable_ept:bool
parm:           flexpriority_enabled:bool
parm:           enable_vpid:bool
parm:           bypass_guest_pf:bool

debugshell# modinfo /sw/kvm/lib/modules/kvm.ko
filename:       /sw/kvm/lib/modules/kvm.ko
version:
author:         Qumranet
license:        GPL
vermagic:       2.6.23waas64. SMP mod_unload
depends:
srcversion:     1959A152FDDD3E22E0C87D9
parm:           oos_shadow:bool
parm:           force_kvmclock:bool
I have successfully compiled the kvm_kmod version and loaded it instead of the kernel default version. Test traffic so far is running without failure. I'm getting about a 50% received traffic rate.
I would like to understand whether the issues are attributable to using SMP on the host. Could you please verify whether the issue happens if you limit qemu to run on a single CPU on the host? I think you can do this using the taskset command.
The failure is not reproducible when -smp 1 is used. The failure is reproducible when -smp 2 is used. We are not using more than 2 in the -smp configuration.

Our qemu tasks are configured to run on real CPUs 2 & 3. All other tasks on the system are limited to real CPUs 0 & 1.

If starting with -smp 2 and then changing all tasks' affinity to use only real CPU 2, the throughput drops to about 25% and the problem is not reproducible.

debugshell# cat /var/run/vb1.pid
7666
debugshell# ps -L -p 7666 -o tid
  TID
 7666
 7699
 7700
 7701
 7703
debugshell# taskset -pc 2 7666
bash: taskset: command not found
debugshell# /sw/kvm/bin/taskset -pc 2 7666
pid 7666's current affinity list: 2,3
pid 7666's new affinity list: 2
debugshell# /sw/kvm/bin/taskset -pc 2 7699
pid 7699's current affinity list: 2,3
pid 7699's new affinity list: 2
debugshell# /sw/kvm/bin/taskset -pc 2 7700
pid 7700's current affinity list: 2,3
pid 7700's new affinity list: 2
debugshell# /sw/kvm/bin/taskset -pc 2 7701
pid 7701's current affinity list: 2,3
pid 7701's new affinity list: 2
debugshell# /sw/kvm/bin/taskset -pc 2 7703
pid 7703's current affinity list: 2,3
pid 7703's new affinity list: 2

Our stdout and stderr are redirected to /var/run/vb1.qemu:

debugshell# tail -f /var/run/vb1.qemu
char device redirected to /dev/pts/0
rom checksum: c9f19e31

After changing the CPU affinity back to use 2 & 3, the throughput is around 70% to 80% and the problem reoccurs.

debugshell# /sw/kvm/bin/taskset -pc 2,3 7666
pid 7666's current affinity list: 2
pid 7666's new affinity list: 2,3
debugshell# /sw/kvm/bin/taskset -pc 2,3 7699
pid 7699's current affinity list: 2
pid 7699's new affinity list: 2,3
debugshell# /sw/kvm/bin/taskset -pc 2,3 7700
pid 7700's current affinity list: 2
pid 7700's new affinity list: 2,3
debugshell# /sw/kvm/bin/taskset -pc 2,3 7701
pid 7701's current affinity list: 2
pid 7701's new affinity list: 2,3
debugshell# /sw/kvm/bin/taskset -pc 2,3 7703
pid 7703's current affinity list: 2
pid 7703's new affinity list: 2,3
debugshell# tail -f /var/run/vb1.qemu
char device redirected to /dev/pts/0
rom checksum: c9f19e31
Guest moved used index from 56535 to 56600, max 256
Guest moved used index from 29902 to 29962, max 256
Guest moved used index from 48348 to 48413, max 256
Sorry I was not clear. Please try launching qemu with taskset so that *all* qemu threads run *on the same real CPU*, while still using -smp 2 with qemu. Does the problem re-occur?
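i.e., something like this (illustrative; keep your usual options):

/sw/kvm/bin/taskset -c 2 qemu-kvm -smp 2 <usual options>

Launching under taskset means every thread qemu subsequently creates inherits the single-CPU affinity from the start, so nothing runs unpinned even briefly.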
How is running with taskset at startup any different from running taskset on each of the tasks after start? I have changed all threads to run on the same real CPU.
Hmm, it won't be different unless the guest manages to do something with virtio in the meantime. Might this be the case? Also, I assumed that since you say "-pc 2,3", qemu will use 2 host processors?
Um, I did not read comment 44 correctly. So I think we can confirm that the problem does not happen when run on a single host CPU. Unfortunately, this is no guarantee that the problem is SMP-related, since we know the problem only presents itself when throughput is high, and when run on a single CPU, throughput is low.
I don't think the throughput has as much to do with it, since I have also limited it in ixChariot to 155Mbps (~16%) and can still reproduce the problem.
We're still at a loss as to why it's only happening on our test setup. FYI, these are our config options:

Error: libpci check failed
Disable KVM Device Assignment capability.
Install prefix    /sw/kvm
BIOS directory    /sw/kvm/share/qemu
binary directory  /sw/kvm/bin
Manual directory  /sw/kvm/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path       /work/koj/ros_vb_enh/x86_64-derived/src/kvm/qemu
C compiler        /adbu-waas-tools/nptl/linux-2.6.10/gcc-4.1.1-glibc-2.3.6-mallocfix/x86_64-unknown-linux-gnu/bin/x86_64-unknown-linux-gnu-gcc
Host C compiler   gcc
ARCH_CFLAGS       -m64
make              make
install           install
host CPU          x86_64
host big endian   no
target list       x86_64-softmmu
gprof enabled     no
sparse enabled    no
profiler          no
static build      no
-Werror enabled   no
SDL support       yes
SDL static link   yes
curses support    no
mingw32 support   no
Audio drivers     oss
Extra audio cards ac97
Mixer emulation   no
VNC TLS support   no
kqemu support     no
kvm support       yes
CPU emulation     yes
brlapi support    no
Documentation     yes
NPTL support      yes
vde support       no
AIO support       yes
QXL               yes
Spice             no
SMB directories   yes
SCSI devices      yes
ISAPC support     yes
KVM nested        yes
USB storage       yes
USB wacom         yes
USB serial        yes
USB net           yes
USB bluez         no
VMware drivers    yes
NBD support       yes
bluetooth support no
Only generic cpus no
This is a qemu-kvm bug, not a virtio-win bug: we are using memcpy to read the index value, which is not guaranteed to be atomic. Verified that replacing memcpy with direct reads/writes for the index accesses fixes the problem.
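A minimal sketch of the shape of that fix, assuming x86 and an aligned index field (the helper names are illustrative, not the actual RHEL patch):

#include <stdint.h>

/* Replacing the memcpy-based access with a plain aligned 16-bit
 * load/store: on x86 this compiles to a single mov instruction, which
 * is atomic with respect to concurrent guest updates, so the index can
 * no longer be observed half-old/half-new. */
static inline uint16_t vring_idx_read(const volatile uint16_t *p)
{
    return *p;
}

static inline void vring_idx_write(volatile uint16_t *p, uint16_t v)
{
    *p = v;
}

This would also explain why the bug showed up only with Cisco's custom-built qemu (see the config output above): a typical glibc memcpy presumably performs a 2-byte copy as one 16-bit move, which happens to be atomic, while a byte-copying memcpy implementation can tear.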
@James @Cisco,

It appears our QE team is having trouble reproducing this issue, and we'll need your help to confirm that the fix proposed for the RHEL 5.5 update performs as expected. RHEL 5.5 Beta will be out soon and should contain the updated KVM packages, including this fix. I will post an announcement to this list when the Beta bits have been made available on RHN. If you could, please grab those bits when they're available and let us know the results of your testing. Also, if you will not be able to complete this test-feedback request, we would appreciate knowing that in advance. Thank you for your support.
(In reply to comment #60)

Our release stream is currently based on RHEL 5.4, so I will not be able to pull a RHEL 5.5 Beta for this testing. Bug 561022 is a copy of this bug, and it is expected to be used for the backport to the 5.4 z-stream, which we can update to.
(In reply to comment #61)

So do you have access to kvm-83-105.el5_4.20, as pointed out in comment #4?
<https://bugzilla.redhat.com/show_bug.cgi?id=561022#c4>
The latest version on RHN is currently kvm-83-105.el5_4.13.x86_64. Do you know when el5_4.20 will be posted?
Created attachment 389384 [details]
after applying this patch and rebuilding, qemu will crash

The problem only triggers when the qemu RPM is built from source with a custom compiler and run on a custom kernel. The only way I found to approximately reproduce this issue on our systems is by replacing the memcpy implementation with a custom routine.
How to reproduce:

On both host and guest, build netperf from source; find it here:
ftp://ftp.netperf.org/netperf/netperf-2.4.5.tar.bz2

Apply the patch above (attachment id=389384) and rebuild qemu-kvm from source. This replaces memcpy with a custom function while preventing the compiler from optimizing it.

Run qemu-kvm with userspace networking, e.g.:
qemu-kvm -drive file=$HOME/disk.raw -net user -net nic,model=virtio -redir tcp:8022::22

On the host, run netserver.

ssh into the guest:
ssh -P 8022 <host>

Once there, run netperf repeatedly on the guest:
while date
do
    netperf -H <host address>
done

qemu will crash shortly.
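For reference, the kind of replacement routine such a patch might use: byte-wise, and shielded from compiler recognition so that 2-byte index reads can tear. This is a sketch under those assumptions, not the attached patch itself:

#include <stddef.h>

/* A deliberately byte-wise memcpy substitute. Marked noinline and using
 * volatile pointers so the compiler cannot recognize it as memcpy and
 * replace it with a single aligned load/store. With this in place, a
 * 2-byte copy of the vring avail index can be interleaved with a guest
 * update between bytes, reproducing the torn-read crash. */
__attribute__((noinline))
static void *bytewise_memcpy(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = dst;
    const volatile unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}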
Sorry,
ssh -P 8022 <host>
should have been:
ssh -p 8022 <host>
@Cisco, @Michael,

As far as I understand, this issue should be fixed in the latest RHEL 5.5 Beta snapshot. Could you please verify this and report back your test results as soon as possible? Thanks!
I have verified that this is fixed as of kvm-83-156.
kvm-83-154 is also fine.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0271.html